Re: Generic helper I/O format from Alex Rousskov on 2012-07-04 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 04 Jul 2012 13:02:42 -0600

On 07/04/2012 12:02 AM, Amos Jeffries wrote:
> On 4/07/2012 4:54 p.m., Alex Rousskov wrote:
>> On 07/03/2012 08:10 PM, Amos Jeffries wrote:

>>>>> [channel-ID] (OK/ERR/BH) [key-pairs] <blob> <terminator>
>>>> How will the generic code be able to tell where key-pairs end and blob
>>>> begins?
>>> The generic code will have a format for key-pair of ( 1#token "="
>>> 1#(token / quoted-string) ) [to be finalized still] with a set of known
>>> key names. If the bytes being parsed do not match that format and keys
>>> its not part of the key-pairs.
>> There are two problems with this approach, IMO:
>>
>> * It cannot handle a body that just happened to start with supported
>> key-value pair characters.
>
> This is shared with any body which just happens to start with the "BS"
> token as well.

No, it is not. BS is required if the body is present and BS is not a
valid key name. Thus, BS cannot be confused with a start of a key-value
pair _and_ if a body starts with BS as well, there is no problem because
we already know to expect a body there (after BS).

[ but I am glad you acknowledge this issue with the BS-less approach ]

>> I think we should keep it as simple as possible without causing problems
>> with existing parsers and leaving some room for future extensions:
>>
>> * key/name: alphanumeric char followed by anything except (whitespace or
>> '=' or terminator)
>>
>> * value: either quoted-string or anything except (whitespace or
>> terminator); could be empty
>>
>> The parser strips whitespace and strips quotes around quoted strings.
>> Quoted strings use backslash to escape any octet.

> Okay. So modelling on the ssl_crtd parser instead of external_acl_type
> parser.

I am trying to model this on the above "as simple as ..." criteria. I do
not really remember the specifics of any particular parser we have (so
adjustments may be necessary).

>>>> If you want one format for all, you probably need something like
>>>>
>>>> [channel-ID] (OK/ERR/BH) [key-pairs] [BS <body>] <terminator>
>>>>
>>>> where "BS" is some special marker that starts all bodies and cannot be
>>>> found in key-pairs. Since any body-carrying message is likely to have
>>>> new lines (and, hence, would need a non-newline terminator), this BS
>>>> sequence could be something like "body:\n", I guess.
>>> We REQUIRE a format that does not rely on new special markers like that
>>> in order to cope with the existing helpers responses in a
>>> backward-compatible way without additional user configuration.
>> When your parser sees "n=v " after ERR and "n" is a known key name, how
>> would it distinguish these two possibilities:
>>
>> * This is a start of a <blob>
>> * This is a key=value pair
>
> The second. *IF* it was in the [key-pair] area directly after ERR.

And how do you decide that IF? It is exactly the same question I asked,
just phrased differently. The BS-less approach does not work (as you
have already acknowledged in the beginning of this email): The end of
"[key-pair] area" is indistinguishable from the beginning of a <blob> in
this case. In other words, you do not know whether the <blob> has
started or not. That is exactly the problem BS is solving (because it is
not a valid key).

>>> Or would you advocate moving the external_acl_type "protocol=N.N" config
>>> option into the helper child object for use by all helpers on
>>> determining what key-pairs and format is accepted?
>> I am probably missing some important use cases, but my generic parser
>> would just call a virtual method for each key=value pair found (with no
>> pre-registration of supported keys) and another virtual method for the
>> <body> blob, if any. It will parse everything using the key, value, and
>> BS formats discussed above.
>
> You mean function pointers right? not virtual methods in the HelperReply
> object?

I did mean virtual methods. The child parsers will form HelperReply's
kid objects by converting key-value pairs and other parsed strings into
helper-specific replies. Those parsers may be integrated with
HelperReply hierarchy or be separate from it. If they are integrated, it
would look like this:

    class HelperReply {
    ...
         virtual void setResult(string);
         virtual void setKeyValue(string, string);
         virtual void setBody(MemBuf);
    };
    ...

    // a sketch of a child implementation of setKeyValue()
    Ssl::CrtdReply::setKeyValue(string k, string v)
    {
         if (k == "foo")
            ... this->foo_ = StringToInt(v);
         else
            HelperReply::parseKeyValue(k,v);
    }

> and um, this is a high-use code path - both in parsing and in
> retrieving the details out of HelperReply.

Agreed. I cannot think of a significantly more efficient but still
general approach that results in helper-specific ready-to-use replies.

> That style of generic parser is good for new formats but not so much for
> backward-compatibility from old formats. I am going to have to manually
> receive things like the TT/LD/AF/NA NTLM result codes and map the
> existing format into record of the HelperReply object.

Why not have NtmlHelperReply::setResult() do the mapping? Is there
something format-incompatible with those NTLM result codes that the
generic parser cannot parse them using the following format?

[channel-ID] <result> [key-pairs] [<BS> <blob>] <terminator>

where result is a anything except whitespace, BS, or terminator.

Cheers,

Alex.
Received on Wed Jul 04 2012 - 19:02:47 MDT

This archive was generated by hypermail 2.2.0 : Thu Jul 05 2012 - 12:00:03 MDT