• Why is flex pattern-matching of NULs slow?

    From Roger L Costello@21:1/5 to All on Fri Apr 8 11:06:00 2022
    Hi Folks,

    The Flex manual says this:

    Pattern-matching of NULs is substantially slower
    than matching other characters.

    Why is that?

    /Roger
    [My recollection is that zero is used as a flag value in internal
    tables and there is some slow kludge to say that this is a nul not the
    flag, but perhaps someone who has looked at the code more recently
    will remember the details. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Christopher F Clark@21:1/5 to All on Sat Apr 9 21:40:45 2022
    I haven't looked at Flex in a while either, but what I remember is
    that 0 is used as end of buffer and EOF indication and that you had to
    validate against that. I don't recall whether that required an
    attempt at reading or not. It wouldn't surprise me if it were used as
    a flag also, and for a "null pointer". Depending upon how you look at
    it, C either hates 0 or loves it, but it is very often "special".
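
    A minimal sketch of the sentinel idea described above, assuming a
    buffer that holds the input bytes followed by one extra 0 byte. The
    names scan_buf and scan_word are hypothetical and this is not flex's
    actual code; it only illustrates why a 0 byte in the input costs
    more than any other byte. Ordinary characters pass a single value
    test, while every 0 forces a second check of the position to decide
    whether it is the sentinel or a genuine NUL.

```c
#include <stddef.h>

/* Hypothetical buffer type for illustration; not flex's own structure. */
struct scan_buf {
    const char *data;   /* len input bytes followed by a 0 sentinel */
    size_t len;         /* number of real input bytes, sentinel excluded */
};

/* Return the length of the run of non-blank bytes starting at pos.
 * The common path tests only the byte value; the position check runs
 * only when a 0 byte shows up. */
size_t scan_word(const struct scan_buf *b, size_t pos)
{
    size_t start = pos;

    for (;;) {
        char c = b->data[pos];

        if (c == 0) {
            if (pos >= b->len)
                break;          /* slow path: the sentinel, end of input */
            pos++;              /* slow path: a genuine NUL in the data */
            continue;
        }
        if (c == ' ' || c == '\n')
            break;              /* word ends at a blank or newline */
        pos++;
    }
    return pos - start;
}
```

    With the six bytes 'a', NUL, 'b', ' ', 'c' plus sentinel and len set
    to 5, scan_word starting at 0 takes the slow path once for the
    embedded NUL and then stops at the blank.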

    But if you are parsing human readable ASCII text, having 0 (NUL) be an
    EOF mark is actually not a bad solution. If I recall correctly, that
    isn't even a bad choice for human readable UTF-8 (including
    non-latin-1 texts, because 2, 3, and 4 byte sequences never have
    NULs in them). It only becomes a pain if you want to parse binary
    data.
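
    The UTF-8 point can be checked mechanically. The sketch below is a
    plain UTF-8 encoder written for illustration (utf8_encode and
    encoding_contains_nul are hypothetical helper names, not part of any
    library). Every byte of a multi-byte sequence has its high bit set,
    so the only code point whose encoding contains a 0x00 byte is U+0000
    itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8.
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;           /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Return 1 if any byte of the UTF-8 encoding of cp is 0x00, else 0. */
int encoding_contains_nul(uint32_t cp)
{
    unsigned char buf[4];
    size_t n = utf8_encode(cp, buf);

    for (size_t i = 0; i < n; i++)
        if (buf[i] == 0)
            return 1;
    return 0;
}
```

    Looping encoding_contains_nul over every code point from U+0001 to
    U+10FFFF never reports a NUL, which is why a 0 byte stays usable as
    a terminator or end-of-buffer mark for UTF-8 text.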

    By the way, in our lexer we used -1 (what getc used to return for
    EOF) for the same condition, and I don't recall how we put it in the
    buffer (or whether we even did). Being ex-PL/I and Pascal
    programmers, we used strings with lengths in many places instead of C
    strings. I don't remember whether we used Paul Abrahams' clever hack
    of putting the length at the end of the string, which, if done right,
    also serves as a null byte for use as C strings.

    --
    ******************************************************************************
    Chris Clark                  email: [email protected]
    Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
    23 Bailey Rd                 voice: (508) 435-5016
    Berlin, MA 01503 USA         twitter: @intel_chris
    ------------------------------------------------------------------------------
    [You're right about UTF-8, where NUL is also a reasonable string terminator.
    UTF-8 is self-synchronizing -- the bytes of no UTF-8 code point are a prefix
    or suffix of any other code point. -John]
