Forum: >>> Magnum BBS <<<

Why is flex pattern-matching of NULs slow?

From Roger L Costello@21:1/5 to All on Fri Apr 8 11:06:00 2022

Hi Folks,

The Flex manual says this:

Pattern-matching of NULs is substantially slower
than matching other characters.

Why is that?

/Roger
[My recollection is that zero is used as a flag value in internal
tables and there is some slow kludge to say that this is a nul not the
flag, but perhaps someone who has looked at the code more recently
will remember the details. -John]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Christopher F Clark@21:1/5 to All on Sat Apr 9 21:40:45 2022

I haven't looked at Flex in a while either, but what I remember is
that 0 is used as the end-of-buffer and EOF indicator, and that you
had to validate against that. I don't recall whether that required an
attempt at reading or not. It wouldn't surprise me if 0 were also used
as a flag, and as a "null pointer". Depending upon how you look at it,
C either hates 0 or loves it, but it is very often "special".

But if you are parsing human-readable ASCII text, having 0 (NUL) be an
EOF mark is actually not a bad solution. If I recall correctly, that
isn't even a bad choice for human-readable UTF-8 (including
non-Latin-1 texts, because multi-byte sequences, whether 2, 3, or 4
bytes long, never contain NULs). It only becomes a pain if you want to
parse binary data.

By the way, in our lexer we used -1 (the usual value of getc's EOF)
for the same condition, and I don't recall how we put it in the
buffer (or whether we even did). Being ex-PL/I and Pascal
programmers, we used strings with lengths in many places instead of C
strings. I don't remember whether we used Paul Abrahams' clever hack
of putting the length at the end of the string, which, if done right,
also serves as a null byte for use as a C string.

--
******************************************************************************
Chris Clark                    email: [email protected]
Compiler Resources, Inc.       Web Site: http://world.std.com/~compres
23 Bailey Rd                   voice: (508) 435-5016
Berlin, MA 01503 USA           twitter: @intel_chris
------------------------------------------------------------------------------
[You're right about UTF-8, where NUL is also a reasonable string
terminator. UTF-8 is self-synchronizing -- the bytes of no UTF-8 code
point are a prefix or suffix of any other code point. -John]

