Forum: >>> Magnum BBS <<<

The remarkable similarities between Flex/Lex and XSLT

From Roger L Costello@21:1/5 to All on Fri Jun 24 10:57:51 2022

Hi Folks,

XSLT is a language for processing XML documents.

There are remarkable similarities between Flex/Lex and XSLT. Lex was created
47 years ago, long before XSLT. One wonders if some members of the XSLT 1.0 Working Group were Lex users and were influenced by its concepts?

Here are some of the similarities between Flex/Lex and XSLT:

Both are pattern-matching languages, i.e.,

pattern action
pattern action
pattern action

Both allow you to create subsets:

The XSLT "mode" allows the program to effectively be broken into a set of
mini XSLT programs
The Flex/Lex "start condition" allows the lexer to effectively be broken into
a set of mini lexers

Both have a default rule that is executed when no other rule matches

There are some differences, of course:

XSLT is primarily for processing XML but it can process plain text, whereas Flex/Lex is primarily for processing plain text (esp. source code) but it can process XML

XSLT uses XPath as its pattern-matching language, whereas Flex/Lex uses
regular expressions as its pattern-matching language

Pretty interesting, I think!

/Roger
[I would be surprised if the XSLT authors hadn't seen lex but Wikipedia suggests
it was more influenxed by awk, which also has pattern action lines. -John]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From gah4@21:1/5 to Roger L Costello on Fri Jun 24 06:43:43 2022

On Friday, June 24, 2022 at 6:00:44 AM UTC-7, Roger L Costello wrote:

XSLT is a language for processing XML documents.

There are remarkable similarities between Flex/Lex and XSLT. Lex was created 47 years ago, long before XSLT. One wonders if some members of the XSLT 1.0 Working Group were Lex users and were influenced by its concepts?

Here are some of the similarities between Flex/Lex and XSLT:

Both are pattern-matching languages, i.e.,

I was thinking, but didn't post yet, about the different ways of writing pattern
matching languages. Well, more specifically about parsing languages,
but even more about the pattern matching part.

I wrote recently about STEP, which has an input language somewhat
different from yacc/bison for describing a parser.

And even more, if there should be a language for writing pattern
matching languages in.

That is, do we need a compiler-compiler-compiler.

It does seem rare that one starts from scratch in defining a new
computer language, even though not a general purpose
programming language.
[Pattern-action goes back at least to RPG in 1959, and it was
based on the way plugboard accounting machines work. -John]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Christopher F Clark@21:1/5 to All on Sat Jun 25 18:32:07 2022

I think we should have (not sure about the word "need", "should have" is
close enough) compiler-compiler-compilers.

We know enough to write a single algorithm that can generate regular
expression recognizers as either NFAs or DFAs, LL(k) parsers, LR(k)
parsers, GLR(k) parsers, PEG parsers, and can incorporate captures, back-references, predicates, adaptive rules, permutations, dynamic
precedence rules, etc. We also know enough to include the generation of visitors, attribute evaluators, and other next-level "assistants".
Having it accept a variety of notations is also relatively easy.

And to truly make it a compiler-compiler-compiler, one needs to make the
parts separable and be able to be "generated" in the special case forms
(e.g. to be able to recreate an LL version like ANTLR or an LR version like Bison and of course, the variations in between). Circa 2000 we already had
a version of our Yacc++ that could generate something close to recursive descent code from an LR grammar, so this is just extending that concept.
ANTLR4 is doing something close to the reverse and moving to a more state-machine-like description of (mostly) LL grammars.

Thus, creating a compiler-compiler-compiler is a feasible (although
non-trivial task). It's on my "to-do" list.

-- ******************************************************************************

Chris Clark email:
[email protected]
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From [email protected]@21:1/5 to Roger L Costello on Sat Jun 25 09:20:54 2022

On Friday, 24 June 2022 at 09:00:44 UTC-4, Roger L Costello wrote:

Hi Folks,

XSLT is a language for processing XML documents.

There are remarkable similarities between Flex/Lex and XSLT. Lex was created 47 years ago, long before XSLT. One wonders if some members of the XSLT 1.0 Working Group were Lex users and were influenced by its concepts?

It's not really about a single tool like Lex.

Before XML there was SGML, which XML was supposed to "simplify". SGML
included a schema language (DTD), which defines the hierarchical structure of a document using regular expressions over elements. There was also a strange unnecessary constraint on these expressions called "ambiguity", which *everybody* who wrote SGML software needed to understand, and so the idea of applying formal language techniques to SGML was inevitable.

Long before XSLT, there were a variety of attempts to define languages that would allow users to specify an automatic translation from SGML into printed form. Many of these languages were context-free grammars at their core, with translation rules as actions. This is called "syntax-directed translation"
and was a well-known concept long before that.

With SGML, though, the problem of syntax-directed translation is different
than it is in other contexts, and more difficult in many ways, because the basic structures in the input are very easy to parse -- elements are delimited after all -- but the input was a semantically marked up text and the output was a published document that had to follow all the ambiguously-defined stylistic rules that people use when they actually to typography. This meant that complicated grammars, over *element trees* instead of linear text, and lots of other ideas, needed to be applied. Lots of companies put a lot of
work into it.

So by the time XSLT came around, everyone on the committee as already familiar with a lot of this history from SGML processing, which was based on a lot of work rooted in the same formal language theory that goes into lexers and parsers, and that is why some of XSLT looks a lot like Lex.

Unfortunately, XSLT kind of sucks. When the standard was written, the problem itself had not really been solved by industry in a really acceptable way (and it still hasn't been!), and the W3C committee fell into the trap of trying to innovate instead of codifying best practice.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	741
Nodes:	16 (2 / 14)
Uptime:	125:13:37
Calls:	12,470
Files:	15,200
Messages:	6,538,385

The remarkable similarities between Flex/Lex and XSLT

Who's Online

System Info