Forum: >>> Magnum BBS <<<

Some questions about recursive descent?

From Johann 'Myrkraverk' Oskarsson@21:1/5 to All on Sun Feb 27 19:02:59 2022

Dear comp.compilers,

I am currently reading two compiler books, [Holub] and [Appel], and
then [Salomon] for a bit of context below,

[Holub] Compiler Design in C, 1990, by Allen Holub

[Appel] Modern Compiler Implementation in ML, 2004, by Andrew Appel

[Salomon] Assemblers and Loaders, 1993, David Salomon

and I have some questions about the construction of recursive descent
parsers. If some of my questions are adequately answered in some other publication, reference, preferably with chapter and verse, is welcome.

The reason for my questions is that I know production compilers /use/
recursive descent. GCC is probably the most famous example, but Watcom
does it too. More below as well.

My first question is: how are production recursive descent parsers con- structed? From these two books, and other things I have read in the
past, recursive descent is so simple it's mostly just brushed off with
an example or two, but there seems to be a whole lot more to it. For
instance, [Appel] mentions a parse table but mostly does not explain how
to construct one. The rest of the book seems to be focused on ML-lex
and ML-yacc, which I haven't read yet.
I have read more of [Holub] and he explains the theory much better,
in fact this seems to be the best theory introduction I remember coming
across ever. Yet, that book also does not spend too much effort on rec-
ursive descent, and goes on about implementing and using lex and yacc
like tools.
[Holub] explains the construction/algorithms for FIRST and FOLLOW
sets better than [Appel], in my opinion, but I do not recall any more
details about said parse table, which seems instrumental in the con-
struction of the final recursive descent. [Holub] does mention Q
grammars, and how they are very easy to turn into recursive descent.

Second question: why are recursive descent parsers I've come across
always using globals and construct code and/or the parse tree as side
effects, rather than, say, return the parse tree to the caller?
Is this something to do with /synthesized attributes/ that [Holub]
talks about? Where the recursive descent parser routines return values
to the caller, or is this maybe just a tradition that no one bothers to
change?

Now, to give these questions a bit of context, as a practice, I wanted
to create a recursive descent parser for the first programming exercise
in [Salomon] but found out the hard way that it's not so trivial to
figure out /how/.
My first methods were somewhat ad-hoc to create and modify the
grammar to be non-left-recursive, which I think was mostly easy due to
the assembler being very bare bones, and hence the grammar can be very
simple. Then I ran into trouble when I tried to make the parser return
the parse tree, rather than construct it with side effects.
Granted, some of my problems are from using a programming language,
ML, that I'm not familiar with, which I decided to use for another
wrinkle in the exercise. If the project is too easy, it just isn't
/fun/.
The first wrinkle I stumbled upon, when I started to read up on the subject, is that constructing the parse table, as a preliminary step,
isn't really explained in [Appel] -- at least not to my satisfaction --
and I found not much better coverage of the subject in [Holub] though
certain topics are covered better.
In the past, when I've stumbled like this, reading up on the subject
has helped a lot, but so far, almost everything I've read seems inade-
quate in different ways.

Side thought: do any of the mentioned authors read this list?

Thanks,
--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From gah4@21:1/5 to Johann 'Myrkraverk' Oskarsson on Sun Feb 27 17:13:34 2022

On Sunday, February 27, 2022 at 1:37:09 PM UTC-8, Johann 'Myrkraverk' Oskarsson wrote:

I am currently reading two compiler books, [Holub] and [Appel], and
then [Salomon] for a bit of context below,

[Holub] Compiler Design in C, 1990, by Allen Holub

[Appel] Modern Compiler Implementation in ML, 2004, by Andrew Appel

[Salomon] Assemblers and Loaders, 1993, David Salomon

and I have some questions about the construction of recursive descent parsers. If some of my questions are adequately answered in some other publication, reference, preferably with chapter and verse, is welcome.

A few thoughts, which might not answer all your questions.

One is that compilers that use recursive descent often don't use it for everything. Some might use it only for the statement level, and
something else, such as operator precedence, for expressions.

One of the complications with recursive descent is error handling.
Consider that a user might forget, or misspell, a closing statement
of some block structure. All the rest of the program is then parsed
as inside the block, or at least until something that isn't allowed inside.

And that might partly get to your question about returning the parse tree. Without errors, that might be fine. But when errors occur, the compiler
has to undo much that was done, but should not have been done. That
is somewhat easier with a global tree, than one that is distributed
throughout the recursive contexts of the called routines.

Also, much of earlier compiler design was done when computer
memories were small. Designing compilers for small memory
usage is now a lost art.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Hans-Peter Diettrich@21:1/5 to Johann 'Myrkraverk' Oskarsson on Mon Feb 28 06:48:09 2022

On 2/27/22 8:02 PM, Johann 'Myrkraverk' Oskarsson wrote:

My first question is: how are production recursive descent parsers con- structed?

I think that most parsers are hand made, from a well suited (EBNF...)
grammar.

But also have a look at the many tools found by "LL(1) parser
generator". I don't think that all these tools are/have been used to
generate compilers for a couple of new programming languages.

I've explored and modified CoCo/R many years ago. After I understood the principles I wrote my essential parsers for OPL and C manually, based on
the fine patterns of that Compiler Compiler.

Second question: why are recursive descent parsers I've come across
always using globals and construct code and/or the parse tree as side effects, rather than, say, return the parse tree to the caller?

Don't think only of programming language compilers if you ask for a
parser. There exist many use cases for parsers like pretty printer, CSV,
RNA or other data deciphering. Not all such "compilers" need a
(complete) parse tree but interpret the parser output differently,
depending on the compilation target.

Is this something to do with /synthesized attributes/ that [Holub] talks about? Where the recursive descent parser routines return values
to the caller, or is this maybe just a tradition that no one bothers to change?

One of the nasties disadvantages of commonly used formal grammars are
the specific attributes or similar decoration for a specific compiler.
This makes it hard to use a verified grammar for construction of a tool
other than the one the grammar writer had in mind, maybe a different
compiler implementation language or framework. So one use case of a
parser generator were a decoration remover or converter from some
grammar. This were my most important argument *against* embedding the
parse tree generation into the parser.

Another very interesting experiment was cooperation between Quinn Tyler Jackson's MetaS parser program, written in C++, and my parse tree
generator, written in Delphi.

Now, to give these questions a bit of context, as a practice, I wanted
to create a recursive descent parser for the first programming exercise
in [Salomon] but found out the hard way that it's not so trivial to
figure out /how/.

Parser generators can be too picky WRT minor problems of a grammar, like
the famous dangling else. Most practical solution then is a reduced
grammar without the problematic parts and full or partial manual parser implementation.

DoDi

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Anton Ertl@21:1/5 to Johann 'Myrkraverk' Oskarsson on Mon Feb 28 07:43:39 2022

Johann 'Myrkraverk' Oskarsson <[email protected]> writes:

My first question is: how are production recursive descent parsers con- >structed? From these two books, and other things I have read in the
past, recursive descent is so simple it's mostly just brushed off with
an example or two, but there seems to be a whole lot more to it. For >instance, [Appel] mentions a parse table but mostly does not explain how
to construct one.

I don't know what such a table would be good for. My parser generator
Gray <https://www.complang.tuwien.ac.at/forth/gray.zip>, which
generates recursive-descent parsers, certainly does not need one.

The way Gray works is: It computes first and follow sets for all nodes
in the grammar; follow sets are only used for computing first sets and
for reporting conflicts, first sets are used for deciding how to
parse. It generates code as follows:

grammar code
terminal if sym in first(terminal) then sym=scan(); else syntax_error(); end
a b code(a) code(b)
a|b if sym in first(a) then code(a); else code(b); end
a? if sym in first(a) do code(a); end
a* while sym in first(a) do code(a); end
a+ do code(a) while sym in first(a)
nonterm call(function(nonterm))
action action

a?, a* and a+ are regular right part grammar (RRPG) syntax; they
correspond to [a], {a} and a{a} respectively in EBNF.

Second question: why are recursive descent parsers I've come across
always using globals and construct code and/or the parse tree as side >effects, rather than, say, return the parse tree to the caller?

I have no explanation for the case of constructing the parse tree;
that's trivial to do by returning it.

Concerning code generation, constructing it as a string is inefficient
with the conventional string representation (array of chars), because concatenation takes time proportional to the length of the string
(likewise, if you generate machine code or virtual machine code as an
array of bytes). You can now add an unconventional string
representation (e.g., a rope), but that still consumes memory; or you
just use side effects for outputting the code.

The disadvantage of the latter is that once you have output the code,
you cannot take it back (or at least not easily), while with strings
you can arrange the code for the nodes in a different than processing
order, have the code for a node several times etc. without introducing additional complications in your compiler.

Is this something to do with /synthesized attributes/ that [Holub]
talks about? Where the recursive descent parser routines return values
to the caller

Synthesized attributes are trivial to pass along in recursive-descent
parsers by returning them. You can also trivially do L-attributed
grammars by passing attributes inherited from left of a nonterminal as parameters to a rule.

Then I ran into trouble when I tried to make the parser return
the parse tree, rather than construct it with side effects.
Granted, some of my problems are from using a programming language,
ML, that I'm not familiar with

I would expect that returning the parse tree is easier in ML than
using side effects (especially if you are unfamiliar with ML, because literature about ML probably does not explain side effects early on).

If you found the literature you have read up to now to be not
satisfactory, maybe Wirth's compiler book is more to your taste. IIRC
he explains how to write a recursive-descent parser for PL/0 (a simple Pascal-like language).

- anton
--
M. Anton Ertl
[email protected]
http://www.complang.tuwien.ac.at/anton/

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Christopher F Clark@21:1/5 to All on Mon Feb 28 15:40:37 2022

I'll start with the most important response first.

It should definitely be possible to return the generated syntax tree
from a procedure implementing a parser rule (for a non-terminal) in a
recursive descent compiler. In fact, it surprises me that you say
that's not the normal case. It's *almost* the definition of recursive
descent. One calls the function implementing some non-terminal and
the non-terminal consumes its input, building internally a
representation of that input (as a single structure, i.e. an AST node)
and returns that structure to the caller, so that the caller can do
the same. Each call makes a new rooted tree (that identifies what
kind of tree it is, i.e. what non-terminal it represents) and it is
used by the caller as part of the larger tree containing it. Now, I
supposed it *could* make sense not to make that the return value for
the function implementing the non-terminal, but it certainly is a
natural value to return. More on that after I talk about tables.

It is possible to implement LL (as was as LR and precedence and other)
parsers as parser tables. Recursive descent doesn't generally do so.
It implements that parser as code, specifically nested function calls,
with some (generally, but not necessarily) small amount of logic to
determine which variant (alternative) of a non-terminal is being
processed. This logic is what uses the result of the first (and
follow) functions to determine based upon the initial tokens which
alternative matches the input. This is generally done without parsing
tables. Using parsing tables generally suggests you are implementing
a different parsing methodology than recursive descent, even if you
are doing so only as a subroutine within a recursive descent parser.
The parts not using the tables are the recursive descent part. The
part using the table is an embedded parsing technology within an
otherwise recursive descent framework.

Ok, back to the question of what a recursive descent parser's routine implementing the parsing of a non-terminal should be. In an
Object-oriented world, it would be reasonable for it to return an
"object" representing the non-terminal, with methods/member functions
that get you the sub-parts, evaluate attributes, etc. Another
reasonable return value, comes from the Rust world, it could be a
"result type", where it is either the non-terminal as discussed
before, or an error. (Result<NonTerminal, Error>). However, in the
non-error case, you generally want each parsing rule to return the
root non-terminal of the AST is parsed.

-- ****************************************************************************** Chris Clark email: [email protected] Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Alain Ketterlin@21:1/5 to Anton Ertl on Mon Feb 28 21:52:54 2022

[email protected] (Anton Ertl) writes:

Johann 'Myrkraverk' Oskarsson <[email protected]> writes:

My first question is: how are production recursive descent parsers con- >>structed? From these two books, and other things I have read in the
past, recursive descent is so simple it's mostly just brushed off with
an example or two, but there seems to be a whole lot more to it. For >>instance, [Appel] mentions a parse table but mostly does not explain how
to construct one.

I don't know what such a table would be good for. My parser generator
Gray <https://www.complang.tuwien.ac.at/forth/gray.zip>, which
generates recursive-descent parsers, certainly does not need one.

A parse table for recursive descent/LL(1) simply tells which rule to
expand given the non-terminal currently parsed and the incoming token
type. The building of this table is described in, e.g., the dragon book
(for sure in the 2nd edition).

The way Gray works is: It computes first and follow sets for all nodes
in the grammar; follow sets are only used for computing first

You probably mean the reverse (first is used to compute follow).

sets and for reporting conflicts, first sets are used for deciding how
to parse. It generates code as follows:

grammar code
terminal if sym in first(terminal) then sym=scan(); else syntax_error(); end a b code(a) code(b)
a|b if sym in first(a) then code(a); else code(b); end
a? if sym in first(a) do code(a); end
a* while sym in first(a) do code(a); end
a+ do code(a) while sym in first(a)
nonterm call(function(nonterm))
action action

The parsing table embodies all that logic. Parsing works by maintaining
a stack of symbols, and the table indicates what actions need to be
taken (either replace one non-terminal with a rhs on the stack, or match
one input symbol). The algorithm works with any table, and the whole
thing (algorithm+table) is certainly much shorter than generating code
for each and every rule. EBNF constructions like ?/*/+/... have to be translated first into "regular" rules, but that's trivial pre-processing
(along with left factoring/recursivity elimination).

Building a parse tree in this context is less obvious, because
non-terminal symbols seem to "disappear" during the parsing. The
solution (at least the one I came up with) consists in keeping special
entries on the stack signaling the end of a rhs. Then, special actions
(i.e., "reduce") can be taken when finding these special entries during parsing, much in the same way as it is done in bottom-up (e.g., LR)
parsing: the "semantic values" of the various symbols can be pushed onto
a separate stack, and you end up with something really similar to
LR-parsing.

-- Alain.
[I think the point is that if you have a table, you have a table driven
LL(1) parser rather than recursive descent in which the grammar is embedded
in the code. -John]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Johann 'Myrkraverk' Oskarsson@21:1/5 to Johann 'Myrkraverk' Oskarsson on Tue Mar 1 01:40:33 2022

Johann 'Myrkraverk' Oskarsson <[email protected]> writes:

My first question is: how are production recursive descent parsers con-
structed? From these two books, and other things I have read in the
past, recursive descent is so simple it's mostly just brushed off with
an example or two, but there seems to be a whole lot more to it. For
instance, [Appel] mentions a parse table but mostly does not explain how >>> to construct one. ...

I think, at this point, I'd like to demonstrate what [Appel] was talking
about. I can't expect every reader of comp.compilers to have the book
at hand.

The demonstration starts with grammar 3.12, page 47.

Z -> d Y -> X -> Y
Z -> X Y Z Y -> c X -> a

where Y goes to epsilon in the centre of the first line.
Then [Appel] shows how FIRST and FOLLOW sets are computed
given this grammar, ending with

nullable FIRST FOLLOW
X yes a c a c d
Y yes c a c d
Z no a c d

where previous steps are also shown[1]. Then figure 3.14 shows a
predictive parsing table, presumably constructed using the FIRST
and FOLLOW sets.

a c d
+-----------------------------------------+
X | X -> a X -> Y X -> Y |
| Y -> Y |
| |
Y | Y -> Y -> Y -> |
| Y -> c |
| |
Z | Z -> X Y Z Z -> X Y Z Z -> d |
| Z -> X Y Z |
+-----------------------------------------+

Where {X,a}, {Y,c}, and {Z,d} show conflicts, so this cannot be a
recursive descent parser; or LL(1) grammar per the book.

I am not clear on how to construct such a table from the description
in the book, and would not presume to be able to do so by hand, using
a more complex grammar.

I'll give the paragraph where [Appel] explains /recursive descent/.

Consider a recursive-descent parser. The parsing function for some nonterminal X has a clause for each X-production; it must choose one
of these clauses based on the next token T of the input. If we can
choose the right production for each (X, T), then we can write the recursive-descent parser. All the information we need can be encoded
as a two-dimensional table of productions, indexed by nonterminals X
and terminals T. This is called a /predictive parsing/ table.

So I was wondering if people who create /recursive descent/ parsers for production compilers use such a table as an intermediary step, or not.
Of course, as I mentioned before, [Holub] mentions Q grammars as an
alternative intermediary step, but only in an off hand manner.

This lack of clear directions led me to ask the first question: how are production compilers using /recursive descent/ constructed? What are
the steps involved? What do people use, or not use?

I don't know Forth, so I haven't done much more than take a cursory look
at the Gray manual; but it certainly wouldn't surprise me if people used
a skeleton generator, from [E]BNF grammar, and then filled in the
details later.

I tried to make the question open-ended, because I am pretty sure people
use methods not described in either book. I have not looked this up in
the dragon book, because currently my copy is in a different country,
and the shipping cost might rival the price of a new copy. Given a
chapter and verse in that book, I can probably read it with a little bit
of help, and I am totally up to get myself further material if it's
useful.

[1] I think overall the steps are better explained in [Holub] but I did
not try to following along with this particular grammar, using [Holub]'s method.

P.S. I'll try to reply to the other threads later on.

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk

[Appel is showing how you turn a normal grammar into an LL(1) grammar. The algorithm on page 49 tells you how to make the first and follow sets, then
the informal algorithm in the second para on page 51 turns it into the parse table. Then he says gee, the table has more then one rule in some of the
cells so he spends a few more pages turning it into an LL(1) grammar on
pages 53-54. In my experience most people writing RD parsers do this informally, e.g., to deal with left recursion in expressions you know
that you eat the first token, then peek at the next token and eat it
if you're in the correct rule, otherwise return and let the caller
deal with it. If this seems like more trouble than it's worth, now
you know why we use parser generators. -John]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Anton Ertl@21:1/5 to Johann 'Myrkraverk' Oskarsson on Tue Mar 1 08:01:24 2022

Johann 'Myrkraverk' Oskarsson <[email protected]d> writes:

So I was wondering if people who create /recursive descent/ parsers for >production compilers use such a table as an intermediary step, or not.

Gray doesn't. For finding out about conflicts, it just looks at the
nodes involved. E.g., for a|b, it produces a warning if both a and b
can produce epsilon, if the first sets of a and b are not disjoint,
and if a|b may produce epsilon and its first set and its follow set
are not disjoint.

I don't know Forth, so I haven't done much more than take a cursory look
at the Gray manual; but it certainly wouldn't surprise me if people used
a skeleton generator, from [E]BNF grammar, and then filled in the
details later.

Gray is not a skeleton generator. It takes an RRPG (regular right
part grammar, similar to EBNF) with actions, and produces an
executable parser (i.e., not as source code). You then run that
parser, you do not fill in the details later; you provide all the
details in the source code before you generate the parser.

BTW, an updated version of Wirth's book that I mentioned is available
online for free: <https://people.inf.ethz.ch/wirth/CompilerConstruction/CompilerConstruction1.pdf>.
The sections of interest to you seem to be Section 4.1 and 7.2,
probably also 7.3.

[ [...] In my experience most people writing RD parsers do this
informally, e.g., to deal with left recursion in expressions you know
that you eat the first token, then peek at the next token and eat it
if you're in the correct rule, otherwise return and let the caller
deal with it. If this seems like more trouble than it's worth, now
you know why we use parser generators. -John]

Or you start with EBNF or RRPG where you can express repetition
without recursion, so left recursion is rare.

I happen to look at Exercise 7.2 at this moment, which suggests
another way to deal with difficulties:

|7.2. Where is the Oberon syntax not LL(1), that is, where is a
|lookahead of more than one symbol necessary? Change the syntax in such
|a way that it satisfies the LL(1) property.

- anton
--
M. Anton Ertl
[email protected]
http://www.complang.tuwien.ac.at/anton/

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Alain Ketterlin@21:1/5 to Johann 'Myrkraverk' Oskarsson on Tue Mar 1 23:13:25 2022

Johann 'Myrkraverk' Oskarsson <[email protected]d> writes:

Johann 'Myrkraverk' Oskarsson <[email protected]> writes:

My first question is: how are production recursive descent parsers con- >>>> structed? From these two books, and other things I have read in the
past, recursive descent is so simple it's mostly just brushed off with >>>> an example or two, but there seems to be a whole lot more to it. For
instance, [Appel] mentions a parse table but mostly does not explain how >>>> to construct one. ...

[...]

The demonstration starts with grammar 3.12, page 47.

Z -> d Y -> X -> Y
Z -> X Y Z Y -> c X -> a

where Y goes to epsilon in the centre of the first line.
Then [Appel] shows how FIRST and FOLLOW sets are computed
given this grammar, ending with

nullable FIRST FOLLOW
X yes a c a c d
Y yes c a c d
Z no a c d

where previous steps are also shown[1]. Then figure 3.14 shows a
predictive parsing table, presumably constructed using the FIRST
and FOLLOW sets.

a c d
+-----------------------------------------+
X | X -> a X -> Y X -> Y |
| Y -> Y |
| |
Y | Y -> Y -> Y -> |
| Y -> c |
| |
Z | Z -> X Y Z Z -> X Y Z Z -> d |
| Z -> X Y Z |
+-----------------------------------------+

Where {X,a}, {Y,c}, and {Z,d} show conflicts, so this cannot be a
recursive descent parser; or LL(1) grammar per the book.

I am not clear on how to construct such a table from the description
in the book, and would not presume to be able to do so by hand, using
a more complex grammar.

That's a 2-dimensional table M[N,t] where N is a non-terminal, and t
is a terminal symbol. It is build by the following algorithm:

for each rule L -> R1...Rk
for each terminal f in FIRST (R1...Rk)
add the rule to M[L,f]
if R1...Rk is nullable
for each terminal f in FOLLOW (L)
add the rule to M[L,f]

In essence: FIRST lets you know when to expand a non-terminal, and
FOLLOW lets you know when to apply a rule that matches nothing in
the input.

The grammar is LL(1) is all entries of the table contain at most one
rule. That is not the case in your example.

Parsing goes as follows: maintain a stack on which you initially push
the start symbol. Then repeat:

if top(stack) is a terminal symbol
if it matches the next input
pop it from the stack and advance input
else
signal error
else
if M[top(stack),next_input] is empty
signal error
else // M[...] is a rule L -> R1...Rk
pop an element from the stack and push Rk, ..., R1

(note that you push the rhs in reverse order, such as the leftmost
symbol appears on the top of the stack).

The parsing succeeds if you end up with an empty stack and an empty
input.

I don't have my copy of the dragon book with me, but from pearson's web
site I find this is the object of chapter 4 section 4.

So I was wondering if people who create /recursive descent/ parsers for production compilers use such a table as an intermediary step, or not.

[...]

As John says, if your language fits the requirements (that is, it has a
LL(1) grammar, or LR(1)), you are better off using a generator. It turns
out that "big" compiler parsers are often written by hand, because 1)
grammars are often ambiguous (sometimes at the lexical level), 2)
ambiguities are easier to resolve "by hand" because you can take more
context into account, and 3) hand-written parsers give you more control
on error recovery. (Also because memory is now big enough to hold a
parse tree.)

For C/C++ parsers, add the fact that there are several grammatical
layers (e.g., OpenMP pragmas), and/or you need details that automated
parsing usually omits (e.g., for diagnostics and tools). That's why both
gcc and clang parsers are hand-written.

The official Python implementation used an LL(1) (auto-generated) parser
until recently, where they switched to a PEG-parser (I have to admit I
don't understand their motivation for doing this -- as exposed in a
document called PEP 617). It is still auto-generated as far as I
understand.

-- Alain.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Hans-Peter Diettrich@21:1/5 to Johann 'Myrkraverk' Oskarsson on Wed Mar 2 05:52:46 2022

On 3/1/22 2:40 AM, Johann 'Myrkraverk' Oskarsson wrote:

where previous steps are also shown[1]. Then figure 3.14 shows a
predictive parsing table, presumably constructed using the FIRST
and FOLLOW sets.

          a            c             d
    +-----------------------------------------+
X | X -> a        X -> Y        X -> Y     |
    | Y -> Y                                 |

Just a typo correction: In (X,a) the second alternative should read
X -> Y
not Y -> Y.

DoDi

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Paul Robinson@21:1/5 to [email protected] on Sat Mar 5 14:46:08 2022

On 27 Feb 2022 16:37:05 EST, Johann 'Myrkraverk' Oskarsson <[email protected]> asked in Newsgroup comp.compilers:

| I have some questions about the construction of
| recursive descent parsers. The reason for my questions
| is that I know production compilers /use/ recursive
| descent. My first question is: how are production
| recursive descent parsers constructed?

If you want to see another example of this, since I added features to
it, is the XDPascal compiler, available from https://github.com/electric-socket/xdpw or https://XDpascal.com. Now,
what I'm going to do is explain why and how this is done.

Pascal syntax requires that, after a procedure or function signature,
and any constants, types or variables are declared, it must have a
block. This is a BEGIN, zero or more statements, each separated by a
semicolon, then an optional semicolon, then END and a semicolon if it's
a procedure or function, a period if it's the end of a program. A syntax diagram may help here:

Block: ^______________>|
| V

BEGIN---+-->Statement---+-->V--------->^----> END --->

^_______ ; <____V |__> ;_>__|

In the code example below, any identifier ending
in TOK is the token for that symbol or keyword,
e.g. DOTOK for DO, DOTTOK for period, etc.

So at the end of the compile, the compiler would
have a piece of code similar to this:

block;
nexttoken;
if thistoken<>DOTTOK then error('Period expected');
finishcompiling;

Procedure "block" would have something similar to
the following:

repeat
statement;
until thistoken = ENDTOK or END_OF_FILE ;

Procedure "Statement" would look like this:

nextToken;
Case thisToken of
Identifier : Identstmt;
FORTOK: ForStmt
REPEATTOK: RepeatStmt;
BEGINTOK: Block; // [1]
...
WHILETOK: WhileStmt
ELSE
Error('Syntax error');

FOR statement:
----> FOR --> := --> ident ------> expression -->|
|<-----------------------------------------------V
V--V------- TO ----->----> expression --> DO --|
V____> DOWNTO -->| |
|
|<---------------------------------------------V
V--------- statement ----------------->

Now, procedure ForStmt might be coded like this:

nexttoken;
if thistok <> BECOMESTOK then
error(' := expected');
nexttoken;
if thistok <> identTok then
error(' Identifier expected ');
...

and so on until:

nexttoken;
if thistok <> DOTOK then
error(' DO expected');
nexttoken;
statement;

Now, if you look, at // [1], the case selector for
the token BEGIN calls procedure block again, while
it itself was called by block earlier. And, after the
DO token in FOR, statement calls itself.

If you notice, recursive descent works because each
statement in the program is processed until a "trigger"
token exits that state back to where it was called.
On return, syntax checking continues with the next
token after the one that caused a block or statement
procedure to be called.

I hope this example helps you to understand recursive
descent parsing.

XDPascal.com: The (new) home of the XDPascal Compiler.
----
"The lessons of history teach us - if they teach us anything - that no
one learns the lessons that history teaches us."
Paul Robinson <[email protected]> / <[email protected]>

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online
Recent Visitors
- Shift
  Fri Jul 31 06:46:34 2026
  from Leeds, England via SSH
- Centurion
  Fri Jul 31 00:59:56 2026
  from Berea, Ohio via Telnet
- Rixter
  Fri Jul 31 00:00:46 2026
  from Madison, Nc via Telnet
- Bob Worm
  Thu Jul 30 20:01:55 2026
  from Wales, Uk via Telnet
- Rixter
  Thu Jul 30 14:17:17 2026
  from Madison, Nc via Telnet
- Krenn
  Thu Jul 30 13:16:49 2026
  from Sydney, Nsw via Telnet
- Bob Worm
  Thu Jul 30 09:03:28 2026
  from Wales, Uk via Telnet
- Bob Worm
  Thu Jul 30 08:47:34 2026
  from Wales, Uk via Telnet

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	741
Nodes:	16 (2 / 14)
Uptime:	107:23:28
Calls:	12,461
Calls today:	3
Files:	15,197
Messages:	6,538,088

Some questions about recursive descent?

Who's Online

Recent Visitors

System Info