• Some questions about recursive descent?

    From Johann 'Myrkraverk' Oskarsson@21:1/5 to All on Sun Feb 27 19:02:59 2022
    Dear comp.compilers,

    I am currently reading two compiler books, [Holub] and [Appel], and
    then [Salomon] for a bit of context below,

    [Holub] Compiler Design in C, 1990, by Allen Holub

    [Appel] Modern Compiler Implementation in ML, 2004, by Andrew Appel

    [Salomon] Assemblers and Loaders, 1993, David Salomon

    and I have some questions about the construction of recursive descent
    parsers. If some of my questions are adequately answered in some other publication, reference, preferably with chapter and verse, is welcome.

    The reason for my questions is that I know production compilers /use/
    recursive descent. GCC is probably the most famous example, but Watcom
    does it too. More below as well.

    My first question is: how are production recursive descent parsers con- structed? From these two books, and other things I have read in the
    past, recursive descent is so simple it's mostly just brushed off with
    an example or two, but there seems to be a whole lot more to it. For
    instance, [Appel] mentions a parse table but mostly does not explain how
    to construct one. The rest of the book seems to be focused on ML-lex
    and ML-yacc, which I haven't read yet.
    I have read more of [Holub] and he explains the theory much better,
    in fact this seems to be the best theory introduction I remember coming
    across ever. Yet, that book also does not spend too much effort on rec-
    ursive descent, and goes on about implementing and using lex and yacc
    like tools.
    [Holub] explains the construction/algorithms for FIRST and FOLLOW
    sets better than [Appel], in my opinion, but I do not recall any more
    details about said parse table, which seems instrumental in the con-
    struction of the final recursive descent. [Holub] does mention Q
    grammars, and how they are very easy to turn into recursive descent.

    Second question: why are recursive descent parsers I've come across
    always using globals and construct code and/or the parse tree as side
    effects, rather than, say, return the parse tree to the caller?
    Is this something to do with /synthesized attributes/ that [Holub]
    talks about? Where the recursive descent parser routines return values
    to the caller, or is this maybe just a tradition that no one bothers to
    change?

    Now, to give these questions a bit of context, as a practice, I wanted
    to create a recursive descent parser for the first programming exercise
    in [Salomon] but found out the hard way that it's not so trivial to
    figure out /how/.
    My first methods were somewhat ad-hoc to create and modify the
    grammar to be non-left-recursive, which I think was mostly easy due to
    the assembler being very bare bones, and hence the grammar can be very
    simple. Then I ran into trouble when I tried to make the parser return
    the parse tree, rather than construct it with side effects.
    Granted, some of my problems are from using a programming language,
    ML, that I'm not familiar with, which I decided to use for another
    wrinkle in the exercise. If the project is too easy, it just isn't
    /fun/.
    The first wrinkle I stumbled upon, when I started to read up on the subject, is that constructing the parse table, as a preliminary step,
    isn't really explained in [Appel] -- at least not to my satisfaction --
    and I found not much better coverage of the subject in [Holub] though
    certain topics are covered better.
    In the past, when I've stumbled like this, reading up on the subject
    has helped a lot, but so far, almost everything I've read seems inade-
    quate in different ways.

    Side thought: do any of the mentioned authors read this list?

    Thanks,
    --
    Johann | email: invalid -> com | www.myrkraverk.com/blog/
    I'm not from the Internet, I just work there. | twitter: @myrkraverk

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From gah4@21:1/5 to Johann 'Myrkraverk' Oskarsson on Sun Feb 27 17:13:34 2022
    On Sunday, February 27, 2022 at 1:37:09 PM UTC-8, Johann 'Myrkraverk' Oskarsson wrote:
    I am currently reading two compiler books, [Holub] and [Appel], and
    then [Salomon] for a bit of context below,

    [Holub] Compiler Design in C, 1990, by Allen Holub

    [Appel] Modern Compiler Implementation in ML, 2004, by Andrew Appel

    [Salomon] Assemblers and Loaders, 1993, David Salomon

    and I have some questions about the construction of recursive descent parsers. If some of my questions are adequately answered in some other publication, reference, preferably with chapter and verse, is welcome.

    A few thoughts, which might not answer all your questions.

    One is that compilers that use recursive descent often don't use it for everything. Some might use it only for the statement level, and
    something else, such as operator precedence, for expressions.

    One of the complications with recursive descent is error handling.
    Consider that a user might forget, or misspell, a closing statement
    of some block structure. All the rest of the program is then parsed
    as inside the block, or at least until something that isn't allowed inside.

    And that might partly get to your question about returning the parse tree. Without errors, that might be fine. But when errors occur, the compiler
    has to undo much that was done, but should not have been done. That
    is somewhat easier with a global tree, than one that is distributed
    throughout the recursive contexts of the called routines.

    Also, much of earlier compiler design was done when computer
    memories were small. Designing compilers for small memory
    usage is now a lost art.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Hans-Peter Diettrich@21:1/5 to Johann 'Myrkraverk' Oskarsson on Mon Feb 28 06:48:09 2022
    On 2/27/22 8:02 PM, Johann 'Myrkraverk' Oskarsson wrote:

    My first question is: how are production recursive descent parsers con- structed?

    I think that most parsers are hand made, from a well suited (EBNF...)
    grammar.

    But also have a look at the many tools found by "LL(1) parser
    generator". I don't think that all these tools are/have been used to
    generate compilers for a couple of new programming languages.

    I've explored and modified CoCo/R many years ago. After I understood the principles I wrote my essential parsers for OPL and C manually, based on
    the fine patterns of that Compiler Compiler.


    Second question: why are recursive descent parsers I've come across
    always using globals and construct code and/or the parse tree as side effects, rather than, say, return the parse tree to the caller?

    Don't think only of programming language compilers if you ask for a
    parser. There exist many use cases for parsers like pretty printer, CSV,
    RNA or other data deciphering. Not all such "compilers" need a
    (complete) parse tree but interpret the parser output differently,
    depending on the compilation target.


        Is this something to do with /synthesized attributes/ that [Holub] talks about?  Where the recursive descent parser routines return values
    to the caller, or is this maybe just a tradition that no one bothers to change?

    One of the nasties disadvantages of commonly used formal grammars are
    the specific attributes or similar decoration for a specific compiler.
    This makes it hard to use a verified grammar for construction of a tool
    other than the one the grammar writer had in mind, maybe a different
    compiler implementation language or framework. So one use case of a
    parser generator were a decoration remover or converter from some
    grammar. This were my most important argument *against* embedding the
    parse tree generation into the parser.

    Another very interesting experiment was cooperation between Quinn Tyler Jackson's MetaS parser program, written in C++, and my parse tree
    generator, written in Delphi.


    Now, to give these questions a bit of context, as a practice, I wanted
    to create a recursive descent parser for the first programming exercise
    in [Salomon] but found out the hard way that it's not so trivial to
    figure out /how/.

    Parser generators can be too picky WRT minor problems of a grammar, like
    the famous dangling else. Most practical solution then is a reduced
    grammar without the problematic parts and full or partial manual parser implementation.

    DoDi

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Johann 'Myrkraverk' Oskarsson on Mon Feb 28 07:43:39 2022
    Johann 'Myrkraverk' Oskarsson <[email protected]> writes:
    My first question is: how are production recursive descent parsers con- >structed? From these two books, and other things I have read in the
    past, recursive descent is so simple it's mostly just brushed off with
    an example or two, but there seems to be a whole lot more to it. For >instance, [Appel] mentions a parse table but mostly does not explain how
    to construct one.

    I don't know what such a table would be good for. My parser generator
    Gray <https://www.complang.tuwien.ac.at/forth/gray.zip>, which
    generates recursive-descent parsers, certainly does not need one.

    The way Gray works is: It computes first and follow sets for all nodes
    in the grammar; follow sets are only used for computing first sets and
    for reporting conflicts, first sets are used for deciding how to
    parse. It generates code as follows:

    grammar code
    terminal if sym in first(terminal) then sym=scan(); else syntax_error(); end
    a b code(a) code(b)
    a|b if sym in first(a) then code(a); else code(b); end
    a? if sym in first(a) do code(a); end
    a* while sym in first(a) do code(a); end
    a+ do code(a) while sym in first(a)
    nonterm call(function(nonterm))
    action action

    a?, a* and a+ are regular right part grammar (RRPG) syntax; they
    correspond to [a], {a} and a{a} respectively in EBNF.

    Second question: why are recursive descent parsers I've come across
    always using globals and construct code and/or the parse tree as side >effects, rather than, say, return the parse tree to the caller?

    I have no explanation for the case of constructing the parse tree;
    that's trivial to do by returning it.

    Concerning code generation, constructing it as a string is inefficient
    with the conventional string representation (array of chars), because concatenation takes time proportional to the length of the string
    (likewise, if you generate machine code or virtual machine code as an
    array of bytes). You can now add an unconventional string
    representation (e.g., a rope), but that still consumes memory; or you
    just use side effects for outputting the code.

    The disadvantage of the latter is that once you have output the code,
    you cannot take it back (or at least not easily), while with strings
    you can arrange the code for the nodes in a different than processing
    order, have the code for a node several times etc. without introducing additional complications in your compiler.

    Is this something to do with /synthesized attributes/ that [Holub]
    talks about? Where the recursive descent parser routines return values
    to the caller

    Synthesized attributes are trivial to pass along in recursive-descent
    parsers by returning them. You can also trivially do L-attributed
    grammars by passing attributes inherited from left of a nonterminal as parameters to a rule.

    Then I ran into trouble when I tried to make the parser return
    the parse tree, rather than construct it with side effects.
    Granted, some of my problems are from using a programming language,
    ML, that I'm not familiar with

    I would expect that returning the parse tree is easier in ML than
    using side effects (especially if you are unfamiliar with ML, because literature about ML probably does not explain side effects early on).

    If you found the literature you have read up to now to be not
    satisfactory, maybe Wirth's compiler book is more to your taste. IIRC
    he explains how to write a recursive-descent parser for PL/0 (a simple Pascal-like language).

    - anton
    --
    M. Anton Ertl
    [email protected]
    http://www.complang.tuwien.ac.at/anton/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Christopher F Clark@21:1/5 to All on Mon Feb 28 15:40:37 2022
    I'll start with the most important response first.

    It should definitely be possible to return the generated syntax tree
    from a procedure implementing a parser rule (for a non-terminal) in a
    recursive descent compiler. In fact, it surprises me that you say
    that's not the normal case. It's *almost* the definition of recursive
    descent. One calls the function implementing some non-terminal and
    the non-terminal consumes its input, building internally a
    representation of that input (as a single structure, i.e. an AST node)
    and returns that structure to the caller, so that the caller can do
    the same. Each call makes a new rooted tree (that identifies what
    kind of tree it is, i.e. what non-terminal it represents) and it is
    used by the caller as part of the larger tree containing it. Now, I
    supposed it *could* make sense not to make that the return value for
    the function implementing the non-terminal, but it certainly is a
    natural value to return. More on that after I talk about tables.

    It is possible to implement LL (as was as LR and precedence and other)
    parsers as parser tables. Recursive descent doesn't generally do so.
    It implements that parser as code, specifically nested function calls,
    with some (generally, but not necessarily) small amount of logic to
    determine which variant (alternative) of a non-terminal is being
    processed. This logic is what uses the result of the first (and
    follow) functions to determine based upon the initial tokens which
    alternative matches the input. This is generally done without parsing
    tables. Using parsing tables generally suggests you are implementing
    a different parsing methodology than recursive descent, even if you
    are doing so only as a subroutine within a recursive descent parser.
    The parts not using the tables are the recursive descent part. The
    part using the table is an embedded parsing technology within an
    otherwise recursive descent framework.

    Ok, back to the question of what a recursive descent parser's routine implementing the parsing of a non-terminal should be. In an
    Object-oriented world, it would be reasonable for it to return an
    "object" representing the non-terminal, with methods/member functions
    that get you the sub-parts, evaluate attributes, etc. Another
    reasonable return value, comes from the Rust world, it could be a
    "result type", where it is either the non-terminal as discussed
    before, or an error. (Result<NonTerminal, Error>). However, in the
    non-error case, you generally want each parsing rule to return the
    root non-terminal of the AST is parsed.

    -- ****************************************************************************** Chris Clark email: [email protected] Compiler Resources, Inc. Web Site: http://world.std.com/~compres
    23 Bailey Rd voice: (508) 435-5016
    Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Alain Ketterlin@21:1/5 to Anton Ertl on Mon Feb 28 21:52:54 2022
    [email protected] (Anton Ertl) writes:

    Johann 'Myrkraverk' Oskarsson <[email protected]> writes:
    My first question is: how are production recursive descent parsers con- >>structed? From these two books, and other things I have read in the
    past, recursive descent is so simple it's mostly just brushed off with
    an example or two, but there seems to be a whole lot more to it. For >>instance, [Appel] mentions a parse table but mostly does not explain how
    to construct one.

    I don't know what such a table would be good for. My parser generator
    Gray <https://www.complang.tuwien.ac.at/forth/gray.zip>, which
    generates recursive-descent parsers, certainly does not need one.

    A parse table for recursive descent/LL(1) simply tells which rule to
    expand given the non-terminal currently parsed and the incoming token
    type. The building of this table is described in, e.g., the dragon book
    (for sure in the 2nd edition).

    The way Gray works is: It computes first and follow sets for all nodes
    in the grammar; follow sets are only used for computing first

    You probably mean the reverse (first is used to compute follow).

    sets and for reporting conflicts, first sets are used for deciding how
    to parse. It generates code as follows:

    grammar code
    terminal if sym in first(terminal) then sym=scan(); else syntax_error(); end a b code(a) code(b)
    a|b if sym in first(a) then code(a); else code(b); end
    a? if sym in first(a) do code(a); end
    a* while sym in first(a) do code(a); end
    a+ do code(a) while sym in first(a)
    nonterm call(function(nonterm))
    action action

    The parsing table embodies all that logic. Parsing works by maintaining
    a stack of symbols, and the table indicates what actions need to be
    taken (either replace one non-terminal with a rhs on the stack, or match
    one input symbol). The algorithm works with any table, and the whole
    thing (algorithm+table) is certainly much shorter than generating code
    for each and every rule. EBNF constructions like ?/*/+/... have to be translated first into "regular" rules, but that's trivial pre-processing
    (along with left factoring/recursivity elimination).

    Building a parse tree in this context is less obvious, because
    non-terminal symbols seem to "disappear" during the parsing. The
    solution (at least the one I came up with) consists in keeping special
    entries on the stack signaling the end of a rhs. Then, special actions
    (i.e., "reduce") can be taken when finding these special entries during parsing, much in the same way as it is done in bottom-up (e.g., LR)
    parsing: the "semantic values" of the various symbols can be pushed onto
    a separate stack, and you end up with something really similar to
    LR-parsing.

    -- Alain.
    [I think the point is that if you have a table, you have a table driven
    LL(1) parser rather than recursive descent in which the grammar is embedded
    in the code. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Johann 'Myrkraverk' Oskarsson@21:1/5 to Johann 'Myrkraverk' Oskarsson on Tue Mar 1 01:40:33 2022
    Johann 'Myrkraverk' Oskarsson <[email protected]> writes:
    My first question is: how are production recursive descent parsers con-
    structed? From these two books, and other things I have read in the
    past, recursive descent is so simple it's mostly just brushed off with
    an example or two, but there seems to be a whole lot more to it. For
    instance, [Appel] mentions a parse table but mostly does not explain how >>> to construct one. ...

    I think, at this point, I'd like to demonstrate what [Appel] was talking
    about. I can't expect every reader of comp.compilers to have the book
    at hand.

    The demonstration starts with grammar 3.12, page 47.

    Z -> d Y -> X -> Y
    Z -> X Y Z Y -> c X -> a

    where Y goes to epsilon in the centre of the first line.
    Then [Appel] shows how FIRST and FOLLOW sets are computed
    given this grammar, ending with

    nullable FIRST FOLLOW
    X yes a c a c d
    Y yes c a c d
    Z no a c d

    where previous steps are also shown[1]. Then figure 3.14 shows a
    predictive parsing table, presumably constructed using the FIRST
    and FOLLOW sets.

    a c d
    +-----------------------------------------+
    X | X -> a X -> Y X -> Y |
    | Y -> Y |
    | |
    Y | Y -> Y -> Y -> |
    | Y -> c |
    | |
    Z | Z -> X Y Z Z -> X Y Z Z -> d |
    | Z -> X Y Z |
    +-----------------------------------------+

    Where {X,a}, {Y,c}, and {Z,d} show conflicts, so this cannot be a
    recursive descent parser; or LL(1) grammar per the book.

    I am not clear on how to construct such a table from the description
    in the book, and would not presume to be able to do so by hand, using
    a more complex grammar.

    I'll give the paragraph where [Appel] explains /recursive descent/.

    Consider a recursive-descent parser. The parsing function for some nonterminal X has a clause for each X-production; it must choose one
    of these clauses based on the next token T of the input. If we can
    choose the right production for each (X, T), then we can write the recursive-descent parser. All the information we need can be encoded
    as a two-dimensional table of productions, indexed by nonterminals X
    and terminals T. This is called a /predictive parsing/ table.

    So I was wondering if people who create /recursive descent/ parsers for production compilers use such a table as an intermediary step, or not.
    Of course, as I mentioned before, [Holub] mentions Q grammars as an
    alternative intermediary step, but only in an off hand manner.

    This lack of clear directions led me to ask the first question: how are production compilers using /recursive descent/ constructed? What are
    the steps involved? What do people use, or not use?

    I don't know Forth, so I haven't done much more than take a cursory look
    at the Gray manual; but it certainly wouldn't surprise me if people used
    a skeleton generator, from [E]BNF grammar, and then filled in the
    details later.

    I tried to make the question open-ended, because I am pretty sure people
    use methods not described in either book. I have not looked this up in
    the dragon book, because currently my copy is in a different country,
    and the shipping cost might rival the price of a new copy. Given a
    chapter and verse in that book, I can probably read it with a little bit
    of help, and I am totally up to get myself further material if it's
    useful.


    [1] I think overall the steps are better explained in [Holub] but I did
    not try to following along with this particular grammar, using [Holub]'s method.

    P.S. I'll try to reply to the other threads later on.

    --
    Johann | email: invalid -> com | www.myrkraverk.com/blog/
    I'm not from the Internet, I just work there. | twitter: @myrkraverk

    [Appel is showing how you turn a normal grammar into an LL(1) grammar. The algorithm on page 49 tells you how to make the first and follow sets, then
    the informal algorithm in the second para on page 51 turns it into the parse table. Then he says gee, the table has more then one rule in some of the
    cells so he spends a few more pages turning it into an LL(1) grammar on
    pages 53-54. In my experience most people writing RD parsers do this informally, e.g., to deal with left recursion in expressions you know
    that you eat the first token, then peek at the next token and eat it
    if you're in the correct rule, otherwise return and let the caller
    deal with it. If this seems like more trouble than it's worth, now
    you know why we use parser generators. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Johann 'Myrkraverk' Oskarsson on Tue Mar 1 08:01:24 2022
    Johann 'Myrkraverk' Oskarsson <[email protected]d> writes:
    So I was wondering if people who create /recursive descent/ parsers for >production compilers use such a table as an intermediary step, or not.

    Gray doesn't. For finding out about conflicts, it just looks at the
    nodes involved. E.g., for a|b, it produces a warning if both a and b
    can produce epsilon, if the first sets of a and b are not disjoint,
    and if a|b may produce epsilon and its first set and its follow set
    are not disjoint.

    I don't know Forth, so I haven't done much more than take a cursory look
    at the Gray manual; but it certainly wouldn't surprise me if people used
    a skeleton generator, from [E]BNF grammar, and then filled in the
    details later.

    Gray is not a skeleton generator. It takes an RRPG (regular right
    part grammar, similar to EBNF) with actions, and produces an
    executable parser (i.e., not as source code). You then run that
    parser, you do not fill in the details later; you provide all the
    details in the source code before you generate the parser.

    BTW, an updated version of Wirth's book that I mentioned is available
    online for free: <https://people.inf.ethz.ch/wirth/CompilerConstruction/CompilerConstruction1.pdf>.
    The sections of interest to you seem to be Section 4.1 and 7.2,
    probably also 7.3.

    [ [...] In my experience most people writing RD parsers do this
    informally, e.g., to deal with left recursion in expressions you know
    that you eat the first token, then peek at the next token and eat it
    if you're in the correct rule, otherwise return and let the caller
    deal with it. If this seems like more trouble than it's worth, now
    you know why we use parser generators. -John]

    Or you start with EBNF or RRPG where you can express repetition
    without recursion, so left recursion is rare.

    I happen to look at Exercise 7.2 at this moment, which suggests
    another way to deal with difficulties:

    |7.2. Where is the Oberon syntax not LL(1), that is, where is a
    |lookahead of more than one symbol necessary? Change the syntax in such
    |a way that it satisfies the LL(1) property.

    - anton
    --
    M. Anton Ertl
    [email protected]
    http://www.complang.tuwien.ac.at/anton/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Alain Ketterlin@21:1/5 to Johann 'Myrkraverk' Oskarsson on Tue Mar 1 23:13:25 2022
    Johann 'Myrkraverk' Oskarsson <[email protected]d> writes:

    Johann 'Myrkraverk' Oskarsson <[email protected]> writes:
    My first question is: how are production recursive descent parsers con- >>>> structed? From these two books, and other things I have read in the
    past, recursive descent is so simple it's mostly just brushed off with >>>> an example or two, but there seems to be a whole lot more to it. For
    instance, [Appel] mentions a parse table but mostly does not explain how >>>> to construct one. ...

    [...]
    The demonstration starts with grammar 3.12, page 47.

    Z -> d Y -> X -> Y
    Z -> X Y Z Y -> c X -> a

    where Y goes to epsilon in the centre of the first line.
    Then [Appel] shows how FIRST and FOLLOW sets are computed
    given this grammar, ending with

    nullable FIRST FOLLOW
    X yes a c a c d
    Y yes c a c d
    Z no a c d

    where previous steps are also shown[1]. Then figure 3.14 shows a
    predictive parsing table, presumably constructed using the FIRST
    and FOLLOW sets.

    a c d
    +-----------------------------------------+
    X | X -> a X -> Y X -> Y |
    | Y -> Y |
    | |
    Y | Y -> Y -> Y -> |
    | Y -> c |
    | |
    Z | Z -> X Y Z Z -> X Y Z Z -> d |
    | Z -> X Y Z |
    +-----------------------------------------+

    Where {X,a}, {Y,c}, and {Z,d} show conflicts, so this cannot be a
    recursive descent parser; or LL(1) grammar per the book.

    I am not clear on how to construct such a table from the description
    in the book, and would not presume to be able to do so by hand, using
    a more complex grammar.

    That's a 2-dimensional table M[N,t] where N is a non-terminal, and t
    is a terminal symbol. It is build by the following algorithm:

    for each rule L -> R1...Rk
    for each terminal f in FIRST (R1...Rk)
    add the rule to M[L,f]
    if R1...Rk is nullable
    for each terminal f in FOLLOW (L)
    add the rule to M[L,f]

    In essence: FIRST lets you know when to expand a non-terminal, and
    FOLLOW lets you know when to apply a rule that matches nothing in
    the input.

    The grammar is LL(1) is all entries of the table contain at most one
    rule. That is not the case in your example.

    Parsing goes as follows: maintain a stack on which you initially push
    the start symbol. Then repeat:

    if top(stack) is a terminal symbol
    if it matches the next input
    pop it from the stack and advance input
    else
    signal error
    else
    if M[top(stack),next_input] is empty
    signal error
    else // M[...] is a rule L -> R1...Rk
    pop an element from the stack and push Rk, ..., R1

    (note that you push the rhs in reverse order, such as the leftmost
    symbol appears on the top of the stack).

    The parsing succeeds if you end up with an empty stack and an empty
    input.

    I don't have my copy of the dragon book with me, but from pearson's web
    site I find this is the object of chapter 4 section 4.

    So I was wondering if people who create /recursive descent/ parsers for production compilers use such a table as an intermediary step, or not.
    [...]

    As John says, if your language fits the requirements (that is, it has a
    LL(1) grammar, or LR(1)), you are better off using a generator. It turns
    out that "big" compiler parsers are often written by hand, because 1)
    grammars are often ambiguous (sometimes at the lexical level), 2)
    ambiguities are easier to resolve "by hand" because you can take more
    context into account, and 3) hand-written parsers give you more control
    on error recovery. (Also because memory is now big enough to hold a
    parse tree.)

    For C/C++ parsers, add the fact that there are several grammatical
    layers (e.g., OpenMP pragmas), and/or you need details that automated
    parsing usually omits (e.g., for diagnostics and tools). That's why both
    gcc and clang parsers are hand-written.

    The official Python implementation used an LL(1) (auto-generated) parser
    until recently, where they switched to a PEG-parser (I have to admit I
    don't understand their motivation for doing this -- as exposed in a
    document called PEP 617). It is still auto-generated as far as I
    understand.

    -- Alain.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Hans-Peter Diettrich@21:1/5 to Johann 'Myrkraverk' Oskarsson on Wed Mar 2 05:52:46 2022
    On 3/1/22 2:40 AM, Johann 'Myrkraverk' Oskarsson wrote:

    where previous steps are also shown[1].  Then figure 3.14 shows a
    predictive parsing table, presumably constructed using the FIRST
    and FOLLOW sets.

              a            c             d
        +-----------------------------------------+
      X |  X -> a        X -> Y        X -> Y     |
        |  Y -> Y                                 |

    Just a typo correction: In (X,a) the second alternative should read
    X -> Y
    not Y -> Y.

    DoDi

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Robinson@21:1/5 to [email protected] on Sat Mar 5 14:46:08 2022
    On 27 Feb 2022 16:37:05 EST, Johann 'Myrkraverk' Oskarsson <[email protected]> asked in Newsgroup comp.compilers:

    | I have some questions about the construction of
    | recursive descent parsers. The reason for my questions
    | is that I know production compilers /use/ recursive
    | descent. My first question is: how are production
    | recursive descent parsers constructed?

    If you want to see another example of this, since I added features to
    it, is the XDPascal compiler, available from https://github.com/electric-socket/xdpw or https://XDpascal.com. Now,
    what I'm going to do is explain why and how this is done.

    Pascal syntax requires that, after a procedure or function signature,
    and any constants, types or variables are declared, it must have a
    block. This is a BEGIN, zero or more statements, each separated by a
    semicolon, then an optional semicolon, then END and a semicolon if it's
    a procedure or function, a period if it's the end of a program. A syntax diagram may help here:

    Block: ^______________>|
    | V
    BEGIN---+-->Statement---+-->V--------->^----> END --->
    ^_______ ; <____V |__> ;_>__|

    In the code example below, any identifier ending
    in TOK is the token for that symbol or keyword,
    e.g. DOTOK for DO, DOTTOK for period, etc.

    So at the end of the compile, the compiler would
    have a piece of code similar to this:

    block;
    nexttoken;
    if thistoken<>DOTTOK then error('Period expected');
    finishcompiling;

    Procedure "block" would have something similar to
    the following:

    repeat
    statement;
    until thistoken = ENDTOK or END_OF_FILE ;


    Procedure "Statement" would look like this:

    nextToken;
    Case thisToken of
    Identifier : Identstmt;
    FORTOK: ForStmt
    REPEATTOK: RepeatStmt;
    BEGINTOK: Block; // [1]
    ...
    WHILETOK: WhileStmt
    ELSE
    Error('Syntax error');


    FOR statement:
    ----> FOR --> := --> ident ------> expression -->|
    |<-----------------------------------------------V
    V--V------- TO ----->----> expression --> DO --|
    V____> DOWNTO -->| |
    |
    |<---------------------------------------------V
    V--------- statement ----------------->

    Now, procedure ForStmt might be coded like this:

    nexttoken;
    if thistok <> BECOMESTOK then
    error(' := expected');
    nexttoken;
    if thistok <> identTok then
    error(' Identifier expected ');
    ...

    and so on until:

    nexttoken;
    if thistok <> DOTOK then
    error(' DO expected');
    nexttoken;
    statement;

    Now, if you look, at // [1], the case selector for
    the token BEGIN calls procedure block again, while
    it itself was called by block earlier. And, after the
    DO token in FOR, statement calls itself.

    If you notice, recursive descent works because each
    statement in the program is processed until a "trigger"
    token exits that state back to where it was called.
    On return, syntax checking continues with the next
    token after the one that caused a block or statement
    procedure to be called.

    I hope this example helps you to understand recursive
    descent parsing.


    XDPascal.com: The (new) home of the XDPascal Compiler.
    ----
    "The lessons of history teach us - if they teach us anything - that no
    one learns the lessons that history teaches us."
    Paul Robinson <[email protected]> / <[email protected]>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)