On Saturday, 15 April 2017 15:48:51 UTC+9,
[email protected] wrote:
i have an EBNF grammar for the language;
Is it LL(1) though?
it is clear that the compiler will be much faster with less backtracking if it uses that approach.
An RD parser for an LL(1) grammar should not have to backtrack.
A real sticking point in my first cut compiler which was purely recursive descent is that error recovery is very hard to do in recursive descent,
I always found it much harder to do error recovery with table driven parsers.
and a huge amount of extra code exists to exit out of deeply nested function calls.
Then you are doing something wrong.
This was always a weak spot in the Wirth code examples which i learned from.
Wirth never taught error recovery. His examples were meant to illustrate the general process of recursive descent, not its error recovery. You may want to read other authors who actually cover error recovery. For example Dick Grune.
The key to good error recovery in an RD parser for an LL(1) grammar is to
(1) implement FIRST() and FOLLOW() sets as functions the parser can call
(2) don't nest match token/set tests, but code the tests sequentially
(3) if a test fails, skip to a token that is expected by the next test
You can see this approach all over my parser. For example, take a look at the function that implements array type declaration
https://github.com/m2sf/m2bsk/blob/master/src/imp/Parser.mod#L945
arrayType :=
ARRAY valueCount OF typeIdent
;
One might be inclined to code this in a nested fashion like so
(* ARRAY *)
lookahead := Lexer.consumeSym(lexer);
(* valueCount *)
IF matchSet(FIRST(Expression)) THEN
lookahead := expression(valueCount);
(* OF *)
IF matchToken(Token.Of) THEN
lookahead := Lexer.consumeSym(lexer);
(* typeIdent *)
IF matchSet(FIRST(Qualident)) THEN
lookahead := qualident(baseType)
Of course this works, but this makes error recovery very difficult because you need to back out of your nested IFs and then figure out how the parser could continue. Generally speaking, when you are at the end of the nested block, you have progressed too
far in the code to continue and you are forced to skip even further in the hope to find a sensible resync point.
The approach I outlined above in the three bullet points doesn't have this problem. You'd code the tests in sequence like so
(* ARRAY *)
lookahead := Lexer.consumeSym(lexer);
(* valueCount *)
IF matchSet(FIRST(Expression)) THEN
lookahead := expression(valueCount);
ELSE (* resync *)
lookahead := skipToMatchTokenOrSet(Token.Of, FIRST(TypeIdent))
END; (* IF *)
(* OF *)
IF matchToken(Token.Of) THEN
lookahead := Lexer.consumeSym(lexer);
ELSE (* resync *)
lookahead := skipToMatchSetOrSet(FIRST(Qualident), FOLLOW(ArrayType))
END (* IF *)
(* typeIdent *)
IF matchSet(FIRST(Qualident)) THEN
lookahead := qualident(baseType)
ELSE (* resync *)
lookahead := skipToMatchSet(FOLLOW(ArrayType))
END (* IF *)
With this approach, when an error occurs, the parser will skip to the token that will be expected by the following test and continue. It is then just a matter of fine tuning the resync points, that is to say which tokens to skip to.
For this you will need to write a number of resync functions for different cases. In my parser I started out with just two, skipToMatchToken and skipToMatchSet, but soon found that I needed to add more, like skiptToMatchTokenOrToken,
skipToMatchTokenOrSet and skipToMatchTokenOrTokenOrSet.
The benefit of this approach is that your error recovery is then easily tunable by simply changing the resync tokens. With nested tests, you can't adjust that easily. If you want to change your resync points in a nested IF structure, you can't just
change the resync tokens, you have to change the code as well.
If you read authors who handle error recovery, like Grune, you will learn that some approaches involve inserting tokens into the stream instead of skipping tokens when an error is detected, in other words, error recovery assumes that some token is
missing the source code.
You will find that the structure of sequential tests I describe above can easily accommodate such a strategy as well. The flatter your structure, the more flexible it is.
hth
benjamin
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)