7.1.7 Restrictions for Bottom-Up Parsing

When a top-down template grammar is supplied to the adaptive bottom-up parser abu-parser, the parser performs basic consistency checks on the grammar:

  1. All leaves of right-hand sides of non-start nonterminal symbols must be output terminal symbols.

    Example

    The following top-down template grammar is valid for parsing by abu-parser:

    :n: = [ "__ER" "__START" "_S" ] ;
    
    START: C "__START" ;
    
    
    
    C: C0 ( "_S" < "__START" >
          | "__ER" [^ :n: ] #pt-__er_consume < "__START" >
          )
    ;
    
    
    
    C0: "%a" ( "%b" #pt-_S_1T2 #pn-S-0 < "_S" >
             | "%c" "%d" #pt-_S_2T3 #pn-S-1 < "_S" >
             | #pt-__er [^ "%b" "%c" :n: ]~ #pn-__ER-0 < "__ER" >
             )
      | "%b" "%c" "%d" #pt-_S_3T3 #pn-S-2 < "_S" >
      | [^ "%a" "%b" :n: ]~ #pn-__ER-0 < "__ER" >
    ;
    

    The right-hand side of the nonterminal symbol C0 has the output terminal symbol leaves < "_S" > and < "__ER" >. The right-hand side of the nonterminal symbol C has the output terminal symbol leaves < "__START" >.

    The right-hand side of the nonterminal symbol START does not have an output terminal symbol leaf. This is allowed because START is the start nonterminal symbol (i.e., the first nonterminal symbol defined in the grammar).

  2. Every input terminal symbol class in right-hand sides of non-start nonterminal symbols must satisfy one of the following conditions:
    • The input terminal symbol class is empty.
    • The input terminal symbol class contains only virtual terminal symbols and/or the end-of-stream marker $$. Virtual terminal symbols are terminal symbols that have a prefix specified by the option --term-prefix=STR. The default prefix is ‘%’.
    • The input terminal symbol class contains only virtual nonterminal symbols. Virtual nonterminal symbols are terminal symbols that do not have the prefix specified by the option --term-prefix=STR (‘%’ by default).

    Example 1

    Provided the terminal symbol prefix is ‘%’, the input terminal symbol class

    [ "%a" "%b" "%c" "_A" "_B" $$ ]
    

    is invalid for parsing by abu-parser.

    Valid input terminal symbol classes can be:

    [ "%a" "%b" "%c" $$ ]
    [ "_A" "_B" ]
    

    Example 2

    Provided the terminal symbol prefix is ‘%’, the input terminal symbol class

    [^ "_C" "_D" ]
    

    is valid for parsing by abu-parser if the set of terminal symbols defined by the grammar and the input terminal symbol sequence is

    $$ "%e" "%f" "_C" "_D"
    

    However, the same input terminal symbol class is invalid for parsing by abu-parser if the set of terminal symbols defined by the grammar and the input terminal symbol sequence is

    $$ "%e" "%f" "_C" "_D" "_E"
    

    because, in this case, the input terminal symbol class contains the virtual terminal symbols ‘%e’ and ‘%f’ (along with $$) and the virtual nonterminal symbol ‘_E’.
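
The check in rule 2 can be sketched in Python as follows. This is only an illustration of the rule as stated above, not abu-parser's implementation: the function names, the representation of a class as a list of symbol names, and the `universe` parameter (the full set of terminal symbols against which a complemented class `[^ … ]` is resolved) are assumptions made for this sketch.

```python
END_OF_STREAM = "$$"


def classify(symbol, prefix="%"):
    """Return 'end', 'terminal', or 'nonterminal' for one symbol name."""
    if symbol == END_OF_STREAM:
        return "end"
    # Virtual terminal symbols carry the prefix; virtual nonterminal symbols do not.
    return "terminal" if symbol.startswith(prefix) else "nonterminal"


def expand_class(members, universe=(), negated=False):
    """Resolve a class to the set of symbols it contains.

    For a complemented class, the result is every symbol of the
    (assumed) universe that is not listed in the class.
    """
    members = set(members)
    return set(universe) - members if negated else members


def input_class_is_valid(members, universe=(), negated=False, prefix="%"):
    """Hypothetical check for rule 2."""
    symbols = expand_class(members, universe, negated)
    kinds = {classify(s, prefix) for s in symbols}
    # Valid: empty, only virtual terminals and/or $$, or only virtual nonterminals.
    return kinds <= {"terminal", "end"} or kinds == {"nonterminal"}


# Example 1: mixing both kinds of symbols is rejected.
mixed = ["%a", "%b", "%c", "_A", "_B", "$$"]
print(input_class_is_valid(mixed))

# Example 2: the same complemented class passes or fails depending on the universe.
small = ["$$", "%e", "%f", "_C", "_D"]
large = small + ["_E"]
print(input_class_is_valid(["_C", "_D"], small, negated=True))
print(input_class_is_valid(["_C", "_D"], large, negated=True))
```

The sketch makes the point of Example 2 explicit: a complemented class cannot be judged from its text alone, because its contents depend on every terminal symbol known to the grammar and the input sequence.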

  3. Every put-back terminal symbol class in right-hand sides of non-start nonterminal symbols must be empty or contain only virtual terminal symbols and/or $$.

    Example

    Provided the terminal symbol prefix is ‘%’, the put-back terminal symbol class

    [ "%a" "%b" "%c" "_A" "_B" $$ ]~
    

    is invalid for parsing by abu-parser.

    A valid put-back terminal symbol class can be

    [ "%a" "%b" "%c" $$ ]~
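
A minimal Python sketch of the rule 3 check follows. It is illustrative only and does not reflect abu-parser's internals: the function name, the list-of-names representation, and the `universe` parameter used to resolve complemented classes are assumptions. The last call also assumes that the terminal symbols of the grammar shown under rule 1 are exactly those appearing in its text plus $$.

```python
def putback_class_is_valid(members, universe=(), negated=False, prefix="%"):
    """Hypothetical check for rule 3.

    A put-back class is accepted when it is empty or when every symbol
    it contains is a virtual terminal symbol (has the prefix) or $$.
    """
    members = set(members)
    symbols = set(universe) - members if negated else members
    return all(s == "$$" or s.startswith(prefix) for s in symbols)


# The two classes from the example above.
print(putback_class_is_valid(["%a", "%b", "%c", "_A", "_B", "$$"]))
print(putback_class_is_valid(["%a", "%b", "%c", "$$"]))

# The complemented class [^ "%b" "%c" :n: ]~ from the grammar under rule 1,
# with :n: expanded and an assumed universe of terminal symbols.
universe = ["$$", "%a", "%b", "%c", "%d", "__ER", "__START", "_S"]
excluded = ["%b", "%c", "__ER", "__START", "_S"]
print(putback_class_is_valid(excluded, universe, negated=True))
```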
    
  4. All virtual terminal symbol sequences led out by right-hand sides of non-start nonterminal symbols must terminate with #pt-… markers. See Ending a Terminal Production for more information on #pt-… markers.

    Example

    Provided the terminal symbol prefix is ‘%’, the production

    C0: C1 ( "_A" "%e" "%e" #pt-_S_1T2 #pn-S-0 < "_S" >
           | "_B" "%f" "%f" #pn-S-1 < "_S" >
           | "_C" #pn-S-2 < "_S" >
           )
    ;
    

    is invalid for parsing by abu-parser because the virtual terminal symbol sequence "%f" "%f" does not end with a #pt-… marker.
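
The rule 4 check can be approximated by the Python sketch below. It is a simplification under several assumptions, not abu-parser's algorithm: each alternative is assumed to be already flattened into a list of token strings, symbol classes and nested groups are ignored, "terminate" is read as "be immediately followed by", and the function name and token spellings (for example `<_S>` for an output terminal symbol) are hypothetical.

```python
def sequences_end_with_pt(tokens, prefix="%"):
    """Hypothetical check for rule 4 on one flattened alternative.

    Every run of consecutive virtual terminal symbols must be
    immediately followed by a token that starts with '#pt-'.
    """
    in_run = False
    for tok in tokens:
        if tok.startswith(prefix):
            in_run = True  # inside a run of virtual terminal symbols
        elif in_run:
            if not tok.startswith("#pt-"):
                return False  # the run ended without a #pt-... marker
            in_run = False
    # A run that reaches the end of the alternative is also unterminated.
    return not in_run


# The three alternatives of the production above, flattened by hand.
alternatives = [
    ["_A", "%e", "%e", "#pt-_S_1T2", "#pn-S-0", "<_S>"],
    ["_B", "%f", "%f", "#pn-S-1", "<_S>"],
    ["_C", "#pn-S-2", "<_S>"],
]
for alt in alternatives:
    print(alt[0], sequences_end_with_pt(alt))
```

Under these assumptions, only the second alternative is reported as failing, which matches the explanation above; the third alternative passes because it contains no virtual terminal symbols at all.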