7.8 rege-bottom-up

This tool converts a bottom-up template grammar to a top-down template grammar.

Example

$ cat sample-rege-bottom-up.rg

  S: A B
   | A C
  ;

  A: "a"
   | "a" "a"
  ;

  B: "b"
   | "b" "b"
  ;

  C: "c"
   | "c" "c"
  ;

$ rege-markup-cfg sample-rege-bottom-up.rg |
  rege-bottom-up --cfg-markup --er-vnont-scope=grammar -

  :c0: = [ "_S" ]
  ;
  :c1: = [ "_A" ]
  ;
  :c2: = [ "_B" "_C" ]
  ;
  :n: = [ "__ER" "__START" :c0: :c1: :c2: ]
  ;
  START: C "__START"
  ;

  C: C0 ( "_S" < "__START" >
        | "__ER" [^ :n: ] #pt-__er_consume < "__START" >
        )
  ;

  C0: C1 ( "_A" C2 ( "_B" #pn-S-0 < "_S" >
                   | "_C" #pn-S-1 < "_S" >
                   | "__ER" #pn-__ER-0 < "__ER" >
                   )
         | "__ER" #pn-__ER-0 < "__ER" >
         )
  ;

  C1: "%a" ( "%a" ( #pt-_A_2T2 [ "%b" "%c" ]~ #pn-A-1 < "_A" >
                  | #pt-__er [ $$ "%a" ]~ #pn-__ER-0 < "__ER" >
                  )
           | #pt-_A_1T [ "%b" "%c" ]~ #pn-A-0 < "_A" >
           | #pt-__er $$~ #pn-__ER-0 < "__ER" >
           )
    | [^ "%a" :n: ]~ #pn-__ER-0 < "__ER" >
  ;

  C2: "%b" ( "%b" ( #pt-_B_2T2 [ $$ "%a" ]~ #pn-B-1 < "_B" >
                  | #pt-__er [ "%b" "%c" ]~ #pn-__ER-0 < "__ER" >
                  )
           | #pt-_B_1T [ $$ "%a" ]~ #pn-B-0 < "_B" >
           | #pt-__er "%c"~ #pn-__ER-0 < "__ER" >
           )
    | "%c" ( "%c" ( #pt-_C_2T2 [ $$ "%a" ]~ #pn-C-1 < "_C" >
                  | #pt-__er [ "%b" "%c" ]~ #pn-__ER-0 < "__ER" >
                  )
           | #pt-_C_1T [ $$ "%a" ]~ #pn-C-0 < "_C" >
           | #pt-__er "%b"~ #pn-__ER-0 < "__ER" >
           )
    | [ $$ "%a" ]~ #pn-__ER-0 < "__ER" >
  ;

Synopsis

In typical cases, use the following command-line format:

rege-bottom-up [ -o OUTPUT_TD_GRAMMAR_FILE ] --cfg-markup        \
               [ --er-vnont-scope=grammar|nont ] [ --template ]  \
               INPUT_BU_GRAMMAR_FILE

where INPUT_BU_GRAMMAR_FILE is the name of a file containing an input bottom-up template grammar (see Bottom-Up Template Grammar); the filename ‘-’ means stdin.

The option --cfg-markup requires the bottom-up template grammar to contain source PCFG markup inserted by the rege-markup-cfg tool (see rege-markup-cfg). After the markup has been inserted, the bottom-up grammar may optionally have been converted to a factored context-free grammar by the rege-vit tool (see rege-vit).
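The following sketch puts the synopsis together with the sample grammar from the Example above. The file names sample.rg and sample-td.rg are assumptions chosen for illustration, and the pipeline runs only if both tools are found on PATH.

```shell
#!/bin/sh
# Write the sample bottom-up grammar to a file.
# The name sample.rg is an assumption made for this sketch.
cat > sample.rg <<'EOF'
S: A B
 | A C
;

A: "a"
 | "a" "a"
;

B: "b"
 | "b" "b"
;

C: "c"
 | "c" "c"
;
EOF

# Insert the source PCFG markup, then convert the grammar read from
# stdin ('-') and write the top-down grammar to sample-td.rg (assumed name).
if command -v rege-markup-cfg >/dev/null 2>&1 &&
   command -v rege-bottom-up >/dev/null 2>&1; then
  rege-markup-cfg sample.rg |
    rege-bottom-up -o sample-td.rg --cfg-markup --er-vnont-scope=grammar -
else
  echo "rege tools not found; skipping conversion" >&2
fi
```

Omitting -o sends the top-down grammar to stdout, as in the Example above.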

Command-Line Options

The rege-bottom-up tool supports the following command-line options:

--cfg-markup

Fetch source PCFG markup from a bottom-up (input) template grammar and insert #pt… and #pn… specifiers for postfix production markup into a top-down (output) template grammar. If this option is omitted, the bottom-up template grammar must not contain source PCFG markup.

--dump-first

Annotate the intermediate grammar dumped by the option --out-gram-interm=FILE with comments that indicate the expected sets of terminal and nonterminal symbols.

--er-vnont-scope=grammar|nont|leaf

The scope of uniqueness of error virtual nonterminal symbols ‘__ER…’:

grammar

One error virtual nonterminal symbol ‘__ER’ for the entire top-down template grammar.

nont

One error virtual nonterminal symbol ‘__ERidx’ per nonterminal symbol of a top-down template grammar, where idx is the index of the nonterminal symbol.

leaf

One error virtual nonterminal symbol ‘__ERidx’ per leaf of a regular expression for a nonterminal symbol of a top-down template grammar, where idx is the ordinal number of the error virtual nonterminal symbol.

The scopes ‘grammar’ and ‘nont’ enable the use of recursive nonterminal symbols in a bottom-up (input) template grammar.

The default value is ‘leaf’.
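To compare the three scopes, the same grammar can be converted once per scope value. The sketch below assumes the file sample-rege-bottom-up.rg from the Example above; the helper convert_with_scope and the output names td-SCOPE.rg are hypothetical and not part of the tool.

```shell
#!/bin/sh
# convert_with_scope SCOPE INPUT
# Hypothetical helper: mark up INPUT, convert it with the given
# --er-vnont-scope value, and write the result to td-SCOPE.rg.
convert_with_scope() {
  scope=$1
  input=$2
  rege-markup-cfg "$input" |
    rege-bottom-up -o "td-$scope.rg" --cfg-markup \
                   --er-vnont-scope="$scope" -
}

# Run only when the tools and the sample grammar are available.
if command -v rege-markup-cfg >/dev/null 2>&1 &&
   command -v rege-bottom-up >/dev/null 2>&1 &&
   [ -f sample-rege-bottom-up.rg ]; then
  for scope in grammar nont leaf; do
    convert_with_scope "$scope" sample-rege-bottom-up.rg
  done
fi
```

The sample grammar is not recursive, so the default scope ‘leaf’ is also applicable to it; for a grammar with recursive nonterminal symbols only ‘grammar’ and ‘nont’ apply, as stated above.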

--in-seq=FILE

Extend the set of known terminal symbols extracted from a bottom-up (input) template grammar with the terminal symbols contained in a terminal symbol sequence in the specified file. The set of known terminal symbols affects the interpretation of terminal symbol placeholders ‘.’ and exclusive terminal symbol classes. Consequently, it may affect the number of branches generated for parsing a set of terminal symbol sequences that begin with overlapping terminal symbol classes in a bottom-up template grammar.

-o, --out-gram-final=FILE

Dump a top-down (output) template grammar to the specified file. By default, the grammar is dumped to stdout.

--out-gram-interm=FILE

Dump the intermediate grammar converted from a bottom-up (input) template grammar to the specified file. The filename ‘-’ means stdout.
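As a sketch of how --out-gram-interm=FILE may be combined with --dump-first and -o, the following helper writes both the intermediate and the final grammar in a single run. The helper name dump_grammars and the file names interm.rg and final.rg are assumptions made for illustration.

```shell
#!/bin/sh
# dump_grammars INPUT
# Hypothetical helper: write the intermediate grammar, annotated by
# --dump-first, to interm.rg and the top-down grammar to final.rg.
dump_grammars() {
  rege-markup-cfg "$1" |
    rege-bottom-up --cfg-markup --er-vnont-scope=grammar \
                   --dump-first --out-gram-interm=interm.rg \
                   -o final.rg -
}

# Run only when the tools and the sample grammar are available.
if command -v rege-markup-cfg >/dev/null 2>&1 &&
   command -v rege-bottom-up >/dev/null 2>&1 &&
   [ -f sample-rege-bottom-up.rg ]; then
  dump_grammars sample-rege-bottom-up.rg
fi
```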

--remove-la

Where possible, remove put-back (lookahead) terminal symbols from a top-down (output) template grammar by introducing branches selectable nondeterministically.

--template

Replace all virtual terminal symbols and virtual terminal symbol classes in a top-down (output) template grammar with a generic virtual terminal symbol class denoted by [ :t: ]. Remove error virtual nonterminal symbols ‘__ER…’.

--terse

Use a condensed format to dump the productions of a top-down (output) template grammar. By default, the productions are dumped in an indented format.