7.7 rege-vit

This tool generates a factored context-free grammar for processing entire parse units up to a specified length (in terminal symbols) according to a regular expression grammar. Each nonterminal symbol of the context-free grammar carries a length specification after the character ‘/’. The length specification gives the exact length of the terminal symbol sequence that the nonterminal symbol consumes.
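The naming convention can be illustrated with a short Python sketch. The helper split_length_spec is hypothetical and is not part of rege-vit; it assumes only that the length specification is the decimal integer after the last ‘/’ in the symbol.

```python
def split_length_spec(symbol):
    """Split a symbol such as 'A/3' into its base name and exact length."""
    # Assumption: the length specification follows the last '/' in the symbol.
    name, sep, length = symbol.rpartition("/")
    if not sep or not name or not length.isdigit():
        raise ValueError(f"no length specification in {symbol!r}")
    return name, int(length)


# 'S/5' names the nonterminal S restricted to exactly 5 terminal symbols.
print(split_length_spec("S/5"))
```

Under this reading, a symbol such as S/5 matches only terminal symbol sequences of exactly five symbols, and a symbol without a ‘/’ is rejected by the sketch.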

Example

This example uses a regular expression grammar from rege-markup-cfg.

$ rege-markup-cfg sample-rege-markup-cfg.rg | rege-vit --cfg-markup 5 -

  S/2 #l-S: A/1 B/1 #r0
  ;

  S/3 #l-S: A/1 B/2 #r0
          | A/2 B/1 #r0
  ;

  S/4 #l-S: A/1 B/3 #r0
          | A/2 B/2 #r0
          | A/3 B/1 #r0
  ;

  S/5 #l-S: A/1 B/4 #r0
          | A/2 B/3 #r0
          | A/3 B/2 #r0
          | A/4 B/1 #r0
  ;

  A/1 #l-A: _A_1T: . #r0
  ;

  A/2 #l-A: _A_2T2: . . #r1
  ;

  A/3 #l-A: _A_3T3: . . . #r2
  ;

  A/4 #l-A: _A_4T4: . . . . #r3
  ;

  B/1 #l-B: _B_1T: . #r0
  ;

  B/2 #l-B: _B_2T2: . . #r1
  ;

  B/3 #l-B: _B_3T3: . . . #r2
  ;

  B/4 #l-B: _B_4T4: . . . . #r3
  ;
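The alternatives listed for S/2 through S/5 follow a simple pattern: each alternative divides the total length between A and B so that both consume at least one terminal symbol. The Python sketch below reproduces that pattern. It illustrates the output shown here and is not the algorithm used by rege-vit; the function name length_splits is hypothetical, and the assumption that S consists of exactly A followed by B is inferred from the example output. The #l-S and #r0 markup is omitted.

```python
def length_splits(total):
    """Return all (a, b) pairs with a + b == total and a, b >= 1."""
    # Assumption: both parts must consume at least one terminal symbol.
    return [(a, total - a) for a in range(1, total)]


# Print one rule per total length from 2 up to the maximum length 5.
for total in range(2, 6):
    alternatives = " | ".join(
        f"A/{a} B/{b}" for a, b in length_splits(total)
    )
    print(f"S/{total}: {alternatives}")
```

With a maximum parse unit length of 5, the longest split part is 4, which matches the presence of A/1 through A/4 and B/1 through B/4 in the generated grammar.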

Synopsis

In typical cases, use the following command-line format:

rege-vit [ -o OUTPUT_FACTORED_CFG_FILE ] --cfg-markup  \
         MAX_PARSE_UNIT_LENGTH INPUT_BU_GRAMMAR_FILE

where:

MAX_PARSE_UNIT_LENGTH

The maximum supported length of a parse unit, counted in terminal symbols.

INPUT_BU_GRAMMAR_FILE

The name of a file containing an input bottom-up template grammar (see Bottom-Up Template Grammar). The filename ‘-’ means stdin. As required by the option --cfg-markup, the bottom-up template grammar should contain source PCFG markup inserted by the rege-markup-cfg tool (see rege-markup-cfg).

Command-Line Options

The rege-vit tool supports the following command-line options:

--cfg-markup

Fetch source PCFG markup from the input regular expression grammar and generate a factored context-free grammar containing the corresponding markup. If this option is omitted, the input regular expression grammar must not contain source PCFG markup.

--dump-first

In the intermediate grammar dumped by the option --out-gram-interm=FILE, add comments indicating the expected sets of terminal and nonterminal symbols.

-o, --out-pcfg=FILE

Dump the factored context-free grammar to the specified file. By default, the grammar is dumped to stdout.

--out-gram-interm=FILE

Dump the intermediate grammar converted from the input regular expression grammar to a file. The filename ‘-’ means stdout.

--template

Replace all terminal symbols and terminal symbol classes in the factored context-free grammar with the terminal symbol placeholder ‘.’.