
A learned PCFG is a PCFG based on an initial context-free grammar (see Initial Context-Free Grammar) generated for a template regular expression grammar. Productions in this PCFG have frequencies accumulated while parsing a training terminal symbol sequence. Use the following command-line format to dump a learned PCFG:

    $ topdown --qp[=NONT1] ... --qp[=NONTn] [--op=FILE] \
        [--fp=fq_min_prod] [--ft=fq_min_term] \
        [--nlp=num_lower_prod] [--nlt=num_lower_term] \
        [--nup=num_upper_prod] [--nut=num_upper_term] \
        [--pp=prob_min_prod] [--pt=prob_min_term] \
        [--fq-span=window] [--recurs=right] [--remove-putback] \
        [--simplify] [--term1] REGEX_GRAM_FILE SYM_SEQ_FILE

The following command-line options are applicable to dumping a learned PCFG:

- `--fp=fq_min_prod`: The minimum frequency a production must have to be included in the learned PCFG. This option does not filter productions representing possible terminal symbol sequences for terminal symbol placeholder sequences (with nonterminal symbols `_X_iT` and `_X_iTj` on the left-hand side). When the option `--simplify` is passed, the parser filters productions before simplifying the PCFG. The default value is 0.
- `--fq-span=window|total`: The event history span over which production frequencies are accumulated for inclusion in the learned PCFG:
  - ‘`window`’: Event history window. See the description of the `--ww=INT` option in Parsing a Token Sequence.
  - ‘`total`’: Entire event history.

  The default value is ‘`total`’.

- `--ft=fq_min_term`: The minimum frequency a production must have to be included in the learned PCFG. This option only filters productions representing possible terminal symbol sequences for terminal symbol placeholder sequences (with nonterminal symbols `_X_iT` and `_X_iTj` on the left-hand side). When the option `--simplify` is passed, the parser filters productions before simplifying the PCFG. The default value is 0.
- `--nlp=num_lower_prod`: If possible, include in the learned PCFG at least the specified number of right-hand sides for every nonterminal symbol on the left-hand side. This option does not filter productions representing possible terminal symbol sequences for terminal symbol placeholder sequences (with nonterminal symbols `_X_iT` and `_X_iTj` on the left-hand side). When the option `--simplify` is passed, the parser filters productions before simplifying the PCFG. The default value is 0.
- `--nlt=num_lower_term`: If possible, include in the learned PCFG at least the specified number of right-hand sides for every nonterminal symbol on the left-hand side. This option only filters productions representing possible terminal symbol sequences for terminal symbol placeholder sequences (with nonterminal symbols `_X_iT` and `_X_iTj` on the left-hand side). When the option `--simplify` is passed, the parser filters productions before simplifying the PCFG. The default value is 0.
- `--nup=num_upper_prod`: Include in the learned PCFG at most the specified number of right-hand sides for every nonterminal symbol on the left-hand side. The parser retains the most probable right-hand sides. This option does not filter productions representing possible terminal symbol sequences for terminal symbol placeholder sequences (with nonterminal symbols `_X_iT` and `_X_iTj` on the left-hand side). When the option `--simplify` is passed, the parser filters productions before simplifying the PCFG. There is no limit by default.
- `--nut=num_upper_term`: Include in the learned PCFG at most the specified number of right-hand sides for every nonterminal symbol on the left-hand side. The parser retains the most probable right-hand sides. This option only filters productions representing possible terminal symbol sequences for terminal symbol placeholder sequences (with nonterminal symbols `_X_iT` and `_X_iTj` on the left-hand side). When the option `--simplify` is passed, the parser filters productions before simplifying the PCFG. There is no limit by default.
- `--op=FILE`: Write the learned PCFG to `FILE`. If `FILE` is ‘`-`’, write the PCFG to stdout. This option queries the learned PCFG.
- `--pp=prob_min_prod`: The minimum probability a production must have to be included in the learned PCFG. This option does not filter productions representing possible terminal symbol sequences for terminal symbol placeholder sequences (with nonterminal symbols `_X_iT` and `_X_iTj` on the left-hand side). When the option `--simplify` is passed, the parser filters productions before simplifying the PCFG. The default value is 0.
- `--pt=prob_min_term`: The minimum probability a production must have to be included in the learned PCFG. This option only filters productions representing possible terminal symbol sequences for terminal symbol placeholder sequences (with nonterminal symbols `_X_iT` and `_X_iTj` on the left-hand side). When the option `--simplify` is passed, the parser filters productions before simplifying the PCFG. The default value is 0.
- `--qp[=NONT]`: Dump the learned PCFG productions for the nonterminal symbol `NONT`, and for the auxiliary nonterminal symbols it uses, to the file specified by the option `--op=FILE`. The nonterminal symbol must belong to the set of nonterminal symbols of the template regular expression grammar. You can pass multiple `--qp=NONT` options to dump productions for multiple nonterminal symbols. If the option `--op=FILE` is not supplied, the queried productions are dumped to stdout. If `NONT` is not supplied, the entire learned PCFG is dumped. This option queries the learned PCFG.
- `--remove-putback`: Remove from the learned PCFG the auxiliary nonterminal symbols for terminal symbol placeholder sequences processed in put-back mode. Those nonterminal symbols have the suffix ‘`~`’ in the right-hand sides of productions: `_X_iT~` or `_X_iTj~`. See Put-back Terminal Symbols for more information. By default, nonterminal symbols for terminal symbol placeholder sequences processed in put-back mode are not removed.
- `--term1`: For every nonterminal symbol on the left-hand side, retain the most probable right-hand sides beginning with unique terminal symbols. This mode is only applicable to productions representing possible terminal symbol sequences for terminal symbol placeholder sequences (with nonterminal symbols `_X_iT` and `_X_iTj` on the left-hand side). By default, the right-hand sides of productions with a specific nonterminal symbol on the left-hand side can start with duplicate terminal symbols.
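The threshold and count options above can be pictured with a minimal Python sketch. It is not the parser's implementation: the `RightHandSide` class, the `filter_rhs` and `keep_unique_first_terminal` functions, and the order in which the limits are applied are all assumptions made for illustration, and probabilities are assumed to be relative frequencies within one left-hand-side nonterminal. The same function stands in for both option families (`--fp`/`--pp`/`--nlp`/`--nup` and `--ft`/`--pt`/`--nlt`/`--nut`), which differ only in the productions they apply to.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RightHandSide:
    symbols: Tuple[str, ...]  # e.g. ('"a"', '"b"', '"c"')
    frequency: int            # count accumulated while parsing


def filter_rhs(rhs_list, fq_min=0, prob_min=0.0, num_lower=0, num_upper=None):
    """Filter the right-hand sides of ONE left-hand-side nonterminal (hypothetical)."""
    total = sum(r.frequency for r in rhs_list)
    # Most frequent (hence most probable) first; sorted() is stable for ties.
    ranked = sorted(rhs_list, key=lambda r: r.frequency, reverse=True)

    def probability(r):
        return r.frequency / total if total else 0.0

    kept = [r for r in ranked
            if r.frequency >= fq_min and probability(r) >= prob_min]
    # Assumed reading of "if possible, at least N": refill from the ranking.
    if len(kept) < num_lower:
        kept = ranked[:num_lower]
    # Assumed reading of "at most N": keep the N most probable.
    if num_upper is not None:
        kept = kept[:num_upper]
    return kept


def keep_unique_first_terminal(rhs_list):
    """Sketch of --term1: one right-hand side per distinct first symbol."""
    best = {}
    for r in sorted(rhs_list, key=lambda r: r.frequency, reverse=True):
        best.setdefault(r.symbols[0], r)  # first seen is the most frequent
    return list(best.values())


# Frequencies taken from the _A_1T3 production in this section's example dump.
abc = RightHandSide(('"a"', '"b"', '"c"'), 173)
cab = RightHandSide(('"c"', '"a"', '"b"'), 63)
bca = RightHandSide(('"b"', '"c"', '"a"'), 16)
rhs = [abc, cab, bca]

print(filter_rhs(rhs, fq_min=20))                # drops the entry with frequency 16
print(filter_rhs(rhs, fq_min=100, num_lower=2))  # refilled to two entries
```

How the real parser resolves a conflict between a lower and an upper count is not stated here; the sketch simply applies the upper limit last.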

See Initial Context-Free Grammar for the description of the `--recurs=left|right` option.
See Iterative Determinization for the description of the `--simplify` option.

An example of dumping a learned PCFG follows.
See Terminal Symbol Expansions for the contents of the `expan.rg` and `expan1.seq` files.

    $ topdown -N10 --qp expan.rg expan1.seq
    S: A A [0.73446328] // 130
     | _S_1T5 [0.26553672] // 47
     ; // 177
    A: _A_1T3 // 258
     ;
    _A_1T3: "a" "b" "c" [0.68650794] // 173 0.77216188
     | "c" "a" "b" [0.25000000] // 63 0.03027511
     | "b" "c" "a" [0.06349206] // 16 0.00044992
     ; // 252
    _S_1T5: "a" "b" "c" "a" "b" [0.39130435] // 18 0.31862270
     | "c" "a" "b" "c" "a" [0.30434783] // 14 0.09068896
     | "b" "c" "a" "b" "c" [0.30434783] // 14 0.09068895
     ; // 46

See pcfg-generate-seq for the PCFG format.
See Initial Context-Free Grammar for the format of generated nonterminal symbol names beginning with ‘`_`’.

A comment at the end of each right-hand side contains its frequency.
A comment after ‘`;`’ contains the sum of frequencies of all right-hand sides.

A fractional number after the frequency of a right-hand side representing a possible terminal symbol sequence for a terminal symbol placeholder sequence (with a nonterminal symbol `_X_iT` or `_X_iTj` on the left-hand side) is the score of this right-hand side.
The parser discards right-hand sides with lower scores during iterative determinization.
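For post-processing a dump, the frequency and score comments can be read back with a few lines of Python. The sketch below is inferred only from the layout of the example in this section, not from the PCFG format specification; `RhsEntry` and `parse_rhs_line` are hypothetical names, and the code assumes that terminal symbols contain neither spaces nor ‘`//`’.

```python
import re
from typing import List, NamedTuple, Optional


class RhsEntry(NamedTuple):
    symbols: List[str]
    probability: Optional[float]  # value in [...], absent for a sole right-hand side
    frequency: int                # first number in the trailing comment
    score: Optional[float]        # second number in the comment, if any


def parse_rhs_line(line):
    """Parse one right-hand-side line of a dump; return None for other lines."""
    body, sep, comment = line.partition("//")
    body = body.strip()
    fields = comment.split()
    if not sep or not fields or body.startswith(";"):
        return None  # no comment, or the closing ';' line carrying the sum
    # Drop the leading 'LHS:' or '|' marker.
    body = re.sub(r"^(?:\S+:|\|)\s*", "", body)
    # Split off a trailing [probability], if present.
    probability = None
    match = re.search(r"\[([0-9.]+)\]\s*$", body)
    if match:
        probability = float(match.group(1))
        body = body[:match.start()]
    score = float(fields[1]) if len(fields) > 1 else None
    return RhsEntry(body.split(), probability, int(fields[0]), score)


entry = parse_rhs_line('_A_1T3: "a" "b" "c" [0.68650794] // 173 0.77216188')
print(entry.symbols, entry.frequency, entry.score)
```

Sorting the parsed entries of one nonterminal by `score` then shows which right-hand sides rank lowest.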

Note: simplifying a learned PCFG by passing the option `--simplify` might remove scores from right-hand sides of productions.