pcfg-generate-seq
This tool generates a pseudo-random terminal symbol sequence according to a specified PCFG. The tool expands the start nonterminal symbol of the PCFG to produce a parse unit. If the sequence length limit has not been reached, the tool expands the start nonterminal symbol again to produce the next parse unit.
$ cat sample.pcfg
S: "a" [0.5]
| "b" "b" [0.33]
| "c" "c" "c" D_1 [0.17]
;
D_1: "delta"
| "delta" D_1
;
$ pcfg-generate-seq -i1 -n40 -o sample.seq sample.pcfg
$ cat sample.seq
a c c c delta delta delta a a c c c delta a a a a a a b b a c c c
delta delta delta a b b b b a a c c c delta a
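The expansion loop described above can be sketched in Python. This is only an illustration under stated assumptions, not the tool's implementation: the `GRAMMAR` dictionary, `expand`, and `generate` are hypothetical names, the alternatives of `D_1` are assumed to be equally likely because the sample grammar gives them no probabilities, and a parse unit that would exceed the limit is assumed to end the sequence. Python's random number generator differs from the tool's, so the same seed will not reproduce the output shown above.

```python
import random

# Hypothetical in-memory form of sample.pcfg:
# nonterminal -> list of (right-hand side, probability).
# ASSUMPTION: D_1 has no probabilities in the sample, so equal weights are used.
GRAMMAR = {
    "S": [(["a"], 0.5), (["b", "b"], 0.33), (["c", "c", "c", "D_1"], 0.17)],
    "D_1": [(["delta"], 0.5), (["delta", "D_1"], 0.5)],
}


def expand(symbol, rng):
    """Return the list of terminal symbols derived from one symbol."""
    if symbol not in GRAMMAR:
        # Anything that is not a nonterminal is emitted as a terminal symbol.
        return [symbol]
    alternatives = GRAMMAR[symbol]
    weights = [probability for _, probability in alternatives]
    # Pick one alternative at random, weighted by its probability.
    rhs, _ = rng.choices(alternatives, weights=weights, k=1)[0]
    terminals = []
    for part in rhs:
        terminals.extend(expand(part, rng))
    return terminals


def generate(max_symbols, seed=0, start="S"):
    """Expand the start symbol repeatedly, keeping only whole parse units."""
    rng = random.Random(seed)
    sequence = []
    while True:
        unit = expand(start, rng)  # one parse unit
        if len(sequence) + len(unit) > max_symbols:
            # ASSUMPTION: stop at the first parse unit that does not fit.
            break
        sequence.extend(unit)
    return sequence
```

For example, `print(" ".join(generate(40, seed=1)))` prints at most 40 terminal symbols drawn from `a`, `b`, `c`, and `delta`, grouped into complete parse units.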
In the usual case, use the command line format:
pcfg-generate-seq -iRANDOM_SEED -nLENGTH -o OUTPUT_SEQ_FILE \
[ --separate-parse-units ] INPUT_PCFG_FILE
where INPUT_PCFG_FILE is the name of a file containing a PCFG (see PCFG Format); the filename ‘-’ means stdin.
The pcfg-generate-seq tool supports the following command line options:
-l INT
The maximum length of an output terminal symbol sequence, in characters. That length includes newline characters and delimiters between terminal symbols. No limit by default.
-n INT
The maximum length of an output terminal symbol sequence, in terminal symbols. No limit by default.
-N INT
The maximum length of an output terminal symbol sequence, in parse units. No limit by default.
-o OUTPUT_SEQ_FILE
Output a generated terminal symbol sequence to a specified file. By default, output the sequence to stdout.
-R, --margin-right=INT
If possible, limit the length of every line in an output terminal symbol sequence to a specified number of characters. The special value 0 means no right margin. The default value is 70.
-i RANDOM_SEED
A seed for the pseudo-random number generator. The default value is 0.
--separate-parse-units
If there is a right margin, separate generated parse units with empty lines. If there is no right margin, start every parse unit on a new line. The option -R, --margin-right=INT sets or removes the right margin. By default, do not separate generated parse units in a special way.
Separate terminal symbols using a specified string. By default, separate terminal symbols by spaces.
The option --truncate selects how a generated character sequence is truncated if its length exceeds the maximum length specified by the options -l INT, -n INT, and -N INT. The modes are:
Ensure that the generated character sequence ends with a complete parse unit.
Permit the truncation of the last parse unit in the generated character sequence but ensure that the sequence ends with a complete terminal symbol name.
Permit the truncation of the last terminal symbol name in the generated character sequence.
If the option argument is omitted, the tool uses --truncate=parse-unit. If the option itself is omitted, the tool likewise ensures that a generated character sequence ends with a complete parse unit.
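The difference between the three truncation modes can be illustrated with a short Python sketch for a character limit. It rests on assumptions: `truncate` is a hypothetical helper, only the mode name `parse-unit` comes from the text above, the names `symbol` and `character` are placeholders for the other two modes, and the sketch counts separators but ignores newline characters.

```python
def truncate(units, max_chars, mode, sep=" "):
    """Cut a sequence of parse units down to max_chars characters.

    units is a list of parse units, each a list of terminal symbol names.
    ASSUMPTION: "symbol" and "character" are placeholder mode names.
    """
    if mode == "parse-unit":
        # Keep whole parse units only.
        kept = []
        for unit in units:
            if len(sep.join(kept + unit)) > max_chars:
                break
            kept.extend(unit)
        return sep.join(kept)

    symbols = [name for unit in units for name in unit]
    if mode == "symbol":
        # The last parse unit may be cut, but never inside a symbol name.
        kept = []
        for name in symbols:
            if len(sep.join(kept + [name])) > max_chars:
                break
            kept.append(name)
        return sep.join(kept)

    if mode == "character":
        # Cut at the character limit, even inside a symbol name.
        return sep.join(symbols)[:max_chars]

    raise ValueError("unknown truncation mode: " + mode)


units = [["a"], ["c", "c", "c", "delta", "delta"], ["b", "b"]]
print(truncate(units, 12, "parse-unit"))  # a
print(truncate(units, 12, "symbol"))      # a c c c
print(truncate(units, 12, "character"))   # a c c c delt
```

With a limit of 12 characters, the first mode keeps only the complete parse unit `a`, the second also keeps the leading `c c c` of the next parse unit, and the third cuts `delta` after its fourth character.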