7.9 pcfg-generate-seq

This tool generates a pseudo-random terminal symbol sequence according to a specified PCFG. The tool expands the start nonterminal symbol of the PCFG to produce a parse unit. If the sequence length limit has not been reached, the tool expands the start nonterminal symbol again to produce the next parse unit.

Example

$ cat sample.pcfg

  S: "a"              [0.5]
   | "b" "b"          [0.33]
   | "c" "c" "c" D_1  [0.17]
  ;

  D_1: "delta"
     | "delta" D_1
  ;

$ pcfg-generate-seq -i1 -n40 -o sample.seq sample.pcfg
$ cat sample.seq

  a c c c delta delta delta a a c c c delta a a a a a a b b a c c c
  delta delta delta a b b b b a a c c c delta a
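
The generation loop described at the top of this section can be sketched in Python. This is only an illustration, not the tool's implementation: the names GRAMMAR, expand, and generate are hypothetical, the alternatives of D_1 (which carry no explicit probability in sample.pcfg) are assumed to be equally likely, and the sketch assumes that a parse unit that would exceed the limit is dropped. Python's random number generator differs from the tool's, so the same seed does not reproduce sample.seq.

```python
import random

# Grammar from sample.pcfg: nonterminal -> list of (right-hand side, probability).
# Assumption: the two alternatives of D_1 are equally likely.
GRAMMAR = {
    "S": [(["a"], 0.5), (["b", "b"], 0.33), (["c", "c", "c", "D_1"], 0.17)],
    "D_1": [(["delta"], 0.5), (["delta", "D_1"], 0.5)],
}


def expand(symbol, rng):
    """Expand one symbol into a list of terminal symbols."""
    if symbol not in GRAMMAR:
        # A symbol without rules is a terminal symbol: emit it as is.
        return [symbol]
    alternatives = GRAMMAR[symbol]
    right_sides = [rhs for rhs, _ in alternatives]
    weights = [prob for _, prob in alternatives]
    chosen = rng.choices(right_sides, weights=weights)[0]
    terms = []
    for child in chosen:
        terms.extend(expand(child, rng))
    return terms


def generate(seed, max_terms):
    """Concatenate parse units while the limit in terminal symbols allows."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < max_terms:
        unit = expand("S", rng)  # one parse unit
        if len(sequence) + len(unit) > max_terms:
            break  # assumed policy: drop a unit that does not fit
        sequence.extend(unit)
    return sequence


if __name__ == "__main__":
    print(" ".join(generate(seed=1, max_terms=40)))
```

Calling generate twice with the same seed yields the same sequence, which mirrors the role of the -i option.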

Synopsis

In the usual case, use the command line format:

pcfg-generate-seq -iRANDOM_SEED -nLENGTH -o OUTPUT_SEQ_FILE   \
                  [ --separate-parse-units ] INPUT_PCFG_FILE

where INPUT_PCFG_FILE is the name of a file containing a PCFG (see PCFG Format); the filename ‘-’ means stdin.

Command-Line Options

The pcfg-generate-seq tool supports the following command line options:

-l INT

The maximum length of an output terminal symbol sequence, in characters. That length includes newline characters and delimiters between terminal symbols. No limit by default.

-n INT

The maximum length of an output terminal symbol sequence, in terminal symbols. No limit by default.

-N INT

The maximum length of an output terminal symbol sequence, in parse units. No limit by default.

-o FILE

Output a generated terminal symbol sequence to a specified file. By default, output the sequence to stdout.

-R, --margin-right=INT

If possible, limit the length of every line in an output terminal symbol sequence to a specified number of characters. The special value 0 means no right margin. The default value is 70.

-i, --seed=INT

A seed for the pseudo-random number generator. The default value is 0.

--separate-parse-units

If there is a right margin, separate generated parse units with empty lines. If there is no right margin, start every parse unit on a new line. The option -R, --margin-right=INT sets or removes the right margin. By default, do not separate generated parse units in a special way.
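
The interaction between the right margin and --separate-parse-units can be sketched as follows. The function layout is a hypothetical helper, not part of the tool. It assumes greedy line filling, assumes that a line break replaces the separator at the break point, and places a terminal symbol that is longer than the margin alone on its own line, which is one reading of "if possible".

```python
def layout(units, margin=70, separate=False, sep=" "):
    """Lay out parse units (lists of terminal symbol names) as text."""

    def wrap(terms):
        # Greedy filling: start a new line when the next term would not fit.
        lines, current = [], ""
        for term in terms:
            if not current:
                current = term
            elif margin == 0 or len(current) + len(sep) + len(term) <= margin:
                current += sep + term
            else:
                lines.append(current)
                current = term
        if current:
            lines.append(current)
        return lines

    if not separate:
        # Default: parse units run together in one wrapped stream.
        return "\n".join(wrap([t for unit in units for t in unit]))
    if margin == 0:
        # No right margin: every parse unit starts on a new line.
        return "\n".join(sep.join(unit) for unit in units)
    # Right margin set: parse units are separated by empty lines.
    return "\n\n".join("\n".join(wrap(unit)) for unit in units)


units = [["a"], ["b", "b"], ["c", "c", "c", "delta"]]
print(layout(units, margin=0, separate=True))
print("---")
print(layout(units, margin=5, separate=True))
```

With margin=0 the three units appear on three consecutive lines. With margin=5 they are separated by empty lines, and the last unit wraps before "delta".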

--separator-term=STR

Separate terminal symbols using a specified string. By default, separate terminal symbols by spaces.

--truncate[=no|parse-unit|term]

The mode of truncation of a generated character sequence if its length exceeds the maximum length specified by the options -l INT, -n INT, and -N INT:

no

Ensure that the generated character sequence ends with a complete parse unit.

parse-unit

Permit the truncation of the last parse unit in the generated character sequence but ensure that the sequence ends with a complete terminal symbol name.

term

Permit the truncation of the last terminal symbol name in the generated character sequence.

If the option argument is omitted, the tool uses --truncate=parse-unit. If the option itself is omitted, the tool ensures that a generated character sequence ends with a complete parse unit.
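
The three truncation modes can be compared with a small sketch that applies a character limit, as -l does. The function truncate is a hypothetical helper, not the tool's code. It ignores line wrapping, and it assumes that in the "no" mode a parse unit that would exceed the limit is dropped entirely; the text above only states that the sequence ends with a complete parse unit.

```python
def truncate(units, max_chars, mode="no", sep=" "):
    """Join parse units and apply a character limit under a truncation mode."""
    out = ""
    for unit in units:
        unit_text = sep.join(unit)
        candidate = unit_text if not out else out + sep + unit_text
        if len(candidate) <= max_chars:
            out = candidate
            continue
        # The current parse unit would exceed the limit.
        if mode == "no":
            return out  # keep only complete parse units
        if mode == "term":
            return candidate[:max_chars]  # may cut a terminal symbol name
        # mode == "parse-unit": keep the leading terminal symbols that fit.
        for term in unit:
            candidate = term if not out else out + sep + term
            if len(candidate) > max_chars:
                break
            out = candidate
        return out
    return out


units = [["a"], ["c", "c", "c", "delta"]]
for mode in ("no", "parse-unit", "term"):
    print(mode, repr(truncate(units, 10, mode)))
```

With a limit of 10 characters, the "no" mode keeps only 'a', the "parse-unit" mode keeps 'a c c c', and the "term" mode keeps 'a c c c de'.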