
Predicted Token Sequence

A predicted token sequence is the terminal symbol sequence that the parser predicts while parsing a training terminal symbol sequence. For every symbol it processes from the training terminal symbol sequence, the parser predicts the next symbol in that sequence and appends the prediction to the predicted terminal symbol sequence. For the currently processed symbol from the training terminal symbol sequence, the parser predicts the next terminal symbol in the following way:

The parser can process multiple template regular expression grammars and combine them with an ensemble method to generate a predicted terminal symbol sequence. The output parameter prob_epredict, printed when the option --oo=LOG_FILE is passed in terminal symbol prediction mode, is the probability that a terminal symbol is predicted correctly using the ensemble method.

Use the following command-line format to predict a terminal symbol sequence:

$ topdown [--os=COMBINED_SEQ_FILE] [--predict] --oo=LOG_FILE    \

The arguments REGEX_GRAM_FILE_1, ..., REGEX_GRAM_FILE_n specify the template regular expression grammars to which the ensemble method is applied.

The following command-line options are applicable to predicting a terminal symbol sequence:


--os=COMBINED_SEQ_FILE
Write a combined sequence to COMBINED_SEQ_FILE. If COMBINED_SEQ_FILE is ‘-’, write the sequence to stdout. The combined sequence consists of the actual (training) terminal symbol sequence and the predicted terminal symbol sequence. See the description of the COMBINED_SEQ_FILE argument of the pcfg-predict-eval program in Invocation for the format of a combined sequence. This option turns on terminal symbol prediction mode.


--predict
Turn on terminal symbol prediction mode.


$ cat >predict_in.pcfg <<EOF
S: "a" "b" "c" "c" "b" "a" ;
EOF
$ pcfg-generate-seq -i1 -n100 -o predict_in.seq predict_in.pcfg
$ cat >predict_L5.rg <<EOF
S: . . . . . ;
EOF
$ cat >predict_L6.rg <<EOF
S: . . . . . . ;
EOF
$ cat >predict_L7.rg <<EOF
S: . . . . . . . ;
EOF
$ topdown -n2000 --os=predict_out.seq --oo=- predict_L[567].rg predict_in.seq
[0]: prob_gram 0.40871026, prob_term 0.16667084, prob_wpredict 0.76012888,
     prob_npredict 0.69000000, cycle_period 30
[1]: prob_gram 1.00000000, prob_term 1.00000000, prob_wpredict 0.99550000,
     prob_npredict 0.99550000, cycle_period 0
[2]: prob_gram 0.40890063, prob_term 0.16667590, prob_wpredict 0.81901080,
     prob_npredict 0.69750000, cycle_period 42
prob_epredict 0.90650000
$ head -n5 predict_out.seq
"a"!=? "b"!=? "c"!=? "c"!=? "b"!=? "a"!=? "a"!="b" "b"=="b" "c"=="c" "c"=="c"
"b"!="a" "a"=="a" "a"=="a" "b"=="b" "c"!="b" "c"!="a" "b"=="b" "a"=="a"
"a"=="a" "b"=="b" "c"!="a" "c"!="b" "b"!="c" "a"=="a" "a"=="a" "b"=="b"
"c"!="a" "c"=="c" "b"!="c" "a"!="c" "a"=="a" "b"!="a" "c"=="c" "c"=="c"
"b"=="b" "a"=="a" "a"!="b" "b"=="b" "c"=="c" "c"=="c" "b"!="a" "a"=="a"
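
The combined sequence shown above can be inspected with ordinary text tools. The following is a minimal sketch, not part of the toolkit: combined_match_rate is a hypothetical helper name, and the token format it accepts ("x"=="y", "x"!="y", "x"!=?, separated by whitespace, with no whitespace or double quotes inside a symbol) is inferred only from the sample output above. It prints the number of matched tokens, the total number of tokens, and their ratio. Whether this ratio corresponds to any of the prob_* values reported by topdown is not stated in this section and is not assumed here.

```shell
# Hypothetical helper (not provided by the toolkit): summarize a combined
# sequence file as "<matched> <total> <matched/total>".
# Assumption: tokens are whitespace-separated and look like
#   "x"=="y"   (prediction matched)
#   "x"!="y"   (prediction differed)
#   "x"!=?     (no prediction available)
# Anything else on a line is ignored.
combined_match_rate() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^"[^"]*"=="[^"]*"$/) {
                    hit++; total++          # matched prediction
                } else if ($i ~ /^"[^"]*"!=("[^"]*"|[?])$/) {
                    total++                 # mismatch or no prediction
                }
            }
        }
        END {
            if (total > 0) printf "%d %d %.4f\n", hit, total, hit / total
            else print "0 0 0.0000"
        }
    ' "$1"
}
```

For example, running combined_match_rate predict_out.seq after the session above would summarize the whole combined sequence. Tokens without a prediction ("x"!=?) are counted as misses in this sketch; filter them out first if they should be excluded.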

See Examples for how predicting training terminal symbol sequences can be used to evaluate the quality of learned regular expression grammars.
