9.3 pcfg-reach

This program is intended for debugging the removal of unreachable productions from a PCFG and the simplification of a PCFG.

To remove unreachable productions from a PCFG, use the following command line format:

qsmm-example-pcfg-reach [ --dump-prob ] PCFG_FILE [ RETAIN_NONT_1 ... RETAIN_NONT_n ]

where PCFG_FILE is the name of a file with the PCFG (see PCFG Format), and RETAIN_NONT_1, ..., RETAIN_NONT_n are nonterminal symbols to retain in the resulting PCFG even if they are unreachable. The option --dump-prob enables printing the probabilities of productions in the resulting PCFG.

Example

The file reachable.pcfg contains the following PCFG:

$ cat reachable.pcfg

  S: "s" A    [0.3]
   | A A      [0.7]
  ;

  A: "b"      [0.6]
   | "a" "b"  [0.4]
  ;

  B: "c"      [0.5]
   | "b" C    [0.5]
  ;

  C: "e"      [0.8]
   | "d" B    [0.2]
  ;

In this PCFG, the nonterminals B and C refer only to each other and are not reachable from S. This command prints the PCFG with unreachable productions removed:

$ qsmm-example-pcfg-reach --dump-prob reachable.pcfg

  S: "s" A  [0.3]
   | A A    [0.7]
  ;

  A: "b"      [0.6]
   | "a" "b"  [0.4]
  ;
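
The reachability check in the example above can be illustrated with a short Python sketch. It is not the program's implementation: the dictionary representation, the function name remove_unreachable, and the choice of S as the start symbol are assumptions made for illustration. The sketch also assumes that retained nonterminals act as additional starting points, so everything they refer to is kept as well; the actual program may treat them differently.

```python
from collections import deque

# Grammar from reachable.pcfg: nonterminal -> list of (right-hand side, probability).
# Quoted strings are terminals; bare names are nonterminals.
GRAMMAR = {
    "S": [(['"s"', "A"], 0.3), (["A", "A"], 0.7)],
    "A": [(['"b"'], 0.6), (['"a"', '"b"'], 0.4)],
    "B": [(['"c"'], 0.5), (['"b"', "C"], 0.5)],
    "C": [(['"e"'], 0.8), (['"d"', "B"], 0.2)],
}


def remove_unreachable(grammar, start, retain=()):
    """Keep only nonterminals reachable from start or from a retained nonterminal.

    Assumption: retained nonterminals are treated as extra roots of the search.
    """
    keep = set()
    queue = deque([start, *retain])
    while queue:
        nont = queue.popleft()
        if nont in keep or nont not in grammar:
            continue
        keep.add(nont)
        for rhs, _prob in grammar[nont]:
            # Follow every nonterminal on the right-hand side.
            queue.extend(sym for sym in rhs if sym in grammar)
    # Productions of kept nonterminals are copied unchanged, probabilities included.
    return {nont: prods for nont, prods in grammar.items() if nont in keep}


print(sorted(remove_unreachable(GRAMMAR, "S")))                # ['A', 'S']
print(sorted(remove_unreachable(GRAMMAR, "S", retain=["B"])))  # ['A', 'B', 'C', 'S']
```

With no retained nonterminals, only S and A survive, which matches the output shown above. Under the stated assumption, retaining B also keeps C, because a production of B refers to C.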

To remove unreachable productions from a PCFG and simplify it, use the following command line format:

qsmm-example-pcfg-reach --simplify PCFG_FILE [ RETAIN_NONT_1 ... RETAIN_NONT_n ]

where PCFG_FILE is the name of a file with the PCFG, and RETAIN_NONT_1, ..., RETAIN_NONT_n are nonterminal symbols to retain in the resulting PCFG even if simplification would otherwise remove them.

Example

The file simplify.pcfg contains the following PCFG:

$ cat simplify.pcfg

  S: "a"
   | B
   | C "s" C "s" C
   | F F
  ;

  B: "b"
   | "b" "b"
  ;

  C: "c" ;
  D: ;
  E: "e" "e" "e" ;

  F: D "d" D
   | "d"
   | "f" "f" E "f" "f"
  ;

This command prints a simplified PCFG with unreachable productions removed:

$ qsmm-example-pcfg-reach --simplify simplify.pcfg

  S: "a"
   | "b"
   | "b" "b"
   | "c" "s" "c" "s" "c"
   | F F
  ;

  F: "d"
   | "f" "f" "e" "e" "e" "f" "f"
  ;
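
This section does not spell out the simplification rules the program applies. The Python sketch below uses one set of rules that reproduces the example output above; the rules, the function name simplify, and the grammar representation are assumptions for illustration, not a description of the program's implementation. The assumed rules are: a nonterminal whose single production contains only terminals (or is empty) is substituted wherever it occurs; a production consisting of one nonterminal whose productions contain only terminals is replaced by those productions; duplicate productions are merged; unreachable nonterminals are then dropped. The start symbol (assumed to be S) and retained nonterminals are never substituted.

```python
# Grammar from simplify.pcfg: nonterminal -> productions written as strings.
SOURCE = {
    "S": ['"a"', "B", 'C "s" C "s" C', "F F"],
    "B": ['"b"', '"b" "b"'],
    "C": ['"c"'],
    "D": [""],
    "E": ['"e" "e" "e"'],
    "F": ['D "d" D', '"d"', '"f" "f" E "f" "f"'],
}
GRAMMAR = {n: [p.split() for p in prods] for n, prods in SOURCE.items()}


def simplify(grammar, start, retain=()):
    """Apply the assumed simplification rules, then drop unreachable nonterminals."""
    protected = {start, *retain}
    g = {n: [list(rhs) for rhs in prods] for n, prods in grammar.items()}
    while True:
        # Unprotected nonterminals whose productions contain terminals only.
        flat = {n for n, prods in g.items()
                if n not in protected
                and all(s not in g for rhs in prods for s in rhs)}
        new_g = {}
        for n, prods in g.items():
            new_prods = []
            for rhs in prods:
                if len(rhs) == 1 and rhs[0] in flat:
                    # Production "X: Y" becomes the productions of Y.
                    candidates = [list(p) for p in g[rhs[0]]]
                else:
                    # Substitute single-production terminal-only nonterminals.
                    inlined = []
                    for s in rhs:
                        if s in flat and len(g[s]) == 1:
                            inlined.extend(g[s][0])
                        else:
                            inlined.append(s)
                    candidates = [inlined]
                for cand in candidates:
                    if cand not in new_prods:  # merge duplicates
                        new_prods.append(cand)
            new_g[n] = new_prods
        if new_g == g:
            break
        g = new_g
    # Keep only nonterminals reachable from the start symbol or a retained one.
    keep, stack = set(), [start, *retain]
    while stack:
        n = stack.pop()
        if n in keep or n not in g:
            continue
        keep.add(n)
        stack.extend(s for rhs in g[n] for s in rhs if s in g)
    return {n: prods for n, prods in g.items() if n in keep}


for nont, prods in simplify(GRAMMAR, "S").items():
    print(nont + ": " + " | ".join(" ".join(rhs) for rhs in prods) + " ;")
```

Because substitution only ever uses productions that contain no nonterminals, every pass that changes the grammar lowers the number of nonterminal occurrences or the number of productions, so the loop terminates. Calling simplify(GRAMMAR, "S", retain=["C"]) under these assumptions leaves the production C "s" C "s" C of S intact and keeps C in the result.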