The APLR(1) Algorithm for Generating Compact LR(1) Parsers is Simpler and More Capable than IELR(1)

20 points by je

edwintorok

the Basis library ubiquitously uses OCaml’s boxed 64-bit integers, which puts an epic load on OCaml’s automatic memory management subsystem

Would be interesting to see how the performance would look like without using the boxed 64 bit integers.

A comparison with Menhir would also be interesting https://cambium.inria.fr/~fpottier/menhir/

je

Agreed, performance results using 63-bit integers would be interesting, and it was a really tempting option. Unfortunately that would require a big refactor that moves Hemlock bootstrapping in the wrong direction, and Hocc is already plenty fast enough to generate the Hemlock parser now that it implements APLR(1). This technical report is a compromise between sharing the algorithm in a useful format and getting back to Hemlock before I lose the plot.

Menhir and Hocc share only {LALR(1), PGM LR(1), canonical LR(1)} modes, so performance comparisons would be of limited use in the context of APLR(1). The one rough comparison I have on hand is for the OCaml grammar in canonical LR(1) mode, for which Menhir is ~25X faster than Hocc. But extrapolation from that comparison is of limited use; similar comparisons with Bison show Hocc much more favorably (only ~2.5X slower) because Bison's canonical LR(1) mode is not nearly as optimized as its LALR(1) and IELR(1) modes.