Introducing PetitParser2

I am happy to announce the availability of PetitParser2, a redesign of the original PetitParser developed by Jan Kurš. The version is already present in the latest Moose 6.1 development image.

The main goals of the redesign is performance and flexibility. The speedup ranges between 2-5x. This is achieved through the idea of a kind of a parser compiler that performs a static analysis of the parse tree to identify patterns that can be optimized and then changes the actual parser execution.

The original implementation of the compiler was actually implemented in PetitParser. However, the problem was that the resulting parser was not easy to understand. This was the main reason that prompted the redesign of PetitParser. In PetitParser2 the concept of a Parser is split in two:

  • Nodes: These essentially provides only the AST of the parser, but without actual logic inside. They mirror the previous PPParser classes.
  • Visitors. These provide the actual parsing behavior. Optimizations are implemented in this hierarchy.

For a typical external user, PetitParser2 has the same structure as PetitParser. The main difference is that asParser is changed to asPParser. For example, the following original snippet:

#letter asParser, #any asParser star

Becomes:

#letter asPParser, #any asPParser star

In this way, the two versions of PetitParser can co-exist in the same image to ease the transition. This transition is needed for parsers that rely on deeper constructs from the original PetitParser, such as concrete names of classes.

Another new feature is the availability of streams. The original PetitParser required the whole string to be present in memory to allow for indefinite backtracking. PetitParser2 offers stream utilities that apply parsers on a stream by keeping a buffer in memory for backtracking purposes.

Of course, the new version comes with all development tools that we got used with PetitParser. The nice thing is that the tools work even when we are using the optimized version of a parser.

For example, the picture below shows the PetitParser browser opened on the Smalltalk parser.

Pp2-default.png

The one addition is a larger run button which performs the optimized parsing. Triggering it for our case renders the result as shown below. The only difference is that in the above version, there were 907 individual parsers involved, while the optimized version only used 258 of them.

Pp2-optimized.png

To get a better feeling of how the magic happens, let’s look at a small sequence parser.

#any asPParser star

Inspecting the parser allows us to play with the sampler.

Pp2-simple-default.png

The unoptimized version of the parser triggers independent parsers for each of the letters, and then concatenates the results in another collection. However, triggering the optimized version only involves one parser that knows how to consume the entire sequence in one collection.

Pp2-optimized-default.png

All in all, PetitParser2 is an exciting development. Take a look at let us know what you think.

Posted by Tudor Girba at 8 November 2016, 5:07 pm with tags moose, tooling link
|