Posted on Jun 30, 2017
We are working hard developing the version 1.0.0 release of Rosie Pattern Language.
Our development branch is roughly 2x faster than the already-fast v0.99k. We
optimized the JSON output generation, which was the low-hanging fruit. There
are more opportunities for speedups: for one, the current Rosie CLI is
single-threaded. (Users of librosie can launch multiple OS or green threads.)
In the graph below, Rosie and Grok are parsing log file entries. The X axis shows how many log entries were read, and the Y axis shows the time taken to parse all of them. So, lower times are better.
Both Grok and Rosie were configured to generate JSON output, and both used
pattern definitions that are included with their respective distributions. Grok
is fastest when running in jruby, at least for large inputs, and that
configuration is shown as jgrok
in the graph.
For a log file with around 550k entries, Grok needed almost 30 seconds – a task that the development version of Rosie performed in around 6 seconds.
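To put those two data points in terms of throughput, here is a small back-of-the-envelope calculation. The inputs are the rounded figures quoted above ("around 550k", "almost 30 seconds", "around 6 seconds"), so the results are approximations, not measured values.

```python
# Rounded figures taken from the text above; these are approximations,
# not exact measurements.
entries = 550_000
grok_seconds = 30.0   # "almost 30 seconds"
rosie_seconds = 6.0   # "around 6 seconds"

# Throughput in log entries parsed per second.
grok_rate = entries / grok_seconds
rosie_rate = entries / rosie_seconds

# How many times faster the Rosie development branch was on this input.
speedup = grok_seconds / rosie_seconds

print(f"Grok:  ~{grok_rate:,.0f} entries/sec")
print(f"Rosie: ~{rosie_rate:,.0f} entries/sec")
print(f"Speedup: ~{speedup:.0f}x")
```

With these rounded inputs, that works out to roughly a 5x difference at this input size.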
Note: Grok, whether running in native Ruby or jruby, threw an exception when it encountered an invalid UTF-8 byte sequence in the middle of my 2-million-entry test file. The test file is a real log file from a running cloud application, and clearly it is possible for strange (non-)characters to appear in real logs. I could have removed the offending byte sequence, but (1) the log file is an intact file that has not been edited in any way; (2) it was no problem for Rosie, which understands UTF-8 but is not hostage to it; and (3) the other data points are sufficient to show the performance for this relatively unscientific comparison.
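As a generic illustration of this failure mode, here is a minimal Python sketch showing how a strict UTF-8 decode raises on a stray byte while a tolerant decode keeps going. It does not show how Rosie or Grok handle input internally; the sample log line and the function names `decode_strict` and `decode_tolerant` are made up for this example.

```python
# A made-up log line containing valid UTF-8 ("café") plus a stray 0xFF byte,
# which can never appear anywhere in well-formed UTF-8.
raw_line = b"2017-06-30 12:00:01 user=caf\xc3\xa9 status=\xff200\n"


def decode_strict(data: bytes) -> str:
    """Decode as UTF-8, raising UnicodeDecodeError on any invalid byte."""
    return data.decode("utf-8")


def decode_tolerant(data: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for any invalid byte."""
    return data.decode("utf-8", errors="replace")


# The strict decoder aborts on the bad byte, much like a parser that
# insists on fully valid input.
try:
    decode_strict(raw_line)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# The tolerant decoder still understands the valid multi-byte character
# and carries on past the invalid one.
tolerant_text = decode_tolerant(raw_line)

print("strict decode failed:", strict_failed)
print("tolerant decode:", tolerant_text.rstrip())
```

The tolerant approach still interprets valid multi-byte characters correctly and simply marks the bytes it cannot interpret, so one bad sequence does not stop processing of the rest of the file.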
Follow us on Twitter for announcements. We expect v1.0.0 to be released later this summer.