Version 1.0 is out on GitLab
Rosie v1.0 is out on gitlab.com! Please visit us there, open issues to report bugs or request information, and leave a star. 😄
Oh, and in case you are not familiar with GitLab, it works with the git command line tools just the same as GitHub does. Use the same commands, like git clone, git pull, and others, to get Rosie. One difference is that you find releases by looking at the list of tags for a GitLab project.
It took many months to go from alpha to beta to the v1.0 release because we wanted to stabilize the RPL language as much as possible. We released v1.0 expecting that we could now enhance the implementation and add new patterns to the library without changing the RPL language specification. That means you can write individual patterns and libraries of them, knowing they will continue to work well into the future, in new versions of Rosie.
And we are already working on enhancements to the Rosie implementation!
Here’s an example. Today, Rosie compiles all patterns on the fly, as needed. This includes patterns loaded from libraries like net.rpl. This takes time and memory.
For data mining, a long-running process means that Rosie's pattern compilation time is insignificant, because compilation happens once, when the process starts. But memory use can be high, because the Rosie compiler retains lots of debugging information in memory.
Planned enhancement: The production compiler should not retain information needed for debugging Rosie itself.
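To make that trade-off concrete, here is a minimal Python sketch. It is not Rosie's code: Python's re module stands in for the RPL compiler, and CompiledPattern, compile_pattern, and count_matches are hypothetical names. The pattern is compiled once, before any input is read, and debugging metadata is kept only when debug=True, which is the spirit of the planned production compiler.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CompiledPattern:
    """Hypothetical stand-in for a compiled RPL pattern."""
    name: str
    matcher: re.Pattern
    # Extra data that is only useful when debugging the compiler itself.
    debug_info: Optional[dict] = None


def compile_pattern(name: str, source: str, debug: bool = False) -> CompiledPattern:
    matcher = re.compile(source)
    info = None
    if debug:
        # A debugging build retains the source and compiler details;
        # a production build drops them to save memory.
        info = {"source": source, "groups": matcher.groups, "flags": matcher.flags}
    return CompiledPattern(name, matcher, info)


def count_matches(pattern: CompiledPattern, lines: Iterable[str]) -> int:
    # Compilation already happened, once; each line only pays for matching.
    return sum(1 for line in lines if pattern.matcher.search(line))
```

For example, compile_pattern("digits", r"\d+") returns an object whose debug_info is None, while the same call with debug=True keeps the pattern source around.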
We recently did an experiment in which we factored the matching code apart from the Rosie compiler, CLI, REPL, and other parts. Jamie wrote new C code to replace functionality that was implemented in Lua. The result is a small library, rpeg.a, that can read a compiled RPL pattern from a file and apply it to each line of an input file.
That library is less than 50KB in size on OS X, and includes code to generate JSON output, debugging output, and “line output” (like Unix grep).
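The "line output" and JSON output styles differ in what they report for each match: the first echoes the whole line, the second describes the match itself. The Python sketch below is illustrative only. render is a hypothetical function, a regular expression match stands in for an RPL match, and the JSON field names are our own choice, not a description of the actual output format of rpeg.a.

```python
import json
import re


def render(line: str, m: re.Match, mode: str, type_name: str = "match") -> str:
    if mode == "line":
        # Line output: like Unix grep, print the whole matching line.
        return line
    if mode == "json":
        # JSON output: report what matched and where (hypothetical field names).
        return json.dumps({"type": type_name, "start": m.start(),
                           "end": m.end(), "data": m.group(0)})
    raise ValueError(f"unknown output mode: {mode}")
```

With m = re.search(r"\d+", "port 8080 open"), line mode returns the input line unchanged, and JSON mode describes the substring "8080" at offsets 5 to 9.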
A small demonstration program is statically linked with rpeg.a. When configured to simply print matching lines, like Unix grep does, the resulting program, match, is only 32KB. While it's only a demonstration, it works like Unix grep, but with a pattern loaded from a file:
$ cat resolv.conf
#
# This is an example file, hand-generated for testing rosie.
# Last update: Wed Jun 28 16:58:22 EDT 2017
#
domain abc.aus.example.com
search ibm.com mylocaldomain.myisp.net example.com
nameserver 192.9.201.1
nameserver 192.9.201.2
nameserver fde9:4789:96dd:03bd::1
$ ./match data/net.ipv4.rplx resolv.conf
nameserver 192.9.201.1
nameserver 192.9.201.2
$
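For readers who want to see the grep-like behavior spelled out, here is a rough Python approximation. It is not match.c: a simple dotted-quad regular expression stands in for the compiled findall:net.ipv4 pattern (it ignores IPv6 and other cases the real pattern may handle), and matching_lines is a hypothetical helper. Like the run above, it keeps each input line that contains at least one match anywhere in it.

```python
import re
from typing import Iterable, Iterator

# Stand-in for findall:net.ipv4: a dotted-quad IPv4 address anywhere in the line.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4 = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")


def matching_lines(pattern: re.Pattern, lines: Iterable[str]) -> Iterator[str]:
    """Yield each line that contains at least one match, like grep."""
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


RESOLV_CONF = """\
#
# This is an example file, hand-generated for testing rosie.
# Last update: Wed Jun 28 16:58:22 EDT 2017
#
domain abc.aus.example.com
search ibm.com mylocaldomain.myisp.net example.com
nameserver 192.9.201.1
nameserver 192.9.201.2
nameserver fde9:4789:96dd:03bd::1
"""

if __name__ == "__main__":
    for line in matching_lines(IPV4, RESOLV_CONF.splitlines()):
        print(line)
```

Run on the same resolv.conf contents, it prints the same two nameserver lines; the IPv6 nameserver is skipped because the stand-in pattern only knows dotted quads.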
What does this mean?
It bodes well for enabling separate compilation in the future. To make the demonstration program work, we used a debugging capability in Rosie to save a compiled pattern to disk. (Please ask if you want instructions for compiling Rosie with this capability and how to use it!) In the example above, the file net.ipv4.rplx contains a compiled version of the Rosie pattern findall:net.ipv4.
This experiment shows that the matching runtime can be quite small. Whereas Rosie is perhaps 400KB on disk and can consume 20-30MB or more of memory today, the match.c demonstration compiles to 32KB on disk and has a peak RSS (resident set size) of around 1.5MB.
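Peak RSS is the most physical memory a process occupied at any point while it ran. On Linux and macOS one way to read it is getrusage(2); the Python sketch below uses the standard resource module to do so. This is one way to take such a measurement, not necessarily how the figures above were obtained, and peak_rss_mb is a hypothetical helper name. Note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, so the helper normalizes both to megabytes.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident set size of this process so far, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS and in kilobytes on Linux.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


if __name__ == "__main__":
    before = peak_rss_mb()
    data = b"\x01" * (50 * 1024 * 1024)  # allocate and fill ~50MB
    after = peak_rss_mb()
    print(f"peak RSS: {before:.1f}MB -> {after:.1f}MB")
```

Because the value is a high-water mark, it never decreases during a run, which makes it a convenient single number for comparing programs like match and the full Rosie CLI.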
Planned enhancement: Rosie should save compiled patterns automatically, and recompile them only when needed.
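A common way to "recompile only when needed" is to key each saved compilation on a hash of its input, so that any edit to a pattern file produces a cache miss. The Python sketch below assumes a hypothetical compile_fn that turns RPL source text into bytes; load_or_compile and the cache file naming are invented for this sketch (the .rplx suffix only mirrors the file in the example above) and are not Rosie's design.

```python
import hashlib
from pathlib import Path
from typing import Callable


def load_or_compile(src: Path, cache_dir: Path,
                    compile_fn: Callable[[str], bytes],
                    compiler_version: str = "1.0") -> bytes:
    """Return compiled bytes for src, calling compile_fn only on a cache miss."""
    source = src.read_text()
    # Key on compiler version + source text: editing the pattern, or upgrading
    # the compiler, changes the key, so stale output is never reused.
    key = hashlib.sha256((compiler_version + "\0" + source).encode()).hexdigest()
    cached = cache_dir / f"{src.stem}-{key[:16]}.rplx"
    if cached.exists():
        return cached.read_bytes()
    compiled = compile_fn(source)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(compiled)
    return compiled
```

Including the compiler version in the key is a deliberate choice: a new release might change the compiled format, and hashing only the source would then reuse incompatible files. Hashing contents rather than comparing modification times also avoids false hits when files are copied or restored with old timestamps.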
But wait, there’s more!
There are still more opportunities for optimization. In particular, the compiled RPL patterns are larger than they need to be. We will have more to say on this topic in future posts.
Contribute patterns, code, or questions
We are always looking for contributors to the Rosie project, whether to implement enhancements, write patterns to be shared with other users, or author blog posts showing Rosie’s usefulness.
Feedback
Edit to contact information, August 15, 2023.
We welcome feedback and contributions. Please open issues (or merge requests) on GitLab, or get in touch by email.
You can find my contact information, including Mastodon and LinkedIn coordinates, on my personal blog. The mailing list https://groups.io/g/rosiepattern has fallen out of use since we mostly use Slack, but perhaps it will be revived.