Rosie Pattern Language

Modern text pattern matching to replace regex

Version 1.0 is out on GitLab

Last updated on 18 Jul 2018

Rosie v1.0 is out on GitLab! Please visit us there, open issues to report bugs or request information, and leave a star. 😄

Oh, and in case you are not familiar with GitLab, it works with the git command line tools just the same as GitHub does. Use the same commands, such as git clone and git pull, to get Rosie. One difference is that you find releases by looking at the list of tags for a GitLab project.
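As a minimal sketch of finding releases through tags: the helper below lists the tags of an already-cloned repository, highest version first. The name list_release_tags is hypothetical and not part of Rosie. The sketch assumes git 2.0 or later (for the --sort option) and that the release tags sort sensibly as version numbers.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Rosie: list a cloned repository's tags,
# highest version first. On GitLab, these tags are where releases are found.
list_release_tags() {
  local repo_dir="$1"
  git -C "$repo_dir" tag --list --sort=-version:refname
}

# Typical use, with <clone-url> and <tag> left for you to fill in:
#   git clone <clone-url> rosie
#   list_release_tags rosie
#   git -C rosie checkout <tag>
```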

It took many months to go from alpha to beta to the v1.0 release because we wanted to stabilize the RPL language as much as possible. We released v1.0 expecting that we can now enhance the implementation and add new patterns to the library without changing the RPL language specification. That means you can write individual patterns and libraries of them, knowing they will continue to work well into the future, in new versions of Rosie.

And we are already working on enhancements to the Rosie implementation!

Here’s an example. Today, Rosie compiles all patterns on the fly, as needed. This includes patterns loaded from libraries like net.rpl. This takes time and memory.

For data mining, the process is typically long-running, so the Rosie pattern compilation time is insignificant: compilation is done once, when the process starts. But the memory use can be high, because the Rosie compiler retains lots of debugging information in memory.

Planned enhancement: The production compiler should not retain information needed for debugging Rosie itself.

We recently did an experiment, in which we factored the matching code apart from the Rosie compiler, CLI, REPL, and other parts. Jamie wrote new C code to replace functionality that was implemented in Lua. The result is a small library, rpeg.a, that can read a compiled RPL pattern from a file, and apply it to each line of an input file.

That library is less than 50KB in size on OS X, and includes code to generate JSON output, debugging output, and “line output” (like Unix grep).

A small demonstration program is statically linked with rpeg.a. When configured to simply print matching lines, like Unix grep does, the resulting program, match, is only 32KB. While it’s only a demonstration, it works like Unix grep, but with a pattern loaded from a file:

$ cat resolv.conf 
# This is an example file, hand-generated for testing rosie.
# Last update: Wed Jun 28 16:58:22 EDT 2017
nameserver fde9:4789:96dd:03bd::1

$ ./match data/net.ipv4.rplx resolv.conf 

What does this mean?

It bodes well for enabling separate compilation in the future. To make the demonstration program work, we accessed a debugging capability in Rosie to save a compiled pattern to disk. (Please ask if you want instructions for compiling Rosie with this capability and how to use it!) In the example above, the file net.ipv4.rplx contains a compiled version of the Rosie pattern findall:net.ipv4.

This experiment shows that the matching runtime can be quite small. Where Rosie is perhaps 400KB on disk and can consume 20-30MB or more of memory today, the match.c demonstration compiles to 32KB on disk and consumes around 1.5MB of memory (RSS peak).

Planned enhancement: Rosie should save compiled patterns automatically, and recompile them only when needed.
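One simple way to decide when recompilation is needed is to compare file modification times, as make does. The sketch below is an illustration only, not how Rosie does or will do it. The function name needs_recompile is hypothetical, and the file names in the demonstration are stand-ins created in a temporary directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Rosie: report whether a compiled pattern
# file is missing or older than the RPL source it was built from.
needs_recompile() {
  local src="$1" compiled="$2"
  # Recompile when no compiled file exists yet...
  [ ! -e "$compiled" ] && return 0
  # ...or when the source was modified more recently than the compiled file.
  [ "$src" -nt "$compiled" ] && return 0
  return 1
}

# Demonstration with stand-in files and explicit timestamps.
tmp=$(mktemp -d)
touch -t 202001010000 "$tmp/net.rpl"        # source, older
touch -t 202001020000 "$tmp/net.ipv4.rplx"  # compiled, newer
needs_recompile "$tmp/net.rpl" "$tmp/net.ipv4.rplx" && echo stale || echo fresh

touch -t 202001030000 "$tmp/net.rpl"        # source edited afterwards
needs_recompile "$tmp/net.rpl" "$tmp/net.ipv4.rplx" && echo stale || echo fresh
rm -rf "$tmp"
```

The first check prints fresh and the second prints stale. A timestamp check like this only looks at one source file. A real implementation would probably also need to account for imported libraries and for changes in the compiler version.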

But wait, there’s more!

There are still more opportunities for optimization. In particular, the compiled RPL patterns are larger than they need to be. We will have more to say on this topic in future posts.

Contribute patterns, code, or questions

We are always looking for contributors to the Rosie project, whether to implement enhancements, write patterns to be shared with other users, or author blog posts showing Rosie’s usefulness.


Edit to contact information, August 15, 2023.

We welcome feedback and contributions. Please open issues (or merge requests) on GitLab, or get in touch by email.

You can find my contact information, including Mastodon and LinkedIn coordinates, on my personal blog. The mailing list has fallen out of use since we mostly use Slack, but perhaps it will be revived.