It’s easy to make a mistake when entering a regular expression on the command line. And, sometimes, we make a hard-to-spot error in a regular expression that is part of a program. Usually, those errors are not caught at compile time — but of course we want to catch as many errors as we can at compile time.
When a regex appears as a string in a Java program, as it might in other languages, the Java compiler doesn’t understand that the string is a regex, and can’t check for mistakes that will throw run-time exceptions later.
One analysis found 56 regex syntax errors in an examination of five open source
Java projects. These projects would compile and run, but eventually, when
the offending regex was used, it would throw a
PatternSyntaxException or an
The researchers who did the analysis proposed a solution for Java using annotations, but generalizing this approach is fraught with difficulty. First, it’s easy to find and analyze string constants, but much harder to analyze strings that have been constructed via string operations, like concatenation and substitution. Second, an approach that works for Java may not port easily to other languages, especially the popular dynamic languages.
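The same problem is easy to reproduce in a dynamic language. The sketch below uses Python's re module in place of Java's regex library. All of the names in it (BAD_DATE, find_dates, build_pattern, compiles) are invented for illustration. It shows a malformed constant that goes unnoticed until the function using it is called, and a concatenated pattern whose validity depends on a value known only at run time.

```python
import re

# A malformed pattern is just a string until the regex engine sees it.
BAD_DATE = r"(\d{4}-\d{2}-\d{2}"   # missing closing parenthesis

def find_dates(text):
    # The syntax error surfaces only here, when the function is called.
    return re.findall(BAD_DATE, text)

def build_pattern(keyword):
    # Built by concatenation: whether the result is a valid regex depends
    # on the run-time value of `keyword`, which a static check cannot see.
    return r"\b" + keyword + r"\b"

def compiles(pattern):
    """Return True if `pattern` is a syntactically valid regex."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False
```

Importing this module succeeds without complaint. Only calling find_dates, or compiling build_pattern("(draft"), raises re.error.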
Rosie Pattern Language (RPL) provides a way to avoid run-time errors by
pre-compiling and automatically testing patterns, thus finding more errors at
compile time. Here’s an example of the
rosie test command:
Pre-compiling RPL patterns
As shown in the image above, RPL definitions can be (optionally) pre-compiled to find errors before deployment. Note that mistakes related to numbered captures are not possible in RPL, which names all captures. Syntax and other errors, such as missing dependencies, are caught by the RPL compiler. (A linter is on the drawing board to detect subtle errors, and possibly also to evaluate tests for “pattern coverage”.)
The image above shows the
rosie test command in action, compiling a directory
of RPL files (and running embedded tests — more on that in a minute).
You can, of course, write RPL expressions as string literals directly into large programs or small scripts. But for production use, you can move those expressions into their own file. Rosie can then pre-compile your patterns as part of your build, when other code is compiled, catching errors at compile time, not later at run time.
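To make the build-step idea concrete, here is a rough analogy that uses ordinary regexes and Python's re module, not RPL or the Rosie compiler. The `name = regex` file format and the check_patterns function are made up for this example. The double-dash comment marker is borrowed from RPL only for familiarity.

```python
import re

def check_patterns(source):
    """Compile every `name = regex` line in `source`.

    Returns a list of (line_number, name, message) tuples, one per problem.
    """
    errors = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("--"):
            continue  # skip blank lines and comments
        name, sep, regex = line.partition("=")
        if not sep:
            errors.append((line_no, line, "expected `name = regex`"))
            continue
        try:
            re.compile(regex.strip())
        except re.error as exc:
            errors.append((line_no, name.strip(), str(exc)))
    return errors
```

A build script could read the pattern file, call check_patterns on its contents, and exit with a non-zero status if the returned list is not empty, so a broken pattern stops the build instead of failing in production.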
In the image above, an error was deliberately injected into the file
for illustration. The error is in the statement
foo = bar on line 64, which
fails to compile because no pattern named
bar is defined.
Catching errors early embodies the “shift left” concept, as popularized by the recent DevOps trend. And, Rosie goes beyond pre-compilation to include a lightweight test capability as well.
Rosie’s built-in lightweight test facility
RPL files can contain comments (which start with a double dash, --), and
comments are just as valuable for patterns as they are for code. Comments can
explain the intention behind the pattern and its intended use. Specially formatted
comments in RPL are read by the
rosie test command and executed as pattern tests.
Such comments begin with the word
test, followed by the name of the pattern.
The next word indicates the type of test. There are currently three types:
accepts, rejects, and contains. The first two should be obvious, and they
are applied in turn to each quoted input string that follows. (The contains
test requires an additional argument, which is the name of a sub-pattern, and is
used to ensure that the expected sub-pattern appears in a match.)
In the package
date (in the file
date.rpl), the pattern called
us matches the way that dates are commonly written in the United States. The comment below
triggers three tests when
rosie test is run:
-- test us accepts "April 1, 1900", "Jan 23 2017", "Apr 8"
Tests like this serve two purposes. They contribute to the documentation of the pattern by giving explicit examples of input that matches (is accepted) and input that should not match (is rejected). And because the tests are executable, they function as regression tests, ensuring that the patterns continue to work as intended despite changes that might be made to the RPL code in that file, or to any dependency.
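To show how little machinery an executable comment needs, here is a minimal sketch of reading such a line. It is not Rosie's implementation. The function name parse_test_comment is hypothetical, the sketch handles only the accepts form shown above plus an assumed rejects keyword, and it ignores the contains form with its extra argument.

```python
import re

# "-- test <pattern-name> <accepts|rejects> <quoted inputs...>"
TEST_LINE = re.compile(r'^\s*--\s*test\s+(\S+)\s+(accepts|rejects)\s+(.*)$')
# A double-quoted string, allowing backslash escapes inside it.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

def parse_test_comment(line):
    """Return (pattern_name, test_type, inputs), or None for other lines."""
    m = TEST_LINE.match(line)
    if m is None:
        return None
    name, kind, rest = m.groups()
    return name, kind, QUOTED.findall(rest)
```

A test runner would then try each input string against the named pattern and report any accepts input that fails to match, or any rejects input that matches.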
The figure below shows the part of the file
date.rpl where patterns such as date.us are defined. The comments following each pattern are executed by the
rosie test command.
Rosie was created for scalable pattern matching
Scalability goals for Rosie include big data, large (complex) patterns, and many developers. Pre-compilation and built-in executable tests are steps towards those goals. They are steps towards bringing the tools and techniques we use in general programming to the patterns we write to process raw data.
Pre-compilation and built-in testing are key parts of #modernpatternmatching.
Spishak, Dietl, and Ernst, “A type system for regular expressions”. In Proceedings of the 14th Workshop on Formal Techniques for Java-like Programs, ECOOP 2012.
The rosie test command is implemented in a working prototype in a branch of
the Rosie Pattern Language repository on GitLab. This
prototype is evolving by steps to become release v1.0.0.
Follow us on Twitter for announcements. We expect v1.0.0 to be released late this summer.