Regular expressions are hard to debug when they fail to match what you think they should match (and vice versa). That’s why there are so many websites offering regex debugging tools. Rosie expressions can likewise be hard to debug at times, and I think for the same reason: Pattern matchers (parsers, generally) are algorithms with a very large number of states, essentially all of which influence the next step to be taken. There are many ways that a human being’s mental model of the algorithm’s state can be wrong.

Plus, debugging any program (not just pattern matchers) can be hard!

Rosie provides a number of features that aid debugging, including:

  • Symbols: Patterns have names, and names enable symbolic debugging
  • Composition: Patterns are composed from other patterns, enabling divide and conquer, where you separately debug the pieces that make up your pattern
  • Built-in tests: You can optionally declare tests in your comments, and Rosie will execute them, helping to find errors early (and to understand the author’s intention of what was supposed to match in the first place)
  • Read-eval-print loop: Rosie’s repl lets you build up patterns interactively, and test against real data
  • Trace command: It’s easy to peek inside the Rosie pattern engine to see exactly how your pattern is being applied to your input

The read-eval-print loop of Rosie version 1.0 will be the subject of a future blog post. (You can read about the existing repl in, e.g. this blog post).

In this post, we will take a very quick look at the trace command, which works just like Rosie’s match command, except it outputs a trace of the matching process in the form of a tree (reflecting how patterns are composed from other patterns).

Here is an example in which Rosie’s date.any pattern fails to match “12 Agosto 2017”, and how we might discover that “Agosto” is not one of the month names defined in the date package. (The rest of this post will explore this transcript in detail.)

The screen capture shows the unix date command being piped into rosie, where the rosie command is to match 'date.any'.  It matches.  Next, the command 'date -R' is piped into rosie, and it also matches.  Then, the string '12 Agosto 2017' is echoed into rosie, and it fails to match the pattern 'date.any'. To find out why the match failed, the same command is executed again, with the rosie command 'trace' instead of 'match'.  The output is a tree that shows each component of 'date.any' and how each alternative matched (partially), so the user can see exactly what part of the pattern failed, and where it failed on the input string '12 Agosto 2017'.  It fails when the pattern 'month_name' fails to match 'Agosto'.

Using the match command

The Rosie CLI can be used the way grep is used, to quickly find information in files. And like most unix tools, Rosie can read from standard input (which requires a single dash in place of the filename argument).

So we can pipe the output of the unix date command into Rosie as shown below. On OS X and Linux, date -R will print the date in the RFC 5322 format, long known as the RFC 2822 Internet Message Format.

Naturally, we can also echo sample input and pipe that into Rosie:

The screen capture shows the unix date command being piped into rosie, where the rosie command is to match 'date.any'.  It matches.  Next, the command 'date -R' is piped into rosie, and it also matches.  Then, the string '12 Agosto 2017' is echoed into rosie, and it fails to match the pattern 'date.any'.

Note: We are only parsing the date in these examples, not the entire timestamp, in order to keep the examples short.

The first 3 commands above succeeded, which we can see because the date was printed by Rosie (and printed in blue, the default color for dates and times in Rosie). If we had added -o json to the commands, we would have seen that “Sat Aug 12” matched date.us; “Sat, 12 Aug 2017” matched date.rfc2822, and “12 August 2017” also matched date.rfc2822.

The last command in the transcript above failed (there was no output). The input was “12 Agosto 2017”. Let’s see how exactly it failed by using the trace command.

Using the trace command

Simply replace match with trace in the rosie invocation. The output, given our sample input, will look like the screen capture below. Here, I’ve cut out the nested part (replaced with “…”) so that we can see the top level of the tree.

This screen capture shows the same output of the 'trace' command from the other screen captures in this blog post.  The focus here is on the part of the trace that steps through the definition of 'date.any'.  The trace shows that the definition is 6 alternatives, and that each alternative failed.  They are: 'date.us', 'date.eur', 'date.dashed', 'date.slashed', 'date.rfc2822', and 'date.rfc3339'.

The root of the tree, at the top in the thin green box, is the expression that is the definition of date.any. You can see that date.any is defined as an ordered choice between 6 different date formats.

The first level of nodes under the root show the evaluation of each date format in turn, starting with date.us and ending, at the bottom, with date.rfc3339. Each alternative concludes with “No match”, which we expect, because we know the pattern date.any failed to match this input.

One alternative, date.rfc2822, has a subtree shown. The root of that subtree is the definition of date.rfc2822, which begins with the expression {day_name ~ ","}?. The question mark at the end indicates, as in regex, that the expression is optional: it matches either zero times or once. The expression itself is a sequence of 3 other expressions: day_name, ~ (the Rosie “word boundary” expression), and "," (a literal comma).

We will explain Rosie’s ability to automatically insert the boundary pattern in a future blog post. (If you’re curious, see this documentation for Rosie v0.99k.)

Back to our trace output… Let’s look at the details of how the pattern date.rfc2822 was processed against the input “12 Agosto 2017”. Here is the full transcript:

This screen capture shows the same output of the 'trace' command from the other screen captures in this blog post.  The focus here is on the part of the trace that steps through the definition of 'date.rfc2822', which is one of the alternative patterns in 'date.any'.  It fails when the pattern 'month_name' fails to match 'Agosto'.

The first sub-expression matches 0 characters, because although there is no day_name in the input, the first sub-expression is optional. The next sub-expression is the word boundary, ~, which in this case matches 0 characters because we are at the start of the input (character position 1). The next sub-expression, day, matches 2 characters (“12”) and the next one, a boundary, matches the space after “12”.

Next is the sub-expression month_name, and the engine is now at position 4 of the input, looking at “Agosto 2017”. As we see, there is “No match” (next to the red arrow in the screen capture).

The remaining two parts of the sequence, ~ and year, are not attempted, because the sequence has already failed. Popping back to the first level of child nodes, Rosie goes on to try to match date.rfc3339, which also fails, and so date.any (the root expression) fails.
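To make the walk-through above concrete, here is a minimal Python sketch of a sequence matcher that records each step it attempts. It is not Rosie's implementation. The regexes, the English-only month list, and the trace_sequence helper are all assumptions made for illustration, and \s* is only a rough stand-in for Rosie's boundary pattern ~.

```python
import re

# Hypothetical stand-ins for the pieces of date.rfc2822.
# These are NOT Rosie's definitions, just enough to mimic the walk-through.
DAY_NAMES = "Mon|Tue|Wed|Thu|Fri|Sat|Sun"
MONTH_NAMES = ("January|February|March|April|May|June|July|August|"
               "September|October|November|December|"
               "Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec")

# (name, regex) pairs, tried strictly in order.
STEPS = [
    ('{day_name ~ ","}?', rf"(?:(?:{DAY_NAMES}),)?"),  # optional, may match 0 chars
    ("~", r"\s*"),                                      # crude boundary approximation
    ("day", r"\d{1,2}"),
    ("~", r"\s*"),
    ("month_name", rf"(?:{MONTH_NAMES})"),
    ("~", r"\s*"),
    ("year", r"\d{4}"),
]


def trace_sequence(steps, text):
    """Try each step in order, starting where the previous one ended.

    Returns (matched, trace). Each trace entry is
    (step name, 1-based start position, matched text or None).
    Steps after the first failure are never attempted.
    """
    pos = 0
    trace = []
    for name, regex in steps:
        m = re.compile(regex).match(text, pos)
        if m is None:
            trace.append((name, pos + 1, None))
            return False, trace
        trace.append((name, pos + 1, m.group(0)))
        pos = m.end()
    return True, trace


if __name__ == "__main__":
    ok, trace = trace_sequence(STEPS, "12 Agosto 2017")
    for name, position, matched in trace:
        result = "No match" if matched is None else repr(matched)
        print(f"{name:20} at position {position}: {result}")
```

Run against “12 Agosto 2017”, the sketch stops at month_name at position 4 and never attempts the final ~ and year, which mirrors what the trace shows. Swapping in “12 August 2017” lets all seven steps succeed.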

Even more detail is available

Rosie’s trace command is smart enough to choose how much detail to show.[1] The output contains the most relevant matching steps taken along the path in the tree that was the most productive; that is, the path that consumed the most input. This is a good heuristic (though not perfect) when a parser has to guess which of the various alternatives was the one that the user wanted to succeed.
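The “most productive path” idea can be sketched in a few lines of Python. This is only an illustration of the heuristic as described above, not how Rosie computes it. The three alternatives below are simplified, hypothetical versions of the real patterns, and furthest_position and most_productive are names I made up for this sketch.

```python
import re

# Hypothetical, heavily simplified alternatives (NOT Rosie's definitions).
# Each alternative is a sequence of regexes that must match one after another.
ALTERNATIVES = {
    "date.slashed": [r"\d{1,2}", r"/", r"\d{1,2}", r"/", r"\d{4}"],
    "date.rfc2822": [r"\d{1,2}", r"\s+",
                     r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)",
                     r"\s+", r"\d{4}"],
    "date.rfc3339": [r"\d{4}", r"-", r"\d{2}", r"-", r"\d{2}"],
}


def furthest_position(steps, text):
    """Return how many characters were consumed before a step failed
    (or before the sequence ran out of steps)."""
    pos = 0
    for regex in steps:
        m = re.compile(regex).match(text, pos)
        if m is None:
            break
        pos = m.end()
    return pos


def most_productive(alternatives, text):
    """Pick the alternative that consumed the most input.
    Ties go to whichever alternative is listed first."""
    return max(alternatives,
               key=lambda name: furthest_position(alternatives[name], text))


if __name__ == "__main__":
    text = "12 Agosto 2017"
    for name, steps in ALTERNATIVES.items():
        print(name, "consumed", furthest_position(steps, text), "characters")
    print("most productive:", most_productive(ALTERNATIVES, text))
```

For “12 Agosto 2017”, the simplified date.rfc2822 gets furthest (it consumes “12” and the following space before failing on the month), so it is the branch a trace would choose to show in detail.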

If you want to see a complete trace, in which every possible branch is explained in detail, try the same command with the --verbose flag added. The tree output will be longer, but hopefully still readable once you get a feel for what you are looking at.

Rosie was created for scalable pattern matching

Scalability goals for Rosie include big data, large (complex) patterns, and many developers. The ability to trace a match, at varying levels of detail, is a key debugging feature. Much like a stack trace, a match trace contains a lot of information. But, again like a stack trace, you quickly get used to reading it.

Tracing, without having to paste your patterns and data into some random website, is how #modernpatternmatching is done.


[1] The output shown in this post is from a working version of Rosie in branch tranche-3 of the Rosie Pattern Language repository on gitlab. This prototype is evolving by steps to become release v1.0.0.


Follow us on Twitter for announcements. We expect v1.0.0 to be released late this summer.