Posted on Nov 25, 2017
In 5 minutes, you’ll have Rosie installed in a local directory like `/tmp` or `/home/whomever`. In 10 minutes, you’ll be using the standard pattern library to extract a variety of common patterns from your own data, like network addresses, dates, times, and more. In 15 minutes, you’ll be writing your own RPL patterns on the command line or at the REPL.
Install Rosie
(1) Download Rosie by visiting the Rosie repository and clicking Clone or download. Or:

    git clone http://gitlab.com/rosie-pattern-language
(2) Build Rosie by running `make` in the directory containing Rosie:

    cd rosie-pattern-language
    make
You can use Rosie now by running the executable `bin/rosie`:

    rosie-pattern-language$ bin/rosie --version
    1.0.0-alpha-6
    rosie-pattern-language$
(3) Optionally, install Rosie by running `make install`. The default installation directory is `/usr/local`, and the installation will consist of:

    /usr/local/bin/rosie         executable
    /usr/local/lib/rosie         directory with additional rosie files
    /usr/local/lib/librosie.so   shared library, e.g. for Python and other languages
If you want to call Rosie from Python at some point, you’ll need to copy `src/librosie/python/rosie.py` to wherever you keep your Python libraries. More on this in a future post, but meanwhile there’s a test program that illustrates the basics.
Getting help
The Rosie CLI takes a command (like `match`) and optional switches. One of the commands is `help`:
    rosie-pattern-language$ bin/rosie help
    Usage: rosie [--version] [--verbose] [--rpl <rpl>] [-f <file>] [--libpath <libpath>] [-o <output>] [<command>] ...

    Rosie 1.0.0-alpha-6

    Options:
       --version             Print rosie version
       --verbose             Output additional messages
       --rpl <rpl>           Inline RPL statements
       -f <file>, --file <file>
                             Load an RPL file
       --libpath <libpath>   Directories to search for rpl modules
       -o <output>, --output <output>
                             Output style, one of: none, subs, line, byte, json, matches, default, color

    Commands:
       help                  Print this help message
       config                Print rosie configuration information
       list                  List patterns, packages, and macros
       grep                  In the style of Unix grep, match the pattern anywhere in each input line
       match                 Match the given RPL pattern against the input
       repl                  Start the read-eval-print loop for interactive pattern development and debugging
       test                  Execute pattern tests written within the target rpl file(s)
       expand                Expand an rpl expression to see the input to the rpl compiler
       trace                 Match while tracing all steps (generates MUCH output)

    The RPL 'import' statement will search these directories in order (this is the libpath):
        /Users/jennings/Projects/rosie-pattern-language/rpl
    rosie-pattern-language$
Help for individual commands, like `match`, is available with `-h`, as in:
    rosie-pattern-language$ bin/rosie match -h
    Usage: rosie match [-o <output>] [-w] [-a] [-F] [-h] <pattern> [<filename>] [<filename>] ...

    Match the given RPL pattern against the input

    Arguments:
       pattern               RPL pattern
       filename              Input filename (default: -)

    Options:
       -o <output>,          Output style, one of: jsonpp, bool, line, subs, json, data, color, byte
          --output <output>
       -w, --wholefile       Read the whole input file as single string
       -a, --all             Output non-matching lines to stderr
       -F, --fixed-strings   Interpret the pattern as a fixed string, not an RPL pattern
       -h, --help            Show this help message and exit.
    rosie-pattern-language$
The RPL language reference is in the code repository at doc/rpl.md.
Match all the things!
There’s a useful pattern in the ‘all’ package called ‘things’ that matches a few dozen common items. Try it out with some sample data from the rosie test directory…
    $ rosie match all.things test/logfile
    Apr 8 09:42:24 Js-MacBook-Pro com.apple.xpc.launchd[1] (homebrew.mxcl.kafka[68878]): Service exited with abnormal code: 1
    Apr 8 09:42:24 Js-MacBook-Pro com.apple.xpc.launchd[1] (homebrew.mxcl.kafka): Service only ran for 8 seconds. Pushing respawn out by 2 seconds.
    Apr 8 10:10:18 Js-MacBook-Pro.local MUpdate[69707]: Endpoint at '/Applications/Meeting.app' is latest version (4732), skipping.
    Apr 8 10:10:18 Js-MacBook-Pro.local MUpdate[69707]: Next Update Check at 2016-04-09 02:22:03 +0000
    $
A few things to notice:
- The CLI automatically executes `import all` upon seeing use of the pattern `all.things`. Files of RPL code must explicitly include the `import X` statement to use patterns from package `X`.
- The output style is `color`, which is the default for the `match` command. The default output style for the `grep` command is to output every line that matches, like the Unix `grep` does.
- Pattern names from the standard library are assigned default color and font styles. Soon these will be customizable.
The `rosie list` command will show the patterns loaded, and what color, if any, has been assigned. To see the patterns in the network package, `net`, you have to tell rosie to import that package:
    rosie-pattern-language$ bin/rosie --rpl 'import net' list net.*
    Rosie 1.0.0-alpha-6

    Name                           Cap? Type       Color           Source
    ------------------------------ ---- ---------- --------------- ------------------------------
    $                                   pattern    red (default)
    .                                   pattern    red (default)
    MAC                            Yes  pattern    underline;green ...attern-language/rpl/net.rpl
    MAC_cisco                      Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    MAC_common                     Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    MAC_windows                    Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    ^                                   pattern    red (default)
    any                            Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    authority                      Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    authpath                       Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    ci                                  macro
    email                          Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    error                               function
    find                                macro
    findall                             macro
    first                               macro
    fqdn                           Yes  pattern    red             ...attern-language/rpl/net.rpl
    fqdn_strict                    Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    fqdn_strict_alias                   pattern    red (default)   ...attern-language/rpl/net.rpl
    halt                                pattern    red (default)
    host                           Yes  pattern    red             ...attern-language/rpl/net.rpl
    http_command                   Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    http_command_name              Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    http_version                   Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    ip                             Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    ip_literal                     Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    ipv4                           Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    ipv6                           Yes  pattern    red;underline   ...attern-language/rpl/net.rpl
    ipv6_mixed                          pattern    red (default)   ...attern-language/rpl/net.rpl
    keepto                              macro
    last                                macro
    message                             function
    name                           Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    path                           Yes  pattern    green           ...attern-language/rpl/net.rpl
    port                           Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    registered_name                Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    scheme                         Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    uri                            Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    url                            Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    userinfo                       Yes  pattern    red (default)   ...attern-language/rpl/net.rpl
    ~                                   pattern    red (default)

    41/41 names shown
    rosie-pattern-language$
Another way to explore the RPL standard library is to examine the files in the `rpl` directory. In each file, you’ll find comments and test cases that show what kinds of input each pattern is expected to accept and reject.
Remember to start at the beginning!
There are a small number of important differences between Rosie expressions (PEGs, generally) and regex. The one that trips up people who are most familiar with regex is that PEGs start matching at the first character of the input.
    rosie-pattern-language$ bin/rosie -o line match '"brown"' test/quick.txt
    brown fox in field wants to sleep
    brown fox in brush wants to sleep
    rosie-pattern-language$
To find all the lines in `test/quick.txt` that contain the word “brown” anywhere in the line, Rosie has a `grep` command:
    rosie-pattern-language$ bin/rosie grep '"brown"' test/quick.txt
    the quick brown
    the quick brown fox
    the quick brown fox jumped over the lazy (but adorable) dog
    brown fox in field wants to sleep
    brown fox in brush wants to sleep
    rosie-pattern-language$
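As a rough analogy only (this is plain Python string handling, not Rosie), the contrast between `match` and `grep` for a literal pattern can be mimicked with `startswith` versus `in`. The sample lines are copied from the `grep` output above; the function names are made up for this sketch.

```python
# Analogy only: plain Python, not Rosie.
# The lines are copied from the grep output shown above.
LINES = [
    "the quick brown",
    "the quick brown fox",
    "the quick brown fox jumped over the lazy (but adorable) dog",
    "brown fox in field wants to sleep",
    "brown fox in brush wants to sleep",
]


def match_like(literal, lines):
    """Keep lines where the literal matches at the first character."""
    return [line for line in lines if line.startswith(literal)]


def grep_like(literal, lines):
    """Keep lines where the literal occurs anywhere in the line."""
    return [line for line in lines if literal in line]


print(match_like("brown", LINES))  # only the lines that begin with "brown"
print(grep_like("brown", LINES))   # every line that contains "brown"
```

With these five lines, `match_like` keeps the two that begin with "brown", while `grep_like` keeps all five.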
Aside
In case you are curious about how Rosie's `grep` command is implemented, it is equivalent to applying the `findall` macro to the pattern argument and using the `match` command. (And specifying the `line` output format, which is the default for `grep`.)

    rosie-pattern-language$ bin/rosie -o line match 'findall:"brown"' test/quick.txt
    the quick brown
    the quick brown fox
    the quick brown fox jumped over the lazy (but adorable) dog
    brown fox in field wants to sleep
    brown fox in brush wants to sleep
    rosie-pattern-language$

Peeling away one more layer, the `findall` macro is a repetitive form of the `find` macro, which takes a pattern argument and does essentially this: While not looking at the target pattern, consume a character and repeat. Finally, match the target pattern.

    rosie-pattern-language$ bin/rosie -o line match '{!"brown" .}* "brown"' test/quick.txt
    the quick brown
    the quick brown fox
    the quick brown fox jumped over the lazy (but adorable) dog
    brown fox in field wants to sleep
    brown fox in brush wants to sleep
    rosie-pattern-language$
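The loop described in the aside can be sketched in a few lines of Python for the special case of a literal target. This is an illustration of the idea behind `{!"brown" .}* "brown"`, not Rosie's implementation; `find_literal` is a hypothetical name, and it reports 0-based Python offsets.

```python
# Sketch of the "find" idea for a literal target. Plain Python, not Rosie.

def find_literal(target, text):
    """Return (start, end) of the first occurrence of target, or None.

    While the target does not match at the current position, consume one
    character and repeat. Finally, match the target itself.
    """
    pos = 0
    while not text.startswith(target, pos):
        if pos >= len(text):
            return None  # no character left to consume, so the match fails
        pos += 1
    return pos, pos + len(target)


print(find_literal("brown", "the quick brown fox"))
print(find_literal("brown", "lazy dog"))
```

Applying the same step repeatedly from the end of each hit is, roughly, what a `findall`-style search does.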
Experiment at the CLI or the REPL
The Rosie CLI
Here are some suggestions for experimenting on your own data using the Rosie CLI.
- Use `match all.things` to see which items within your data are already recognized by Rosie.
- Switch to `grep <pat>` to find specific items, e.g. use `date.any` or `net.any` for `<pat>`.
- Add `-o color` to your command to make the output easier to read. (The default for Rosie `grep` is to simply echo the matching lines, like Unix `grep` does.)
- Compose a pattern on the command line. Don’t forget to enclose the pattern in single quotes to shield it from interpretation by the shell!
- Change the output option to `-o json` to see the structure in the matches. Pipe the output into a json pretty-printer to increase readability.
    rosie-pattern-language$ bin/rosie grep ts.any test/logfile
    Apr 8 09:42:24 Js-MacBook-Pro com.apple.xpc.launchd[1] (homebrew.mxcl.kafka[68878]): Service exited with abnormal code: 1
    Apr 8 09:42:24 Js-MacBook-Pro com.apple.xpc.launchd[1] (homebrew.mxcl.kafka): Service only ran for 8 seconds. Pushing respawn out by 2 seconds.
    Apr 8 10:10:18 Js-MacBook-Pro.local MUpdate[69707]: Endpoint at '/Applications/Meeting.app' is latest version (4732), skipping.
    Apr 8 10:10:18 Js-MacBook-Pro.local MUpdate[69707]: Next Update Check at 2016-04-09 02:22:03 +0000
    rosie-pattern-language$ bin/rosie -o color grep 'ts.any id.any' test/logfile
    Apr 8 09:42:24 Js-MacBook-Pro com.apple.xpc.launchd[1] (homebrew.mxcl.kafka[68878]): Service exited with abnormal code: 1
    Apr 8 09:42:24 Js-MacBook-Pro com.apple.xpc.launchd[1] (homebrew.mxcl.kafka): Service only ran for 8 seconds. Pushing respawn out by 2 seconds.
    Apr 8 10:10:18 Js-MacBook-Pro.local MUpdate[69707]: Endpoint at '/Applications/Meeting.app' is latest version (4732), skipping.
    Apr 8 10:10:18 Js-MacBook-Pro.local MUpdate[69707]: Next Update Check at 2016-04-09 02:22:03 +0000
    rosie-pattern-language$ bin/rosie -o color grep 'ts.any id.any find:ts.any' test/logfile
    Apr 8 10:10:18 Js-MacBook-Pro.local MUpdate[69707]: Next Update Check at 2016-04-09 02:22:03 +0000
    rosie-pattern-language$
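If you would rather drive the CLI from a script than from the shell, something like the following Python sketch may help with the `-o json` suggestion. It makes two assumptions that this post does not confirm: that `bin/rosie` is reachable relative to the current directory, and that `-o json` prints one complete JSON object per line of output. The function names are hypothetical.

```python
import json
import subprocess


def rosie_argv(pattern, filename, command="match", output="json",
               rosie="bin/rosie"):
    """Build the argument list, with -o placed before the command,
    as in the CLI examples above."""
    return [rosie, "-o", output, command, pattern, filename]


def pretty(json_line):
    """Re-indent one line of JSON text for readability."""
    return json.dumps(json.loads(json_line), indent=2, sort_keys=True)


def run_and_pretty_print(pattern, filename):
    """Run rosie and pretty-print its output.

    Assumption: each non-empty line of stdout is one complete JSON object.
    """
    result = subprocess.run(rosie_argv(pattern, filename),
                            capture_output=True, text=True, check=True)
    for line in result.stdout.splitlines():
        if line.strip():
            print(pretty(line))
```

Calling `run_and_pretty_print("all.things", "test/logfile")` from the directory containing Rosie would then correspond to the first suggestion in the list above. Because the pattern is passed as a list element rather than through a shell, no single quotes are needed around it here.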
The Read-Eval-Print Loop (REPL)
If you have developed in Lisp or Scheme, you have seen the power of the REPL as a development tool. Even Python supports a REPL these days to enable incremental code development. And so does Rosie.
There are three things you can enter at the `Rosie>` REPL prompt:
- Commands, like `.match`, `.trace`, and `.load`;
- RPL statements, e.g. definitions like `d = [:digit:]`; and
- RPL identifiers (to see their definitions).
    rosie-pattern-language$ bin/rosie repl
    Rosie 1.0.0-alpha-6
    Rosie> d
    Repl: undefined identifier d
    Rosie> d = [:digit:]
    Rosie> d
    [:digit:]
    Rosie> .match d "4"
    {"data": "4", "e": 2, "s": 1, "type": "d"}
    Rosie> .match d+ "4321"
    {"data": "4321", "e": 5, "s": 1, "subs": [{"data": "4", "e": 2, "s": 1, "type": "d"}, {"data": "3", "e": 3, "s": 2, "type": "d"}, {"data": "2", "e": 4, "s": 3, "type": "d"}, {"data": "1", "e": 5, "s": 4, "type": "d"}], "type": "*"}
    Rosie> import net
    Rosie> net
    <environment: 0x7fa00a7b54c0>
    Rosie> net.ipv4
    {ipv4_component {{"." ipv4_component} {"." ipv4_component} {"." ipv4_component}}}
    Rosie> .match net.ipv4 "192.67.1.100"
    {"data": "192.67.1.100", "e": 13, "s": 1, "type": "net.ipv4"}
    Rosie> .match findall:net.ipv4 "Hello 192.67.1.100"
    {"data": "Hello 192.67.1.100", "e": 19, "s": 1, "subs": [{"data": "192.67.1.100", "e": 19, "s": 7, "type": "net.ipv4"}], "type": "*"}
    Rosie> Exiting
    rosie-pattern-language$
Note that sample data for the `match` and `trace` commands must be enclosed in double quotes.
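The match output in the session above is JSON, so it is easy to inspect from a script. The Python sketch below uses the `findall:net.ipv4` result copied from that session. Its reading of the fields is inferred from the sample rather than taken from the RPL documentation: `s` looks like a 1-based start position and `e` like the position just past the end. Whether these count characters or bytes for non-ASCII input is not shown here.

```python
import json

# Input and output copied from the REPL session above.
INPUT = "Hello 192.67.1.100"
OUTPUT = (
    '{"data": "Hello 192.67.1.100", "e": 19, "s": 1, "subs": '
    '[{"data": "192.67.1.100", "e": 19, "s": 7, "type": "net.ipv4"}], '
    '"type": "*"}'
)


def span_text(text, match):
    """Slice text using "s" (assumed 1-based start) and "e" (assumed to be
    one past the end), converted to Python's 0-based slicing."""
    return text[match["s"] - 1:match["e"] - 1]


def walk(match, depth=0):
    """Yield (depth, type, data) for a match and all nested sub-matches."""
    yield depth, match["type"], match["data"]
    for sub in match.get("subs", []):
        yield from walk(sub, depth + 1)


match = json.loads(OUTPUT)
for depth, kind, data in walk(match):
    print("  " * depth + f"{kind}: {data!r}")
```

For this sample, slicing the input with `span_text` reproduces the `data` field of both the outer match and the nested `net.ipv4` match, which is what suggests the interpretation of `s` and `e` given above.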
Using the REPL is a good way to develop RPL patterns. Because Rosie is happy to match just a portion of the input data (starting at the first character), you can begin with a pattern that matches just the first item in the data, and then extend the pattern incrementally to match more and more of the sample input.
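The same incremental workflow can be imitated in Python, purely as an analogy: `re.match` also anchors at the first character and is content to stop before the end of the input. The regular expressions below are illustrative stand-ins, not RPL, and the sample is the beginning of a log line shown earlier.

```python
import re

# Analogy only: Python regexes, not RPL patterns.
SAMPLE = "Apr 8 09:42:24 Js-MacBook-Pro com.apple.xpc.launchd[1]"

# Each step extends the previous pattern to cover one more field.
STEPS = [
    r"[A-Z][a-z]{2}",                                   # month name
    r"[A-Z][a-z]{2} +\d{1,2}",                          # plus day of month
    r"[A-Z][a-z]{2} +\d{1,2} +\d{2}:\d{2}:\d{2}",       # plus time
    r"[A-Z][a-z]{2} +\d{1,2} +\d{2}:\d{2}:\d{2} +\S+",  # plus host name
]


def matched_prefixes(sample, steps):
    """Return the prefix of sample matched by each successive pattern."""
    return [re.match(step, sample).group(0) for step in steps]


for prefix in matched_prefixes(SAMPLE, STEPS):
    print(repr(prefix))
```

Each step matches a slightly longer prefix of the sample, which is the same feedback loop the REPL gives you when you extend an RPL pattern one item at a time.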
Coming up: Rosie and Python
In a forthcoming post, I’ll show how to call Rosie from Python using `rosie.py`, which uses `librosie.so`.
Discussion on reddit
A Rosie subreddit has been created for discussion of these posts and for questions about Rosie and RPL. See you there!
Follow us on Twitter for announcements about the RPL approach to #modernpatternmatching.