About the Rosie Project
Mission Statement
The Rosie Pattern Language (RPL) is intended to replace regular expressions (regex) in situations where:
- many regex are used in an application, project, or organization; or
- some regex are used by many people, or by a few people over a long period of time; or
- regex are used in production systems, where the cost of run-time errors is high.
The advantages conferred by RPL in the first two situations derive from the RPL syntax and expressive power. The syntax resembles a programming language, making expressions easier to read and understand, and compatible with tools like diff.
RPL is based on Parsing Expression Grammars, which are more powerful than regex, obviating the need for “recursive regex” or “regex grammars”, both of which are ad hoc extensions and are not commonly supported in regex libraries.
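To illustrate why a grammar-based formalism can match input that a classic regular expression cannot, here is a minimal sketch in Python of a recursive, PEG-style matcher for balanced parentheses. It is not RPL syntax and not how Rosie is implemented; the function names match_balanced and is_balanced are made up for this example.

```python
def match_balanced(text: str, pos: int = 0) -> int:
    """Return the position reached after matching balanced groups from pos.

    PEG-style rule being sketched:  balanced <- ( "(" balanced ")" )*
    """
    while pos < len(text) and text[pos] == "(":
        # Recursive step: match the contents of this group.
        inner_end = match_balanced(text, pos + 1)
        if inner_end < len(text) and text[inner_end] == ")":
            pos = inner_end + 1  # group closed; try to match another one
        else:
            break  # this group never closed; stop before it
    return pos


def is_balanced(text: str) -> bool:
    """True when the whole input consists of balanced parentheses."""
    return match_balanced(text) == len(text)


if __name__ == "__main__":
    for sample in ["(()())", "(()", ")("]:
        print(repr(sample), is_balanced(sample))
```

The rule refers to itself, which is what lets it follow nesting of any depth. A regular expression without recursion extensions has no way to express that self-reference.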
All three situations listed above involve maintainability. RPL is designed to be easier to maintain than regex, which many developers regard as “write only”. The Rosie project provides both a command-line interface and a REPL for development and debugging.
Also, RPL supports executable unit tests, making it possible to:
- have a suite of regression tests
- test expressions independent of the code that uses them
- compile and test expressions at build time, avoiding run-time errors in production
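The idea of keeping tests next to a pattern and running them at build time can be sketched in Python. This is only an analogy using Python's re module; RPL has its own test facility, which is not shown here. The PATTERNS table, its field names, and run_pattern_tests are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical pattern table: each pattern carries its own test cases.
PATTERNS = {
    "ipv4_octet": {
        "regex": r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])",
        "accepts": ["0", "9", "42", "199", "255"],
        "rejects": ["256", "01", "", "1a"],
    },
}


def run_pattern_tests(patterns):
    """Compile every pattern and check its test cases; return failure messages."""
    failures = []
    for name, spec in patterns.items():
        # re.compile raises re.error here, at build time, if the pattern is invalid.
        compiled = re.compile(spec["regex"])
        for sample in spec["accepts"]:
            if compiled.fullmatch(sample) is None:
                failures.append(f"{name}: should accept {sample!r}")
        for sample in spec["rejects"]:
            if compiled.fullmatch(sample) is not None:
                failures.append(f"{name}: should reject {sample!r}")
    return failures


if __name__ == "__main__":
    problems = run_pattern_tests(PATTERNS)
    # A build script could exit non-zero when any test case fails.
    raise SystemExit(1 if problems else 0)
```

Because the checks do not depend on the application code that uses the pattern, they can run as a separate build step and act as a regression suite.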
Rosie is like regex, but better
Rosie is a supercharged alternative to Regular Expressions (regex), matching patterns against any input text. Rosie ships with a standard library of patterns for matching timestamps, network addresses, email addresses, CSV files, JSON, and many more common syntactic forms.
Rosie/RPL scales in ways that regex do not
Scale to many patterns
RPL is readable and maintainable. It is structured like a programming language, so you can:
- Build complex patterns out of simple ones
- Write patterns that others can read and understand, using whitespace, comments, and built-in test expressions
- Create libraries of reusable patterns
- Import pattern libraries built by other people
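Building complex patterns out of simple, named ones can be sketched with a few matcher combinators in Python. This is an illustration of the compositional style, not RPL syntax or Rosie's library; every name below (char_in, lit, seq, repeat, timestamp, and so on) is made up for the example.

```python
from typing import Callable, Optional

# A matcher takes (text, position) and returns the new position, or None on failure.
Matcher = Callable[[str, int], Optional[int]]


def char_in(chars: str) -> Matcher:
    def match(text: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(text) and text[pos] in chars else None
    return match


def lit(s: str) -> Matcher:
    def match(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return match


def seq(*parts: Matcher) -> Matcher:
    def match(text: str, pos: int) -> Optional[int]:
        for part in parts:
            result = part(text, pos)
            if result is None:
                return None
            pos = result
        return pos
    return match


def repeat(part: Matcher, n: int) -> Matcher:
    return seq(*[part] * n)


# Simple patterns...
digit = char_in("0123456789")
two_digits = repeat(digit, 2)
year = repeat(digit, 4)

# ...combined into larger, named ones.
date = seq(year, lit("-"), two_digits, lit("-"), two_digits)
time = seq(two_digits, lit(":"), two_digits, lit(":"), two_digits)
timestamp = seq(date, lit("T"), time)


def matches(pattern: Matcher, text: str) -> bool:
    """True when the pattern consumes the entire text."""
    return pattern(text, 0) == len(text)
```

For example, matches(timestamp, "2024-01-15T10:30:00") holds, while matches(timestamp, "2024-01-15") does not. The sketch checks only the shape of the text, so it does not validate ranges such as month or hour values.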
Scale to big data
The Rosie Pattern Engine is small and fast.
- The entire run-time takes less than 400KB of disk space, and around 20MB of resident memory
- Basic patterns take linear time to match, whereas most modern regex engines can backtrack exponentially on some inputs
- Recursive patterns are available when needed, to match recursive data like HTML or JSON
- Current speed (v1.1.0 release) is approximately 5x faster than the regex-based grok
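The difference between linear-time matching and exponential backtracking can be seen in a toy model. The Python sketch below counts the attempts made by a naive backtracking matcher for the regex-style pattern (a|aa)*b and by a PEG-style matcher for the equivalent ("a" / "aa")* "b", which commits to each choice. It models the two strategies only. It is not the implementation of Rosie or of any particular regex engine, and both function names are invented for this example.

```python
def backtracking_steps(text: str):
    """Naive backtracking match of (a|aa)*b against all of text; returns (matched, steps)."""
    steps = 0

    def attempt(pos: int) -> bool:
        nonlocal steps
        steps += 1
        # Try to finish with the final "b".
        if text.startswith("b", pos) and pos + 1 == len(text):
            return True
        # Otherwise try each alternative, backtracking when one fails.
        if text.startswith("a", pos) and attempt(pos + 1):
            return True
        if text.startswith("aa", pos) and attempt(pos + 2):
            return True
        return False

    return attempt(0), steps


def peg_steps(text: str):
    """PEG-style match of ("a" / "aa")* "b" against all of text; returns (matched, steps)."""
    steps = 0
    pos = 0
    while True:
        steps += 1
        # Ordered choice: commit to the first alternative that matches.
        if text.startswith("a", pos):
            pos += 1
        elif text.startswith("aa", pos):
            pos += 2
        else:
            break
    # The repetition is never re-entered if the final "b" is missing.
    matched = text.startswith("b", pos) and pos + 1 == len(text)
    return matched, steps


if __name__ == "__main__":
    for n in (5, 10, 20):
        sample = "a" * n  # no trailing "b", so both matchers must fail
        print(n, backtracking_steps(sample)[1], peg_steps(sample)[1])
```

On inputs that lack the final "b", the backtracking matcher retries every way of splitting the run of "a" characters, so its step count grows exponentially with the input length. The committed version makes one pass over the input.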
Scale for productivity
Rosie is flexible and extensible.
- Unlike most regex tools, Rosie can generate structured (JSON) output, making its output easy to store or to consume by downstream processes
- An alternate compressed output format can be selected to reduce data transfer volume
- The CLI uses different colors for dates, times, network addresses, etc., so that you don’t have to read JSON when working interactively
- Plain text (not JSON) output can be selected when using rosie to replace grep
- Rosie is extensible with new patterns, libraries, color assignments, output formats, and macros
- Rosie has an interactive pattern development mode to help write and debug patterns
- Rosie supports UTF-8 natively, but input text can be in any encoding; Rosie can even handle invalid codepoints gracefully
Rosie is released under the MIT license
You can download Rosie from GitLab