Posted on Dec 18, 2017
Rosie v1.0.0 is in alpha release now. Our intention is to release a beta version in early 2018, with a frozen feature set, API, CLI, and REPL interfaces. The beta will be a release candidate for the proper version 1.0.0.
Over the course of the alpha releases, we have been adding features and making minor changes. Three important features will be added before the beta: thread safety; unicode predicates; and enhanced customization.
Thread safety
Note (edit): Thread safety arrived in librosie
in v1.0.0-alpha-7. See
src/librosie/C/mt.c
for an example of a multi-threaded Rosie program.
The librosie
in v1.0.0-alpha-6 (and earlier) is not thread-safe. Some changes
to the API and to the compilation of the library will make it safe for
multi-threading. An enforced limitation is that a matching engine may be used
by only one thread at a time. In other words, for parallel execution, you must
create a matching engine for each thread. Fortunately, matching engines are
fairly lightweight.
Unicode predicates
Rosie v1.0.0 has a powerful feature for defining character sets, but the alpha releases (thus far) lack support for Unicode predicates like “characters in the Hebrew script”; or “one of the nearly 1000 characters labeled as Numeric” in Unicode 10.0.0; or the intersection, union, or difference of these.
An upcoming alpha release will add predicates for Unicode scripts and properties.
Rosie supports only the UTF-8 character encoding today. No requests for other encodings have been received, but we are interested to know if the need exists.
Note (edit): [2018-01-08] We posted today on the design of character sets in RPL.
Customization, and the Rosie Prelude
Rosie is often used from the command line, and a common request is for Rosie
to load an initialization file (e.g. ~/.rosierc
) so that frequently used
options can be specified there.
Another command-line customization is the ability to assign colors (for printing matches in colorized text) to patterns. This feature is also forthcoming.
And, in some installations, a user of the CLI, REPL, or API may want to have
some patterns pre-loaded automatically. This customization is likely to take the form
of import
statements listed in an initialization file.
Finally, some users may wish to customize the Rosie Prelude1,
i.e. the set of patterns and functions/macros that form the base set on top of
which other patterns are built. Rosie patterns like .
(dot, matching any
character), $
(dollar, matching end of input), and ^
(caret, matching start
of input) are among the identifiers present in the Prelude.
The Rosie .
matches any UTF-8 encoded character. Some users may wish to
redefine .
to match any ASCII character instead, as part of an ASCII-only
configuration.
Similarly, the Rosie boundary pattern (bound to the tilde symbol, ~
) is
defined to match a variety of word and other boundaries. To customize the
boundary for all imported packages and all loaded RPL files, the definition must
be modified in the Prelude.
Note: A redefinition of ~
in an ordinary (non-Prelude) RPL file will affect
all of the patterns defined in that file, but not any outside of that file.
Since pattern libraries are compiled in the environment of the Prelude, modifications to the Prelude will affect all patterns subsequently loaded (imported, compiled), and this can break RPL packages which depend in some way on the standard Rosie Prelude. The ability to use a custom Prelude is intended for specialized use cases, and also to allow the Rosie user community to evolve Rosie independently of official releases.
[1] The name prelude was borrowed from the Haskell language, where (to my understanding) the Haskell Prelude serves a similar function:
Prelude is a module that contains a small set of standard definitions and is included automatically into all Haskell modules.
Rosie was created for scalable pattern matching
Rosie was created to address pattern matching in the large: big data; great variety of data formats; many patterns; and many developers.
Thread safety is a therefore requirement. Also, the world of data and programmers and projects is much larger than the United States, and includes so many languages beyond English. Thus, Unicode character predicates are a requirement. Finally, the CLI and REPL are not only an easy way to get started with Rosie – they are, for many users, the primary way of running Rosie. So, the ability to customize the user experience is also a requirement. These requirements will be addressed in the near-term roadmap.
Pattern matching in the large is #modernpatternmatching.
Please contribute discussion and questions to the Rosie subreddit.
Follow us on Twitter for announcements. We expect v1.0.0-beta to be released in the first months of 2018.