Is your data structured for humans, not for easy processing? Do you have data
like CSC316, from which you want to extract the department (CSC) and
the course number (316)? Maybe you also have geo-coordinates like
(35.7692755,-78.6786137). And then there are also lists of items usually
separated by commas, but sometimes by semi-colons. A single Rosie pattern can
destructure all of these and more.
The destructure package
In the Rosie community group on GitLab,
we have started a repository for working with “raw data”. The first
contribution is the
destructure package, which originated in the Pixiedust
Rosie project. One parent of
that project, Pixiedust, is a very
cool productivity tool for notebooks. Pixiedust makes it very easy to explore,
visualize, and manipulate data.
The addition of Rosie Pattern Language gives Pixiedust more capabilities, such
as automatically destructuring data – that is, recognizing when a column
contains entries like
(35.7692755,-78.6786137) and offering to break up such a
column into two new “synthetic” columns, one with the first coordinate and one
with the second. Similarly, when a column contains alphanumeric codes like
MAE214, Pixiedust+Rosie will offer to split those codes into their alpha and
numeric parts, each in its own column.
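To make the idea of "synthetic" columns concrete, here is a minimal sketch in plain Python. It is not how Pixiedust or Rosie implements the feature: the regular expressions and the split_column helper are hypothetical stand-ins for the RPL patterns, and the sketch only accepts a pattern when every entry in the column matches it.

```python
import re

# Hypothetical patterns for illustration only: these are Python regexes,
# not the RPL patterns shipped in the destructure package.
COORD = re.compile(r"^\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)$")
CODE = re.compile(r"^([A-Za-z]+)(\d+)$")


def split_column(values):
    """Return a list of synthetic columns, or None if no pattern fits.

    A pattern is accepted only when it matches every entry in the column.
    """
    for pattern in (COORD, CODE):
        matches = [pattern.match(v.strip()) for v in values]
        if values and all(matches):
            # zip(*rows) transposes per-row groups into per-column lists.
            return [list(col) for col in zip(*(m.groups() for m in matches))]
    return None


coords = split_column(["(35.7692755,-78.6786137)", "(40.0, -75.5)"])
codes = split_column(["MAE214", "CSC316"])
```

Here coords holds one list of first coordinates and one list of second coordinates, and codes holds the alpha parts and the numeric parts as two separate lists. A column that mixes formats returns None, so nothing is offered for splitting.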
The Rosie destructure package
has patterns for recognizing a variety of structured data. And the pattern
destructure.tryall does what it says: it tries all the various destructuring
patterns.
Here’s an example of
destructure.tryall at work:
You can see by the color output that Rosie recognized all of the structured patterns in the input: the lists that use semi-colons, commas, and dashes between items; the lists in parentheses or braces; and the items that have alphanumeric structure. The latter are displayed with the alpha part in blue and the numeric part in cyan.
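The "try everything" behavior described above can be approximated in a few lines of Python. This is only a sketch of the idea of trying alternatives in a fixed order. It does not reproduce the actual definition of destructure.tryall, and the try_all function, the separator order, and the "alnum"/"list" labels are all assumptions made for illustration.

```python
import re

# Hypothetical stand-ins for the RPL patterns; not the real destructure code.
ALNUM = re.compile(r"^([A-Za-z]+)(\d+)$")
SEPARATORS = (";", ",", "-")  # assumed order: first separator that splits wins
BRACKETS = {("(", ")"), ("{", "}")}


def try_all(text):
    """Return (kind, parts) for the first strategy that fits, else None."""
    s = text.strip()
    # Remove one pair of enclosing parentheses or braces, if present.
    if len(s) >= 2 and (s[0], s[-1]) in BRACKETS:
        s = s[1:-1].strip()
    # Strategy 1: alphanumeric code such as CSC316.
    m = ALNUM.match(s)
    if m:
        return ("alnum", list(m.groups()))
    # Strategy 2: a list of non-empty items around a single separator.
    for sep in SEPARATORS:
        parts = [p.strip() for p in s.split(sep)]
        if len(parts) > 1 and all(parts):
            return ("list", parts)
    return None


for sample in ["a; b; c", "{1, 2, 3}", "CSC316", "x-y-z", "plain"]:
    print(sample, "->", try_all(sample))
```

One visible limitation of this sketch is that the separator order decides ambiguous inputs, so "a;b,c" splits on the semi-colon only.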
The color and libpath settings in my
~/.rosierc file tell Rosie where to find
the destructure library, and what colors to use. From my ~/.rosierc file,
you can see that I have cloned two community repositories, lang and rawdata:
-- ~/.rosierc
libpath = "/usr/local/lib/rosie/rpl"
libpath = "/Users/jennings/Projects/community/lang"
libpath = "/Users/jennings/Projects/community/rawdata"
colors="destructure.find.<search>=red:destructure.alpha=blue:destructure.num=cyan"
Contribute patterns, code, or questions
The Rosie Community group on gitlab.com was created for contributions of patterns and tools. There are just a few repositories there now, but we expect this group to grow.
Please post comments on the Rosie subreddit.
You can also: