
Matching & Tags

The most important concept in sequence analytics is matching. It allows you to find complex event patterns based on the order of events and to label them, so you can work with them later in your query. Matching is similar to using regular expressions, but works on sequences of events instead of strings of characters and has a more user-friendly and readable syntax.

In Sequence Operations Language (SOL), matching is done using the match operation.


match takes a match pattern as an argument. The simplest match pattern consists of a single event name, for example:

// Match the first event called “purchase”
match purchase

To match a pair of consecutive events you need to put a >> connector between them:

// Match the first “product_page” directly followed by "purchase"
match product_page >> purchase

You can use “quantifiers” from regular expressions to match an event that repeats a given number of times. A quantifier has to immediately follow the event it applies to:

// Match 1 or more “product_page” events followed by "purchase"
match product_page+ >> purchase

// Available quantifiers:
// * - match zero or more times
// + - match one or more times
// ? - match zero or one time
// {n} - match exactly n times
// {n,} - match at least n times
// {,n} - match at most n times
// {n,m} - match from n to m times

To match a pair of events with any number of arbitrary events between them, place a “wildcard” * between the two connectors:

// Match "search" followed by "purchase" with any number of any events in between
match search >> * >> purchase
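Because matching is described above as similar to regular expressions, the connector, quantifier and wildcard examples can be mimicked in Python with the standard re module. This is only an analogy, not how SOL is implemented: the helpers ev and first_match and the constant ANY are hypothetical names, event names are assumed not to contain ";", and whether SOL repeats greedily is not stated here, so the greedy behaviour of the sketch is an assumption.

```python
import re

ANY = "(?:[^;]+;)"  # one event with any name (stand-in for the SOL wildcard)

def ev(name):
    # Regex fragment for exactly one event with the given name.
    return "(?:" + re.escape(name) + ";)"

def first_match(pattern, events):
    """Return (start, stop) event indices of the leftmost match, or None."""
    # Encode the sequence as text, e.g. "search;home;purchase;".
    text = "".join(name + ";" for name in events)
    # Only allow a match to begin at an event boundary.
    m = re.search("(?:^|(?<=;))" + pattern, text)
    if m is None:
        return None
    start = text[:m.start()].count(";")
    return start, start + m.group().count(";")

session = ["home", "search", "product_page", "product_page", "purchase"]

# match product_page+ >> purchase
print(first_match(ev("product_page") + "+" + ev("purchase"), session))     # (2, 5)
# match product_page{3,} >> purchase
print(first_match(ev("product_page") + "{3,}" + ev("purchase"), session))  # None
# match search >> * >> purchase
print(first_match(ev("search") + ANY + "*" + ev("purchase"), session))     # (1, 5)
```

In this sketch, concatenating two fragments plays the role of the >> connector, and the returned pair is a half-open range of event positions.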

A key concept in SOL is a tag: a label attached to one or more contiguous events of interest. Tags are created by tagging events while executing match, and they are used to access the events of interest and their dimensions. Tags have names and are visualized as lines above events. When you match events by name, as in the examples above, tags are created automatically and are assigned the same names as the underlying events. Sometimes it is important to name a tag explicitly. To do this, enclose the event names in parentheses and add the tag's name immediately before the opening parenthesis:

// Label consecutive “product_page” events with a "ViewProducts" tag and "purchase" event with a "Buy" tag
match ViewProducts(product_page)+ >> Buy(purchase)
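Continuing the regular-expression analogy, an explicitly named tag behaves much like a named capture group. The Python sketch below is an illustration under that assumption, not SOL's actual mechanism; ev, tag and match_tags are hypothetical helpers, and event names are assumed not to contain ";".

```python
import re

def ev(name):
    # Regex fragment for exactly one event with the given name.
    return "(?:" + re.escape(name) + ";)"

def tag(label, fragment):
    # A named group stands in for a tag over one or more contiguous events.
    return "(?P<" + label + ">" + fragment + ")"

def match_tags(pattern, events):
    """Return {tag name: (start, stop)} event ranges for the leftmost match."""
    text = "".join(name + ";" for name in events)
    m = re.search("(?:^|(?<=;))" + pattern, text)
    if m is None:
        return {}  # no match: the sequence simply carries no tags
    tags = {}
    for label, matched in m.groupdict().items():
        if matched is not None:
            start = text[:m.start(label)].count(";")
            tags[label] = (start, start + matched.count(";"))
    return tags

session = ["home", "product_page", "product_page", "purchase"]

# match ViewProducts(product_page)+ >> Buy(purchase)
# The quantifier sits inside the group so the tag spans every repeated event.
pattern = tag("ViewProducts", ev("product_page") + "+") + tag("Buy", ev("purchase"))
print(match_tags(pattern, session))  # {'ViewProducts': (1, 3), 'Buy': (3, 4)}
```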

Tags can be used to specify conditions on the match patterns in the if clause of the match operation as described in the “Conditions & Filtering” section.

match is used to create tags, which are used by subsequent SOL operations to access, transform and remove underlying events of interest.

Tags are ephemeral: each new match operation completely removes tags from the previous match.

It is important to understand that matching doesn't filter out sequences without matches; it only assigns tags to the sequences that do match.
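The two properties above (tags are replaced by every new match, and unmatched sequences are kept) can be sketched in Python. The dictionary layout and the match_event function are assumptions made for illustration only; they do not reflect how SOL stores sequences.

```python
def match_event(sequences, event_name):
    """Tag the first occurrence of event_name in every sequence."""
    out = []
    for seq in sequences:
        events = seq["events"]
        tags = {}  # tags from any previous match are not carried over
        if event_name in events:
            i = events.index(event_name)
            tags[event_name] = (i, i + 1)  # implicit tag named after the event
        # The sequence is kept even when nothing matched and tags is empty.
        out.append({"events": events, "tags": tags})
    return out

sequences = [
    {"events": ["search", "purchase"], "tags": {}},
    {"events": ["search", "logout"], "tags": {}},
]

step1 = match_event(sequences, "purchase")  # only the first sequence gets a tag
step2 = match_event(step1, "search")        # the "purchase" tag is gone afterwards

print(len(step1), step1[1]["tags"])  # 2 {}
print(step2[0]["tags"])              # {'search': (0, 1)}
```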

In a match pattern you can list alternative event names, separated by |, or exclude event names by prefixing a list with ^ and separating the names with commas:

// Match "search_page" eventually followed by a "purchase_movie" or "rent_movie" event without having "home_page" or "favorites_page" events in between
match search_page >>
(^home_page, favorites_page)* >>
Conversion(purchase_movie | rent_movie)

In both cases the event names have to be enclosed in parentheses.
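In regular-expression terms, a list of alternatives corresponds to alternation and an exclusion list to a negative lookahead. The Python sketch below mirrors the pattern above under that analogy; one_of, none_of and first_match are hypothetical helpers, event names are assumed not to contain ";", and the greedy repetition is an assumption about SOL rather than documented behaviour.

```python
import re

def one_of(*names):
    # Exactly one event whose name is any of the given names.
    alternatives = "|".join(re.escape(n) for n in names)
    return "(?:(?:" + alternatives + ");)"

def none_of(*names):
    # Exactly one event whose name is none of the given names.
    alternatives = "|".join(re.escape(n) for n in names)
    return "(?:(?!(?:" + alternatives + ");)[^;]+;)"

def first_match(pattern, events):
    """Return (start, stop) event indices of the leftmost match, or None."""
    text = "".join(name + ";" for name in events)
    m = re.search("(?:^|(?<=;))" + pattern, text)
    if m is None:
        return None
    start = text[:m.start()].count(";")
    return start, start + m.group().count(";")

# match search_page >> (^home_page, favorites_page)* >> Conversion(purchase_movie | rent_movie)
pattern = (
    one_of("search_page")
    + none_of("home_page", "favorites_page") + "*"
    + one_of("purchase_movie", "rent_movie")
)

print(first_match(pattern, ["search_page", "movie_page", "rent_movie"]))     # (0, 3)
print(first_match(pattern, ["search_page", "home_page", "purchase_movie"]))  # None
```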

You can anchor a match to the beginning or to the end of a sequence with start and end:

// Match sequences that start with a "login" event and end with a "logout" event
match start >> FirstEvent(login) >> * >> LastEvent(logout) >> end
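Anchoring to start and end is comparable to requiring a regular expression to cover the whole input. A minimal Python sketch of that idea follows; starts_and_ends is a hypothetical helper, and the encoding with ";" assumes event names never contain that character.

```python
import re

def starts_and_ends(events, first, last):
    """True if the sequence begins with `first` and finishes with `last`."""
    text = "".join(name + ";" for name in events)
    # first event, then any number of arbitrary events, then the last event
    pattern = re.escape(first) + ";(?:[^;]+;)*" + re.escape(last) + ";"
    # fullmatch requires the pattern to span the entire encoded sequence.
    return re.fullmatch(pattern, text) is not None

print(starts_and_ends(["login", "search", "logout"], "login", "logout"))  # True
print(starts_and_ends(["home", "login", "logout"], "login", "logout"))    # False
```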

After each match, in addition to user-defined tags, SOL also creates several implicit tags (their names are not case sensitive) that provide convenient access to the events around the match:

  • PREFIX - sub-sequence of all events before the first matched event
  • MATCHED - sub-sequence of all matched events
  • SUFFIX - sub-sequence of all events after the last matched event

There is one more implicit tag, SEQ, which is always available to SOL operations (even without prior matching) and encompasses all events in a sequence.
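If a match is viewed as a half-open range of event positions, the implicit tags amount to simple slices of the sequence. The Python sketch below illustrates this; implicit_tags and the (start, stop) convention are assumptions made for the example, not part of SOL.

```python
def implicit_tags(events, start, stop):
    """Split a sequence around a match that covers events[start:stop]."""
    return {
        "PREFIX": events[:start],       # everything before the first matched event
        "MATCHED": events[start:stop],  # all matched events
        "SUFFIX": events[stop:],        # everything after the last matched event
        "SEQ": events[:],               # the whole sequence
    }

session = ["home", "search", "product_page", "purchase", "logout"]

# Suppose "match search >> * >> purchase" matched positions 1 through 3.
tags = implicit_tags(session, 1, 4)
print(tags["PREFIX"])   # ['home']
print(tags["MATCHED"])  # ['search', 'product_page', 'purchase']
print(tags["SUFFIX"])   # ['logout']
```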