Replacing
Sequence Operations Language (SOL) borrows another important concept from
regular expressions: replacing event patterns in sequences through the replace
operation. The replace operation provides powerful data wrangling capabilities
for transforming and coarsening data, tailored to answering specific business
questions.
The simplest replace operation removes unwanted events by replacing them with
null:
match split UnwantedEvent(event1 | event2)
// Replace matched events with null to remove
replace UnwantedEvent with null
// Combine sub-sequences into original sequences
combine
In general, replace takes one previously defined tag as an argument, followed
by a with clause, which specifies a new sub-sequence to substitute in place of
the events labelled with that tag, and finally an optional dims clause, which
defines how to pass event dimensions.
match A(a) >> B(^a)*
if duration(A, B) < 1d
// Replace the whole sequence with just event tagged "A" followed by events tagged "B"
replace SEQ with A >> B
// Insert a new event "churn" after events tagged "B"
replace B with B >> (churn)
// Combine all events in tag "B" into one event called "non_a" and labelled with tag "C"
replace B with C(non_a) dims
C.event_num = length(B),
C.duration = B[-1].ts - B[0].ts
// Duplicate event tagged "A" and label the 2nd copy with tag "C"
replace A with A >> C(@A)
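To make the dims arithmetic in the third replace above concrete, here is a minimal Python sketch of what collapsing the events tagged B into a single event could look like. It is an illustrative model, not SOL itself: the dict-based event layout, the function name collapse_tagged, and the choice to give the merged event the timestamp of the first B event are all assumptions made for this example.

```python
from datetime import datetime, timedelta


def collapse_tagged(events, tag, new_name, new_tag):
    """Collapse every event carrying `tag` into one event (hypothetical model).

    Mirrors: replace B with C(non_a) dims C.event_num = length(B),
                                          C.duration = B[-1].ts - B[0].ts
    """
    tagged = [e for e in events if e.get("tag") == tag]
    if not tagged:
        # No event carries the tag: return the sequence unchanged.
        return list(events)

    merged = {
        "name": new_name,
        "tag": new_tag,
        "ts": tagged[0]["ts"],  # assumption: keep the first event's timestamp
        "event_num": len(tagged),  # length(B)
        "duration": tagged[-1]["ts"] - tagged[0]["ts"],  # B[-1].ts - B[0].ts
    }

    out, placed = [], False
    for e in events:
        if e.get("tag") != tag:
            out.append(e)
        elif not placed:
            # Put the merged event where the first tagged event was.
            out.append(merged)
            placed = True
    return out


t0 = datetime(2024, 1, 1, 12, 0)
seq = [
    {"name": "a", "tag": "A", "ts": t0},
    {"name": "b", "tag": "B", "ts": t0 + timedelta(minutes=5)},
    {"name": "c", "tag": "B", "ts": t0 + timedelta(minutes=20)},
]
result = collapse_tagged(seq, "B", "non_a", "C")
```

In this model, result holds the original a event followed by one non_a event whose event_num is 2 and whose duration is 15 minutes.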
The substitute sub-sequence in the with clause mostly follows the same syntax
as match patterns in the match operation, with a few exceptions:
- can’t use event quantifiers
- can’t use event include lists and exclude lists
- can reference existing tags as A (insert all tagged events and the tag) and (@A) (insert tagged events only)
If the tag to be replaced does not exist in a given sequence, replace leaves
that sequence unchanged.
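The two behaviours described on this page, removal by replacing with null and the no-op when a tag is missing, can be modelled in a few lines of Python. This is only a sketch of the semantics as described here, not how SOL is implemented: the function replace_tag and the dict-based event layout are hypothetical, and None stands in for SOL's null.

```python
def replace_tag(events, tag, substitute=None):
    """Model `replace <tag> with <substitute>` on a list of event dicts.

    substitute=None plays the role of null (tagged events are removed);
    otherwise the substitute events take the place of the tagged ones.
    """
    if not any(e.get("tag") == tag for e in events):
        # The tag does not occur in this sequence: leave it untouched.
        return events

    out, placed = [], False
    for e in events:
        if e.get("tag") != tag:
            out.append(e)
        elif substitute is not None and not placed:
            # Insert the substitute once, at the first tagged position.
            out.extend(substitute)
            placed = True
    return out


seq = [
    {"name": "login"},
    {"name": "event1", "tag": "UnwantedEvent"},
    {"name": "purchase"},
    {"name": "event2", "tag": "UnwantedEvent"},
]

cleaned = replace_tag(seq, "UnwantedEvent")  # replace ... with null
untouched = replace_tag(seq, "MissingTag")   # tag absent: no-op
```

Here cleaned keeps only the login and purchase events, while untouched is the very same list object as seq because no event carries MissingTag.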
You can find more common examples of using replace in the SOL recipes.