Loading local files
You can use Motif in Local Mode on smaller datasets. In Local Mode, your data is stored and processed entirely on your machine and is never sent to Motif's (or anyone else's) servers.
Note that the uncompressed dataset must fit within roughly a third of the memory available to a browser tab (for reference, Google Chrome allows a tab to use about 1 GB of memory).
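As a rough, back-of-the-envelope check, the one-third rule can be turned into simple arithmetic. This sketch assumes a tab limit of 1 GB (10^9 bytes), as in the Chrome figure above, and treats an uncompressed file's size on disk as a stand-in for its in-memory size, which is only an approximation. The helper names are hypothetical.

```python
# Assumption: ~1 GB per browser tab, as quoted for Google Chrome above.
TAB_MEMORY_BYTES = 10**9


def local_mode_budget(tab_memory_bytes: int = TAB_MEMORY_BYTES) -> int:
    """Roughly a third of tab memory is available for the uncompressed dataset."""
    return tab_memory_bytes // 3


def fits_local_mode(uncompressed_bytes: int, tab_memory_bytes: int = TAB_MEMORY_BYTES) -> bool:
    """Rough check only: compare a size estimate against the one-third budget."""
    return uncompressed_bytes <= local_mode_budget(tab_memory_bytes)


print(f"Budget: about {local_mode_budget() / 10**6:.0f} MB")
```

For an uncompressed CSV or TSV, `os.path.getsize(path)` gives a byte count you could pass to `fits_local_mode`.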
Navigate to the Local tab on the Datasets page and click + Upload.
Motif accepts data in CSV, TSV, Parquet, and JSON formats. We recommend Parquet where possible: its files are the smallest, it is the fastest to load, and it uses the fewest resources during processing.
Data file format
The input data file should contain rows of structured events, one event per row, with each column holding one event dimension.
An event can have any number of dimensions, but it must include the following three (under any column names):
- Actor field: used to group events into separate sequences. There is no need to split actor sequences into “sessions”; you can do that during analysis in Motif. Examples: user id, order id, item id, etc.
- Time field: used to order events within each sequence. Examples: timestamp, date, step number, etc.
- Event Name field: used as the event name in the Motif UI. You can later remap it to a different field or rename events on the fly in Motif.
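Before uploading, it can help to confirm that each of the three roles maps to a column and that no row leaves one of them blank. The sketch below uses Python's standard `csv` module. The `REQUIRED` mapping to `user_id`, `timestamp`, and `event_name` is hypothetical (borrowed from the sample file on this page), and treating a blank required value as an error is our assumption, not documented Motif behavior.

```python
import csv
import io

# Hypothetical mapping of Motif's three required roles to column names in your file.
REQUIRED = {"actor": "user_id", "time": "timestamp", "event_name": "event_name"}


def missing_required_columns(header, required=REQUIRED):
    """Return the roles whose mapped column is absent from the header."""
    columns = {name.strip() for name in header}
    return [role for role, column in required.items() if column not in columns]


def rows_missing_values(reader, required=REQUIRED):
    """Yield (line_number, role) for every row where a required field is blank."""
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for role, column in required.items():
            if not (row.get(column) or "").strip():
                yield line_number, role


sample = "user_id,timestamp,event_name\n1031,04-26-2023 13:00:12,add_cart\n1032,,checkout\n"
reader = csv.DictReader(io.StringIO(sample))
print(missing_required_columns(reader.fieldnames))  # []
print(list(rows_missing_values(reader)))  # [(3, 'time')]
```

The line numbers assume no field spans multiple lines, which also matches the CSV advice later on this page.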
Field names should contain only letters, numbers, and underscores. If Actor or Event Name values contain certain rare characters, those characters will be automatically replaced with underscores during import. String dimensions are truncated to 100 characters.
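If your source columns have names with spaces or punctuation, you may prefer to clean them yourself before import. A minimal sketch, assuming "letters" means ASCII letters and replacing every other disallowed character with an underscore. This is not Motif's own import code (its exact character rules are not described here), and the helper names are hypothetical.

```python
import re

MAX_STRING_LENGTH = 100  # Motif trims string dimensions at 100 characters

# Assumption: only ASCII letters, digits and underscores are kept; anything else becomes "_".
_INVALID_NAME_CHARS = re.compile(r"[^A-Za-z0-9_]")


def sanitize_field_name(name: str) -> str:
    """Strip surrounding whitespace, then replace each disallowed character with '_'."""
    return _INVALID_NAME_CHARS.sub("_", name.strip())


def truncate_string(value: str, limit: int = MAX_STRING_LENGTH) -> str:
    """Apply the 100-character trim locally, so you can see what will be kept."""
    return value[:limit]


print([sanitize_field_name(n) for n in ["user id", "total-sale ($)", "event_name"]])
```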
The data file should be structured as follows:
- The first row of the file should be column headers
- Each column is a dimension of an event
- Columns can be integers, floats, strings, timestamps, booleans, or arrays
- Columns containing objects will not be imported.
Fields can be left blank if they do not apply to a given event:
user_id,timestamp,event_name,product,price,cart_id,total_sale
1031,04-26-2023 13:00:12,add_cart,jeans,100.00,,
1031,04-26-2023 13:01:35,checkout,,,83720,250.00
1031,04-26-2023 13:05:28,enter_credit_card,,,,
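If you generate the file from code, Python's `csv.DictWriter` can produce this layout directly: its `restval` argument fills any column missing from an event's dictionary with an empty cell. A minimal sketch reproducing the sample above; prices are kept as strings so that `100.00` is not rewritten as `100.0`, and `to_csv` is a hypothetical helper name.

```python
import csv
import io

FIELDS = ["user_id", "timestamp", "event_name", "product", "price", "cart_id", "total_sale"]

events = [
    {"user_id": 1031, "timestamp": "04-26-2023 13:00:12", "event_name": "add_cart",
     "product": "jeans", "price": "100.00"},
    {"user_id": 1031, "timestamp": "04-26-2023 13:01:35", "event_name": "checkout",
     "cart_id": 83720, "total_sale": "250.00"},
    {"user_id": 1031, "timestamp": "04-26-2023 13:05:28", "event_name": "enter_credit_card"},
]


def to_csv(rows, fieldnames=FIELDS) -> str:
    buffer = io.StringIO()
    # restval="" writes an empty cell for every dimension an event does not have
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(events), end="")
```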
File types
Parquet files
Parquet is a compressed, columnar file format for storing and sharing data. It is the preferred format for loading data into Motif, as it has a smaller storage footprint, allows faster processing, and supports a wide variety of data types. Many tools can export data to Parquet, including DuckDB, Pandas, and standalone CLI tools.
CSV and TSV files
CSV and TSV are text-based formats, so they are easy to write by hand when building sample data and easy to export from existing data stores.
Some CSV exporters occasionally produce invalid output. Make sure your file is exported with UTF-8 encoding, and remove non-standard characters such as embedded newlines and tabs from field values.
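A minimal sketch of that clean-up step with Python's `csv` and `re` modules: it replaces each run of embedded carriage returns, newlines, or tabs inside a string value with a single space, then writes the rows through `csv.DictWriter`. The helper names are hypothetical. When writing a real file, open it with `encoding="utf-8"` and `newline=""`, as the Python `csv` documentation recommends.

```python
import csv
import io
import re

# Embedded carriage returns, newlines and tabs inside a value are a common
# cause of broken CSV rows; collapse each run of them into a single space.
_CONTROL_WHITESPACE = re.compile(r"[\r\n\t]+")


def clean_value(value):
    if isinstance(value, str):
        return _CONTROL_WHITESPACE.sub(" ", value)
    return value


def write_clean_csv(file, rows, fieldnames):
    """Write dict rows to an already-open text file, cleaning every string value."""
    writer = csv.DictWriter(file, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({key: clean_value(value) for key, value in row.items()})


# For a real export:
#   with open("events.csv", "w", newline="", encoding="utf-8") as f:
#       write_clean_csv(f, rows, fieldnames)
buffer = io.StringIO()
write_clean_csv(buffer, [{"user_id": 1031, "event_name": "add\tto\ncart"}], ["user_id", "event_name"])
print(buffer.getvalue(), end="")
```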
JSON files
You can load data into Motif either as a JSON array or in NDJSON (newline-delimited JSON) format. Each object should be flat and contain at least the three required fields. Empty dimensions may be omitted.
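The two JSON layouts differ only in framing: a JSON array wraps all events in `[...]`, while NDJSON puts one object on each line. A minimal sketch with Python's standard `json` module that writes NDJSON, dropping empty dimensions (assumed here to mean `None` or empty strings), and reads either layout back. The helper names are hypothetical.

```python
import json


def _is_empty(value) -> bool:
    # Assumption: None and "" count as empty dimensions
    return value is None or value == ""


def to_ndjson(events) -> str:
    """Serialize events as one flat JSON object per line, omitting empty dimensions."""
    out = []
    for event in events:
        compact = {key: value for key, value in event.items() if not _is_empty(value)}
        out.append(json.dumps(compact) + "\n")
    return "".join(out)


def load_events(text: str) -> list:
    """Parse either a JSON array of objects or NDJSON text into a list of dicts."""
    stripped = text.lstrip()
    if stripped.startswith("["):
        return json.loads(stripped)
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]
```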
As with CSV files, only simple data types are supported: integers, floats, strings, booleans, and timestamps. Motif will drop arrays and attempt to flatten dictionaries. Timestamps should be encoded either as Unix epoch integers (in seconds, milliseconds, or microseconds) or as strings.
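If you choose epoch integers, the conversion is straightforward with Python's `datetime`. This sketch assumes the `MM-DD-YYYY HH:MM:SS` layout used in the examples on this page and treats the times as UTC; adjust `fmt` and the time zone to match your data. `to_epoch` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNITS = {
    "s": timedelta(seconds=1),
    "ms": timedelta(milliseconds=1),
    "us": timedelta(microseconds=1),
}


def to_epoch(value: str, unit: str = "ms", fmt: str = "%m-%d-%Y %H:%M:%S") -> int:
    """Parse a timestamp string, treat it as UTC, and return a Unix epoch integer."""
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding
    return (parsed - EPOCH) // UNITS[unit]


print(to_epoch("04-26-2023 13:00:12", unit="s"))  # 1682514012
```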
[
  {
    "user_id": 1031,
    "timestamp": "04-26-2023 13:00:12",
    "event_name": "add_cart",
    "product": "jeans",
    "price": 100.00
  },
  {
    "user_id": 1031,
    "timestamp": "04-26-2023 13:01:35",
    "event_name": "checkout",
    "cart_id": 83720,
    "total_sale": 250.00
  },
  {
    "user_id": 1031,
    "timestamp": "04-26-2023 13:05:28",
    "event_name": "enter_credit_card"
  }
  ...etc...
]
Data Import Limitations
Motif's data loader will flatten nested objects, joining fields with an underscore.
For example, the object
{
  "items": [1031, 1032, 1033],
  "event_name": "enter_credit_card",
  "meta": {
    "count": 152,
    "size": "large"
  }
}
will be read as if it were:
{
  "items": [1031, 1032, 1033],
  "event_name": "enter_credit_card",
  "meta_count": 152,
  "meta_size": "large"
}
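The flattening rule in this example can be sketched as a short recursive function. It illustrates the behavior described above and is not Motif's loader: how Motif handles deeper nesting or name collisions created by flattening is not covered here, so this sketch simply recurses and lets later keys overwrite earlier ones. Arrays are left untouched, matching the example. `flatten` is a hypothetical helper.

```python
def flatten(obj: dict, parent_key: str = "", sep: str = "_") -> dict:
    """Flatten nested dicts by joining keys with an underscore; other values are kept as-is."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


event = {
    "items": [1031, 1032, 1033],
    "event_name": "enter_credit_card",
    "meta": {"count": 152, "size": "large"},
}
print(flatten(event))
# {'items': [1031, 1032, 1033], 'event_name': 'enter_credit_card', 'meta_count': 152, 'meta_size': 'large'}
```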
Motif also reserves several internal field names. If one of your field names collides with a reserved name, it will be slightly modified, and you will be notified of any changes made.
Size limits
Motif Local Mode stores and processes data in browser memory, so the dataset size is limited by how much memory the browser makes available.
We recommend dataset sizes under 2 million events and under 20 event dimensions.
If your dataset is larger than that, consider:
- Excluding frequent, unimportant events
- Excluding unimportant event dimensions
- Limiting date range (e.g. 1-4 weeks of data)
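The three reductions above can be applied in a single pass before export. A minimal sketch in Python, assuming events are dictionaries with `datetime` timestamps; `reduce_dataset` is a hypothetical helper, and the event names, columns, and dates in the usage are made up for illustration. Keep the actor, time, and event name columns in `keep_columns`, since Motif requires them.

```python
from datetime import datetime


def reduce_dataset(events, drop_event_names, keep_columns, start, end,
                   time_key="timestamp", event_key="event_name"):
    """Drop unimportant event types and columns, and keep only events in [start, end)."""
    reduced = []
    for event in events:
        if event[event_key] in drop_event_names:
            continue  # frequent but unimportant event type
        if not (start <= event[time_key] < end):
            continue  # outside the chosen date range
        # Keep only the selected dimensions that this event actually has
        reduced.append({col: event[col] for col in keep_columns if col in event})
    return reduced


events = [
    {"user_id": 1031, "timestamp": datetime(2023, 4, 26, 13, 0), "event_name": "page_view", "url": "/jeans"},
    {"user_id": 1031, "timestamp": datetime(2023, 4, 26, 13, 1), "event_name": "add_cart", "product": "jeans"},
    {"user_id": 1032, "timestamp": datetime(2023, 3, 1, 9, 0), "event_name": "checkout", "total_sale": 80.0},
]
reduced = reduce_dataset(
    events,
    drop_event_names={"page_view"},
    keep_columns=["user_id", "timestamp", "event_name", "product"],
    start=datetime(2023, 4, 1),
    end=datetime(2023, 5, 1),
)
print(reduced)
```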
You may also wish to sample your data. If you do, sample at the actor (sequence) level, not the event level: for sequence analysis, it is better to have every event from half of your users than half of the events from all of your users.
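One way to sample at the actor level is to hash each actor id and keep the actors whose hash falls below the sampling fraction. Every event of a kept actor then survives, and the selection is reproducible across runs. A minimal sketch with Python's `hashlib`; the salt value and the helper names are hypothetical.

```python
import hashlib


def keep_actor(actor_id, fraction: float, salt: str = "sample-v1") -> bool:
    """Keep roughly `fraction` of actors, chosen deterministically from a hash of the id."""
    digest = hashlib.sha256(f"{salt}:{actor_id}".encode("utf-8")).digest()
    # Use the top 53 bits so the result is an exact float in [0, 1)
    position = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return position < fraction


def sample_by_actor(events, fraction: float, actor_key: str = "user_id"):
    """Return every event belonging to the sampled actors, and no others."""
    return [event for event in events if keep_actor(event[actor_key], fraction)]


events = [{"user_id": u, "step": s} for u in range(1000) for s in range(3)]
sample = sample_by_actor(events, 0.5)
print(len(sample))
```

Changing the salt draws a different, equally valid sample; keeping it fixed makes reruns select the same actors.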
To work with larger datasets, learn more about our Cloud offering.