
Loading CSV / JSON / Parquet Files

You can use Motif in Local Mode on smaller datasets. In Local Mode, the data will be fully stored and processed on your machine and never sent to Motif (or any other) servers.

Note that the uncompressed dataset needs to fit within roughly a third of the memory available to the browser tab (for reference, Google Chrome allows a tab to use about 1 GB of memory).

Click "+ Load Dataset Locally" in the top-left dropdown menu.

Motif accepts data in CSV, TSV, Parquet, and JSON formats. If you can, we recommend Parquet, as it is the smallest, the fastest to load, and takes the fewest resources during processing.

Data file format

The input data file should consist of rows of structured events. Each event consists of columns holding event dimensions.

Every event can have any number of dimensions, but it must contain the following three (under any names):

  • Actor field - used for grouping events into separate sequences. No need to break actor sequences into “sessions” - you will be able to do it during analysis in Motif. Examples: user id, order id, item id, etc.
  • Time field - used for ordering events within each sequence. Examples: timestamp, date, step number, etc.
  • Event Name field - used as event name in Motif UI. You will be able to re-map to a different field or change event names on the fly later in Motif.

Field names should only include letters, numbers, and underscores. If Actor or Event Name values contain certain rare characters, they will be automatically replaced with underscores during import. String dimensions are trimmed at 100 characters.
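If your source column names contain spaces or punctuation, you may want to clean them up before import. The following is a minimal sketch of one way to do that. The helper name `sanitize_field_name` is hypothetical, and it assumes "letters" means ASCII letters; it is not Motif's own import logic.

```python
import re

def sanitize_field_name(name):
    """Reduce a column name to letters, numbers, and underscores.

    Assumption: only ASCII letters count as letters. Runs of other
    characters collapse into a single underscore, and leading or
    trailing underscores are removed.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name.strip())
    return cleaned.strip("_")

headers = ["user id", "total-sale ($)", "event_name"]
clean_headers = [sanitize_field_name(h) for h in headers]
```

Two different source names can map to the same cleaned name, so it is worth checking the result for duplicates before writing the file.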

CSV, TSV, and Parquet files

CSV and TSV are text-based formats, so it is easy to build sample data in them or to export to them from existing data structures.

  • The first row of the file should be column headers
  • Each column is a dimension of an event
  • Columns can be integers, floats, strings, timestamps, or booleans
  • Columns containing arrays or objects will not be imported.

Note that fields can be left blank if they do not apply to a given event.

user_id, timestamp, event_name, product, price, cart_id, total_sale
1031, 04-26-2023 13:00:12, add_cart, jeans, 100.00,,
1031, 04-26-2023 13:01:35, checkout,,,83720,250.00
1031, 04-26-2023 13:05:28, enter_credit_card,,,,
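As an illustration, a file like the one above could be produced with Python's standard `csv` module. This is a sketch only: the field names come from the example, while the `events_to_csv` helper and the in-memory event list are assumptions made for the demonstration.

```python
import csv
import io

FIELDS = ["user_id", "timestamp", "event_name",
          "product", "price", "cart_id", "total_sale"]

events = [
    {"user_id": 1031, "timestamp": "04-26-2023 13:00:12",
     "event_name": "add_cart", "product": "jeans", "price": 100.00},
    {"user_id": 1031, "timestamp": "04-26-2023 13:01:35",
     "event_name": "checkout", "cart_id": 83720, "total_sale": 250.00},
    {"user_id": 1031, "timestamp": "04-26-2023 13:05:28",
     "event_name": "enter_credit_card"},
]

def events_to_csv(rows, fieldnames=FIELDS):
    """Serialize event dicts to CSV text with a header row.

    Dimensions missing from a row are written as blank fields.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = events_to_csv(events)
```

To produce a file on disk, write `csv_text` to a path of your choice, or pass an open file object to `csv.DictWriter` instead of the `StringIO` buffer.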

Parquet is a compressed, columnar file format for storing and sharing data. A number of tools can export data to Parquet, including DuckDB, standalone CLI tools, and Pandas. A Parquet file should contain the same columns as the corresponding CSV would.

JSON files

You can load your data into Motif as a JSON array or in NDJSON format (one JSON object per line). Each object should be flat and contain at least the required fields. Empty dimensions may be omitted.

As with CSV files, you can use only simple data types: integers, floats, strings, booleans and timestamps. Motif will drop arrays and attempt to flatten dictionaries. Timestamps should be encoded as unix epoch integers in seconds, milliseconds or microseconds, or as strings.

[
{
"user_id": 1031,
"timestamp": "04-26-2023 13:00:12",
"event_name": "add_cart",
"product": "jeans",
"price": 100.00
},
{
"user_id": 1031,
"timestamp": "04-26-2023 13:01:35",
"event_name": "checkout",
"cart_id": 83720,
"total_sale": 250.00
},
{
"user_id": 1031,
"timestamp": "04-26-2023 13:05:28",
"event_name": "enter_credit_card"
}
...etc...
]
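For NDJSON, each event goes on its own line as a separate JSON object. The sketch below shows one way to write that with Python's standard `json` module. The `to_ndjson` helper is hypothetical, and treating `None` and empty strings as "empty dimensions" is an assumption.

```python
import json

def to_ndjson(rows):
    """Return NDJSON text: one JSON object per line.

    Dimensions whose value is None or an empty string are omitted
    (an assumption about what counts as an empty dimension).
    """
    lines = []
    for row in rows:
        compact = {k: v for k, v in row.items()
                   if v is not None and v != ""}
        lines.append(json.dumps(compact))
    return "\n".join(lines) + "\n"

events = [
    {"user_id": 1031, "timestamp": "04-26-2023 13:00:12",
     "event_name": "add_cart", "product": "jeans", "price": 100.00},
    {"user_id": 1031, "timestamp": "04-26-2023 13:01:35",
     "event_name": "checkout", "product": "", "cart_id": 83720,
     "total_sale": 250.00},
    {"user_id": 1031, "timestamp": "04-26-2023 13:05:28",
     "event_name": "enter_credit_card", "price": None},
]

ndjson_text = to_ndjson(events)
```

Writing through `json.dumps` also guarantees that each line is valid JSON, which hand-written files often are not.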

Data Import Limitations

Motif's data loader makes several important design choices:

  • Currently, it does not load array structures in your data. Array data will be dropped.
  • Motif will flatten nested objects, joining fields with an underscore.

As a result, the object

{
"items": [1031, 1032, 1033],
"event_name": "enter_credit_card",
"meta": {
"count": 152,
"size": "large"
}
}

will be read as if it were:

{
"event_name": "enter_credit_card",
"meta_count": 152,
"meta_size": "large"
}
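To preview how your own records would look after import, the two rules above can be approximated in a few lines of Python. This is a sketch of the described behavior, not Motif's actual loader; the `flatten_event` name is hypothetical, and applying the same rules recursively to deeper nesting is an assumption.

```python
def flatten_event(event, prefix=""):
    """Approximate the import rules described above.

    - Array values are dropped.
    - Nested objects are flattened, joining keys with an underscore.
    Assumption: the same rules apply recursively at deeper levels.
    """
    flat = {}
    for key, value in event.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, (list, tuple)):
            continue  # arrays are not loaded
        if isinstance(value, dict):
            flat.update(flatten_event(value, name))
        else:
            flat[name] = value
    return flat

raw = {
    "items": [1031, 1032, 1033],
    "event_name": "enter_credit_card",
    "meta": {"count": 152, "size": "large"},
}
flat = flatten_event(raw)
```

If a dimension you care about lives in an array, convert it to a scalar (for example a count or a joined string) before loading.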

In addition, there are several reserved internal field names. Field names that collide with reserved names will be slightly modified, and you will be notified of any changes we need to make.

Size limits

Motif Local Mode stores and processes data in browser memory. This means that the dataset size is bounded by the browser's memory limits.

We recommend dataset sizes under 2 million events and under 20 event dimensions.

If your dataset is larger than that, consider:

  • Excluding frequent, unimportant events
  • Excluding unimportant event dimensions
  • Limiting date range (e.g. 1-4 weeks of data)
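The three reductions above can be applied in one pass before export. The sketch below is illustrative only: the `reduce_events` helper and its parameters are hypothetical, and it assumes the time field holds comparable values such as unix epoch seconds.

```python
def reduce_events(events, drop_names=frozenset(), keep_fields=None,
                  start=None, end=None,
                  name_field="event_name", time_field="timestamp"):
    """Filter a list of event dicts.

    drop_names  - event names to exclude
    keep_fields - if given, the only dimensions to keep
    start, end  - keep events with start <= time < end
    Assumption: time values are directly comparable (e.g. epoch seconds).
    """
    reduced = []
    for event in events:
        if event[name_field] in drop_names:
            continue
        ts = event[time_field]
        if start is not None and ts < start:
            continue
        if end is not None and ts >= end:
            continue
        if keep_fields is not None:
            event = {k: v for k, v in event.items() if k in keep_fields}
        reduced.append(event)
    return reduced

events = [
    {"user_id": 1, "timestamp": 100, "event_name": "page_view", "product": "a"},
    {"user_id": 1, "timestamp": 200, "event_name": "add_cart", "product": "b"},
    {"user_id": 1, "timestamp": 300, "event_name": "checkout", "product": "c"},
]
smaller = reduce_events(
    events,
    drop_names={"page_view"},
    keep_fields={"user_id", "timestamp", "event_name"},
    end=300,
)
```

When passing `keep_fields`, remember to include the Actor, Time, and Event Name fields, since every event must still contain them.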

You may also wish to sample your data. If you do so, sample at the actor (sequence) level, not the event level. For sequence analysis, it is better to have every event from half of your users than half of the events from all of your users.
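One way to sample at the actor level is to hash each actor id and keep the actor only if the hash falls below a threshold, so that every event of a kept actor survives. This is a sketch under assumptions: the helper names are hypothetical, and SHA-256 is just one convenient, stable hash.

```python
import hashlib

def keep_actor(actor_id, fraction):
    """Decide deterministically whether an actor is in the sample.

    The actor id is hashed to a 64-bit integer, which is compared
    against fraction * 2**64, so the same id always gets the same answer.
    """
    digest = hashlib.sha256(str(actor_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < fraction * 2**64

def sample_by_actor(events, actor_field, fraction):
    """Keep all events of roughly `fraction` of the actors."""
    return [e for e in events if keep_actor(e[actor_field], fraction)]

# 200 actors with 3 events each; keep about half of the actors.
events = [{"user_id": u, "step": s} for u in range(200) for s in range(3)]
sampled = sample_by_actor(events, "user_id", 0.5)
```

Because the decision depends only on the actor id, the same actors are selected on every run and across separate export files.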

If you need to work with larger datasets, you can use the Motif Cloud offering.