Demo datasets

We provide several example datasets to practice with and explore in Motif. To load them, go to Datasets, select the Demo tab and select Create workspace from the three-dot menu.

Data Careers

Career paths of data scientists, from the start of their schooling through the jobs they have held. Includes company names and job titles.

Motiflix

Motiflix is a simulated dataset that helps illustrate typical uses of Motif in a business-like setting. It simulates user behaviors on a video streaming platform, including browsing and searching for movies, bookmarking favorites, and watching trailers and movies.

Suggested exploration questions:

  • What events happen immediately before a user watches a movie?
  • Do more people start watching movies from a search result or from their favorites page?
  • Is the number of movie results returned by a search correlated with whether users end up watching a movie from the results list?
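
To make the first question concrete outside of Motif, here is a minimal Python sketch that counts which event immediately precedes a target event in each user's time-ordered sequence. The event names (`search`, `watch_movie`, and so on), the tuple layout, and the helper `events_before` are hypothetical illustrations. They are not the actual Motiflix schema or Motif's own query language.

```python
from collections import Counter, defaultdict

def events_before(events, target):
    """Count the event that immediately precedes each occurrence of `target`.

    `events` is an iterable of (user_id, timestamp, event_name) tuples.
    A target that is the first event in a user's sequence has no
    predecessor and is skipped.
    """
    by_user = defaultdict(list)
    for user_id, ts, name in events:
        by_user[user_id].append((ts, name))

    counts = Counter()
    for seq in by_user.values():
        seq.sort()  # order each user's events by timestamp
        for (_, prev), (_, cur) in zip(seq, seq[1:]):
            if cur == target:
                counts[prev] += 1
    return counts

# Hypothetical sample data, not taken from the demo dataset.
sample = [
    ("u1", 1, "search"),
    ("u1", 2, "watch_trailer"),
    ("u1", 3, "watch_movie"),
    ("u2", 1, "open_favorites"),
    ("u2", 2, "watch_movie"),
    ("u3", 1, "search"),
    ("u3", 2, "watch_movie"),
]
preceding = events_before(sample, "watch_movie")
print(preceding.most_common())
```

Grouping by user before comparing neighbors keeps one user's last event from being paired with another user's first event.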

Motiflix (extended)

An enhanced version of the Motiflix dataset that contains multiple sessions per user, multiple user segments with different watch patterns, and experiment exposure logs for a new "top movies" experience.

ATUS (American Time Use Survey)

ATUS is a survey run by the US Bureau of Labor Statistics. It includes self-reported sequences of activities that respondents engaged in during one 24-hour period.

Suggested exploration questions:

  • What do Americans spend the most time on?
  • What do Americans do most often after waking up?
  • What types of activities do Americans with long commutes give up most?

Airline Flights

Flights from the carrier on-time performance dataset. Each event is a separate flight, including delayed and cancelled ones.

Suggested exploration questions:

  • How many flights does an airplane make per day on average?
  • Which airlines and airports have the longest average delays?
  • Are planes that get delayed on one flight able to catch up to their schedule on subsequent flights?
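
As a rough illustration of the first question, the sketch below averages the number of flights per airplane per day. It assumes each flight record carries an aircraft identifier and a departure date. The field layout, the sample values, and the function name `avg_flights_per_plane_day` are hypothetical and do not reflect the dataset's real column names.

```python
from collections import Counter
from statistics import mean

def avg_flights_per_plane_day(flights):
    """Average flight count over every (airplane, day) pair that has flights.

    `flights` is an iterable of (tail_number, date_string) tuples.
    Days on which an airplane did not fly are not counted as zeros.
    """
    per_plane_day = Counter((tail, day) for tail, day in flights)
    return mean(per_plane_day.values())

# Hypothetical sample data.
sample_flights = [
    ("N100", "2020-01-01"),
    ("N100", "2020-01-01"),
    ("N100", "2020-01-01"),
    ("N100", "2020-01-02"),
    ("N200", "2020-01-01"),
    ("N200", "2020-01-01"),
]
average = avg_flights_per_plane_day(sample_flights)
print(average)
```

Because idle days are excluded, this answers "how many flights on a day the airplane flies"; counting idle days as zero would give a lower figure.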

Bluesky

Snapshot of Bluesky data through May 1, 2023, downloaded from this source. Includes follows, likes, posts, and replies.

Suggested exploration questions:

  • When do users tend to sign up for Bluesky?
  • How long do user sessions last on average?
  • What do users tend to do after they post?

Chess

A sample of blitz games by the top 100 players on Chess.com in January 2024.

Suggested exploration questions:

  • What openings are more successful for white? For black?
  • Is long or short castling more likely to lead to draws?
  • How does Elo rating gap correlate with win percentage?
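
One way to approach the rating-gap question is to bucket games by the absolute Elo difference and compute how often the higher-rated player wins in each bucket. The sketch below assumes each game is a `(white_elo, black_elo, result)` tuple with results written as `1-0`, `0-1`, or `1/2-1/2`. That layout, the sample ratings, and the function name `win_rate_by_gap` are hypothetical and are not the dataset's schema.

```python
from collections import defaultdict

def win_rate_by_gap(games, bucket_size=100):
    """Return {bucket_start: share of games won by the higher-rated player}.

    `games` is an iterable of (white_elo, black_elo, result) tuples.
    When ratings are equal, white is treated as the higher-rated player.
    Draws count as non-wins.
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for white, black, result in games:
        bucket = (abs(white - black) // bucket_size) * bucket_size
        totals[bucket] += 1
        higher_is_white = white >= black
        if (result == "1-0" and higher_is_white) or (
            result == "0-1" and not higher_is_white
        ):
            wins[bucket] += 1
    return {b: wins[b] / totals[b] for b in sorted(totals)}

# Hypothetical sample games.
sample_games = [
    (2900, 2850, "1-0"),      # gap 50, higher-rated white wins
    (2800, 2840, "1-0"),      # gap 40, lower-rated white wins
    (3000, 2850, "1-0"),      # gap 150, higher-rated white wins
    (2800, 2950, "0-1"),      # gap 150, higher-rated black wins
    (2950, 2800, "1/2-1/2"),  # gap 150, draw
]
rates = win_rate_by_gap(sample_games)
print(rates)
```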

GitHub Issues

Events from the lifecycle of GitHub issues: opening, assigning, cross-referencing, commenting, closing, etc. The data was pulled from public GitHub repositories using the GitHub API.

Google Analytics sample data (E-Commerce)

The e-commerce dataset is a sample export from Google Analytics. The dataset was extracted following the instructions for the sample dataset in the BigQuery UI and exported as a JSON file. Its schema is described in Google's documentation.

NFL

NFL play-by-play data for the 2009-2019 NFL seasons, organized into drive/possession sequences.

Suggested exploration questions:

  • How often do teams run the ball a third time after two runs that don't result in a first down?
  • What is the most common play after a long pass of 20+ yards?
  • What is the most successful three-play sequence for getting a fresh set of downs (reaching a new first down)?

Wikispeedia

Wikispeedia is a game in which players get from one Wikipedia article to another exclusively by following links in the articles they encounter. The dataset includes successful and unsuccessful play paths, with each event corresponding to a visited article.