How best to create facts and dimensions table in Mixpanel
Hi event, Please stand by as I search the Mixpanel knowledge base for an AI-powered answer. I normally take about 15 seconds or so. In case AI cannot answer your question, feel free to reach out to Mixpanel support at mixpanel.com/get-support. If you have follow-up questions for this thread, please tag me again with your question: @kapa.ai <follow-up question> 👋
Feel free to ask a follow-up question once I've answered by mentioning me with @kapa.ai
Mixpanel already uses a fact/dimension-style data model; you don’t create “tables” directly, but you map your data into these four types: 1. Facts → Events
- Events are your fact table: each row is “something that happened at a point in time.” [Data model]
- Required fields:
- Event Name (e.g. Signup, Order Completed)
- Time (timestamp)
- Distinct ID / User ID (who did it; optional for some non-user events) [Events; Warehouse events]
- All other columns/fields become event properties (e.g. plan_type, device, amount). [Events; Warehouse events] 2. Dimensions → Profiles & Lookup Tables
Mixpanel’s “dimension tables” are: [Data model] - User Profiles (user dimension)
- Keyed by distinct_id.
- Store current state about the user (e.g. country, subscription_tier, company_size).
- Joined to events on event.distinct_id = user_profile.distinct_id. [Data model; Tracking strategy] - Group Profiles (account/org dimension)
- Keyed by a group key you define (e.g. account_id, company_id).
- Store state about accounts/organizations (e.g. MRR, plan, region).
- Joined to events on event.<group_key> = group_profile.<group_key> . [Data model] - Lookup Tables (entity dimensions)
- CSV/warehouse tables keyed by a join key (e.g. song_id, product_id).
- Contain extra attributes like artist, genre, price, etc.
- Joined to events (or profiles) on event.<join_key> = lookup_table.<join_key> . [Lookup tables; Data model] 3. How to “design” facts vs dimensions in Mixpanel - Start from events (facts):
- Track only key business actions tied to KPIs (e.g. Signup Started, Checkout Completed, Feature Used), not every click. [Tracking strategy]
- Use event properties instead of many separate events (e.g. one Paywall Click event with paywall_type property, not 10 different paywall-click events). [Tracking strategy; Data democratization] - Push stable, “who they are” attributes into profiles (dimensions):
- Things like role, company size, ICP flag, current plan, region belong on user or group profiles, not repeated on every event. [Onboarding tracking strategy; Data model] - Use Lookup Tables for rich entity metadata:
- If you have IDs on events (e.g. song_id, product_id) and lots of attributes about those entities, keep them in a Lookup Table and join via the ID. [Lookup tables] 4. If you’re syncing from a warehouse - Use Warehouse Connectors to map:
- Fact-like tables → Events (must have time, event name, and optionally user ID). [Warehouse events]
- Dimension/SCD tables → User Profiles, Group Profiles, or Lookup Tables depending on whether they describe users, accounts, or other entities. [Data model; Lookup tables]
- For slowly changing dimensions (e.g. subscription history), Mixpanel can ingest profile history/SCD-style tables and join them to events over time. [Leap forward data model] In short: model your events as facts, and use user profiles, group profiles, and lookup tables as your dimension tables, joined via IDs and keys.
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
So are you saying I have A fact table which is a row of each event and dimensions such as user properties email, age and then have different event properties such as one for page load events and its relevant property and one for button clicks what is recommended practice, simplify it and what happens to sessions is a fact table required for that? kapa.ai
Let me check the knowledge base for you again. One moment! 📚
Yes—your mental model is basically right. In Mixpanel terms:
1. Facts vs dimensions in Mixpanel
- Fact table = Events
- Each row = one event at a point in time (e.g. page_viewed, button_clicked, order_completed). [Events; Data model]
- Required on every event: event name, timestamp, and usually distinct_id (user identifier). [Events]
- Extra columns on the event are event properties (e.g. page_url, button_name, amount). [Events]- Dimension tables = Profiles & Lookup Tables
- User profiles: keyed by distinct_id, store “who the user is right now” (email, age, plan, country, etc.). [Data model; Tracking strategy]
- Group profiles: keyed by something like company_id, store account/org attributes. [Data model]
- Lookup tables: keyed by an ID like product_id, store attributes of entities (product name, category, price, etc.). [Data model]
2. Recommended way to structure events & properties
To simplify: - Keep events as actions, not pages or entities.
- Example: use a single page_viewed event with a page_name or page_url property, not separate home_page_viewed, pricing_page_viewed, etc. [Events best practices; Tracking plan methodology]
- Similarly, use one button_clicked event with properties like button_name, color, location when the analysis question is “which buttons get clicked?” [Events best practices]- Use properties to add context instead of more events.
- E.g. page_viewed with page_type, experiment_variant, device, etc. [Build tracking strategy]- Put stable “who they are” attributes on profiles, not every event.
- Email, age, role, company size, current plan → user profile properties. [Tracking strategy; Data model]So you might have: - Events (facts): page_viewed, button_clicked, signup_started, purchase_completed
- Event properties (per event type):
- page_viewed: page_url, referrer, utm_source
- button_clicked: button_name, page_url
- User profile properties (dimensions): email, age, country, plan_type
3. Where do sessions fit? Do you need a separate fact table?
You do not need to create a separate fact table for sessions.- Mixpanel computes sessions automatically from your existing events using your chosen session definition (timeout-based, event-based, or session_id-based). [Sessions overview]
- It then creates virtual events: Session Start and Session End, with properties like:
- Session Duration (Seconds)
- Session Event Count
- Session Start Event Name
- Session End Event Name [Session properties]These session events behave like another fact table derived from your main events, but you don’t model or ingest them yourself—Mixpanel generates them on the fly from the event stream. [Sessions overview]So in practice: - You design one main fact table: events (plus optional warehouse-synced events).
- Mixpanel derives session facts from those events automatically.
- Your dimensions are user profiles, group profiles, and lookup tables.If you want, tell me your product type (e.g. SaaS app, ecommerce, media) and I can suggest a minimal concrete event + property + profile schema following these principles.
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
I mean this would be created outside of Mixpanel so need to know how to data model and the relationships because I can't see session id in the data, how best to derive a session id to know the sequence of activities a user did in a session how to reproduce that logic and track the activities outside of Mixpanel kapa.ai
Let me check the knowledge base for you again. One moment! 📚
You’re right that if you’re building your own fact/dimension tables outside Mixpanel, you need to reproduce Mixpanel’s session logic yourself.The limitation: the docs describe how Mixpanel computes sessions, but they do not give a prescriptive SQL algorithm for you to copy. I’ll summarize what is documented and how you can mirror it conceptually.---
1. What a “session” is in Mixpanel
Mixpanel defines a session as a set of continuous user interactions grouped into a single visit, based on either a timeout or explicit start/end markers. [Glossary session]Sessions are computed from your event stream and exposed as virtual events: - Session Start
- Session End [Sessions overview]You can configure how they’re computed in the project: - Timeout-based (default) – session starts on first event, ends after X minutes of inactivity. [Sessions overview; Timeout-based deep dive]
- Event-based – you choose a start event (e.g. Login) and end event (e.g. Logout). [Event-based sessions]
- Property-based – you choose a property (e.g. session_id); all events with the same value are in the same session. [Property-based sessions]Sessions are reset at midnight project timezone, so max session length is 24h. [Timeout-based deep dive; Event-based; Property-based]---
2. How to mirror this outside Mixpanel (derive your own session_id)
You won’t see a session_id in raw Mixpanel events unless you send one. To reconstruct sessions in your own warehouse, you need to:
Option A – Mirror timeout-based sessions
Conceptually (per docs): - A session starts when a user performs an event.
- It ends when the user stops performing events for longer than the configured timeout (e.g. 30 minutes).
- The end time is the timestamp of the last event before the gap. [Timeout-based deep dive; Examples]To reproduce: 1. Partition by user (e.g. distinct_id) and order by time.
2. Compute the time difference between the current event and the previous event for that user.
3. Start a new session when:
- There is no previous event (first event for that user), or
- The gap > your chosen timeout (e.g. 30 minutes), or
- The date crosses midnight (if you want to match Mixpanel’s “reset at midnight” rule). [Timeout-based deep dive; Examples]
4. Assign a session_id by cumulatively summing these “new session” flags per user.This gives you: - An events fact table with a derived session_id.
- A sessions fact table if you aggregate per user_id + session_id to compute:
- session_start_time (min event time)
- session_end_time (max event time)
- session_duration_seconds (end − start)
- session_event_count (count of events) These mirror Mixpanel’s Session Duration (Seconds) and Session Event Count properties. [Session properties]
Option B – Mirror event-based sessions
Docs definition: - You pick a Session Start event (e.g. Login) and Session End event (e.g. Logout).
- A session starts when the user does the start event, and ends when they do the end event.
- Sessions are still reset at midnight. [Event-based sessions]To reproduce: 1. Filter events to a single user and order by time.
2. Mark rows where event_name = <start_event> as potential session starts.
3. For each start, find the next event_name = <end_event> (or cut at midnight).
4. All events between those two timestamps belong to that session; assign a session_id accordingly.
Option C – Mirror property-based sessions
Docs definition: - You choose a Session ID property (e.g. session_id).
- All events with the same value for that property are in the same session.
- Sessions still reset at midnight even if the property value doesn’t change. [Property-based sessions]To reproduce: 1. Use the property you already track (e.g. session_id) as your session key.
2. Partition by user_id + session_id and order by time.
3. Optionally split further if the events cross midnight to match Mixpanel’s reset rule. [Property-based sessions]---
3. Recommended modeling outside Mixpanel
Given the docs: - Events fact table (one row per event)
- user_id / distinct_id
- event_name
- event_time
- event properties (page, button, etc.)
- derived session_id (using one of the strategies above)- Sessions fact table (optional, derived)
- user_id
- session_id
- session_start_time, session_end_time
- session_duration_seconds, session_event_count
- any first/last-event properties you care about (e.g. landing page), similar to Mixpanel’s session properties. [Session properties]- Dimension tables
- User dimension keyed by user_id (email, age, plan, etc.). [Data model; Tracking strategy]
- Optional entity dimensions (products, content, etc.) as lookup-style tables. [Data model]The docs don’t provide exact SQL, but the logic above follows the documented behavior of timeout-, event-, and property-based sessions. If you tell me which session mode your Mixpanel project uses (timeout vs event vs property), I can outline a more concrete derivation pattern aligned with that mode.
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
Let me check the knowledge base for you again. One moment! 📚
Here’s the conceptual logic for deriving sessions with a 30‑minute inactivity timeout, matching Mixpanel’s timeout-based definition: 1. Sort events per user
- For each user (distinct_id), order all events by timestamp ascending.
[Session deep-dive]2. Walk through events in time order
For each event for that user, compare it to the previous event: - If there is no previous event →
- This is the start of the first session.
- Else, compute the time gap = current_event_time − previous_event_time.3. Start a new session when the gap > 30 minutes
- If time_gap > 30 minutes →
- The previous session ends at the previous event’s time.
- The current event starts a new session.
- If time_gap ≤ 30 minutes →
- The current event belongs to the same session, and the session timeout clock is reset.
[Session timeout & examples; Timeout examples]4. Reset sessions at midnight (project timezone)
- Even if the gap is less than 30 minutes, any crossing of midnight (per project timezone) forces a new session:
- The session that was active before midnight ends at its last event before midnight.
- The first event after midnight starts a new session.
[Session timeout reset; Timeout examples]5. Assign a session_id
- Conceptually, maintain a session counter per user:
- Increment the counter whenever you “start a new session” (per steps 3–4).
- Tag each event with the current session_id for that user.6. Derive a sessions table (optional)
- Group events by user_id + session_id to compute:
- session_start_time = earliest event time in that group.
- session_end_time = latest event time in that group (this matches “time of the last event before timeout”). [Session deep-dive]
- session_duration_seconds = session_end_time − session_start_time. [Reserved properties]
- session_event_count = number of events in that session. [Reserved properties]That’s the full conceptual flow to reproduce Mixpanel’s 30‑minute inactivity timeout sessions outside Mixpanel.
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
