I want to reproduce the logic or query Mixpanel uses to count unique sessions across multiple events in its interface what properties do I need to combine ensure that the same user across multiple sessions in the same day is transformed accurately also. kapa.ai
To reproduce the logic Mixpanel uses for counting unique sessions while ensuring users are accurately identified across multiple sessions and devices, you must combine Identity Management logic with Session Computation rules.
1. User Identity Resolution
Before calculating sessions, you must ensure that events from the same user are linked to a single canonical ID. In Mixpanel’s Simplified ID Merge system, this is done using the $user_id and $device_id properties.If you are querying raw data (e.g., from a Data Pipeline), you should use the mp_identity_mappings_data_view to resolve various IDs into a single resolved_distinct_id. Without this, a user who visits your site anonymously and then logs in might be counted as two separate users [Common SQL Queries].
2. Session Definition Logic
Mixpanel calculates sessions on the fly based on your project settings. To reproduce this, you need to apply one of the following logic types to your event stream: ** Timeout-Based (Default): A session starts with an event and ends after a period of inactivity (default is 30 minutes). Any new event within that window restarts the timer [Session Computation Deep-Dive].
** Event-Based: A session is defined by specific "Session Start" and "Session End" events [Event-Based Sessions].
** Property-Based: You use a specific session_id property sent with your events. A session persists as long as this property value remains constant [Property-Based Sessions].**Crucial Rule:* All Mixpanel sessions are reset every 24 hours at midnight in your project's timezone. Even if the activity continues or the session_id remains the same, a new session starts at midnight [Session Computation Deep-Dive; Property-Based Sessions].
3. Combining Properties for Deduplication
When querying raw events to build these sessions, you must first deduplicate the data. Mixpanel uses a combination of four properties to identify a unique event:
1. Event Name (event)
2. Distinct ID (distinct_id)
3. Timestamp (time)
4. Insert ID ($insert_id)If these four properties are identical, the records are duplicates [Event Deduplication].
Summary of the Transformation Logic
To accurately count unique sessions across multiple events for a user in a single day:
1. Map IDs: Join your events with identity mapping tables to replace the raw distinct_id with a resolved_user_id [Common SQL Queries].
2. Deduplicate: Filter out events where the four key properties (Event, ID, Time, Insert ID) are identical [Track Event Deduplication].
3. Apply Windowing: Use a window function (like LAG or LEAD in SQL) to calculate the time difference between consecutive events for each resolved_user_id.
4. Segment by Midnight: Ensure any session duration calculation or session ID resets at the 24-hour mark [Session Computation Deep-Dive].
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
Where do I find the resolved distinct id or resolved user id in the API endpoint used. And why do I need to segment by midnight makes no sense to me because of elements of the date and time is factored in the transformation logic right. So once it combines all 4 and see it's a new session it assigns a randomly generated ID if the 4 are the same it gives the same session ID so maybe a look up table is required. Give me clear steps to reproduce and where to find the exact information I need. kapa.ai
Let me check the knowledge base for you again. One moment! 📚
I'm sorry, but the provided knowledge sources do not contain information regarding a "session ID" being assigned via the Ingestion API or how a lookup table for session IDs is structured. However, based on the documents provided, here is the clarification on the resolved distinct ID and why the midnight reset is a core part of Mixpanel's internal logic.
1. Where to find the Resolved ID
The knowledge sources do not mention a "resolved ID" being returned in the response of the Ingestion API (/track). Instead, the resolution happens after ingestion.** In Ingestion: You send $device_id (anonymous) and/or $user_id (authenticated). If both are present, Mixpanel's Simplified ID Merge API automatically creates a mapping [Identifying Users (Simplified)].
** In Export/Query APIs: When using Data Pipelines or Export APIs, the events are exported with the distinct_id they had at the time of ingestion. To find the "resolved" version, you must use the Identity Mappings Table (often called mp_identity_mappings_data_view in schematized exports) [Common SQL Queries; Schematized Export Pipeline].
** *Property Name: In these mapping tables, look for the field resolved_distinct_id [Json Pipelines].
2. Why Segment by Midnight?
While date and time are factored into the individual events, Mixpanel’s Session computation engine has a hard-coded rule: all sessions are reset at midnight according to your project's timezone [Session Computation Deep-Dive].** Logic: Even if a user is active at 11:55 PM and continues until 12:05 AM without a 30-minute break, Mixpanel will trigger a "Session End" at 11:59:59 PM and a "Session Start" at 12:00:00 AM [Examples]. ** Reasoning: This ensures that session metrics (like session count or duration) can be cleanly aggregated into daily buckets without a single session spanning multiple days and skewing daily reporting.
3. Steps to Reproduce Mixpanel's Logic
To reproduce the UI's session counts from raw data, follow these steps: 1. Deduplicate Events: Use the combination of event, distinct_id, time, and $insert_id. If these four are identical, ignore the duplicates [Event Deduplication].
2. Resolve Identities: Join your event table with the mp_identity_mappings_data_view. Replace the distinct_id on the event with the resolved_distinct_id from the mapping table. If no mapping exists, use the original distinct_id [Common SQL Queries].
3. Apply Midnight Boundary: Group your events by the resolved_distinct_id and the Calendar Date.
4. Calculate Session Gaps: Within each user-day group, sort events by time.
** If the gap between two events is greater than your timeout (e.g., 30 minutes), start a new session.
** If the date changes (midnight), start a new session regardless of the time gap [Timeout-Based Sessions].The knowledge sources do not explicitly mention the API returning a "Session ID" for you to look up; sessions are computed based on the event stream properties at query time.
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
Let me check the knowledge base for you again. One moment! 📚
To achieve a clean and accurate reproduction of Mixpanel's unique session logic, you should not avoid events with only a $device_id. Instead, you should use all events and link them using the Identity Mapping table.
Why you should use both
If you only use events where a $device_id already maps to a $user_id, you will lose all historical data from the "anonymous" phase of the user's journey (e.g., browsing the site before signing up). Mixpanel’s "ID Merge" is designed specifically to retroactively join these two streams so that the conversion from an anonymous visitor to an identified user is captured in a single session flow [Identifying Users (Simplified)].
The "Best Practice" Workflow
The most accurate way to reproduce this logic is to follow these steps: 1. Do not filter out anonymous events: Keep events that only have a $device_id. These represent the start of many user journeys [Example User Flows].
2. Use the Identity Mapping Table: When querying your data (via Snowflake, BigQuery, or the Export API), join your events table with the identity mapping table (mp_identity_mappings_data_view).
3. Prioritize the Resolved ID: Use a CASE statement or COALESCE to select the resolved_distinct_id if it exists. If it does not exist (meaning the user has never been identified), fall back to the original distinct_id [Queries; User Identity Resolution].
Example SQL Logic for the "Resolved" User
This is the standard logic Mixpanel recommends for ensuring you aren't double-counting users across their anonymous and identified states: LBKSSELECT -- This creates a single "canonical" ID for the user CASE WHEN mapping.resolved_distinct_id IS NOT NULL THEN mapping.resolved_distinct_id ELSE events.distinct_id END as resolved_user_id, events.time, events.event_name FROM mp_master_event events LEFT JOIN mp_identity_mappings_data_view mapping ON events.distinct_id = mapping.distinct_idLBKS[Common SQL Queries]
Summary of Benefits
** Accuracy: By using the mapping table, events performed as "Device A" and events performed as "User 123" are attributed to the same person [Canonical Distinct ID]. ** Session Continuity: This allows a session that starts anonymously and ends with a login to be counted as one continuous session rather than being broken into two [Example User Flows]. ** *Completeness: You avoid the "orphan" data that would occur if you only looked at identified events.
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
Let me check the knowledge base for you again. One moment! 📚
I'm sorry, but the knowledge sources do not explicitly mention a method to generate or retrieve a unique "session ID" directly from the Mixpanel Ingestion API, as Mixpanel computes sessions on the fly at the query level rather than at the ingestion level.However, based on the documentation, you can reproduce this logic in your own data pipeline by following this blueprint to generate a synthetic session_id.
Implementation Blueprint
1. Data Preparation (Deduplication & Identity)
Before assigning session IDs, you must ensure the event stream is clean and identities are resolved.
** Deduplicate: Filter your raw event stream by grouping on event, distinct_id, time, and $insert_id. Only keep one record per group [Event Deduplication].
** Resolve Identity: Join your events with the Mixpanel Identity Mapping table. Use resolved_distinct_id if it exists; otherwise, use the original distinct_id. This ensures a session that starts anonymously and ends authenticated is tied to the same "user" group [Common SQL Queries; User Identity Resolution].
2. Session Partitioning Logic
To simulate Mixpanel's session computation, you must apply three partitioning rules to your sorted event stream:
** Partition by User: Group events by the resolved_distinct_id.
** Partition by Day (The Midnight Reset): Group events by the calendar date. Mixpanel resets all sessions at midnight in the project’s timezone [Session Computation Deep-Dive].
** *Sort by Time: Order events within these groups by the time property [Timeout-Based Sessions].
3. Generating the Unique Session ID
Since Mixpanel does not provide a session ID in the raw export, you must generate one using a "Gap and Islands" analytical approach:
1. Calculate Gaps: For each event, look at the previous event's timestamp. If the difference is greater than your timeout (default: 30 minutes), mark it as a new "session start" [Session Timeout].
2. Assign IDs: Create a unique string by concatenating the resolved_distinct_id, the date, and a session_counter (an incrementing integer that increases every time a "gap" is detected or the date changes).
4. Handling Session Start and End Events
If you want to match Mixpanel's UI reporting exactly, you should generate two "virtual" events for each session group: ** Session Start: Triggered at the timestamp of the first event in the session group. ** Session End: Triggered at the timestamp of the last event in the session group [How Sessions Work].
Summary of Information Location
** Identity Mapping: Found in the mp_identity_mappings_data_view or exported via People Export Pipelines [User Identity Resolution].
** Session Rules: Configuration settings for Timeout, Event, or Property-based sessions are located in your Mixpanel Project Settings [How Sessions Work].
** *Deduplication Keys: Use event, distinct_id, time, and $insert_id [Event Deduplication].
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
Is this a view I need to write the query for to create this table MP_IDENTITY_MAPPINGS_DATA or do I need to look for a field or column or table in the raw data to see if it's needed. Also the resolved distinct id or resolved user id you mentioned. Is this data available in the raw data table for event or user properties ingested and if so which one is it located inside. kapa.ai
Let me check the knowledge base for you again. One moment! 📚
The knowledge sources provide specific details on where to find identity data and how it is structured in your warehouse.
Is MP_IDENTITY_MAPPINGS_DATA a view I need to create?
No, you do not need to write the query to create this table yourself if you are using Mixpanel Data Pipelines. Mixpanel automatically creates and manages these tables for you.** For BigQuery: Mixpanel creates tables named mp_identity_mappings_data_** (with a random suffix) and automatically updates a view called mp_identity_mappings_data_view to point to the latest data. You should always query the view [BigQuery Design].
** For Snowflake: Mixpanel creates the table MP_IDENTITY_MAPPINGS_DATA directly in your schema [Snowflake Queries].
** Availability: Mixpanel automatically exports this identity mapping table when you create a people/user profile export pipeline from a project that has ID Merge enabled [User Identity Resolution].
Is the "Resolved ID" in the raw event or user property tables?
The "resolved" version of an ID is generally not present as a pre-populated column in the raw event data table.** Raw Event Table (mp_master_event): Pipelines export event data exactly as they appeared when Mixpanel ingested them. This means the distinct_id column in your raw events table contains the original identifier used at the time of the event (e.g., a $device_id for anonymous events), not the final resolved ID [User Identity Resolution].
** Identity Table: The resolved_distinct_id is located specifically inside the identity mapping tables/views mentioned above. It contains two primary columns: distinct_id (the original ID) and resolved_distinct_id (the ID after merging) [Identity Mappings Output].
Where to look for the information
To get the resolved ID for your analysis, you must perform a join in your SQL query: 1. Events Table: Use mp_master_event.
2. Identity Table: Use mp_identity_mappings_data_view.
3. The Field: Match events.distinct_id = mappings.distinct_id. The column you are looking for to use as your unique user identifier is mappings.resolved_distinct_id [Query identity mappings].If a resolved_distinct_id is not found in the mapping table for a specific record, you should fall back to the distinct_id from the events table [User Identity Resolution].
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
