how do i join mixpanel.mp_master_event_raw and mixpanel.mp_identity_mappings_data_raw without any duplicates
To join the Mixpanel master event raw table with the identity mappings data raw table without duplicates, you'll need to use the resolved distinct_id from the identity mappings table when available. This approach ensures accurate user counts that match what you see in the Mixpanel UI.Here's an example SQL query for Snowflake that demonstrates how to properly join these tables: LBKSSELECT COALESCE(mappings.DATA:resolved_distinct_id::string, events.DATA:distinct_id::string) AS resolved_distinct_id, events.DATA, COUNT(*) AS count FROM <DB_NAME> .<SCHEMA_NAME> .MP_MASTER_EVENT_RAW events FULL OUTER JOIN <DB_NAME> .<SCHEMA_NAME> .MP_IDENTITY_MAPPINGS_DATA_RAW mappings ON events.DATA:distinct_id::string = mappings.DATA:distinct_id::string GROUP BY COALESCE(mappings.DATA:resolved_distinct_id::string, events.DATA:distinct_id::string), events.DATALBKS
This query:
1. Uses COALESCE to prioritize the resolved_distinct_id from the mappings table when available
2. Falls back to the original distinct_id from the events table when no mapping exists
3. Joins the tables on the distinct_id field
4. Groups by the resolved identifier to prevent duplicatesWhen working with identity resolution in Mixpanel exports, it's important to use the identity mappings table because Mixpanel exports event data as they appear when ingested, not with the resolved identifiers that you see in the UI reporting. The identity mappings table allows you to recreate the identity clusters that Mixpanel creates internally.You may need to adjust the query based on your specific database schema and requirements.Snowflake Queries
User Identity Resolution
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
im trying to find the number of users created within a timeframe. the following query gives me different results than the UI: select count(*) ,'user_created' as "Event Name" from ( SELECT COALESCE(mappings.data:resolved_distinct_id::string, events.data:distinct_id::string) AS resolved_distinct_id, COUNT(*) AS count FROM raw_prod.mixpanel.mp_master_event_raw events FULL OUTER JOIN raw_prod.mixpanel.mp_identity_mappings_data_raw mappings ON events.data:distinct_id::string = mappings.data:distinct_id::string where events.data:event_name::string = 'User Created' and to_timestamp(events.data:time) between dateadd(day, -8, current_date) and dateadd(second, -1, current_date) GROUP BY 1 ) kapa.ai
why?
Let me check the knowledge base for you again. One moment! 📚
I can see why you're experiencing discrepancies between your SQL query results and what you see in the Mixpanel UI. There are several common reasons for these differences when querying exported Mixpanel data.Here are the most likely causes for the discrepancy: 1. Timezone differences: Your query is using current_date
which is in your database's timezone, while Mixpanel UI might be using a different timezone. Mixpanel records all events in UTC but displays them in your project timezone.2. Query caching: Mixpanel caches query results, so the UI might be showing slightly older data depending on your date range.3. Late-arriving data: Mixpanel accepts data that was triggered a while ago, and the UI might include this data while your query might not.4. Different filtering: The UI might have additional filters applied that aren't in your query.To troubleshoot this issue, try these modifications to your query: LBKSSELECT COUNT(DISTINCT COALESCE(mappings.data:resolved_distinct_id::string, events.data:distinct_id::string)) AS unique_users, 'User Created' as "Event Name" FROM raw_prod.mixpanel.mp_master_event_raw events LEFT JOIN raw_prod.mixpanel.mp_identity_mappings_data_raw mappings ON events.data:distinct_id::string = mappings.data:distinct_id::string WHERE events.data:event_name::string = 'User Created' AND TO_TIMESTAMP_NTZ(events.data:time) BETWEEN CONVERT_TIMEZONE('UTC', '<YOUR_PROJECT_TIMEZONE> ', DATEADD(day, -8, CURRENT_DATE())) AND CONVERT_TIMEZONE('UTC', '<YOUR_PROJECT_TIMEZONE> ', DATEADD(second, -1, CURRENT_DATE()))LBKS
Key changes:
- Using COUNT(DISTINCT ...)
to count unique users
- Changed to LEFT JOIN
instead of FULL OUTER JOIN
since we're filtering on events
- Added timezone conversion to match your Mixpanel project timezone
- Removed the subquery which isn't necessary for this calculationFor debugging discrepancies, I recommend: 1. Compare a specific day with significant discrepancy
2. Check if you're comparing the same metric (unique users vs. total events)
3. Verify the timezone settings in your Mixpanel project
4. Check if any data was imported or arrived late by examining the mp_processing_time_ms
propertyDebugging Discrepancies provides more guidance on resolving these issues.If you're still seeing differences, you might want to try the query examples from the Snowflake Queries documentation which are specifically designed to match Mixpanel's UI calculations.
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.