Mixpanel Community Icon

Integration Challenges with Mixpanel and GBQ: Resolving Distinct IDs

·
·

We are integrating mixpanel with GBQ and we are considering the unique identifier as resolve distinct id using the below query.

SELECT

 CASE

     WHEN m.resolved_distinct_id IS NOT NULL THEN m.resolved_distinct_id

     WHEN m.resolved_distinct_id IS NULL THEN e.distinct_id

 END as resolved_distinct_id,

COUNT(*) AS count

FROM mixpanel_nessie_day_partitioned_<PROJECT_ID>.mp_master_event e FULL OUTER JOIN mixpanel_nessie_day_partitioned_<PROJECT_ID>.mp_identity_mappings_data_view m

ON e.distinct_id = m.distinct_id

AND mp_city="San Francisco"

AND DATE(e._PARTITIONTIME) <= "2024-12-03"

AND DATE(e._PARTITIONTIME) >= "2024-12-01"

GROUP BY resolved_distinct_id

LIMIT 100

However, there are multiple resolve_distinct_id values for which we need to retrieve the UUID and Email fields. The challenge we are facing is that we are unable to locate the resolve_distinct_id in the Mixpanel interface.Could you kindly let me kn0w why it is happening?

  • Avatar of kapa.ai
    kapa.ai
    APP
    ·
    ·

    Hi Sonia, Please stand by as I search the Mixpanel knowledge base for an AI-powered answer. I normally take about 15 seconds or so. In case AI cannot answer your question, feel free to reach out to Mixpanel support at mixpanel.com/get-support. If you have follow-up questions for this thread, please tag me again with your question: @kapa.ai <follow-up question> đź‘‹

    Feel free to ask a follow-up question once I've answered by mentioning me with @kapa.ai

  • Avatar of kapa.ai
    kapa.ai
    APP
    ·
    ·

    The issue you're experiencing is related to how Mixpanel handles identity resolution and how this data is represented in exported data versus the Mixpanel interface. Let me explain why you're unable to find the resolved_distinct_id in the Mixpanel interface: 1. Identity Resolution in Exports: When you export data from Mixpanel to BigQuery (or other data warehouses), Mixpanel provides an identity mappings table that includes both the original distinct_id and a resolved_distinct_id. This resolved_distinct_id is the canonical identifier that Mixpanel uses internally to represent a unique user after merging multiple identifiers. 2. Mixpanel Interface Display: In the Mixpanel interface, you don't see the resolved_distinct_id directly. Instead, Mixpanel chooses one of the identifiers from the user's identity cluster to display as the user's ID. This is explained in the [Identifying Users (Original) FAQ](https://docs.mixpanel.com/docs/tracking-methods/id-management/identifying-users-original#faq): "The canonical distinct_id is selected by Mixpanel using any of the IDs inside of a cluster. This means that even after identifying your user, Mixpanel may choose your user ID or one of the other IDs inside of the cluster (such as the device ID) to serve as the displayed ID for the user. **This is random and not user-configurable.**" 3. Best Practice for User Identification: To address this discrepancy, Mixpanel recommends setting your chosen user ID (like UUID or email) as a separate user profile property and event property. This way, you can always query and see your own chosen user ID. As stated in the same FAQ: "As a best practice, please set your chosen user ID as a separate user profile property and event property so that you can always query and see your own chosen user ID." To resolve your specific issue: 1. Make sure you're setting your UUID and email as separate user profile properties when identifying users. 2. In your BigQuery queries, join the identity mappings table with your events table as you're doing, but also include a join with the user profiles table to retrieve the UUID and email fields you've set. Here's an example of how you might modify your query to include these fields (note that this is a conceptual example and may need to be adjusted based on your exact schema):

  • Avatar of kapa.ai
    kapa.ai
    APP
    ·
    ·
    CASE WHEN m.resolved_distinct_id IS NOT NULL THEN m.resolved_distinct_id WHEN m.resolved_distinct_id IS NULL THEN e.distinct_id END as resolved_distinct_id, p.uuid, p.email, COUNT(*) AS count FROM mixpanel_nessie_day_partitioned_<PROJECT_ID> .mp_master_event e FULL OUTER JOIN mixpanel_nessie_day_partitioned_<PROJECT_ID> .mp_identity_mappings_data_view m ON e.distinct_id = m.distinct_id LEFT JOIN mixpanel_nessie_day_partitioned_<PROJECT_ID> .mp_users p ON COALESCE(m.resolved_distinct_id, e.distinct_id) = p.distinct_id WHERE e.mp_city = "San Francisco" AND DATE(e._PARTITIONTIME) <= "2024-12-03" AND DATE(e._PARTITIONTIME) > = "2024-12-01" GROUP BY resolved_distinct_id, p.uuid, p.email LIMIT 100LBKS```LBKSLBKSThis approach allows you to work with the resolved identities in your data warehouse while still accessing the specific identifiers (UUID and email) that you've set for your users.LBKS[Identifying Users (Original)](https://docs.mixpanel.com/docs/tracking-methods/id-management/identifying-users-original#faq) [Schematized BigQuery Pipeline](https://docs.mixpanel.com/docs/data-pipelines/old-pipelines/integrations/schematized-bigquery-pipeline#querying-the-identity-mapping-table)

    kapa.ai is still learning and improving, please let me know how I did by reacting below.

    ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.