Mixpanel Community Icon

Troubleshooting Data Discrepancies Between Mixpanel and Snowflake

·
·

i user data pipeline to sync mixpanel to snowflake, and when i query data in snowflake and comparing it to mixpanel, there are big gaps. what could be the reason (same event, same date range, same filters, tried different aggregations methods such as unique users and even total events, none matches)

  • Avatar of kapa.ai
    kapa.ai
    APP
    ·
    ·

    Hi Nave L., Please stand by as I search the Mixpanel knowledge base for an AI-powered answer. I normally take about 15 seconds or so. In case AI cannot answer your question, feel free to reach out to Mixpanel support at mixpanel.com/get-support. If you have follow-up questions for this thread, please tag me again with your question: @kapa.ai <follow-up question> đź‘‹

    Feel free to ask a follow-up question once I've answered by mentioning me with @kapa.ai

  • Avatar of kapa.ai
    kapa.ai
    APP
    ·
    ·

    There are several potential reasons for discrepancies between your Mixpanel data and what you're seeing in Snowflake after using Data Pipeline. Based on the knowledge sources, here are the most likely causes:

    Common Causes of Data Discrepancies

    1. Data Sync Issues: - If Data Sync is not enabled for your pipeline, late-arriving data won't be properly synchronized. - Data sync only works for data up to 10 days in the past, so older data might not be properly synced.2. Timing of Data Processing: - It can take up to 24 hours for late-arriving data to be synced from Mixpanel to Snowflake. - For late data, it may take more than 2 days for the data to be fully in sync.3. Identity Resolution Differences: - If you have ID merge enabled, you need to use the identity mapping table to accurately count unique users. - Exported event data before sending alias events has the original user identifier, not the resolved identifier.4. Timezone Differences: - Mixpanel exports timestamps in UTC, which might differ from your project timezone settings.

    Debugging Steps

    To properly debug this issue, you should: 1. Check if Data Sync is enabled for your pipeline.2. Verify your query methodology in Snowflake: - For counting events correctly in Snowflake, use the appropriate query format. Here's an example for getting the number of events each day: LBKSSELECT TO_DATE(CONVERT_TIMEZONE('UTC','<PROJECT_TIMEZONE> ', time)) as ttime, count(**) FROM <DB_NAME> .<SCHEMA_NAME> .MP_MASTER_EVENT WHERE ttime> =TO_DATE('2021-12-03') AND ttime<=TO_DATE('2024-09-01') GROUP BY ttime ORDER BY ttime;LBKS3. For user identity issues, make sure you're using the identity mapping table correctly: LBKSSELECT COALESCE(mappings.resolved_distinct_id, events.distinct_id) AS resolved_distinct_id, COUNT(**) AS count FROM <DB_NAME> .<SCHEMA_NAME> .MP_MASTER_EVENT events FULL OUTER JOIN <DB_NAME> .<SCHEMA_NAME> .MP_IDENTITY_MAPPINGS_DATA mappings ON events.distinct_id = mappings.distinct_id AND events.properties:"$city"::STRING = 'San Francisco' AND TO_DATE(CONVERT_TIMEZONE('UTC','<PROJECT_TIMEZONE> ', events.time)) > = TO_DATE('2020-04-01') AND TO_DATE(CONVERT_TIMEZONE('UTC','<PROJECT_TIMEZONE> ', events.time)) <= TO_DATE('2024-09-01') GROUP BY COALESCE(mappings.resolved_distinct_id, events.distinct_id) LIMIT 100;LBKS4. Check for hidden events: Mixpanel exports all events to your destination, even ones that are hidden in the UI via Lexicon.If you've verified all these aspects and still see significant discrepancies, it might be worth <https://mixpanel.com/get-support|reaching out to Mixpanel Support> for further assistance with your specific implementation.

    kapa.ai is still learning and improving, please let me know how I did by reacting below.

    ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.