Syncing Historical Data with BigQuery: Pipeline Integration Explained

I have a configured integration — a pipeline that loads events into BigQuery. I added historical data for the past year via the API. Will the historical data from the past year be synchronized with the dataset in BigQuery?

  • kapa.ai (APP)

    Hi Anton B., please stand by while I search the Mixpanel knowledge base for an AI-powered answer. I normally take about 15 seconds. If AI cannot answer your question, feel free to reach out to Mixpanel support at mixpanel.com/get-support. If you have follow-up questions for this thread, please tag me again with your question: @kapa.ai <follow-up question> 👋

  • kapa.ai (APP)

    Based on the knowledge sources, whether your historical data from the past year is synchronized with your BigQuery dataset depends on whether the "sync" option is enabled for your pipeline, and on how far back the imported data goes.

    When you create a pipeline, the "sync" parameter determines whether changes to your Mixpanel dataset are reflected in the exported data. According to the documentation:

    > A value of true updates exported data with any changes that occur in your Mixpanel dataset. These changes include deletions, late data, and imports that fall into your export window.

    (Data Pipelines API)

    If sync is enabled, Mixpanel detects changes in your data with daily granularity and replaces the old data with the latest version in your BigQuery warehouse. This keeps the exported data fresh and minimizes missing data points.

    However, there are some limitations to be aware of:

    1. Data sync does not fully guarantee syncing GDPR data deletions.
    2. It only syncs data for days up to 10 days in the past.
    3. Mixpanel starts checking for late-arriving data 24 hours after the data for a day is exported.
    4. It may take more than 2 days for the data in BigQuery to be in sync with the data in Mixpanel.

    Because of limitation 2, events you imported for dates more than 10 days in the past are unlikely to be picked up by sync, so most of a year of backfilled history would probably not be re-exported to BigQuery automatically. If you need that history in BigQuery, it is worth confirming your options with Mixpanel support.

    The Service Level Agreement sets an end-to-end latency of up to 24 hours for exported events, with an additional 24-hour allowance for late-arriving data. Late data (arriving more than two hours after the export window) is handled by the daily sync process on the day after ingestion, provided sync is enabled.

    To verify that your historical data was imported successfully, you can use Mixpanel's Insights report and filter the historical events by properties such as Warehouse Import ID or Warehouse Import Job ID.

    (Data Pipelines FAQ)
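    Limitation 2 in the list above is what decides the original question. As a rough sketch, assuming the "up to 10 days in the past" window is counted in whole days back from today and is inclusive at both ends (an assumption; the boundary is not spelled out here), the following Python splits a backfilled date range into days the daily sync could still re-check and days it likely would not. `covered_by_sync`, `split_import_range`, and `SYNC_LOOKBACK_DAYS` are hypothetical names for illustration, not part of any Mixpanel API.

```python
from datetime import date, timedelta

# Assumption: "up to 10 days in the past" read as an inclusive window of whole days.
SYNC_LOOKBACK_DAYS = 10  # hypothetical constant mirroring limitation 2


def covered_by_sync(event_day: date, today: date, lookback_days: int = SYNC_LOOKBACK_DAYS) -> bool:
    """True if event_day is recent enough for the daily sync to re-check it."""
    age_in_days = (today - event_day).days
    return 0 <= age_in_days <= lookback_days


def split_import_range(start: date, end: date, today: date) -> tuple[list[date], list[date]]:
    """Split the days of an import into (likely re-synced, likely not re-synced)."""
    synced: list[date] = []
    not_synced: list[date] = []
    day = start
    while day <= end:
        if covered_by_sync(day, today):
            synced.append(day)
        else:
            not_synced.append(day)
        day += timedelta(days=1)
    return synced, not_synced


# Example: one year of history (365 days ending today) imported today.
today = date(2024, 6, 30)
synced, not_synced = split_import_range(today - timedelta(days=364), today, today)
print(len(synced), len(not_synced))  # → 11 354
```

    Under this reading, only the most recent 11 days of a year-long backfill fall inside the sync window. The sketch ignores the 24-hour delay before late-data checks begin (limitation 3), so in practice coverage could be slightly smaller.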

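    For reference, here is a hedged sketch of where the "sync" flag sits when a pipeline is created. It only builds the HTTP request with Python's standard library and never sends it. The endpoint URL, the `type` and `from_date` parameter names, and service-account Basic authentication are assumptions recalled from the public Data Pipelines API reference and should be checked against the current docs; only the `sync` parameter comes from the quoted documentation. `build_create_pipeline_request` is a hypothetical helper.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: endpoint and parameter names (other than "sync") recalled from the
# public Data Pipelines API reference; verify against the current docs.
CREATE_PIPELINE_URL = "https://data.mixpanel.com/api/2.0/nessie/pipeline/create"


def build_create_pipeline_request(username: str, secret: str, from_date: str) -> Request:
    """Build (but do not send) a POST request that creates a BigQuery pipeline with sync on."""
    params = {
        "type": "bigquery",      # destination warehouse (assumed value)
        "from_date": from_date,  # start of the export window, YYYY-MM-DD (assumed name)
        "sync": "true",          # re-export days whose data changes in Mixpanel
    }
    # Assumption: service-account credentials sent as HTTP Basic auth.
    token = base64.b64encode(f"{username}:{secret}".encode()).decode()
    return Request(
        CREATE_PIPELINE_URL,
        data=urlencode(params).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


req = build_create_pipeline_request("service-account-user", "service-account-secret", "2023-07-01")
print(req.get_method(), req.data.decode())
# → POST type=bigquery&from_date=2023-07-01&sync=true
```

    Passing `req` to `urllib.request.urlopen` would actually send it; keeping construction separate makes the parameters easy to inspect before anything reaches Mixpanel.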