I just set up a data pipeline to bring the Mixpanel mp_master_event table into BigQuery. However, it only brought in today’s data. How can I run a historical backfill?
Hi Elena M., Please stand by as I search the Mixpanel knowledge base for an AI-powered answer. I normally take about 15 seconds or so. In case AI cannot answer your question, feel free to reach out to Mixpanel support at mixpanel.com/get-support. If you have follow-up questions for this thread, please tag me again with your question: @kapa.ai <follow-up question> 👋
To run a historical backfill for your Mixpanel data pipeline into BigQuery, you need to create or update your pipeline so that it specifies a historical date range using the `from_date` parameter. This parameter sets the start date for the data export and lets you backfill up to 6 months of historical data. The backfill can take several weeks, depending on the amount of data and the number of days being exported.

Here’s what you should do:

1. **Edit or recreate your pipeline**: When setting up the pipeline, include the `from_date` parameter with the desired start date (in YYYY-MM-DD format). For example, to backfill data from January 1, 2024, set `from_date=2024-01-01`. If you want the export to continue indefinitely, leave the `to_date` parameter empty; otherwise, specify an end date.

2. **Limitations**: The `from_date` cannot be more than 6 months in the past. Also, only one date-range based pipeline can be created per 24 hours per project, and only one pipeline step runs at a time per project, so large backfills may take time to complete.

3. **Pipeline creation example (API)**:

```json
{
  "type": "bigquery",
  "from_date": "2024-01-01",
  "bq_region": "US",
  "gcp_project": "<your-gcp-project-id>",
  "bq_dataset_name": "<your-dataset-name>"
  // ...other required parameters
}
```

4. **Monitor progress**: The completion time for a backfill depends on the data volume and date range. Larger backfills can take several weeks to complete.

For more details, see the documentation on [backfilling historical events](https://docs.mixpanel.com/docs/data-pipelines/json-pipelines#backfill-historical-events) and [BigQuery integration setup](https://docs.mixpanel.com/docs/data-pipelines/integrations/bigquery).

If you only see today’s data, it’s likely your pipeline was created without specifying a historical `from_date`, so only new data is being exported. Recreate or update the pipeline with the correct date range to backfill historical data.

**Note:** If you are using the free trial, backfilled data is limited to only one day prior to the creation date of the pipeline, and data sync is unavailable. For full backfill capabilities, ensure you are not on the trial version and have the appropriate plan and permissions enabled ([Free Trial Limitations](https://docs.mixpanel.com/docs/data-pipelines/old-pipelines#free-trial)).

References:
- [Backfill Historical Events](https://docs.mixpanel.com/docs/data-pipelines/json-pipelines#backfill-historical-events)
- [BigQuery Integration](https://docs.mixpanel.com/docs/data-pipelines/integrations/bigquery)
- [Schematized Export Pipeline](https://docs.mixpanel.com/docs/data-pipelines/old-pipelines/schematized-export-pipeline)
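
The date checks described above can be sketched as a small pre-flight helper that runs before you send anything to the API. This is only an illustration: `build_pipeline_params` and `MAX_BACKFILL_DAYS` are hypothetical names, and treating "6 months" as 183 days is an assumption, since the exact cutoff Mixpanel applies is not stated here.

```python
from datetime import date, timedelta

# Assumption: "6 months" is approximated as 183 days; Mixpanel's exact cutoff may differ.
MAX_BACKFILL_DAYS = 183


def build_pipeline_params(from_date, today=None, max_days=MAX_BACKFILL_DAYS):
    """Validate from_date and return the pipeline parameters as a dict (hypothetical helper)."""
    today = today or date.today()
    start = date.fromisoformat(from_date)  # raises ValueError for malformed dates
    if start > today:
        raise ValueError("from_date is in the future")
    if today - start > timedelta(days=max_days):
        raise ValueError("from_date is more than roughly 6 months in the past")
    return {
        "type": "bigquery",
        "from_date": start.isoformat(),
        "bq_region": "US",
        "gcp_project": "<your-gcp-project-id>",
        "bq_dataset_name": "<your-dataset-name>",
        # ...other required parameters, as in the JSON example above
    }
```

Passing `today` explicitly keeps the check deterministic, which is handy if you want to test it.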
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
Let me check the knowledge base for you again. One moment! 📚
If you do not see an option to select a `from_date` when setting up your Mixpanel data pipeline to BigQuery, it is likely because the UI or workflow you are using does not expose this parameter directly. However, the knowledge sources confirm that the `from_date` parameter is required for backfilling historical data and must be specified when creating the pipeline, especially if you want to export data from a date in the past (up to 6 months ago) ([Backfill Historical Events](https://docs.mixpanel.com/docs/data-pipelines/json-pipelines#backfill-historical-events)).

If the UI does not provide this option, you should use the Mixpanel Data Pipelines API to create or update your pipeline, where you can explicitly set the `from_date` parameter. Here’s an example of how the request body should look:

```json
{
  "type": "bigquery",
  "from_date": "2024-01-01",
  "bq_region": "US",
  "gcp_project": "<your-gcp-project-id>",
  "bq_dataset_name": "<your-dataset-name>"
  // ...other required parameters
}
```

If you are limited to the UI and cannot specify `from_date`, you may need to delete the current pipeline and recreate it using the API, or contact Mixpanel support for further assistance. The documentation does not mention a way to set `from_date` after pipeline creation via the UI ([BigQuery Integration](https://docs.mixpanel.com/docs/data-pipelines/integrations/bigquery)).

Summary:
- The from_date parameter is required for backfilling but may only be available via the API.
- If the UI does not expose this option, use the API to create the pipeline with your desired from_date.
- The knowledge sources do not mention a way to set from_date in the UI.
Let me check the knowledge base for you again. One moment! 📚
To retrieve historical Mixpanel event data using Python, you can use the mixpanel-utils module, which provides a convenient way to export events via the Raw Export API. This allows you to specify a date range and other filters.

Here’s an example of how to use mixpanel-utils to export events to a file:

```python
from mixpanel_utils import MixpanelUtils

# Initialize MixpanelUtils with your credentials
mputils = MixpanelUtils(
    'YOUR_SERVICE_ACCOUNT_SECRET',
    token='YOUR_PROJECT_TOKEN',
    service_account_username='YOUR_SERVICE_ACCOUNT_USERNAME',
    project_id=YOUR_PROJECT_ID,  # replace with your numeric project ID
)

# Export events from a specific date range
mputils.export_events(
    'event_export.txt',
    {
        'from_date': '2024-01-01',
        'to_date': '2024-01-31',
        'event': '["App Install"]'  # Optional: specify event(s) or omit to export all
    }
)
```

- The exported data will be written to `event_export.txt` in JSONL format (one event per line).
- You can adjust the parameters to fit your needs (e.g., change the date range, event names, etc.).
- For large exports, you can use options like `add_gzip_header=True` to receive gzipped files.

Note: You must provide your service account secret, project token, service account username, and project ID. The `from_date` and `to_date` parameters are required and must be in YYYY-MM-DD format ([Mixpanel-utils README](https://github.com/mixpanel/mixpanel-utils/blob/master/README.md)).

For more advanced queries, you can also use the JQL API via mixpanel-utils:

```python
script = '''
function main() {
  return Events({
    from_date: '2024-01-01',
    to_date: '2024-01-31'
  })
  .groupBy(["name"], mixpanel.reducer.count());
}'''
mputils.query_jql(script)
```

This will return the JSON response of the JQL query as a Python dictionary (Mixpanel-utils JQL).

If you need to export people (user profile) data, use `export_people` in a similar way.

References:
- mixpanel-utils: Export events
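
For a multi-month backfill, one possible approach is to split the range into smaller windows and call `export_events` once per window, so a failed request only costs one window. The sketch below is not from the mixpanel-utils documentation: `date_chunks` is a hypothetical helper, and the 7-day default is an arbitrary choice.

```python
from datetime import date, timedelta


def date_chunks(from_date, to_date, days=7):
    """Yield (from_date, to_date) string pairs that cover the range in windows of `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = date.fromisoformat(from_date)
    end = date.fromisoformat(to_date)
    while start <= end:
        # Both ends are inclusive, so a 7-day window spans start .. start + 6 days.
        chunk_end = min(start + timedelta(days=days - 1), end)
        yield start.isoformat(), chunk_end.isoformat()
        start = chunk_end + timedelta(days=1)
```

Each pair can then be passed as the `from_date` and `to_date` values of the `export_events` parameters shown earlier, writing each window to its own output file.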
Let me check the knowledge base for you again. One moment! 📚
The `service_account_username` is the username associated with a Mixpanel service account. When you create a service account in your Mixpanel organization or project settings, you are provided with both a username and a secret (password). These credentials are used for authenticating API requests as a non-human entity, such as a script or backend service.

You can view and create service accounts in the Service Accounts tab of your Organization settings, or manage them in your project settings. When initializing tools like mixpanel-utils, you must provide the `service_account_username` along with the service account secret and other required parameters (Service Accounts Documentation, mixpanel-utils README).

> Example usage:
>
> ```python
> mputils = MixpanelUtils(
>     'ServiceAccountSecretHere',
>     token='ProjectTokenHere',
>     service_account_username='my-user.12345.mp-service-account',
>     project_id=1695321,
> )
> ```
>
> Source: [mixpanel-utils README](https://github.com/mixpanel/mixpanel-utils/blob/master/README.md)

**In summary:**
`service_account_username` is the unique identifier (username) for your Mixpanel service account, required for API authentication.
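
If you call the API directly rather than through mixpanel-utils, the username and secret are typically sent together as HTTP Basic auth. The sketch below shows how such a header is built in general; `basic_auth_header` is a hypothetical helper, and it assumes the endpoint you call accepts Basic auth with service account credentials, which you should confirm in the Service Accounts documentation.

```python
import base64


def basic_auth_header(username, secret):
    """Build an HTTP Basic Authorization header from a username and secret."""
    # Basic auth is base64("username:secret"), prefixed with the word "Basic".
    token = base64.b64encode(f"{username}:{secret}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The returned dict can be passed as the headers of whichever HTTP client you use.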
Let me check the knowledge base for you again. One moment! 📚
The knowledge sources confirm that a service account in Mixpanel has both a username and a secret, and these are required for API authentication. However, the sources do not provide detailed instructions on exactly where to find the service account username in the Mixpanel UI. They do state that service accounts can be managed in your organization or project settings, and that you need admin or owner permissions to create or modify them (Service Accounts API Overview).

If you do not see a service account username in your project settings, it may be because:

- You have not created a service account yet.
- You are looking in the wrong section (it may be under organization settings rather than project settings).
- You do not have the required permissions to view or manage service accounts.

The knowledge sources do not provide further detail on the UI location or troubleshooting steps for this scenario. If you cannot find the service account username, you may need to consult Mixpanel’s official documentation or contact Mixpanel support for assistance.
kapa.ai I am getting an error code 200 when doing this:
```python
from mixpanel_utils import MixpanelUtils

# Initialize MixpanelUtils with your credentials
mputils = MixpanelUtils(
    'YOUR_SERVICE_ACCOUNT_SECRET',
    token='YOUR_PROJECT_TOKEN',
    service_account_username='YOUR_SERVICE_ACCOUNT_USERNAME',
    project_id=YOUR_PROJECT_ID,
)

# Export events from a specific date range
mputils.export_events(
    'event_export.txt',
    {
        'from_date': '2024-01-01',
        'to_date': '2024-01-31',
        'event': '["App Install"]'  # Optional: specify event(s) or omit to export all
    }
)
```
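
A note on the status code: in HTTP, 200 is the standard "OK" response, so a 200 here may mean the request succeeded rather than failed. One way to check is to look at what actually landed in the output file. The sketch below counts the exported events; `count_exported_events` is a hypothetical helper, and because the exact file layout is not confirmed in this thread, it accepts either a single JSON array or one JSON object per line.

```python
import json


def count_exported_events(path):
    """Count events in an export file that is either a JSON array or JSONL."""
    with open(path, encoding="utf-8") as f:
        text = f.read().strip()
    if not text:
        return 0  # empty file: nothing matched the date range or event filter
    try:
        data = json.loads(text)
        return len(data) if isinstance(data, list) else 1
    except json.JSONDecodeError:
        # Not a single JSON document, so treat it as one JSON object per line.
        lines = [line for line in text.splitlines() if line.strip()]
        for line in lines:
            json.loads(line)  # raises if a line is not valid JSON
        return len(lines)
```

If the count is zero, the request itself probably worked and the filters are the more likely culprit, for example an event name that does not exist in the project.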