Mixpanel's Data Pipelines team works with all kinds of users to supply event and user data to various warehouse and storage platforms. Through these conversations, we've seen a set of common questions spanning topics like timestamps, data transformations, and warehouse behaviors.
This article highlights some frequent scenarios our users encounter, along with explanations and troubleshooting tips.
I tried to create a pipeline, but encountered an error
A few common causes of errors during creation relate to trial pipeline expiration, authorization, and Google Groups. The Data Pipelines API is also a reference point for accepted parameters during pipeline creation. Below are some common errors.
{"Error":"the account associated with this project has not purchased the Data Warehouse Export package. You can still use the one-time trial to test the pipeline."}
The above error occurs when creating non-trial pipelines for a Mixpanel organization that does not have the Data Pipelines package enabled. Users may create one trial pipeline per Mixpanel project, after which point opting into the Data Pipelines Package is required to continue using the service.
{"Error":"sharing bigquery with share group failed: failed updating metadata with access: googleapi: Error 400: IAM setPolicy failed for Dataset <dataset>: Account <email> is of type \\"user\\". Please set the type prefix to be \\"user:\\"., invalid"}
The above error occurs when creating BigQuery pipelines and passing an invalid email to the bq_share_with_group parameter. That parameter requires a Google Group email, and will throw errors if passed other forms of email accounts, such as user emails or service account emails.
{"Error":"authorization failed"}
The Data Pipelines API uses basic authorization to process requests. Your project's api_secret is the value for the username field, meaning if your api_secret were 123456789abcdef, the format would look as follows (in this example we're creating a trial BigQuery pipeline):
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u 123456789abcdef: \
-d type="bigquery" \
-d bq_region="US_EAST_4" \
-d trial=true \
-d bq_share_with_group="bq-access-alias@somecompany.com"
Note that the authentication line ends after the colon.
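Under the hood, curl's -u flag builds the basic-auth header by base64-encoding the api_secret followed by a colon (the password is empty). A minimal sketch using the same placeholder secret as above:

```shell
# curl's -u flag is shorthand for this Authorization header: it
# base64-encodes "<api_secret>:" (note the trailing colon, empty password).
API_SECRET="123456789abcdef"   # placeholder secret from the example above
TOKEN=$(printf '%s' "${API_SECRET}:" | base64)
echo "Authorization: Basic ${TOKEN}"
# prints: Authorization: Basic MTIzNDU2Nzg5YWJjZGVmOg==
```

This is why the authentication line ends after the colon: the colon separates the username (your api_secret) from the empty password.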
What is the bq_share_with_group parameter? How do I add users to view my Mixpanel data in BigQuery?
A BigQuery Data Pipeline will write your Mixpanel data to a shared BigQuery instance. Mixpanel uses Google Groups to provision access to the shared dataset.
You must pass a Google Group as the bq_share_with_group parameter. You can learn how to create a new Google Group here. The Google Group passed as a parameter will have access to the dataset.
To add additional users who can view the data, simply add them to the Google Group.
Can I create multiple pipelines?
The Data Pipelines package allows you to create as many pipelines as you like, provided the pipelines' names are unique. Pipeline names are a combination of the following parameters provided during initialization:
- project_id
- data_source
- frequency
- type
- schema_type
- trial
The pipeline name ends up being <project_id-data_source-frequency-type-schema_type> + trial (if trial=true). If you attempt to create a pipeline whose six parameters are set to the same configuration as a pipeline you already have running, you will encounter the error {"Error":"a pipeline with the same configuration already exists"}.
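The naming scheme can be sketched in shell. The parameter values here are hypothetical, and the hyphen before the trial suffix is an assumption:

```shell
# Hypothetical parameter values for illustration.
PROJECT_ID="12345"
DATA_SOURCE="events"
FREQUENCY="daily"
TYPE="bigquery"
SCHEMA_TYPE="monoschema"
TRIAL=true

# Assemble <project_id-data_source-frequency-type-schema_type>,
# then append the trial marker when trial=true (separator assumed).
NAME="${PROJECT_ID}-${DATA_SOURCE}-${FREQUENCY}-${TYPE}-${SCHEMA_TYPE}"
if [ "$TRIAL" = "true" ]; then
  NAME="${NAME}-trial"
fi
echo "$NAME"
# prints: 12345-events-daily-bigquery-monoschema-trial
```

Changing any one of the six parameters yields a distinct name, which is why varying, say, the frequency lets you run multiple pipelines against the same project.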
Why can't I access my Mixpanel data in BigQuery?
If you see the error User does not have bigquery.jobs.create permission in project when attempting to query your BigQuery dataset, it usually means you have not selected the proper BigQuery project in the top navigation bar.

If you still do not have permissions to query the dataset after selecting the correct project, confirm that you are a member of the Google group that has BigQuery Data Viewer permissions on the dataset.
How do I combine my Mixpanel data in BigQuery with other BigQuery datasets?
The following steps outline how to move data from the Mixpanel-shared BigQuery view to your own instance:
- Verify that your Mixpanel BigQuery dataset is in the same region as your own BigQuery instance.
- Create a Scheduled Query in your Mixpanel BigQuery table with a query of the data you want to bring over. There are a few steps and considerations here:
  - Query the data you want to transfer. For instance, if you want to transfer the entire table, query Select * From Dataset.Tablename.
  - Choose the BigQuery dataset in your own BigQuery account as the destination.
  - Overwriting old data with new, incoming updates to the table is preferable to strictly appending new data as it arrives. It helps avoid duplicate data and ensures consistency with deletions or queued mobile event ingestion.
  - The cadence of this query can be hourly, daily, weekly, or longer. It's a good idea to schedule the query during your team's off-hours so it does not interfere with normal operations.
Once your queries complete, you'll be able to combine Mixpanel data with any of your other data in BigQuery.
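If you prefer the command line to the Scheduled Query UI, the same overwrite-style copy can be expressed with the bq CLI. This sketch only assembles and prints the command rather than executing it; the dataset and table names are placeholders:

```shell
# Placeholder names; substitute your shared Mixpanel dataset and your
# own destination dataset/table.
SRC="mixpanel_dataset.mp_master_event"
DEST="my_dataset.mixpanel_events_copy"

# --replace overwrites the destination table on each run, mirroring the
# "overwrite rather than append" recommendation above.
CMD="bq query --use_legacy_sql=false --destination_table=${DEST} --replace 'SELECT * FROM ${SRC}'"
echo "$CMD"
```

Wrapping a command like this in cron or another scheduler is functionally equivalent to a Scheduled Query, though the UI handles authentication and retries for you.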
Why is my exported data offset from UTC? How does Mixpanel export timestamps?
Data Pipelines exports event timestamps using epoch UTC time. This timestamp does not reflect project timezone.
While the timestamp on the event is UTC, Mixpanel partitions data using your project's timezone. This can lead to UTC timestamps from a date that differs from the partition date. If you convert the timestamp to your project time, however, there will be no more offset as the timezones match.
As a result, you can use built-in tools that expect UTC without having to alter the timestamp. Alternatively, you can write queries against the partition to easily reflect the data as seen in Mixpanel.
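To see the UTC-versus-partition effect concretely, here is a GNU date sketch: an event at 02:00 UTC on Nov 15 lands in the Nov 14 partition for a project set to US Pacific time (the timezone is a hypothetical example):

```shell
# Epoch timestamp 1700013600 = 2023-11-15 02:00:00 UTC.
TS=1700013600

# The raw exported timestamp, rendered in UTC:
date -u -d "@${TS}" '+%Y-%m-%d %H:%M:%S'
# prints: 2023-11-15 02:00:00

# The same instant in a hypothetical project timezone (US Pacific),
# which determines the partition date:
TZ="America/Los_Angeles" date -d "@${TS}" '+%Y-%m-%d %H:%M:%S'
# prints: 2023-11-14 18:00:00
```

The exported timestamp (Nov 15 UTC) and the partition date (Nov 14 project time) refer to the same event; converting the timestamp to project time removes the apparent offset.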
Why don't I see a certain property or event name in my table?
Oftentimes a missing column name is the result of the transformations Data Pipelines applies to your raw data to create warehouse-compatible naming conventions. In these cases, the data may have a reformatted name in the warehouse schema (for example, "Song Played" becomes "song_played"). You can read the full set of transformation rules here.
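The full rule set is linked above; the most common case, lowercasing and replacing spaces with underscores, can be approximated like this (a simplified sketch, not the complete transformation logic):

```shell
# Simplified approximation of the warehouse renaming rules: lowercase
# the event name and replace spaces with underscores.
EVENT="Song Played"
COLUMN=$(printf '%s' "$EVENT" | tr '[:upper:]' '[:lower:]' | tr ' ' '_')
echo "$COLUMN"
# prints: song_played
```

If you can't find a property, try searching your warehouse schema for the lowercased, underscored form of its Mixpanel name first.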
Why is data missing from my warehouse compared to my Mixpanel project?
There are multiple Data Pipelines behaviors that can lead to discrepancies between your warehouse and the data seen in your Mixpanel project. Below are a set of common scenarios that contribute to this difference.
Data Pipelines do not export modeling layer data
In addition to raw data sent through SDKs and HTTP calls, Mixpanel allows users to create several types of data within the product, including:
- Session Events
- Custom Events
- Custom Properties
- Merged Events / Properties
- First Seen User Properties
These computed "modeling layer" values are separate from the raw data exported via Data Pipelines, and as a result will not propagate to your warehouse. Using Lexicon is an effective way to understand what events and properties are custom transformations versus raw data.
The pipeline hasn't run its most recent sync operation yet
In addition to running exports for the most recent day or hour of your Mixpanel data, Data Pipelines may also be configured to perform syncing operations to patch in latent data.
For a variety of reasons, data in Mixpanel may end up differing from your recurring exports - examples can be found here. When those situations alter your data history, pipelines created with a "sync=true" parameter will perform the necessary exports / deletions to put the warehouse back in line with your Mixpanel data set.
The Data Pipelines API has a timeline endpoint to see when the most recent sync operations have run; oftentimes a pending sync operation will resolve the data discrepancy.
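As a sketch, checking the timeline follows the same basic-auth pattern as the create call earlier. The endpoint path and name parameter below are assumptions, so consult the Data Pipelines API reference for the authoritative request shape; this snippet only assembles and prints the command:

```shell
# Assemble (but don't execute) a timeline request. The "pipeline/timeline"
# path and the "name" parameter are assumptions modeled on the create call
# shown earlier; verify against the Data Pipelines API reference.
API_SECRET="123456789abcdef"                          # placeholder
PIPELINE_NAME="12345-events-daily-bigquery-monoschema" # placeholder
CMD="curl https://data.mixpanel.com/api/2.0/nessie/pipeline/timeline -u ${API_SECRET}: -d name=${PIPELINE_NAME}"
echo "$CMD"
```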
Your Data Pipeline is running the trial version
We offer a free trial of Data Pipelines for users to sample the product before opting into the full add-on. Trial pipelines allow an explicit set of parameters, which you can find here. Some of these parameters will affect the set of data seen in the warehouse; e.g., sync functionality is disabled, and the pipeline does not backfill historical windows.