I've exported some of our events using the Export API, and there are so many duplicates. Is this expected behaviour?
Hi Dwi A., Please stand by as I search the Mixpanel knowledge base for an AI-powered answer. I normally take about 15 seconds or so. In case AI cannot answer your question, feel free to reach out to Mixpanel support at mixpanel.com/get-support. If you have follow-up questions for this thread, please tag me again with your question: @kapa.ai <follow-up question> 👋
Feel free to ask a follow-up question once I've answered by mentioning me with @kapa.ai
Yes, that’s expected. Mixpanel’s deduplication is applied in the UI and internal storage, not to raw exports. Raw data exported via:
- the Raw Event Export API [Export methods]
- or Data Pipelines / warehouse tables [Common SQL queries]
can contain duplicate events.
Per the deduplication docs, raw exports are explicitly not deduplicated, and you are expected to deduplicate on your side using these four fields: event (or event_name), distinct_id, time, and $insert_id/insert_id. [Event deduplication; Important notes]
If you’re querying exported data (e.g. in a warehouse), Mixpanel recommends counting distinct combinations of those four properties, for example:
COUNT(DISTINCT CONCAT(event_name, time, distinct_id, insert_id))
to get a deduplicated event count. [Common SQL queries]
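For exports processed outside a warehouse, the same four-field rule can be sketched in Python. This is a minimal sketch, not an official Mixpanel utility: it assumes the export was saved as JSON lines where each record has an `event` name and a `properties` object holding `time`, `distinct_id`, and `$insert_id`. The helper name `dedupe_events` and the sample records are hypothetical.

```python
import json
from typing import Iterable, Iterator


def dedupe_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each exported event once, keyed on the four deduplication fields."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export file
        record = json.loads(line)
        props = record.get("properties", {})
        # Assumed record layout: event name at the top level, the rest in properties.
        key = (
            record.get("event"),
            props.get("distinct_id"),
            props.get("time"),
            props.get("$insert_id"),
        )
        if key in seen:
            continue  # same (event, distinct_id, time, $insert_id): a duplicate
        seen.add(key)
        yield record


# Hypothetical sample: the first two lines are the same event exported twice.
sample = [
    '{"event": "Sign Up", "properties": {"distinct_id": "u1", "time": 1700000000, "$insert_id": "a1"}}',
    '{"event": "Sign Up", "properties": {"distinct_id": "u1", "time": 1700000000, "$insert_id": "a1"}}',
    '{"event": "Purchase", "properties": {"distinct_id": "u1", "time": 1700000050, "$insert_id": "b2"}}',
]
unique = list(dedupe_events(sample))
```

Using a tuple as the key keeps the four values separate, which avoids the ambiguity that plain string concatenation without a separator could introduce.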
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
Why should we use the combination of these 4 fields?
event_name, time, distinct_id, insert_id
insert_id is already unique, no?
Let me check the knowledge base for you again. One moment! 📚
$insert_id is meant to be unique per event, but in practice it is not guaranteed to be globally unique and stable on its own, so Mixpanel’s deduplication is defined over all four fields together: event (event name) + distinct_id + time + $insert_id. [Event deduplication; How deduplication works]
From the docs:
- Deduplication “occurs when a subset of the event data (event name, distinct_id, timestamp, $insert_id) is identical. Other event properties are not considered.” [How deduplication works]
- Raw exports are not deduplicated, and Mixpanel recommends you “apply the same deduplication logic (event name, distinct_id, timestamp, $insert_id) to your exported data.” [Important notes]
- $insert_id is “a unique identifier for the event, used to deduplicate events that are accidentally sent multiple times,” and “Events with identical values for (event, time, distinct_id, $insert_id) are considered duplicates.” [Import validation; Reserved properties]
So, even though $insert_id should be unique, the official deduplication contract (both in the Mixpanel UI and for your own SQL/exports) is defined on the combination of the four fields. Using all four matches how Mixpanel itself deduplicates and protects you from edge cases where $insert_id might be reused or collide.
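To make the edge case concrete, here is a small Python sketch comparing a count keyed on $insert_id alone with a count keyed on all four fields. The data is invented for illustration: it assumes a client that reused the same $insert_id for two events with different timestamps, plus one genuine duplicate.

```python
# Hypothetical exported rows: (event, distinct_id, time, insert_id).
rows = [
    ("Purchase", "u1", 1700000000, "x"),
    ("Purchase", "u1", 1700000000, "x"),  # genuine duplicate of the first row
    ("Purchase", "u1", 1700000300, "x"),  # different event that reused insert_id "x"
]

# Counting on insert_id alone merges the reused ID into a single event.
by_insert_id = len({insert_id for _, _, _, insert_id in rows})

# Counting on all four fields removes only the genuine duplicate.
by_four_fields = len(set(rows))
```

With this sample, the insert_id-only count collapses all three rows into one event, while the four-field count keeps the two distinct purchases.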
