Event Import and Deduplication

  • 15 September 2021

Events are duplicated even though they have the same distinct_id, time, $insert_id, and event name. According to the docs, this shouldn’t happen:


We require that $insert_id be specified on every event. $insert_id provides a unique identifier for the event, which we use for deduplication. Events with identical values for (event, time, distinct_id, $insert_id) are considered duplicates; if duplicates exist, our database will pick the most recently ingested one at query-time.
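To make sure I am reading that rule correctly, here is a minimal local sketch of how I understand it. It is not Mixpanel's implementation. The helper names `dedup_key` and `dedupe_latest_wins` are hypothetical, and I am assuming the usual payload shape with `event` at the top level and `time`, `distinct_id`, and `$insert_id` inside `properties`.

```python
from typing import Any


def dedup_key(event: dict[str, Any]) -> tuple:
    """Build the (event, time, distinct_id, $insert_id) tuple from the quoted rule."""
    props = event["properties"]
    return (event["event"], props["time"], props["distinct_id"], props["$insert_id"])


def dedupe_latest_wins(events_in_ingestion_order: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the most recently ingested event for each key.

    Assumption: the list is ordered oldest-ingested first, so a later
    entry with the same key overwrites an earlier one.
    """
    latest: dict[tuple, dict[str, Any]] = {}
    for ev in events_in_ingestion_order:
        latest[dedup_key(ev)] = ev
    return list(latest.values())
```

Under this reading, re-sending an event with the same four values and extra properties should leave exactly one event, the newer one. Any difference in one of the four values, however small, would produce a second key and therefore a second event.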


However, after running a process that is supposed to generate events with the same event name, time, distinct_id, and $insert_id as the originals, I find that both the original and the new event appear in insights. One has the new properties I’m attempting to add retroactively, and the other does not.


Another strange part is that a second process, which I also run via the Import API, performs exactly the same kind of logic: it generates events with the same name, time, distinct_id, and $insert_id, yet those events deduplicate perfectly. Is there some inherent difference between events imported via the Import API and events first added via Track and then re-sent via the Import API?
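For anyone who wants to check the same thing, this is a rough sketch of the comparison I can run on a pair of events that should have deduplicated. The function names are hypothetical, and the payload shape (`event` at the top level, everything else under `properties`) is an assumption. The check is deliberately strict: it flags type differences as well as value differences, because I do not know how the backend treats, for example, a numeric time versus a string one, or seconds versus milliseconds.

```python
from typing import Any

# The four fields the docs say are used for deduplication.
KEY_FIELDS = ("event", "time", "distinct_id", "$insert_id")


def key_fields(event: dict[str, Any]) -> dict[str, Any]:
    """Pull the four deduplication fields out of one event payload."""
    props = event.get("properties", {})
    return {
        "event": event.get("event"),
        "time": props.get("time"),
        "distinct_id": props.get("distinct_id"),
        "$insert_id": props.get("$insert_id"),
    }


def key_mismatches(a: dict[str, Any], b: dict[str, Any]) -> dict[str, tuple]:
    """Return {field: (value_in_a, value_in_b)} for every field that differs.

    A field counts as different if the values are unequal OR their
    Python types differ (e.g. 1631700000 vs "1631700000").
    """
    ka, kb = key_fields(a), key_fields(b)
    diffs: dict[str, tuple] = {}
    for field in KEY_FIELDS:
        va, vb = ka[field], kb[field]
        if va != vb or type(va) is not type(vb):
            diffs[field] = (va, vb)
    return diffs
```

An empty result means the two payloads agree on all four fields as far as this local check can tell. A non-empty result points at the field to look into first.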

