I am seeing some odd behaviour with the data I get from Mixpanel. Mixpanel sends data hourly to an S3 bucket. The data is being dumped, but I'm noticing that yesterday's data file is dumped today, when it should have been dumped to S3 by Mixpanel yesterday. I am seeing this more and more often. What could be the reasons behind this?
Hi Saurabh, Please stand by as I search the Mixpanel knowledge base for an AI-powered answer. I normally take about 15 seconds or so. In case AI cannot answer your question, feel free to reach out to Mixpanel support at mixpanel.com/get-support. If you have follow-up questions for this thread, please tag me again with your question: @kapa.ai <follow-up question> 👋
There are a few documented reasons why an “hourly” Mixpanel pipeline to S3 can appear to deliver yesterday’s data today:

1. Export timing vs. your expectation
- Hourly and daily exports are targeted to run on a schedule, but not at the exact boundary you might expect. Daily exports, for example, are targeted for 00:30 in the project timezone for the previous day’s data. [Export timing]
- With incremental JSON pipelines, the next run starts only after the previous one finishes, so “hourly” does not mean exactly every hour on the clock; run times can drift. [Incremental pipelines; Hourly variance]

2. Late‑arriving data
- Events can arrive late (e.g., mobile clients offline, imports for past dates). For incremental JSON pipelines, late‑arriving events are exported whenever they are ingested, regardless of how old they are. That means a file written “today” can legitimately contain events whose event time is “yesterday” (or earlier). [Late data behavior; Late-arriving data FAQ]
- This can look like “yesterday’s data file dumped today,” but it’s actually new late data for that past period being exported now.

3. Data sync / incremental behavior changes
- Older pipelines with sync would re‑sync days to fix late data, sometimes replacing files for a day. With incremental JSON pipelines, Mixpanel no longer replaces files; instead, it appends new files for each run. So you will see additional files for the same date appearing later. [Incremental changes]
- This is expected and is how late data is kept in sync with Mixpanel.

4. SLA and allowed latency
- Mixpanel’s SLA allows up to 24 hours latency for on‑time data, plus another 24 hours for late‑arriving data (for pipelines with sync). [SLA]
- So it is within spec for data from “yesterday” (by event time) to be exported during a job that runs “today.”

5. Timezone differences
- Data is exported in UTC, but your project and your expectations may be in another timezone. What you consider “yesterday” in local time may span two different UTC dates, so files written “today” in UTC can still correspond to “yesterday” in your local view. [Export timezone]

In practice, what you’re seeing is usually a combination of:
- Late‑arriving events being exported when they finally reach Mixpanel.
- Incremental export behavior appending new files for past dates.
- The allowed 24–48h export window and timezone differences.

If you need to debug a specific hour/day, compare:
- Event ingestion time vs. event time in your warehouse.
- The S3 object timestamps vs. the event time field (UTC).
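The comparison suggested above could be sketched roughly as follows in Python. This is only an illustration under assumptions: each exported event is taken to be a dict holding an event time and an ingestion time as epoch seconds, and the field names `time` and `ingested_at` are placeholders, so substitute whatever your export actually contains. `lateness_report` is a hypothetical helper, not part of any Mixpanel library.

```python
from datetime import datetime, timezone


def to_utc(epoch_seconds: float) -> datetime:
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def lateness_report(events, event_time_key="time", ingest_time_key="ingested_at"):
    """Compare event time with ingestion time for each event.

    Both key names are assumptions for this sketch; adjust them to the
    fields present in your exported files.
    """
    rows = []
    for ev in events:
        event_dt = to_utc(ev[event_time_key])
        ingest_dt = to_utc(ev[ingest_time_key])
        rows.append({
            "event_date_utc": event_dt.date().isoformat(),
            "ingest_date_utc": ingest_dt.date().isoformat(),
            "delay_hours": round((ingest_dt - event_dt).total_seconds() / 3600, 2),
            # True when the event landed on a later UTC date than it occurred.
            "crossed_utc_day": ingest_dt.date() > event_dt.date(),
        })
    return rows


# Made-up sample data: one on-time event and one that arrived 30 hours late.
sample = [
    {"time": 1700000000, "ingested_at": 1700000060},
    {"time": 1700000000, "ingested_at": 1700000000 + 30 * 3600},
]
report = lateness_report(sample)
for row in report:
    print(row)
```

Rows where `crossed_utc_day` is true are the ones that would show up in a file written on a later date than the event itself, which matches the "yesterday's data dumped today" symptom.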
kapa.ai is still learning and improving, please let me know how I did by reacting below.
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
But I am doing a funnel analysis on the current date using hourly data from the Mixpanel-to-S3 pipeline, and if the events associated with a given hour are missing, the analysis will look wrong. That's why I want the data to be in sync.
Let me check the knowledge base for you again. One moment! 📚
You’re running into a real limitation of how Mixpanel pipelines work: they are not designed to be perfectly real‑time or strictly in‑sync per hour.

A few key points from the docs:

1. Pipelines have latency by design
- Mixpanel’s SLA allows up to 24 hours for on‑time data to be exported, plus another 24 hours for late‑arriving data (for pipelines with sync). [SLA]
- So an event that happened at 10:05 today may not appear in S3 until many hours later, even if your pipeline is “hourly”.

2. Late‑arriving data will always break strict hourly completeness
- Late data (offline devices, backfills, imports) is common.
- For incremental JSON pipelines, late events are exported whenever they arrive, regardless of how old they are. That means new files for “past” hours/days will keep appearing. [Incremental pipelines; Late-arriving data]
- This behavior is intentional to maximize completeness, but it means you cannot rely on S3 to be “final” for a given hour right away.

3. “Hourly” describes cadence, not analysis‑ready freshness
- With incremental pipelines, the next run starts after the previous one finishes, so “hourly” is approximate and can drift. [Incremental pipelines; Hourly variance FAQ]
- Combined with the SLA, this makes near‑real‑time hourly funnel analysis in S3 inherently unreliable.

Given this, if you need accurate, current‑day hourly funnel analysis, the options (from the docs) are essentially:
- Use Mixpanel UI / APIs directly for “live” analysis, and treat the S3 pipeline as a slightly delayed, eventually consistent copy for warehousing and offline reporting.
- Or, if you must use S3, build a delay into your reporting window (e.g., only trust data up to “now minus 24h” or “now minus X hours”) so that most late data has arrived, accepting that it will never be perfectly in‑sync.

The documentation does not describe any configuration that guarantees fully in‑sync, real‑time hourly exports for funnel analysis; the SLA and late‑data behavior explicitly allow the kind of lag you’re seeing. [SLA; Incremental pipelines]
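One way the “now minus X hours” reporting delay might look in Python is shown below. It is a minimal sketch, not something taken from the Mixpanel docs: `latest_trusted_hour` and `is_trusted` are hypothetical helpers, and the 24-hour default simply mirrors the example figure above. Hours are treated as UTC buckets because the export is in UTC.

```python
from datetime import datetime, timedelta, timezone


def latest_trusted_hour(now: datetime, settle_hours: int = 24) -> datetime:
    """Return the start of the newest UTC hour bucket that ended at least
    `settle_hours` before `now`. The settle window is an assumption to tune."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    cutoff = now.astimezone(timezone.utc) - timedelta(hours=settle_hours)
    # Floor to the hour, then step back one hour so the whole bucket
    # [start, start + 1h) lies at or before the cutoff.
    floored = cutoff.replace(minute=0, second=0, microsecond=0)
    return floored - timedelta(hours=1)


def is_trusted(hour_start: datetime, now: datetime, settle_hours: int = 24) -> bool:
    """True if the hour bucket starting at `hour_start` is old enough to report on."""
    return hour_start <= latest_trusted_hour(now, settle_hours)


# Example with a fixed clock so the result is reproducible.
now = datetime(2024, 1, 2, 15, 30, tzinfo=timezone.utc)
print(latest_trusted_hour(now))  # newest hour bucket the funnel report should include
print(is_trusted(datetime(2024, 1, 1, 15, 0, tzinfo=timezone.utc), now))
```

A funnel report built on S3 data would then only include hour buckets for which `is_trusted` returns true. Late events can still arrive after the window, so this reduces the problem rather than removing it.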
