kapa.ai

Here's our current retry implementation:

Proxy Retry Logic (MixpanelProxyClient)

    // Constants
    private val RETRYABLE_STATUS_CODES = setOf(429, 502, 503, 504)
    private const val RATE_LIMIT_BODY_MARKER = "Too many requests"
    private const val MAX_RETRIES = 6
    private const val INITIAL_DELAY_MS = 2_000L   // Start at 2s
    private const val MAX_DELAY_MS = 60_000L      // Cap at 60s
    private const val JITTER_RANGE_MS = 5_000L    // Upper bound of the 1-5s random jitter

    // Retry loop inside forward()
    lateinit var lastResponse: MixpanelProxyResponse
    for (attempt in 0..MAX_RETRIES) {
        if (attempt > 0) {
            // Exponential backoff: 2s → 4s → 8s → 16s → 32s → 60s (capped), plus 1-5s jitter
            val exponentialDelay = (INITIAL_DELAY_MS * (1 shl (attempt - 1))).coerceAtMost(MAX_DELAY_MS)
            val jitter = Random.nextLong(1_000L, JITTER_RANGE_MS + 1)
            val delayMs = exponentialDelay + jitter
            delay(delayMs)
        }

        val response = try {
            // Forward to api.mixpanel.com
            httpClient.post(targetUrl) { ... }
        } catch (e: Exception) {
            // Network failure → synthetic 502, will be retried
            MixpanelProxyResponse(statusCode = HttpStatusCode.BadGateway, body = """{"error": "proxy_error"}""")
        }
        lastResponse = response

        // Normalize: 504 with "Too many requests" body → 429
        // (Mixpanel returns 429 but an intermediary LB/proxy surfaces it as 504)
        val status = response.statusCodeValue
        val normalizedStatus =
            if (status != 429 && response.body.contains(RATE_LIMIT_BODY_MARKER)) 429 else status

        // Never retry 400: validation errors always fail and waste rate-limit quota
        if (normalizedStatus == 400) return response

        // Success or non-retryable error → return immediately
        if (normalizedStatus !in RETRYABLE_STATUS_CODES) return response

        // Otherwise: retryable (429/502/503/504) → loop continues with backoff
    }

    // All 6 retries exhausted → return last response (biz layer logs + Slack alerts)
    return lastResponse

Backoff Sequence (per attempt)

┌─────────┬──────────────┬─────────────────┬─────────────┐
│ Attempt │ Base Delay   │ + Jitter (1-5s) │ Total Range │
├─────────┼──────────────┼─────────────────┼─────────────┤
│ 1       │ 2s           │ 1-5s            │ 3-7s        │
├─────────┼──────────────┼─────────────────┼─────────────┤
│ 2       │ 4s           │ 1-5s            │ 5-9s        │
├─────────┼──────────────┼─────────────────┼─────────────┤
│ 3       │ 8s           │ 1-5s            │ 9-13s       │
├─────────┼──────────────┼─────────────────┼─────────────┤
│ 4       │ 16s          │ 1-5s            │ 17-21s      │
├─────────┼──────────────┼─────────────────┼─────────────┤
│ 5       │ 32s          │ 1-5s            │ 33-37s      │
├─────────┼──────────────┼─────────────────┼─────────────┤
│ 6       │ 60s (capped) │ 1-5s            │ 61-65s      │
└─────────┴──────────────┴─────────────────┴─────────────┘

Total retry window: 128-152s of backoff (roughly 2 to 2.5 minutes) before giving up, not counting the time spent on the requests themselves.

What We Handle

- 429 / 502 / 503 / 504 → retry with exponential backoff
- 504 with "Too many requests" body → normalize to 429, then retry (catches LB-masked rate limits)
- 400 (validation error) → never retry, return immediately (per Mixpanel docs)
- Network exceptions → treated as 502, retried with same backoff
- Jitter → random 1-5s per attempt to desynchronize parallel requests

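The backoff table can be reproduced with a few lines of arithmetic. The sketch below is a minimal, self-contained illustration that reuses the constant values from the snippet above. The function names (`baseDelayMs`, `delayRangeMs`, `totalBackoffMs`) and the two explicit jitter bounds are assumptions made for this example, not part of the proxy code.

```kotlin
// Values copied from the retry constants above; the jitter bounds are
// written as two constants here purely for readability (an assumption).
const val INITIAL_DELAY_MS = 2_000L
const val MAX_DELAY_MS = 60_000L
const val JITTER_MIN_MS = 1_000L
const val JITTER_MAX_MS = 5_000L

/** Base delay (without jitter) before retry number `attempt`, counted from 1. */
fun baseDelayMs(attempt: Int): Long =
    (INITIAL_DELAY_MS shl (attempt - 1)).coerceAtMost(MAX_DELAY_MS)

/** Smallest and largest possible wait before retry number `attempt`. */
fun delayRangeMs(attempt: Int): LongRange =
    (baseDelayMs(attempt) + JITTER_MIN_MS)..(baseDelayMs(attempt) + JITTER_MAX_MS)

/** Best-case and worst-case total backoff summed over all retries. */
fun totalBackoffMs(maxRetries: Int): LongRange {
    val ranges = (1..maxRetries).map(::delayRangeMs)
    return ranges.sumOf { it.first }..ranges.sumOf { it.last }
}
```

With these definitions, `delayRangeMs(6)` is `61000..65000` and `totalBackoffMs(6)` is `128000..152000`, so under those constants the proxy waits between about 2.1 and 2.5 minutes in total, excluding request time.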
Current Retry Implementation

Proxy Layer (MixpanelProxyClient)

- Retryable status codes: 429, 502, 503, 504
- Body-based detection: also retries when the response body contains "Too many requests" (catches 504s that are actually rate limits from intermediary LBs)
- Status normalization: if the body says "Too many requests" but the status is not 429, we normalize it to 429 before returning to the client
- 3 retries with fixed delays: 30s → 1m → 2m (+ up to 25% random jitter)
- Total retry window: ~3.5 minutes before giving up
- 400 errors are never retried; they are logged as validation errors and returned immediately (per Mixpanel's recommendation)
- Network exceptions (connection failures) are also retried with the same backoff

kapa.ai

Error Classification (Biz Layer)

After all retries are exhausted, we classify and alert:

- RATE_LIMITED: 429/503/504 or body-detected rate limits
- NETWORK_ERROR: genuine 502s (not rate-limit related)
- VALIDATION_ERROR: 400s from Mixpanel (not retried)
- MIXPANEL_API_ERROR: any other non-2xx

What We Don't Do Yet

- No gzip compression on forwarded requests
- No batching/coalescing: each SDK request is forwarded 1:1
- No concurrency limiting: all incoming requests hit Mixpanel in parallel
- No hot shard protection: no per-distinct_id daily event volume tracking

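The classification rules in the "Error Classification (Biz Layer)" list can be written as a single `when` expression. This is only a sketch of those rules as stated. The enum name `ProxyErrorType`, the function `classify`, and the constant `RATE_LIMIT_MARKER` are hypothetical names chosen for illustration, and the real biz-layer code may order its checks differently.

```kotlin
/** Error categories from the classification list (the enum name is hypothetical). */
enum class ProxyErrorType { RATE_LIMITED, NETWORK_ERROR, VALIDATION_ERROR, MIXPANEL_API_ERROR }

// Assumed marker, taken from the "Too many requests" body described above.
const val RATE_LIMIT_MARKER = "Too many requests"

/** Returns the error category for a final response, or null for a 2xx success. */
fun classify(statusCode: Int, body: String): ProxyErrorType? = when {
    statusCode in 200..299 -> null
    // Body-detected rate limits are checked before the status code, so a
    // 502 or 504 carrying the marker is not reported as a network or API error.
    body.contains(RATE_LIMIT_MARKER) -> ProxyErrorType.RATE_LIMITED
    statusCode == 429 || statusCode == 503 || statusCode == 504 -> ProxyErrorType.RATE_LIMITED
    statusCode == 502 -> ProxyErrorType.NETWORK_ERROR
    statusCode == 400 -> ProxyErrorType.VALIDATION_ERROR
    else -> ProxyErrorType.MIXPANEL_API_ERROR
}
```

Checking the body marker first is a design choice in this sketch: it keeps a masked rate limit from falling through to `NETWORK_ERROR` or `MIXPANEL_API_ERROR` when an intermediary changes the status code.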
I'm suddenly getting a lot of these errors:

[PROD] Mixpanel Proxy Error: MIXPANEL_API_ERROR 🚨
<@S0ACHGFHGS0> cc: @Mani - Attention needed!
Error Type: MIXPANEL_API_ERROR
Path: track
Message:
Mixpanel returned 504
statusCode: 504
responseBody: `Too many requests `

[5:56 PM] 🔴 [PROD] Mixpanel Proxy Error: MIXPANEL_API_ERROR 🚨
<@S0ACHGFHGS0> cc: @Mani - Attention needed!
Error Type: MIXPANEL_API_ERROR
Path: track
Message:
Mixpanel returned 504
statusCode: 504
responseBody: `Too many requests `

[5:57 PM] 🔴 [PROD] Mixpanel Proxy Error: MIXPANEL_API_ERROR 🚨
<@S0ACHGFHGS0> cc: @Mani - Attention needed!
Error Type: MIXPANEL_API_ERROR
Path: track
Message:
Mixpanel returned 504
statusCode: 504
responseBody: `Too many requests `

[5:57 PM] 🔴 [PROD] Mixpanel Proxy Error: MIXPANEL_API_ERROR 🚨
<@S0ACHGFHGS0> cc: @Mani - Attention needed!
Error Type: MIXPANEL_API_ERROR
Path: track
Message:
Mixpanel returned 504
statusCode: 504
responseBody: `Too many requests `

[5:57 PM] 🔴 [PROD] Mixpanel Proxy Error: MIXPANEL_API_ERROR 🚨
<@S0ACHGFHGS0> cc: @Mani - Attention needed!
Error Type: MIXPANEL_API_ERROR
Path: track
Message:
Mixpanel returned 504
statusCode: 504
responseBody: `Too many requests `

[5:57 PM] 🔴 [PROD] Mixpanel Proxy Error: MIXPANEL_API_ERROR 🚨
<@S0ACHGFHGS0> cc: @Mani - Attention needed!
Error Type: MIXPANEL_API_ERROR
Path: track
Message:
Mixpanel returned 504
statusCode: 504
responseBody: `Too many requests `

[5:57 PM] 🔴 [PROD] Mixpanel Proxy Error: MIXPANEL_API_ERROR 🚨
<@S0ACHGFHGS0> cc: @Mani - Attention needed!
Error Type: MIXPANEL_API_ERROR
Path: engage
Message:
Mixpanel returned 504
statusCode: 504
responseBody: `Too many requests `

[5:57 PM] 🔴 [PROD] Mixpanel Proxy Error: MIXPANEL_API_ERROR 🚨
<@S0ACHGFHGS0> cc: @Mani - Attention needed!
Error Type: MIXPANEL_API_ERROR
Path: track
Message:
Mixpanel returned 504
statusCode: 504
responseBody: `Too many requests `
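
The alerts above report status 504 together with a "Too many requests" body, which is the case the status normalization described earlier is meant to catch. As a minimal sketch of that normalization, assuming a simple `ProxyResponse` data class as a stand-in for the real response type (both it and `normalizeRateLimit` are hypothetical names), the rewritten status can be carried on the returned response so that later classification sees 429 rather than 504:

```kotlin
/** Hypothetical stand-in for the proxy's response type. */
data class ProxyResponse(val statusCode: Int, val body: String)

// Marker string taken from the response body shown in the alerts.
const val TOO_MANY_REQUESTS_MARKER = "Too many requests"

/** Rewrites a masked rate limit to 429 and leaves every other response untouched. */
fun normalizeRateLimit(response: ProxyResponse): ProxyResponse =
    if (response.statusCode != 429 && response.body.contains(TOO_MANY_REQUESTS_MARKER)) {
        response.copy(statusCode = 429)
    } else {
        response
    }
```

For a body like the one in the alerts (assuming the trailing whitespace is a newline), `normalizeRateLimit(ProxyResponse(504, "Too many requests\n")).statusCode` is 429, while a response without the marker is returned unchanged.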
