We do it through the backend, and Mixpanel records events for each A/B test with the variant as a property (I then build user cohorts based on it)
The challenge is seeing whether the difference between variants is statistically significant, which is why I use Mixpanel's Experiment feature to calculate the significance
However, as far as I know, it only allows 2 variants, which means we have to find a way to analyse a test that involves 3 variants
Especially because the more variants there are, the more comparisons we have to make (e.g. A vs B, A vs C, B vs C), which increases the chance of a type 1 error, so some correction needs to be applied
If you know a way of analysing it, I'm all ears!
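To illustrate the kind of correction meant here, below is a rough sketch rather than a definitive method: pairwise two-proportion z-tests on conversion counts, followed by a Holm-Bonferroni adjustment of the p-values. Everything in it is an assumption for illustration: the counts are made up, the function names are hypothetical, and the per-variant numbers would have to be exported from Mixpanel by hand, since none of this uses a Mixpanel API.

```python
from itertools import combinations
from math import erfc, sqrt


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on;
        # the running max keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def pairwise_comparisons(results):
    """results: {variant: (conversions, users)} -> {(v1, v2): (raw_p, adjusted_p)}"""
    pairs = list(combinations(results, 2))
    raw = [two_proportion_p_value(*results[a], *results[b]) for a, b in pairs]
    return dict(zip(pairs, zip(raw, holm_adjust(raw))))


# Hypothetical counts, not real data
example = {"A": (120, 1000), "B": (150, 1000), "C": (135, 1000)}
for pair, (raw_p, adj_p) in pairwise_comparisons(example).items():
    print(pair, round(raw_p, 4), round(adj_p, 4))
```

A pair would then count as significant only if its adjusted p-value is below the chosen threshold (e.g. 0.05). Holm is used here because it is never less powerful than plain Bonferroni while still controlling the family-wise error rate, but other corrections would work too.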