Problem

A/B Testing introduction for a Spanish Translation website

The problem

An e-commerce website is examining purchases from Spanish-speaking countries. The company wants to know whether serving a localized translation (for example, Mexican Spanish for Mexico) instead of a single standardized Spanish translation has any effect on sales. Managers noticed that Spain-based users have a higher conversion rate than users in any other Spanish-speaking country, and they suggested that translation could be one reason: all Spanish-speaking countries originally saw the same translation of the site, which was written by a translator from Spain.

Proposed Solution

They agreed to run a test in which each country would have its own translation written by a local (Argentinian users would see a translation written by an Argentinian, Mexican users one written by a Mexican, and so on), replicating what already happens for users in Spain. Users from Spain would see no change, since their translation is already localized.

A. Hypothesis

Including a localized Spanish translation for each country's dialect will increase conversions for Spanish-speaking countries other than Spain.

B. Metric

We will use conversion as the metric to test our hypothesis. Conversion is measured as a rate: the number of users who sign up on the company's website divided by the number of users who were exposed to a given translation.
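As an illustration of the metric, the sketch below computes a conversion rate per group from (group, converted) pairs. The records are hypothetical toy values, and the names `records` and `conversion_rates` are assumptions made for this example, not names from the original analysis.

```python
from collections import defaultdict

# Hypothetical toy records: (group, converted) pairs; not data from the study.
records = [
    ("control", 1), ("control", 0), ("control", 0), ("control", 1),
    ("treatment", 0), ("treatment", 1), ("treatment", 0), ("treatment", 0),
]


def conversion_rates(rows):
    """Return {group: converted users / exposed users} for (group, flag) pairs."""
    exposed = defaultdict(int)
    converted = defaultdict(int)
    for group, flag in rows:
        exposed[group] += 1      # every row is one exposed user
        converted[group] += flag  # flag is 1 if the user converted, else 0
    return {group: converted[group] / exposed[group] for group in exposed}


rates = conversion_rates(records)
print(rates)
```

With real data the same calculation is usually a group-by on the test indicator followed by a mean of the conversion column.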

C. Experiment

The goal of this experiment is to understand the effect of a local translation on user conversion in each country. Visitors in each country are randomly divided into two equal groups: the control group is exposed to the original Spanish translation, and the treatment group is exposed to the localized translation for that country. We then measure conversion in each group and check whether the localized translation produces a statistically significant difference between the control and treatment versions.

In this hypothetical A/B testing problem, users in Spain have a much higher conversion rate than users in other Spanish-speaking countries.

The test rests on the following reasoning: all Spanish translations were written by a Spaniard, and that may be responsible for Spain's higher conversion rate.

However, when the A/B test results come in, they are negative: the non-localized translation appears to do better.

This exercise seeks answers to the following questions:

Imports

Useful Scripts

Load the data

Data Processing

Missing Values

Combine the datasets

Sanity checks

EDA

Correlations

Unique Values

Conversion distribution among test and control group

Country

Test vs control for each country

Conversion Rates For Date

EDA of other categorical features

EDA for continuous variables

Statistics

Overall Statistics

Fisher's exact test for 2 by 2 contingency table
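A minimal sketch of a two-sided Fisher's exact test on a 2 by 2 table, written with the standard library only. The layout assumed here is rows for control and treatment and columns for converted and not converted; the counts are hypothetical, and `fisher_exact_two_sided` is an illustrative helper, not the routine used in the original notebook (in practice `scipy.stats.fisher_exact` is the usual choice).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probability of every table at least as unlikely as the observed
    # one (a small relative tolerance guards against floating-point ties).
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: [converted, not converted] for control, then treatment.
p_value = fisher_exact_two_sided(30, 470, 18, 482)
print(f"two-sided p-value: {p_value:.4f}")
```

The exact enumeration is fine for small tables; for samples with hundreds of thousands of users a library implementation or a z-test approximation is more practical.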

T-test for each country

Sample size check for power analysis
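One common way to check sample size is the normal-approximation formula for comparing two proportions. The sketch below assumes a two-sided test; the baseline rate, target rate, alpha, and power are hypothetical inputs, and `sample_size_per_group` is an illustrative helper rather than the original notebook's code.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Users needed per group to detect p1 vs p2 with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical scenario: 5% baseline conversion, hoping to detect 6%.
n_required = sample_size_per_group(0.05, 0.06)
print(n_required)
```

If each group in a country has fewer users than this estimate, a non-significant result for that country says little about whether an effect of that size exists.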

Permutation Test

Ref: http://rasbt.github.io/mlxtend/user_guide/evaluate/permutation_test/

We perform a two-sided permutation test of the null hypothesis that the two groups, "treatment" and "control", come from the same distribution. We specify alpha=0.05 as our significance level.
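The idea behind the test can be sketched in plain Python: pool both groups, reshuffle the labels many times, and count how often the shuffled difference in means is at least as large as the observed one. This is an approximate (Monte Carlo) version written for illustration; it is not the mlxtend implementation referenced above, and the data, `num_rounds`, and `seed` values are assumptions.

```python
import random


def permutation_test(x, y, num_rounds=2000, seed=0):
    """Approximate two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    n_x, n_y = len(x), len(y)
    observed = abs(sum(x) / n_x - sum(y) / n_y)
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(num_rounds):
        rng.shuffle(pooled)  # random relabelling under the null hypothesis
        diff = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / n_y)
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (num_rounds + 1)


# Hypothetical conversion flags (1 = converted), 100 users per group.
control = [1] * 12 + [0] * 88
treatment = [1] * 8 + [0] * 92

alpha = 0.05
p_value = permutation_test(treatment, control)
print(p_value, "reject H0" if p_value < alpha else "fail to reject H0")
```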

Proportion z-test
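A minimal sketch of a pooled two-proportion z-test using only the standard library. The counts are hypothetical, and `two_proportion_ztest` is an illustrative helper; `statsmodels.stats.proportion.proportions_ztest` is a common library alternative, though the call used in the original notebook is not shown here.

```python
from math import erfc, sqrt


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under the null hypothesis
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: conversions and users for control, then treatment.
z_stat, p_val = two_proportion_ztest(60, 1000, 40, 1000)
print(f"z = {z_stat:.3f}, p = {p_val:.4f}")
```

A positive z here means the first group (control) converts at a higher rate than the second.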

Regression Approach

We will add an intercept column and fit a regression model of conversion on the test indicator.
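To show what the intercept contributes, the sketch below fits a simple linear probability model, converted = intercept + slope * test, in closed form. This is an assumption made for illustration: the original model type and covariates are not shown in this text, and the toy data and the helper `ols_with_intercept` are hypothetical.

```python
def ols_with_intercept(x, y):
    """Least-squares fit of y = intercept + slope * x for one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data: test = 1 for treatment, 0 for control.
test = [0, 0, 0, 0, 1, 1, 1, 1]
converted = [1, 0, 0, 1, 0, 1, 0, 0]

intercept, slope = ols_with_intercept(test, converted)
print(intercept, slope)
```

With a single binary regressor, the intercept equals the control group's conversion rate and the slope equals the treatment rate minus the control rate, so a negative slope corresponds to the localized translation converting worse.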

Conclusions

Total run time