Thursday, 11 December 2014

An Extended Discussion on Statistical Experimentation

I would like to add a few points to the experimentation post written by @thauckzulily on Zulily’s engineering website - Link. The experimental approach there presents an interesting exploration-versus-exploitation trade-off for website optimization, and it uses simulation to study the experiments. My points concern the discussion of Power, which was the central topic of that post. As the Power calculation (and the accompanying graph) shows, there are two ways to achieve high Power:-
  • The larger the difference in conversion, the smaller the chance that it goes undetected (Type II error)
  • The larger the sample size, the smaller the chance that a difference goes undetected (Type II error)

Mean Difference in Conversion Rate @ α = 0.05
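The two levers listed above can be checked numerically. The following is a minimal sketch, not the calculation used in the referenced post: it assumes a two-sided two-proportion z-test under the normal approximation, and the function name `power_two_proportions`, the 5% baseline conversion rate, and the sample sizes are all made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def power_two_proportions(p_control, p_treatment, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (hypothetical helper)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_treatment) / 2
    # Standard error under H0 (pooled rate) and under the alternative (separate rates)
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    se_alt = sqrt(p_control * (1 - p_control) / n_per_arm
                  + p_treatment * (1 - p_treatment) / n_per_arm)
    diff = abs(p_treatment - p_control)
    # Probability that the observed difference falls beyond either critical bound
    upper = 1 - _STD_NORMAL.cdf((z_crit * se_null - diff) / se_alt)
    lower = _STD_NORMAL.cdf((-z_crit * se_null - diff) / se_alt)
    return upper + lower


# Assumed baseline of 5% conversion; lifts and sample sizes are illustrative only
for lift in (0.005, 0.01, 0.02):
    for n in (1000, 5000, 20000):
        power = power_two_proportions(0.05, 0.05 + lift, n)
        print(f"lift={lift:.3f}  n per arm={n:>6}  power={power:.3f}")
```

Reading down the output, power rises with either a larger lift or a larger sample, and when the two rates are equal the function returns α itself, which is the false-positive rate.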

The need for an experiment starts with wanting to detect an improvement, framed as a hypothesis test on the KPIs. Most of the time, the experiment checks for a statistically significant increase in the KPIs under consideration. Hence the sample size is made larger to reduce the chance that a real improvement goes undetected.

Consider this scenario: if there actually is a shift (the difference is real), the probability that the shift is detected on the first sample is 1 - β,
         on the second sample is β(1 - β),
         on the rth sample is β^(r-1) (1 - β).
             Hence the expected number of samples needed to detect the shift is 1 / (1 - β).
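The run-length argument above is a geometric distribution: each sample independently detects the shift with probability 1 - β. As a quick check, the sketch below sums r × β^(r-1) (1 - β) over many values of r and compares the result with 1 / (1 - β). The function names and the value β = 0.2 are chosen only for illustration.

```python
def prob_first_detection_at(r, beta):
    """Probability that the shift is first detected on the r-th sample."""
    return beta ** (r - 1) * (1 - beta)


def expected_samples_to_detect(beta, max_r=10_000):
    """Expected number of samples until detection, by truncated summation."""
    return sum(r * prob_first_detection_at(r, beta) for r in range(1, max_r + 1))


beta = 0.2  # assumed Type II error rate, for illustration
print("summed expectation:", expected_samples_to_detect(beta))
print("closed form 1/(1-β):", 1 / (1 - beta))
```

The two printed values agree to floating-point precision, since the truncated terms are negligibly small.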

                             | Nature - In Control | Nature - Out of Control
We Conclude - In Control     | Confidence, 1 - α   | Experimentation error, β
We Conclude - Out of Control | Error, α            | Power, 1 - β

The table above shows how Power and Confidence relate to the two possible states of nature and the two possible conclusions.

Type I error: Concluding that there is a significant difference in conversion when there is actually no difference

Type II error: Concluding that there is no significant difference in conversion when there actually is a difference

I.e., P{type I error} = α = P{reject H0 | H0 is true}
        = P{conclude conversion is significant | although there is no real difference}
Type II error (consumer’s risk): P{type II error} = β = P{fail to reject H0 | H0 is false}
         = P{conclude no significant conversion | although the difference is real}

Power of the test: Power = 1 - β    = P {reject H0 |H0 is false}
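These probabilities can be estimated by simulation: run many experiments where H0 is true and count how often H0 is rejected (the Type I error rate), then do the same where H0 is false (the Power). The sketch below is only an illustration and is not the simulation from the referenced post. It assumes a pooled two-proportion z-test, and the conversion rates, sample size, number of runs and seed are all made up.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_test_rejects(conv_a, conv_b, n, alpha=0.05):
    """Two-sided pooled two-proportion z-test with n users per arm."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * p_pool * (1 - p_pool) / n)
    if se == 0:
        return False  # no conversions (or all conversions) in both arms
    z = (conv_b / n - conv_a / n) / se
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def rejection_rate(p_a, p_b, n, runs, rng):
    """Fraction of simulated experiments in which H0 is rejected."""
    rejections = 0
    for _ in range(runs):
        conv_a = sum(rng.random() < p_a for _ in range(n))
        conv_b = sum(rng.random() < p_b for _ in range(n))
        rejections += z_test_rejects(conv_a, conv_b, n)
    return rejections / runs


rng = random.Random(2014)  # fixed seed so the run is repeatable
# H0 true: both arms convert at 10%, so every rejection is a Type I error
type_i_rate = rejection_rate(0.10, 0.10, n=500, runs=1000, rng=rng)
# H0 false: treatment converts at 16%, so the rejection rate estimates Power
power = rejection_rate(0.10, 0.16, n=500, runs=1000, rng=rng)
print("estimated Type I error rate:", type_i_rate)
print("estimated Power (1 - β):", power)
```

With these assumed settings the first estimate should land near α = 0.05, and the second should be far higher because the difference is real.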

The ultimate aim of an experiment is to determine whether there is a statistically significant difference between two treatments.

Confidence:-
Suppose we have started our experiment and there is actually no difference between the treatments (the in-control case). Each sample then has probability α of falsely signalling a significant difference, so by the same run-length argument the expected number of samples before a false signal is
N = 1/α, which is 20 samples at α = 0.05. The higher the confidence (the smaller α), the longer we can expect to sample before a false signal appears. The simulation results in the referenced blog give the best illustration of Confidence and Power.

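Although Power and Confidence are defined under different hypotheses (Confidence when H0 is true, Power when H0 is false), they are linked through the sample size: at a fixed sample size, raising the confidence (lowering α) lowers the Power, so a higher confidence level calls for a larger sample to keep the Power at the desired level.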
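The link between confidence and sample size can be sketched with the usual normal-approximation formula for comparing two proportions, n per arm = (z_(1-α/2) + z_(power))² × [p1(1 - p1) + p2(1 - p2)] / (p2 - p1)². This is an illustration under assumptions: the function name `sample_size_per_arm`, the 5% and 6% conversion rates, and the 80% Power target are not from the referenced post.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users needed per arm for a two-sided two-proportion test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the confidence level
    z_power = std_normal.inv_cdf(power)          # quantile for the desired Power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_treatment - p_control) ** 2)


# Assumed rates: 5% control vs 6% treatment, Power held at 80%
for alpha in (0.10, 0.05, 0.01):
    n = sample_size_per_arm(0.05, 0.06, alpha=alpha)
    print(f"confidence={1 - alpha:.2f}  required n per arm={n}")
```

Holding the Power fixed, the required sample grows as the confidence level rises, which is the trade-off described above.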

Logistic Regression cheat sheet:-
Consider this Logit Regression result. The small trick in reading the confidence interval is to check whether the CI of the treatment coefficient, C(treatment)[T.B], runs from negative to positive, i.e. contains zero. The same ambivalence is reflected in the Odds Ratio’s CI, which then contains 1.

This conveys that the treatment coefficient is not statistically significant, which means there is no statistical evidence that the Treatment is better than (or even different from) the Control exposure.
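The correspondence between the two intervals can be shown without fitting a model. For a logit model with a single binary treatment indicator, the fitted treatment coefficient equals the log odds ratio of the 2×2 table, and its Wald standard error is the square root of the summed reciprocal cell counts. The sketch below uses that shortcut. The counts are made up and are not the regression result discussed above, and `log_odds_ratio_ci` is a hypothetical helper name.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def log_odds_ratio_ci(conv_control, n_control, conv_treatment, n_treatment,
                      confidence=0.95):
    """Treatment coefficient (log odds ratio) and its Wald confidence interval."""
    a, b = conv_treatment, n_treatment - conv_treatment  # treatment: converted / not
    c, d = conv_control, n_control - conv_control        # control: converted / not
    coef = log((a / b) / (c / d))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return coef, coef - z * se, coef + z * se


# Made-up counts: 100 of 1000 convert on Control, 115 of 1000 on Treatment
coef, lo, hi = log_odds_ratio_ci(100, 1000, 115, 1000)
odds_ratio_ci = (exp(lo), exp(hi))  # exponentiate to move to the odds-ratio scale

coef_ci_contains_zero = lo < 0 < hi
or_ci_contains_one = odds_ratio_ci[0] < 1 < odds_ratio_ci[1]
print("coefficient CI:", (lo, hi), "contains 0:", coef_ci_contains_zero)
print("odds ratio CI:", odds_ratio_ci, "contains 1:", or_ci_contains_one)
```

Because the exponential is monotonic, the coefficient CI contains 0 exactly when the odds ratio CI contains 1, so either interval gives the same verdict on significance.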




Contingency Table Calculation:-
A contingency table is used to test the hypothesis that the row and column variables are independent. Consider this example (numbers are made up for illustration), in which we test whether the Advertising Medium is independent of the Landing Page. The Observed Frequency table shows, for each Advertising Medium, how many users went through Landing Page A or Landing Page B. The Expected Frequency table shows the count we would expect in each cell if the two variables were perfectly independent, computed as (row total × column total) / grand total.

Example: -
Observed Frequency (rows: Landing Page, columns: Advertisement Medium)

Landing Page | Medium I | Medium II | Medium III | Total
Landing A    |      160 |       140 |         40 |   340
Landing B    |       40 |        60 |         60 |   160
Total        |      200 |       200 |        100 |   500

Expected Frequency (rows: Landing Page, columns: Advertisement Medium)

Landing Page | Medium I | Medium II | Medium III | Total
Landing A    |      136 |       136 |         68 |   340
Landing B    |       64 |        64 |         32 |   160
Total        |      200 |       200 |        100 |   500

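Consider the Hypothesis: - H0: Landing Page is independent of Advertising Medium, tested at α = 0.05
We calculate the Chi-Square statistic as the sum of (Observed - Expected)² / Expected over all cells, with (2 - 1) × (3 - 1) = 2 degrees of freedom, and compare it with the critical value to conclude on the hypothesis we are testing.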
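The calculation for the example tables can be sketched as follows. The code rebuilds the Expected Frequency table from the observed counts, sums (O - E)² / E over the six cells, and compares the result with the critical value. To stay within the standard library it relies on the fact that a Chi-Square distribution with exactly 2 degrees of freedom has the survival function exp(-x/2). That shortcut does not hold for other degrees of freedom, and the function names are made up for illustration.

```python
from math import exp, log

# Observed counts from the example: rows are Landing A / B, columns are Medium I / II / III
observed = [[160, 140, 40],
            [40, 60, 60]]


def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected_table = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected_table)
               for o, e in zip(obs_row, exp_row))


expected = expected_frequencies(observed)
chi2 = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

alpha = 0.05
# Valid for 2 degrees of freedom only: P(X > x) = exp(-x / 2)
critical_value = -2 * log(alpha)
p_value = exp(-chi2 / 2)
reject_independence = chi2 > critical_value

print("expected:", expected)
print(f"chi-square={chi2:.3f}  dof={dof}  critical={critical_value:.3f}  p={p_value:.2e}")
print("reject independence:", reject_independence)
```

For these made-up numbers the statistic is about 49.63 against a critical value of about 5.99, so the hypothesis that Landing Page is independent of Advertising Medium is rejected at α = 0.05.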

“The fewer the facts, the stronger the opinion.” – Arnold Glasow

Thanks for taking the time to read this post. Please feel free to comment on any of the terminology or calculations above.