I would like to add some points to the experimentation blog post written by @thauckzulily on Zulily's engineering website - Link. The experimental approach featured an interesting exploration-exploitation battle for website optimization, and it used simulation to study experimentation. I would like to add some points on the Power discussion, which was the prime factor of that post. As per the Power calculation (and as the graph depicts), there are two ways to achieve high Power:
- The larger the difference in conversion, the smaller the chance that it goes undetected (Type II error)
- The larger the sample size, the smaller the chance that a difference goes undetected (Type II error)
Figure: Mean Difference in Conversion Rate at α = 0.05
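The two levers above can be sketched numerically. The snippet below is a minimal illustration, assuming a two-sided two-proportion z-test with equal group sizes and the normal approximation. The function name `power_two_proportions`, the 10% baseline and the sample sizes are hypothetical and do not come from the referenced blog.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p_control, p_treatment, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (normal approximation)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Standard error of the difference, using each arm's own variance.
    se = sqrt(p_control * (1 - p_control) / n_per_group
              + p_treatment * (1 - p_treatment) / n_per_group)
    shift = abs(p_treatment - p_control) / se
    # Probability that the test statistic lands in either rejection region.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

# Illustrative values only: 10% baseline conversion (an assumption).
for diff in (0.01, 0.02, 0.04):
    for n in (1000, 5000, 20000):
        power = power_two_proportions(0.10, 0.10 + diff, n)
        print(f"diff={diff:.2f}  n={n:>6}  power={power:.3f}")
```

Reading the printed grid row by row shows power rising with the sample size, and reading it block by block shows power rising with the difference in conversion.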
The need for an experiment starts with detecting an improvement through a hypothesis check on the KPIs. Most of the time, the experiment is a check for a statistically significant increase in the KPIs considered. Hence, the sample size gets bigger to reduce the chance that a real improvement goes undetected.
Consider this scenario: a real shift exists, and β is the probability of missing it in any single sample. The probability that the shift is detected on
- the first sample is 1 - β
- the second sample is β(1 - β)
- the rth sample is β^(r-1) (1 - β)
- Hence the expected number of samples needed to detect the shift is 1 / (1 - β)
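The detection probabilities above form a geometric distribution, so the 1 / (1 - β) result can be checked numerically. This is a small sketch. The function names and the value β = 0.2 are assumptions chosen for illustration.

```python
def detection_probability(beta, r):
    """P(shift is first detected on the r-th sample) = beta**(r-1) * (1 - beta)."""
    return beta ** (r - 1) * (1 - beta)

def expected_samples_to_detect(beta, max_r=10_000):
    """Numerically sum r * P(detect on r-th sample); approaches 1 / (1 - beta)."""
    return sum(r * detection_probability(beta, r) for r in range(1, max_r + 1))

beta = 0.2  # illustrative Type II error rate, i.e. power = 0.8
print("numerical sum:", expected_samples_to_detect(beta))
print("closed form  :", 1 / (1 - beta))
```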
| | Nature - In Control | Nature - Out of Control |
|---|---|---|
| We Conclude - In Control | Confidence, 1 - α | Experimentation error, β |
| We Conclude - Out of Control | Error, α | Power, 1 - β |
The table above lays out the four outcomes behind Power and Confidence.
Here H0 is the hypothesis that there is no difference in conversion.
Type I error: Concluding there is a significant conversion difference when in reality there is none
Type II error: Concluding there is no significant conversion difference when in reality there is one
I.e.,
Type I error: α = P{type I error} = P{reject H0 | H0 is true}
= P{conclude conversion is significant | although there is no real difference}
Type II error (consumer's risk): β = P{type II error} = P{fail to reject H0 | H0 is false}
= P{conclude no significant conversion | although the difference is real}
Power of the test: Power = 1 - β = P{reject H0 | H0 is false}
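These definitions can be illustrated by simulation: run many A/B tests in which H0 is true and count the rejections (an estimate of α), then do the same with a real difference (an estimate of Power). The sketch below assumes a pooled two-sided two-proportion z-test. The conversion rates, sample size and trial count are made-up values, and the function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def z_test_rejects(x_a, x_b, n, alpha=0.05):
    """Two-sided pooled two-proportion z-test; True if H0 (no difference) is rejected."""
    pooled = (x_a + x_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return False  # no conversions (or all conversions) in both arms
    z = (x_b / n - x_a / n) / se
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

def rejection_rate(p_a, p_b, n=1000, trials=400, alpha=0.05, seed=0):
    """Share of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x_a = sum(rng.random() < p_a for _ in range(n))  # conversions in control
        x_b = sum(rng.random() < p_b for _ in range(n))  # conversions in treatment
        rejections += z_test_rejects(x_a, x_b, n, alpha)
    return rejections / trials

if __name__ == "__main__":
    # H0 true: any rejection is a Type I error, so the rate should sit near alpha.
    print("Type I error rate:", rejection_rate(0.10, 0.10))
    # H0 false: the rejection rate estimates Power = 1 - beta.
    print("Power            :", rejection_rate(0.10, 0.14))
```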
The ultimate aim of an experiment is to find a statistically significant difference between two treatments.
Confidence:-
Consider that we have started our experimentation and are interested in the minimum number of samples at which we could detect significance, that is, the out-of-control condition. This minimum sample number is
N min = 1 / α.
The best analogy for Confidence and Power is the simulation result in the referred blog.
Although Power and Confidence are defined under different hypotheses, they are linked through the sample size: raising the confidence level (lowering α) increases the sample size needed to hold Power constant, and a larger sample in turn raises Power.
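The link between confidence and sample size can be sketched with the standard normal-approximation formula for comparing two proportions. This is an illustration under stated assumptions (two-sided test, equal group sizes, hypothetical 10% vs 12% conversion rates, 80% target power). The function name `sample_size_per_group` is made up for this example.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided two-proportion z-test."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)   # grows as confidence rises
    z_power = norm.inv_cdf(power)           # grows as target power rises
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_power) ** 2 * variance
                / (p_treatment - p_control) ** 2)

# Same effect and same target power, increasing confidence levels.
for confidence in (0.90, 0.95, 0.99):
    n = sample_size_per_group(0.10, 0.12, alpha=1 - confidence)
    print(f"confidence={confidence:.2f} -> n per group = {n}")
```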
Logistic Regression cheat sheet:-
Consider this Logit Regression result: the small trick in reading the Confidence Interval is that the CI of the coefficient C(treatment)[T.B] ranges from negative to positive, so it includes 0. The same ambivalence is reflected in the Odds Ratio's CI, which then includes 1.
It conveys that the treatment coefficient is statistically insignificant, which means there is no statistical evidence that the Treatment is better than the Control exposure.
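The CI check described above can be written down in a few lines. The sketch assumes a Wald interval (coefficient ± z × standard error). The coefficient and standard error values are hypothetical, not taken from the regression output discussed here.

```python
from math import exp
from statistics import NormalDist

def wald_ci(coef, std_err, alpha=0.05):
    """Wald confidence interval for a logit coefficient."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return coef - z * std_err, coef + z * std_err

def odds_ratio_ci(coef, std_err, alpha=0.05):
    """Exponentiate the coefficient interval to get the odds-ratio interval."""
    low, high = wald_ci(coef, std_err, alpha)
    return exp(low), exp(high)

# Hypothetical treatment coefficient and standard error.
coef, std_err = 0.12, 0.10
low, high = wald_ci(coef, std_err)
or_low, or_high = odds_ratio_ci(coef, std_err)

print(f"coef CI      : ({low:.3f}, {high:.3f})  includes 0: {low < 0 < high}")
print(f"odds ratio CI: ({or_low:.3f}, {or_high:.3f})  includes 1: {or_low < 1 < or_high}")
```

The two checks always agree, because exp(0) = 1: a coefficient interval that straddles 0 maps to an odds-ratio interval that straddles 1.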
Contingency Table Calculation:-
A contingency table tests the hypothesis of dependency between rows and columns. Consider this example (numbers are made up for illustration): we could test the hypothesis that the Landing Page is independent of the Advertising Medium. The Observed Frequency table shows, for each Advertising Medium, how many users went through Landing Page A or Landing Page B. The Expected Frequency table shows the values expected under the ideal condition of independence.
Example:
Observed Frequency
| Landing Page | Advertisement Medium I | Advertisement Medium II | Advertisement Medium III | Total |
|---|---|---|---|---|
| Landing A | 160 | 140 | 40 | 340 |
| Landing B | 40 | 60 | 60 | 160 |
| Total | 200 | 200 | 100 | 500 |
Expected Frequency
| Landing Page | Advertisement Medium I | Advertisement Medium II | Advertisement Medium III | Total |
|---|---|---|---|---|
| Landing A | 136 | 136 | 68 | 340 |
| Landing B | 64 | 64 | 32 | 160 |
| Total | 200 | 200 | 100 | 500 |
Consider the Hypothesis: Landing Page is independent of Advertising Medium, at α = 0.05.
After calculating the Chi-Square statistic with (2 - 1) × (3 - 1) = 2 degrees of freedom, we can conclude on the hypothesis we are testing.
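The calculation can be sketched with the standard library alone. The snippet rebuilds the expected frequencies from the observed margins and computes the Chi-Square statistic. The p-value and critical value use the closed form exp(-x / 2), which holds only for exactly 2 degrees of freedom, so that shortcut is an assumption tied to this 2 × 3 table. The function names are hypothetical.

```python
from math import exp, log

# Observed frequencies: rows = Landing A, Landing B; columns = Medium I, II, III.
observed = [[160, 140, 40],
            [40, 60, 60]]

def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

alpha = 0.05
chi2 = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for dof == 2 only: the chi-square survival function is exp(-x / 2).
critical_value = -2 * log(alpha)
p_value = exp(-chi2 / 2)

print("expected:", expected_frequencies(observed))
print(f"chi2={chi2:.3f}  dof={dof}  critical={critical_value:.3f}  p={p_value:.3g}")
print("reject independence:", chi2 > critical_value)
```

With the made-up numbers above, the statistic comes out near 49.6, well above the critical value of about 5.99, so independence would be rejected at α = 0.05.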
Fewer the facts, Stronger the opinions. – “Arnold Glasow”
Thanks for your time in reading through the blog. Please feel free to comment on any of the terminology or calculations above.
Hi Sairaam, thanks for taking the time to expand on the idea. Interesting read.
Trent, thanks for the reply. Hope it's aligned with our discussion and your blog.