I would like to add some points to the experimentation blog post written by @thauckzulily on Zulily's
engineering website. The experimental approach featured an interesting exploration vs. exploitation
battle for website optimization, and it used simulation for experimentation. I would
like to add some points on the Power discussion, which was the prime topic of
that post. As per the Power calculation (and as the graph
depicts), there are two ways to achieve high Power:
 - The larger the difference in conversion, the smaller the chance that it goes undetected (Type II error)
 - The larger the sample size, the smaller the chance that a difference goes undetected (Type II error)
Figure: Mean Difference in Conversion Rate @ α = 0.05
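
To make the two points above concrete, here is a minimal sketch of a Power calculation. It assumes a two-sided two-proportion z-test with the normal approximation and an unpooled standard error, which is my choice for illustration and not necessarily what the referenced post used. The function name and the conversion rates are made up.

```python
from statistics import NormalDist


def power_two_proportions(p_control, p_treatment, n_per_group, alpha=0.05):
    """Approximate Power of a two-sided two-proportion z-test.

    Assumption: normal approximation with an unpooled standard error.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in conversion rates.
    se = (p_control * (1 - p_control) / n_per_group
          + p_treatment * (1 - p_treatment) / n_per_group) ** 0.5
    shift = abs(p_treatment - p_control) / se
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Made-up conversion rates: Power grows with the difference and with n.
for p_treatment in (0.055, 0.06, 0.07):
    for n in (2000, 8000, 20000):
        power = power_two_proportions(0.05, p_treatment, n)
        print(f"diff={p_treatment - 0.05:.3f}  n={n:>6}  power={power:.3f}")
```

Reading the printed grid row by row shows both effects: for a fixed difference Power rises with the sample size, and for a fixed sample size Power rises with the difference.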
The need for an experiment
starts with detecting an improvement through a hypothesis check on the KPIs.
Most of the time, the experiment is a check for a
statistically significant increment in the KPIs considered. Hence, the sample size gets
bigger to decrease the chance that a real difference goes undetected.
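
As a rough illustration of how the sample size grows, the sketch below uses the standard normal-approximation formula for comparing two proportions. The formula choice, the function name and the rates are my assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided two-proportion z-test.

    Assumption: normal approximation, equal group sizes, unpooled variance.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # controls the Type I error
    z_beta = z.inv_cdf(power)            # controls the Type II error
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Made-up baseline of 5% conversion and progressively smaller lifts.
for p_treatment in (0.07, 0.06, 0.055):
    n = sample_size_per_group(0.05, p_treatment)
    print(f"treatment rate={p_treatment:.3f}  users per group={n}")
```

The smaller the increment we want to detect, the faster the required sample size grows, since the difference enters the formula squared.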
Consider this scenario: if a shift has actually occurred (the experiment is truly not "in control"),
the probability that the shift will be detected on
 - the first sample is 1 − β,
 - the second sample is β(1 − β),
 - the r^{th} sample is β^{r−1}(1 − β).
Hence the expected number of samples needed to detect the shift is 1 / (1 − β).
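
The detection probabilities above follow a geometric distribution, so the expected number of samples can be checked numerically. This is a small sketch; the function name and the β values are chosen only for illustration.

```python
def expected_samples_to_detect(beta, max_r=10_000):
    """Sum r * P(detect on sample r), with P = beta**(r - 1) * (1 - beta).

    The infinite series is truncated at max_r, which is ample for the
    beta values used below.
    """
    return sum(r * beta ** (r - 1) * (1 - beta) for r in range(1, max_r + 1))


for beta in (0.1, 0.2, 0.5):
    numeric = expected_samples_to_detect(beta)
    closed_form = 1 / (1 - beta)
    print(f"beta={beta}  series={numeric:.6f}  1/(1-beta)={closed_form:.6f}")
```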
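| We Conclude \ Nature      | Nature: In Control | Nature: Out of Control   |
|---------------------------|--------------------|--------------------------|
| Conclude: In Control      | Confidence, 1 − α  | Experimentation error, β |
| Conclude: Out of Control  | Error, α           | Power, 1 − β             |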

The above table showcases the possible hypothesis outcomes behind Power and Confidence.
Here H_{0} is the hypothesis that there is no difference in conversion between the treatments.
Type I error: concluding there is a significant conversion difference when, in the actual scenario,
there is none.
Type II error: concluding there is no significant conversion difference when, in the actual scenario,
the difference is significant.
I.e.,
P{type I error} = P{reject H_{0} | H_{0} is true}
= P{conclude conversion is significant although there is no significant conversion} = α
Type II error (consumer's risk): P{type II error} = P{fail to reject H_{0} | H_{0} is false}
= P{conclude no significant conversion although conversion is significant} = β
Power of the test: Power = 1 − β = P{reject H_{0} | H_{0} is false}
The ultimate aim
of an experiment is to find a statistically significant difference
between two treatments.
Confidence:
Consider that we have started our experimentation and are interested in knowing the minimum number
of samples at which we could detect significance, where significance is defined as an out-of-control
signal. The minimum sample number from which we could detect the difference is
N_{min} = 1 / α.
The best analogy for Confidence and Power is the simulation result in the
referenced blog.
Though Power and Confidence are not directly related in terms of the hypothesis, an increase in Confidence increases the sample size needed, and the larger sample in turn raises the Power.
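
A small sketch of this trade-off, again assuming a two-sided two-proportion z-test under the normal approximation. The conversion rates, the fixed sample size of 8000 and the function names are made up for illustration.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()
P_CONTROL, P_TREATMENT = 0.05, 0.06  # made-up conversion rates
VARIANCE = P_CONTROL * (1 - P_CONTROL) + P_TREATMENT * (1 - P_TREATMENT)
EFFECT = P_TREATMENT - P_CONTROL


def n_for_power(confidence, power=0.80):
    """Users per group needed to reach the target Power at this Confidence."""
    z_alpha = Z.inv_cdf(1 - (1 - confidence) / 2)
    z_beta = Z.inv_cdf(power)
    return ceil((z_alpha + z_beta) ** 2 * VARIANCE / EFFECT ** 2)


def power_at_n(confidence, n_per_group):
    """Power obtained at this Confidence when the sample size is held fixed."""
    z_alpha = Z.inv_cdf(1 - (1 - confidence) / 2)
    shift = EFFECT / (VARIANCE / n_per_group) ** 0.5
    return Z.cdf(shift - z_alpha) + Z.cdf(-shift - z_alpha)


for confidence in (0.90, 0.95, 0.99):
    print(f"confidence={confidence:.2f}  "
          f"n for 80% power={n_for_power(confidence)}  "
          f"power at n=8000: {power_at_n(confidence, 8000):.3f}")
```

Under these assumptions, raising Confidence while keeping the sample size fixed lowers the Power. It is the larger sample demanded by the higher Confidence that brings the Power back up.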
Logistic Regression cheat sheet:
Consider this Logit Regression result. The small trick in reading the Confidence Interval is
that the CI of the coefficient for C(treatment)[T.B] ranges from negative to positive, i.e. it
contains zero. This ambivalence is also reflected in the Odds Ratio's
CI, which then contains 1.
It conveys that the treatment coefficient is statistically insignificant,
which means there is no statistical evidence of a difference between the Control and
Treatment methods, and hence no statistical evidence that the Treatment is better
than the Control exposure.
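
The link between the coefficient's CI and the Odds Ratio's CI can be sketched without a regression library. For a logistic regression with a single binary treatment indicator, the fitted treatment coefficient equals the log odds ratio of the 2x2 conversion table, and its Wald standard error is sqrt(1/a + 1/b + 1/c + 1/d). The counts and the function name below are made up for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def treatment_effect_ci(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald CI for the log odds ratio of B vs A, plus the odds-ratio CI."""
    a, b = conv_b, n_b - conv_b   # treatment: converted, not converted
    c, d = conv_a, n_a - conv_a   # control: converted, not converted
    coef = log((a * d) / (b * c))            # log odds ratio
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    coef_ci = (coef - z * se, coef + z * se)
    # Exponentiating the coefficient CI gives the odds-ratio CI.
    odds_ratio_ci = (exp(coef_ci[0]), exp(coef_ci[1]))
    return coef, coef_ci, odds_ratio_ci


# Made-up counts: 100/2000 conversions in control A, 115/2000 in treatment B.
coef, coef_ci, odds_ratio_ci = treatment_effect_ci(100, 2000, 115, 2000)
print(f"coef={coef:.3f}  CI=({coef_ci[0]:.3f}, {coef_ci[1]:.3f})")
print(f"odds ratio CI=({odds_ratio_ci[0]:.3f}, {odds_ratio_ci[1]:.3f})")
```

With these made-up counts the coefficient's CI runs from negative to positive and the Odds Ratio's CI contains 1, which is the insignificant case described above.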
Contingency Table Calculation:
A contingency table is used to test the hypothesis that rows and columns are dependent. Consider this
example (numbers are made up for illustration): we could test the hypothesis
that the Advertising Medium is independent of the Landing Page variable. The Observed
Frequency table shows, for each Advertising Medium, how many users went through
Landing Page A or Landing Page B. The Expected Frequency table shows the
value expected in each cell under the ideal condition of independence.
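Example:

Observed Frequency (rows: Landing Page, columns: Advertisement Medium)

| Landing Page | Medium I | Medium II | Medium III | Total |
|--------------|----------|-----------|------------|-------|
| Landing A    | 160      | 140       | 40         | 340   |
| Landing B    | 40       | 60        | 60         | 160   |
| Total        | 200      | 200       | 100        | 500   |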

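Expected Frequency (rows: Landing Page, columns: Advertisement Medium)

| Landing Page | Medium I | Medium II | Medium III | Total |
|--------------|----------|-----------|------------|-------|
| Landing A    | 136      | 136       | 68         | 340   |
| Landing B    | 64       | 64        | 32         | 160   |
| Total        | 200      | 200       | 100        | 500   |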
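
Each expected frequency is the row total times the column total divided by the grand total. A minimal sketch that rebuilds the Expected Frequency table from the observed counts (variable names are mine):

```python
# Observed counts per landing page for Medium I, Medium II, Medium III.
observed = {
    "Landing A": [160, 140, 40],
    "Landing B": [40, 60, 60],
}

row_totals = {page: sum(counts) for page, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Under independence: expected = row total * column total / grand total.
expected = {
    page: [row_totals[page] * col_total / grand_total for col_total in col_totals]
    for page in observed
}

for page, values in expected.items():
    print(page, values)
```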

Consider
the hypothesis: Landing Page is independent of Advertising Medium, at α
= 0.05.
After
calculating the Chi-Square statistic with (2 − 1)(3 − 1) = 2 degrees of freedom, we can conclude
on the hypothesis we are testing.
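
The Chi-Square calculation for the made-up tables can be sketched as below. To keep it dependency-free it relies on the fact that, for exactly 2 degrees of freedom, the Chi-Square tail probability is exp(-x / 2). That shortcut would not hold for other table sizes, where a statistics library would be the safer choice.

```python
from math import exp, log

observed = [[160, 140, 40], [40, 60, 60]]
expected = [[136, 136, 68], [64, 64, 32]]

# Pearson Chi-Square statistic: sum over cells of (O - E)^2 / E.
chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)

dof = (len(observed) - 1) * (len(observed[0]) - 1)
alpha = 0.05

# Valid only for dof == 2: P(X > x) = exp(-x / 2).
critical_value = -2 * log(alpha)
p_value = exp(-chi_square / 2)
reject_independence = chi_square > critical_value

print(f"chi-square={chi_square:.3f}  dof={dof}")
print(f"critical value={critical_value:.3f}  p-value={p_value:.2e}")
print("reject independence:", reject_independence)
```

With these illustrative numbers the statistic is about 49.6 against a critical value of about 5.99, so the hypothesis that Landing Page is independent of Advertising Medium would be rejected at α = 0.05.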
"The fewer the facts, the stronger the opinion." - Arnold Glasow
Thanks
for your time in reading through the blog. Please feel free to comment on any
of the terminologies or calculations mentioned above.
Hi Sairaam, thanks for taking the time to expand on the idea. Interesting read.
Trent, thanks for the reply. Hope it's aligned with our discussion and your blog.