At the time of this experiment, Udacity course home pages had two options: "start free trial" and "access course materials". If the student clicks "start free trial", they are asked to enter their credit card information and are then enrolled in a free trial of the paid version of the course. After 14 days, they are automatically charged unless they cancel first. If the student clicks "access course materials", they can view the videos and take the quizzes for free, but they do not receive coaching support or a verified certificate, and they cannot submit their final project for feedback.

In the experiment, Udacity tested a change where if the student clicked "start free trial", they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead. This screenshot shows what the experiment looks like.

The hypothesis was that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time, without significantly reducing the number of students who continue past the free trial and eventually complete the course. If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.

The unit of diversion is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.

- "Start free trial": enroll in a free trial of the paid version
- "Access course materials": view the materials for free

- Users see this popup after the click
- If they're not committed, they see a warning

We have two initial hypotheses.

- "..this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time"
- "..without significantly reducing the number of students to continue past the free trial and eventually complete the course."

Unit of diversion: cookie

- Users in the free trial are tracked by user-id
- The same user-id can't enroll in the free trial twice
- Users who don't enroll aren't tracked

- Number of cookies
- Number of clicks
- Click through probability

A sanity check is how we make sure the data split between the experiment and control groups is comparable, and it is done with the right invariant metrics. The three metrics below shouldn't change between groups because they sit outside the experiment: each is measured before the user is exposed to the change.

The number of cookies viewing the page should be the same across groups during the experiment: at that point users haven't clicked the "Start free trial" button and haven't seen the Free Trial Screener, so the number of cookies can be used as an invariant metric. When users click the button, they still haven't seen the experimental change, so the number of clicks also shouldn't differ between the experiment and control groups.

Since the experiment only occurs **after** users click the "Start free trial" button, the click-through-probability must also be the same for the experiment and control groups: if the number of cookies and the number of clicks are both unchanged, then their ratio, the click-through-probability, is unchanged as well.

Besides the cookie, there is also the user-id. But user-id is not a good invariant metric, because Udacity pages are open to unregistered visitors: a user-id only comes into play after the click, downstream of the change being tested.

- Gross Conversion
- Net Conversion


For evaluation metrics, I chose Gross Conversion, Retention, and Net Conversion. These are good evaluation metrics because they are expected to change when the experiment does. Gross and Net Conversion also use the click (cookie) as the unit of analysis, the same as the unit of diversion, so their analytical standard errors should closely match the empirical ones.

Gross conversion is the number of user-ids to complete checkout and enroll in the free trial divided by the number of unique cookies to click the "Start free trial" button. After visitors click the button, they see the screener and its warning, which should make visitors without a serious commitment back out right away, so gross conversion should drop in the experiment group.

Retention is the number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of user-ids to complete checkout. The experiment intends to keep only the visitors with a serious commitment, so the retention rate should be higher for the experiment group than for the control group.

Net conversion is the number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button. It captures the hypothesis end to end: of everyone who clicks the button, including control-group users who never see the warning, the fraction who go on to pay should not drop significantly.
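To make the definitions concrete, here is a minimal sketch computing the baseline values of the three metrics from the daily figures quoted in the baseline data below (3200 clicks, 660 enrollments, 53% payment rate given enrollment):

```
# Baseline values of the evaluation metrics, from Udacity's daily figures
clicks = 3200              # unique cookies clicking "Start free trial" per day
enrollments = 660          # enrollments per day
p_pay_given_enroll = 0.53  # probability of payment, given enrollment

gross_conversion = enrollments / clicks                  # enroll / click
retention = p_pay_given_enroll                           # pay / enroll
net_conversion = gross_conversion * p_pay_given_enroll   # pay / click

print(gross_conversion, retention, net_conversion)  # ≈ 0.20625 0.53 0.1093125
```

These reproduce the "Probability of enrolling, given click" and "Probability of payment, given click" entries in the baseline table.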

Out of these metrics, Retention turns out to require a much longer duration: 118 days. That takes too long, and it's not something Udacity is willing to give the experiment, so Retention will be excluded.

- Gross Conversion: 0.0202
- Net Conversion: 0.0156

We expect the analytical variance to match the empirical variance because the unit of analysis and the unit of diversion are the same.
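As a quick check on that claim, here is a simulation sketch, assuming each click converts independently with the gross-conversion baseline probability (n = 400 clicks, as computed below):

```
# Sketch: empirical vs analytical SE when the unit of analysis equals
# the unit of diversion (independent Bernoulli conversions assumed)
import numpy as np

p, n = 0.20625, 400           # gross-conversion baseline, clicks in the sample
rng = np.random.default_rng(0)

analytical_se = np.sqrt(p * (1 - p) / n)
# simulate 100,000 experiments of n clicks each and measure the spread
empirical_se = (rng.binomial(n, p, size=100_000) / n).std()

print(analytical_se, empirical_se)  # both ≈ 0.0202
```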

To calculate the standard deviation, we use the standard error of a proportion:

```
import numpy as np

SE = np.sqrt(p * (1 - p) / n)
```

using the baseline data below.

In [34]:

```
baselines = """Unique cookies to view page per day: 40000
Unique cookies to click "Start free trial" per day: 3200
Enrollments per day: 660
Click-through-probability on "Start free trial": 0.08
Probability of enrolling, given click: 0.20625
Probability of payment, given enroll: 0.53
Probability of payment, given click: 0.1093125"""
lines = baselines.split('\n')
# split on ': ' (colon + space), so the metric name is the key
# and the value parses as a float
d_baseline = dict((e.split(': ')[0], float(e.split(': ')[1])) for e in lines)
```

In [79]:

```
n = 5000
n_click = n * d_baseline['Click-through-probability on "Start free trial"']
n_click
```

Out[79]:

```
400.0
```

Next, the standard deviation for Gross Conversion is

In [78]:

```
p = d_baseline['Probability of enrolling, given click']
round(np.sqrt(p * (1-p) / n_click),4)
```

Out[78]:

```
0.0202
```

and for Net Conversion,

In [77]:

```
p = d_baseline['Probability of payment, given click']
round(np.sqrt(p * (1-p) / n_click),4)
```

Out[77]:

```
0.0156
```

- Gross Conversion: baseline 0.20625, dmin 0.01 → 25,839 cookies who click (per group)
- Net Conversion: baseline 0.1093125, dmin 0.0075 → 27,411 cookies who click (per group)
- Not using the Bonferroni correction
- Using alpha = 0.05 and beta = 0.2

The pageviews needed then come to 685,275.

We feed these numbers into an online sample size calculator.

We use the larger of the two click counts, so the minimum requirement is met for both metrics. The calculator's output is for one group only, so it must be doubled. And since these counts are clicks, we convert them to pageviews by dividing by the click-through-probability. The pageviews needed are then:

In [89]:

```
(27411 * 2) / d_baseline['Click-through-probability on "Start free trial"']
```

Out[89]:

- Fraction: 0.8 (*low risk*)
- Duration: 22 days (*40,000 pageviews/day*)

The fraction of Udacity traffic diverted to the experiment will be 80%. The experiment isn't risky enough to leak as a blog post or news article; it isn't big news, since Udacity is only adding a small warning for users. With 40,000 pageviews per day and 80% of them diverted, the duration will be 22 days.
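The duration arithmetic, as a sketch:

```
import math

pageviews_needed = 685275
daily_pageviews = 40000
fraction = 0.8  # share of traffic diverted to the experiment

duration_days = math.ceil(pageviews_needed / (daily_pageviews * fraction))
print(duration_days)  # 22
```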

This is where the Retention metric fails as an evaluation metric: it requires a duration of 118 days. That takes too long, and it's not something Udacity is willing to give the experiment, so Retention is excluded.

Number of Cookies:

- Bounds = (0.4988,0.5012)
- Observed = 0.5006
- Passes? Yes

Number of clicks on “Start free trial”:

- Bounds = (0.4959,0.5041)
- Observed = 0.5005
- Passes? Yes

Click-through-probability on “Start free trial”:

- Bounds = (0.0812,0.0830)
- Observed = 0.0821
- Passes? Yes

Since we have passed all of the sanity checks, we can continue to analyze the experiment.

In [4]:

```
import pandas as pd
import numpy as np

control = pd.read_csv('control_data.csv')
experiment = pd.read_csv('experiment.csv')
```

In [5]:

```
control.head()
```

Out[5]:

In [6]:

```
experiment.head()
```

Out[6]:

Next, we count the total views and clicks for both control and experiment groups.

In [38]:

```
control_views = control.Pageviews.sum()
control_clicks = control.Clicks.sum()
experiment_views = experiment.Pageviews.sum()
experiment_clicks = experiment.Clicks.sum()
```

In [7]:

```
def sanity_check_CI(control, experiment, expected):
    # standard error of the expected proportion, given the pooled total
    SE = np.sqrt((expected * (1 - expected)) / (control + experiment))
    ME = 1.96 * SE  # margin of error at the 95% confidence level
    return (expected - ME, expected + ME)
```

Now for the sanity-check confidence interval for the number of cookies viewing the page:

In [42]:

```
sanity_check_CI(control_views,experiment_views,0.5)
```

Out[42]:

The actual proportion is

In [60]:

```
float(control_views)/(control_views+experiment_views)
```

Out[60]:

Since 0.5006 is within the interval, the experiment passes the sanity check for number of cookies.

Next, we calculate the confidence interval for the number of clicks on the "Start free trial" button.

In [44]:

```
sanity_check_CI(control_clicks,experiment_clicks,0.5)
```

Out[44]:

And the actual proportion,

In [61]:

```
float(control_clicks)/(control_clicks+experiment_clicks)
```

Out[61]:

This time 0.5005 is within the interval, so the experiment passes this sanity check as well.

In [21]:

```
ctp_control = float(control_clicks)/control_views
ctp_experiment = float(experiment_clicks)/experiment_views
```

In [3]:

```
# CI around the control group's click-through-probability
from scipy.stats import norm
c = 28378     # clicks in the control group
n = 345543    # pageviews in the control group
CL = 0.95
pe = c / n
SE = np.sqrt(pe * (1 - pe) / n)
z_star = round(norm.ppf(1 - (1 - CL) / 2), 2)  # = 1.96
ME = z_star * SE
(pe - ME, pe + ME)
```

Out[3]:

In [4]:

```
ctp_experiment
```

Out[4]:

- Did not use Bonferroni correction
- Gross Conversion
  - Bounds = (-0.0291, -0.0120)
  - Statistical significance? Yes
  - Practical significance? Yes
- Net Conversion
  - Bounds = (-0.0116, 0.0019)
  - Statistical significance? No
  - Practical significance? No
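The two verdicts above can be mechanized; here is a sketch (the helper name is mine, not from the analysis). Statistical significance means the confidence interval excludes zero; practical significance means the interval lies entirely beyond the d_min boundary:

```
def significance(ci_low, ci_high, d_min):
    statistical = not (ci_low <= 0 <= ci_high)        # CI excludes zero
    practical = ci_high < -d_min or ci_low > d_min    # CI clear of +/- d_min
    return statistical, practical

print(significance(-0.0291, -0.0120, 0.01))    # Gross Conversion: (True, True)
print(significance(-0.0116, 0.0019, 0.0075))   # Net Conversion: (False, False)
```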

In [29]:

```
get_gross = lambda group: float(group.dropna().Enrollments.sum())/ group.Clicks.sum()
get_net = lambda group: float(group.dropna().Payments.sum())/ group.Clicks.sum()
```

Keep in mind that observed_diff can be negative.

In [40]:

```
print('N_cont = %i'%control.dropna().Clicks.sum())
print('X_cont = %i'%control.dropna().Enrollments.sum())
print('N_exp = %i'%experiment.dropna().Clicks.sum())
print('X_exp = %i'%experiment.dropna().Enrollments.sum())
```

In [3]:

```
X_exp/N_exp
```

Out[3]:

In [4]:

```
X_cont/N_cont
```

Out[4]:

In [1]:

```
N_cont = 17293   # clicks, control
X_cont = 3785    # enrollments, control
N_exp = 17260    # clicks, experiment
X_exp = 3423     # enrollments, experiment
observed_diff = X_exp / N_exp - X_cont / N_cont
p_pool = (X_cont + X_exp) / (N_cont + N_exp)
SE = np.sqrt(p_pool * (1 - p_pool) * (1 / N_cont + 1 / N_exp))
ME = 1.96 * SE
(observed_diff - ME, observed_diff + ME)
```

Out[1]:

In [2]:

```
observed_diff
```

Out[2]:

In [43]:

```
print('N_cont = %i'%control.dropna().Clicks.sum())
print('X_cont = %i'%control.dropna().Payments.sum())
print('N_exp = %i'%experiment.dropna().Clicks.sum())
print('X_exp = %i'%experiment.dropna().Payments.sum())
```

In [5]:

```
X_exp/N_exp
```

Out[5]:

In [6]:

```
X_cont/N_cont
```

Out[6]:

In [11]:

```
N_cont = 17293   # clicks, control
X_cont = 2033    # payments, control
N_exp = 17260    # clicks, experiment
X_exp = 1945     # payments, experiment
observed_diff = X_exp / N_exp - X_cont / N_cont
p_pool = (X_cont + X_exp) / (N_cont + N_exp)
SE = np.sqrt(p_pool * (1 - p_pool) * (1 / N_cont + 1 / N_exp))
ME = 1.96 * SE
(observed_diff - ME, observed_diff + ME)
```

Out[11]:

In [12]:

```
observed_diff
```

Out[12]:

- Did not use Bonferroni correction
- Gross Conversion
  - p-value = 0.0026
  - Statistical significance? Yes
- Net Conversion
  - p-value = 0.6776
  - Statistical significance? No

The sign test is a complementary check on the effect-size test. I'm using an online calculator to compute the binomial p-value: on how many days was the experiment group's rate higher than the control group's, and what are the odds of that count happening by chance? If that probability is rare enough, the day-to-day pattern isn't likely due to chance and the effect is significant, at my chosen significance level of 5%.

I'm using a helper function to compare, day by day, whether the metric in question is smaller for the control group than for the experiment group.

In [1]:

```
compare_prob = lambda col: ((control.dropna()[col] / control.dropna().Clicks) <
                            (experiment.dropna()[col] / experiment.dropna().Clicks))
```

Counting the days for gross conversion, I get:

In [4]:

```
compare_prob('Enrollments').value_counts()
```

Out[4]:

And counting for net conversion (payments):

In [5]:

```
compare_prob('Payments').value_counts()
```

Out[5]:

From these counts, the calculator gives a p-value of 0.0026 for Gross Conversion and 0.6776 for Net Conversion.
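Instead of an online calculator, the two-sided binomial p-value can also be computed directly with scipy. A sketch, where `k` is the number of days the experiment beat the control out of `n` days with complete data (the counts come from the value_counts() cells above):

```
from scipy.stats import binomtest

def sign_test_pvalue(k, n):
    # exact two-sided binomial test against the chance rate p = 0.5
    return binomtest(k, n, p=0.5).pvalue
```

For example, if the experiment beat the control on exactly half of the days, the p-value would be 1, i.e. no evidence of a direction.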

- Did not use the Bonferroni correction.
- Gross Conversion needs to be significant, but Net Conversion only needs to not decrease significantly.

- Gross Conversion: passes.
- Net Conversion: inconclusive; the confidence interval includes the practical significance boundary, so the change could lose money.

Decision: risky. Delay the launch for further experiments, or cancel it.

So what does it take to reduce not-so-serious users without losing potential money? Every course overview page on Udacity already states the hours expected for that course, so warning users again about the time commitment may be unnecessary. What we could do instead is give them an incentive after enrollment. In this experiment, after students enroll, they see an offer on the right side of the video material page: free payment until they graduate. The deal is that they must become a Udacity Code Reviewer. Udacity already has this program, which pays a reasonable hourly rate to graduates who review students' code. If they agree, they click the "Start debt program" button below the offer.

They can then continue past the 14-day boundary and finish the course. In return, they have to work as a Code Reviewer and pay off the debt through payroll; they receive no salary until the debt is cleared.

Yes, it seems risky for Udacity. But if users break the agreement (for example, by not becoming a Code Reviewer within two months), they are automatically charged on their registered credit card. The same happens if they cancel the program. So it's safe to assume the risk of defaulters is handled, though that is not part of the experiment.

The hypothesis is that, given this incentive, students become more serious and committed to completing the course: the number of users who cancel early in the course is significantly reduced, boosting them relative to the users who were already committed.

The unit of diversion is a user-id. As with the free trial, the same user-id can't join the debt program twice. A user-id is cross-platform and represents a user better than a cookie does. User-ids that don't enroll in the program are not tracked in the experiment, nor are user-ids that join the debt program but cancel at the end of the free trial.

- Not necessary to show the warning again
- Start Debt Program
- Risky if users break the agreement
  - Don't become a Udacity Code Reviewer
  - Cancel midway through the program

- Hypothesis
  - Non-serious users become more committed after the incentive
  - The number of users who cancel early is reduced
  - A boost compared to the already-committed users

The invariant metrics for the follow-up experiment:

- **Number of cookies**: number of unique cookies to view the course overview page.
- **Number of clicks**: number of unique cookies to click the "Start free trial" button (which happens before the free trial screener is triggered).
- **Click-through-probability**: number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page.
- **Gross conversion**: number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "Start free trial" button.

And the evaluation metrics:

- **Debt Conversion**: number of user-ids to click "Start Debt Program" divided by number of user-ids that enroll in the free trial.
- **Debt-Net Conversion**: number of user-ids to click "Start Debt Program" divided by number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment).
- **Net Conversion**: number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button.