A simulation study inspired by experiments in improving Wikipedia editing experience, and demonstrating multiple methodologies for analyzing data.
In these notes we will use the statistical software and programming language R, along with a few packages. RStudio IDE is highly recommended, but not required.
Use the following code to install those packages if you want to follow along:
install.packages(
c("tidyverse", "zeallot", "lme4", "car", "brms", "tidybayes", "cmdstanr"),
repos = c("https://mc-stan.org/r-packages/", getOption("repos"))
)
cmdstanr::install_cmdstan()
# Optional:
install.packages(c("gt", "gtsummary", "broom.mixed")) # tables
Suppose we’re running a test of a new editor interface for making edits to an article and we wish to know whether this new editing interface is an improvement – defined as “whether the user published the edit after starting it.” Specifically, we’re interested in a lift of 20% in the publish (successful edit attempt) probability.
We will build a model of edit attempt outcomes \(y\) – which is 1 if the edit was published and 0 if it was not. We use \(p\) to indicate the probability of a successful edit – in other words: \(p = \text{Pr}(y = 1)\). The simplest version of our model is a logistic regression:
\[ \begin{align*} y &\sim \text{Bernoulli}(p)\\ \text{logit}(p) &= \beta_0 + \beta_1 \times \text{used new interface} \end{align*} \]
where \(\beta_0\) is the intercept, and \(\beta_1\) is the slope and the effect associated with using the new interface. In other words:
\[ \begin{align*} & \text{Pr}(\text{edit published}|\text{did not use new interface}) & =~ & \text{logit}^{-1}(\beta_0)\\ & \text{Pr}(\text{edit published}|\text{used new interface}) & =~ & \text{logit}^{-1}(\beta_0 + \beta_1) \end{align*} \]
The goal of the statistical model here is to infer the values of \(\beta_0\) and \(\beta_1\).
It is important to remember that these model parameters (also called coefficients) are on the log-odds scale, and that when we interpret their estimates we must be careful and sometimes we need to apply transformations to make sense of them.
For example, (Gelman, Hill, and Vehtari 2021) suggest using the “divide-by-4” rule for convenience. Under this transformation \(\beta_1/4\) is a reasonable approximation of the maximum increase in the probability of success corresponding to a unit difference – in our case the difference between which interface was used (\(\text{used new interface} = 0\) vs \(\text{used new interface} = 1\)).
Alternatively, \(\exp(\beta_1)\) – or \(e^{\beta_1}\) – represents the multiplicative effect on the odds of an edit getting published.
To see the methods in action we will simulate users, their edit attempts, and the outcomes of those attempts – whether they were successful (edit published) or not – and then see how the models are able to recover the true values of the parameters which were used in the simulation.
where b1
is the effect of the new editor interface on
the log-odds scale (refer to the introduction above).
simple_example <- tibble(
new_interface = c(FALSE, TRUE),
formula = c("\\[\\beta_0\\]", "\\[\\beta_0 + \\beta_1\\]")
) %>%
mutate(
log_odds = b0 + b1 * new_interface,
prob = invlogit(log_odds)
)
simple_example %>%
gt() %>%
fmt(vars(new_interface), fns = function(x) ifelse(x, "New", "Old")) %>%
fmt_percent(vars(prob), decimals = 1) %>%
cols_label(
new_interface = "Editing interface",
formula = html("logit<sup>-1</sup>(p)"),
log_odds = "Log-odds scale",
prob = "Publish probability"
) %>%
tab_header(
"Logistic regression model",
html(sprintf("Using β<sub>0</sub> = %.1f and β<sub>1</sub> = %.1f", b0, b1))
)
Logistic regression model | |||
---|---|---|---|
Using β0 = -0.7 and β1 = 0.8 | |||
Editing interface | logit-1(p) | Log-odds scale | Publish probability |
Old | \[\beta_0\] | -0.7 | 33.2% |
New | \[\beta_0 + \beta_1\] | 0.1 | 52.5% |
Let’s generate some data to use for our study! We will first build a user simulator and then use that to simulate many users making edits, sometimes using the new interface and sometimes using the old interface.
simulate_user_edit_attempts <- function() {
# unique for each user; informs that user's "base publish prob."
user_intercept <- b0 + rnorm(1, 0, 0.75)
# how many edit attempts did this user make?
n_edit_attempts <- ceiling(rexp(1, 1/50))
# simulate very active users either very into new interface or not
prob_using_new_interface <- ifelse(
n_edit_attempts > 150, # 'active': 150+ edit attempts
sample(c(0.05, 0.85), 1), # for active users
rbeta(1, 1.5, 1.5) # for everyone else
)
# simulate which edit attempts were performed with which interface:
used_new_interface <- rbernoulli(n_edit_attempts, p = prob_using_new_interface)
# each edit attempt's success prob. depends on user and interface used:
user_log_odds <- user_intercept + b1 * used_new_interface
success_probabilities <- invlogit(user_log_odds)
# simulate each edit attempt's outcome, based on that edit attempt's prob.:
edit_attempts <- map_lgl(success_probabilities, rbernoulli, n = 1)
# return simulated user's edits and outcomes:
tibble(
user_intercept = user_intercept,
user_success = boot::inv.logit(user_intercept),
new_interface = used_new_interface * 1L,
success_prob = success_probabilities,
edit_success = edit_attempts * 1L
)
}
To see this simulation in action, let’s simulate one user and print the first 10 edit attempts:
# A tibble: 10 × 5
user_intercept user_success new_interface success_prob edit_success
<dbl> <dbl> <int> <dbl> <int>
1 0.247 0.561 0 0.561 1
2 0.247 0.561 1 0.740 0
3 0.247 0.561 1 0.740 1
4 0.247 0.561 1 0.740 1
5 0.247 0.561 1 0.740 0
6 0.247 0.561 0 0.561 1
7 0.247 0.561 0 0.561 0
8 0.247 0.561 0 0.561 1
9 0.247 0.561 1 0.740 1
10 0.247 0.561 0 0.561 0
Notice how the user’s individual intercept gets transformed into their base probability, how the choice of the interface either keeps it the same or lifts it, and how even using the new interface (and in this case having 74% probability of publishing the edit) did not yield successes in every instance.
That was just one user, but our analysis will require more. Let’s simulate 100 users:
First, let’s see what results we get from a simple model that treats all edit attempts as independent and interchangeable, even though we know this to be wrong since edit attempts from the same user are going to share a base success probability (based on that user’s simulated intercept):
The results are:
fit0 %>%
tbl_regression(
label = list(new_interface = "Using new interface"),
intercept = TRUE
)
Characteristic | log(OR)1 | 95% CI1 | p-value |
---|---|---|---|
(Intercept) | -0.82 | -0.90, -0.74 | <0.001 |
Using new interface | 0.82 | 0.71, 0.93 | <0.001 |
1 OR = Odds Ratio, CI = Confidence Interval |
This model has done an OK job of estimating the coefficients \(\beta_0\) = -0.7 and \(\beta_1\) = 0.8.
However, the model above is fundamentally flawed because it assumes all of the edit attempts are independent and identically distributed, but this is not the case. A much more correct model would incorporate the fact that these edit attempts are related to each other because they were done by the same users.
Hierarchical regression models – also known as multilevel models and mixed-effects models (due to the presence of random effects in addition to fixed effects) – are used to model this structure when individuals observations are grouped. We refer to these groupings as random effects. In fact, we can even model more deeply nested structures – for example, if we had simulated multiple wikis and multiple users within those wikis then we could model this hierarchy.
The results are:
tbl1 <- fit1 %>%
tbl_regression(
label = list(new_interface = "Using new interface"),
intercept = TRUE
)
tbl1
Characteristic | log(OR)1 | 95% CI1 | p-value |
---|---|---|---|
(Intercept) | -0.77 | -0.95, -0.60 | <0.001 |
Using new interface | 0.86 | 0.72, 1.0 | <0.001 |
1 OR = Odds Ratio, CI = Confidence Interval |
This model has done a better job of estimating the coefficients \(\beta_0\) = -0.7 and \(\beta_1\) = 0.8.
In order to interpret these estimates, we have to apply a transformation. However, we need to be careful about the confidence intervals. The delta method can be used to estimate the confidence intervals of the transformed parameters:
c("b1/4", "exp(b1)") %>%
map_dfr(
.f = car::deltaMethod,
object = fit1,
parameterNames = paste0("b", 0:1)
) %>%
rownames_to_column(var = "transformation") %>%
gt(rowname_col = "transformation") %>%
fmt_number(vars(`Estimate`, `SE`, `2.5 %`, `97.5 %`)) %>%
cols_merge(vars(`2.5 %`, `97.5 %`), pattern = "({1}, {2})") %>%
cols_label(`2.5 %` = "95% CI", SE = "Standard Error") %>%
tab_header("Transformed effects", "Estimated using delta method") %>%
tab_footnote("CI: Confidence Interval", cells_column_labels(vars(`2.5 %`)))
Transformed effects | |||
---|---|---|---|
Estimated using delta method | |||
Estimate | Standard Error | 95% CI1 | |
b1/4 | 0.21 | 0.02 | (0.18, 0.25) |
exp(b1) | 2.36 | 0.17 | (2.03, 2.69) |
1 CI: Confidence Interval |
The maximum lift in edit attempt success probability is 21% (95% CI: 18%-25%), and because the confidence interval does not include 0 this is statistically significant at the 0.05 level. When using the new interface the odds of publishing the edit are multiplied by 2.36 (95% CI: 2.03-2.69) – effectively double – and in this case we would determine statistical significance (at the 0.05 level) by checking whether this confidence interval contains 1, since a multiplicative effect of 1 is no change either way.
One of the things we did in the simulation was have special behavior for highly-active users. Specifically, when a user made more than 150 simulated edit attempts our simulation made approximately 5% of their edits use the new interface OR 85% of their edits use the new interface – effectively simulating power users who are very apprehensive about the new interface or who have a strong preference for it.
most_active_users %>%
mutate(
prop_new_interface = n_new_interface / n_edits,
prop_successes = n_successes / n_edits,
prop_successes_new = n_successes_new / n_new_interface
) %>%
gt() %>%
fmt(
starts_with("prop_"),
fns = function(x) sprintf("%s%.0f%%", ifelse(x < 0.1, " ", ""), 100 * x)
) %>%
cols_merge(ends_with("_new_interface"), pattern = "{1} (<b>{2}</b>)") %>%
cols_merge(ends_with("_successes"), pattern = "{1} (<b>{2}</b>)") %>%
cols_merge(ends_with("_successes_new"), pattern = "{1} (<b>{2}</b>)") %>%
cols_align("left", vars(user_id)) %>%
cols_align("right", starts_with("n_")) %>%
cols_label(
user_id = "User ID",
n_edits = "Total edit attempts",
n_new_interface = "Edits made using new interface",
n_successes = "Total edits published",
n_successes_new = "Edits published with new interface"
) %>%
tab_footnote(
footnote = html("(<b>%</b>) is the proportion of total edits which were made using the new interface"),
locations = cells_column_labels(vars(n_new_interface))
) %>%
tab_footnote(
footnote = html("(<b>%</b>) is the proportion of edits made using the new interface which were published"),
locations = cells_column_labels(vars(n_successes_new))
) %>%
tab_footnote(
footnote = html("(<b>%</b>) is the proportion of total edits published"),
locations = cells_column_labels(vars(n_successes))
) %>%
tab_style(
cell_text(font = "monospace", size = "large"),
cells_body(starts_with("n_"))
) %>%
tab_style(
style = list(
cell_fill(color = "gray95")
),
cells_body(rows = (1:nrow(most_active_users)) %% 2 == 1)
) %>%
tab_header("Most active (simulated) users")
Most active (simulated) users | ||||
---|---|---|---|---|
User ID | Total edit attempts | Edits made using new interface1 | Edits published with new interface2 | Total edits published3 |
002 | 424 | 349 (82%) | 142 (41%) | 161 (38%) |
003 | 262 | 8 ( 3%) | 2 (25%) | 118 (45%) |
023 | 150 | 118 (79%) | 82 (69%) | 99 (66%) |
025 | 125 | 22 (18%) | 10 (45%) | 37 (30%) |
035 | 180 | 6 ( 3%) | 3 (50%) | 55 (31%) |
040 | 182 | 164 (90%) | 75 (46%) | 81 (45%) |
059 | 188 | 10 ( 5%) | 2 (20%) | 43 (23%) |
067 | 122 | 90 (74%) | 45 (50%) | 52 (43%) |
070 | 176 | 9 ( 5%) | 1 (11%) | 16 ( 9%) |
094 | 159 | 8 ( 5%) | 2 (25%) | 15 ( 9%) |
1 (%) is the proportion of total edits which were made using the new interface | ||||
2 (%) is the proportion of edits made using the new interface which were published | ||||
3 (%) is the proportion of total edits published |
These users account for 36% of all edits in this simulated dataset.
Let us see what happens to the results when we exclude those users from analysis:
Characteristic | Including most active users | Excluding most active users | ||||
---|---|---|---|---|---|---|
log(OR)1 | 95% CI1 | p-value | log(OR)1 | 95% CI1 | p-value | |
(Intercept) | -0.77 | -0.95, -0.60 | <0.001 | -0.74 | -0.92, -0.55 | <0.001 |
Using new interface | 0.86 | 0.72, 1.0 | <0.001 | 0.89 | 0.73, 1.0 | <0.001 |
1 OR = Odds Ratio, CI = Confidence Interval |
list(
`Including most active users` = c("b1/4", "exp(b1)") %>%
map_dfr(
.f = car::deltaMethod,
object = fit1,
parameterNames = paste0("b", 0:1)
) %>%
rownames_to_column(var = "transformation"),
`Excluding most active users` = c("b1/4", "exp(b1)") %>%
map_dfr(
.f = car::deltaMethod,
object = fit2,
parameterNames = paste0("b", 0:1)
) %>%
rownames_to_column(var = "transformation")
) %>%
bind_rows(.id = "dataset") %>%
gt(rowname_col = "transformation", groupname_col = "dataset") %>%
fmt_number(vars(`Estimate`, `SE`, `2.5 %`, `97.5 %`)) %>%
cols_merge(vars(`2.5 %`, `97.5 %`), pattern = "({1}, {2})") %>%
cols_label(`2.5 %` = "95% CI", SE = "Standard Error") %>%
tab_header("Transformed effects") %>%
tab_footnote("CI: Confidence Interval", cells_column_labels(vars(`2.5 %`)))
Transformed effects | |||
---|---|---|---|
Estimate | Standard Error | 95% CI1 | |
Including most active users | |||
b1/4 | 0.21 | 0.02 | (0.18, 0.25) |
exp(b1) | 2.36 | 0.17 | (2.03, 2.69) |
Excluding most active users | |||
b1/4 | 0.22 | 0.02 | (0.18, 0.26) |
exp(b1) | 2.43 | 0.20 | (2.04, 2.81) |
1 CI: Confidence Interval |
In this case the estimates did not change much when excluding these most active users, but still there is no reason not to include those users.
One of the benefits of switching to a Bayesian approach with this model is we can reason about the parameters (and any other quantities) probabilistically. This benefit will become apparent soon, but first we need to fit the Bayesian model and obtain draws of the parameters from what’s called the joint posterior distribution – so called because it’s a re-allocation of possibilities (from values of parameters we thought were likely a-priori) after taking into account observed data.
summary(fit3)
Family: bernoulli
Links: mu = logit
Formula: edit_success ~ new_interface + (1 | user_id)
Data: simulated_edits (Number of observations: 5519)
Draws: 4 chains, each with iter = 1000; warmup = 0; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~user_id (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.70 0.07 0.57 0.87 1.00 1195
Tail_ESS
sd(Intercept) 2055
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept -0.77 0.09 -0.94 -0.59 1.00 1137
new_interface 0.86 0.07 0.71 1.00 1.00 5915
Tail_ESS
Intercept 2088
new_interface 3332
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
tbl3 <- fit3 %>%
spread_draws(b_new_interface, b_Intercept) %>%
mutate(
exp_b = exp(b_new_interface),
b4 = b_new_interface / 4,
avg_lift = invlogit(b_Intercept + b_new_interface) - invlogit(b_Intercept)
) %>%
pivot_longer(
b_new_interface:avg_lift,
names_to = "param",
values_to = "val"
) %>%
group_by(param) %>%
summarize(
ps = c(0.025, 0.5, 0.975),
qs = quantile(val, probs = ps),
.groups = "drop"
) %>%
mutate(
quantity = ifelse(
param %in% c("b_Intercept", "b_new_interface"),
"Parameter", "Function of parameter(s)"
),
param = factor(
param,
c("b_Intercept", "b_new_interface", "exp_b", "b4", "avg_lift"),
c("(Intercept)", "Using new interface", "Multiplicative effect on odds", "Divide-by-4 rule", "Average lift")
),
ps = factor(ps, c(0.025, 0.5, 0.975), c("lower", "median", "upper")),
) %>%
pivot_wider(names_from = "ps", values_from = "qs") %>%
arrange(quantity, param)
tbl3 %>%
gt(rowname_col = "param", groupname_col = "quantity") %>%
row_group_order(c("Parameter", "Function of parameter(s)")) %>%
fmt_number(vars(lower, median, upper), decimals = 3) %>%
fmt_percent(columns = vars(median, lower, upper), rows = 2:3, decimals = 1) %>%
cols_align("center", vars(median, lower, upper)) %>%
cols_merge(vars(lower, upper), pattern = "({1}, {2})") %>%
cols_move_to_end(vars(lower)) %>%
cols_label(median = "Point Estimate", lower = "95% CI") %>%
tab_style(cell_text(weight = "bold"), cells_row_groups()) %>%
tab_footnote("CI: Credible Interval", cells_column_labels(vars(lower))) %>%
tab_footnote(
html("Average lift = Pr(Success|New interface) - Pr(Success|Old interface) = logit<sup>-1</sup>(β<sub>0</sub> + β<sub>1</sub>) - logit<sup>-1</sup>(β<sub>0</sub>)"),
cells_body(vars(median), 3)
) %>%
tab_header("Posterior summary of model parameters")
Posterior summary of model parameters | ||
---|---|---|
Point Estimate | 95% CI1 | |
Parameter | ||
(Intercept) | −0.769 | (−0.936, −0.587) |
Using new interface | 0.856 | (0.712, 1.002) |
Function of parameter(s) | ||
Multiplicative effect on odds | 2.354 | (2.039, 2.722) |
Divide-by-4 rule | 21.4% | (17.8%, 25.0%) |
Average lift | 20.5%2 | (17.1%, 23.9%) |
1 CI: Credible Interval | ||
2 Average lift = Pr(Success|New interface) - Pr(Success|Old interface) = logit-1(β0 + β1) - logit-1(β0) |
The results are essentially the same as the non-Bayesian model in the previous section. A couple of key differences:
Hypothesis testing in this paradigm works a little differently. Rather than performing null hypothesis significance testing (NHST), rejecting or failing-to-reject the null hypothesis, and calculating p-values, we can ask for probabilities of ranges.
What’s the probability using the new interface at least doubles the odds of an edit getting published?
hyp1 <- "exp(new_interface) > 2"
fit3 %>%
hypothesis(hyp1)
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper
1 (exp(new_interfac... > 0 0.36 0.17 0.09 0.65
Evid.Ratio Post.Prob Star
1 74.47 0.99 *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Of interest is the Post.Prob
column – the posterior
probability. In this case, we are almost absolutely sure that the odds
of an edit getting published at least double when the new interface is
being used.
What’s the probability that the average lift in edit attempt success probability is positive?
hyp2 <- "invlogit((Intercept) + new_interface) - invlogit((Intercept)) > 0"
fit3 %>%
hypothesis(hyp2)
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper
1 (invlogit((Interc... > 0 0.2 0.02 0.18 0.23
Evid.Ratio Post.Prob Star
1 Inf 1 *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
What’s the probability that the average lift in edit attempt success probability is at least 20%?
h <- "invlogit((Intercept) + new_interface) - invlogit((Intercept)) > 0.2"
(hyp3 <- hypothesis(fit3, hypothesis = h))
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper
1 (invlogit((Interc... > 0 0 0.02 -0.02 0.03
Evid.Ratio Post.Prob Star
1 1.53 0.6
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
There is a 0.60425 probability that the difference between the Pr(Success | New interface) and Pr(Success | Old interface) is greater than 0.2 – and it is up to the stakeholder to decide whether that is high enough for them or if they would prefer to see a probability of, say, 0.8.
Another question we may ask is “is there a significant difference in proportions of successful edits between edits made using the new interface and those made using the old interface?”
edit_counts <- simulated_edits %>%
group_by(user_id) %>%
summarize(
n_total_edits = n(),
new_ui_total_edits = sum(new_interface),
old_ui_total_edits = sum(1 - new_interface),
new_ui_successful_edits = sum(edit_success * new_interface),
old_ui_successful_edits = sum(edit_success * (1 - new_interface))
) %>%
# Only users who have used both UIs at least once:
filter(
new_ui_total_edits > 0,
old_ui_total_edits > 0
) %>%
mutate(
new_ui_prop_success = new_ui_successful_edits / new_ui_total_edits,
old_ui_prop_success = old_ui_successful_edits / old_ui_total_edits
)
Note: 4 users were excluded from this analysis because they did not use both interfaces, and this analysis requires at least one edit attempt using each of the interfaces.
tidy_counts <- edit_counts %>%
select(user_id, contains("ui")) %>%
pivot_longer(
!user_id,
names_to = c("interface", ".value"),
names_pattern = "(.*)_ui_(.*)",
values_drop_na = TRUE
)
ggplot(tidy_counts, aes(y = prop_success, x = interface)) +
geom_boxplot() +
geom_jitter(aes(size = total_edits), width = 0.2, height = 0, alpha = 0.5) +
scale_y_continuous("Proportion of edits published", labels = scales::percent_format(1)) +
scale_size_continuous(
"Edits attempted by user",
range = c(1, 4), trans = "log", breaks = c(1, 5, 55)
) +
ggtitle("Proportion of edits published, by interface") +
theme(legend.position = "bottom")
The users at the top (100%) and bottom (0%) made very few edits, so it is easy for the proportion to be “all” or “none” when the denominator is, say, 1.
To actually test the hypothesis we perform a paired t-test since we have a success proportion from each editor for each user.
t.test(
x = edit_counts$new_ui_prop_success,
y = edit_counts$old_ui_prop_success,
alternative = "greater",
paired = TRUE
)
Paired t-test
data: edit_counts$new_ui_prop_success and edit_counts$old_ui_prop_success
t = 5.0261, df = 95, p-value = 1.175e-06
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
0.09907877 Inf
sample estimates:
mean of the differences
0.1479853
The average difference in proportion of published edits between the interfaces is 14.8% (p < 0.001), which is both practically and statistically significant.
The underlying question here is different. The models above inferred the latent (hidden) value of the publish probability, which is different than inferring the average proportion of published edits.
Also, for anyone curious:
hyp4 <- "invlogit((Intercept) + new_interface) - invlogit((Intercept)) > 0.148"
fit3 %>%
hypothesis(hyp4)
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper
1 (invlogit((Interc... > 0 0.06 0.02 0.03 0.08
Evid.Ratio Post.Prob Star
1 1332.33 1 *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Software | Version |
---|---|
OS | macOS Big Sur 10.16 |
R | R version 4.1.0 (2021-05-18) |
CmdStan | 2.29.2 |
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/bearloga/wmf-mlm-glm-notes, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".