Report on Leading Indicators for Reading List A/B test on Desktop

Author: Jennifer Wang
Published: December 12, 2025
Modified: December 12, 2025

Task: T410462

Introduction

The Reader Experience team developed the Reading List feature for logged-in readers on desktop (Hypothesis FY25-26 WE3.3.4). We ran an A/B test on the XLab experiment platform to evaluate the feature's impact on internal referrals and to measure feature usage.

Methodology

The A/B test ran on logged-in web users on desktop. It was enabled in tiers: first on 5 pilot wikis on November 12, 2025, and then on English Wikipedia on November 19, 2025. This analysis covers A/B test data recorded from November 12 through December 9, 2025.
Logged-in users were eligible for the test if they met all of the following criteria: (1) active in the last 3 months, (2) zero edits, (3) zero watchlist items (except for their user page and user talk page), and (4) zero existing Reading List tables. A randomly selected half of the eligible users were assigned to the treatment group, where they saw the Reading List feature. The remaining half were assigned to the control group, where they experienced the current interface. We tracked pageviews with referral information (internal or external) and analyzed the internal referral rate on a per-user basis for all tested users, as well as for the subgroup of users who engaged with the Reading List feature. We also tracked click events on the Reading List feature to understand overall feature usage.
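
The per-user internal referral rate described above can be sketched as follows. This is a minimal illustration, not the query used for this report: the toy `pageviews` data frame and the `user_id` and `referrer_type` column names are assumptions made for the example, and the rate is assumed to be the share of a user's pageviews that arrived through an internal referrer. Only `variation` and `internal_ref_rate` appear in the analysis data itself.

```r
library(dplyr)

# Hypothetical pageview-level data: one row per pageview (illustration only).
pageviews <- data.frame(
  user_id       = c("u1", "u1", "u1", "u1", "u2", "u2", "u3"),
  variation     = c("control", "control", "control", "control",
                    "treatment", "treatment", "treatment"),
  referrer_type = c("internal", "external", "internal", "internal",
                    "internal", "external", "external"),
  stringsAsFactors = FALSE
)

# Assumed definition: share of a user's pageviews with an internal referrer.
per_user <- pageviews %>%
  group_by(user_id, variation) %>%
  summarize(
    n_pageviews = n(),
    internal_ref_rate = mean(referrer_type == "internal"),
    .groups = "drop"
  )

# Average the per-user rates within each experiment group, so that each user
# counts once regardless of how many pages they viewed.
per_group <- per_user %>%
  group_by(variation) %>%
  summarize(mean_internal_ref_rate = mean(internal_ref_rate), .groups = "drop")

print(per_user)
print(per_group)
```

Averaging per-user rates, rather than pooling all pageviews, keeps a few very active users from dominating the group mean.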

Summary of results

Note

Results are based on the initial A/B test data collected on desktop to evaluate the impact of Reading List on desktop users. We later discovered that the feature had also been accidentally enabled on the mobile web platform. Engineering work is under way to add instrumentation for mobile web and extend the A/B test. More event data from both platforms will be collected to confirm the statistical significance of the findings and to measure the combined effect across platforms. We will review the full A/B test results in the second part of the analysis in T410462.

User engagement

  • During the analysis timeframe, 113 users engaged with the Reading List, representing 2.86% of the treatment group.

Internal referral rate

  • Across all experiment users, internal referral rates were similar for the control and treatment groups (around 30%). No statistically significant increase was observed in the treatment group, likely due to the low engagement with the Reading List.
  • Among users who used the Reading List, the average internal referral rate was 69.08%, substantially higher than the control group or the overall treatment group (30%). This difference may be due to the Reading List increasing internal referrals, or because users who engaged with the feature naturally have higher internal referral rates, or both.

Setup

Code
# Helper that silences startup messages and warnings while attaching packages
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
  library(tidyverse)
  library(lubridate)
  library(scales)
  library(magrittr)
  library(dplyr)
})
Code
library(relax) # for xlab stats report
Code
# For summary tables   
library(gt)
library(gtsummary)
library(IRdisplay)
Code
library(htmltools)

Analysis

Internal referral rate

Control vs Treatment

Code
df_internal_ref_1  <-
  read.csv(
    file = 'Data_out/internal_referral_1st_tier.tsv',
    header = TRUE,
    sep = "\t",
    stringsAsFactors = FALSE
  ) 
Code
df_internal_ref_2  <-
  read.csv(
    file = 'Data_out/internal_referral_2nd_tier.tsv',
    header = TRUE,
    sep = "\t",
    stringsAsFactors = FALSE
  ) 
Code
df_internal_ref_all <- bind_rows(df_internal_ref_1 , df_internal_ref_2)
Code
df_stats_internal_ref_all <- df_internal_ref_all %>%
    select(variation, internal_ref_rate) %>%
    rename(outcome=internal_ref_rate) %>%
    calculate_metric_stats(metric_type = "mean") 
Code
df <- df_stats_internal_ref_all %>%
  as.data.frame() %>%
  rownames_to_column("group")
Code
display_html(
  as_raw_html(
    df %>%
      gt() %>%
      tab_header(
        title = md("Internal Referral Rates of the Control and Treatment Groups<br>Statistical Summary")
      ) %>%
      cols_label(
        group = "Experiment group",
        sample_size = "Sample Size",
        sample_mean = "Sample Mean",
        sample_variance = "Sample Variance"
      ) %>%
      tab_style(
        style = cell_text(weight = "bold"),
        locations = cells_column_labels(columns = c(group, sample_size, sample_mean, sample_variance))
      ) %>%
      fmt_percent(
        columns = c("sample_mean", "sample_variance"),
        decimals = 2
      ) %>%
      fmt_number(
        columns = "sample_size",
        decimals = 0
      ) %>%
      opt_stylize(6) %>%
      cols_width(everything() ~ px(150)) %>%
      tab_source_note(
        source_note = md(
        "Data source: schema mediawiki_product_metrics_reading_list <br>
        Timeframe: November 12 - December 9, 2025 <br>
        Platform: desktop ")
      ) %>%
      tab_style(
        style = cell_text(align = "left"),
        locations = cells_source_notes()
      )
  )
)
Internal Referral Rates of the Control and Treatment Groups: Statistical Summary

Experiment group   Sample Size   Sample Mean   Sample Variance
control            4,163         30.28%        9.64%
treatment          3,946         30.20%        9.60%

Data source: schema mediawiki_product_metrics_reading_list
Timeframe: November 12 - December 9, 2025
Platform: desktop
Code
df_lift_internal_ref_all  <- df_internal_ref_all %>%
    select(variation, internal_ref_rate) %>%
    rename(outcome=internal_ref_rate) %>%
    analyze_relative_lift(metric_type = "mean")
Code

display_html(
  as_raw_html(
    df_lift_internal_ref_all %>%
      gt() %>%
      tab_header(
        title = md("Internal Referral Rates of the Control and Treatment Groups<br>Impact Estimation")
      ) %>%
      tab_spanner(
        label = "Bayes",
        columns = c(estimate_bayes, chance_to_win, cred_lower, cred_upper)
      ) %>%
      tab_spanner(
        label = "Frequency",
        columns = c(estimate_freq, p_value, conf_lower, conf_upper)
      ) %>%
      cols_label(
        estimate_bayes = "Change (Bayes)",
        chance_to_win = "Chance To Win",
        cred_lower = "95% CI Lower (Bayes)",
        cred_upper = "95% CI Upper (Bayes)",
        estimate_freq = "Change (Freq)",
        # p_value keeps its default column name
        conf_lower = "95% CI Lower (Freq)",
        conf_upper = "95% CI Upper (Freq)"
      ) %>%
      tab_style(
        style = cell_text(weight = "bold"),
        locations = cells_column_spanners(c("Bayes", "Frequency"))
      ) %>%
      fmt_percent(
        columns = everything(),
        decimals = 2
      ) %>%
      # opt_stylize(6) %>%
      cols_width(everything() ~ px(150)) %>%
      tab_source_note(
        source_note = md(
        "Timeframe: November 12 - December 9, 2025 <br>
        Platform: desktop ")
      ) %>%
      tab_style(
        style = cell_text(align = "left"),
        locations = cells_source_notes()
      )
  )
)

Internal Referral Rates of the Control and Treatment Groups: Impact Estimation

Method      Change   Chance To Win   p_value   95% CI Lower   95% CI Upper
Bayes       −0.26%   45.38%          -         −4.71%         4.18%
Frequency   −0.26%   -               91.79%    −10.05%        9.52%

Timeframe: November 12 - December 9, 2025
Platform: desktop

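Internal referral rates were similar for the control and treatment groups, at around 30% (30.28% vs. 30.20%). The estimated relative change in the treatment group was −0.26%, and both the Bayesian credible interval and the frequentist confidence interval include zero (p-value 0.92), so we did not observe a statistically significant increase in the treatment group. This may be due to the low engagement with the Reading List among treatment users.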
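
As a rough sanity check on the comparison above, the relative change can be recomputed from the rounded summary statistics alone. This is a sketch under stated assumptions, not a reproduction of `analyze_relative_lift()`: it uses base R only, takes the displayed (rounded) means and variances as inputs, and applies a normal approximation to the absolute difference in means, which may differ from the procedure the relax library uses.

```r
# Summary statistics copied from the statistical summary table (rounded as displayed).
n_control     <- 4163
mean_control  <- 0.3028
var_control   <- 0.0964

n_treatment    <- 3946
mean_treatment <- 0.3020
var_treatment  <- 0.0960

# Relative lift of the treatment mean over the control mean.
relative_lift <- (mean_treatment - mean_control) / mean_control

# Standard error of the absolute difference in means (unequal variances).
se_diff <- sqrt(var_control / n_control + var_treatment / n_treatment)

# Two-sided p-value from a normal approximation (an assumption of this sketch).
z <- (mean_treatment - mean_control) / se_diff
p_value <- 2 * pnorm(-abs(z))

cat(sprintf("Relative lift: %.2f%%\n", 100 * relative_lift))
cat(sprintf("Approximate two-sided p-value: %.2f\n", p_value))
```

The relative lift from this sketch rounds to −0.26%, in line with the table. The approximate p-value is close to the reported value but not identical, since the inputs are rounded and the method is simplified.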

Treatment-group users who used the feature

Code
df_internal_ref_feature_user_1  <-
  read.csv(
    file = 'Data_out/internal_referral_feature_user_1st_tier.tsv',
    header = TRUE,
    sep = "\t",
    stringsAsFactors = FALSE
  ) 
Code
df_internal_ref_feature_user_2  <-
  read.csv(
    file = 'Data_out/internal_referral_feature_user_2nd_tier.tsv',
    header = TRUE,
    sep = "\t",
    stringsAsFactors = FALSE
  ) 
Code
df_internal_ref_feature_user_all <- bind_rows(df_internal_ref_feature_user_1 , df_internal_ref_feature_user_2)
Code
df_stats_internal_ref_feature_user <- df_internal_ref_feature_user_all %>%
     summarize(
                sample_size = n(),
                sample_mean = mean(internal_ref_rate),
                sample_variance = var(internal_ref_rate)
            ) 
Code
display_html(
  as_raw_html(
    df_stats_internal_ref_feature_user %>%
      gt() %>%
      tab_header(
        title = md("Internal Referral Rates of Users Who Used the Feature<br>Statistical Summary")
      ) %>%
      cols_label(
        sample_size = "Sample Size",
        sample_mean = "Sample Mean",
        sample_variance = "Sample Variance"
      ) %>%
      tab_style(
        style = cell_text(weight = "bold"),
        locations = cells_column_labels(columns = c(sample_size, sample_mean, sample_variance))
      ) %>%
      fmt_percent(
        columns = c("sample_mean", "sample_variance"),
        decimals = 2
      ) %>%
      fmt_number(
        columns = "sample_size",
        decimals = 0
      ) %>%
      opt_stylize(6) %>%
      cols_width(everything() ~ px(200)) %>%
      tab_source_note(
        source_note = md(
        "Data source: schema mediawiki_product_metrics_reading_list <br>
        Timeframe: November 12 - December 9, 2025 <br>
        Platform: desktop ")
      ) %>%
      tab_style(
        style = cell_text(align = "left"),
        locations = cells_source_notes()
      )
  )
)
Internal Referral Rates of Users Who Used the Feature: Statistical Summary

Sample Size   Sample Mean   Sample Variance
113           69.08%        4.48%

Data source: schema mediawiki_product_metrics_reading_list
Timeframe: November 12 - December 9, 2025
Platform: desktop

During the analysis timeframe, 113 users used the Reading List, representing 2.86% of users in the treatment group (113/3,946).
The average internal referral rate among these users was 69.08%, substantially higher than that of the control group or of all treatment users exposed to the feature, regardless of whether they used it (both around 30%).
This difference could result from the Reading List driving more internal referrals, from users who interact with the Reading List naturally having a higher internal referral rate, or from both. Because these users chose to use the feature rather than being randomly assigned to use it, this comparison alone cannot separate the two explanations.
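
The engagement share quoted above is simple arithmetic on the two sample sizes, sketched here in base R. The 95% interval is an addition for illustration only and was not part of the original analysis. It uses `prop.test()` without continuity correction, which yields a Wilson score interval.

```r
feature_users   <- 113   # users who used the Reading List
treatment_users <- 3946  # users in the treatment group

# Share of treatment users who engaged with the feature.
engagement_rate <- feature_users / treatment_users

# 95% Wilson score interval for the share (illustrative, not from the report).
ci <- prop.test(feature_users, treatment_users, correct = FALSE)$conf.int

cat(sprintf("Engagement rate: %.2f%%\n", 100 * engagement_rate))
cat(sprintf("95%% interval: %.2f%% to %.2f%%\n", 100 * ci[1], 100 * ci[2]))
```

With only 113 engaged users, any estimate for this subgroup rests on a small sample, which is one reason the extended test with more event data is useful.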

Acknowledgements

  • The relax library used for the statistical analysis was developed by Mikhail Popov.