This is a report on Wikimedia Maps usage across Wikimedia Projects.

Maps usage on Wikimedia Projects

Mapframe Inclusion

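According to the MediaWiki Maps page, the following wikis have Maps enabled with mapframes (maplinks are enabled on all wikis): all Wikivoyages; the Catalan, Hebrew, Russian, Macedonian, French, Finnish, Norwegian, Swedish, Portuguese, Czech, and Basque Wikipedias; and Meta-Wiki, MediaWiki.org, and Wikimedia Ukraine.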

Let’s count how many articles on those wikis have mapframes. Some articles have more than one mapframe, so we also count the total number of mapframes.

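library(RMySQL)   # provides dbConnect(), dbDisconnect(), and MySQL()
library(magrittr) # provides the %>% pipe used when rendering the table
# dbs is a character vector of database names in analytics-store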
wikivoyages <- grep("voyage$", dbs, value = TRUE)
wikipedias <- c("cawiki", "hewiki", "ruwiki", "mkwiki", "frwiki", "fiwiki", "nowiki", "svwiki", "ptwiki", "cswiki", "euwiki")
other_projects <- c("metawiki", "mediawikiwiki", "uawikimedia")
query <- "SELECT
  COUNT(*) AS `total articles`,
  SUM(IF(mapframes > 0, 1, 0)) AS `articles with a mapframe`,
  SUM(COALESCE(mapframes, 0)) AS `total mapframes`,
  SUM(IF(mapframes > 0, 1, 0))/COUNT(*) AS `mapframe prevalence`
FROM (
  SELECT
    p.page_id,
    pp_value AS mapframes
  FROM (
    SELECT pp_page, pp_value
    FROM page_props
    WHERE pp_propname = 'kartographer_frames' AND pp_value > 0
  ) AS filtered_props
  RIGHT JOIN (
    SELECT page_id FROM page
    WHERE page_namespace = 0 AND page_is_redirect = 0
  ) p
  ON p.page_id = filtered_props.pp_page
) joined_tables;"
mapframes <- lapply(c(wikivoyages, wikipedias, other_projects), function(db) {
  message("Fetching mapframe statistics from ", db, "...")
  con <- dbConnect(MySQL(), host = "127.0.0.1", group = "client", dbname = db, port = 3307)
  suppressWarnings(result <- wmf::mysql_read(query, db, con = con))
  invisible(dbDisconnect(con))
  return(result)
})
mapframes <- dplyr::bind_rows(mapframes)
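# language_projects is not defined in this excerpt; it is assumed to be a named
# vector that maps database names (e.g. "cawiki") to human-readable labels
rownames(mapframes) <- language_projects[c(wikivoyages, wikipedias, other_projects)]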

Below are the results as of 11 September 2017:

DT::datatable(
  mapframes,
  caption = "This shows the prevalence of mapframes on wikis that have it enabled.",
  filter = "top",
  extensions = "Buttons",
  options = list(
    pageLength = 10, autoWidth = TRUE, language = list(search = "Filter:"),
    order = list(list(4, "desc")), dom = "Bfrtip", buttons = c("copy", "csv")
  )
) %>%
  DT::formatPercentage("mapframe prevalence", 3) %>%
  DT::formatCurrency(
    columns = c("total articles", "articles with a mapframe", "total mapframes"),
    currency = "", digits = 0
  )


Across mapframe-enabled wikis, the mean prevalence is 6.98% and the median is 0.04%. The large gap between the two indicates that a few wikis have a much higher prevalence than the rest. Overall prevalence is 1.76%.
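The three summary figures differ because they weight wikis differently. The following is a minimal sketch in R of how such figures could be computed from a per-wiki table. The `toy` data frame and its numbers are made up for illustration, and it assumes that "overall prevalence" means pooled articles with a mapframe divided by pooled articles, which the report does not state explicitly.

```r
# Hypothetical per-wiki counts, shaped like the `mapframes` table above
# (illustrative values only, not the report's data).
toy <- data.frame(
  total_articles = c(1000, 200000, 50000),
  articles_with_mapframe = c(400, 20, 5),
  row.names = c("small wiki", "large wiki", "medium wiki")
)
toy$prevalence <- toy$articles_with_mapframe / toy$total_articles

# Unweighted summaries treat every wiki equally.
mean_prevalence <- mean(toy$prevalence)
median_prevalence <- median(toy$prevalence)

# Pooled prevalence (assumed definition of "overall") weights each wiki
# by its article count.
overall_prevalence <- sum(toy$articles_with_mapframe) / sum(toy$total_articles)

print(round(100 * c(
  mean = mean_prevalence,
  median = median_prevalence,
  overall = overall_prevalence
), 3))
```

In this toy example one small wiki with many mapframes pulls the mean far above both the median and the pooled figure, which is the same pattern the reported numbers show.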

Maps usage on Wikimedia Commons

Maps within Data namespace

Map data allows users to store GeoJSON data on wiki, similar to images. Search for *.map within the Data namespace and you get results like Data:Parramatta Light Rail.map:

“Data:Parramatta Light Rail.map”, available under Creative Commons Zero.

Or, if you search for *.tab within the Data namespace, you’ll get tabular datasets like Data:Bea.gov/GDP by state.tab.

Let’s see how many of those there are:

(Query run on 11 September 2017.)

SELECT
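  -- '\\.' reaches the regex engine as a literal dot; a single backslash is dropped by the MySQL string parser
  CASE WHEN page_title RLIKE '\\.map$' THEN 'map'
       WHEN page_title RLIKE '\\.tab$' THEN 'tabular'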
       ELSE 'other'
  END AS data,
  FORMAT(COUNT(*), 0) AS total
FROM page
WHERE page_namespace = 486
GROUP BY data;
2 records

data     total
map      528
tabular  269
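The suffix test in the query above depends on matching a literal dot, because an unescaped `.` in a regular expression matches any character. The sketch below shows the same classification in R on a few hypothetical page titles; `classify_data_page` is an illustrative helper, not part of the report's code.

```r
# Hypothetical page titles from the Data namespace.
titles <- c(
  "Parramatta_Light_Rail.map",
  "Bea.gov/GDP_by_state.tab",
  "Roadmap",
  "Example.json"
)

classify_data_page <- function(title) {
  # "\\." in an R string is the regex "\.", i.e. a literal dot.
  ifelse(grepl("\\.map$", title), "map",
         ifelse(grepl("\\.tab$", title), "tabular", "other"))
}

kinds <- classify_data_page(titles)
print(table(kinds))

# With an unescaped dot, "Roadmap" would be misclassified as a map page.
unescaped_matches_roadmap <- grepl(".map$", "Roadmap")
```

Escaping the dot keeps a title such as "Roadmap" in the "other" bucket instead of counting it as map data.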

Geo-tags

In June 2016 we released Maps on Commons (T138029). Users could add coordinates to files to geo-tag them. For example:

SELECT page_title AS file, gt_lat AS latitude, gt_lon AS longitude
FROM (
  SELECT gt_page_id, gt_lat, gt_lon
  FROM geo_tags
  WHERE gt_primary = 1
    AND NOT (gt_lat = 0 AND gt_lon = 0)
  LIMIT 10
) geo_tagged
LEFT JOIN (
  SELECT page_id, page_title
  FROM page
  WHERE page_namespace = 6
    AND page_is_redirect = 0
) p
ON geo_tagged.gt_page_id = p.page_id;
Displaying records 1 - 10

file                                                                 latitude  longitude
Celle_Ligure-oratorio_di_San_Michele_Arcangelo-interno.jpg           44.34100  8.5565004
Werben_-_Schmogrower_Straße_0001.jpg                                 51.83630  14.1932001
Peitz_-_Badesee_Garkoschke_0001.jpg                                  51.85640  14.3852997
Sassello-cappella_punta_san_michele-interno.jpg                      44.48650  8.5705004
Vossloh_Euro_4000_pupitre.JPG                                        48.96450  1.9070000
1960_Cadillac_Series_62.jpg                                          52.27150  0.7293333
1960_Cadillac_Series_62_Engine.jpg                                   52.27150  0.7293333
ATM_Autodromo_BusOtto_artic_trolleybus_302_(MAN_chassis)_Loreto.jpg  45.48600  9.2174997
Azienda_Elettrica_Ticinese.jpg                                       7.24165   0.0000000
Bobby_Orr_2010_WinterCl.jpg                                          42.34650  -71.0970001

One way to gauge usage is to compare how many files are geo-tagged with how many are not (query run on 11 September 2017):

SELECT
  FORMAT(COUNT(*), 0) AS `total files`,
  FORMAT(SUM(is_geotagged), 0) AS `geo-tagged files*`,
  CONCAT(ROUND(100*SUM(is_geotagged)/COUNT(*),2),'%') AS `proportion geo-tagged*`
FROM (
  SELECT
    p.page_id AS page_id,
    CASE WHEN filtered_geotags.geotagged = 'yes' THEN true
         WHEN filtered_geotags.geotagged IS NULL THEN false
    END AS is_geotagged
  FROM (
    SELECT gt_page_id, 'yes' AS geotagged
    FROM geo_tags
    WHERE gt_primary = 1
  ) AS filtered_geotags
  RIGHT JOIN (
    SELECT page_id
    FROM page
    WHERE page_namespace = 6
      AND page_is_redirect = 0
  ) p
  ON filtered_geotags.gt_page_id = p.page_id
) joined_tables
1 records

total files  geo-tagged files*  proportion geo-tagged*
41,894,078   9,587,622          22.89%

* This figure is an overcount because of a known issue (T143366): the geo_tags table (List of pages’ geographical coordinates) is updated when coordinates are added to a page, but not when the coordinates are removed. The code that deals with geo_tags appears to be part of the GeoData extension and Wikidata extension repositories (e.g. GeoDataDataUpdater.php).
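The join in the query above keeps every file page and marks the ones that have a primary geo-tag. As a rough illustration, the same logic can be mimicked in R on toy data; the `pages` and `geo_tags` data frames below are hypothetical stand-ins for the database tables, with made-up IDs.

```r
# Hypothetical stand-ins for the `page` and `geo_tags` tables.
pages <- data.frame(page_id = 1:5)
geo_tags <- data.frame(
  gt_page_id = c(2, 4, 9), # page 9 is not among the file pages, so it drops out
  gt_primary = c(1, 1, 1)
)

# Keep primary tags only, as the WHERE gt_primary = 1 filter does.
primary_tags <- geo_tags[geo_tags$gt_primary == 1, "gt_page_id", drop = FALSE]
primary_tags$geotagged <- TRUE

# all.x = TRUE keeps every page, like the RIGHT JOIN onto `page` in the query;
# pages without a tag get NA, which is then recoded to FALSE.
joined <- merge(pages, primary_tags,
                by.x = "page_id", by.y = "gt_page_id", all.x = TRUE)
joined$geotagged[is.na(joined$geotagged)] <- FALSE

proportion_geotagged <- mean(joined$geotagged)
print(sprintf("%.2f%%", 100 * proportion_geotagged))
```

Because stale rows in geo_tags would still produce a match in this join, the sketch inherits the same overcounting caveat as the query.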

Appendix

Setup

This report was compiled using RMarkdown, knitr, and an open SSH tunnel for connecting to our databases:

ssh -N stat6 -L 3307:analytics-store.eqiad.wmnet:3306

Notes

When figuring stuff out (e.g. what it looks like in the database when a page has a map) and working with page IDs, the MediaWiki API can be used to get a page title from a page ID:

https://commons.wikimedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=jsonfm&pageids=ID1|ID2
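When looking up several page IDs at once, the URL can be assembled programmatically. The sketch below only builds the URL string and makes no network request; `build_pageid_query` is a hypothetical helper, and it mirrors the parameters of the URL above except that it uses `format=json` instead of `jsonfm`, on the assumption that the result will be parsed by code and not read in a browser.

```r
# Hypothetical helper: build a MediaWiki API query URL for a set of page IDs.
build_pageid_query <- function(page_ids, host = "commons.wikimedia.org") {
  # format() avoids scientific notation for large numeric IDs.
  ids <- paste(format(page_ids, scientific = FALSE, trim = TRUE), collapse = "|")
  paste0(
    "https://", host, "/w/api.php",
    "?action=query&prop=revisions&rvprop=content&format=json",
    # Percent-encode the "|" separator between IDs.
    "&pageids=", utils::URLencode(ids, reserved = TRUE)
  )
}

url <- build_pageid_query(c(123, 456))
print(url)
```

The IDs 123 and 456 are placeholders; substitute the page IDs returned by the database queries.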