A partial archive of https://discourse-mediawiki.wmflabs.org as of Saturday May 21, 2022.

Accessing a Commons thumbnail via Wikidata

unuaiga

Hi,

I’m developing a language learning program in which I illustrate words with a picture. The words are labels of Wikidata items that I get with a SPARQL query. In this query, I also retrieve the picture linked to the item.

    SELECT ?label ?image WHERE {
      ?ident wdt:P31|wdt:P279 wd:Q1075.
      ?ident rdfs:label ?label.
      ?ident wdt:P18 ?image.
      FILTER (lang(?label)="fr")
    }
    LIMIT 12

The retrieved picture URL is a redirect to Commons, e.g. http://commons.wikimedia.org/wiki/Special:FilePath/Owoce%20wisni.jpg

From this URL, I would like to get the URL of a thumbnail of the picture (to minimize my page loading time). For instance, I would like to obtain the file URL: https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Owoce_wisni.jpg/320px-Owoce_wisni.jpg

Is there an automatic way to do this?

Thanks a lot

Tgr

There is probably a nicer way to do it within SPARQL, but you can just cut off the prefix and send the filename to the imageinfo API:
https://commons.wikimedia.org/w/api.php?action=query&format=json&formatversion=2&prop=imageinfo&iiprop=url&iiurlwidth=320&titles=File:Owoce%20wisni.jpg
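
A minimal Python sketch of that approach, for readers who want to script it. The function names `imageinfo_url` and `thumb_url_from_response` are made up for this example, and the response layout (a `pages` list whose entries carry an `imageinfo` list with a `thumburl` field) is what I would expect with `formatversion=2`; treat it as an assumption and check a real response before relying on it.

```python
from urllib.parse import unquote, urlencode

# Prefix of the image URLs returned by the SPARQL query in this thread.
FILE_PATH_PREFIX = "http://commons.wikimedia.org/wiki/Special:FilePath/"
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def imageinfo_url(file_path_url, width=320):
    """Build the imageinfo request for a Special:FilePath URL (hypothetical helper)."""
    if not file_path_url.startswith(FILE_PATH_PREFIX):
        raise ValueError("not a Special:FilePath URL: " + file_path_url)
    # Cut off the prefix and undo the percent-encoding to get the bare file name.
    file_name = unquote(file_path_url[len(FILE_PATH_PREFIX):])
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "imageinfo",
        "iiprop": "url",
        "iiurlwidth": str(width),
        "titles": "File:" + file_name,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def thumb_url_from_response(data):
    """Return the first thumburl in a parsed JSON response, or None if absent.

    Assumes the formatversion=2 layout: query -> pages (list) -> imageinfo (list).
    """
    for page in data.get("query", {}).get("pages", []):
        for info in page.get("imageinfo", []):
            if "thumburl" in info:
                return info["thumburl"]
    return None
```

The request itself can then be sent with any HTTP client, and the decoded JSON passed to `thumb_url_from_response`.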

unuaiga

Hi,

Someone gave me the answer. I have to add “?width=300px” at the end of the URL and remove the “File:” prefix, if there is one. E.g.:
http://commons.wikimedia.org/wiki/Special:FilePath/Owoce%20wisni.jpg?width=300px
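
As a rough illustration, the rule above could be wrapped in a small Python helper. The name `file_path_thumb` is invented for this sketch, and the `width=...px` form is simply the one quoted in this thread.

```python
from urllib.parse import quote


def file_path_thumb(title, width=300):
    """Return a Special:FilePath URL that redirects to a thumbnail (hypothetical helper)."""
    # Drop the "File:" namespace prefix if the caller passed a full page title.
    name = title[len("File:"):] if title.startswith("File:") else title
    # Percent-encode the name for use in a URL path; spaces become %20.
    return (
        "http://commons.wikimedia.org/wiki/Special:FilePath/"
        + quote(name)
        + "?width=" + str(width) + "px"
    )
```

Calling it with either "File:Owoce wisni.jpg" or "Owoce wisni.jpg" gives the same URL.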

I could also use the API, but the URL solution is simpler.

Thanks

Tgr

Terrible for caching though. That URL loads a special page and issues a redirect every time you use it.

Abbe98

For anyone looking for an answer in the future: it’s possible to calculate the thumbnail URL from the filename.

The interesting part of a thumbnail URL is the following:

/4/46/

It’s calculated from the MD5 hash of the filename (with spaces replaced by underscores), in this case:

46fe0cdbf75cf2cd9626125af14cd4a6

The first path segment is the first character of the MD5 hash, and the second segment is its first two characters.
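
Outside SPARQL, the same calculation can be sketched in Python. The helper name `commons_thumb_url` is made up for this example, and it only covers the simple case described here (spaces to underscores, then MD5); it does not attempt any other title normalization.

```python
import hashlib
from urllib.parse import quote


def commons_thumb_url(file_name, width):
    """Guess the upload.wikimedia.org thumbnail URL for a Commons file name."""
    # The hash is taken over the name with underscores instead of spaces.
    safe_name = file_name.replace(" ", "_")
    digest = hashlib.md5(safe_name.encode("utf-8")).hexdigest()
    # Percent-encode the name only for the URL, not for the hash.
    encoded = quote(safe_name)
    return (
        "https://upload.wikimedia.org/wikipedia/commons/thumb/"
        f"{digest[0]}/{digest[:2]}/{encoded}/{width}px-{encoded}"
    )
```

For example, `commons_thumb_url("Owoce wisni.jpg", 320)` can be compared against the thumbnail URL quoted in the first post.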

The SPARQL query from above would become (query.wikidata.org):

    SELECT ?label ?thumb WHERE {
      ?ident wdt:P31|wdt:P279 wd:Q1075 .
      ?ident rdfs:label ?label .
      ?ident wdt:P18 ?image .

      BIND(REPLACE(wikibase:decodeUri(STR(?image)), "http://commons.wikimedia.org/wiki/Special:FilePath/", "") as ?fileName) .
      BIND(REPLACE(?fileName, " ", "_") as ?safeFileName) .
      BIND(MD5(?safeFileName) as ?fileNameMD5) .
      BIND(CONCAT("https://upload.wikimedia.org/wikipedia/commons/thumb/", SUBSTR(?fileNameMD5, 1, 1), "/", SUBSTR(?fileNameMD5, 1, 2), "/", ?safeFileName, "/650px-", ?safeFileName) as ?thumb)
      FILTER (lang(?label)="fr")
    }
    LIMIT 12

Tgr

The actual logic is more complex; this won’t work all the time (e.g. you might need to append .png or .jpg if the original is not one of those file types; the request will error out if the file is smaller than 650px; if it is exactly 650 px, it might or might not work depending on the file type).
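
One way to guard against the size problem, sketched in Python: only use the computed thumbnail when the requested width is strictly smaller than the original. The function `pick_image_url` is hypothetical, and the original width has to come from somewhere else (for instance the imageinfo API, which I believe reports it when `iiprop=size` is requested).

```python
def pick_image_url(original_url, thumb_url, original_width, wanted_width):
    """Use the thumbnail only when it is strictly narrower than the original."""
    # Requests at or above the original width may fail, so fall back
    # to the original file in that case.
    if wanted_width < original_width:
        return thumb_url
    return original_url
```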

Abbe98

@Tgr Because we are targeting P18, the results are limited to images (and at least SVG is also supported by the thumbnail API). The size issue remains, though, and I’m sure the filename normalization isn’t bulletproof.

Tgr

SVG is supported by the thumbnail API but that SPARQL query won’t work for SVG files, which will have thumbnails like Example.svg.png.
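
A small Python sketch of that naming pattern, covering only the SVG case mentioned here. The helper `thumb_file_name` is invented for illustration; other file types may need different suffixes, which this sketch does not try to guess.

```python
def thumb_file_name(file_name, width):
    """Build the last path segment of a thumbnail URL (SVG case only)."""
    safe_name = file_name.replace(" ", "_")
    thumb = f"{width}px-{safe_name}"
    # SVG thumbnails are rendered as PNG, e.g. Example.svg -> Example.svg.png.
    if safe_name.lower().endswith(".svg"):
        thumb += ".png"
    return thumb
```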

Abbe98

Actually, SVG thumbnails work just fine, no need for a PNG extension:
https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Color_icon_yellow.svg/650px-Color_icon_yellow.svg
https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Color_icon_yellow.svg/60px-Color_icon_yellow.svg

Tgr

Huh, apparently it works with any extension:
https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Color_icon_yellow.svg/650px-Color_icon_yellow.foo

Not nice on the caches though, probably should be fixed.