A partial archive of https://discourse-mediawiki.wmflabs.org as of Saturday May 21, 2022.

Best practices for API client user agent?

tobias47n9e

I am writing an API client for Wikidata (crates.io/crates/wikibase) and am wondering what the best practices for the user agent are.

There is some info about it here: meta.wikimedia.org/wiki/User-Agent_policy, but it could use some more details for library authors.

I was thinking something like

User-Agent: Library <Version>, User <Version>, Task

would be very easy to understand.

So for running the tests of Wikibase RS I would send:

User-Agent: Wikibase RS<0.1.2>, Task: Testing

and for a wkdr query:

User-Agent: Wikibase RS<0.1.2>, wkdr<0.1.1>,Task: CLI query

The main question is whether we have some standard format and how much information would be useful :thinking:

Tgr

According to the example in the UA policy, it would be something like wkdr/0.1.1 Wikibase-RS/0.1.2. I don’t think there is much value in task info: it will make it harder to group / count actions by tool, and if a client is so broken that it needs to be blocked, no one would want to experiment to find out whether it’s only broken for one specific task anyway.

TheDJ

Definitely make sure you get the format of that first Identifier/version part correct, as that is actually dictated by the spec for user-agents. @Tgr’s suggestion seems best:

MyBot/version Wikibase/version

The full details of the policy referred to by @tgr can be found here: https://meta.wikimedia.org/wiki/User-Agent_policy

LucasWerkmeisterWMDE

Well, the specification in RFC 7231 is somewhat different from what MDN says… according to the RFC, the User-Agent consists of a product (name or name/version), followed by any number of comments (enclosed in parentheses) or further products (in decreasing order of significance). So I think the default user agent of your library should be:

Wikibase-RS/0.0.1

and the user agent of wkdr perhaps:

wkdr/0.0.1 Wikibase-RS/0.0.1

and some fictional bot built on top of your library could use something like

mybot/1.0.0 (import foo data; operator: User:bar) Wikibase-RS/0.0.1
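As a rough illustration of the layout described above (products as name or name/version, comments in parentheses, most significant first), here is a minimal Python sketch. The helper names format_product and build_user_agent are made up for this example and are not part of Wikibase RS or any other library.

```python
def format_product(name, version=None):
    """Return 'name' or 'name/version' as one User-Agent product token."""
    return f"{name}/{version}" if version else name


def build_user_agent(parts):
    """Join products and comments with single spaces.

    `parts` is ordered by decreasing significance. Each item is either a
    (name, version) tuple for a product, or a plain string for a comment,
    which gets wrapped in parentheses.
    """
    pieces = []
    for part in parts:
        if isinstance(part, tuple):
            pieces.append(format_product(*part))
        else:
            pieces.append(f"({part})")
    return " ".join(pieces)


# The three examples from the post above.
print(build_user_agent([("Wikibase-RS", "0.0.1")]))
print(build_user_agent([("wkdr", "0.0.1"), ("Wikibase-RS", "0.0.1")]))
print(build_user_agent([
    ("mybot", "1.0.0"),
    "import foo data; operator: User:bar",
    ("Wikibase-RS", "0.0.1"),
]))
```

The sketch does no escaping or validation of names and comments; it only shows the ordering and spacing.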

tobias47n9e

Thank you all for your clarifications! They helped me a lot and I am now implementing this, so users of Wikibase RS need to set an env-variable before they can use it. The users have to set a prefix " Wikibase-RS/0.0.1". Even if they game the system, the final user agent string will be “Wikibase-RS/0.0.1”. It would already be online, but I think my testing during development has led to my IP being blocked :blush:
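One way the env-variable approach could look, sketched in Python rather than Rust. The variable name WIKIBASE_USER_AGENT_PREFIX is an assumption for this example (the post does not name the variable), and raising an error when it is unset is just one possible reading of "need to set an env-variable before they can use it".

```python
import os

LIBRARY_PRODUCT = "Wikibase-RS/0.0.1"   # library token from the thread
ENV_VAR = "WIKIBASE_USER_AGENT_PREFIX"  # hypothetical name, not from the crate


def user_agent_from_env(environ=os.environ):
    """Build the User-Agent from a user-supplied prefix plus the library token.

    The library token is always appended, so whatever the prefix contains,
    the resulting string still ends with the library's own product.
    """
    prefix = environ.get(ENV_VAR, "").strip()
    if not prefix:
        raise RuntimeError(
            f"set {ENV_VAR} to identify your tool, e.g. 'mybot/1.0.0'"
        )
    return f"{prefix} {LIBRARY_PRODUCT}"
```

For example, with the variable set to wkdr/0.0.1 this returns wkdr/0.0.1 Wikibase-RS/0.0.1, matching the format suggested earlier in the thread.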

tobias47n9e

Now that I am thinking about it: What kind of regex is used for the user-agent monitoring? I would like to validate that right in the library, so people can’t run Wikibase RS without the proper configuration.

Currently using this PCRE regex for testing purposes:

[a-zA-Z\d-_]+\/[0-9\.]+

Tgr

At a glance, WMF Analytics seems to use UAParser which is basically just a huge collection of regexes and does not seem to contain Wikipedia-specific bots. I might have missed some other place where they do UA parsing, though. This might be a better question for #wikimedia-analytics.
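For the library-side validation asked about earlier in the thread, a minimal Python sketch follows. It is not the check Wikimedia uses (the thread does not establish what that is); it only tests the leading product token against a pattern adapted from the one posted above. Two adaptations are assumptions of this example: the hyphen is moved to the end of the character class, because Python's re module rejects \d-_ as a bad range, and \d is written as 0-9. The function names are made up for illustration.

```python
import re

# Adapted from the pattern in the thread: name characters, a slash, then
# digits and dots. The hyphen sits last in the class so it is a literal.
PRODUCT_RE = re.compile(r"[A-Za-z0-9_-]+/[0-9.]+")


def is_valid_product(token):
    """True if the whole token looks like name/version."""
    return PRODUCT_RE.fullmatch(token) is not None


def first_product_is_valid(user_agent):
    """Check only the first space-separated token of a User-Agent string."""
    first = user_agent.split(" ", 1)[0]
    return is_valid_product(first)
```

Note that this pattern requires a version, so a bare product name such as Wikibase-RS is rejected even though a product without a version is allowed by the RFC 7231 layout described above.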