A partial archive of https://discourse-mediawiki.wmflabs.org as of Saturday May 21, 2022.

How do you know whether an old extension is actually used or not?

freephile

Say I want to remove an old :timer_clock: extension from a wiki. How would I know whether or not the extension is actually being used (and where) so that I could determine the impact?

The only approach I can think of is manual, and differs depending on the type of extension. I wonder if there is a (class) method, a debugging method, a tool, or even an extension that is purpose built for this analysis?

The manual approach would be something like:

  1. Determine type of extension.
  2. If the extension provides a SpecialPage to do something, (e.g. Html2Wiki) then you know that removing the extension only removes the functionality provided by the SpecialPage.
  3. If it is a parser function, tag, or magic word, use the ‘Replace Text’ extension to search the wiki for uses of that function, tag, or magic word.
  4. If the extension creates its own database tables, look at those tables for records, and for foreign keys to the pages that the extension relates to.
  5. If it’s a skin, then obviously removing it limits the available skin choices.
  6. If it’s an API extension, you should know whether or not it’s being used. You would need to make sure there aren’t external “users” (scripts, apps, whatever) of the API.
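
Step 3 in the list above can be roughed out in a few lines if the page text is already available outside the wiki. This is only a sketch: `find_usages` and the `pages` dictionary (title mapped to raw wikitext) are hypothetical names, and the regular expressions are a crude approximation of wikitext syntax, not what Replace Text or the MediaWiki parser actually does.

```python
import re


def find_usages(pages, parser_functions=(), tags=()):
    """Return {title: [labels]} for pages whose raw wikitext mentions a name.

    `pages` is assumed to be a dict of page title -> raw wikitext, for example
    loaded from an XML dump. That input shape is an assumption of this sketch.
    """
    patterns = {}
    for name in parser_functions:
        # Rough match for {{#name: ...}}, {{#name|...}} or {{#name}}
        patterns["#" + name] = re.compile(
            r"\{\{\s*#" + re.escape(name) + r"\s*[:}|]", re.IGNORECASE
        )
    for name in tags:
        # Rough match for <name>, <name attr="..."> or <name/>
        patterns["<" + name + ">"] = re.compile(
            r"<" + re.escape(name) + r"(\s[^>]*)?/?>", re.IGNORECASE
        )

    hits = {}
    for title, text in pages.items():
        found = [label for label, rx in patterns.items() if rx.search(text)]
        if found:
            hits[title] = found
    return hits
```

A search like this only sees direct uses in the text of each page, so templates have to be included in `pages` for it to be of any use.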

Since it’s complicated, my guess is that you just roll up your sleeves and figure it out. But, maybe I’m not aware of a technique that makes this quicker/easier.

Tgr

Set up something like xhprof, wait a week, see how much the extension files get loaded.
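
A rough sketch of what the "see how much" part might look like once profiles have been collected. It assumes each XHProf run has been exported to a dictionary in XHProf's raw format, where keys are `caller==>callee` edges and each value carries a call count under `ct`. XHProf records function calls rather than file loads, so call counts are used here as a proxy. `calls_by_prefix` and the class-name prefixes are hypothetical.

```python
def calls_by_prefix(run, prefixes):
    """Sum call counts per callee prefix for one raw XHProf run.

    `run` is assumed to look like {"main()==>Foo::bar": {"ct": 3, "wt": 120}, ...}.
    `prefixes` are class or namespace prefixes chosen to identify an extension,
    e.g. "FooHooks::" (a made-up name for illustration).
    """
    totals = {prefix: 0 for prefix in prefixes}
    for edge, metrics in run.items():
        # The callee is the part after "==>"; the root entry has no arrow.
        callee = edge.split("==>")[-1]
        for prefix in prefixes:
            if callee.startswith(prefix):
                totals[prefix] += metrics.get("ct", 0)
    return totals
```

Summing these totals over a week of runs would show which extensions never execute any code at all. A nonzero count does not prove the extension did anything useful on the page.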

freephile

Thanks Gergő, we actually use XHProf in Meza + QualityBox to do profiling, although I’m not too familiar with actually using XHProf (James added it). So, I’m digging into it. In the process, I’ve done a little write-up https://freephile.qualitybox.us/wiki/XHProf (I notice that the mw.org page could use some updating, so I’ll try to contribute my findings there too.)

I wonder whether the new addition of “slots” in the Multi-Content Revisions would allow for the storage of parser/profiling information to describe what extensions are used on a page; thus enabling a feature for extension management that would say: “Warning: If you disable / remove extension ‘foo’, then these pages will be affected.” <= @cicalese What do you think? I imagine a WordPress-like extension management interface.

cicalese

Interesting thought, @freephile. I know that when I was maintaining wiki farms and occasionally had to migrate prototype wikis from development wiki farms to their ultimate production homes, I sometimes had that problem of determining what extensions were actually used in the wiki. It would be good to see if there were a low-overhead way of capturing this information. We often had to resort to searches using Extension:RigorousSearch or Extension:ReplaceText.

Tgr

Multi-Content Revisions is about content; you need to save a new revision of a page to store new data. That’s not really useful for profiling (or parsing, even, given that the parsing of a page depends on things other than the page content, e.g. due to template transclusion). Collecting that data via some more appropriate means (structured logging, profiling, EventLogging etc.) is not particularly hard, but either you need to modify all extensions to log when they do something important or the results will be noisy (e.g. due to an extension hook that gets called on every page, determines it has nothing to do and returns).

freephile

That’s what I was thinking. How could you collect info about when an extension actually modifies page content (vs. executing a hook, finding nothing to do, and moving on)? I don’t know whether this could be handled by core, or whether the responsibility would fall on the extension developer, who would have to deliberately call some method to register the extension in the ‘dependency’ slot. Still, extra slots for storing metadata about a page seem like the perfect place to record which extensions are used on that page. Since extensions might not be present “on page” but only through template transclusion, perhaps the extension management feature would list the templates that you’d break by removing an extension.

Tgr

I guess once something like T154674 is in core, you could decorate the hook service with something that watches whether the hook changed anything. IMO it would end up being a huge waste of time, compared to just sampling your pages, then switching off stuff that does not seem to be used and seeing if anyone complains.

So imagine Template:A containing <includeonly>{{#foo}}</includeonly> and page B containing {{A}}. You can’t store the fact that #foo is used in a slot on A because it actually isn’t. And you can’t store it in a slot on B because that can’t depend on the contents of A, which can be changed at any time without B getting a new revision. I guess you could put it into the parser cache entry, which does get regenerated every time a template changes, but that would be pointless compared to just logging that #foo was used in page B and then using ELK or whatever to query those logs efficiently.
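
The Template:A / page B example can be made concrete with a toy model. This is only an illustration under heavy assumptions: `uses_function` and the `pages` dictionary are hypothetical, the regular expressions handle just plain `{{Name}}` transclusions plus `<includeonly>` and `<noinclude>`, and none of it reflects how the real parser works.

```python
import re

# Very rough match for a plain template call such as {{A}} or {{A|x}}
TEMPLATE_RX = re.compile(r"\{\{\s*([^#{}|:]+?)\s*(?:\||\}\})")


def uses_function(title, pages, func, transcluded=False, seen=None):
    """Toy check: does rendering `title` end up invoking {{#func}}?"""
    seen = set() if seen is None else seen
    if title in seen or title not in pages:
        return False
    seen.add(title)

    text = pages[title]
    if transcluded:
        # When transcluded, <noinclude> content is dropped and
        # <includeonly> content is kept.
        text = re.sub(r"<noinclude>.*?</noinclude>", "", text, flags=re.DOTALL)
        text = re.sub(r"</?includeonly>", "", text)
    else:
        # When viewed directly, <includeonly> content is dropped.
        text = re.sub(r"<includeonly>.*?</includeonly>", "", text, flags=re.DOTALL)

    if re.search(r"\{\{\s*#" + re.escape(func) + r"\s*[:}|]", text):
        return True
    return any(
        uses_function("Template:" + name, pages, func, True, seen)
        for name in TEMPLATE_RX.findall(text)
    )
```

With `pages = {"Template:A": "<includeonly>{{#foo}}</includeonly>", "B": "{{A}}"}`, the check is false for Template:A and true for B. Emptying Template:A flips the answer for B even though B's own text never changed, which is why the answer has to be computed (or logged) at render time rather than stored with a revision of B.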