A partial archive of https://discourse-mediawiki.wmflabs.org as of Saturday May 21, 2022.

Setting a timeout for database queries

Daimona

Hi! I’m currently trying to develop a patch (https://gerrit.wikimedia.org/r/#/c/411593/) to add to AbuseFilter a way to search through filter rules. It works by adding a further condition to the database query with LIKE or RLIKE, depending on the user’s choice. The draft works quite well, but I’m facing one big problem: I don’t have a way to set a timeout and prevent bad regexes from creating long-running queries. As a side note, this isn’t likely to happen intentionally, since the search is restricted to those who can see private filters (i.e. trusted users), but the risk remains. So, in looking for a way out, these are my questions:
1 - Are database queries “limited” by default? I.e., do they already have a max execution time set by MW?
2 - Is there a simple way to set a max execution time without adding too much code (like a cronjob would), maybe directly offered by MW?
3 - In MySQL 5.7.4+ you can actually set a MAX_EXECUTION_TIME parameter, but right now MW supports MySQL 5.5+. The same goes for MariaDB, which added a similar parameter in a later version. So, are there plans to raise the minimum supported version in the near future?
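For reference, the MySQL 5.7.4+ feature mentioned in question 3 is an optimizer hint embedded in the SELECT itself (MariaDB’s analogue, max_statement_time, is a system variable rather than a hint). A minimal sketch of where the hint would go in such a search query; the abuse_filter table and af_id/af_pattern column names are assumptions based on the AbuseFilter schema, not taken from the patch:

```python
# Sketch only: shows where MySQL's MAX_EXECUTION_TIME optimizer hint
# (MySQL >= 5.7.4) would sit in the filter-search query. Table and column
# names are illustrative assumptions based on the AbuseFilter schema.
TIMEOUT_MS = 1000  # abort the SELECT after one second

query = (
    f"SELECT /*+ MAX_EXECUTION_TIME({TIMEOUT_MS}) */ af_id, af_pattern "
    "FROM abuse_filter "
    "WHERE af_pattern RLIKE %s"  # %s bound to the user-supplied pattern
)
print(query)
```

The hint only caps that one statement, so other queries on the connection are unaffected, which is why it would be attractive here compared with a connection-wide or server-wide limit.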
Many thanks.

Tgr

MediaWiki can’t limit queries¹; system administrators can (and usually do). Queries which run long enough to hit the limit are still problematic if they happen often. I’m not sure I understand what risk you are trying to mitigate here, though. How many abuse filters can a wiki have? A few thousand, maybe? With each less than a few thousand characters? I doubt it’s possible to cause any significant delay with that. Are you worried about regex DoS attacks? Those kinds of patterns are hard to produce by accident, so a search interface limited to admins is probably fine.

On a higher level, MySQL is not very good at searching (you can’t easily search across multiple fields, can’t do fuzzy searches, etc.); using the normal search backend might give better results, although it’s a lot more complex to set up.

¹ Well, it actually can (and does) to some extent, e.g. there are transaction time limits. In any case, that’s high-level configuration and not something individual features are supposed to mess with.

Daimona

Many thanks for the answer! I had hoped there would be a way, so I’m a bit sorry to hear this. Actually, the amount of data for filters is much smaller: on it.wiki, for instance, we have around 500 filters (including deactivated ones), many of which don’t go over 100 words. And yes, I’m worried about a possible, accidental ReDoS. A single 20-character filter may trigger it if the regex used is really bad, like the classic catastrophic patterns. Some engines automatically detect such situations and avoid wasting lots of time, though I don’t know whether SQL has this feature. Honestly, I can’t tell how big the risk of a user unintentionally using a malformed pattern really is.
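The “classic catastrophic patterns” mentioned above come from nested quantifiers such as (a+)+, which force a backtracking engine to try exponentially many ways to split the input when the overall match fails. A small Python sketch (the pattern and input sizes are illustrative, not taken from any real filter) shows the blow-up:

```python
import re
import time

def time_match(pattern, text):
    """Return (match_result, seconds) for one full-match attempt."""
    start = time.perf_counter()
    result = re.fullmatch(pattern, text)
    return result, time.perf_counter() - start

# (a+)+b against a string of only 'a's can never match, but the engine
# still tries roughly 2^(n-1) ways to partition the 'a's before giving up,
# so each extra character roughly doubles the running time.
for n in (10, 16, 22):
    result, elapsed = time_match(r"(a+)+b", "a" * n)
    print(f"n={n}: matched={result is not None}, {elapsed:.4f}s")
```

With inputs this short the script finishes quickly, but the doubling per character means even a modest filter-sized input can stall a backtracking matcher, which is exactly the accidental-ReDoS concern.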
In my context, implementing another type of search (like Elastica) would be way too complex and overkill for this task, since database queries seem to work fine for this kind of search.
Anyway, it would be a huge relief to hear that there’s no risk of breaking the software with a simple regex; if so, I’ll finally submit the patch. Thanks again.

Chicocvenancio

I’m not sure that is true¹, but I do agree that the combination of bad regex and bad filter is unlikely enough to not be a big concern.

¹ E.g. https://phabricator.wikimedia.org/phame/post/view/64/laughing_ores_to_death_with_regular_expressions_and_fake_threads/

Tgr

IMO, in the longer term, AbuseFilter filters should be converted into pages with a custom ContentHandler, at which point you get search pretty much for free. But yeah, that’s far from low-hanging fruit.

I filed T187669 about the regex problem; maybe it is possible to add some kind of regex validation to MediaWiki. I wouldn’t worry about it in the context of AbuseFilter search, though.

Daimona

Yes, that might be a future development for AF, though nothing that could be done quickly. Thanks for the task; I’ll subscribe to it hoping for some good news, and also hoping for a bump in the supported MySQL version. And finally, I’ll submit a definitive version of the patch. Many thanks for the help!