After restarting the Elasticsearch service in the background, ?action=cirrusDump shows results. But even though the JSON results are there, text="" and file_text=false.
The PDFs are text-based files generated with MS Word, not scans or anything similar.
I have run the maintenance scripts more times than I can count.
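One way to confirm that these PDFs really contain extractable text, independently of MediaWiki, is to run the extraction tool directly on one of them. Below is a minimal sketch that assumes PdfHandler is set up to use `pdftotext` (from Poppler or Xpdf, the tool `$wgPdftoText` points to) and that the tool is on the PATH. The function name `extract_text` is hypothetical.

```python
import shutil
import subprocess


def extract_text(pdf_path: str, pdftotext: str = "pdftotext") -> str:
    """Run pdftotext on one PDF and return the extracted text ('' if none)."""
    exe = shutil.which(pdftotext)
    if exe is None:
        raise FileNotFoundError(f"pdftotext executable not found: {pdftotext}")
    # "-" as the output file makes pdftotext write the text to stdout
    result = subprocess.run(
        [exe, "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `print(extract_text("Example.pdf")[:500])`, where the file name is a placeholder. If this prints the document text but `file_text` stays false, the problem is more likely in how MediaWiki calls the tool or stores its output than in the PDFs themselves.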
Here is the JSON dump generated with the cirrusDump option:
text ""
source_text ""
text_bytes 0
content_model "wikitext"
language "de"
heading []
opening_text null
auxiliary_text []
defaultsort false
file_text false
file_media_type "OFFICE"
file_mime "application/pdf"
file_size 957241
file_width 1239
file_height 1754
file_bits 0
file_resolution 1474
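To check several files without reading each dump by hand, a small script can fetch the dump and flag pages whose `file_text` is empty. My understanding is that `file_text` is the field that should hold the text extracted from the PDF. This is a sketch under assumptions: the `index.php` URL is a placeholder, and the response is assumed to be a JSON list of documents that wrap their fields in `_source`. The exact layout may vary between CirrusSearch versions.

```python
import json
import urllib.parse
import urllib.request


def fetch_cirrus_dump(index_php: str, title: str):
    """Download ?action=cirrusDump for one page and parse the JSON."""
    query = urllib.parse.urlencode({"title": title, "action": "cirrusDump"})
    with urllib.request.urlopen(f"{index_php}?{query}") as resp:
        return json.load(resp)


def pages_without_file_text(dump) -> list:
    """Return the _id of every document whose file_text is missing, empty or false."""
    docs = dump if isinstance(dump, list) else [dump]
    missing = []
    for doc in docs:
        # Assumption: indexed fields live under "_source"; otherwise use the doc itself
        source = doc.get("_source", doc)
        if not source.get("file_text"):
            missing.append(doc.get("_id"))
    return missing
```

Usage would look like `pages_without_file_text(fetch_cirrus_dump("https://wiki.example.org/index.php", "File:Example.pdf"))`, with a hypothetical wiki URL and file title.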
PdfHandler seems to work at least partially, since it detects the correct file parameters and generates thumbnails.
I’ll try to find something in the MediaWiki logs.
Are there any special requirements for the MediaWiki or PHP version? I have a sneaking suspicion that something is wrong deep inside LocalFile.php, since I get errors when running refreshImageMetadata.php, such as "wrong filename or folder" (Windows error: "The filename, directory name, or volume label syntax is incorrect.").
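Windows reports that message when, among other things, a path contains characters that are not allowed in file or folder names. One hypothesis worth ruling out is therefore a malformed path in the configuration, such as a stray quote around a tool path. Here is a rough check. The function name and example paths are hypothetical, and a clean result does not rule out other causes.

```python
from __future__ import annotations

from pathlib import PureWindowsPath

# Characters Windows does not allow inside a file or folder name
INVALID_CHARS = set('<>:"|?*')


def windows_path_problems(path: str) -> list[str]:
    """Describe each path component that Windows would reject as a name."""
    p = PureWindowsPath(path)
    # Skip the anchor (e.g. "C:\\"), whose colon is legitimate
    parts = p.parts[1:] if p.anchor else p.parts
    problems = []
    for part in parts:
        # Flag reserved characters and ASCII control characters
        bad = sorted({c for c in part if c in INVALID_CHARS or ord(c) < 32})
        if bad:
            problems.append(f"{part!r} contains invalid characters {bad}")
    return problems
```

It could be run against the paths set in LocalSettings.php, for example `$wgPdftoText` and `$wgPdfInfo` from PdfHandler and `$wgUploadDirectory`. `windows_path_problems('"C:\\Program Files\\xpdf\\pdftotext.exe"')` flags the quoted components. The same path without quotes passes.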