Steps to Reproduce:
Take any SVG file with the first <switch> tag appearing after $wgSVGMetadataCutoff (256kB).
Actual Results:
no translations dropdown to choose
Expected Results:
translations dropdown to choose
Related Objects
Mentioned In
- T343095: Commons translate failing for image with a lot of elements
- T309426: Small, translated, SVG files do not display the "render this image in $lang" dropdown
- T275263: Translation dropdown not available on File: page after translating a specific SVG file on Commons via svgtranslate tool
- T279133: switch not recognized after $wgSVGMetadataCutoff
- T270999: Perform workaround for SVG files larger than 256K which can have their <switch> element undetected by the MW metadata extractor
Mentioned Here
- T40010: RFC: Re-evaluate librsvg as SVG renderer on Wikimedia wikis
Event Timeline
Two proposals: increase the number of bytes read, or shift multilingual testing to upload time (when the file is read anyway).
In T40010, Ponor looked at 30 SVG files and stated the mean file size was 700 kB. JoKalliauer stated that only about 500 SVG files are being uploaded every day. Johannes also says that SVGs are 2.8 percent of uploads.
SVG illustrations usually place text on top of a drawing, so most text elements come at the end of the file.
At one point, SVG uploads were limited to 10 MB. I do not know if that limit is still in effect.
I do not know how long it takes for MW to parse an XML file.
- We might change $wgSVGMetadataCutoff to be 3 times the average SVG file size, that is 2 MB. That should allow most SVG files to be read completely and therefore processed correctly. It means that reading humongous SVG files may take up to 8 times longer (2 MB versus 256 kB), but the average case should only be 3 times longer. (It will also take up to 1.7 MB more process memory, which may be a more stringent limitation.)
- As I understand it, the SVG file is parsed every time a page is built. I'd also believe the file must be read completely when it is uploaded. At upload, the SVG file could be scanned for systemLanguage attributes, and then an entry could be made in the database recording whether it is multilingual. If there were no language attributes, then a page build need not scan the file at all (it could get image width and height from the imageinfo database). If there were systemLanguage attributes, then it could scan the first 2 MB of the file (or even the entire file). Having such a flag may even decrease the SVG processing time if only a small percentage of SVG files are multilingual.
- Alternatively, the database could include all the langtags discovered at file upload, so the SVG file would not have to be reread to build a page.
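The upload-time scan proposed above could look roughly like the following. This is only a sketch in Python, not how MediaWiki does it; the function name collect_language_tags is made up for illustration, and it assumes the whole file is available as bytes at upload.

```python
import io
import xml.etree.ElementTree as ET


def collect_language_tags(svg_bytes):
    """Return the sorted langtags found in systemLanguage attributes.

    Hypothetical helper: scans the entire file once, as could be done at upload.
    """
    tags = set()
    events = ET.iterparse(io.BytesIO(svg_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            # Attributes are already available when the start tag is seen.
            value = elem.get("systemLanguage")
            if value:
                # systemLanguage holds a comma-separated list of langtags.
                tags.update(t.strip() for t in value.split(",") if t.strip())
        else:
            # Drop the content of finished elements to keep memory use modest.
            elem.clear()
    return sorted(tags)
```

An empty result would correspond to the "not multilingual" flag; a non-empty result is the list of langtags that could be stored with the file's metadata.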
When I've run into this problem, I've used two workarounds.
One is to add a hidden switch near the top of the file:
<switch visibility="hidden">
  <text systemLanguage="en">English</text>
  <text systemLanguage="de">Deutsch</text>
  <text>English</text>
</switch>
The second is to add a similar switch to the defs element:
<defs>
  <g id="legend">
    <switch>
      <text systemLanguage="en">English</text>
      <text systemLanguage="de">Deutsch</text>
      <text>English</text>
    </switch>
  </g>
</defs>
SVG Translate offers to translate the text, and if users add a translation, then it will show up on the File page.
SVG Translate could always add such an element near the front of the file. A trick would be to set the id to an SVG Translate GUID. Then SVG Translate could always add the language without offering it to the user.
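To make the idea concrete, here is a rough Python sketch of inserting a hidden switch as the first child of the root element. It is not SVG Translate's code; add_hidden_switch, the marker id value, and the placeholder text content are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without a namespace prefix.
ET.register_namespace("", SVG_NS)


def add_hidden_switch(svg_text, langtags, marker_id="hypothetical-marker-id"):
    """Insert a hidden <switch> listing `langtags` as the first child of <svg>."""
    root = ET.fromstring(svg_text)
    switch = ET.Element(
        "{%s}switch" % SVG_NS, {"id": marker_id, "visibility": "hidden"}
    )
    for tag in langtags:
        text = ET.SubElement(switch, "{%s}text" % SVG_NS, {"systemLanguage": tag})
        text.text = tag  # placeholder content; the element is never displayed
    # Fallback branch without systemLanguage, as in the hand-written workaround.
    fallback = ET.SubElement(switch, "{%s}text" % SVG_NS)
    fallback.text = "fallback"
    root.insert(0, switch)
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree's default parser drops comments and the XML declaration when round-tripping, so a real tool would probably want a parser that preserves the rest of the file untouched.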
Actually, it looks like this is using XMLReader, so memory usage should be quite low. If there were to be some sort of DoS issue, it would probably be with recursive entity expansion, which would not be prevented by the cutoff. (However, libxml does have better checks against this nowadays.)
With that in mind, I think it makes sense to increase to 5MB.
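For anyone who wants to see the effect of the cutoff in isolation, here is a small Python sketch. It uses Python's XMLPullParser as a stand-in for a streaming reader and is not the MediaWiki code path; sees_switch and make_svg are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def sees_switch(svg_bytes, cutoff):
    """Report whether a <switch> start tag lies within the first `cutoff` bytes."""
    parser = ET.XMLPullParser(events=("start",))
    # Feed only the prefix. The parser is never closed, so a truncated
    # document does not raise; it simply yields fewer events.
    parser.feed(svg_bytes[:cutoff])
    return any(elem.tag == SVG_NS + "switch" for _event, elem in parser.read_events())


def make_svg(padding_bytes):
    """Build a test SVG whose only <switch> follows at least `padding_bytes` of paths."""
    path = '<path d="M0 0L1 1"/>'
    body = path * (padding_bytes // len(path) + 1)
    switch = (
        '<switch><text systemLanguage="de">Deutsch</text>'
        "<text>English</text></switch>"
    )
    return ('<svg xmlns="http://www.w3.org/2000/svg">' + body + switch + "</svg>").encode("utf-8")
```

With a file built by make_svg(300 * 1024), sees_switch(svg, 256 * 1024) should report no switch while sees_switch(svg, 5 * 1024 * 1024) should find it, which mirrors the behaviour described in this task.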
Change 1000386 had a related patch set uploaded (by Brian Wolff; author: Brian Wolff):
[mediawiki/core@master] Change $wgSVGMetadataCutoff default to 5 MiB (previously 512KiB).
Change 1000386 merged by jenkins-bot:
[mediawiki/core@master] Change $wgSVGMetadataCutoff default to 5 MiB (previously 512KiB).
"Alternatively, the database could include all the langtags discovered at file upload, so the SVG file would not have to be reread to build a page."
For reference, this is actually how it works.
@Glrx Do you think upping the limit to 5MB is sufficient to call this bug fixed?
Some time ago, I learned that the langtags were stored in the MW database (they are a bit buried in the API). I'm not a MW expert.
Yes, 5 MB is enough to close this issue. That size is well above the typical size, and SVG files that are above 5 MB probably have other issues. I've fixed several SVG files with this problem. IIRC, the file sizes were usually less than 1 MB (it was a 256 kB limit rather than 512 kB).
The biggest file I recall is https://commons.wikimedia.org/wiki/File:2022_Russian_invasion_of_Ukraine.svg which was probably 2 MB at the time. It has now grown to 3.7 MB (apparently gaining 1.5 MB when the base map was improved in August 2023). It is a map that has such detail that it is not expected to be viewed in MW directly; users will download and view the SVG so they can pan and zoom the image.
I would still encourage that SVG Translate add a hidden switch element at the start of the SVG file, but that is a separate issue.
For reference, on Commons, there are 43066 SVGs that are > 5 MB out of 2 419 905 in total (about 1.8%).
For images where we have detected translations, 13 out of 4771 (0.27%) are larger than 5 MB. (This will miss any images where this bug is present, so it may not be a useful stat.) The list is below:
+-------------------------------------------------+------------+
| img_name | Size (MiB) |
+-------------------------------------------------+------------+
| 1979_United_Kingdom_EU_Election.svg | 24.4278 |
| Bahnstrecke_Oberhausen–Arnhem_Karte.svg | 17.7614 |
| Corsica-geographic_map.svg | 14.2364 |
| Geographic_map_of_Carpathian_mountains_CS.svg | 10.7202 |
| Indian_General_Election_2014_by_alliance.svg | 5.0228 |
| Iran-geographic_map-es.svg | 14.1260 |
| Iran-geographic_map.svg | 12.7000 |
| Iran-geographic_map_clean.svg | 8.6157 |
| Iran_Faults_map.svg | 12.8078 |
| Neubaustrecke_Rhein-Main-Rhein-Neckar_Karte.svg | 6.9020 |
| Pannonian_Basin_geographic_map-es.svg | 10.2899 |
| Pannonian_Basin_geographic_map.svg | 9.5815 |
| İran_coğrafya_haritası.svg | 12.5943 |
+-------------------------------------------------+------------+
Anyways, calling this done. If the limit is still causing problems in any significant way, people can reopen this task or make a new one.
We might have to run a forced metadata refresh on the SVGs. Otherwise, I think those SVGs between the old and new values would require a re-upload to detect that they have new metadata.
foreachwiki maintenance/refreshImageMetadata.php --mediatype=DRAWING --mime=image/svg+xml --force --throttle
Unfortunately, there doesn't seem to be a way to select only SVGs of a certain size, so this would reparse all SVGs, which is quite a bit. I don't think that will be a problem, because SVGs are a relatively tiny share of the uploads, but it's always a bit of a gamble.
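If a listing of file names and sizes were available from some other source, the candidates could be narrowed down outside the maintenance script. A minimal Python sketch, assuming the old cutoff was 256 KiB as reported in this task and that files at or below it were already read in full; needs_refresh and the listing format are hypothetical.

```python
OLD_CUTOFF = 256 * 1024  # bytes; assumed old value, per the task description


def needs_refresh(files, old_cutoff=OLD_CUTOFF):
    """Return names of files larger than the old cutoff.

    `files` is an iterable of (name, size_in_bytes) pairs from a hypothetical
    listing. Files at or below the old cutoff were already read completely,
    so raising the cutoff should not change their metadata.
    """
    return [name for name, size in files if size > old_cutoff]
```

Files above the new cutoff are kept in the result on purpose, since their first switch may sit between the old and new limits.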
Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses. By using this site, you agree to the Terms of Use, Privacy Policy, and Code of Conduct. · Wikimedia Foundation · Disclaimer ·