T342444 was halted because reindexing was too slow.
- Update config with a more efficient interim analysis chain in case any reindexing needs to be done.
- Refactor recent analysis upgrades (acronyms and camelCase) to be acceptably efficient as custom filters in a new plugin (originally planned for the extra plugin)
- Enable plugin version-checking in analysis config (so we know we have the new plugin)
- Enable less expensive fallback versions of camelCase and acronym processing for 3rd party users without the new plugin
- Possibly investigate other slow points in global configs (implement immediately or open new tickets)
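For readers unfamiliar with the two analysis upgrades, here is a minimal Python sketch of the behavior assumed for acronym and camelCase processing. It is not the Java code in the plugin: it assumes whitespace tokenization, only handles plain periods, splits camelCase only at lowercase-to-uppercase boundaries, and does not keep the original unsplit token. All function names are hypothetical.

```python
def fix_acronym(token: str) -> str:
    """Collapse period-separated single letters, e.g. 'N.A.S.A.' -> 'NASA'."""
    parts = token.split(".")
    if parts and parts[-1] == "":
        parts = parts[:-1]  # allow one trailing period
    # Only merge if every piece is a single letter and there are at least two.
    if len(parts) >= 2 and all(len(p) == 1 and p.isalpha() for p in parts):
        return "".join(parts)
    return token


def split_camel_case(token: str) -> list[str]:
    """Split at lowercase-to-uppercase boundaries, e.g. 'camelCase' -> ['camel', 'Case']."""
    pieces, start = [], 0
    for i in range(1, len(token)):
        if token[i - 1].islower() and token[i].isupper():
            pieces.append(token[start:i])
            start = i
    pieces.append(token[start:])
    return pieces


def analyze(text: str) -> list[str]:
    """Toy chain: whitespace tokens -> acronym fix -> camelCase split -> lowercase."""
    tokens = []
    for token in text.split():
        tokens.extend(split_camel_case(fix_acronym(token)))
    return [t.lower() for t in tokens]
```

For example, `analyze("N.A.S.A. camelCase HTMLParser")` returns `['nasa', 'camel', 'case', 'htmlparser']`. Note that `HTMLParser` is left whole because it has no lowercase-to-uppercase boundary; a production filter has to make deliberate choices about such all-caps runs.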
New dependency: we can/should link this with T332337, which also needs a new filter, and put everything in one new plugin.
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Refactor CamelCase Analysis into Textify Plugin | search/extra | master | +859 -825
Allow Fallback Filters, Config CamelCase Plugin | mediawiki/extensions/CirrusSearch | master |

Change 957806 had a related patch set uploaded (by Tjones; author: Tjones):
[mediawiki/extensions/CirrusSearch@master] Refactor and Revert Analysis Harmonization
Change 957806 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Refactor and Revert Analysis Harmonization
We previously discussed how to bundle the new filters, but talked about it again today.
Since acronym and camelCase processing aren't language-specific, creating a separate plugin isn't an obvious requirement or even desirable. Moving them into the extra plugin made sense from an architectural point of view, but the added complexity for our own deployment, 3rd party users, and even developers is undesirable. Trying to resolve everything in the config builder by testing for specific WMF versions of plugins (e.g., "v7.10.2-wmf5 or newer") is possibly more complexity than it is worth at the moment.
OTOH, creating and checking for the presence of a new plugin is easy. It is possible that the overhead of many plugins will become a problem in the future, but at the moment there is no evidence of that. For now, our standard operating procedure will be to create a new plugin when we have a batch of new filters to create.
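To make the trade-off concrete, here is a minimal Python sketch (not CirrusSearch's actual PHP config builder) contrasting a version-based check with a presence-based one. The plugin name `extra-analysis-textify`, the filter names, and the `v<major>.<minor>.<patch>-wmf<N>` version format are assumptions drawn from this discussion; the fallback filter names are hypothetical placeholders.

```python
import re

# Assumed plugin and filter names; the fallback names are hypothetical.
TEXTIFY_PLUGIN = "extra-analysis-textify"
TEXTIFY_FILTERS = ["acronym_fixer", "camelCase_splitter"]
FALLBACK_FILTERS = ["acronym_fallback", "camelCase_fallback"]


def version_key(version: str) -> tuple[int, int, int, int]:
    """Parse a version like 'v7.10.2-wmf5' into a sortable tuple (format assumed)."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)(?:-wmf(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch, wmf = m.groups()
    # A missing -wmf suffix sorts as wmf0. Whether an upstream 'v7.10.3' should
    # count as newer than 'v7.10.2-wmf5' (it does here) is exactly the kind of
    # edge case that makes version-based checks fragile.
    return (int(major), int(minor), int(patch), int(wmf or 0))


def filters_by_version(extra_version: str, minimum: str = "v7.10.2-wmf5") -> list[str]:
    """Rejected approach: infer filter availability from the extra plugin's version."""
    if version_key(extra_version) >= version_key(minimum):
        return TEXTIFY_FILTERS
    return FALLBACK_FILTERS


def filters_by_presence(installed_plugins: set[str]) -> list[str]:
    """Adopted approach: a dedicated plugin is either installed or it isn't."""
    if TEXTIFY_PLUGIN in installed_plugins:
        return TEXTIFY_FILTERS
    return FALLBACK_FILTERS
```

The installed-plugin list could come from something like Elasticsearch's `_cat/plugins` endpoint. The point of the comparison is that the presence check needs no parsing and no ordering rules.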
As a big-picture compromise, it makes sense to work on T332337 (ICU tokenizer repair) before returning to T332342 (folding), and bundle the new filter there with the two here, so that all three new filters can be in one plugin.
Change 965602 had a related patch set uploaded (by Tjones; author: Tjones):
[search/extra@master] Refactor Acronym Fixer Analysis into New Textify Plugin
Change 965603 had a related patch set uploaded (by Tjones; author: Tjones):
[search/extra@master] Refactor CamelCase Analysis into Textify Plugin
Change 965793 had a related patch set uploaded (by Tjones; author: Tjones):
[search/extra@master] Add limited_mapping to Textify Plugin
Change 965575 had a related patch set uploaded (by Tjones; author: Tjones):
[mediawiki/extensions/CirrusSearch@master] Allow Fallback Filters, Config CamelCase Plugin
Change 965576 had a related patch set uploaded (by Tjones; author: Tjones):
[mediawiki/extensions/CirrusSearch@master] Config Acronym Fixer Plugin
Change 967912 had a related patch set uploaded (by Tjones; author: Tjones):
[mediawiki/extensions/CirrusSearch@master] Allow limited_mapping when textify plugin is present
Change 965575 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Allow Fallback Filters, Config CamelCase Plugin
Change 965576 merged by Tjones:
[mediawiki/extensions/CirrusSearch@master] Config Acronym Fixer Plugin
Highlights:
- Acronym extra load time decreased from 274.1% to 4.4%!
- camelCase extra load time decreased from 19.9% to 3.6%
- limited_mapping filters are ~50% faster than regular mapping filters for simple one-char to one-char mappings
- Next, I'm going to work on T332337 (Repair multi-script tokens split by the ICU tokenizer) instead of going back to T332342 (Standardize ASCII-folding/ICU-folding across analyzers), so that everything can be rolled into one new plugin.
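A speedup for `limited_mapping` is plausible because a one-char-to-one-char mapping reduces to a single table lookup per character, while a general mapping (like Elasticsearch's `mapping` char filter, which accepts `key => value` rules of any length) must find the longest matching key at each position. The Python sketch below illustrates that difference only. It is not the textify implementation (Lucene's `MappingCharFilter` uses a finite-state transducer rather than this naive loop), the function names are hypothetical, escape sequences in rules are not handled, and the ~50% figure above comes from the actual measurements, not from this sketch.

```python
def parse_rules(rules: list[str]) -> dict[str, str]:
    """Parse 'key => value' rules into a dict (escape sequences not supported)."""
    mapping = {}
    for rule in rules:
        left, right = (s.strip() for s in rule.split("=>"))
        mapping[left] = right
    return mapping


def general_mapping(text: str, mapping: dict[str, str]) -> str:
    """Longest-match replacement: try every key length at every position."""
    max_len = max(map(len, mapping), default=0)
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            key = text[i:i + n]
            if key in mapping:
                out.append(mapping[key])
                i += n
                break
        else:  # no key matched at this position; copy the character through
            out.append(text[i])
            i += 1
    return "".join(out)


def limited_mapping(text: str, mapping: dict[str, str]) -> str:
    """One-char-to-one-char replacement via a single translation table."""
    for key, value in mapping.items():
        if len(key) != 1 or len(value) != 1:
            raise ValueError(f"limited mapping needs one char on each side: {key!r} => {value!r}")
    return text.translate(str.maketrans(mapping))
```

With `m = parse_rules(["’ => '", "٠ => 0", "١ => 1"])`, both `general_mapping("don’t ١٠", m)` and `limited_mapping("don’t ١٠", m)` return `"don't 10"`; the limited version simply refuses rules it cannot express as a per-character table.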
Change 967912 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Allow limited_mapping when textify plugin is present
Change 965602 merged by jenkins-bot:
[search/extra@master] Refactor Acronym Fixer Analysis into New Textify Plugin
Change 965603 merged by jenkins-bot:
[search/extra@master] Refactor CamelCase Analysis into Textify Plugin