With the ability to:
- Detect whether an edit involves someone adding a reference (T325713) and
- Detect whether an edit involves someone adding new information (T333714)
...we'd like to use this ticket to learn what proportion of edits that involve people adding new information also include them adding a reference.
Decision(s) to be made
Knowing the proportion of edits that involve people adding new information and a reference will enable the Editing Team to decide:
- What percentage change in the proportion of edits that involve people adding new information that also include a reference should we expect Edit Check to cause?
Research questions
1. What percentage of main namespace Wikipedia edits that involve people adding new information include a reference? How do these percentages vary by wiki, editor experience level, and editing interface?
2. What percentage of these edits are reverted? How does that compare to the revert rate of the new content edits that do not include a reference?
Related Objects
Mentioned In
- T371158: [SPIKE] What percentage of edits are reverted because of peacock behavior?
- T346982: Generate a list of references people cite when adding new content
- T344132: [Analysis] What percentage of all edits to the main namespace with VisualEditor edits add new information
- T333714: Introduce a tag to identify edits that involve people adding new content
Mentioned Here
- T333714: Introduce a tag to identify edits that involve people adding new content
- T324733: Introduce a tag to identify edits that meet the Edit Check heuristic
- T325713: Introduce a change tag to identify edits that include a reference
Event Timeline
MNeisler edited projects, added Product-Analytics (Kanban); removed Product-Analytics. Jul 10 2023, 3:09 PM (2023-07-10 15:09:05 UTC+0)
The new edit tags to identify edits that involve people adding new content (editcheck-newcontent) and edits that include a reference (editcheck-newreference) have been deployed. I completed an initial QA check today and confirmed that both of these tags are currently being recorded in the database.
I'll plan to start this task next week after we log about a week's worth of data.
@ppelberg Results from this analysis are summarized below. Please let me know if you have any questions.
Methodology
I reviewed data logged in the mediawiki_revision_tags_change table to identify the proportion of published edits tagged with the new hidden change tag editcheck-newcontent that also include the change tag editcheck-newreference (see tag definitions below). The results below reflect data logged between 7 July 2023 (after both tags were deployed) and 22 July 2023 across the main namespace of all Wikipedias. Bots were excluded.
- editcheck-newreference: Implemented in T325713 to identify all edits made using the visual editor to pages in the main namespace in which people add a new (non-reused) reference. Deployed on July 6th. Note: As currently implemented, this tag does not count the re-use of an existing reference.
- editcheck-newcontent: Implemented in T333714 to identify all edits made using the visual editor that meet the edit check heuristic conditions defined in T324733, with the exception that this tag does not consider whether a new reference was added as part of the edit in question. Deployed on July 3rd.
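As a rough illustration of the tag comparison described in the methodology, the sketch below computes the share of revisions tagged editcheck-newcontent that also carry editcheck-newreference. The input shape (pairs of revision ID and tag name), the function name, and the sample rows are assumptions made for illustration; the actual query against mediawiki_revision_tags_change is not shown in this task.

```python
from collections import defaultdict

NEW_CONTENT = "editcheck-newcontent"
NEW_REFERENCE = "editcheck-newreference"


def share_with_new_reference(tag_rows):
    """Return (new content edits, those with a new reference, proportion).

    tag_rows: iterable of (rev_id, tag) pairs. This shape is hypothetical;
    it is not the schema of mediawiki_revision_tags_change.
    """
    # Collect the set of tags attached to each revision.
    tags_by_rev = defaultdict(set)
    for rev_id, tag in tag_rows:
        tags_by_rev[rev_id].add(tag)

    # Only revisions tagged as new content form the denominator.
    new_content = [tags for tags in tags_by_rev.values() if NEW_CONTENT in tags]
    with_ref = [tags for tags in new_content if NEW_REFERENCE in tags]
    proportion = len(with_ref) / len(new_content) if new_content else 0.0
    return len(new_content), len(with_ref), proportion


# Made-up sample rows, not real data.
sample_rows = [
    (1, NEW_CONTENT), (1, NEW_REFERENCE),
    (2, NEW_CONTENT),
    (3, NEW_REFERENCE),  # reference without new content: not in the denominator
    (4, NEW_CONTENT),
    (5, NEW_CONTENT), (5, NEW_REFERENCE),
]
total, with_reference, proportion = share_with_new_reference(sample_rows)
print(total, with_reference, f"{proportion:.1%}")
```

Revisions that only carry editcheck-newreference are ignored here because the metric is defined relative to new content edits.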
Results
Overall:
20% of new content edits made with VE include a new reference.
Total new content edits | Total new content edits with a new reference | Proportion of new content edits with a new reference |
---|---|---|
88523 | 18011 | 20.3% |
By Editor Logged In Status:
A higher proportion of new content edits by registered editors include a new reference compared to unregistered editors.
User status | Total new content edits | Total new content edits with a new reference | Proportion of new content edits with a new reference |
---|---|---|---|
registered | 70305 | 15613 | 22.2% |
unregistered | 18228 | 2400 | 13.2% |
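The percentages in the overall and logged-in status tables can be reproduced from the reported counts. The snippet below is a small sketch of that arithmetic; the helper name is hypothetical and the counts are copied from the tables above.

```python
def proportion(part, total):
    """Share of `part` in `total`, as a percentage rounded to one decimal."""
    return round(100 * part / total, 1)


# Counts copied from the tables above.
overall = proportion(18011, 88523)
registered = proportion(15613, 70305)
unregistered = proportion(2400, 18228)

print(overall, registered, unregistered)
```

The two status rows sum to 88533 edits (18013 with a reference), slightly more than the overall row; the task does not explain the difference.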
By Editor Experience:
Newcomers and junior editors are less likely to add a new reference with their new content edit compared to more senior editors. As shown in the chart below, the proportion of new content edits with a reference increases as user experience increases.
12% of new content edits completed by newcomers (users completing their first edit as a registered user) include at least one new reference while 26% of new content edits completed by senior editors include at least one new reference.
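The experience groups can be thought of as buckets over an editor's edit count. The sketch below is one way to express that; the boundaries are inferred from the labels used elsewhere in this task (first edit, 1-99 edits, 100-500 edits, over 500 edits), and the function name, label strings, and the use of the prior edit count as input are assumptions.

```python
def experience_bucket(prior_edit_count):
    """Map an editor's prior edit count to an experience label.

    Assumed boundaries: 0 prior edits = newcomer (first edit),
    1-99 = junior, 100-500 and over 500 = the two senior groups.
    """
    if prior_edit_count < 0:
        raise ValueError("edit count cannot be negative")
    if prior_edit_count == 0:
        return "newcomer"
    if prior_edit_count < 100:
        return "junior (1-99 edits)"
    if prior_edit_count <= 500:
        return "senior (100-500 edits)"
    return "senior (over 500 edits)"


for count in (0, 42, 100, 500, 501):
    print(count, experience_bucket(count))
```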
By Wiki
Results vary per wiki and range from a high of 36% of new content edits that include a reference at Urdu Wikipedia (urwiki) to a low of 2.4% at Malagasy Wikipedia (mgwiki). Note: I limited the per-wiki analysis to wikis that had over 100 new content edits in the reviewed time period.
Some other results for mid-size to larger wikis:
English Wikipedia: 23.4%
French Wikipedia: 18.9%
Portuguese Wikipedia: 23.4%
Hausa Wikipedia: 19.0%
Catalan Wikipedia: 34%
Spanish Wikipedia: 18.9%
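The per-wiki cutoff mentioned in the note above (only wikis with over 100 new content edits) can be sketched as a simple filter followed by the same proportion calculation. The wiki names and counts below are made up, and the function name and input shape are assumptions.

```python
MIN_NEW_CONTENT_EDITS = 100  # a wiki needs more than this many edits to be reported


def per_wiki_proportions(counts, min_edits=MIN_NEW_CONTENT_EDITS):
    """counts: dict of wiki -> (new content edits, edits with a new reference).

    Returns wiki -> percentage with a new reference, highest first,
    skipping wikis at or below the minimum edit count.
    """
    result = {}
    for wiki, (total, with_ref) in counts.items():
        if total > min_edits:
            result[wiki] = round(100 * with_ref / total, 1)
    return dict(sorted(result.items(), key=lambda item: item[1], reverse=True))


# Hypothetical counts, not taken from the analysis.
sample = {"wiki_a": (400, 120), "wiki_b": (100, 50), "wiki_c": (250, 10)}
print(per_wiki_proportions(sample))
```

In this sample wiki_b is dropped because it has exactly 100 edits, which is not over the threshold.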
By Revert Status
Note: This includes edits reverted within 48 hours. Some edits may have been reverted past that time.
What proportion of new content edits that include a reference are reverted?
Total new content edits with a new reference | Number of these edits reverted | Proportion of these edits reverted |
---|---|---|
18011 | 1120 | 6.2% |
Only about 6.2% of new content edits with at least one new reference are reverted.
How does that compare to the revert rate of new content edits that do not include a new reference?
Total new content edits without a new reference | Number of these edits reverted | Proportion of these edits reverted |
---|---|---|
70519 | 8127 | 11.5% |
A higher proportion (11.5%) of new content edits that do not include a reference were reverted during the reviewed time period.
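The two revert rates follow directly from the counts in the tables above. The sketch below recomputes them and the gap between them in percentage points; the gap is derived here for illustration and is not a figure reported in the analysis.

```python
def revert_rate(reverted, total):
    """Percentage of edits that were reverted."""
    return 100 * reverted / total


# Counts copied from the two revert tables above.
rate_with_ref = revert_rate(1120, 18011)
rate_without_ref = revert_rate(8127, 70519)
gap_points = rate_without_ref - rate_with_ref

print(f"with reference: {rate_with_ref:.1f}%")
print(f"without reference: {rate_without_ref:.1f}%")
print(f"gap: {gap_points:.1f} percentage points")
```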
This looks great, @MNeisler. Per what we talked about offline today, as a next step we're going to add per wiki breakdowns [i] for:
- The proportion of new content edits that include a new reference by editor experience
- How the revert rate of new content edits varies between those new content edits that do and do not include references
- Note: if we can see this broken out by experience level, that would be wonderful.
I also wonder: do you think it would be feasible to report on an additional metric that we hadn't previously discussed as being within the scope of this task (see below)?
Proposed additional metric: proportion of all edits to the main namespace made with VE [ii] that involve people adding new content, overall and broken out by project and experience level?
i. The wikis where we'd like to see project-specific breakdowns include: en.wiki, fr.wiki, sw.wiki, ar.wiki, pt.wiki, ha.wiki, ig.wiki, af.wiki, yo.wiki, de.wiki. via Superset
ii. Read: the same conditions we've used to compute other metrics on this task
@ppelberg
Please see results for additional per wiki breakdowns below:
The proportion of new content edits that include a new reference by editor experience and wiki
Note: swwiki, afwiki, igwiki and yowiki did not have sufficient events for analysis. At least one more month of data is likely needed to include per wiki breakdowns for these wikis.
When broken down by project, we see roughly similar overall trends with new editors least likely to publish a new content edit with a new reference and senior contributors more likely. Some initial observations:
- Hawiki did not have any new editors that published an edit in the main namespace during the reviewed period, and only 8% of edits by junior contributors on this project included at least one new reference (compared to 22% at English Wikipedia).
- At both arwiki and ptwiki, senior editors with over 500 edits are 3 times more likely to publish an edit with a new reference compared to new editors. This difference is higher than the difference observed between senior and new editors at the other reviewed wikis.
- At ptwiki, we observed the highest proportion of new content edits with new references by senior editors (34% for editors with 100-500 edits and 32% for editors with over 500 edits).
- The other larger wikis enwiki, dewiki and frwiki follow overall trends with the percent of new content edits that include a reference consistently increasing with experience.
How does the revert rate of new content edits vary between those new content edits that do and do not include references?
Overall by experience level
- For each editor experience level, new content edits that include at least one new reference are reverted less than new content edits without a new reference.
- For new editors and junior editors (under 100 edits), the relative difference between the revert rates of new content edits with and without a reference is larger. For example, the revert rate of new content edits by junior editors (under 100 edits) decreased by about 42% (15% → 8.8%) when at least one new reference was included, compared to a decrease of 28% (1% → 0.75%) for senior editors (over 500 edits).
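The percent decreases quoted above are relative changes between two revert rates. The sketch below applies that formula to the rounded rates given in the text; the helper name is hypothetical. Recomputing from the rounded values gives roughly 41% and 25%, a little below the reported 42% and 28%, presumably because the reported figures were computed from unrounded rates (an assumption, since the underlying values are not shown).

```python
def percent_decrease(rate_without_ref, rate_with_ref):
    """Relative decrease, in percent, from the rate without a reference."""
    return 100 * (rate_without_ref - rate_with_ref) / rate_without_ref


# Rounded revert rates quoted in the text above.
junior = percent_decrease(15.0, 8.8)
senior = percent_decrease(1.0, 0.75)

print(f"junior editors: {junior:.1f}% decrease")
print(f"senior editors: {senior:.1f}% decrease")
```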
By Wiki
For each reviewed project, new content edits that include a new reference are reverted less than new content edits that do not include a new reference, but the percent decrease varies per project.
- Ptwiki has the lowest revert rate (0.63%) of new content edits that include a reference. This is about a 92% decrease (8.8% → 0.63%) from new content edits that do not include a reference.
- At arwiki, the inclusion of a reference had a smaller impact on the likelihood of an edit being reverted. At this wiki, there was a 20.5% decrease (12.9% → 10.25%) in the proportion of new content edits reverted when a new reference was included.
- Hawiki did not have any new content edits reverted during the reviewed timeframe, so it is not shown in the chart above.
By wiki and experience level
To understand how an editor's experience level affects the likelihood of their edit being reverted, I also reviewed revert rates by experience level for each project.
The above chart shows the proportion of new content edits reverted by whether a new reference was included for each wiki (represented by each row) and experience level (represented by each column).
- Enwiki, frwiki, and dewiki had similar trends with new content edits by new editors reverted more frequently than more senior editors. If the new content edit included a new reference, then the proportion of edits reverted decreased for each experience level. Higher percent decreases were observed for new editors and junior contributors compared to senior editors.
- At enwiki, 40.4% of new content edits by new editors without a reference are reverted. This is a 50% increase compared to the rate observed on other wikis.
- At ptwiki, a very small proportion of the 500 new content edits that included a reference were reverted (0.6%). These few reverted edits were completed by junior editors (1-99 edits) and senior editors.
- At arwiki, a high proportion (35.7%) of new content edits by junior editors that included a reference were reverted. In contrast to other per-wiki and overall trends, this is higher than the observed revert rate (14.8%) of new content edits that did not include a reference.
Note on per wiki revert rate analysis:
- This analysis reviewed edits reverted within 48 hours. Some of the reviewed wikis may take a longer time to revert new content edits, which may account for some of the observed per wiki variation in revert rates.
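The 48-hour cutoff in the note above can be expressed as a check on the time between an edit and its revert. This is a minimal sketch: the function name and arguments are hypothetical, and treating a revert at exactly 48 hours as inside the window is an assumption, since the task does not say how the boundary was handled.

```python
from datetime import datetime, timedelta, timezone

REVERT_WINDOW = timedelta(hours=48)


def reverted_within_window(edit_time, revert_time, window=REVERT_WINDOW):
    """True if the edit was reverted no later than `window` after it was saved.

    revert_time is None when no revert is known for the edit.
    """
    if revert_time is None:
        return False
    return timedelta(0) <= revert_time - edit_time <= window


# Made-up timestamps for illustration.
edit = datetime(2023, 7, 10, 12, 0, tzinfo=timezone.utc)
print(reverted_within_window(edit, edit + timedelta(hours=47)))
print(reverted_within_window(edit, edit + timedelta(hours=72)))
print(reverted_within_window(edit, None))
```

An edit reverted after 72 hours would count as not reverted under this rule, which is the undercounting the note describes for wikis that take longer to review new content.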
I also wonder: do you think it would be feasible to report on additional metric that we hadn't previously discussed as being within the scope of this task (see below)?
Proposed additional metric: proportion of all edits to the main namespace made with VE [ii] that involve people adding new content, overall and broken out by project and experience level?
Yes this is feasible and would not require too much additional work as it involves just a small change to the existing query. I wonder if it would be worthwhile to track as a separate task since it is a different scope/different question. If ok, I can create one and will track work on that analysis there.
In T332848#9077281, @MNeisler wrote:
I also wonder: do you think it would be feasible to report on additional metric that we hadn't previously discussed as being within the scope of this task (see below)?
Proposed additional metric: proportion of all edits to the main namespace made with VE [ii] that involve people adding new content, overall and broken out by project and experience level?
Yes this is feasible and would not require too much additional work as it involves just a small change to the existing query. I wonder if it would be worthwhile to track as a separate task since it is a different scope/different question. If ok, I can create one and will track work on that analysis there.
Pursuing the answer to this question in a new task sounds great – thank you, @MNeisler.
In T332848#9077218, @MNeisler wrote:
@ppelberg
Please see results for additional per wiki breakdowns below...
Excellent – this additional analysis offers precisely what we were seeking...thank you, @MNeisler.