2020-05-27 Notes and Feedback Form for today's WL3C


Zainan Zhou (a.k.a Victor)

May 27, 2020, 9:33:54 PM

to WikiLoop Coalition, Samuel Klein, Zainan Victor Zhou

Hi WikiLoop Coalition (bcc: every individual participant of today's call),

Thank you for attending today's WikiLoop Coalition Conference Call (WL3C). Here are today's unrefined notes.

Useful Links
- Page on meta.wikimedia.org
- Join our Google Groups to subscribe to future calendar invites and emails. Do tell friends who may find this group relevant.
- Provide your feedback for this session

2020-05-27: Vandalism Patrol & Useful KG Standards

Participants:

James Hare
Thad Guidry
Sebastian Kohlmeier
Zainan Zhou

Summary

The following topics, LoopRequests, and LoopOffers were discussed:

Vandalism patrol + WikiLoop Battlefield
Useful KG standards for wikilooping

Raw Notes

Vandalism patrol

WikiLoop Battlefield: staying neutral while providing an interface.

Maintaining community interest w/ a simple + universally useful tool.

Q: Why not provide ‘good faith’ label options? A: To keep it simple and avoid value judgements.

Community members care a lot about careful assessment. Collecting facts, review.
Where we can contribute: speed up that evidence-gathering.
Goals: lower the barrier to contribute
We can get many more labels from regular readers/users than from wiki editors; working on capturing this. (Comparison: you get billions of spam emails, but many fewer vandal edits.)
Loop requests from WLB: contributions + interface feedback (link to slide w/ invitations)
Loop offer: University of Dallas - Prof Andrews + students may be interested in collaboration + helping w/ such work
Other models?

Predictive models: is anyone training a model to predict when a future edit (the next edit on a page, the next edit by an author) will trigger ORES? Related: models of the potential challenges to a current edit, based on metadata other than author age and the edit diff: recent changes to the article, RecentChanges for the entire wiki, other topic modeling, and …
Do trends matter for spam? Could we feed G + W trends into tools to help focus attention where there will be the most trouble? Cf. early-onset protection (perhaps very gradually titrated, less restrictive protection options?)

Similar projects that interface with or use ORES: ?
Q: does ORES bad-actor data track ‘repeat actors’? Possibly not.
Related: Huggle doesn’t track good edits.
Loop request: Tracking across projects
Behavioral fingerprints, textual fingerprints? Tracking individual problems and sock-puppets across projects.
~ Track across namespaces and wikis
~ Track across different tools, wikis, and blogs (compare spam remediation: talk to blacklist maintainers, WP plugin designers)
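One possible reading of "textual fingerprints" is near-duplicate detection on the text that edits add. A minimal sketch, assuming character 5-gram shingles and Jaccard similarity (a common near-duplicate technique, not something the group settled on); the function names and example strings are hypothetical placeholders, not real edits or an existing tool's API:

```python
def shingles(text, k=5):
    """Return the set of lowercase character k-grams (shingles) of text."""
    # Collapse whitespace and case so trivial reformatting doesn't hide a match.
    normalized = " ".join(text.lower().split())
    if len(normalized) < k:
        return {normalized} if normalized else set()
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def textual_fingerprint_similarity(added_text_1, added_text_2, k=5):
    """Similarity in [0, 1] between the text added by two edits."""
    return jaccard(shingles(added_text_1, k), shingles(added_text_2, k))


if __name__ == "__main__":
    # Placeholder strings standing in for text added on two different wikis.
    edit_on_wiki_a = "Buy cheap widgets now at example dot com!!!"
    edit_on_wiki_b = "buy cheap  widgets NOW at example dot com"
    unrelated_edit = "The river flows north through the valley."
    print(textual_fingerprint_similarity(edit_on_wiki_a, edit_on_wiki_b))
    print(textual_fingerprint_similarity(edit_on_wiki_a, unrelated_edit))
```

A high score between edits on different wikis could flag a possible repeat actor for human review; on its own it is not evidence of sock-puppetry.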

Other counter-spam approaches? [Many problems considered solved in academia don’t scale well, so simpler, minimal approaches get implemented. (tguidry, LinkedIn)]
MS SharePoint spam?
YT misinfo spam: the problem is framed quite differently. Comments: be nice. Videos: don’t cross certain bright lines (clickbait, illegal content).

Useful KG standards (and categories)

What are we using / what do we need?
Beyond schema.org: useful protocols, ontologies, specifications

TG: often works on linked data protocols and related ontologies.
How can we improve linked data implementations within Wikidata?

Examples:
~ data shapes + constraints (ShEx), Mutexes (from FB)
Request: Export catalog of mutexes + descriptions -- BigMama + others
canonical functions to enforce constraints (??+)
Request: Is there a current project doing this?
Look into → OKFN catalog, Linked Open Vocabs (terms), [SJ/TG]
Wanted: incompatible type-matrix, meta-catalogs [support unmerge-requests]
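A minimal sketch of how an incompatible-type matrix might be applied to a single item, in Python. The type labels and the matrix contents are hypothetical placeholders, not an existing Wikidata or FB catalog; a detected pair could serve as one signal for an unmerge request:

```python
from itertools import combinations

# Hypothetical incompatibility matrix: each frozenset is a pair of types
# that should never co-occur on one item. Labels are illustrative only.
INCOMPATIBLE_TYPES = {
    frozenset({"human", "city"}),
    frozenset({"human", "chemical compound"}),
    frozenset({"city", "chemical compound"}),
}


def find_type_conflicts(item_types):
    """Return the sorted list of incompatible type pairs present on an item."""
    conflicts = []
    # Sorting makes the output order deterministic for reviewers and tests.
    for a, b in combinations(sorted(set(item_types)), 2):
        if frozenset({a, b}) in INCOMPATIBLE_TYPES:
            conflicts.append((a, b))
    return conflicts


if __name__ == "__main__":
    # An item typed as both a human and a city, e.g. after a bad merge.
    print(find_type_conflicts(["human", "city", "politician"]))
```

Storing the matrix as unordered pairs keeps each incompatibility a single entry, which would make a shared, community-maintained catalog easier to review.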

Framework for communal agreement on incompatibilities + other type annotations
~ canonical ways to refer to a function (code + a VM to run it?) :
Future: WikiLambda, other code-wikis
Request: Existing collections of [functions] that would make good Z-objects

~ dataset [ontologies]: update-tempo, patch-mechanism, detailed source tree?
Request: Good current standards for (maintainable, shared datasets)
What standards does Google Dataset Search care about? Where can we find feeds of changes + submissions, and how do we connect bots for reading + writing?
What other data seas* have helpful approaches to tracking inflows?

~ universal PIDs: concordances of IDs and authority files. (VIAF, Lens ID, …)
Compare the fragility of URL shorteners and the IA's work to preserve them; PIDs are basically URL shorteners...
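A concordance of IDs can be thought of as grouping identifiers that different authority files assert refer to the same entity. A minimal sketch using union-find (a standard technique for transitive equivalence, chosen here as an assumption); the `Concordance` class and all identifiers are hypothetical placeholders, not real Wikidata, VIAF, or Lens records:

```python
class Concordance:
    """Union-find over (scheme, identifier) pairs.

    Each connected group of identifiers is treated as one entity, so
    equivalences stated pairwise (A = B, B = C) are closed transitively.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, key):
        # Unseen keys are registered as their own single-member group.
        self._parent.setdefault(key, key)
        while self._parent[key] != key:
            # Path halving: point key at its grandparent to keep trees shallow.
            self._parent[key] = self._parent[self._parent[key]]
            key = self._parent[key]
        return key

    def link(self, a, b):
        """Record that identifiers a and b refer to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_entity(self, a, b):
        return self._find(a) == self._find(b)

    def group(self, key):
        """All known identifiers in the same group as key, sorted."""
        root = self._find(key)
        return sorted(k for k in list(self._parent) if self._find(k) == root)


if __name__ == "__main__":
    # Placeholder identifiers only.
    c = Concordance()
    c.link(("wikidata", "Q-EXAMPLE"), ("viaf", "VIAF-EXAMPLE"))
    c.link(("viaf", "VIAF-EXAMPLE"), ("lens", "LENS-EXAMPLE"))
    print(c.same_entity(("wikidata", "Q-EXAMPLE"), ("lens", "LENS-EXAMPLE")))
    print(c.group(("lens", "LENS-EXAMPLE")))
```

Because every identifier in a group points to the same entity, losing one resolver (the URL-shortener failure mode) would not break the links among the remaining IDs.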

Other KG standards: which [W3C, IEEE] ones should we attend to? (Wikidata is often an effective substitute for more brittle, specialized standards; how can we engage those processes to combine forces?)

https://datacommons.org/ -- [Q for us: connected w/ the Open Data Commons?]
To what extent does this offer good examples of [dataset ontologies + PID-sets]?

Some metadata fields for Datasets are within here: https://schema.org/Dataset
Idea: Typeset a process/bot to use/process data from specific sources.
Idea: Define a ShEx? shape from a subset of Dataset fields that provides the needed affordances [ draft some rules ] → Start by drafting rules as comments?
Orig Proposal: http://blog.schema.org/2012/07/describing-datasets-with-schemaorg.html
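Before drafting an actual ShEx shape, the rules could be prototyped as a plain check over a JSON-LD record. A minimal sketch in Python standing in for ShEx: the property names are real schema.org/Dataset terms, but which ones are required, and the distribution rule, are assumptions for illustration, not Google Dataset Search's or anyone's official profile:

```python
import json

# Hypothetical "shape": schema.org/Dataset properties a maintenance bot
# might require. The selection is an assumption, not an official profile.
REQUIRED_TEXT_PROPERTIES = ("name", "description", "license", "dateModified")


def check_dataset(record):
    """Return a list of rule violations for one JSON-LD Dataset record."""
    problems = []
    if record.get("@type") != "Dataset":
        problems.append("@type must be 'Dataset'")
    for prop in REQUIRED_TEXT_PROPERTIES:
        value = record.get(prop)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty {prop}")
    # Optional rule: every distribution entry must be a DataDownload with a contentUrl.
    distributions = record.get("distribution", [])
    if isinstance(distributions, dict):
        distributions = [distributions]
    for i, dist in enumerate(distributions):
        if not isinstance(dist, dict) or dist.get("@type") != "DataDownload" or "contentUrl" not in dist:
            problems.append(f"distribution[{i}] needs @type DataDownload and contentUrl")
    return problems


if __name__ == "__main__":
    example = json.loads("""{
      "@context": "https://schema.org/",
      "@type": "Dataset",
      "name": "Example dataset",
      "description": "A placeholder record used to exercise the rules.",
      "distribution": [{"@type": "DataDownload", "encodingFormat": "text/csv"}]
    }""")
    for problem in check_dataset(example):
        print(problem)
```

Returning a list of violations rather than failing on the first one makes it easy to turn each rule into a draft comment first, as suggested above, and to port the agreed rules into a ShEx shape later.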

[aside: Leigh Dodds @ODI on data trusts + related norms]

Victor,

Google Search
