Speakers
Semantic Scholar: Sebastian Kohlmeier
Citation Hunt: Guilherme Gonçalves <ggonc...@google.com>
Citation related work in Wikidata.org / WMDE by Lydia Pintscher
Self-Introduction:
Aaron Halfaker: WMF Principal Research Scientist, AI team for detecting vandalism, quality, topic modeling. Studies quality control dynamics in wikis like Wikipedia.
Elan: Google, WikiLoop Team, WikiLoop Game
James Hare: working on Wikibase for COVID-19 research. Documenting works on Wikidata. / Citation on Wikipedia
Maria: Google Open Source Team / WikiLoop communication lead. Interested in citations. Makes open information reliable for users.
Thad: Big data, different sort of citation, cite / referenceable datapoint, most interested in machine-readable cite formats
Vinay: Google Brain, helping create a tool that assists WP editors in creating stub articles.
Lydia: Product Management for WD at WMDE.
Guilherme Gonçalves: SRE at Google, Citation Hunt author. Thinks of citations as a good avenue for micro-contributions.
Sebastian: Senior PM at the Allen Institute for AI
SJ Klein: federated data, Wikipedia + Knowledge Futures group
Citation Graph by Allen Institute for AI, Sebastian Kohlmeier, Link to Slides
Allen Institute for AI: contribute to humanity through AI...
Semantic Scholar: science and technology academic article search engine.
COVID-19 research repository "CORD-19". Parsed and extracted a large corpus of related academic articles.
CitationGraph
186M+ Papers, 1B+ Citations, expand coverage with Microsoft Academic Graph and publishers
Citation Classification: background citation, result citation, method citation.
Citation Alert
CitationGraph + Wikipedia Collaboration
Citation Template Integration (S2CID) / Semantic Scholar Citation ID
Initial work with S2 Author IDs in Wikidata
Integration with Citation Bot
LoopRequest: Want to be able to recommend citations for Wikidata, using similar text.
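Sketch for the "recommend citations using similar text" request: the notes do not say how similarity would be computed, so this assumes the simplest possible approach, token-overlap (Jaccard) similarity between a statement and candidate paper text. The names tokenize, jaccard, recommend_citations and the sample paper IDs are hypothetical, not part of Semantic Scholar or Wikidata.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_citations(statement, candidates, top_k=3):
    """Rank candidate papers by token overlap with the statement.

    candidates: dict mapping a (hypothetical) paper ID to its title or abstract.
    Returns up to top_k (paper_id, score) pairs, best first, dropping zero scores.
    """
    query = tokenize(statement)
    scored = [(jaccard(query, tokenize(text)), pid) for pid, text in candidates.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(pid, score) for score, pid in scored[:top_k] if score > 0]


if __name__ == "__main__":
    papers = {
        "p1": "Coronavirus spike protein binds ACE2 receptor",
        "p2": "Deep learning for image classification",
    }
    print(recommend_citations("The spike protein binds the ACE2 receptor", papers))
```

A real recommender would likely use embeddings or the citation graph rather than raw token overlap; this only illustrates the shape of the request (statement in, ranked candidate cites out).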
Citation Hunt by Guilherme Gonçalves
Random browsing
Browse by category
Internationalization
Leaderboard
Usage
Training materials, physical events, tracking.
Now in its 5th year, multiple languages, twice a year, 3 weeks long.
Starting again next week!
Various editathons, smaller campaigns, organic traffic
Wikimedians in residence help with this!
Links throughout Wikipedia itself (e.g. Wikipedia:Contributing_to_Wikipedia)
Documentation
Latest development:
Custom Citation Hunt (DEMO)
Next steps:
More languages and campaigns
Integration with other tools, e.g. SuggestBot
UX review, especially mobile
More types of backlog, e.g. OABot
Recommending citations :)
Wikidata and Citation (Lydia) Link to Slides
Wikidata grows and needs automation!
Automation related to citation
Hoped for cleaner data, but not always so.
We can do statistics on this when we have enough instances of a value (country demographics). But things like author names may only appear once or twice, which makes for pretty poor statistics.
Prototype: generated a bunch of refs, got feedback from the community
Next: Wikidata game instances for this; early result: +300 refs this way [how does this compare to 1Lib1Ref in terms of participation, time-per-cite?]
Want more input on how to make this more effective!
Ideas:
1) schema.org: look at Type, to infer meaningful/high quality cites
2) models that could be trained:
A: how suspicious is a cite? A1: How important is it to find a source for this, if it will be kept? A2: where a source exists, how important is a doublecheck that the source makes the claim it’s associated with?
B: how trustworthy is a claim [cited or not]? Context includes the age of the account that posted it, the amount of context for the cite, the # of other claims for that entity, the amount of visibility/traffic the entity-page and the property get, the trustworthiness of any source, as per A above
3) connect w/ 1Lib1Ref and CitationHunt
4) look across the web for pages w/ text that makes a similar claim (entity E w/ value V for property P: look for {E,V,P} in proximity anywhere in a) SemSchol, b) CommonCrawl, c) Google)
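Sketch for idea 4: a minimal check of whether entity E, value V, and property P appear close together in a page's text. This assumes plain case-insensitive substring matching and a character window; the function names, the window size, and the sample sentence are illustrative assumptions, and fetching pages from SemSchol / CommonCrawl / Google is out of scope here.

```python
import re


def term_positions(text, term):
    """Character offsets of every case-insensitive occurrence of term in text."""
    return [m.start() for m in re.finditer(re.escape(term.lower()), text.lower())]


def claim_in_proximity(text, entity, value, prop, window=200):
    """True if entity, value, and prop all occur within `window` characters.

    Substring matching is naive: "born" also matches inside "stubborn",
    and labels or aliases for E, V, P are not expanded.
    """
    e_pos = term_positions(text, entity)
    v_pos = term_positions(text, value)
    p_pos = term_positions(text, prop)
    for e in e_pos:
        for v in v_pos:
            for p in p_pos:
                # Span covered by this particular {E, V, P} triple.
                if max(e, v, p) - min(e, v, p) <= window:
                    return True
    return False


if __name__ == "__main__":
    page = "Douglas Adams was born in Cambridge in 1952."
    print(claim_in_proximity(page, "Douglas Adams", "Cambridge", "born"))
```

A hit would only be a candidate source; per model A2 above, someone (or some model) still has to doublecheck that the page actually makes the claim.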
LoopRequest and LoopOffer
AI2 offers citation recommendation candidates, Citation Hunt to suggest (long term) Guilherme + Sebastian
Citation Hunt to support Wikidata, Guilherme + Elan + Lydia
"Special" LoopOffer: Thad is there to help (and for free!) with respect to schema.org
Wikidata to make use of 1Lib1Ref, Guilherme + Lydia
Late request: intern to help w/ WP-on-IPFS, SJ + Santhosh
Thad: can we use "Google Crowdsource" for abbreviations, other properties?
→ can highlight schema.org props that are often abbreviated (not canonical strings)
→ could help Lydia
Interested Guests to invite: Thad, Daniel