Remember life before Wikipedia?
"An encyclopedia that anyone can edit"
knowledge base
it's a wiki
community model
link Wikipedia language editions
reuse statements across Wikipedia projects
provide complex query capabilities
It's awesome, especially if you're into data!
Factual claims are stored as statements
subject -- predicate -- object
thing -- relationship -- thing
item -- property -- value
Similar to RDF (and mapped to its model)
Independent of language (identifiers vs. names)
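A minimal sketch of the item -- property -- value idea, with identifiers kept apart from names. The IDs Q42 (Douglas Adams), P31 (instance of) and Q5 (human) are real Wikidata identifiers. The `Statement` type, the hand-typed `LABELS` table and the `render` helper are assumptions made for illustration only and are not part of Wikidata's software.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One claim, stored with language-independent identifiers."""
    item: str   # subject, e.g. "Q42"
    prop: str   # predicate, e.g. "P31"
    value: str  # object, here another item ID


# Illustrative label table, typed by hand: names live apart from identifiers.
LABELS = {
    "Q42": {"en": "Douglas Adams", "de": "Douglas Adams"},
    "P31": {"en": "instance of", "de": "ist ein(e)"},
    "Q5": {"en": "human", "de": "Mensch"},
}


def render(stmt: Statement, lang: str) -> str:
    """Show the same statement with the labels of one language."""
    return " -- ".join(LABELS[entity_id][lang] for entity_id in stmt)


stmt = Statement("Q42", "P31", "Q5")
print(render(stmt, "en"))
print(render(stmt, "de"))
```

The stored statement never changes; only the labels used to display it depend on the language.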
entities, labels, descriptions, statements
types of entities
- items (have wiki-links)
- properties (have data types and constraint statements)
- lexemes
Find the item for your home town, school...
Statement details
- properties
- qualifiers
- references
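The parts of a statement listed above can be sketched as a small data structure. This is a simplified assumption, not Wikidata's actual JSON layout: the class names and the `is_referenced` helper are hypothetical, the population figure and the example.org URL are made up, and only the identifiers Q64 (Berlin), P1082 (population), P585 (point in time) and P854 (reference URL) are real.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyValue:
    """A property paired with a value, e.g. P1082 -> "3500000"."""
    prop: str
    value: str


@dataclass
class Statement:
    item: str
    main: PropertyValue
    # Qualifiers refine the claim (e.g. when it was valid).
    qualifiers: list = field(default_factory=list)
    # References say where the claim comes from.
    references: list = field(default_factory=list)


def is_referenced(stmt: Statement) -> bool:
    """True if at least one reference backs the statement."""
    return bool(stmt.references)


population = Statement(
    item="Q64",                                   # Berlin
    main=PropertyValue("P1082", "3500000"),       # population (made-up figure)
    qualifiers=[PropertyValue("P585", "2015")],   # point in time
    references=[PropertyValue("P854", "https://example.org/statistics")],
)
unsourced = Statement(item="Q64", main=PropertyValue("P1082", "3600000"))

print(is_referenced(population), is_referenced(unsourced))
```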
A hub in the linked open data web
Wikidata properties for identifiers
One possible overview:
In groups of 2-3:
add/extend Wikidata items on some of your professors
use existing professor items as templates
collect questions for afterwards
more than 55 million media files
not as shiny as Instagram, YouTube, Flickr...
but Open Content, no commercial interest!
community model
quite "unstructured"
Migration of Wikimedia Commons to Wikibase (2017-2019)
every media file is an entity
- multilingual media file captions
- statements about media files
properties reused from Wikidata, e.g. depicts (P180)
work in progress (e.g. no SPARQL yet)
More information at https://commons.wikimedia.org/wiki/Commons:Structured_data
Introduction of three new types of entities in 2018:
- Lexemes (L)
- Forms (F)
- Senses (S)
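Entity types can be told apart by the shape of their identifiers: items start with Q, properties with P, lexemes with L, and forms and senses append -F or -S plus a number to the ID of their lexeme. A rough sketch; the `entity_kind` helper and its pattern table are hypothetical, not an official API.

```python
import re

# Identifier shapes per entity type (illustrative, not exhaustive).
ID_PATTERNS = {
    "item": r"Q\d+",
    "property": r"P\d+",
    "lexeme": r"L\d+",
    "form": r"L\d+-F\d+",
    "sense": r"L\d+-S\d+",
}


def entity_kind(entity_id: str):
    """Return the entity type for an ID, or None if the shape is unknown."""
    for kind, pattern in ID_PATTERNS.items():
        if re.fullmatch(pattern, entity_id):
            return kind
    return None


for example in ["Q42", "P180", "L1", "L1-F2", "L1-S1"]:
    print(example, entity_kind(example))
```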
Sample application: http://auregann.fr/derdiedas/
entities, labels, descriptions, statements
- items (have wiki-links)
- properties (have data types and constraint statements)
- lexemes
- property
- qualifiers
- references
Wikidata query service (SPARQL)
Several tools and programming libraries
(big) data dumps
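As a hedged example of the first access path, the sketch below builds a request for the Wikidata query service and, when called, returns the labels of a few items that are an instance of (P31) house cat (Q146). The helper names and the User-Agent string are assumptions; only the Python standard library is used, and `run_query` needs network access.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_url(query: str) -> str:
    """Encode the SPARQL query as a GET request asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params


def run_query(query: str) -> list:
    """Send the query and return the English labels of the results."""
    request = urllib.request.Request(
        build_url(query),
        # Hypothetical identification string; replace with your own.
        headers={"User-Agent": "wikidata-slides-example/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [row["itemLabel"]["value"] for row in data["results"]["bindings"]]
```

Calling `run_query(QUERY)` should return up to five labels; which ones depends on the current state of Wikidata.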
Coverage is very inconsistent
Data modeling is unstable
Qualifiers and references help to improve quality
- but not used as much as they could be
- harder to query
Working with Wikidata is like doing data science:
cleaning data & fighting with software
People are not paid
Nobody has a full overview
Tools (plenty!) come and go
Be nice and allow misunderstandings