Pre-processed, cleaned data for the DBpedia ontology, a hierarchy of categories for Wikipedia articles.
Clean files:
hierarchy.csv
- Contains the parent-child relationships and instance counts
labels.csv
- Maps IDs to human-readable labels (not all IDs have labels)
The queries below retrieve the DBpedia ontology class hierarchy and the instance count of each class. The goal is to generate a data visualization from this hierarchy.
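
As a rough illustration of how parent-child rows with counts could be turned into a nested structure for a visualization, here is a minimal Python sketch. The column names (`parent`, `child`, `count`) and the sample rows are assumptions made for this example; the actual layout of hierarchy.csv may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the shape hierarchy.csv is assumed to have;
# the real column names and IDs may differ.
SAMPLE = """parent,child,count
Thing,Agent,300
Thing,Place,200
Agent,Person,250
"""


def build_tree(rows):
    """Turn (parent, child, count) rows into a list of nested dicts, one per root.

    Assumes the hierarchy is acyclic; a cycle would recurse forever.
    """
    children = defaultdict(list)
    counts = {}
    child_ids = set()
    for row in rows:
        children[row["parent"]].append(row["child"])
        counts[row["child"]] = int(row["count"])
        child_ids.add(row["child"])
    # Roots are classes that never appear as a child.
    roots = [p for p in children if p not in child_ids]

    def node(name):
        return {
            "name": name,
            "count": counts.get(name),  # None for roots, which have no row of their own
            "children": [node(c) for c in children.get(name, [])],
        }

    return [node(r) for r in roots]


tree = build_tree(csv.DictReader(io.StringIO(SAMPLE)))
```

A nested name/children structure like this is a common input shape for treemap or sunburst charts; to use the real file, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle.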
The following queries were executed here: https://dbpedia.org/sparql
SPARQL query 1:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?subclass WHERE {
  ?subclass rdfs:subClassOf ?class.
}
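
For readers who would rather run a query from a script than from the web form, the following Python sketch shows one possible way to submit query 1 using the standard SPARQL protocol (an HTTP GET with a `query` parameter). The helper names `build_request` and `run_query` are made up for this example, and asking for CSV through the `Accept` header assumes the endpoint honors content negotiation.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?subclass WHERE {
  ?subclass rdfs:subClassOf ?class.
}
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL-protocol GET request that asks for CSV results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "text/csv"})


def run_query(query):
    """Send the query and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(query), timeout=60) as resp:
        return resp.read().decode("utf-8")


request = build_request(QUERY)
# print(run_query(QUERY))  # uncomment to actually contact the endpoint
```

Public endpoints often cap the number of rows returned per request, so larger result sets may need paging with `LIMIT` and `OFFSET`.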
SPARQL query 2, to include counts:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?subclass (COUNT(?instance) AS ?instanceCount) WHERE {
  ?subclass rdfs:subClassOf ?class.
  OPTIONAL {
    ?instance rdf:type ?subclass.
  }
}
GROUP BY ?class ?subclass
SPARQL query 3, to include labels:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?classLabel ?subclass ?subclassLabel (COUNT(?instance) AS ?instanceCount) WHERE {
  ?subclass rdfs:subClassOf ?class.
  OPTIONAL {
    ?instance rdf:type ?subclass.
  }
  OPTIONAL {
    ?class rdfs:label ?classLabel.
    FILTER (lang(?classLabel) = "" || lang(?classLabel) = "en")
  }
  OPTIONAL {
    ?subclass rdfs:label ?subclassLabel.
    FILTER (lang(?subclassLabel) = "" || lang(?subclassLabel) = "en")
  }
}
GROUP BY ?class ?classLabel ?subclass ?subclassLabel
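SPARQL query 4, transitive counts (instances of descendant classes are included), filtered to subclasses with at least 100 instances: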
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?classLabel ?subclass ?subclassLabel (COUNT(?instance) AS ?instanceCount) WHERE {
  ?subclass rdfs:subClassOf ?class.
  OPTIONAL {
    ?instance rdf:type/rdfs:subClassOf* ?subclass.
  }
  OPTIONAL {
    ?class rdfs:label ?classLabel.
    FILTER (lang(?classLabel) = "" || lang(?classLabel) = "en")
  }
  OPTIONAL {
    ?subclass rdfs:label ?subclassLabel.
    FILTER (lang(?subclassLabel) = "" || lang(?subclassLabel) = "en")
  }
}
GROUP BY ?class ?classLabel ?subclass ?subclassLabel
HAVING (COUNT(?instance) >= 100)
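
To make the effect of the property path `rdf:type/rdfs:subClassOf*` and the `HAVING` filter more concrete, here is a small in-memory Python sketch of the same idea on invented toy data. It is only an illustration: it assumes each class has a single parent, uses a threshold of 2 instead of 100, and counts each instance once per class, whereas the query above uses `COUNT(?instance)` without `DISTINCT` and may therefore count an instance more than once if it carries several types under the same class.

```python
from collections import defaultdict

# Toy data, invented for illustration only.
SUBCLASS_OF = {"Person": "Agent", "Athlete": "Person", "Agent": "Thing"}
INSTANCE_TYPES = {
    "alice": {"Athlete"},
    "bob": {"Person"},
    "acme": {"Agent"},
}


def ancestors_or_self(cls):
    """Yield cls and every class reachable by following subClassOf upward."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def transitive_counts(instance_types):
    """Count distinct instances per class, including instances of descendant classes."""
    members = defaultdict(set)
    for instance, types in instance_types.items():
        for t in types:
            for cls in ancestors_or_self(t):
                members[cls].add(instance)
    return {cls: len(insts) for cls, insts in members.items()}


def filter_min(counts, minimum):
    """Mirror the HAVING clause: keep classes with at least `minimum` instances."""
    return {cls: n for cls, n in counts.items() if n >= minimum}


counts = transitive_counts(INSTANCE_TYPES)
kept = filter_min(counts, 2)
```

In this toy example `Athlete` has one instance and is dropped by the filter, while `Person` keeps its own instance plus the one inherited from `Athlete`.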