Version: 1.0.1 | Build: 2017-09-08 14:19

CMD2RDF


This page describes how CMD records are transformed into RDF and discusses some design decisions. It also describes two case studies that explore how the RDF can be enriched with links to other datasets, and the opportunities this creates.

Transforming CMD to RDF

A CMD record adheres to a Profile, which consists of a specific combination of reusable Components. Profiles and Components are specified in an XML vocabulary, which can be transformed into an XML Schema to validate the CMD record. In CMD2RDF, Profiles and Components are transformed into RDFS, and a CMD record into RDF that complies with it. The CMD RDF thus follows the CMD meta model defined in ISO 24622-1:2013.

This model is reflected in the CMDM RDFS, which defines the core RDF Classes and Properties for CMD2RDF.

Components naturally become RDF Classes. It would be equally natural to map a CMD Element to an RDF Property. However, Elements have capabilities that complicate this: Elements can have Attributes, and these cannot be attached to RDF Properties. To deal with this, the CMD2RDF mapping turns CMD Elements into RDF Classes; instances of such a Class have a hasElementValue property and can have additional properties related to Attributes.

<descriptions xmlns="http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c_1271859438177/rdf#" rdf:about="#w354aac28b1b7">
	<cmdm:contains>
		<descriptions_Description rdf:about="#w354aac28b1b7b1">
			<descriptions_hasDescriptionElementValue rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor.</descriptions_hasDescriptionElementValue>
			<cmdm:containsAttribute>
				<descriptions_Description_LanguageIdAttribute rdf:about="#w354aac28b1b7b1Aa">
					<descriptions_Description_hasLanguageIdAttributeValue rdf:datatype="http://www.w3.org/2001/XMLSchema#string">la</descriptions_Description_hasLanguageIdAttributeValue>
				</descriptions_Description_LanguageIdAttribute>
			</cmdm:containsAttribute>
		</descriptions_Description>
	</cmdm:contains>
</descriptions>

This example uses the common descriptions Component, which can contain zero or more Description Elements, each of which can have a LanguageId Attribute. All RDF Classes corresponding to this Component are defined in the RDFS with the URI http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c_1271859438177/rdf, so all CMD records that use this Component can reuse these Classes and Properties. (Note: at the moment this URL is not resolvable, as the ComponentRegistry does not support RDF yet, but the corresponding RDF Graph is available in the CMD2RDF triple store.)

The example also shows how CMD2RDF deals with the nesting of Components. In XML and in CMDI the type of relationship between a parent and a child is not made explicit, so CMD2RDF reuses the generic contains Property of the CMD Model to relate the Class instances.
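To make the Class-per-Element pattern concrete, the following Python sketch reads a shortened copy of the fragment above and pulls out each Description value together with its LanguageId. It is only an illustration under stated assumptions: the `cmdm` namespace URI is a hypothetical placeholder (the real one is not given on this page), the description text is shortened, the `rdf:datatype` attributes are omitted, and the helper names `q` and `read_descriptions` are made up for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# ASSUMPTION: placeholder only; the real CMDM namespace URI is not stated here.
CMDM_NS = "http://example.org/cmdm#"
# Namespace of the descriptions Component, taken from the fragment above.
COMP_NS = ("http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/"
           "components/clarin.eu:cr1:c_1271859438177/rdf#")

# Shortened copy of the fragment, with the prefix declarations it needs.
FRAGMENT = f"""
<descriptions xmlns="{COMP_NS}" xmlns:rdf="{RDF_NS}" xmlns:cmdm="{CMDM_NS}"
              rdf:about="#w354aac28b1b7">
  <cmdm:contains>
    <descriptions_Description rdf:about="#w354aac28b1b7b1">
      <descriptions_hasDescriptionElementValue>Lorem ipsum</descriptions_hasDescriptionElementValue>
      <cmdm:containsAttribute>
        <descriptions_Description_LanguageIdAttribute rdf:about="#w354aac28b1b7b1Aa">
          <descriptions_Description_hasLanguageIdAttributeValue>la</descriptions_Description_hasLanguageIdAttributeValue>
        </descriptions_Description_LanguageIdAttribute>
      </cmdm:containsAttribute>
    </descriptions_Description>
  </cmdm:contains>
</descriptions>
"""


def q(ns, local):
    """Build the '{namespace}local' form that ElementTree uses for names."""
    return f"{{{ns}}}{local}"


def read_descriptions(xml_text):
    """Return (value, language) pairs for every Description instance."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for desc in root.iter(q(COMP_NS, "descriptions_Description")):
        # The Element value hangs off the Class instance as a property.
        value = desc.findtext(q(COMP_NS, "descriptions_hasDescriptionElementValue"))
        # The Attribute is itself a Class instance, reached via containsAttribute.
        lang = desc.findtext(
            f"{q(CMDM_NS, 'containsAttribute')}/"
            f"{q(COMP_NS, 'descriptions_Description_LanguageIdAttribute')}/"
            f"{q(COMP_NS, 'descriptions_Description_hasLanguageIdAttributeValue')}"
        )
        results.append((value, lang))
    return results


print(read_descriptions(FRAGMENT))
```

A real consumer would more likely load the data into an RDF library or query the triple store; plain XML parsing is used here only to keep the sketch dependency-free.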

The ComponentRegistry exposes a URI for every reusable Component, but such a Component can be the root of a hierarchy of Components, here called inner Components, which ultimately lead to Elements and Attributes. Inner Components, Elements and Attributes do not have their own URI, so CMD2RDF creates a URI for them based on the URI of the reusable Component and their place in the hierarchy, e.g., descriptions_Description and descriptions_Description_LanguageIdAttribute.
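The naming scheme can be sketched in a few lines of Python. The rule below is inferred only from the names that appear on this page (descriptions_Description, descriptions_Description_LanguageIdAttribute and the two has...Value properties), so it is an assumption about the pattern and not the actual CMD2RDF implementation, which may treat name clashes or special characters differently. The function names are hypothetical.

```python
# RDFS URI of the reusable descriptions Component, as given on this page.
COMPONENT_RDF_URI = ("http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/"
                     "components/clarin.eu:cr1:c_1271859438177/rdf")


def class_name(path, kind="Element"):
    """Local Class name for a node, given the names from the reusable
    Component down to that node. Attributes get an 'Attribute' suffix."""
    name = "_".join(path)
    return name + "Attribute" if kind == "Attribute" else name


def value_property_name(path, kind="Element"):
    """Local name of the property holding the node's value
    (ASSUMED pattern: <parent path>_has<Name><Kind>Value)."""
    parent, leaf = "_".join(path[:-1]), path[-1]
    return f"{parent}_has{leaf}{kind}Value"


def class_uri(base, path, kind="Element"):
    """Full URI: the Component's RDFS URI plus a fragment identifier."""
    return f"{base}#{class_name(path, kind)}"


print(class_name(["descriptions", "Description"]))
print(class_name(["descriptions", "Description", "LanguageId"], kind="Attribute"))
print(value_property_name(["descriptions", "Description"]))
print(class_uri(COMPONENT_RDF_URI, ["descriptions", "Description"]))
```

Because every name is derived from the reusable Component's URI and the path below it, two records that use the same Component end up with the same Class and Property URIs.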

Enriching CMD RDF

Linked (Open) Data is an interesting approach because it makes it possible to link datasets through shared URIs. As CMDI is not natively based on RDF, these URIs are lacking. In the CMD2RDF project two experiments were done to enrich the CMD records with URIs that enable linkage to other datasets.

CLAVAS

CLAVAS is a SKOS-based vocabulary server, also developed by CLARIN-NL. One of the vocabularies CLAVAS contains is a list of organisations together with the variant spellings of their names, sometimes even faulty ones. In CMD2RDF organisation names, as identified by the VLO facet concept mapping, were enriched with links to the corresponding CLAVAS Concept. This makes it possible to search for records related to an organisation without having to deal with variant spellings.

<vlo:hasFacetOrganisationElementValue rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Max Planc Institute for Psycholinguistics</vlo:hasFacetOrganisationElementValue>
<vlo:hasFacetOrganisationElementEntity rdf:resource="http://openskos.meertens.knaw.nl/Organisations/8c778a30-f607-45fd-838d-1ea00cea9150"/>

Using vlo:hasFacetOrganisationElementEntity, all records from the MPI for Psycholinguistics can be found, even though some (like this one) contain a misspelling. This can be shown with two SPARQL queries: one that matches on the organisation value and one that matches on the CLAVAS concept.
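The difference between matching on the value and matching on the concept can be imitated in plain Python with a small in-memory list of triples. This is only a sketch of the idea, not the SPARQL run against the CMD2RDF triple store: the record identifiers are hypothetical, and only the misspelled value and the CLAVAS concept URI are taken from the example above.

```python
VALUE_PROP = "vlo:hasFacetOrganisationElementValue"
ENTITY_PROP = "vlo:hasFacetOrganisationElementEntity"
MPI_CONCEPT = ("http://openskos.meertens.knaw.nl/Organisations/"
               "8c778a30-f607-45fd-838d-1ea00cea9150")

# HYPOTHETICAL records: "record-1" and "record-2" are made-up identifiers.
TRIPLES = [
    ("record-1", VALUE_PROP, "Max Planck Institute for Psycholinguistics"),
    ("record-1", ENTITY_PROP, MPI_CONCEPT),
    ("record-2", VALUE_PROP, "Max Planc Institute for Psycholinguistics"),
    ("record-2", ENTITY_PROP, MPI_CONCEPT),
]


def subjects(triples, predicate, obj):
    """Subjects of all triples matching a fixed predicate and object,
    comparable to a SPARQL pattern '?record <predicate> <object>'."""
    return sorted({s for s, p, o in triples if p == predicate and o == obj})


# Matching on the literal misses the record with the misspelled name.
by_value = subjects(TRIPLES, VALUE_PROP,
                    "Max Planck Institute for Psycholinguistics")
# Matching on the CLAVAS concept finds both records.
by_concept = subjects(TRIPLES, ENTITY_PROP, MPI_CONCEPT)

print(by_value)
print(by_concept)
```

The concept URI acts as a spelling-independent key, which is why the second lookup returns both records.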

WALS

WALS is a rich typological database. It provides a wide range of linguistic features for many languages. A CMD record can be linked to a language appearing in WALS via two Linked Data hubs: Lexvo.org and DBpedia.org. CMD2RDF extended the language codes found by applying the VLO facet concept mapping with Lexvo.org and DBpedia.org language URIs.

<vlo:hasFacetISO6393ElementValue rdf:datatype="http://www.w3.org/2001/XMLSchema#string">yle</vlo:hasFacetISO6393ElementValue>
<vlo:hasFacetISO6393ElementEntity rdf:resource="http://dbpedia.org/resource/ISO_639:yle"/>
<vlo:hasFacetISO6393ElementEntity rdf:resource="http://lexvo.org/id/iso639-3/yle"/>
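Both entity URIs in this fragment follow a fixed pattern around the ISO 639-3 code, which the following Python sketch reproduces. The two URI prefixes are copied from the example; the function name and the simple three-letter validation are assumptions made for illustration, and the sketch does not check whether a code is actually registered or whether the resulting URIs resolve.

```python
DBPEDIA_PREFIX = "http://dbpedia.org/resource/ISO_639:"
LEXVO_PREFIX = "http://lexvo.org/id/iso639-3/"


def language_entity_uris(iso639_3):
    """Return the DBpedia and Lexvo URIs for an ISO 639-3 language code."""
    code = iso639_3.strip().lower()
    # ISO 639-3 codes consist of exactly three ASCII letters.
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"not an ISO 639-3 code: {iso639_3!r}")
    return [DBPEDIA_PREFIX + code, LEXVO_PREFIX + code]


for uri in language_entity_uris("yle"):
    print(uri)
```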

Now it becomes possible to query for languages with a certain linguistic property and find CLARIN resources, e.g., audio recordings, for them.

A SPARQL query over these links can, for example, retrieve CLARIN resources for languages for which WALS has information on The Velar Nasal (WALS feature 9A).