The previous sections presented ways to describe data and their structure via RDF and the ontology languages built on it (i.e. RDFS and OWL). This part focuses on the mechanisms available for querying these data, as well as the ontologies and knowledge bases described in these languages.
The SPARQL Protocol and RDF Query Language (SPARQL) is one such mechanism and, in fact, a complete, standardized solution that can be used to perform the following:
- Query OWL and RDFS as plain RDF.
- Query RDF datasets and their RDF graphs.
- Perform complex joins across disparate databases.
- Explore data by querying known and unknown relationships.
As a query language, SPARQL provides the means to identify subsets of triples and to extract information in the form of URIs, literals, blank nodes and subgraphs. Although its syntax resembles that of SQL, it differs in that it matches patterns in graphs rather than querying tables in relational databases.
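The idea of matching a pattern against a graph can be illustrated with a minimal Python sketch. It is not how a real SPARQL engine is implemented; the graph, the `ex:` identifiers and the helper `match_pattern` are all invented for illustration. A pattern is a triple in which any term starting with `?` is a variable, and each matching triple yields one set of variable bindings.

```python
# Toy in-memory RDF graph: each triple is (subject, predicate, object).
# All identifiers below are invented for illustration.
GRAPH = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:name", '"Bob"'),
}


def is_variable(term):
    """Terms starting with '?' are treated as variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, graph):
    """Return one binding dict per triple in the graph that matches the pattern."""
    solutions = []
    for triple in graph:
        binding = {}
        for pattern_term, graph_term in zip(pattern, triple):
            if is_variable(pattern_term):
                # A variable repeated in the pattern must bind to a single value.
                if binding.setdefault(pattern_term, graph_term) != graph_term:
                    break
            elif pattern_term != graph_term:
                break  # a constant term did not match
        else:
            solutions.append(binding)
    return solutions


# Roughly comparable to: SELECT ?who ?whom WHERE { ?who ex:knows ?whom }
rows = match_pattern(("?who", "ex:knows", "?whom"), GRAPH)
for row in sorted(rows, key=lambda r: r["?who"]):
    print(row["?who"], "knows", row["?whom"])
```

The constant `ex:knows` restricts which triples qualify, while the two variables pick up whatever subject and object each qualifying triple carries.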
It was standardized by the W3C in 2008 and has since taken its next evolutionary step in the form of SPARQL 1.1. This version extends the features of its predecessor and resolves a number of its open issues. Among other things, it adds an update language, clarifies the relationship between queries, provides a vocabulary for describing SPARQL endpoints and offers several new features. In one version or the other, SPARQL is supported by almost all known RDF triple stores, most of which also offer associated endpoints to which queries can be sent over HTTP, evaluated, and answered with their results.
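As a rough sketch of how a query might be sent to such an endpoint, the Python snippet below builds an HTTP GET request in the style of the SPARQL Protocol, passing the query text in the `query` parameter and asking for JSON results through the `Accept` header. The endpoint URL and the helper `build_sparql_request` are assumptions made for illustration; the request is only constructed, not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/sparql"

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol query request via HTTP GET."""
    # The query text travels URL-encoded in the 'query' parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Content negotiation: ask for the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Against a real endpoint, the request could then be passed to `urllib.request.urlopen`; for a SELECT query returned as JSON, the variable names would be found under `head.vars` and the result rows under `results.bindings`.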
In general, the use of SPARQL:
- Fulfills the third Linked Data principle and provides useful RDF information.
- Offers access to datasets and their data.
- Enables manual exploration of datasets, so that potential consumers can look for URIs identifying resources of interest.
- Provides the means to identify subsets of triples and match patterns in RDF graphs.
As a simple example, consider the Hellenic Police dataset (http://greek-lod.math.auth.gr/police/). Such patterns become evident in a scenario that attempts to identify the following:
- The first ten crime types of any status, along with their total number of crimes,
- occurring in 05/2010 and under the supervision of the ATTICA Police Headquarters.
A simple SPARQL query that retrieves this information, along with the associated results, is shown in the figure below. In it, a SELECT statement first defines the result set to be returned (i.e. distinct values of the variables NAME, TOTAL_CRIMES, DATE, TypeofCrime and STATUS). The WHERE clause then defines the graph patterns (i.e. subject-predicate-object triples) to be matched in the RDF graph.
Graphically, such a pattern is represented by the “PATTERN TO MATCH” in the figure below. The same figure also depicts part of the resulting Hellenic Police RDF graph and, in green, two of the triple subsets that matched the requested pattern. These are also visible in the first two records of the SPARQL results. Finally, a rejected triple subset is highlighted in red.
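The interplay between the WHERE patterns, the matched subsets and a rejected subset can be mimicked with a small Python sketch. It assumes a toy graph whose identifiers and figures are entirely invented; it does not reproduce the vocabulary, the query or the data of the Hellenic Police dataset. Several triple patterns share the variable `?rec`, so a record only qualifies if all of its triples fit; the hypothetical `select` helper then projects the chosen variables, removes duplicates and applies a limit.

```python
# Toy graph loosely modelled on crime records; every identifier and value
# is invented and does not reproduce the Hellenic Police vocabulary or data.
GRAPH = {
    ("ex:rec1", "ex:headquarters", "ex:ATTICA"),
    ("ex:rec1", "ex:date", "2010-05"),
    ("ex:rec1", "ex:crimeType", "ex:Theft"),
    ("ex:rec1", "ex:totalCrimes", 120),
    ("ex:rec2", "ex:headquarters", "ex:ATTICA"),
    ("ex:rec2", "ex:date", "2010-05"),
    ("ex:rec2", "ex:crimeType", "ex:Fraud"),
    ("ex:rec2", "ex:totalCrimes", 45),
    # Same headquarters but a different month: rejected by the pattern.
    ("ex:rec3", "ex:headquarters", "ex:ATTICA"),
    ("ex:rec3", "ex:date", "2010-06"),
    ("ex:rec3", "ex:crimeType", "ex:Theft"),
    ("ex:rec3", "ex:totalCrimes", 98),
}

# WHERE clause: triple patterns joined through the shared variable ?rec.
WHERE = [
    ("?rec", "ex:headquarters", "ex:ATTICA"),
    ("?rec", "ex:date", "2010-05"),
    ("?rec", "ex:crimeType", "?type"),
    ("?rec", "ex:totalCrimes", "?total"),
]


def is_variable(term):
    return isinstance(term, str) and term.startswith("?")


def extend(binding, pattern, triple):
    """Extend a binding so the pattern matches the triple; None on failure."""
    new = dict(binding)
    for pattern_term, graph_term in zip(pattern, triple):
        if is_variable(pattern_term):
            # An already bound variable must agree with the triple's term.
            if new.setdefault(pattern_term, graph_term) != graph_term:
                return None
        elif pattern_term != graph_term:
            return None
    return new


def evaluate(patterns, graph):
    """Join the patterns one by one, keeping only consistent bindings."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


def select(solutions, variables, limit):
    """Project onto the chosen variables, drop duplicates, apply a limit."""
    # Sorting only makes the output deterministic; it is not part of SELECT.
    rows = sorted({tuple(s[v] for v in variables) for s in solutions})
    return rows[:limit]


results = select(evaluate(WHERE, GRAPH), ["?type", "?total"], limit=10)
for row in results:
    print(row)
```

In this toy data, `ex:rec1` and `ex:rec2` play the role of the matched subsets, while `ex:rec3` is rejected because its date does not fit the pattern.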