Markus Stocker

Following the short introductions to RDF, RDFS and OWL, and to metadata about sensors, this post briefly presents how the Semantic Sensor Network (SSN) ontology can be used to represent sensor data, in addition to metadata about sensors. I will stick to the domain of net ecosystem exchange research and the monitoring of CO2 and H2O fluxes using a LI-7500.

Data

The raw sensor data is simple. I will use the following small sample, shown here in tabular form as I received it from a colleague, in Excel.

Year Day Hour Minute Seconds mSeconds CO2 H2O
2012 84 2 0 0 0 389.6 7.1379
2012 84 2 0 0 100 389.9 7.1305
2012 84 2 0 0 200 385.2 7.1866

We have three measurement values for CO2 and three for H2O. As you can see, the LI-7500 samples at 10 Hz, resulting in 10 measurement values a second, or one every 100 ms. The table does not reveal the units but they are μmol/mol (parts-per-million) and mmol/mol (parts-per-thousand) for CO2 and H2O, respectively.

The second column should perhaps more accurately be labelled day of year. For what comes next, it is more useful to convert the day of year to the day and month of the year. In this case it is March 24, 2012 (2012 is a leap year, so day 84 falls on March 24 rather than March 25). Thus, we have here measurement data for CO2 and H2O concentration on March 24, 2012, at 2 am plus 0, 100, and 200 ms.
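
The conversion can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name row_to_timestamp is my own and not part of any tool mentioned in this post.

```python
from datetime import datetime, timedelta


def row_to_timestamp(year, day_of_year, hour, minute, second, millisecond):
    """Convert the time columns of one table row into a datetime."""
    # Day 1 is January 1, so we add (day_of_year - 1) days to the start of the year.
    start_of_year = datetime(year, 1, 1)
    return start_of_year + timedelta(
        days=day_of_year - 1,
        hours=hour,
        minutes=minute,
        seconds=second,
        milliseconds=millisecond,
    )


# The first and third rows of the sample table.
first = row_to_timestamp(2012, 84, 2, 0, 0, 0)
third = row_to_timestamp(2012, 84, 2, 0, 0, 200)
print(first.isoformat(timespec="milliseconds"))  # 2012-03-24T02:00:00.000
print(third.isoformat(timespec="milliseconds"))  # 2012-03-24T02:00:00.200
```

Letting the date library do the arithmetic takes care of the leap year, which is easy to get wrong by hand.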

Observations

I will now describe how the measurement data shown in the table above can be represented as sensor observations. It is the SSN ontology that reveals the specifics, i.e. what an observation is and how it is structured.

First, recall from the last post that we already used the SSN ontology to model our LI-7500 at the Linnansuo cutover peatland site in Finland. Here are the relevant statements:

InfraredGasAnalyzer rdfs:subClassOf ssn:SensingDevice
LI-COR_LI-7500 rdfs:subClassOf InfraredGasAnalyzer
theLinnansuoLI-7500 rdf:type LI-COR_LI-7500
theLinnansuoLI-7500 ssn:observes moleFractionCO2
theLinnansuoLI-7500 ssn:observes moleFractionH2O
moleFractionCO2 rdf:type ssn:Property
moleFractionH2O rdf:type ssn:Property

In line order, we first state that infrared gas analyzers are sensing devices and the LI-COR LI-7500 is a particular type of infrared gas analyzer. Then we model the particular LI-7500 at Linnansuo as an instance of LI-7500 infrared gas analyzer and state that the sensing device observes two properties, namely the mole fraction of CO2 and the mole fraction of H2O.

Now we add an additional few statements, as follows:

ambientAir rdf:type ssn:FeatureOfInterest
ambientAir ssn:hasProperty moleFractionCO2
ambientAir ssn:hasProperty moleFractionH2O

We add ambient air as the monitored feature (of interest), i.e. the environmental (physical) phenomenon, and state that mole fraction of CO2 and H2O are properties of ambient air.

An observation is the result of estimating a value of a property of a feature, using a sensing method implemented by a sensor.

We can now translate the values in the table above into six observations. As an example, I will work out the first value, 389.6, for the CO2 concentration on March 24, 2012, at 2 am sharp.

First we create an individual instance of the class ssn:Observation.

o1 rdf:type ssn:Observation

Then we state that this observation was observed by our LI-7500 at Linnansuo and is for the property of CO2 mole fraction of the ambient air feature.

o1 ssn:observedBy theLinnansuoLI-7500
o1 ssn:observedProperty moleFractionCO2
o1 ssn:featureOfInterest ambientAir

Note that these statements are more expressive than the row in the table above as the observation relates to the sensing device and feature, both uniquely identified. Granted, the property is encoded in the label CO2 in the table header. However, the moleFractionCO2 URI is superior: it is an identifier, while “CO2” is a label, and it is stated to be an (SSN) property.

Next we represent the actual value. Measurement values are sensor outputs and, specifically, observation values. Observation values are regions in a dimensional space and relate to data values, e.g. numbers.

o1 ssn:observationResult so1
so1 rdf:type ssn:SensorOutput
so1 ssn:hasValue ov1
ov1 rdf:type ssn:ObservationValue
ov1 dul:hasRegionDataValue "389.6"^^xsd:double

This may look somewhat overwhelming. Read it as follows. Our individual observation o1 relates to an observation result so1 which is a sensor output (remember o1, so1, ov1 etc. are individuals, here abbreviated but generally URIs). Sensor outputs relate to values, in this case the particular value ov1 which is an observation value. Observation values finally relate to the actual number, 389.6, which is here stated to be of datatype double.

Finally, we have time. For simplicity, I am not going to use an ontology for time here, as it slightly complicates the example. For an actual system, I suggest you do use an appropriate ontology. As you may guess, it is again the observation that relates to time, as follows:

o1 ssn:observationResultTime rt1
rt1 rdf:type dul:TimeInterval
rt1 dul:hasRegionDataValue "2012-03-24T02:00:00.000"^^xsd:dateTime

That’s it.

One could go further and, for instance, add the observation value unit, and there exist ontologies for units as well. It should be straightforward now to repeat the exercise for the other measurement values in the table: just create new individuals o2, so2, … and modify the observed property, observation value, and observation result time accordingly.
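
Repeating the exercise by hand gets tedious quickly, so here is a minimal Python sketch that generates the statements for all six observations from the three table rows. It is not how the data behind this post was produced. The function names are hypothetical, the statements are plain (subject, predicate, object) tuples of abbreviated names rather than full URIs, and numbering the observations row by row (o1 for CO2, o2 for H2O of the first row, and so on) is an arbitrary choice of mine.

```python
from datetime import datetime, timedelta

# Sample rows: (year, day of year, hour, minute, second, millisecond, CO2, H2O)
ROWS = [
    (2012, 84, 2, 0, 0, 0, 389.6, 7.1379),
    (2012, 84, 2, 0, 0, 100, 389.9, 7.1305),
    (2012, 84, 2, 0, 0, 200, 385.2, 7.1866),
]


def observation_triples(n, prop, value, timestamp):
    """Return the twelve statements describing observation number n."""
    o, so, ov, rt = f"o{n}", f"so{n}", f"ov{n}", f"rt{n}"
    return [
        (o, "rdf:type", "ssn:Observation"),
        (o, "ssn:observedBy", "theLinnansuoLI-7500"),
        (o, "ssn:observedProperty", prop),
        (o, "ssn:featureOfInterest", "ambientAir"),
        (o, "ssn:observationResult", so),
        (so, "rdf:type", "ssn:SensorOutput"),
        (so, "ssn:hasValue", ov),
        (ov, "rdf:type", "ssn:ObservationValue"),
        (ov, "dul:hasRegionDataValue", f'"{value}"^^xsd:double'),
        (o, "ssn:observationResultTime", rt),
        (rt, "rdf:type", "dul:TimeInterval"),
        (rt, "dul:hasRegionDataValue", f'"{timestamp}"^^xsd:dateTime'),
    ]


def all_triples(rows):
    """Turn every table row into two observations (CO2 and H2O)."""
    triples = []
    n = 0
    for year, doy, hour, minute, second, ms, co2, h2o in rows:
        moment = datetime(year, 1, 1) + timedelta(
            days=doy - 1, hours=hour, minutes=minute,
            seconds=second, milliseconds=ms,
        )
        stamp = moment.isoformat(timespec="milliseconds")
        for prop, value in (("moleFractionCO2", co2), ("moleFractionH2O", h2o)):
            n += 1
            triples.extend(observation_triples(n, prop, value, stamp))
    return triples


# Print the statements of the first observation, o1.
for s, p, o in all_triples(ROWS)[:12]:
    print(s, p, o)
```

Each observation takes twelve statements, so the three table rows expand to 72 statements in total, which gives a feeling for the increase in data volume discussed in the notes below.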

Notes

There are clear advantages and some disadvantages in representing sensor data using the SSN ontology and RDF as we just did in this example. Starting with some obvious advantages, computers can easily process RDF while the table is a bit more difficult. More importantly, RDF and SSN enable semantically rich descriptions of the data. Individuals and values are typed and they relate to each other via properties described by the SSN ontology. Observations are “self-describing.” They are a unit and relate the relevant data and metadata so that their interpretation is unambiguous (or at least less ambiguous than the typical comma-separated value text file).

That said, the tabular representation is far more compact than the RDF version. Consequently there is an obvious increase in data volume and storage requirements. This is however partially due to the RDF version being a richer description. Also, remember that RDF is not meant for human consumption: it is for machines to process.

The suggestion here is not to present sensor observations in RDF to humans; people continue to consume tables and figures. Rather, the suggestion is to move beyond comma-separated value text and Excel files for (sensor) data “managed” by computer systems, file formats that continue to be in regular use in environmental science.

The (Open Geospatial Consortium) standards of the Sensor Web Enablement, in particular Observations and Measurements, provide syntactical interoperability among systems, enabled by XML. The technologies discussed here additionally support the semantic interoperability of data and metadata.

Exercise

Let’s run some SPARQL queries against data for the example sensor observations in the table above. First, point your browser at this SPARQL query engine. Then select Text for “Output”. If you now copy and paste the following SPARQL query and hit the “Get Results” button you will see a table that pretty much resembles the one above.

prefix ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix dul: <http://www.loa-cnr.it/ontologies/DUL.owl#>
prefix : <http://envi.uef.fi/saicos#>

select ?sensor ?property ?feature ?value ?time 
from <http://markusstocker.com/assets/posts/sensor-observations/observations.rdf>
where {
  ?s rdf:type ssn:Observation .
  ?s ssn:observedBy ?sensor .
  ?s ssn:observedProperty ?property .
  ?s ssn:featureOfInterest ?feature .
  ?s ssn:observationResult ?so .
  ?so ssn:hasValue ?ov .
  ?ov dul:hasRegionDataValue ?value .
  ?s ssn:observationResultTime ?rt .
  ?rt dul:hasRegionDataValue ?time .
}
order by desc(?time)

Some things in the results are redundant, such as the sensor or the feature. The following query is cleaner (note also the different query syntax, result order, and selected variables).

prefix ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix dul: <http://www.loa-cnr.it/ontologies/DUL.owl#>
prefix : <http://envi.uef.fi/saicos#>

select ?property ?time ?value 
from <http://markusstocker.com/assets/posts/sensor-observations/observations.rdf>
where
{
  [ rdf:type ssn:Observation ;
    ssn:observedProperty ?property ;
    ssn:observationResult [ 
      ssn:hasValue [ dul:hasRegionDataValue ?value ]
    ] ;
    ssn:observationResultTime [ dul:hasRegionDataValue ?time ]
  ]
}
order by asc(?time)
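
If the nested square brackets look cryptic, it may help to see what the query engine conceptually does with them: each bracket stands for an unnamed intermediate node, and the engine follows the properties from the observation to the value and to the time. The following Python sketch mimics this over a handful of statements for o1. It is an illustration only, with hypothetical function names and abbreviated names instead of URIs; a real system would leave this work to a SPARQL engine.

```python
# A few statements for o1, written as (subject, predicate, object) tuples.
TRIPLES = [
    ("o1", "rdf:type", "ssn:Observation"),
    ("o1", "ssn:observedProperty", "moleFractionCO2"),
    ("o1", "ssn:observationResult", "so1"),
    ("so1", "ssn:hasValue", "ov1"),
    ("ov1", "dul:hasRegionDataValue", 389.6),
    ("o1", "ssn:observationResultTime", "rt1"),
    ("rt1", "dul:hasRegionDataValue", "2012-03-24T02:00:00.000"),
]


def objects(triples, subject, predicate):
    """All objects of the statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def select_property_time_value(triples):
    """Mimic the query above: one (property, time, value) row per match."""
    rows = []
    observations = [
        s for s, p, o in triples
        if p == "rdf:type" and o == "ssn:Observation"
    ]
    for obs in observations:
        for prop in objects(triples, obs, "ssn:observedProperty"):
            # observation -> sensor output -> observation value -> number
            for so in objects(triples, obs, "ssn:observationResult"):
                for ov in objects(triples, so, "ssn:hasValue"):
                    for value in objects(triples, ov, "dul:hasRegionDataValue"):
                        # observation -> result time -> timestamp
                        for rt in objects(triples, obs, "ssn:observationResultTime"):
                            for time in objects(triples, rt, "dul:hasRegionDataValue"):
                                rows.append((prop, time, value))
    # Corresponds to "order by asc(?time)".
    return sorted(rows, key=lambda row: row[1])


for row in select_property_time_value(TRIPLES):
    print(row)
```

The intermediate individuals so1, ov1, and rt1 never show up in the result, just as the bracketed nodes in the query have no variable names.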

In the last post, we described the sensor with metadata, such as the platform on which it is installed and the location of the platform as well as the temperature operating range of the device. You can execute the following SPARQL query to retrieve these attributes of the LI-7500 at Linnansuo.

prefix ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix dul: <http://www.loa-cnr.it/ontologies/DUL.owl#>
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix : <http://envi.uef.fi/saicos#>

select ?platformLocation ?operatingRange
from <http://markusstocker.com/assets/posts/sensor-observations/observations.rdf>
where {
  :theLinnansuoLI-7500 ssn:onPlatform [
    dul:hasLocation [
      geo:hasGeometry [ geo:asWKT ?platformLocation ]
    ]
  ] .
  :theLinnansuoLI-7500 ssn:hasOperatingRange [
    dul:hasRegion [ dul:hasRegionDataValue ?operatingRange ]
  ]
}

Sensor data and metadata about sensors, all described and accessed with the same set of technologies using a generic vocabulary, the SSN ontology, which is pretty much a de facto standard within the community. It is sufficient to know how the SSN ontology represents sensors and observations to be able to interact with the data. No need to get familiar with the data model of a particular database.

Pretty cool, no?

Strictly speaking, it is sufficient to know basic SPARQL, RDF, and OWL to discover how to interact with RDF data. Try the following query to retrieve the known classes:

prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select ?class
from <http://markusstocker.com/assets/posts/sensor-observations/observations.rdf>
where {
  ?class rdf:type owl:Class
}

Now that you know what classes exist, you may query for instances of LI-COR_LI-7500:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix : <http://envi.uef.fi/saicos#>

select ?sensor
from <http://markusstocker.com/assets/posts/sensor-observations/observations.rdf>
where {
  ?sensor rdf:type :LI-COR_LI-7500
}

This will give you the LI-7500 at Linnansuo, and with it you can discover more, e.g. observations. As a side note, if your query engine implements basic RDFS reasoning, you could substitute the class :LI-COR_LI-7500 with :InfraredGasAnalyzer and you would also get the LI-7500 at Linnansuo. The query engine here does not support RDFS reasoning, so with :InfraredGasAnalyzer the result is empty.
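
To make the difference concrete, here is a small Python sketch of the one RDFS inference involved: an instance of a class is also an instance of all its superclasses. It only covers rdfs:subClassOf, and the dictionaries and function names are my own illustration, not how any particular reasoner is implemented.

```python
# rdfs:subClassOf statements from the sensor description above.
SUBCLASS_OF = {
    "InfraredGasAnalyzer": {"ssn:SensingDevice"},
    "LI-COR_LI-7500": {"InfraredGasAnalyzer"},
}

# Asserted rdf:type statements.
TYPES = {
    "theLinnansuoLI-7500": {"LI-COR_LI-7500"},
}


def superclasses(cls):
    """All classes reachable from cls via rdfs:subClassOf, including cls."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(cls, reasoning=True):
    """Individuals of type cls, with or without subclass inference."""
    result = []
    for individual, asserted in TYPES.items():
        inferred = set()
        for asserted_class in asserted:
            if reasoning:
                inferred |= superclasses(asserted_class)
            else:
                inferred.add(asserted_class)
        if cls in inferred:
            result.append(individual)
    return result


print(instances_of("InfraredGasAnalyzer", reasoning=False))  # []
print(instances_of("InfraredGasAnalyzer"))  # ['theLinnansuoLI-7500']
```

Without reasoning only the asserted type matches, which is the behaviour of the query engine used here.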

This post is part of a series. Previous posts discussed RDF, RDFS and OWL, and the extraction of metadata about sensing devices from various documents. The next one is about the representation of datasets using the QB vocabulary.