Markus Stocker

Between information technology and environmental science with a flair for economics, the clarinet, and the world of soups and salads.

Environmental sensor networks generate data. Large networks can consist of dozens, in some cases hundreds, of sensors and generate a considerable amount of data. Small networks with sensors sampling at several kHz can also generate a lot of data. Data is accessed and encoded in various ways. Some sensors implement a tiny web server and data is retrieved via HTTP. The data of other sensors is copied to a USB memory stick and processed from there. Data may be streamed on demand, pushed to a queue, or persisted to a store. Data may be binary encoded or plain text, formatted in CSV or XML or otherwise. The store may be a relational database system, a non-standard database, or a bunch of files. Clearly, sensor data is heterogeneous.

The purpose of environmental sensor networks is often, if not mostly, to monitor one or more properties of one or more environmental phenomena, over time and space. For instance, a thermometer monitors the temperature of air. Temperature is the property and air is the phenomenon. A beta attenuation monitor monitors the concentration of particulate matter. Concentration is the property and particulate matter is the phenomenon. A differential mobility particle sizer monitors the particle number size distribution of an aerosol. You get the point.
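To make the property/phenomenon distinction concrete, here is a minimal Python sketch of how a single reading could be represented. The `Observation` class, its field names, the sensor identifiers, and all values are hypothetical and made up for illustration; this is not how Wavellite represents data.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """One sensor reading: a value for a property of a phenomenon."""
    sensor: str       # hypothetical sensor identifier
    prop: str         # the monitored property, e.g. "temperature"
    phenomenon: str   # the phenomenon the property belongs to, e.g. "air"
    value: float      # illustrative value, not real data
    unit: str
    time: datetime


def describe(obs: Observation) -> str:
    """Render an observation as a short readable string."""
    return f"{obs.prop} of {obs.phenomenon}: {obs.value} {obs.unit}"


# Made-up readings for the thermometer and beta attenuation monitor examples.
t = datetime(2014, 1, 1, 12, 0, tzinfo=timezone.utc)
air = Observation("thermometer-1", "temperature", "air", -3.5, "°C", t)
pm = Observation("bam-1", "concentration", "particulate matter", 12.0, "µg/m³", t)

print(describe(air))
print(describe(pm))
```

Keeping property and phenomenon as separate fields means the same structure serves the thermometer and the beta attenuation monitor alike, whatever the original encoding of the data was.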

A frequent task in research that builds on environmental sensor networks is to process and analyse data in order to acquire information about the monitored environmental phenomenon. Indeed, scientists have data generated by environmental sensor networks but what they want is information, or knowledge, about the environment. They want to study and understand the formation, development, and interactions of environmental phenomena. The assumption is that such information can be acquired from sensor data. The problem is that the task is often laborious and non-trivial.

During my studies I stumbled upon this task several times. For instance, I had sensor data for the vibration of the pavement of a road section but what I wanted was information about vehicles on the road. Researchers often use Matlab or R or similar software to process sensor data and implement information acquisition in scripts. So did I. A bunch of Matlab scripts for this, some Python code for that, and more Java code for everything else. This may work but I think it is far from optimal. To make matters more interesting, and complicated, over time I encountered two application classes: those that process historical sensor data and those that process real-time sensor data.

At some point I figured that the task can be generalized. In fact, the problem consists of three main subtasks: retrieval of sensor data (possibly including decoding to numbers), processing of sensor data, and information acquisition from processed sensor data. I studied these subtasks further and eventually I came up with a software framework architecture and implementation for the task. This software framework is called Wavellite.
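The three subtasks can be sketched as a small pipeline. The following Python example is only an illustration loosely based on the pavement vibration case: the function names, the CSV layout (time, value), the windowed RMS step, the threshold, and the sample values are all assumptions of mine for this sketch, and none of it reflects the actual Wavellite API.

```python
import csv
import io
import math
from typing import List


def retrieve(raw_csv: str) -> List[float]:
    """Retrieval: decode plain-text CSV rows (time, value) into numbers."""
    reader = csv.reader(io.StringIO(raw_csv))
    return [float(row[1]) for row in reader if row]


def process(samples: List[float], window: int) -> List[float]:
    """Processing: RMS amplitude over non-overlapping windows of samples."""
    rms = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms.append(math.sqrt(sum(x * x for x in chunk) / window))
    return rms


def acquire(rms: List[float], threshold: float) -> List[int]:
    """Information acquisition: indices of windows with RMS above threshold.

    In this toy setting, such a window is read as "something heavy passed".
    """
    return [i for i, level in enumerate(rms) if level > threshold]


# Made-up vibration samples: quiet, a short burst, quiet again.
raw = "0,0.0\n1,0.1\n2,3.0\n3,-3.0\n4,0.1\n5,0.0\n"
events = acquire(process(retrieve(raw), window=2), threshold=1.0)
print(events)  # [1]
```

Each stage only depends on the output of the previous one, so the retrieval step could be swapped for an HTTP or binary decoder without touching the processing or acquisition code.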

Wavellite supports the development of applications that build on environmental sensor networks to acquire information about environmental phenomena from sensor data. Heterogeneous sensor data is its input and a representation of information about environmental phenomena is its output. It covers the three subtasks of sensor data retrieval, sensor data processing, and information acquisition. It integrates the subtasks and provides a unified representation for processed and generated data and information.

I have used Wavellite at various stages of development in several case studies on sensor data and environmental phenomena. Wavellite is under active development. If you want to know more, please visit the Wavellite project page.