The World Wide Web, or simply the Web, is a vast and distributed system characterized by agents that manipulate resources. Due to its decentralized nature, the Web exhibits a high diversity of resources, which results in high data variety.
Data Variety is the diversity of data in a problem space. In the Big Data context, it indicates the challenge of processing heterogeneous data coming from many different sources.
Nowadays, agents that populate the Web environment must take data variety into account. The Web of Data is the Web extension that aims to address the data variety challenge using Semantic Technologies such as RDF, SPARQL, and OWL. Nonetheless, data variety is not the only challenge that agents have to address when manipulating Web resources. In particular, a new dimension has become increasingly relevant in recent years: data velocity.
Data Velocity is the speed at which data are processed and insights are obtained. In the Big Data context, it indicates the challenge of processing data as soon as they are available, and before it is too late.
To this end, new protocols and APIs (e.g., WebSockets and EventSource) have been designed to extend the Web's lowest architectural levels. However, it is worth investigating how this challenge may impact higher levels and, in particular, the Web of Data.
Indeed, the Web of Data is also evolving to tame Data Velocity without neglecting Data Variety. The RDF Stream Processing (RSP) community is actively addressing these challenges by proposing continuous query languages and working prototypes. So far, the RSP community's efforts have contributed middleware, engines, and vocabularies that address the scientific and technical challenges. Nevertheless, a set of guidelines that showcase how to reuse existing resources to publish and process streams and events on the Web is still missing. In this paper, we propose a Cookbook for publishing and processing Streaming Linked Data. In the paper, we focus on the publication life-cycle as it is prescribed by the W3C. Indeed, the problem of Streaming Linked Data publication has become more relevant and is currently under investigation. The accompanying Web resource, on the other hand, describes the resources in detail and proposes a series of recipes for processing the published streams. In particular, (i) we make use of the following resources: R2RML and RML for converting streams into RDF, TripleWave and VOCALS for their publication, and YASPER and RSP-QL for processing. Moreover, (ii) we bootstrap a catalog of Web Streams, highlighting the requirements that drove our selection of three examples of wild streams: DBpedia Live changes, Wikimedia EventStreams, and the Global Database of Events, Language and Tone (GDELT). Last but not least, (iii) we open-sourced our code and made it available for public use at https://w3id.org/webstreams.
Emanuele Della Valle
Danh Le Phouc