Wikimedia EventStreams

The Wikimedia Foundation is an American non-profit organization that hosts, among others, open-knowledge projects such as Wikipedia. In particular, Wikimedia invests financial and technical resources in maintaining projects that foster free and open knowledge.

Wikimedia EventStreams (WES) was created for internal data analysis and maintenance and was open-sourced in 2018. WES is a web service that exposes streams of structured data following the Server-Sent Events (SSE) protocol, i.e., data is pushed to interested clients rather than polled by them. The schema of the data is versioned and available on GitHub. The WES API currently generates eight streams.

All streams are shared as structured data serialized as JSON.
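To make the push-based delivery concrete, the following is a minimal sketch of how a client might decode SSE frames that carry JSON payloads. It is not the WES client implementation: the function name `iter_sse_events` and the sample frames in `SAMPLE` are hypothetical, and only the generic SSE framing rules are assumed (fields of the form `name: value`, comment lines starting with `:`, and a blank line terminating each event).

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each SSE event in an iterable of text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
            continue
        if line.startswith(":"):
            # Comment line, typically used as a keep-alive.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            data_parts.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.


# Hypothetical frames, shaped like SSE output with JSON payloads.
SAMPLE = [
    "event: message\n",
    'data: {"type": "edit", "title": "Q31218558", "timestamp": 1}\n',
    "\n",
    ": keep-alive\n",
    'data: {"type": "new", "title": "Example"}\n',
    "\n",
]

events = list(iter_sse_events(SAMPLE))
print(events)
```

In a live setting, the same generator could be fed the decoded lines of a long-lived HTTP response instead of the in-memory sample.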

In the following, we focus on the recentchange stream, which has the most complex content among the available streams. Listing [lst:wes1] shows an example of recentchange stream data, which is timestamped (line 1). On the recentchange stream, five kinds of events are possible: "edit" (as in Listing [lst:wes1]) for modifications of existing pages, "new" for page creation, "log" for log actions, "external" for external changes, and "categorize" for category membership changes. Moreover, "title" indicates the page title in a wiki of the Wikimedia Foundation, which in practice links the event to an external entity. The event in Listing [lst:wes1] points to the Wikidata entity at https://www.wikidata.org/entity/Q31218558.ttl, which we can retrieve to enrich the stream (see note 1).
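The link between an event and its external entity can be sketched as follows. This is an illustration under stated assumptions, not part of the WES API: the helper names `entity_ttl_url` and `count_by_type` are hypothetical, the sample events contain only the "type" and "title" fields discussed above, and a title is treated as a Wikidata item only if it has the form `Q<number>`. The URL pattern is the one used for Q31218558 in the text.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Event kinds named in the text for the recentchange stream.
RECENTCHANGE_TYPES = ("edit", "new", "log", "external", "categorize")

# Assumption: Wikidata item titles look like "Q" followed by a positive integer.
_ITEM_ID = re.compile(r"Q[1-9]\d*")


def entity_ttl_url(event: dict) -> Optional[str]:
    """Return the Turtle URL of the Wikidata entity named by the event title, if any."""
    title = event.get("title", "")
    if _ITEM_ID.fullmatch(title):
        return f"https://www.wikidata.org/entity/{title}.ttl"
    return None


def count_by_type(events: Iterable[dict]) -> Counter:
    """Count events per kind, grouping unexpected kinds under 'unknown'."""
    counts = Counter()
    for event in events:
        kind = event.get("type")
        counts[kind if kind in RECENTCHANGE_TYPES else "unknown"] += 1
    return counts


# Hypothetical events carrying only the fields discussed in the text.
sample_events = [
    {"type": "edit", "title": "Q31218558"},
    {"type": "new", "title": "Some article"},
    {"type": "categorize", "title": "Category:Example"},
]

print(count_by_type(sample_events))
print(entity_ttl_url(sample_events[0]))
```

Returning `None` for titles that do not look like item identifiers keeps the enrichment step optional, so events from wikis other than Wikidata pass through unchanged.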

  1. We excluded metadata about the Kafka Topic from the current modeling and from the examples.