Open Data

Transparency for everyone

  • Technology
12 October 2017

The use of open data facilitates new insights into science, industry and society. Did you know that there are 21 mountains called “Schwarzhorn” in Switzerland? Or that Bahnhofstrasse in Zurich was named in 1863 and is 1,173 metres long? What historic buildings and monuments are there right on your doorstep? Open data and open government data (OGD) make these and many more snippets of information available. By combining various datasets, for instance, applications 1 that present the data in different visualizations can be developed. But how is data made open, and what characterizes “open data”? The Open Definition 2 puts it as follows:

Open data is data that can be freely accessed, used, modified and shared. 3

From historical images and research data to Zurich’s playgrounds...

In principle, any data can be opened, i.e. made accessible, whether it be your own photos on image platforms such as Wikimedia Commons, cooking recipes on a forum or travelogues on your own blog. In practice, however, open data increasingly involves larger datasets. After all, a certain volume of data is needed to develop applications and thus generate added value.

Many libraries, archives and museums now run online platforms where “data” is openly and freely available. This might be digitized images and/or their metadata, i.e. the information describing them. Based on the metadata of a historical collection, for example, an application could be developed that shows on a map where the creators of the objects came from. Or a user can pinpoint the era from which most works stem. It becomes fascinating when the two pieces of information are combined in a dynamic visualization: do the places of origin change over time?
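The following minimal Python sketch makes the idea of combining places of origin with eras more concrete. It assumes that each metadata record has been reduced to a title, a place of origin and a year of creation. The record type, its field names and the sample entries are hypothetical and are not taken from any real collection.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectRecord:
    """Hypothetical, simplified metadata record from a historical collection."""
    title: str
    place_of_origin: str  # where the object's creator came from
    year: int             # year the work was created

def century_of(year: int) -> int:
    """Return the century of a positive year, e.g. 1863 -> 19, 1800 -> 18."""
    return (year - 1) // 100 + 1

def places_by_century(records: list[ObjectRecord]) -> dict[int, Counter]:
    """Count places of origin separately for each century."""
    result: dict[int, Counter] = {}
    for rec in records:
        result.setdefault(century_of(rec.year), Counter())[rec.place_of_origin] += 1
    return result

# Illustrative sample records, not real catalogue data.
sample = [
    ObjectRecord("View of a lake", "Zurich", 1795),
    ObjectRecord("Alpine panorama", "Bern", 1802),
    ObjectRecord("Town square", "Zurich", 1860),
    ObjectRecord("Railway bridge", "Basel", 1875),
]

for century, places in sorted(places_by_century(sample).items()):
    print(century, places.most_common())
```

If each century's counts were drawn as its own layer on a map, the result would show whether the places of origin shift over time.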

In academia, it is usually research data, i.e. data collected from observations or experiments, that is made publicly accessible. This promotes international exchange and drives research forward.

Open government data is also increasingly available to the population at large. In the City of Zurich’s open data catalogue 4, for instance, a wide range of data can be searched for and downloaded in the desired format. Examples include information on catering businesses by year and district; the locations of fountains, playgrounds, churches and day nurseries in the City of Zurich; and greenhouse gas emissions over several decades.
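Such a catalogue can also be searched from a program rather than through the web interface. The sketch below assumes that the City of Zurich's catalogue exposes the standard CKAN Action API (`/api/3/action/package_search`). That assumption, the search term `brunnen` (German for fountains) and the helper names are our own illustration. None of them comes from the catalogue's documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Catalogue address from the text. ASSUMPTION: it serves the CKAN Action API.
CATALOGUE_BASE = "https://data.stadt-zuerich.ch"

def package_search_url(query: str, rows: int = 10) -> str:
    """Build a CKAN package_search URL for datasets matching `query`."""
    params = urlencode({"q": query, "rows": rows})
    return f"{CATALOGUE_BASE}/api/3/action/package_search?{params}"

def search_dataset_titles(query: str, rows: int = 10) -> list[str]:
    """Fetch matching datasets and return their titles (needs network access)."""
    with urlopen(package_search_url(query, rows), timeout=30) as response:
        payload = json.load(response)
    # CKAN wraps search hits as {"result": {"count": ..., "results": [...]}}.
    return [dataset["title"] for dataset in payload["result"]["results"]]

print(package_search_url("brunnen"))
```

In CKAN, each dataset returned this way also lists its `resources`, the downloadable files together with their format (for example CSV). That is where the desired format would be chosen.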

Open data on the web comes from many producers and in a wide range of forms. It ranges from an archive’s image files, such as those on ETH Library’s platform E-Pics, to Switzerland’s open data portal (opendata.swiss) and open data collections known as repositories. All of them offer free access and many different ways of processing the data.

How much openness is allowed? The five levels of open data

For open data to be compatible with various programs, and to be downloaded and processed further on computers all over the world, a universal definition of “openness” is required. 5 Tim Berners-Lee, the inventor of the World Wide Web, proposes a five-star open data model. 6 On the first and lowest level, data is provided under an open licence, regardless of its format. 7 The second level requires the data to be structured, as an Excel table, for instance. On the third level, open data is characterized by a non-proprietary, i.e. open, file format, such as CSV. The fourth level additionally requires URIs (Uniform Resource Identifiers) so that data can be located simply and persistently. On the fifth level, the data is linked to other data (linked open data) to make its context visible.
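The five levels are cumulative, and that property can be expressed as a small check. The Python sketch below is our own illustration, not an official tool of the five-star model. A level only counts if every level below it is also satisfied. The field names of the hypothetical `Dataset` type mirror the criteria described above.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    """Hypothetical description of a published dataset, one flag per level."""
    open_licence: bool  # level 1: available under an open licence, any format
    structured: bool    # level 2: structured data, e.g. a spreadsheet
    open_format: bool   # level 3: non-proprietary format such as CSV
    uses_uris: bool     # level 4: URIs identify the things described
    linked: bool        # level 5: linked to other data for context

def star_rating(ds: Dataset) -> int:
    """Count stars from level 1 upwards, stopping at the first unmet level."""
    levels = [ds.open_licence, ds.structured, ds.open_format, ds.uses_uris, ds.linked]
    stars = 0
    for met in levels:
        if not met:
            break
        stars += 1
    return stars

# A CSV file under an open licence, without URIs or links to other data.
csv_dataset = Dataset(open_licence=True, structured=True, open_format=True,
                      uses_uris=False, linked=False)
print(star_rating(csv_dataset))  # 3
```

A well-linked dataset without an open licence would still receive zero stars, because the licence is the foundation of the model.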

Five levels of the open data model. Source: Tim Berners-Lee

Visible and promoting innovation – advantages of open data

The concept of open data has many advantages. The six cited most frequently in the specialist literature are listed here.

1. Transparency

Users can swiftly and easily get an idea of the kind, scope and content of open data.

2. Accessibility

Open data is swift and straightforward to access.

3. Contribution toward the global information infrastructure

Efficiency increases because data only needs to be collected once instead of repeatedly by different parties, which avoids duplicated work.

4. Development of innovative applications and services

Open data simplifies and accelerates the development of new services, as the open licence already settles the legal questions of re-use.

5. Creation of new business models

Open data gives rise to new business models.

6. Findability in search engines

Open data is indexed and thus displayed in the list of hits during web searches.

Besides these positive aspects and the opportunities to gain new insights, open data naturally also poses challenges, none more so than data protection. Certain data, such as customer information, must not simply be made openly accessible. This concerns patient data in medicine or personal data, for example, where absolute anonymity must be guaranteed. There therefore always needs to be a thorough legal clarification of which data may be made freely accessible in the first place.

Institutions have to invest substantial resources to select data in advance and obtain the necessary legal clarifications. They also need to provide a technical download option so that developers and programmers can access the open data as easily as possible. Moreover, users need a certain level of expertise to program a visualization or application from the raw data. Otherwise, even the finest data at the highest level of openness remains unused.

Open data in practice: visualizations and applications

Of course, making open data accessible is only one part of the task for institutions and authorities. Processing the data further into visualizations or applications is just as important for obtaining added value from it. So-called hackathons are a prime example of this: data providers, hackers and anyone else who is interested get together and try to develop new applications or visualizations based on open data within two to three days. The Swiss Open Cultural Data Hackathon was launched against this backdrop in 2015 and took place for the third time in September 2017. It yielded many new projects, all based on open data.

ETH Library also took part as a data provider and supplied metadata on Carl Gustav Jung’s letters, which are held in the ETH Zurich University Archives. A project team formed on site, called Jung-Rilke Correspondence Networks, worked intensively on the letter holdings of C.G. Jung and Rainer Maria Rilke (from the Swiss Literary Archives). The goal was to gradually edit and enrich the data so that it could then be visualized in new ways. For instance, geocoding the available sender locations made it possible to depict the correspondence networks on maps. 8
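Geocoding sender locations essentially means attaching coordinates to place names and then counting letters per place. That is the kind of input a map or heatmap needs. The Python sketch below assumes a small, hand-made gazetteer with rounded, approximate coordinates. The gazetteer, the sample sender places and the function name are hypothetical. This is not the hackathon team's actual pipeline.

```python
from collections import Counter

# HYPOTHETICAL gazetteer: place name -> (latitude, longitude), approximate values.
GAZETTEER = {
    "Küsnacht": (47.32, 8.58),
    "Vienna": (48.21, 16.37),
    "Zurich": (47.38, 8.54),
}

def geocode_senders(sender_places: list[str]):
    """Count letters per sender place and attach coordinates from the gazetteer.

    Returns (points, unresolved): points are dicts ready for a map or heatmap;
    unresolved lists place names the gazetteer does not know.
    """
    counts = Counter(sender_places)
    points, unresolved = [], []
    for place, letters in counts.items():
        if place in GAZETTEER:
            lat, lon = GAZETTEER[place]
            points.append({"place": place, "lat": lat, "lon": lon, "letters": letters})
        else:
            unresolved.append(place)
    return points, unresolved

# Illustrative sender places, not taken from the actual letter metadata.
points, unresolved = geocode_senders(["Küsnacht", "Vienna", "Küsnacht", "Bollingen"])
print(points)
print(unresolved)
```

Unresolved places are kept separate on purpose. Historical place names often need manual checking before they can be placed on a map.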

Geographic distribution of the sender locations of the C.G. Jung letters. Visualized with the online tool Palladio.
Heatmap of Rainer Maria Rilke’s correspondence. The larger and redder a point, the more letters were sent from that location. Visualized with Google Fusion Tables.
Correspondence from and to C.G. Jung visualized as a network. Two nodes are clearly recognizable: Carl Gustav Jung (below) and his secretary’s office (above). Visualized with the tool Gephi.
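A network view like the Gephi visualization needs the letters as an edge list: one row per sender and recipient pair, weighted by the number of letters. The Python sketch below builds such a list as CSV text. It assumes the letter metadata has already been reduced to (sender, recipient) pairs, and the correspondents in the sample are placeholders. Gephi's spreadsheet import expects `Source` and `Target` columns and accepts an optional `Weight` column, which is why those headers are used.

```python
import csv
import io
from collections import Counter

def edge_list_csv(letters: list[tuple[str, str]]) -> str:
    """Turn (sender, recipient) pairs into a weighted, Gephi-style edge list."""
    weights = Counter(letters)  # identical pairs collapse into one weighted edge
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (sender, recipient), count in sorted(weights.items()):
        writer.writerow([sender, recipient, count])
    return buffer.getvalue()

# Placeholder correspondents; real input would come from the letter metadata.
letters = [
    ("Correspondent A", "C. G. Jung"),
    ("C. G. Jung", "Correspondent A"),
    ("Correspondent A", "C. G. Jung"),
    ("Secretary's office", "Correspondent B"),
]
print(edge_list_csv(letters))
```

Letters from A to Jung and from Jung to A stay separate, directed edges. Nodes with many heavy edges, such as Jung and his secretary's office in the figure, then stand out in the network layout.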

Open data in research: open research data

The publication of results is a basic principle of modern academia. It enables scientists to identify errors and to support, reject or build on theories. Openness is essential for modern science to be able to self-correct and keep improving. 9 A newer idea is that not only evaluations and interpretations of data should be published, but also the pure measurement series and raw data from experiments themselves. This is known as open research data.

There are good reasons why research data should also be published as open data. For one, academic studies are cited more frequently if the underlying data is publicly accessible, and such studies have greater credibility. Moreover, high-quality research data can additionally be published in specialist journals if necessary. Ultimately, the data can be re-used in new projects, which makes results reproducible. 10

In the internet age, open research data is more topical than ever. According to the guidelines of funding institutions, publicly funded data and research results in particular should be freely accessible to everyone, not least to advance research. Consequently, the Swiss National Science Foundation (SNSF) has required data management plans (DMPs) as part of project applications since October 2017. A DMP outlines how the research data within a project is generated, collected, documented, published or otherwise made publicly accessible, and ultimately archived digitally. It thus helps researchers plan the life cycle of their data. 11

The lifecycle of research data

The Swiss National Science Foundation (SNSF) guidelines 12 for researchers provide key criteria for compiling a data management plan. They are based on the FAIR principles 13: research data should be Findable, Accessible, Interoperable and Reusable. A data management plan needs to cover four areas:

  • Data collection and documentation
  • Ethical, legal and security-related issues
  • Data storage and retention
  • Exchange and re-use of data
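Because the SNSF guidelines name four areas a DMP must cover, a draft plan can be checked for gaps mechanically. This Python sketch is a hypothetical helper, not an SNSF tool. Only the four area names are taken from the list above; the class, the method and the sample project are illustrative.

```python
from dataclasses import dataclass, field

# The four areas a DMP must cover, as listed in the SNSF guidelines above.
DMP_AREAS = (
    "Data collection and documentation",
    "Ethical, legal and security-related issues",
    "Data storage and retention",
    "Exchange and re-use of data",
)

@dataclass
class DataManagementPlan:
    """Hypothetical in-memory draft of a DMP: one free-text answer per area."""
    project: str
    sections: dict[str, str] = field(default_factory=dict)

    def missing_areas(self) -> list[str]:
        """Return the required areas that are absent or still blank."""
        return [area for area in DMP_AREAS if not self.sections.get(area, "").strip()]

draft = DataManagementPlan(
    project="Example project",
    sections={
        "Data collection and documentation": "Sensor readings, documented in a README.",
        "Data storage and retention": "   ",  # blank answers still count as missing
    },
)
print(draft.missing_areas())
```

A check like this only confirms that every area has been addressed. It says nothing about whether the answers satisfy the FAIR principles.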

The Digital Curation Office at ETH Library offers comprehensive advice and support on all aspects of data management. 14 Researchers from ETH Zurich can also publish their data in the Research Collection, ETH Zurich’s repository for publications and research data. From there, the data is automatically exported to the ETH Data Archive for long-term archiving. 15

Open data at ETH Library

As an information service provider, ETH Library makes use of linked open data by enriching the entries in its own catalogue with dynamic links to Wikipedia, the German Digital Library and other sources.
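Linked open data connects catalogue records to external resources through URIs. The Python code below is a rough sketch of what such an enriched record might look like. It writes three statements in the N-Triples format: a title, a creator identified by a GND-style URI, and an `rdfs:seeAlso` link to a Wikipedia article. The record URI and the GND identifier are placeholders. Using Dublin Core terms and `rdfs:seeAlso` is our assumption and does not describe ETH Library's actual catalogue data.

```python
# Vocabulary URIs (Dublin Core terms and RDF Schema).
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# HYPOTHETICAL identifiers: a made-up catalogue record and a placeholder GND ID.
RECORD = "https://example.org/catalogue/record/123"
PERSON = "https://d-nb.info/gnd/EXAMPLE"

def iri(value: str) -> str:
    """Serialize an IRI for N-Triples."""
    return f"<{value}>"

def literal(value: str) -> str:
    """Serialize a plain string literal, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Join already-serialized (subject, predicate, object) terms into N-Triples."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

triples = [
    (iri(RECORD), iri(DCTERMS_TITLE), literal("Collected letters")),
    (iri(RECORD), iri(DCTERMS_CREATOR), iri(PERSON)),
    (iri(PERSON), iri(RDFS_SEE_ALSO), iri("https://en.wikipedia.org/wiki/Carl_Jung")),
]
print(to_ntriples(triples))
```

The person gets a URI of its own instead of a plain name string. Any other dataset that uses the same URI can therefore be connected to this record, which is the core idea behind the fifth star.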

As a supplier of open bibliographic metadata and digital copies, ETH Library aims to contribute to the open data movement. Whenever possible, its own data is provided for further use without legal restrictions, i.e. under a Public Domain Mark 16 or a so-called CC0 licence 17. If the prerequisites for this are not met, the most open licence possible is selected from the Creative Commons “toolkit”. Specifically, bibliographic metadata sets for different publication types are available for download as packages or via a direct interface (Z39.50). In addition, ETH Library’s Image Archive uses a so-called BEACON file to link its holdings to persons referenced in the GND 18. Thousands of digitized documents, both texts and images, can be viewed and downloaded from various platforms:

  • E-Pics Image Archive Online: ETH Library’s Image Archive with images of the academic and technical history of Switzerland, landscapes and townscapes, the Swissair Photo Archive and the archive of the photographic agency Comet Photo AG. Freely accessible image series are regularly presented on the Crowdsourcing blog. Selected digital copies and “born-digital” images are also provided for re-use under an open licence on Wikimedia Commons, which simplifies their integration into Wikipedia articles.
  • e-rara.ch: digitized books from Swiss libraries from the 15th to the 20th century.
  • e-manuscripta.ch: digitized manuscript materials from Swiss libraries and archives.
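A BEACON file, as used by the Image Archive, is a simple text format for link dumps. Header lines starting with `#` describe the file, and each further line names an identifier (here a GND number) for which the provider holds material. The following Python sketch parses a simplified subset of the format: only `#KEY: value` headers and lines of the form `ID` or `ID|annotation`. It expands each identifier into a source URI via `#PREFIX` and into a target URL via the `{ID}` placeholder in `#TARGET`. The sample file, its identifiers and the example.org target are hypothetical, and the full BEACON specification allows more than this sketch handles.

```python
def parse_beacon(text: str) -> tuple[dict[str, str], list[tuple[str, str, str]]]:
    """Parse a SIMPLIFIED subset of a BEACON link file.

    Handles only '#KEY: value' header lines and link lines 'ID' or 'ID|annotation'.
    Returns (meta, links) where each link is (source_uri, target_url, annotation).
    """
    meta: dict[str, str] = {}
    links: list[tuple[str, str, str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")  # split at the first colon only
            meta[key.strip().upper()] = value.strip()
            continue
        ident, _, annotation = line.partition("|")
        source = meta.get("PREFIX", "") + ident
        target = meta.get("TARGET", "{ID}").replace("{ID}", ident)
        links.append((source, target, annotation))
    return meta, links

SAMPLE = """#FORMAT: BEACON
#PREFIX: https://d-nb.info/gnd/
#TARGET: https://example.org/image-archive?gnd={ID}
EXAMPLE-1|12
EXAMPLE-2
"""

meta, links = parse_beacon(SAMPLE)
for source, target, annotation in links:
    print(source, "->", target, annotation)
```

Because header values are split only at the first colon, URLs such as the `#PREFIX` value keep their own `https:` intact.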

Outlook: Is open data on the rise?

The Open Data Barometer 19 clearly shows that the principle of open data is spreading ever further and that governments all over the world are increasingly opening up their data. Since 2013, the portal has recorded how authorities provide open data with regard to accountability, innovation and social impact. Although Switzerland has improved every year, with a score of 43/100 it ranks only 22nd out of the 77 nations evaluated. The trend towards an open information society can only be accelerated if authorities and organisations continue to act with open data in mind and the public at large learns how much potential open data holds. ETH Library is dedicated to advancing this development and will continue to promote the notion of open data and raise awareness of it.

Footnotes

  1. There is a selection of open data applications on the Swiss open government data portal: https://opendata.swiss/de/app (29.08.2017).
  2. Open Definition (2017): Open Definition. Version 2.0. Available at: http://opendefinition.org/od/2.0/de/ (08.08.2017).
  3. The main features of open data: 1. Availability and access (data must be available as a whole and free to download on the internet). 2. Re-use and redistribution (data must be machine-readable and provided in a way that permits its unconditional re-use and redistribution). 3. Universal participation (everyone must be able to use and re-use the data). Cf. Open Knowledge International (2017): What is open? Available at: https://okfn.org/opendata/ (08.08.2017).
  4. Available at: https://data.stadt-zuerich.ch/ (11.08.2017).
  5. Open Data Handbook (2017): What is open data? Available at: http://opendatahandbook.org/guide/de/what-is-open-data/ (08.08.2017).
  6. 5 Stern offene Daten (2012). Available at: http://5stardata.info/de/ (08.08.2017).
  7. An open, or free, licence permits the use, distribution and modification of copyright-protected works. A prime example of such free licences is Creative Commons: https://creativecommons.org/ (28.09.2017).
  8. The video of the project’s final presentation provides an insight into the Jung-Rilke Correspondence Networks project and the various visualization possibilities: https://vimeo.com/234627486 (28.09.2017), as does the hackathon’s wiki project page: http://make.opendata.ch/wiki/project:jung_rilke_correspondance_network (29.09.2017).
  9. The Royal Society (2012): Final report – Science as an open enterprise. Available at: https://royalsociety.org/topics-policy/projects/science-public-enterprise/report/ (28.07.2017).
  10. Cf. ETH Library, Open Access at ETH Zurich (2017): Publishing research data. Available at: http://www.library.ethz.ch/en/ms/Open-Access-an-der-ETH-Zuerich/Forschungsdaten-publizieren (28.07.2017).
  11. Swiss National Science Foundation (SNSF) (2017): Open research data: Das sind die SNF-Guidelines für die Datenmanagementpläne. Available at: http://www.snf.ch/de/fokusForschung/newsroom/Seiten/news-170511-open-research-data-snf-guidelines-fuer-datenmanagementplaene.aspx (28.07.2017).
  12. Swiss National Science Foundation (SNSF) (2017): Daten Management Plan (DMP) – Leitlinien für Forschende. Available at: http://www.snf.ch/de/derSnf/forschungspolitische_positionen/open_research_data/Seiten/data-management-plan-dmp-leitlinien-fuer-forschende.aspx (28.07.2017).
  13. Cf. Wilkinson, Mark D. et al. (2016): The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3:160018. Available at: http://doi.org/10.1038/sdata.2016.18.
  14. Cf. ETH Library, Digital Curation (2017): Data management. Available at: http://www.library.ethz.ch/en/ms/Digitaler-Datenerhalt-an-der-ETH-Zuerich/Forschungsdaten/Datenmanagement (28.07.2017).
  15. It is also possible to archive data directly in the ETH Data Archive. This is especially recommended for large data packages, for regular and automated archiving, or for structuring the data on site at an early stage. A detailed list is provided on the ETH Library website.
  16. Cf. https://creativecommons.org/publicdomain/mark/1.0/deed.de (02.10.2017).
  17. Cf. https://creativecommons.org/publicdomain/zero/1.0/deed.de (02.10.2017).
  18. GND (Gemeinsame Normdatei) is the Integrated Authority File and contains records for persons, families, corporate bodies, conferences, geographic names, subject terms and work titles: https://wiki.dnb.de/display/ILTIS/Informationsseite+zur+GND (29.09.2017).
  19. http://opendatabarometer.org (11.08.2017).