How FAIR is C-SCALE Data?

A note to the reader: The C-SCALE Data federation does not produce its own data; it relies on Copernicus and other spatiotemporal data providers for that. Some of the points in this questionnaire are therefore answered conditionally, with reference to common practice at the largest providers. Even so, it is the mission of C-SCALE to make data more findable and accessible, so several of these questions remain highly relevant. Our comments point out where a question concerns matters outside our control.

Findable

It should be possible for others to discover your data. Rich metadata should be available online in a searchable resource, and the data should be assigned a persistent identifier.

A persistent identifier is assigned to your data

A vast majority of data available through the C-SCALE Data Federation originate from the Sentinel programme, which uses a fixed, unique identifier (ID) and a human-readable Title to refer to any of its products. Because storage is redundant across the Sentinel collaborative ground segment, and likewise within the C-SCALE federation, a single product (determined by its ID or Title) may be obtainable from multiple sources, and the address of a provider is not part of the identifier. Searching for that identifier through a search interface, be it the C-SCALE EO-MQS or any Copernicus dissemination node, nevertheless always leads to the same piece of data, albeit possibly from a different storage site. Moreover, Sentinel metadata include an MD5 checksum, which makes it possible to verify that the same original data file is being accessed, regardless of the location from which it is obtained.
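The checksum comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than a C-SCALE tool: the function names are hypothetical, and it assumes the MD5 value has already been copied from the product's metadata as a hexadecimal string.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Products can be several gigabytes, so avoid loading them whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_catalogue_checksum(path, expected_md5):
    """Compare the local digest with the catalogue value, ignoring case.

    Assumption: the catalogue value is a plain hex string; it is normalised
    here because some catalogues print hex digits in upper case.
    """
    return md5_of_file(path) == expected_md5.strip().lower()
```

If the comparison succeeds for copies downloaded from two different sites, both copies are byte-for-byte the same original file.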

There are rich metadata, describing your data

Metadata follow the STAC standard and include rich characteristics of each given product in terms of its origins, parameters, and contents.

The metadata are online in a searchable resource, e.g. a catalogue or data repository

The metadata are available through STAC-API, STAC Browser, and STAC /search endpoints, which are all exposed by the EO-MQS.
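As an illustration of the /search endpoint, the sketch below prepares a STAC API item search as an HTTP POST request. The base URL and the collection name used in the usage note are placeholders, not the real EO-MQS values; the body fields (collections, bbox, datetime, limit) are the standard STAC API item search parameters. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the actual EO-MQS address.
EO_MQS_URL = "https://eo-mqs.example.org/stac/v1"


def build_search_request(collections, bbox, datetime_range, limit=10,
                         base_url=EO_MQS_URL):
    """Prepare (but do not send) a STAC API item search request."""
    body = {
        "collections": list(collections),  # collection ids to search in
        "bbox": list(bbox),                # [west, south, east, north]
        "datetime": datetime_range,        # "start/end" in RFC 3339 format
        "limit": limit,                    # maximum items per page
    }
    return urllib.request.Request(
        url=f"{base_url}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A call such as build_search_request(["sentinel-2-l2a"], [14.0, 48.5, 17.0, 49.5], "2023-06-01T00:00:00Z/2023-06-30T23:59:59Z") yields a request that could be passed to urllib.request.urlopen; the collection id shown is only an example.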

The metadata record specifies the persistent identifier

This is partly true. The metadata record specifies only the Title, which is enough to identify the product through searching the catalogue, but not to reference it directly.

Accessible

It should be possible for humans and machines to gain access to your data, under specific conditions or restrictions where appropriate. FAIR does not mean that data need to be open! There should be metadata, even if the data aren’t accessible.

Following the persistent ID will take you to the data or associated metadata

A vast majority of data available through the C-SCALE Data Federation originate from the Sentinel programme, for which the persistent ID must be looked up in EO-MQS (or in a different catalogue outside C-SCALE) to obtain download links for the data. The ID is not a locator by itself. A metadata search returns relevant locators that can be followed to download the data from the FedEarthData service.

Furthermore, it is one of the primary characteristics of the C-SCALE Data federation that it reveals the location of multiple copies of the same data file, if available from multiple providers in the federation. It is left to the user to choose among them based on any preference they may have. Using a definite URL as an identifier would run counter to that trait of the federation.
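Choosing among several copies of one product could look roughly like the sketch below. It assumes, purely for illustration, that a search result is a STAC item collection in which each copy appears as an item with the same id and its own asset href; the actual shape of EO-MQS responses may differ, and both function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def copies_by_product(feature_collection):
    """Map each product id to every asset href found for it."""
    copies = defaultdict(list)
    for item in feature_collection.get("features", []):
        # A real client would likely filter by asset key or role here.
        for asset in item.get("assets", {}).values():
            href = asset.get("href")
            if href:
                copies[item["id"]].append(href)
    return dict(copies)


def pick_copy(hrefs, preferred_hosts):
    """Return the first href on a preferred host, else the first href."""
    for host in preferred_hosts:
        for href in hrefs:
            if urlparse(href).hostname == host:
                return href
    return hrefs[0] if hrefs else None
```

The preference list stands in for whatever criterion the user cares about, such as network proximity or an existing account at a given site.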

The protocol by which data can be retrieved follows recognised standards, e.g. HTTP.

The protocol is HTTPS.

The access procedure includes authentication and authorisation steps, if necessary

Authentication is performed when accessing data from most providers of the FedEarthData service. All these sites support OIDC-based single sign-on authentication through EGI Check-in. Sites are also free to offer the data openly, without authentication, if their policies allow that.
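A download request to a site that requires authentication might be prepared as sketched below. The sketch assumes that the site accepts an OIDC access token as an HTTP Bearer token and that the token has already been obtained from EGI Check-in by some other means; how individual sites expect the token to be presented is not specified here, and the function name is hypothetical.

```python
import urllib.request
from urllib.parse import urlparse


def build_download_request(href, access_token=None):
    """Prepare (but do not send) an HTTPS download request.

    Assumption: the site accepts an OIDC access token in the
    Authorization header. Sites offering data openly need no token.
    """
    if urlparse(href).scheme != "https":
        raise ValueError("data are expected to be served over HTTPS")
    headers = {}
    if access_token:
        headers["Authorization"] = f"Bearer {access_token}"
    return urllib.request.Request(href, headers=headers)
```

The resulting request object can be passed to urllib.request.urlopen to start the transfer.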

Metadata are accessible, wherever possible, even if the data aren’t

Metadata provided by partners and aggregated by the EO-MQS are openly accessible; authentication is required only when the data themselves are accessed through the FedEarthData service.

Interoperable

Data and metadata should conform to recognised formats and standards to allow them to be combined and exchanged.

Data is provided in commonly understood and preferably open formats

A vast majority of data available through the C-SCALE Data Federation originate from the Sentinel programme and follow SAFE (Standard Archive Format for Europe).

The metadata provided follows relevant standards

Metadata follow the STAC (SpatioTemporal Asset Catalog) format.

Controlled vocabularies, keywords, thesauri or ontologies are used where possible

Terminology follows the STAC (SpatioTemporal Asset Catalog) standard, namely the Core specification.

Qualified references and links are provided to other related data

Related links are provided pursuant to the STAC (SpatioTemporal Asset Catalog) specification.

Reusable

Lots of documentation is needed to support data interpretation and reuse. The data should conform to community norms and be clearly licensed so others know what kinds of reuse are permitted.

The data are accurate and well described with many relevant attributes

A vast majority of data available through the C-SCALE Data Federation originate from the Sentinel programme, which is amply documented and supported by numerous tools, and the data themselves are annotated with rich metadata.

The data have a clear and accessible data usage license

Most data available through the C-SCALE Data Federation originate from the Sentinel programme and are released under the Creative Commons CC BY-SA 3.0 IGO license. This is specified in the documentation but not included with actual data.

It is clear how, why and by whom the data have been created and processed

A vast majority of data available through the C-SCALE Data Federation originate from the Sentinel programme, and their origin and level of processing are documented with metadata available both in catalogues and included with the actual data. On top of that, there is a data provenance service by ESA, which provides further detail on each data point’s processing and distribution history.

The data and metadata meet relevant domain standards

Indeed, the data and metadata follow and even define the relevant domain standard.

Based on: Jones, Sarah, & Grootveld, Marjan. (2017). How FAIR are your data?