FAIR stands for Findable, Accessible, Interoperable and Reusable. Wilkinson et al. (2016) defined a set of principles for each of these properties. Each principle is explained below, as adapted from the FAIR principles translation.
Data and metadata should be easy to find by both humans and computer systems. Basic machine readable descriptive metadata allows the discovery of interesting datasets and services.
F1: (meta)data are assigned a globally unique and persistent identifier
Action: Ensure that each dataset is assigned a globally unique and persistent identifier. Certain repositories automatically assign identifiers to datasets as a service. If not, researchers must obtain a PID via a PID registration service.
F2: data are described with rich metadata
Action: Fully document each dataset in the metadata, which may include descriptive information about the context, quality and condition, or characteristics of the data. Another researcher in any field, or their computer, should be able to properly understand the nature of your dataset. Be as generous as possible with your metadata (see R1).
F3: metadata clearly and explicitly include the identifier of the data it describes
Action: Make sure that the metadata contains the dataset’s PID.
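As a minimal sketch of F1 and F3, the record below carries its own identifier and derives a resolvable link from it. The field names and the DOI are hypothetical example values, not taken from a real repository; the only fixed fact assumed here is that DOIs resolve through https://doi.org/.

```python
# Hypothetical metadata record; the DOI is a made-up example value.
record = {
    "identifier": "10.1234/example-dataset",
    "title": "Soil moisture measurements, 2020 field campaign",
    "description": "Hourly soil moisture readings from three plots.",
}


def pid_url(metadata):
    """Return the resolvable URL for the record's DOI, or None if it is missing."""
    doi = metadata.get("identifier", "").strip()
    if not doi:
        return None
    return "https://doi.org/" + doi


print(pid_url(record))
```

A record for which `pid_url` returns `None` does not yet satisfy F3, because the metadata cannot be traced back to the dataset it describes.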
F4: (meta)data are registered or indexed in a searchable resource
Action: Provide detailed and complete metadata for each dataset (see F2).
Data and metadata should be stored for the long term such that they can be easily accessed and downloaded or locally used by machines and humans using standard communication protocols.
A1: (meta)data are retrievable by their identifier using a standardized communications protocol
Action: Clearly define who can access the actual data, and specify how. The data may not be downloaded at all but instead reused in situ. If so, the metadata must specify the conditions under which this is allowed, which may differ from the conditions that apply to external use or download.
A1.1: the protocol is open, free, and universally implementable
A1.2: the protocol allows for an authentication and authorization procedure, where necessary
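One way to picture A1, A1.1 and A1.2 together is a plain HTTPS request that retrieves metadata by identifier. The sketch below only builds the request and does not send it. The DOI and the token are hypothetical, the bearer-token header stands in for whatever authentication scheme a given repository actually defines, and whether a particular DOI returns metadata for the requested media type depends on its registration agency.

```python
import urllib.request


def build_metadata_request(doi, token=None):
    """Build an HTTPS request for a DOI's metadata, optionally authenticated."""
    # Ask for machine-readable citation metadata instead of the landing page.
    headers = {"Accept": "application/vnd.citationstyles.csl+json"}
    if token is not None:
        # Illustrative only: a real repository specifies its own scheme.
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request("https://doi.org/" + doi, headers=headers)


request = build_metadata_request("10.1234/example-dataset")
print(request.full_url)
print(request.get_header("Accept"))
# To send it: urllib.request.urlopen(request)
```

HTTPS is used here because it is open, free and implementable by anyone, which is what A1.1 asks of the protocol.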
A2: metadata are accessible, even when the data are no longer available
Action: Provide detailed and complete metadata for each dataset (see below in R1).
Data should be ready to be exchanged, interpreted and combined in a (semi)automated way with other datasets by humans as well as computer systems.
I1: (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
Action: Provide machine readable data and metadata in an accessible language, using a well-established formalism. In particular, data and metadata are annotated with resolvable vocabularies/ontologies/thesauri that are commonly used in the field. The RDF extensible knowledge representation model is a way to describe and structure datasets. You can refer to the Dublin Core Schema as an example.
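To illustrate what an RDF description using Dublin Core can look like, the following sketch serializes a few Dublin Core elements as RDF/XML using only the Python standard library. The function name, the subject URI and the field values are hypothetical; the two namespace URIs are the published RDF and Dublin Core element namespaces.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_rdf(subject_uri, fields):
    """Serialize Dublin Core elements about one resource as RDF/XML text."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject_uri}
    )
    # Each key is a Dublin Core element name such as "title" or "creator".
    for name, value in fields.items():
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dublin_core_rdf(
    "https://doi.org/10.1234/example-dataset",  # hypothetical identifier
    {"title": "Soil moisture measurements", "creator": "Example Lab"},
)
print(xml_text)
```

In practice a dedicated RDF library would handle serialization formats and validation; the standard-library version is used here only to keep the sketch self-contained.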
I2: (meta)data use vocabularies that follow FAIR principles
Action: Use vocabularies/ontologies/thesauri that are themselves findable, accessible, interoperable and thoroughly documented, hence FAIR. Researchers can refer to metrics assessing the FAIRness of a digital resource (if available).
I3: (meta)data include qualified references to other (meta)data
Action: Properly cite relevant/associated datasets, in particular by providing their persistent identifiers, in the metadata, and describe the scientific link/relation to your dataset.
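A reference is "qualified" when it states how the other dataset relates to yours, not only that a link exists. The sketch below assumes relation names in the style of the DataCite relationType vocabulary (for example IsDerivedFrom); the identifiers and the helper name are hypothetical.

```python
# Hypothetical related datasets, each with a PID and a stated relation.
related = [
    {"identifier": "10.1234/raw-survey", "relation": "IsDerivedFrom"},
    {"identifier": "10.1234/companion-paper", "relation": "IsSupplementTo"},
]


def unqualified(references):
    """Return the references that lack either a PID or a relation type."""
    return [
        ref for ref in references
        if not ref.get("identifier") or not ref.get("relation")
    ]


print(unqualified(related))
```

An empty result means every reference names both the related resource and the nature of the link.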
Data and metadata are sufficiently well-described to allow data to be reused in future research, allowing for integration with other compatible data sources. Proper citation must be facilitated, and the conditions under which the data can be used should be clear to machines and humans.
R1: (meta)data are richly described with a plurality of accurate and relevant attributes
- metadata describing the dataset (intrinsic): what does the dataset contain, how was the data generated, how has it been processed, how can it be reused...
- metadata describing the data (submitter-defined): any needed information to properly use the data, such as definitions of the variable names
- Scope of your data: for what purpose was it generated/collected?
- Particularities or limitations about the data that other users should be aware of.
- Date of the dataset generation, lab conditions, who prepared the data, parameter settings, name and version of the software used.
- Is it raw or processed data?
- Variable names are explained or self-explanatory (i.e. defined in the research field’s controlled vocabulary).
- Version of the archived and/or reused data is clearly specified and documented.
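The checklist above can be turned into a simple completeness check before a dataset is submitted. This is only a sketch: the key names are hypothetical and would need to be mapped onto whatever metadata schema your repository uses.

```python
# Hypothetical field names, one per checklist item above.
CHECKLIST = {
    "scope": "purpose for which the data was generated or collected",
    "limitations": "particularities or limitations users should know",
    "generated_on": "date of dataset generation",
    "prepared_by": "who prepared the data",
    "software": "name and version of the software used",
    "processing_level": "raw or processed",
    "variables": "definitions of the variable names",
    "version": "version of the archived or reused data",
}


def missing_items(metadata):
    """Return the checklist keys that are absent or empty in the metadata."""
    return [key for key in CHECKLIST if not metadata.get(key)]


draft = {
    "scope": "Calibration of soil sensors",
    "processing_level": "raw",
    "version": "1.0",
}
print(missing_items(draft))
```

A check like this only confirms that fields are filled in; whether the descriptions are accurate and relevant still requires human review.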
R1.1: (meta)data are released with a clear and accessible data usage license
Action: Include information about the license in the metadata. If a particular license is needed, you have to provide it along with the dataset. Where possible, use common licenses such as CC0 or CC BY, which can be referred to by URL.
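Referring to a license by URL might look like the sketch below. The short license names are modeled on SPDX-style identifiers and the URLs are the Creative Commons deed pages for CC0 1.0 and CC BY 4.0; the metadata layout and the helper name are assumptions made for illustration.

```python
LICENSE_URLS = {
    "CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
}


def add_license(metadata, license_id):
    """Return a copy of the metadata with the license name and URL attached."""
    if license_id not in LICENSE_URLS:
        raise ValueError("unknown license: " + license_id)
    updated = dict(metadata)  # leave the caller's record unchanged
    updated["license"] = {"id": license_id, "url": LICENSE_URLS[license_id]}
    return updated


licensed = add_license({"title": "Soil moisture measurements"}, "CC-BY-4.0")
print(licensed["license"]["url"])
```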
R1.2: (meta)data are associated with detailed provenance
Action: Use the metadata to thoroughly describe the workflow that led to your data: Who generated or collected it? How has it been processed? Has it been published before? Does it contain data from someone else, potentially transformed or completed? Ideally the workflow is described in a machine-readable format. Criterion I3 is closely linked to this issue when reusing published datasets.
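As a rough illustration of a machine-readable workflow description, the questions above can be answered in a structured record and serialized as JSON. The structure, field names and values below are hypothetical; established provenance models such as W3C PROV define a richer and standardized vocabulary for the same purpose.

```python
import json

# Hypothetical provenance record answering the questions above.
provenance = {
    "generated_by": "Example Lab",
    "previously_published": False,
    "sources": [
        {"identifier": "10.1234/raw-survey", "relation": "IsDerivedFrom"}
    ],
    "steps": [
        {"order": 1, "action": "collected", "tool": "field sensor export"},
        {"order": 2, "action": "cleaned", "tool": "outlier filter v0.3"},
    ],
}


def to_machine_readable(record):
    """Serialize the provenance record as stable, human-inspectable JSON."""
    return json.dumps(record, indent=2, sort_keys=True)


print(to_machine_readable(provenance))
```

The `sources` entry shows the link to I3: data reused from someone else is cited by its persistent identifier together with the relation to your dataset.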
R1.3: (meta)data meet domain relevant community standards
Action: Prepare your (meta)data according to community standards and best practices for data archiving and sharing in your research field. There may be situations where good practices exist for the type of data to be submitted, but the submitter has valid and specified reasons to deviate from the standard practice. Any such deviation needs to be explained in the metadata.
Please find below resources concerning the FAIR principles in the form of training, guidance and/or tools.