Research data life cycle


In the era of FAIR (Findable, Accessible, Interoperable and Reusable) and Open science, datasets should be made available to the public. Whenever possible, use domain-specific repositories to increase the FAIRness of your research outputs. The sections below give data-type-specific information on suitable repositories.

ENA (European Nucleotide Archive)
The ENA hosts an instance of the Sequence Read Archive (SRA), the same archive that is hosted at NCBI. SRA accepts raw sequence data from any sequencing platform, generated in any research project. There are several ways to submit data to ENA; for more information, see the documentation.

For convenience, we have created templates for the most frequent data types and their corresponding ENA checklists. The templates come with instructions for an interactive submission via the ENA Webin Portal, but even for a programmatic submission the template can be useful for collecting all necessary descriptions and metadata. Download an appropriate template, and fill in the sheets according to the instructions in the template.

ArrayExpress is tightly integrated with ENA and, similar to NCBI’s Gene Expression Omnibus database, it can be used to archive experimental designs and analysis files based on the raw sequence reads. ArrayExpress has its own submission portal, where information is available on what can be submitted and how.

EGA (European Genome-phenome Archive)
NBIS is building a local federated version of the European Genome-phenome Archive (EGA) in Sweden (EGA-SE), which will allow sensitive personal data to be published within a legal framework. Until the local EGA is available, the dataset should remain in the secure analysis environment (e.g. at Bianca on Uppmax). We suggest creating a metadata-only record in the SciLifeLab Data Repository with contact details on how to get access; a DOI (i.e. a persistent identifier) can be issued for this record and then used in the article to refer to the dataset. Once the Swedish EGA is operational and the dataset has been deposited there, the access information can be changed to point to the EGA ID. See, for an example.

Depending on the type of image data you have, different public repositories are available; see the table at BioImage Archive.

MetaboLights is a database for metabolomics experiments and derived information. The database is cross-species and cross-technique, and covers metabolite structures and their reference spectra as well as their biological roles, locations and concentrations, and experimental data from metabolic experiments.

The ProteomeXchange Consortium provides globally coordinated standard data submission and dissemination pipelines involving the main proteomics repositories:
  • PRIDE - accepts protein and peptide identification/quantification data with the accompanying mass spectra evidence and any other related data types. Submission is done using the PX Submission Tool; see the tutorial.
  • PeptideAtlas - for SRM/MRM data that does not fit into PRIDE (targeted datasets). Submission is done via PASSEL.

For other domain-specific repositories, see e.g. ELIXIR Deposition databases, Scientific Data recommended repositories, the EBI archive wizard (which helps to find the right repository depending on data type), or FAIRsharing (which can also assist in finding metadata standards suitable for describing your datasets). For datasets that do not fit into domain-specific repositories, use a general repository such as the SciLifeLab Data Repository, Figshare or Zenodo.
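
The guidance on this page can be summarised as a simple lookup: a domain-specific repository where one exists, and a general repository otherwise. The sketch below is only an illustration of that decision rule. The function name `suggest_repositories` and the data-type labels used as keys are hypothetical, the mapping is not exhaustive, and it does not replace the repository finders mentioned above.

```python
# Illustrative only: the data-type labels (keys) are assumptions made for
# this sketch; the repository names are the ones mentioned on this page.
REPOSITORY_BY_DATA_TYPE = {
    "raw sequence reads": "ENA",
    "sensitive human data": "EGA",
    "image data": "BioImage Archive (see its table of image repositories)",
    "metabolomics": "MetaboLights",
    "proteomics": "PRIDE (via ProteomeXchange)",
    "targeted proteomics": "PeptideAtlas (via PASSEL)",
}

# Fallback for datasets that do not fit a domain-specific repository.
GENERAL_REPOSITORIES = ["SciLifeLab Data Repository", "Figshare", "Zenodo"]


def suggest_repositories(data_type: str) -> list[str]:
    """Return the domain-specific repository for a data type, if listed,
    otherwise the general-purpose repositories."""
    key = data_type.strip().lower()
    if key in REPOSITORY_BY_DATA_TYPE:
        return [REPOSITORY_BY_DATA_TYPE[key]]
    return list(GENERAL_REPOSITORIES)


if __name__ == "__main__":
    print(suggest_repositories("Metabolomics"))
    print(suggest_repositories("survey responses"))
```

A data type missing from the table falls through to the general repositories, which mirrors the order of preference described above: domain-specific first, general only when nothing else fits.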

Resources & Training