Research data life cycle

Collecting

During this phase, all data to be analysed in the project are collected, either by generating new datasets or by reusing previously collected ones. This phase lays the foundation for the quality of both the data and the accompanying documentation. Hence, it is important that quality measures are implemented and that all steps of the collection are appropriately recorded. Large amounts of data may also need to be transferred between data producers, compute facilities and storage facilities during this phase.
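One simple quality measure for such transfers is to compare checksums before and after a file is moved. The following is a minimal sketch, assuming SHA-256 as the checksum; the helper names `sha256_of` and `verify_transfer` are hypothetical, and a facility may well provide its own checksum files or tools instead.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(source, destination):
    """Return True if the source and destination files have identical digests."""
    return sha256_of(source) == sha256_of(destination)
```

Recording the digest together with the data also makes it possible to detect corruption later, not only directly after the transfer.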

Documentation

Data documentation should clearly describe how the data was collected, so that someone else can understand and correctly interpret the data. Make use of electronic lab notebooks (often offered by the university or institute) and metadata standards, and name and organise the files produced appropriately. See e.g. A Quick Guide to Organizing Computational Biology Projects for advice.
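As an illustration of consistent file organisation, the sketch below creates a small project skeleton with a date-stamped folder for newly collected data. The directory names, the `README.txt` text and the function name `create_project_skeleton` are assumptions made for this example and are not prescribed by the guide mentioned above; adapt them to the conventions of your own group.

```python
from datetime import date
from pathlib import Path

# Hypothetical top-level layout; adapt the names to your own project conventions.
TOP_LEVEL_DIRS = ("data", "doc", "results", "src")


def create_project_skeleton(root, collected_on=None):
    """Create the top-level directories plus a date-stamped folder for raw data."""
    root = Path(root)
    collected_on = collected_on or date.today()
    for name in TOP_LEVEL_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    # ISO 8601 dates (YYYY-MM-DD) sort chronologically when listed alphabetically.
    raw_dir = root / "data" / collected_on.isoformat()
    raw_dir.mkdir(exist_ok=True)
    readme = raw_dir / "README.txt"
    if not readme.exists():
        # Placeholder text; replace it with the actual collection details.
        readme.write_text("Describe how, when and by whom these data were collected.\n")
    return raw_dir
```

Keeping a short description next to each batch of raw data means the documentation travels with the files when they are copied or shared.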

Data producers

SciLifeLab provides access to a range of pioneering technologies in molecular biosciences. Please find below a selection of data generating SciLifeLab services for genomics, imaging, metabolomics, and proteomics data.

Genomics data

Please find below a selection of SciLifeLab Genomics services:

National Genomics Infrastructure (NGI) offers an infrastructure equipped with a comprehensive range of technology platforms for next generation sequencing (NGS) and genotyping.

  • Whole-genome sequencing (human)
  • RNA sequencing
  • Functional genomics & Epigenomics
  • De novo genome sequencing
  • Metagenomics
  • Single-cell genomics

Eukaryotic Single Cell Genomics (ESCG) offers high-throughput single-cell transcriptomics services.

  • Study heterogeneity within putatively homogeneous cell populations
  • Unbiased discovery of cell types in complex tissues
  • Characterizing the cellular and genetic composition of tumors

 

Imaging data

Please find below a selection of SciLifeLab Bioimaging and Molecular Structure services:

The Advanced Light Microscopy (ALM) unit supports advanced fluorescence microscopy for nanoscale biological visualization, single-molecule spectroscopy measurement and analysis with fluorescence correlation spectroscopy (FCS), as well as FCS combined with super-resolution microscopy for dynamical studies (STED-FCS). In addition, light-sheet fluorescence microscopy support allows users to image live and/or optically cleared larger samples.


Cryo-EM offers access to state-of-the-art equipment and expertise in single particle cryo-EM and cryo-tomography (cryo-ET).

  • An internationally competitive infrastructure that is accessible to all academic scientists in Sweden on equal terms.
  • A training environment where researchers can become familiar with cutting-edge methods in cryo-EM.
  • An educational framework of scientific meetings, practical workshops and training courses to enhance the activity of the broad Swedish scientific community.

 

Metabolomics data

Please find below a selection of SciLifeLab Metabolomics services:

Swedish Metabolomics Centre (SMC) specializes in analyzing metabolites and lipids with mass spectrometry-based methods.

  • Untargeted methods
  • Targeted methods
  • Method development
  • Open Access (i.e. use the instruments to do the analysis yourself).

Swedish NMR Centre provides access to state-of-the-art NMR instrumentation and methodology.

  • Structural biology
  • Metabolomics
  • Chemical biology and small molecule NMR
  • DNP NMR

 

Proteomics data

Please find below a selection of SciLifeLab Proteomics services:

Global Proteomics and Proteogenomics offers proteomics information combined with sample-specific genomic and transcriptomic information.

  • Personalized proteomics
  • Disease state/Variant proteomics
  • Unbiased proteogenomics in any species with a sequenced genome
  • XenoProteomics
  • Meta proteomics
  • Plasma proteomics

Chemical proteomics is a national unit with expertise in supporting drug discovery and development through proteome-wide deconvolution of the targets and mechanisms of action of small molecules.


Biological Mass Spectrometry (BioMS) national infrastructure enables cutting-edge mass spectrometry and related advanced technology platforms.

  • Clinical proteomics
  • Chemical proteomics
  • Protein structure analysis
  • Lipidomics
  • Glycomics & glycoproteomics
  • Large scale quantitative proteomics
  • Proteogenomics

 

Storage

The PI and their academic institution are ultimately responsible for the data, and ensuring that all data is backed up is essential. The 3-2-1 rule of thumb means that there should be 3 copies of the data, on 2 different types of media, with 1 of the copies at a different physical location. This means that even if all of the project's research inputs and outputs are stored on a backed-up resource, an additional (third) copy of the data should be maintained.
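The 3-2-1 rule can be expressed as a small check over an inventory of copies. The sketch below is illustrative only: the `DataCopy` class, its field values and the function `satisfies_3_2_1` are hypothetical names, and the media and location labels are free text chosen by whoever keeps the inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    """One copy of a dataset; all field values are free-text labels."""
    name: str
    medium: str    # e.g. "disk", "tape", "cloud"
    location: str  # physical site, e.g. "campus", "off-site"


def satisfies_3_2_1(copies, primary_location):
    """Check for at least 3 copies, 2 media types and 1 copy at another site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.location != primary_location for c in copies)
    return enough_copies and enough_media and has_offsite


# Example inventory with made-up labels.
inventory = [
    DataCopy("working copy", "disk", "campus"),
    DataCopy("institutional backup", "tape", "campus"),
    DataCopy("repository copy", "disk", "off-site"),
]
print(satisfies_3_2_1(inventory, primary_location="campus"))
```

With the example inventory the check passes; removing the repository copy would leave only two copies, both on campus, and the check would fail.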

At a minimum, essential data, such as raw data and other data that may be difficult or even impossible to recreate in case of corruption or loss, should be copied off-site (using e.g. SciLifeLab FAIR Storage or storage provided by the institute).

Consider uploading the raw data to a repository as soon as you receive them, under an embargo if it is important that the data remain private during the project. This way there is always an off-site backup, with the added benefit of making the data sharing phase more efficient.

Resources

Please find below resources for the collecting phase of the research data life cycle, in the form of training, guidance and/or tools.

Training resources

Guiding resources