Collecting
During this phase, all the data needed for analysis in the project is collected, either by generating new datasets or by reusing previously collected ones. This phase lays the foundation for the quality of both the data and the accompanying documentation. Hence, it is important to implement quality measures and to record all collection steps appropriately. During this phase, large amounts of data may also need to be transferred between data producers, compute facilities, and storage facilities.
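One common quality measure when moving large datasets between facilities is to compare file checksums before and after the transfer. The sketch below is illustrative only, assuming Python 3.9+ and SHA-256 as the hash. The function names and the manifest layout (mirroring the two-column "digest  path" output of the `sha256sum` tool) are our own choices, not a SciLifeLab requirement. The sender writes the manifest, and the recipient verifies the received copy against it.

```python
import hashlib
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks
    so that very large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest: Path) -> None:
    """Record one 'digest  relative/path' line per file under data_dir.
    Assumption: the manifest itself is stored outside data_dir."""
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        lines.append(f"{sha256sum(path)}  {path.relative_to(data_dir).as_posix()}")
    manifest.write_text("\n".join(lines) + "\n")


def verify_manifest(data_dir: Path, manifest: Path) -> list[str]:
    """Return the relative paths that are missing or whose digest no longer
    matches the manifest (an empty list means the transfer looks intact)."""
    failures = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split("  ", 1)
        target = data_dir / rel_path
        if not target.is_file() or sha256sum(target) != expected:
            failures.append(rel_path)
    return failures
```

Reading in chunks keeps memory use constant regardless of file size, which matters for sequencing or imaging data in the hundreds of gigabytes.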
Documentation
Data documentation should clearly describe how the data was collected, so that someone else can understand and correctly interpret it. Make use of electronic lab notebooks (often offered by the university or institute) and metadata standards, and name and organise the files you produce appropriately; see e.g. 'A Quick Guide to Organizing Computational Biology Projects' for advice.
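As an illustration of organising files, the sketch below creates a project skeleton loosely following the layout suggested in that guide as we understand it: top-level `doc`, `data`, `src`, `bin` and `results` folders, with date-prefixed subfolders so that entries sort chronologically. It also places a short README next to each new dataset describing how it was collected. The folder names, the `create_project` and `new_dated_dir` helpers, and the README format are hypothetical conventions; adapt them to your group's practice and to the metadata standards of your field.

```python
from datetime import date
from pathlib import Path
from typing import Optional

# Hypothetical layout, loosely following 'A Quick Guide to Organizing
# Computational Biology Projects'; rename folders to match your conventions.
TOP_LEVEL_DIRS = ("doc", "data", "src", "bin", "results")


def create_project(root: Path) -> None:
    """Create the top-level folder skeleton; safe to call more than once."""
    for name in TOP_LEVEL_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)


def new_dated_dir(root: Path, kind: str, label: str, description: str,
                  day: Optional[date] = None) -> Path:
    """Create a folder such as data/2024-03-01_rnaseq-batch-1 with a README.txt.

    ISO dates (YYYY-MM-DD) as prefixes make folders sort chronologically,
    and the README records how the data was collected or produced.
    """
    if kind not in ("data", "results"):
        raise ValueError("dated folders are intended for 'data' or 'results'")
    day = day or date.today()
    safe_label = "-".join(label.lower().split())  # avoid spaces in names
    folder = root / kind / f"{day.isoformat()}_{safe_label}"
    folder.mkdir(parents=True, exist_ok=False)  # never reuse an existing dataset folder
    (folder / "README.txt").write_text(f"Date: {day.isoformat()}\n\n{description}\n")
    return folder
```

Refusing to reuse an existing dated folder (`exist_ok=False`) is a deliberate choice: it prevents a new delivery from silently mixing with, or overwriting, an earlier one.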
Data producers
SciLifeLab provides access to a range of pioneering technologies in molecular biosciences. Please find below a selection of data-generating SciLifeLab services for genomics, imaging, metabolomics, and proteomics data.
Genomics data
Please find below a selection of SciLifeLab Genomics services.
The National Genomics Infrastructure (NGI) offers a comprehensive range of technology platforms for next-generation sequencing (NGS) and genotyping, including:
- Whole-genome sequencing (human)
- RNA sequencing
- Functional genomics & Epigenomics
- De novo genome sequencing
- Metagenomics
- Single-cell genomics
Clinical Genomics facilitates the translation of new high-throughput (HTP) techniques, such as next-generation sequencing (NGS), into clinical use, in order to meet the needs of translational researchers and of clinicians in healthcare and public health systems.
- Advanced data analytics
- Epigenetics
- Long-read sequencing
- Metagenomics
- Multi-omics
- Single-cell and spatial omics
- Ultrasensitive detection
Imaging data
Please find below a selection of SciLifeLab Bioimaging and Molecular Structure services.
The Advanced Light Microscopy (ALM) unit provides support for advanced fluorescence microscopy for nanoscale biological visualization, for single-molecule spectroscopy measurements and analysis with fluorescence correlation spectroscopy (FCS), and for FCS combined with super-resolution microscopy to study dynamics (STED-FCS). In addition, its light-sheet fluorescence microscopy support allows users to image larger live and/or optically cleared samples.
The Cryo-EM unit offers access to state-of-the-art equipment and expertise in single-particle cryo-EM and cryo-electron tomography (cryo-ET). It provides:
- An internationally competitive infrastructure that is accessible to all academic scientists in Sweden on equal terms.
- A training environment where researchers can become familiar with cutting-edge methods in cryo-EM.
- An educational framework of scientific meetings, practical workshops and training courses to enhance the activity of the broad Swedish scientific community.
Metabolomics data
Please find below a selection of SciLifeLab Metabolomics services.
The Swedish Metabolomics Centre (SMC) specializes in analyzing metabolites and lipids using mass spectrometry-based methods.
- Untargeted methods
- Targeted methods
- Method development
- Open Access (i.e. using the instruments to perform the analysis yourself)
The Swedish NMR Centre provides access to state-of-the-art NMR instrumentation and methodology.
- Structural biology
- Metabolomics
- Chemical biology and small molecule NMR
- DNP NMR
Proteomics data
Please find below a selection of SciLifeLab Proteomics services.
Global Proteomics and Proteogenomics
Global Proteomics and Proteogenomics offers proteomics information combined with sample-specific genomic and transcriptomic information.
- Personalized proteomics
- Disease state/Variant proteomics
- Unbiased proteogenomics in any species with a sequenced genome
- XenoProteomics
- Metaproteomics
- Plasma proteomics
Chemical Proteomics is a national unit with expertise in supporting drug discovery and development through proteome-wide deconvolution of the targets and mechanisms of action of small molecules.
The Biological Mass Spectrometry (BioMS) national infrastructure provides access to cutting-edge mass spectrometry and related advanced technology platforms.
- Clinical proteomics
- Chemical proteomics
- Protein structure analysis
- Lipidomics
- Glycomics & glycoproteomics
- Large scale quantitative proteomics
- Proteogenomics
Storage
The PI and his/her academic institution are ultimately responsible for the data, and ensuring that all data is backed up is essential. The 3-2-1 rule of thumb states that there should be 3 copies of the data, on 2 different types of media, with 1 of the copies at a different physical location. This means that even if all of the project's research inputs and outputs are stored on a backed-up resource, a third copy of the data should still be maintained.
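The 3-2-1 rule of thumb can be expressed as a simple check. The sketch below is a hypothetical helper, assuming each copy is described by free-text labels for its media type and physical location; in practice, deciding what counts as a distinct medium or site needs more care than comparing strings.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DataCopy:
    """One copy of a dataset, described by free-text labels (hypothetical model)."""
    media: str     # e.g. "disk", "tape", "object storage"
    location: str  # physical site, e.g. "campus A"


def satisfies_3_2_1(copies: Sequence[DataCopy], primary_location: str) -> bool:
    """True if there are at least 3 copies, on at least 2 media types, and at
    least 1 copy stored somewhere other than the primary location."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_off_site = any(c.location != primary_location for c in copies)
    return enough_copies and enough_media and has_off_site
```

For example, data on a backed-up server (a primary disk copy plus a backup disk copy at the same site) amounts to only two copies and fails the check; adding a third copy on tape at another site satisfies it.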
At a minimum, essential data, such as raw data and other data that would be difficult or even impossible to recreate after corruption or loss, should be copied off-site (e.g. using SciLifeLab FAIR Storage or storage provided by the institute).
Consider uploading the raw data to a repository as soon as you receive it, under an embargo if the data must remain private during the project. This way there is always an off-site backup, with the added benefit of making the data-sharing phase more efficient.
Resources
Please find below resources on the collect phase of the research data life cycle, in the form of training, guidance, and/or tools.
Training resources
Guiding resources
- Infectious Diseases Toolkit on Data description
- Infectious Diseases Toolkit on Quality control
- RDMkit on Collecting data
- RDMkit on Data organisation
- RDMkit on Data quality
- RDMkit on Data transfer
- RDMkit on Metadata
- Research Involving Human Data by Richelle Björvang
- SND on Research material with personal data
- Storage resources available for researchers in Sweden