Collecting

During this phase, all data to be analysed in the project is collected, either by generating new datasets or by reusing previously collected ones. This phase lays the foundation for the quality of both the data and the accompanying documentation. Hence, it is important that quality measures are implemented and that all steps of the collection are appropriately recorded. Large amounts of data may also need to be transferred between data producers, compute facilities and storage facilities during this phase.

Learn more about data reuse

Learn more about data transfer

Documentation

Data documentation should clearly describe how the data was collected, so that someone else can understand and correctly interpret the data. Documentation can be kept in README files, either at the overall project level or at a more detailed level. Make use of electronic lab notebooks (often offered by the university or institute) and metadata standards, and name and organise the files produced appropriately; see e.g. ‘A Quick Guide to Organizing Computational Biology Projects’ for advice.

Learn more about README files

Learn more about metadata standards

Data producers

SciLifeLab provides access to a range of pioneering technologies in molecular biosciences. Please find below links to data generating SciLifeLab services.

Genomics

Proteomics

Metabolomics and Exposomics

Spatial Omics

Cellular and Molecular Imaging

Structural Biology

Storage

The PI and their academic institution are ultimately responsible for the data, and ensuring that all data is backed up is essential. The 3-2-1 rule of thumb means that there should be 3 copies of the data, on 2 different types of media, with 1 of the copies at a different physical location. This means that even if all the project's research inputs and outputs are located on a backed-up resource, a (third) copy of the data should still be maintained.
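As an illustration only, the 3-2-1 rule of thumb can be written down as a small check over a list of planned copies. The sketch below assumes each copy is described by a media type and a physical location; the names `DataCopy` and `satisfies_3_2_1` and the example values are hypothetical and not part of any SciLifeLab tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    """One copy of the project data (hypothetical description for this sketch)."""
    name: str
    media: str      # e.g. "disk", "tape", "object storage"
    location: str   # physical site where the copy is kept


def satisfies_3_2_1(copies):
    """Return True if there are >= 3 copies, on >= 2 media types,
    spread over >= 2 physical locations (i.e. at least one copy off-site)."""
    media_types = {c.media for c in copies}
    locations = {c.location for c in copies}
    return len(copies) >= 3 and len(media_types) >= 2 and len(locations) >= 2


# Example plan: a working copy plus the backup of the resource it lives on.
plan = [
    DataCopy("working copy", media="disk", location="campus"),
    DataCopy("resource backup", media="tape", location="campus"),
]
print(satisfies_3_2_1(plan))  # False: only two copies, both at the same site

# Adding a third copy at another site satisfies the rule.
plan.append(DataCopy("off-site copy", media="object storage", location="remote site"))
print(satisfies_3_2_1(plan))  # True
```

The check treats "1 copy at a different physical location" as "at least two distinct locations overall", which is one reasonable reading of the rule.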

At a minimum, essential data, such as raw data and other data that may be difficult or even impossible to recreate in case of corruption or loss, should be copied off-site (using e.g. SciLifeLab FAIR Storage or storage provided by the institute).
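One common way to confirm that an off-site copy has not been corrupted is to compare checksums of the original files and the copied files. The following is a minimal sketch using only the Python standard library; the function names and the choice of SHA-256 are assumptions made for illustration, not a procedure prescribed by SciLifeLab or by any particular storage service.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_mismatches(original_dir, copy_dir):
    """List files from original_dir that are missing or differ in copy_dir."""
    source = build_manifest(original_dir)
    target = build_manifest(copy_dir)
    return sorted(name for name, checksum in source.items()
                  if target.get(name) != checksum)
```

Calling `find_mismatches("raw_data", "/mnt/offsite/raw_data")` (both paths hypothetical) returns an empty list when every file in the original directory is present and identical in the copy. Storing the manifest alongside the data also makes it possible to re-check the copy later.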

Consider uploading the raw data to a repository as soon as it is received, under an embargo if it is important that the data remains private during the project. This way there is always an off-site backup, with the added benefit of making the data sharing phase more efficient.

Learn more about SciLifeLab FAIR Storage

Learn more about data preserving

Learn more about data sharing