Collecting
During this phase, all data to be analysed in the project is collected, either by generating new datasets or by reusing previously collected ones. This phase lays the foundation for the quality of both the data and the accompanying documentation. It is therefore important that quality measures are implemented and that all collection steps are appropriately recorded. Large amounts of data may also need to be transferred between data producers, compute facilities and storage facilities during this phase.
Learn more about data reuse
Learn more about data transfer
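One common way to confirm that a transfer left the data intact is to compare checksums computed on the sending and receiving sides. The sketch below is an illustration only: the function names are hypothetical, and SHA-256 is an assumed choice of hash rather than something prescribed by this guide.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_mismatches(source: dict[str, str], received: dict[str, str]) -> list[str]:
    """List source files that are missing or differ on the receiving side."""
    return sorted(
        name for name, checksum in source.items()
        if received.get(name) != checksum
    )
```

Building a manifest before the transfer and again afterwards, then calling `find_mismatches` on the two, gives a list of files that need to be sent again. Keeping the manifest with the data also records what was received.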
Documentation
Data documentation should clearly describe how the data was collected, so that someone else can understand and correctly interpret the data. Documentation can be kept in README files, both at the overall project level and at a more detailed level (e.g. per dataset). Make use of electronic lab notebooks (often offered by the university or institute) and metadata standards, and name and organise the files produced appropriately; see e.g. ‘A Quick Guide to Organizing Computational Biology Projects’ for advice.
Learn more about README files
Learn more about metadata standards
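As a starting point, a README skeleton can be created by a small script whenever a new project or data folder is set up. The sketch below is only an illustration: the section headings and the file name `README.txt` are assumptions and should be adapted to the conventions of your field and institute.

```python
from pathlib import Path

# Hypothetical section headings; adapt them to your own project.
README_SECTIONS = [
    "Project title",
    "Contact person",
    "Data collection methods",
    "Instruments and software used",
    "File naming and folder structure",
    "Date of collection",
]


def write_readme_skeleton(folder: Path) -> Path:
    """Create README.txt with empty sections unless one already exists."""
    readme = folder / "README.txt"
    if not readme.exists():
        lines = [f"{title}:\n\n" for title in README_SECTIONS]
        readme.write_text("".join(lines), encoding="utf-8")
    return readme
```

The function never overwrites an existing README, so it is safe to run repeatedly on the same folder.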
Data producers
SciLifeLab provides access to a range of pioneering technologies in molecular biosciences. Links to data-generating SciLifeLab services are listed below.
Cellular and Molecular Imaging
Storage
The PI and their academic institution are ultimately responsible for the data, and ensuring that all data is backed up is essential. The 3-2-1 rule of thumb states that there should be 3 copies of the data, on 2 different types of media, with 1 of the copies at a different physical location. This means that even if all of the project's research inputs and outputs are located on a backed-up resource, a (third) copy of the data should still be maintained.
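The 3-2-1 rule can be written as a simple check over an inventory of where the copies of a dataset are kept. This is a minimal sketch: the `Copy` class, its fields and the function name are hypothetical, and what counts as a distinct media type or location is for the project to decide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    media: str      # type of storage media, e.g. "disk" or "tape"
    location: str   # physical site where the copy is kept


def satisfies_3_2_1(copies: list[Copy], primary_location: str) -> bool:
    """Check for 3 copies, 2 media types and at least 1 copy off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.location != primary_location for c in copies)
    return enough_copies and enough_media and has_offsite
```

For example, two disk copies at the primary site plus one tape copy at another site pass the check, while three disk copies in the same server room do not.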
At a minimum, essential data, such as raw data and other data that may be difficult or even impossible to recreate in case of corruption or loss, should be copied off-site (using e.g. SciLifeLab FAIR Storage or storage provided by the institute).
Consider uploading the raw data to a repository as soon as it is received, under an embargo if it is important that the data remains private during the project. This way there is always an off-site backup, with the added benefit of making the data sharing phase more efficient.
Learn more about SciLifeLab FAIR Storage
Learn more about data preserving
Resources
Below are resources for the Collect phase of the research data life cycle, in the form of training, guidance and/or tools.
Training resources
Guiding resources
- A Quick Guide to Organizing Computational Biology Projects
- Handbook for data containing personal information
- Infectious Diseases Toolkit on Data description
- Infectious Diseases Toolkit on Quality control
- RDMkit on Collecting data
- RDMkit on Data organisation
- RDMkit on Data quality
- RDMkit on Data transfer
- RDMkit on Documentation and metadata
- Research Involving Human Data by Richelle Björvang