---
title: Overview of the Ed-DaSH project
subsection-title: "The motivation behind: addressing unmet data-skills needs"
layout: page
theme: Project overview
show_heading: false
---

Main project (Feb 2021 - Feb 2023)

Executive summary

We aim to develop and deliver workshops that help health and bioscience researchers – in academia, industry and society as a whole – to be both competent and confident in working with their data.

Project overview

Biological and medical research has changed radically in the last 30 years due to new technologies that measure thousands of different molecular components in cells at the same time. For example, it is now routine to measure entire human genomes, all the proteins making up living cells, or DNA from all the microbes in a sample of soil. All this generates huge amounts of data that come in different formats and often at different times and places. So, today all biological researchers need to be good at managing and analyzing data – it is no longer the remit of the specialist. This is not just a UK challenge: international studies show the same trend. The demand for data science training far outstrips supply.

Life science industries are equally dependent on bioscience and health data, across pharmaceutical manufacturers, diagnostic providers, vaccine developers, and agricultural and environmental service providers. New moves towards precision medicine (drugs tailored to the patient) and precision agriculture (tailoring crop management) depend on access to, and accurate interpretation of, high quality data. Many careers in industry, government and society need good data management and analysis skills.

Importantly, the public needs confidence in data and data intensive research – for trust in the scientific process and for harnessing the benefits of data for science and society. Data sharing (Open Access Data) between researchers is important for scientific progress, for all-important reproducibility, and to derive best value for investment in publicly funded research.

In such a data-intensive environment for bioscience and health, it’s important that everyone – whatever their career stage or role – can manage, analyse, store and share their data. This is what we hope to achieve through this project.

Programme aims

We have focused training on areas where we know there is a particular need among health and bioscience researchers.

  • Analyzing data – A good grounding in statistics is needed to analyze large and complex data sets, using modern methods such as machine learning.
  • Managing data – Understanding how to move data securely between virtual ‘storage’ spaces so that information can be retrieved when it is needed.
  • Sharing data – Understanding and adhering to the FAIR principles (Findable-Accessible-Interoperable-Reusable) ensures open access to data.
  • Designing portable analysis – Writing complex analysis workflows in a manner that is easily transferred between different computing systems, so other researchers can use them too.

We will deliver these training workshops (online) using a well-established community platform called The Carpentries. This is an inclusive open-access platform that trains people in data and coding skills and encourages learners to become first helpers, and then trainers, as their own expertise develops. Open-access teaching materials mean that small improvements can be suggested every time a workshop is delivered, leading to a constant improvement in quality. It also means that anyone with an internet connection can use the materials for self-study, so work put into developing materials has wider impact. Edinburgh has the largest Carpentries affiliate in the UK, which is keen to extend the reach of its training.

Our programme will help level-up data skills across the UK and develop a growing cohort of confident practitioners across all career stages and industries. This will help meet the growing demand for data-savvy health and bioscience researchers in academia and industry.

Extension (Dec 2023 - Mar 2024)

The Carpentries Laboratory peer review process increases the impact and legacy of the materials produced under the original grant. This ties in with widening delivery, as peer review incorporates “road testing” the materials via external instructor teams. The Carpentries infrastructure was upgraded over the duration of the original grant. Upgrading our workshop materials to the new infrastructure will improve their accessibility and navigation. The final stage of the peer review process is publication of lessons in the Journal of Open Source Education (JoSE), which will increase the reach and visibility of our work. We propose to upgrade and peer review three workshops: High-dimensional statistics with R, Workflows with Nextflow, and Workflows with Snakemake. All of these workshops are applicable to data science in fundamental biological research. Furthermore, the extra time will enable us to finish writing and publishing our proposed manuscript on lesson development, which will describe what we have learned from the Ed-DaSH project to aid future efforts in bioscience training.

The original FAIR in Biological Practice workshop was 2 full days of training, but in several teaching contexts considerably less time was available. Teaching fewer, targeted topics was attempted, but this resulted in participants missing important information on data management and open science. We propose refactoring the materials into a shorter schedule and testing delivery to local participants. This will require 1.0 FTE over 4 months from the BioRDMS team.

High-dimensional statistics with R and Machine learning with Python (developed by a separate team) were highly popular workshops, and we have had multiple requests to deliver them again. We propose remote delivery of a further instance of each workshop between Dec 2023 and Mar 2024.