Data Science: with a focus on healthcare

This course offers an introduction to the core principles of data science. Through real-world examples, we will explore the value of information and its potential. We will examine big data, focusing on its significance for healthcare. The course will cover the latest advances in data science approaches, as well as innovations in information storage and processing. You will be introduced to the fundamental terminology of the field, so that you can engage effectively in data science environments. We will move from foundational concepts to more advanced levels. No specific background is required or assumed.

Course content

Aims

This course aims to:

• introduce you to the fundamentals of health data science and the latest advances in the field

• provide you with a robust understanding of the terminology used in the field and the distinctive governance required to protect individuals’ health records while enabling research

• enable you to gain an understanding of working with programming approaches to generate insights from health data

Content

This course explores the principles of data science within the context of global healthcare systems. It begins with foundational concepts such as healthcare systems, health data at scale, health research and the pivotal role of data-driven decision-making in improving patient outcomes. Participants will examine in depth how to ensure secure access to patient data and adherence to governance standards, using approaches such as the Five Safes framework and Trusted Research Environments (TREs). Through practical sessions, they will gain proficiency in data exploration, data access requirements, feasibility assessments and the development of study protocols. The course also includes a hands-on project on the fundamentals of exploratory data analysis, followed by guidance on reproducible project management practices.

The course will begin with an exploration of the fundamentals of data science in various types of healthcare systems across the globe. We will look at the concepts of designing a medical study, the importance of data-driven decision-making in improving patient outcomes and reproducible approaches in data science.

Next, we will develop an understanding of how to ensure safe access to patient data and adherence to governance requirements. This covers the principles of the Five Safes, working in Trusted Research Environments (TREs), the steps of project development in a TRE, and the essential tools and software used in a health data science project.

The third session will focus on the requirements for a health data science study. We will look at data exploration and data access requirements, metadata availability, feasibility assessment and study protocols.

The fourth session is an interactive coding session focused on developing analytical pipelines for research. It will showcase the stages involved and include a review of the code written during the session.
We will look at data wrangling stages, data visualisation and statistical analysis.

Building on the previous session, the focus of the last day is to ensure that all code and resources generated for the small project are saved in a reproducible and accessible manner.
We will review recommendations for generating a reproducible pipeline in an agile development environment.

Presentation of the course

The course will take place in a classroom setting, using interactive presentation tools to support demonstrations of technical methods. It is highly recommended that you bring your personal laptop or other IT equipment so that you can take part in the live voting and the hands-on work with tools. Students will be encouraged to contribute to discussions in the classroom by offering opinions, experiences and observations.

Course sessions

1. Introduction to Data Science in Healthcare
The fundamentals of data science in various types of healthcare systems across the globe. Concepts of designing a medical study. The importance of data-driven decision-making in improving patient outcomes and reproducible approaches in data science.

2. Trusted Research Environment
Developing an understanding of safe access to patient data and governance requirements. Principles of the Five Safes, working in Trusted Research Environments (TREs), steps of project development in a TRE, and essential tools and software used in a health data science project.

3. Health Data Science study design
Requirements for a health data science study. Data exploration and data access requirements, metadata availability, feasibility assessment and study protocols.

4. Interactive project development
Live coding session focused on the development of research pipelines, starting with exploratory data analysis and moving on to real-world examples of healthcare data science projects.

5. Analysis pipeline development
Ensuring that all code and resources for the project are saved in a reproducible and accessible manner, and that the research pipeline is adaptable in an agile research development environment.

Learning outcomes

You are expected to gain from this series of classroom sessions a greater understanding of the subject and of the core issues and arguments central to the course.

The learning outcomes for this course are:

• to gain an understanding of the principles of designing medical studies, data access requirements and the governance for ethical use of patient data in trusted research environments

• to develop structural and computational thinking skills required for application of theoretical knowledge to real-world scenarios, through a hands-on practical project in exploratory data analysis, including assessing data completeness, utilising data visualisation techniques, and performing feasibility assessments and correlation analysis

Course dates

27 Jul 2025 to 02 Aug 2025

Course duration

1 week

Apply by

29 Jun 2025

Course director

Dr Fatemeh Torabi

Academic Directors, Course Directors and Tutors are subject to change when necessary.

Venue

International Summer Programme

Sidgwick Site

Cambridge

United Kingdom

Qualifications / Credits

Credits dependent on home institution

Teaching sessions

Meetings: 5

Course code

W35Pm33