A step towards making heart health screening accessible for billions with PPG signals

Unfortunately, there are few large datasets that pair PPG data with long-term cardiovascular outcomes. To capture a statistically useful number of such outcomes in a general population, a dataset needs to be quite large and should typically cover a span of 5–10 years. Recently, biobanks have become a popular way to collect such paired longitudinal data for a wide range of biomarkers and outcomes.

For our purposes, we made use of the UK Biobank, a large, de-identified biomedical dataset involving approximately 500,000 consented individuals from the UK, paired with a large number of long-term outcomes for heart attack, stroke, and related deaths. We use the subset of UK Biobank that contains PPG signals, filtered to participants aged 40–74 to better mirror previous studies on predicting cardiovascular disease. This results in around 200,000 participants, which we then split into training, validation and test sets.

Our method operates in two stages. We first build generally useful representations (model embeddings) of PPGs by training a 1D-ResNet18 model to predict multiple attributes of an individual (e.g., age, sex, BMI, hypertension status) using only the PPG signal. We then use the resulting embeddings and associated metadata as features of a survival model for predicting the 10-year incidence of major adverse cardiac events. The survival model is a Cox proportional hazards model, which is often used to study long-term outcomes when individuals may be lost to follow-up, and is also common in estimating disease risk.
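To make the second stage more concrete, here is a minimal sketch of the objective a Cox proportional hazards model minimizes: the negative log partial likelihood. It is an illustration only, not the training code used in this work. The function name, the toy data, and the assumption of no tied event times are ours; in practice the feature rows would hold PPG embeddings plus metadata, and a survival library would handle the fitting.

```python
import math


def cox_neg_log_partial_likelihood(times, events, features, beta):
    """Negative log partial likelihood of a Cox model (assumes no tied event times).

    times    -- observed follow-up time per individual
    events   -- 1 if the event was observed, 0 if the individual was censored
    features -- one list of covariates per individual (hypothetical stand-ins
                for PPG embeddings and metadata)
    beta     -- one coefficient per covariate
    """
    # Linear predictor (log relative hazard) for each individual.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in features]

    nll = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            # Censored individuals contribute only through the risk sets of others.
            continue
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        nll -= eta[i] - math.log(risk_sum)
    return nll


# Toy data: four individuals, the third is censored (lost to follow-up).
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
features = [[1.0], [0.5], [0.3], [0.0]]

print(cox_neg_log_partial_likelihood(times, events, features, [0.0]))
print(cox_neg_log_partial_likelihood(times, events, features, [1.0]))
```

The censored individual never adds a term of their own but still appears in the risk sets of earlier events, which is how the model makes use of people who are lost to follow-up instead of discarding them.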

We compare this method to several baselines that estimate risk scores using additional signals such as blood pressure and BMI. We find that our PPG embeddings can provide predictions with comparable accuracy without relying on these additional signals. One standard way to evaluate the overall value of a survival model is the concordance index (C-index). On this metric, a survival model using age, sex, BMI, smoking status and systolic blood pressure has a C-index of 70.9%, while a survival model that replaces BMI and systolic blood pressure with our easily obtainable PPG features has a C-index of 71.1% and passes a statistical non-inferiority test.
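For readers unfamiliar with the C-index, the sketch below shows one common way to compute it (Harrell's formulation): among all pairs of individuals whose ordering is known despite censoring, it is the fraction in which the person with the earlier event was assigned the higher risk. The function name and toy data are ours, and pairs with tied times are simply skipped here, which established survival libraries handle more carefully.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified: tied times are skipped)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied observation times are ignored in this sketch
        # Let `a` be the individual with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            # The earlier time is a censoring time, so the true order is unknown.
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk had the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third individual is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.5, 0.1]

print(concordance_index(times, events, risk_scores))  # 4 of 5 comparable pairs -> 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a sense of scale for the reported 70.9% and 71.1%.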
