The Importance of Integrating SDoH Data with Medical Claims-Based Machine Learning Models

March 6, 2023

The World Health Organization (WHO) defines Social Determinants of Health (SDoH) as “conditions in which people are born, grow, work, live, and age and the wider set of forces and systems shaping the conditions of daily life.” These circumstances are often shaped by money, power, and resources at global, national, and local levels. 

According to recent research, health behaviors, social and economic factors, and the physical environment drive 80-90% of patient health outcomes, while clinical care accounts for only 10-20%.

This relationship has become widely recognized over the past few years, and health plans and providers have started looking for new ways to address SDoH and its significant effect on individual health outcomes. Still, SDoH data remains underutilized.

From 2017 to 2021, the six top payers in the U.S. spent less than 1% of their net income on SDoH efforts. One way to expand these efforts without significant financial investment is to incorporate SDoH data into machine learning models.

Why Should We Incorporate SDoH Into Machine Learning Models?

The greatest benefit of incorporating SDoH data into machine learning models is that it can help health plans and providers reduce their populations' likelihood of developing serious chronic conditions. It also makes it possible to identify effective interventions for addressing SDoH, improve model performance, and evaluate fairness and reduce bias in healthcare predictive models.

The main sources of SDoH data include government and research institutions, surveys, and individual diagnoses. This data can be represented explicitly, as specific features describing patient characteristics such as income, access to healthcare, and living conditions.

When incorporating SDoH into Diagnostic Robotics' machine learning models, we use composite measures, which combine multiple SDoH characteristics into a single ranking metric.

Composite measures usually represent the socioeconomic status of a specific geographic area, and their components can be weighted by their effects on health outcomes. Organizing the data this way summarizes SDoH information across groups of people, which may help reduce bias in machine learning models.
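To make the idea of a composite measure concrete, here is a minimal sketch. It is an illustration under stated assumptions, not Diagnostic Robotics' method: it uses three hypothetical area-level indicators (`median_income`, `no_diploma_pct`, `crowded_housing_pct`), standardizes each to a z-score across areas, orients them so that higher always means more deprived, combines them with hypothetical `WEIGHTS` standing in for each indicator's effect on outcomes, and reduces the result to a 0-100 percentile rank. All values are invented for illustration.

```python
from statistics import mean, pstdev

# Hypothetical area-level indicators keyed by an area ID (e.g., a ZIP code).
# All values and weights below are illustrative assumptions, not real data.
AREAS = {
    "area_a": {"median_income": 32000, "no_diploma_pct": 22.0, "crowded_housing_pct": 9.0},
    "area_b": {"median_income": 58000, "no_diploma_pct": 11.0, "crowded_housing_pct": 4.0},
    "area_c": {"median_income": 91000, "no_diploma_pct": 5.0, "crowded_housing_pct": 1.5},
    "area_d": {"median_income": 45000, "no_diploma_pct": 15.0, "crowded_housing_pct": 6.0},
}

# +1 if a higher value means more deprivation, -1 if it means less.
DIRECTION = {"median_income": -1, "no_diploma_pct": 1, "crowded_housing_pct": 1}

# Hypothetical weights standing in for each indicator's estimated effect on outcomes.
WEIGHTS = {"median_income": 0.5, "no_diploma_pct": 0.3, "crowded_housing_pct": 0.2}


def composite_deprivation(areas, direction, weights):
    """Return {area_id: (score, percentile)}; higher means more deprived."""
    # Mean and population standard deviation of each indicator across areas.
    stats = {}
    for feature in weights:
        values = [indicators[feature] for indicators in areas.values()]
        stats[feature] = (mean(values), pstdev(values) or 1.0)  # guard constant columns

    # Weighted sum of direction-adjusted z-scores.
    scores = {
        area_id: sum(
            weights[f] * direction[f] * (indicators[f] - stats[f][0]) / stats[f][1]
            for f in weights
        )
        for area_id, indicators in areas.items()
    }

    # Reduce to a single 0-100 ranking metric (0 = least deprived area).
    ordered = sorted(scores, key=scores.get)
    last = max(len(ordered) - 1, 1)
    return {area_id: (scores[area_id], 100.0 * rank / last)
            for rank, area_id in enumerate(ordered)}


for area_id, (score, pct) in sorted(composite_deprivation(AREAS, DIRECTION, WEIGHTS).items()):
    print(f"{area_id}: score={score:+.2f}, percentile={pct:.0f}")
```

Published composite indices such as the Area Deprivation Index use their own indicator sets and weighting schemes; the weights here are placeholders meant only to show where outcome-based weighting would plug in.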

SDoH data can be especially useful in machine learning models that predict individual health outcomes, such as congestive heart failure (CHF)-related visits.

Using SDoH to Predict & Prevent CHF 

Traditionally, machine learning models that target CHF prioritize major population risk drivers such as demographics, past utilization, and comorbidities. Diagnostic Robotics takes it one step further by adding income, education, and housing data to boost model performance.
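The feature-augmentation step can be sketched as a join between claims-derived features and area-level SDoH features. This is a minimal sketch under loud assumptions: each member carries a hypothetical `area_id`, the income, education, and housing columns are hypothetical area-level fields, and members whose area has no SDoH record receive median-imputed values plus a missing-data flag. None of these names or choices come from Diagnostic Robotics' pipeline.

```python
from statistics import median

# Hypothetical claims-derived inputs for a baseline CHF risk model.
CLAIMS_FEATURES = [
    {"member_id": "m1", "area_id": "A", "age": 52, "prior_er_visits": 1, "comorbidity_index": 0},
    {"member_id": "m2", "area_id": "C", "age": 76, "prior_er_visits": 2, "comorbidity_index": 6},
    {"member_id": "m3", "area_id": "X", "age": 67, "prior_er_visits": 0, "comorbidity_index": 3},
]

# Hypothetical area-level SDoH features (income, education, housing) keyed by area ID.
SDOH_BY_AREA = {
    "A": {"median_income": 32000, "no_diploma_pct": 22.0, "crowded_housing_pct": 9.0},
    "B": {"median_income": 58000, "no_diploma_pct": 11.0, "crowded_housing_pct": 4.0},
    "C": {"median_income": 91000, "no_diploma_pct": 5.0, "crowded_housing_pct": 1.5},
}

BASELINE_COLUMNS = ["age", "prior_er_visits", "comorbidity_index"]
SDOH_COLUMNS = ["median_income", "no_diploma_pct", "crowded_housing_pct"]


def build_feature_rows(claims, sdoh_by_area, include_sdoh=True):
    """Return (member_id, feature_vector) pairs for model training or scoring."""
    # Median of each SDoH column, used when a member's area has no SDoH record.
    fallback = {c: median(a[c] for a in sdoh_by_area.values()) for c in SDOH_COLUMNS}
    rows = []
    for member in claims:
        vector = [float(member[c]) for c in BASELINE_COLUMNS]
        if include_sdoh:
            area = sdoh_by_area.get(member["area_id"])
            source = area if area is not None else fallback
            vector += [float(source[c]) for c in SDOH_COLUMNS]
            vector.append(1.0 if area is None else 0.0)  # flag imputed SDoH values
        rows.append((member["member_id"], vector))
    return rows


baseline_rows = build_feature_rows(CLAIMS_FEATURES, SDOH_BY_AREA, include_sdoh=False)
sdoh_rows = build_feature_rows(CLAIMS_FEATURES, SDOH_BY_AREA)
```

Building both feature sets with the same function keeps the baseline and SDoH-integrated inputs identical apart from the added columns, so any difference in their risk scores can be attributed to the SDoH features.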

In a case study, we compared a baseline machine learning model for CHF with its SDoH-integrated counterpart across levels of the comorbidity index. For patients with a high comorbidity index (i.e., diagnosed with many chronic conditions), adding SDoH data increased their risk scores. This indicates that the risk scores of patients with multiple chronic conditions are the most affected by social deprivation measures, and that these patients tend to have more avoidable emergency department visits.
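A comparison like this can be sketched as the average change in risk score within each comorbidity stratum. This is a minimal sketch, not the case study's actual analysis: the records are toy numbers (not results from the study), and the cut points in the hypothetical `comorbidity_stratum` helper are arbitrary placeholders.

```python
from collections import defaultdict
from statistics import mean

# Toy per-member scores from a baseline and an SDoH-integrated model (not real results).
SCORED = [
    {"comorbidity_index": 0, "baseline_score": 0.10, "sdoh_score": 0.11},
    {"comorbidity_index": 1, "baseline_score": 0.12, "sdoh_score": 0.12},
    {"comorbidity_index": 3, "baseline_score": 0.30, "sdoh_score": 0.33},
    {"comorbidity_index": 6, "baseline_score": 0.55, "sdoh_score": 0.65},
    {"comorbidity_index": 8, "baseline_score": 0.60, "sdoh_score": 0.68},
]


def comorbidity_stratum(index):
    """Map a comorbidity index to a stratum; the cut points are hypothetical."""
    if index <= 1:
        return "low"
    if index <= 4:
        return "medium"
    return "high"


def mean_score_change_by_stratum(records, stratum_of=comorbidity_stratum):
    """Average (SDoH-model score minus baseline score) within each stratum."""
    deltas = defaultdict(list)
    for r in records:
        deltas[stratum_of(r["comorbidity_index"])].append(r["sdoh_score"] - r["baseline_score"])
    return {stratum: mean(values) for stratum, values in deltas.items()}


for stratum, change in mean_score_change_by_stratum(SCORED).items():
    print(f"{stratum}: mean change {change:+.3f}")
```

Passing the stratification function as a parameter makes it easy to repeat the same comparison with other groupings, such as socioeconomic deciles.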

Our research also showed that most sick patients with lower economic status receive a higher risk score once SDoH information is integrated, and that they are more prone to chronic condition-related ER encounters. The more surprising finding, though, was the cases where integrating SDoH data moved a patient's risk score in the opposite direction of their comorbidity index while producing more accurate predictions. See the two patient examples below:

Patient 1 - Male, age 41. Relatively young and healthy, without chronic conditions. After his SDoH information was integrated, his predicted risk of chronic illness hospitalization increased by 50%. This is due to his relatively low socioeconomic status, which indicates limited access to healthcare services. In follow-up, this patient had chronic condition-related inpatient (INP) and ER encounters, incurring high costs.

Patient 2 - Female, age 84. Diagnosed with 9 different chronic conditions. Given her high socioeconomic status, her predicted probability of preventable INP admissions decreased by 30% after her SDoH information was integrated.

This doesn't mean that such patients won't deteriorate, but rather that an exacerbation won't necessarily lead to preventable INP or ER utilization, possibly because alternative, less costly treatment options are more accessible to populations with higher economic status.

What’s Next for SDoH? 

We've made significant strides toward proving the benefits of using SDoH in machine learning. However, there is still much work to do, and buy-in to earn from payers and providers, before this practice becomes part of their existing population health management efforts. Our team is actively seeking additional SDoH data to include in our models, and we hope the healthcare industry comes to see this practice as necessary for achieving the best possible health outcomes among patients.

I'm optimistic that using SDoH data will help remove socioeconomic bias from today's healthcare system and give patients fairer access to the care they need.

About the Author: 

Tal Geller is a Data Science Team Lead at Diagnostic Robotics. He is a graduate of Tel-Aviv University, where he focused his research on machine learning for healthcare applications in cooperation with Stanford University. Tal also served in the Israeli Intelligence Corps prior to finishing his studies. 
