Pandata Blog

AI design and development for high risk industries

Healthcare Startup Uses ML to Predict Missing Patient Data

At Thrivable, a market research platform that connects patients and healthcare companies to create better products and services, quality data is at the foundation of all business operations.

Yet over the years, Thrivable began noticing a growing number of patient profiles with missing information. As the number of profiles with data gaps continued to rise, Thrivable recognized a need for data science and machine learning to alleviate the issue, ultimately helping the organization more effectively match companies with targeted patients. For strategic support with these ML initiatives, Thrivable partnered with Pandata.

Assessing the Depth of Missing Patient Information

Thrivable has a robust database of patients, primarily focused on diabetes-related information. The platform collects countless survey answers each day, and while some patients provide complete responses, some surveys are submitted with empty data fields.

Because of this, the platform had 100,000 profiles with various degrees of missing information, ranging from incomplete survey answers to rare disease profiles. As the platform grows, additional diseases and questions will continue to be added, leading to increasing amounts of incomplete data.

To impute missing data and prioritize profiles at this volume, Thrivable required an ML solution its team could trust.

Narrowing the AI Use Case

Before any design took place, Pandata and Thrivable narrowed the core focus of the project: Attempt to impute the missing values within several key survey fields.

Data size, data quality, and the volume of labeled data all shaped the teams’ decision for the project: without enough quality training data, it would be impossible to build a predictive model for missing responses.
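To make the idea of imputing a survey field concrete, here is a minimal Python sketch of a very simple baseline: fill a missing answer with the most common answer among panelists who share another known attribute. This is not the model Pandata built (that work used H2O Driverless AI), and the field names `diabetes_type` and `uses_cgm`, the toy profiles, and both helper functions are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy profiles; None marks a survey answer that was left blank.
profiles = [
    {"diabetes_type": "type_1", "uses_cgm": "yes"},
    {"diabetes_type": "type_1", "uses_cgm": "yes"},
    {"diabetes_type": "type_2", "uses_cgm": "no"},
    {"diabetes_type": "type_2", "uses_cgm": "no"},
    {"diabetes_type": "type_2", "uses_cgm": "yes"},
    {"diabetes_type": "type_1", "uses_cgm": None},
    {"diabetes_type": "type_2", "uses_cgm": None},
]


def fit_conditional_mode(rows, feature, target):
    """Learn the most common target answer for each value of the feature.

    Only rows where both fields are present are used as training data.
    """
    counts = defaultdict(Counter)
    for row in rows:
        if row[feature] is not None and row[target] is not None:
            counts[row[feature]][row[target]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def impute(rows, feature, target, lookup):
    """Return copies of the rows with missing targets filled in.

    Each returned row carries a 'predicted' flag so imputed answers are
    never confused with answers the panelist actually gave.
    """
    filled = []
    for row in rows:
        new_row = dict(row)
        new_row["predicted"] = False
        if new_row[target] is None and new_row[feature] in lookup:
            new_row[target] = lookup[new_row[feature]]
            new_row["predicted"] = True
        filled.append(new_row)
    return filled


lookup = fit_conditional_mode(profiles, "diabetes_type", "uses_cgm")
imputed = impute(profiles, "diabetes_type", "uses_cgm", lookup)

for row in imputed:
    print(row)
```

Keeping a `predicted` flag on every row is a deliberate choice in this sketch: it lets downstream users tell a model's guess apart from a real response. The sketch also shows why the volume of labeled data matters, since the lookup can only be learned from profiles where the answer is already known.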

Developing and Building the Right Predictive Model

After devising and prototyping a solution for the missing panelist responses, Pandata helped the Thrivable team configure and navigate the required cloud infrastructure: Snowflake and H2O Driverless AI.

By utilizing Snowflake’s cloud storage and its integration with H2O, Pandata was able to significantly reduce the time spent preparing data for modeling, training models, iterating, and optimizing. AutoML tools like Driverless AI enable rapid model prototyping, reducing the data science workload and accelerating time to value.

“You can’t just go buy these tools. They are so prohibitively expensive for a small company. They require so much expertise to set up and use, that if we were starting from scratch, we would not be able to manage this.” — David Edelman, Thrivable CEO/Founder

Using these platforms, Pandata was able to recommend the most suitable model for Thrivable’s problem, allowing the team to immediately test and deploy it for current fields and predictions.

New Data Predictions Improve Outreach Efforts

Pandata validated this approach by first applying it to 10 challenging fields in Thrivable’s patient panels where up to 70% of panelists did not provide a response. Focusing on these fields, the team successfully trained ML models to predict missing values, including a model that correctly predicts 82% of panelist health conditions.
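For readers wondering how an accuracy figure for an imputation model can be measured at all, one common approach is a holdout test: hide a portion of the answers that are already known, predict them, and count how often the prediction matches. The Python sketch below illustrates that idea with a simple most-common-answer predictor. It is an assumption about the general technique, not a description of how Pandata evaluated its models, and the function name, field names, and toy data are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def holdout_accuracy(rows, feature, target, test_fraction=0.25, seed=0):
    """Estimate imputation accuracy by hiding known answers and predicting them.

    The predictor is a simple baseline: the most common target answer seen
    in training for the same feature value, falling back to the overall
    most common answer when the feature value was not seen in training.
    """
    known = [r for r in rows if r[feature] is not None and r[target] is not None]
    if len(known) < 2:
        raise ValueError("need at least two fully answered rows to evaluate")

    # Shuffle reproducibly, then split into held-out test rows and training rows.
    shuffled = known[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]

    # "Train" the baseline on the rows that were not held out.
    counts = defaultdict(Counter)
    for r in train:
        counts[r[feature]][r[target]] += 1
    overall = Counter(r[target] for r in train).most_common(1)[0][0]

    # Score predictions against the hidden true answers.
    correct = 0
    for r in test:
        seen = counts.get(r[feature])
        guess = seen.most_common(1)[0][0] if seen else overall
        correct += guess == r[target]
    return correct / len(test)


# Hypothetical toy panel with fully answered profiles.
toy_panel = (
    [{"diabetes_type": "type_1", "uses_cgm": "yes"} for _ in range(20)]
    + [{"diabetes_type": "type_2", "uses_cgm": "no"} for _ in range(20)]
)
accuracy = holdout_accuracy(toy_panel, "diabetes_type", "uses_cgm")
print(f"holdout accuracy: {accuracy:.0%}")
```

Because the held-out answers are real panelist responses, the resulting percentage reflects how the model behaves on data it has not seen, which is what matters when its predictions are used to fill genuine gaps.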

Rather than reaching out to all 100,000 panelists with missing responses, Thrivable can now perform more selective and cost-effective outreach to panelists with predicted values. This will translate into more panelists eligible for research studies, greater confidence in being able to field new studies, and more opportunities for study improvement and expansion. With organizations regularly reaching out to Thrivable to request new studies, this presents the opportunity to service over $1 million in additional business per year and to save over $70,000 in recruitment expenses.

In the future, Thrivable could also present the predicted data to interested companies. For example, if a company is interested in people who are using a particular blood glucose monitoring system, Thrivable can present them with two options: (1) a survey with partial information or (2) a set of ML-predicted potential responses that could be valuable.

Educating and Empowering Model Users

To ensure that Thrivable can accurately draw conclusions from the predictive models, make changes, and deploy new models in the future, Pandata equipped the team with robust education and training resources.

A greater understanding of how the models work ultimately leads to the responsible use of AI.

Looking for a Similar Solution?

Our data scientists are happy to strategize how responsible AI can make a positive impact on your organization. If you’re ready to explore what AI can do for your company, let’s set up a time to chat.

Contributors: Bob Wood, Data Science Consultant at Pandata & Stevan Zlojutro, Senior Data Analyst at Thrivable.