Building Cohorts with AI

Harnessing the power of LLMs to unlock data for non-technical users

Summary:

Greenfield AI-powered product aimed at tackling cohort building
Cohort building is worth tackling because it is time-intensive and requires multiple team members with varying clinical and technical expertise to collaborate.
If we can make this workflow more efficient without losing accuracy, we will unlock a major operational win that affects every stage, from sales through data delivery, of the company’s primary revenue stream.

Background

Cohort building is one of the most fundamental workflows at Flatiron Health. It’s the process by which we work with our clients to identify patients of interest in our data. We begin with a rough estimate early in the sales process, just enough to make a go/no-go call on whether we have the right data to support a study. After many intermediate steps, we end with a refined cohort of patients whose relevant data has been fully cleaned by the time of delivery. The types of cohort building are:

  
    
| | Pre/Early Scoping (customer interest) | Detailed Scoping (contract negotiation) | Execution (final deliverable) |
| --- | --- | --- | --- |
| Expected Turnaround Time | 3–5 days per round | Varies, but generally several weeks | <1 week |
| Complexity of Criteria | Low–Medium | Medium–High | Highest |
| Margin for Error | High (10–20%): directional counts to inform whether an opportunity is worth pursuing | Low (<5%): high degree of confidence required to commit to an SOW | Zero: inaccurate deliverables incur redelivery costs and reputational damage |
  

Because cohort building is so core to our business, it’s a problem we’ve tried to solve many ways at Flatiron. Earlier attempts fell short because it’s hard to translate a clinical need into a technical query that maps to Flatiron’s data models. The rise of LLMs suddenly offered a new way to unlock efficiencies in what has been a very labor-intensive process. Could LLMs help us make that translation without relying on a back-and-forth process among a panel of technical experts? We wanted to answer that question cheaply and quickly before investing in a longer-term solution.

Phases of User Research and Product Design

18% of deals are lost for self-inflicted reasons
12 weeks to scope a project
9 functions to assess feasibility

I wanted to learn a lot from user research, and to learn it quickly while staying one step ahead of the engineering team. Over a few months, I went through several stages of research and rapid prototyping. The stages, and what I wanted to find out from each, are listed below:

Discovery

What part of cohort building should we tackle and who should our primary user be?

Process:

Interviewed a broad range of people across 5–6 departments, all of whom did some form of cohort building
Aimed to understand who had the most acute pain points and whom an LLM solution had the greatest potential to help, given the technology’s strengths (language processing, ability to learn from all of our internal documents) and weaknesses (precision)

Findings:

Early scoping stood to benefit most from this type of tool because it requires the least precise cohort and the pain of the current process is acute. It currently takes 12 weeks to scope a project, and a big part of that is the back-and-forth required to answer the question of what kind of cohort we could offer a client.

Exploratory

What shape should our solution take and what pain points and details about the existing workflows should we understand to be successful?

Process:

Focused on 2 key roles involved in early scoping
Dug into their current workflows to understand the nuances of what’s difficult about current processes
Tested a rough prototype to gauge how much users would trust an AI tool and how chat might work alongside data visualizations

Findings:

Validation

How confident can we be that our sales team can independently use this tool and significantly cut down on the back and forth currently needed to develop an initial cohort?

Process:

Because the engineers were still building the product to a testable state, we needed a creative way to test the riskiest parts of our proposed solution without waiting for the full product
We tested a spreadsheet filled with realistic dialogue that mimicked how our LLM would talk to our sales team, including the questions it would ask the user in order to reach a usable cohort definition

Findings:

The sales team could answer 90% of the follow-up questions from our LLM independently and with confidence
The remaining 10% depended on their level of experience, but even the least experienced team members knew whom to ask or where to look for the answer
We therefore felt confident in our direction and moved forward with the chat-based approach

Current Phase

We are now working on releasing the MVP for an initial pilot with our sales team. We’ve cut some of the initial features to focus on those that let us test whether pilot users are generating both accurate questions and accurate answers. After that, we plan to add more advanced data visualizations, such as stratification of patient characteristics, survival curves, and Sankey diagrams.