
How Data Can Revolutionize Healthcare Research and Innovation

Healthcare Rethink - Episode 39

Healthcare Rethink’s Brian Urban examines data's impact on revolutionizing healthcare research and innovation with his guest Shubh Sinha, the CEO and Cofounder of Integral, a company at the forefront of addressing data compliance challenges in healthcare. Their conversation navigates the intricacies of data management, compliance and how these elements play into healthcare innovation.

Episode Transcript

Brian Urban:

Yes, this is the Healthcare Rethink podcast. I'm your host, Brian Urban, and today we are talking all things data, data, data. We have co-founder and CEO of Integral Shubh Sinha. Shubh, welcome to our little show.

Shubh Sinha:

Yeah, thank you for having me. I'm looking forward to chatting today.

Brian Urban:

This is going to be really exciting. We've gotten to know each other a little bit before the episode here, and I'm really excited to get into your organization that you've co-founded here. It seems like a production out of Purdue University. Maybe your learnings there and your influence across the ecosystem as well. But with every episode we first got to get into who you are and how this organization became real and some of the impacts you're looking to make as well. So take us back before you have the CEO title. Who's Shubh and how did you find yourself founding a healthcare technology data organization?

Shubh Sinha:

Yeah, for sure. It's been a while since I thought about the early days, so I'm looking forward to talking about it. So my name is Shubh, as you mentioned. I'm from the Nashville area originally, so a suburb of Nashville, about 30 minutes south, so pretty typical smaller suburban town. And then my story goes from there to Purdue where I really got my first exposure to software engineering, programming, all things data, I would say. I became pretty interested in large data systems. Think about applications like streaming interfaces like Netflix, Hulu, all these things. They're very fun. But also I personally started seeing, "Oh, these are cultural snapshots in a way just captured in data formats." And so that's where a lot of that nerd behavior came out and then started working as a software engineer.

All the meanwhile, I would say the only other part of my life that probably got as much importance as tech was soccer. And so that's a big part of me as well. Indoor, outdoor, just anything I can make time for it these days. But overall then made my way out to the bay for a couple of years. So I was living in SSF for about two and a half years working at this data company called LiveRamp. LiveRamp is a public ad tech company, and the reason I call them a data company is because even though they're enabling targeted ads, like most online ads you've ever seen are probably at some point have been through the LiveRamp infrastructure. The core product was actually helping massive retailers like Walmart, Target, really any kind of big company you can think of leverage all of their data sets.

So let's take a simple use case of Walmart has their app data, they have their in-store data, they have online consumer data sets they can buy. They put all that together such that they can get a 360 degree view of people like you and me, and then they use that for better inventory ordering, better optimization around supply chain, whatever you want. The example I always go with is if there's a bunch of Snickers eaters in a particular town, Walmart can understand that scale and order more Snickers to that particular location. And that's the most benign example I would say. But I was over there for a while and I was actually on this Google X like team. So I was on the side doing a bunch of experiments with the core tech in different areas. So basically how do you apply this core data technology to different areas?

And healthcare was one area that caught fire. I joined kind of in the early days of when they were forming their healthcare business. And the idea was the same: help the Pfizers and the Modernas of the world, as well as small health tech companies, take all this very sensitive healthcare data, so like our diseases, our history of surgeries, really any kind of medical data you can get your hands on, and pair that with these other rich data sets so that you can find out how to get medications to people in a much more custom way or get awareness of certain medications.

Because as you might imagine, your non-medical profile really heavily influences your medical profile. And so for a data problem that's pretty sizable, and so I discovered a certain amount of HIPAA compliance problems and whatnot that led me to Integral and starting it and happy to dive more there, but I would say that's a lot of my background where I noticed something in another industry and then I noticed something in healthcare that was just not possible. And so I wanted to bridge the gap and that's what Integral is an attempt to do.

Brian Urban:

And I love that background. It seems like it's almost an emerging trend in a couple ways. One from a public health perspective, which public health is often shoved in the research corner and not brought to the forefront of saying, "Hey, there's a lot of nonclinical impacts that are affecting someone's health outcome, health status, access to care," things like that that you touched upon. But also something you said kind of makes me think of the future versions of artificial e-commerce almost knowing who's wanting a Snickers bar in your example and when, and that order showing up at your door, in your car or something like that. It's so funny. We'll talk about future things in a moment, but love your background because it speaks to where healthcare needs to go in terms of seeing the person first and then the patient. And with healthcare and Integral in particular, it seems like we're finally adopting these nonclinical data workflows and getting them integrated.

So to improve some of the speed and privacy of blending new data types in healthcare. I think a lot of organizations look to encryption or tokenization where it can make it easier for data exchange to happen and for healthcare to start to see some of those nonclinical things like you were mentioning. I was thinking of Integral in that context almost, but I want to get our audience more familiar with Integral and some of the work you're doing. So do you play in that space? Are you touching that today? Get us up to speed on some of the work Integral is doing.

Shubh Sinha:

For sure, and happy to dive into Integral and our mission and what we've done and where we want to go. The quick example I like to have serve as a North Star for Integral is derived from my experience at LiveRamp. So what I noticed being on just one side of the house and a particularly small part of the business segments at LiveRamp was we were helping Pfizer, Moderna, et cetera, basically operate just like Walmart in that one example I made, except instead of Snickers bars, it's medications. And so the crux of the problem I found was that I was a product manager and I was talking to other product managers from all across the company. I was like, you get to analyze data in hours to days and we have to wait weeks to months, because everybody wants to create the same master dataset, whether it's Walmart or Pfizer.

The difference is Pfizer is taking heavily regulated healthcare data. So take a set of pharmacy claims, for example, the medications that people have been prescribed. Say Pfizer wants to combine that with a social demographic dataset to understand, "Hey, what kind of population is using a particular kind of medicine, such that we can either raise awareness of it faster or we can distribute it in a more bespoke way?" Because there's a difference between how I get my medication in Smyrna, Tennessee, where I'm from, versus Manhattan, where I reside today. And there's a very legitimate business decision to make. Pfizer has to wait 13 or 14 weeks to go through data compliance processes. And where that comes from is that HIPAA states that if you're going to combine all these sensitive de-identified healthcare data assets together with any other type of data asset, you have to go through a HIPAA approval on that specific dataset.

The way I like to think about it is Pfizer has to ask permission to run a query basically, whereas a Walmart doesn't, they can just go out and buy their data sets and do whatever they need, whereas Pfizer has to wait. And while it's a standard in the healthcare industry to wait long times for compliance processes, commercial quarters, delivery of medications, that stuff doesn't really understand compliance. You still have to have real world decisions with real world timelines. You can't just wait on the compliance piece. And so that core perspective is what drove me to start Integral, where Integral reads data, so sensitive healthcare data assets for example, programmatically assesses those data sets through a number of privacy models that we've baked in, and then it flags the problematic parts of a dataset via a UI. And those problematic parts are defined as noncompliant columns basically.

And it lets a data scientist just quickly customize a dataset to their liking such that a data scientist can take the business context that's in their head and the business goals they want to satisfy and then quickly experiment with different cuts of the data to understand what is compliant and what is not. And we have a privacy score that in real time goes up and down and really shows you, "Hey, am I okay or am I not okay to analyze this data set?" And so once you're okay, below a certain threshold, we basically surface this go button for you or this finish button for you where you can hit the button and we'll spit out a clean, remediated, compliant dataset. And we'll also give you all of the documentation you need such that compliance departments and business departments have no friction.
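To make the idea of a score that moves as columns are added or removed a little more concrete, here is a minimal sketch. It is not Integral's engine or its privacy models, which the conversation does not describe. It assumes a simple stand-in metric (the fraction of rows that are unique across the selected columns), and the names `risk_score`, `column_contributions`, and `THRESHOLD`, along with the sample rows, are all hypothetical.

```python
from collections import Counter

# Hypothetical cutoff; a real threshold would come from a compliance policy.
THRESHOLD = 0.2


def risk_score(rows, columns):
    """Fraction of rows whose combination of values across `columns` is unique.

    A stand-in for a privacy score: unique rows are the easiest to re-identify.
    """
    if not rows or not columns:
        return 0.0
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(rows)


def column_contributions(rows, columns):
    """Leave-one-out view: how far the score falls when each column is removed."""
    base = risk_score(rows, columns)
    return {
        c: base - risk_score(rows, [x for x in columns if x != c])
        for c in columns
    }


# Made-up sample data for illustration only.
rows = [
    {"zip": "37167", "age": 34, "drug": "A"},
    {"zip": "37167", "age": 35, "drug": "A"},
    {"zip": "37167", "age": 51, "drug": "B"},
    {"zip": "37167", "age": 52, "drug": "B"},
    {"zip": "10001", "age": 29, "drug": "A"},
    {"zip": "10001", "age": 30, "drug": "A"},
]

all_columns = ["zip", "age", "drug"]
contributions = column_contributions(rows, all_columns)

# Columns whose removal lowers the score are the ones worth flagging.
flagged = [c for c, delta in contributions.items() if delta > 0]

# After dropping the flagged column, check the score against the threshold.
kept = [c for c in all_columns if c not in flagged]
ready = risk_score(rows, kept) < THRESHOLD
```

In this toy data every row is unique until `age` is removed, so `age` is the only flagged column and the remaining cut falls below the hypothetical threshold. A production score would rest on established de-identification methods rather than this single uniqueness measure.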

Brian Urban:

I love that you touched on friction, you ended with that. Frictionless is a theme that is going across the healthcare ecosystem, whether it's a revenue management, whether it's data exchange, whether it's medication access, everything you can think of in terms of creating a more of a seamless process for a lot of different things. So I'm glad you walked us through that because that gives a very real example of what Integral is doing today. I actually wanted to take it one step further and talk about something that you had recently posted in your blog about Robin AI. This is a new project that you all have underway, I believe, and still ongoing. Can you get us up to speed about this? Because I think this is where a lot of the real things you're doing is going to showcase not only the challenges we have with what you're saying in terms of compliance and speed and finding these risk indicators and how do you remediate them from a data perspective. So tell us about Robin AI. This is an exciting project.

Shubh Sinha:

Yeah, for sure. So Robin AI, it's kind of like this side product that we launched because it was an internal tool for a while. And so just kind of jumping back to how we are as a company, Integral is an automated compliance company, but I also think about us as a data company because really what we're powering is extremely rich, insightful, compliant data connections. And so internally we have any number of data optimizations happening with our internal models. So whether that's our internal reference data sets, whether that's our code reviews, whether that's our technical architecture, we noticed we want that continuously improving and as autonomously as possible. And so with the AI wave going on and with my co-founder and CTO being as brilliant as he is, he leveraged a lot of what was coming out to build Robin AI, which enhances a lot of our technical work internally, whether that's the models we use, whether it's the code we push all the way down to even the types of data sets that we can quickly analyze and some stuff that needs custom work.

Robin AI is an AI agent of sorts that helps us increase our technical proficiency, which as a result gets the clients faster results, higher fidelity results. And so the idea is let's improve as much as we can internally with regards to our data systems and our data processing so that we can produce faster results, which I would say Integral's main value is speed and maybe a secondary value is quality of data as well, just because today it typically takes 13 or 14 weeks to accomplish the same thing we do in hours to days because today it's primarily consultant driven.

And so anyway, back to Robin AI, it was an internal tool that we used for a while to speed up processes and increase the fidelity of processes. And we had the idea why not open it up and let organizations use it. And so while it's not the main product that we power, I do think it speaks to the ethos of the company that we're continuously improving, continuously providing infrastructure for other companies to be able to build on and really increase the fidelity of whatever their data operations are. And so hopefully that provided a little bit of insight to Robin.

Brian Urban:

I think it's really interesting and two things that you said, Shubh, one being agent and two autonomous. You make me think of RPA robotic process automation. And I think being able to have something that can continue to learn, take labor away from humans that provides more accuracy and speed is a win in so many different ways, especially when we're talking about compliance. And on that I'm looking across the healthcare ecosystem and there's just loads of different lawsuits, loads of different hurdles and barriers for all kinds of creative things that need to be put to market. And I think about privacy being one of the biggest challenges and one of the biggest things with interoperability, and we were talking to Dr. Lane recently, the chief medical officer of Health Gorilla, and he had gotten into privacy risks. So I want to get your perspective on privacy risks in our healthcare ecosystem today. Do you have any that come to mind? Is there a top three list of privacy risks that you are seeing emerging or that are just staples today?

Shubh Sinha:

So I think about privacy risks in two ways. One, you have the very tactical risk elements, where some companies have infrastructure that may be outdated and is more susceptible to attack or more susceptible to leakage. And so those are a bit more of the very quantitative, like, "Hey, your system needs an upgrade," or you need to switch to more compliant systems, or you're trying to keep up with modern tooling, but the speed of innovation outpaces the speed of compliance. So there's a couple of those challenges that I think are just kind of head on, that you can explicitly see. The other challenges I see, that are bigger in the system and probably spawn a bunch of sub-challenges, are the human behavior challenges. And so what I mean by that is these massive, massive companies or even small healthcare startups, everybody wants to have the best data, the best analysis, the best ETL tool, whatever that product might be. They want to provide maximum value to their customers.

And today, privacy and compliance, and I'll use those together for the sake of this conversation, those are seen as blockers, not enablers. And I think that's a human behavioral challenge in this space, meaning there's a lot of reasons to be cautious around working with healthcare data. That being said, the value of healthcare data is this treasure trove of insights that you can't find anywhere else. And so part of our cultural values, and I would say even our mission as a company, is to treat privacy as an enabler to build trust and treat privacy as this sort of mechanism to further your work, not block it. And so traditionally with solutions like consultants, who are the main alternative to Integral, privacy has always been this long process that you have to throw multiple internal resources at. And oftentimes the resulting data set is, or in my case, the resulting data set isn't so great. And so privacy resulted not only in long processing times, but also the end result of what you get doesn't really get you what you need.

Whereas what we've done is turned privacy and compliance into an infrastructure to the degree that it's almost an afterthought, because when you're using Integral, you're getting a compliant output. So you're really focused on the dataset, the end business goal. Whatever is basically outside compliance concerns, that's what you focus on. And I think that's where we've noticed when we solve that challenge, we tend to unlock a lot of value and solve a bunch of the smaller challenges. And so that's how we think about compliance and privacy risk on our side, where if you turn it into an infrastructure and if you transform it from this stopgap to this enabler, you tend to produce a lot of value very quickly because that means compliance departments are happy, the people are happy, their data is being used very well, and then the business stakeholders who wanted to maximize the usage of the data they just spent millions of dollars on, they're happy as well. Win-win-win is possible.

Brian Urban:

It's always a good thing to happen is a win-win-win. I like what you mentioned a few statements ago, which was the end business goal or end business objective. And I think a lot of healthcare data in terms of regulation and compliance is so tightly wrapped and we think about nonclinical data, that being of orientation to finance credit, socioeconomic data, a lot of that comes down to permissible use. And I don't think that it all connects into the healthcare side. But in terms of ingesting nonclinical data into a clinical workflow to show more of a person's challenges and how a healthcare institution or research institution should think about these and address these when we're thinking about advancing health equity is a whole nother game.

But making sure that it is identified at the match for an individual and it's still within the right use is very important. So I'm thinking about Integral in terms of data quality, patient privacy. A lot of your tech does that simultaneously, I think about speed. But is there anything else that you can tell me why this would matter to the healthcare ecosystem, having data quality and privacy at the patient level synthesize simultaneously? Is there more than just speed?

Shubh Sinha:

Certainly. Just thinking about privacy and data quality, I was taught in my foray into healthcare that those two cannot be maximized together. And for me, coming from a technical and business background, I realized you don't need everything you think you need. You only need the key variables or the key aspects. There's always a power law of sorts where some amount of a dataset, for example, gets you most of your learnings. So it was that insight that inspired Integral, and that's why I wanted to venture into Integral, because I realized if I can power this for experimentation across the board, so the Pfizers and the Modernas that we all know as well as the aspiring health tech companies that seek to provide better services, better products, even new drugs, that has an outsized effect. Because once you automate the question of "Can I do this," you get a lot of learnings on what are the results of doing this over the course of many companies, many years, many workflows.

And so that's where, to me, data quality and privacy have never been at odds, because it's all a matter of making sure you customize for a specific use case and you enable that to happen as quickly as possible so that you can try many use cases, you can try many approaches. I think that's been a fundamental difference in healthcare versus other industries, where in healthcare you've had to pick and choose your shots in a way because processes take so long. But I think being data-driven in healthcare is no different than being data-driven in any other industry. And Integral enables Pfizer to be just as data-driven as like a Walmart, for example.

Brian Urban:

That's extremely helpful because it has been thought of as competing [inaudible 00:20:31] or two different buckets, quality and privacy. And I like how you broke it down, and I think this is a miss across a lot of individuals that don't sit inside a data analytics, data science or even healthcare informatics type of role: you don't need to have everything. You need to have the right stuff at the right time, in particular for whatever you're developing, a program, a better UI for a care manager at a health plan, et cetera. So I'm glad that you broke that down because that's so overwhelming for any organization to think, "I need to have 100% on every possible data element on the lives we serve." It's a very big feat and it's unrealistic to think in those terms. But you also made me think about remediation too. So I want to go through a data journey real quick here, if you don't mind.

So I saw your piece on synthetic data analysis and I want to have our listeners understand a little bit more of your work here. So your risk engine that you put out, some of your findings in terms of what you kick back after analyzing billions of records, not just analyzing them but executing remediation. Can you tell us a little bit more about your risk engine and the remediation portion of that work? And we're referencing something that's, you can easily find this for everyone listening here on the Integral website under not only your blogs, but your resources section there too.

Shubh Sinha:

Yeah, for sure. I think Integral's privacy engine, as we call it, is the meat of our offering. And so I love talking about it as much as I can. And so the engine itself, it's meant to function as software that reads data and then programmatically reads the schema. So first name, last name, all these different column headers and whatnot. It can work with both structured and unstructured data. If it's unstructured, it structures it for us, which is pretty neat, I'd say. But either way, it reads data, it reads the schemas, and then it reads the values in the columns, and then it programmatically analyzes every single column and does a combinatorial analysis on the entire set of data that's seeking to be joined. So say you upload two data sets or 10 data sets, whatever it might be, the risk engine will programmatically analyze the privacy risk of joining all of those data sets together.

And then it'll give each column a score. And that's how we weight which columns are more privacy intensive than others. And then once we have that, we actually have a UI to the risk engine that we call our platform. That UI can power remediation. So I as a data scientist, for example, can deploy the risk engine and then I can log into the platform and see the findings and see that, okay, let's say social security number is one of them, the most obvious no in the industry, it'll flag that as being the most problematic column. And it'll say, "Do you want to drop this? Do you want to truncate it? Do you want to remove it?" Or I guess that's drop as well, but it'll offer all these factory options and then it'll also offer a customization option. Do you want to only keep one number as opposed to all the rest?

And so that one is a pretty obvious, let's drop it to make it compliant, but a first name is a good example. Do you need Shubh or can you just take S? Do you need my race or can you just take something broader? Can you say Asian instead of Indian? Those are the kinds of things that we can power flexibly. Whereas today, if you went to a consultant, it would take a couple of back and forths to the point where you could actually get a remediation, whereas we kind of provide that experimentation in real time. And you don't have to ask anybody, you just put it into the platform, hit the rerun button and you see like, "Hey, can I do this?" And that's where the score comes in, where it changes in real time. And so if you see it go up, you probably are still privacy intensive, but if you see it go down continuously, you know you're on the right track. And so the open questions in your mind get answered as easily as just clicking a couple of keys on our platform.
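The remediation choices described in this answer (drop a column, truncate a value, roll a category up to something broader, then rerun and watch the score) can be sketched roughly as follows. This is an illustrative assumption, not Integral's platform: the helper names `truncate`, `generalize`, `remediate`, and `unique_fraction`, the `RACE_ROLLUP` mapping, and the sample records (with deliberately invalid placeholder SSNs) are all hypothetical, and uniqueness of rows stands in for whatever score the real product computes.

```python
from collections import Counter

# Hypothetical roll-up table: specific category -> broader category.
RACE_ROLLUP = {"Indian": "Asian", "Chinese": "Asian"}


def truncate(value, keep=1):
    """Keep only the first `keep` characters, e.g. 'Shubh' -> 'S'."""
    return str(value)[:keep]


def generalize(value, mapping):
    """Replace a value with its broader category when one is defined."""
    return mapping.get(value, value)


def remediate(rows, plan):
    """Apply a remediation plan: column -> 'drop' or a function of the value.

    Columns not named in the plan pass through unchanged.
    """
    remediated = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            action = plan.get(column)
            if action == "drop":
                continue
            new_row[column] = action(value) if callable(action) else value
        remediated.append(new_row)
    return remediated


def unique_fraction(rows):
    """Stand-in score: share of rows that are one of a kind."""
    keys = [tuple(sorted(row.items())) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(rows)


# Made-up records; the SSN strings are invalid placeholders.
records = [
    {"ssn": "000-00-0001", "first_name": "Shubh", "race": "Indian"},
    {"ssn": "000-00-0002", "first_name": "Sam", "race": "Chinese"},
    {"ssn": "000-00-0003", "first_name": "Brian", "race": "White"},
    {"ssn": "000-00-0004", "first_name": "Beth", "race": "White"},
]

plan = {
    "ssn": "drop",
    "first_name": truncate,
    "race": lambda value: generalize(value, RACE_ROLLUP),
}

before = unique_fraction(records)
cleaned = remediate(records, plan)
after = unique_fraction(cleaned)
```

Here the score falls from every row being unique to none being unique once the SSN is dropped, first names are cut to an initial, and race is rolled up. Swapping a different action into `plan` and rescoring is the code equivalent of hitting the rerun button.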

Brian Urban:

Yeah, and that probably adds to the whole speed of being able to prioritize going forward in terms of regulatory and through going through the process just faster. So rather than having back and forth with a lot of different teams or colleagues that you'd have to work with. So that's really, really helpful. And what's interesting too about the structuring unstructured raw data side of it, so many use cases across healthcare and probably beyond that are very meaningful use.

Being able to put different pieces together of someone's profile for meaningful use that's related to healthcare coordination or anything that would go as a notable downstream vital piece of information for a physician or researcher to see at an individual level. There's so many applications to that. We don't have time to go into that today, but I love that you said that a part of the core of your platform can do that, and that is really neat. And that's a labor of love to be able to take unstructured raw data and figure out how to put things together, almost puzzling a little bit, but let me take a look in the future here, Shubh, before I go down another track of thinking. So Integral is young, you are a young CEO. Help me understand your path to impact in healthcare over the next three years. Who is Integral going to be as an organization, and maybe what are your biggest impacts or contributions to the healthcare ecosystem at large? What do you see happening?

Shubh Sinha:

I think about our long-term mission, I would say, a decent amount, if not most of my job day to day. And that's where I've always seen that Integral can create impacts in various ways, but it all tends to come back to one major theme: today, we get data sets ready as quickly as possible, which unblocks clinical research, it unblocks commercial messaging so that people know what medications are available, it unblocks commercial research. So understanding who are my people that I want to distribute a medicine to, or who are the people I want to build a medicine for, or any number of other use cases there. And so today, I think our main value is that we unblock very important work that's centered around people's health. Where I want to be with Integral in three to five years is not only creating data sets as quickly as possible, but also helping companies evaluate data sets that they want to buy.

So today, for example, we come in, post dataset has been bought. I as a business manager for example, need to get a dataset ready for production. And so today we come in, we unblock that part, and then we do that over and over and over. What we're increasingly building out tooling for and where we're increasingly going is understanding your data to almost an extreme degree where we provide this feature called privacy analytics in our platform where you can see the privacy intensive variables. You can also see a breakdown of where your data [inaudible 00:27:57]. So for example, if you have a lot of one geography, it's possible you don't know that in your dataset. And so we call that out. And so we're building this bespoke data tooling not to provide necessarily analysis, but to provide a privacy analysis such that you understand where your data is leaning and where it's going.

And our hope with that is to provide a magnifying glass into data sets. Because as you probably know, healthcare data is incredibly messy. In the structured world, it's somewhat cleaned up; in the unstructured world, it's definitely not cleaned up. And so where we want to go is doubling down on that magnifying glass by providing more of the privacy analytics piece, by providing synthetic data generation to mimic what a real world dataset would look like. And let's go a step further, not only just mimic a real world dataset, but mimic a real world compliant dataset, meaning you understand what a compliant dataset looks like even before you go out and buy the dataset because you used Integral, you gave us a certain amount of inputs, so say your desired column headers and your desired use case, and we were able to generate a dataset for you that serves as a legend on a map in a way, where you can say, "This is what I need to buy."

And then a director or a VP can instruct direct reports to say, "Hey, go buy the data sets that Integral has recommended," because we've taken their data inputs or their desired inputs and really taken our privacy technology today and applied it to what could be. And so where we want to be is almost like a data compliance copilot in a way where we can power data buying needs, we can power data sharing needs, we can power richer analysis. And so I think that all fundamentally however, builds on getting data sets ready as quickly as possible and then going earlier and earlier into the process.
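The geography example from this answer, where a dataset leans heavily toward one place without the owner realizing it, lends itself to a small sketch. This is only an assumed illustration of that kind of check, not the privacy analytics feature itself; `value_shares`, `dominant_values`, the `max_share` cutoff, and the sample rows are hypothetical.

```python
from collections import Counter


def value_shares(rows, column):
    """Share of rows held by each distinct value in `column`, largest first."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.most_common()}


def dominant_values(rows, column, max_share=0.5):
    """Values whose share exceeds `max_share` (a hypothetical cutoff)."""
    return [
        value
        for value, share in value_shares(rows, column).items()
        if share > max_share
    ]


# Made-up rows: six from one state, two each from two others.
sample = (
    [{"state": "TN"} for _ in range(6)]
    + [{"state": "NY"} for _ in range(2)]
    + [{"state": "CA"} for _ in range(2)]
)

shares = value_shares(sample, "state")
skewed = dominant_values(sample, "state")
```

With this sample, one state holds more than half of the rows, so it is the only value reported. The same function could be pointed at any categorical column to see where a dataset is leaning.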

Brian Urban:

I love that. It's very niche, service-oriented, end-to-end synthetic data generation, it seems like, for the most part. But it goes beyond too. That is extremely exciting. Well, I'll tell you what, we got into a lot today, and I'm so thankful, Shubh, that you were able to join us, talk about yourself, your journey to CEO stardom here with Integral and the path forward. So thank you so very much for joining our little show today, Shubh.

Shubh Sinha:

Thank you very much for having me. I really enjoyed the conversation and the insightful questions. It's a ton of fun.

Brian Urban:

Well, thank you very much again, and for more exciting excerpts and insights, please visit us at finthrive.com.
