How data analytics handle massive amounts of data
Get in-depth insights into a step-by-step process coupled with a real case scenario depicting data analytics frameworks handling vast amounts of data.
Data Is Good.
It’s the Use
In this modern age we all have to adapt to a changing landscape in order to function in the paradigm of a data-driven society. We, at Avenga, are committed to empowering everyone with the opportunity to share unique perspectives through a series of webinars specifically designed to respond to the global digital developments in pharma, pharmatech, biopharma, healthtech and life sciences.
Today, we are highlighting data as property and data ownership, the topic we addressed during our webinar on Reframing Data and Privacy: The new ethics, optics and relationships. Our special guest, Michael DePalma, is a renowned expert in the technology, healthcare and life sciences space and a devoted advocate for the utilization of meaningful technologies in human health. He shared a number of points I’d like to pass on to you.
Let’s differentiate between data protection, data privacy and data ethics. Data protection is how we treat the data we already possess. It does not address the data we do not yet have, nor does it address how we acquire data, what data we acquire, the consent status of that data, or how we use that data.
Data privacy, broadly defined, covers the practices we use to protect the person or persons to whom the data refers. We’ll get back to that issue later, as it’s a very broad topic, and part of this discussion is what data privacy even means today.
Finally, data ethics is a broad framework that applies to how we make decisions related to data. Privacy and security are components of data ethics, but in addition, we need to understand the accuracy (veracity) of the data, its quality, sample sizes, how it was collected, what data, if any, may be missing, etc. So we’re establishing a set of principles and standards for data science and analytics.
Data ethics is one of those things that we all just assume exists. We all assume these things are common sense. But the thing about common sense is that it isn’t always so common, particularly when it comes to data. For decades, our default setting in technology has been to “collect everything and figure out how to use it later”. And if you think about digital health or digital therapeutics, or even about systems, websites, and apps, the default setting is to collect all the information about users and store it.
So, assuming that you have all that data, data ethics is a framework within which we can make decisions about it. There are always assumptions about data: that we have enough of it, that it’s accurate, that it’s correct, and so on. One assumption Michael was interested in is whether we are supposed to have all that data in the first place, which is a huge issue if you start to unpack it.
A lot of organizations have adopted a “Privacy by Design” approach, a framework for making decisions BEFORE there is potentially trouble. We all, as stewards of the data, have literal fiduciary responsibilities regarding this data. We’re on the hook for its protection and use. It’s a very valuable thing. It’s also a dangerous thing in the wrong hands, and an incredibly powerful, positive thing in the right hands.
Essentially, data is good. It’s the use cases around data that can be problematic. It’s not a data protection issue, it’s a business model issue. The problem is treating the data as carte blanche to do whatever we want with it. So what would you actually do if you applied a specific framework to this?
Ann Cavoukian is one of the people Michael looks to for these ‘privacy’ approaches. Ann is the former Information and Privacy Commissioner for Ontario, Canada. She’s developed a 7 Step Privacy By Design framework that he thinks is really helpful. Again, it seems like common sense, but you’d be surprised. And Michael usually adds an 8th point.
Yes, there are differences, but they’re inextricably related. Let’s begin with data and ask ourselves a few questions. What data do we have? How was it sourced? From whom? To whom does it refer? What rights does the data provider have to give, share, or sell you this data? What is the consent status of this data? What is the time frame of this consent? In other words, what practices are in place BEFORE we even have the data in our possession?
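To make a couple of these questions concrete, here is a minimal Python sketch of how the consent status and its time frame might be recorded and checked before data is used. Everything in it is an illustrative assumption rather than anything Michael or the webinar prescribed: the `ConsentRecord` fields, the `consent_is_current` helper, and the conservative choice to treat a missing time frame as "not covered".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ConsentRecord:
    """Hypothetical record answering 'from whom, about whom, until when?'"""
    data_subject: str            # to whom the data refers
    provider: str                # from whom the data was sourced
    granted_on: date             # when consent was given
    expires_on: Optional[date]   # None means no time frame was recorded


def consent_is_current(record: ConsentRecord, on: date) -> bool:
    """Return True only if consent demonstrably covers the given day.

    Assumption for this sketch: an unknown time frame is an open
    question, so it is reported as not current until someone resolves it.
    """
    if record.expires_on is None:
        return False
    return record.granted_on <= on <= record.expires_on


record = ConsentRecord(
    data_subject="patient-001",
    provider="example-clinic",
    granted_on=date(2021, 1, 1),
    expires_on=date(2021, 12, 31),
)
print(consent_is_current(record, date(2021, 6, 1)))
print(consent_is_current(record, date(2022, 6, 1)))
```

The point of the sketch is only that these questions can be asked of a record before the data is touched, not after.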
Algorithms simply help us categorize and make decisions about this data, which is where things like bias show up. Bias can be human-based or data-based. For example: Are we making the right decisions about the right data? Are we making the right decisions about the wrong data? Are we making the wrong decisions about the right data? All of those things can occur.
On a side note, Michael told us about a dinner party where people were saying things like “AI should be built to human ethical or moral standards”. To which he replied, “Which human? Which ethical standards?” It’s his way of saying that if you’re not having those conversations, you’re at the wrong dinner parties.
Practices generally follow these questions as well, which is why data ethics is so much larger than people perhaps give it credit for. Often that is because we choose to apply it to only one area; for example, “I’ll apply ethics to my AI . . . but I have all this massive data that I probably shouldn’t have in the first place.”
The answer is . . . there is no good answer. If you ask Michael legally, he’ll tell you it depends on where you are geographically. If you ask him practically, he’ll tell you that most laws, at least in the US, say that whoever owns the media the data is stored on owns the data.
Unless, of course, you “bought” it from a data aggregator, in which case you have a license to “use” it based on a predetermined set of parameters, which you agree not to breach, for a predetermined period of time, after which you either “give it back”, which is honestly meaningless, or verify that you’ve “destroyed” it. Those are vestigial words and ideas when it comes to the digital world.
Michael shared this example: ‘If data is on paper, and there is one copy and I burn that paper, that data is destroyed and cannot be reused. If however, there are infinite copies of that paper, then the destruction of one has zero impact on the usability of that data. Worse, what if I don’t even know who has those papers containing the data? And still even worse, is that maybe I never agreed to anyone having those papers, let alone that these other parties would buy and sell them to each other without my knowledge or consent.’
In terms of value, for years we’ve heard the “Data is the new oil” trope. At its best, it can orient the listener to the idea that data is valuable and market making, the way oil is/was.
But it falls apart right after that point. Michael continued with his example: ‘With oil, we have a clear sense of provenance, where it came from, how it was acquired, from where, from whom, and for what price. We know that if I have a barrel of oil, there aren’t infinite copies of it. It’s a physical object in the real world. If I burn it, no one else can. Oil, by the definition of economists, is a rivalrous good. Data, however, is non-rivalrous. There can be infinite copies of it (and often are) and the use of one of those copies doesn’t preclude someone else from using their copy of it.’ It’s what makes data so compelling and valuable.
If you look at GDPR, the law in effect treats the individual as the owner of their data. They are empowered to say who can have what, for what purposes, and can elect to say “nope, you can’t have it.”
In the US, we’re on the other side. We regulate data using specific laws that depend on the type of data and the type of use. Medical data is HIPAA. Employment data is ERISA. Driving data, financial data, credit data: pick a category and there is a separate law or set of laws that governs it. So we have a patchwork of regulations with holes so big you can drive a truck through them. It’s these holes that the data industry uses to support its business models.
Michael personally believes that we need a simple test to determine who “owns” the data. It’s this: “If the data in question is a byproduct of a good or service I paid for, in whole or in part, AND that data can be bought, sold, or shared with a party that I don’t have a direct relationship with, contractually or otherwise, it’s MY DATA.”
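Because the test is a plain two-part condition joined by “AND”, it can be written down as a small predicate. The Python sketch below is only an illustration of the logic in the quote; the class name, field names, and function name are hypothetical, and it says nothing about what any law actually provides.

```python
from dataclasses import dataclass


@dataclass
class DataContext:
    """Hypothetical description of a data set, used only for this sketch."""
    # The data is a byproduct of a good or service I paid for,
    # in whole or in part.
    byproduct_of_paid_good_or_service: bool
    # The data can be bought, sold, or shared with a party I have
    # no direct relationship with, contractually or otherwise.
    reaches_unrelated_parties: bool


def passes_ownership_test(ctx: DataContext) -> bool:
    """Both halves of the quoted test must hold (the 'AND' in the quote)."""
    return ctx.byproduct_of_paid_good_or_service and ctx.reaches_unrelated_parties


paid_and_resold = DataContext(True, True)
paid_but_kept_in_house = DataContext(True, False)
print(passes_ownership_test(paid_and_resold))
print(passes_ownership_test(paid_but_kept_in_house))
```

Note that the quote only says what follows when both conditions hold; the sketch makes no claim about who owns the data when the test is not met.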
Here is a hypothetical example: If I spend $600 on a pair of sneakers that track my activity, my steps, my heart rate, respiratory rate, body temp, geolocation, etc., who owns that data? Should I have a subscription to my data? A practical application of this: if I visit my physician for a physical, receive an EKG and an X-ray, and maybe fill a script for a hypertension medication . . . where does that data reside? Who has access and for what purpose? If the purpose is not my treatment or care, what happens to it? It’s a huge question that we could talk about for hours.
Let’s look at some use cases. Michael shared more from the example above: ‘I visited a Doctor for a physical. In the US, healthcare providers are considered “covered entities” under HIPAA. That means they have carte blanche access to ALL my data for the purposes of my treatment and care, and for their own healthcare operations, things like quality control, etc. That all makes perfect sense.’
Now, assuming we agree on that, Michael asked us to look at what happens next. Doctors, nurses, PAs, NPs, technicians, and hospital administrators are all very good at their jobs. But they’re not good at other things, like building massive EHR systems. That falls to the tech sector. So we have a massive industry of companies that build systems that collect, and in many cases utilize, the data collected by the healthcare providers. Good so far.
The thing is, those companies don’t have the same rights to “USE” the data that the providers do. After all, your doctor is caring for you; your EHR system is not. So they’re restricted in terms of their “USE” of the data.
But here’s the rub: by stripping some info and packaging the data up, these parties can sell that data into a larger data value chain, where it is combined with other data from other sources. There it may be aggregated, or it may be used to make decisions about us.
In the case of research, again, there are clearly defined corridors for research in most US laws. Usually, data is OK for research as long as it’s de-identified, something Michael mentioned earlier, or its use has been approved by an IRB. He, like others, actually wants to see a system that accelerates use of the data rather than restricting it. The difference is that it would be known who is using the data and for what purpose.
So what would a transparent data ecosystem look like, one where individuals are empowered to willingly and knowingly participate in research and in data collection and use, but without the laborious machinations currently in play? What if research had a real relationship with a data subject or consumer? What if there was value to be exchanged directly at that level?
Michael shared that he believes people have a right to know who has their data, what they have, and what is being done with it, and that a transparent relationship between data steward and data subject should be the base case, always.
Data has value. There is no refuting that. Yet the only people NOT participating in the data economy are the data subjects: the individuals themselves, the consumers. So what he was asking was: ‘What does the world look like if you change that? Clearly what I think is that we need to create a value chain that includes the individuals who are the basis of that value.’
The answer, and some people hate when Michael says this, is “it depends”: on the data in question, its use and a host of other factors. He took us through a thought experiment: ‘I’m a musician so I collect guitars. Let’s say a certain guitar is worth $500 new. Great, ostensibly, it’s $500 new, no matter where I get it. And it should be worth less than that if it’s used. Less still if it’s in bad shape. Now let’s say that guitar was played by Jimi Hendrix. Suddenly, it’s not $500, it’s $1M. Same guitar. Or is it? Let’s say it was played by Jimi, but somebody swapped out the electronics and repainted it 20 years ago. Still worth $1M? Probably less. So it’s dynamic.’
There are innumerable types of data. Personal/health data is highly valuable. Financial data is highly valuable. Perhaps you can argue that your buying habits are valuable; they certainly are to marketers. But not all data is equal. Much of it is, frankly, noise.
Michael actually worked with Avalon Health Economics to look into this question. What is a PHR worth to an individual? To a community? To industry? It’s a fascinating area, and frankly, one of the conclusions was that the value also depends on where in the supply chain you assess it, and for whom.
Great question, Michael says. He shared that he would begin by looking at what data assets your company already possesses. What do you have? Where is it from? What does it contain? Where was it acquired? What is the consent status of this data? What use cases is it suited for and authorized for? Once we understand what we have and our existing risks, we can move forward to establishing a data ethics framework that we can apply to legacy data as well as to future data collection and use.
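As a rough illustration of that first step, the questions above could be turned into an inventory check that lists which of them are still unanswered for each data asset. This is a minimal Python sketch under stated assumptions: the `DataAsset` fields and the `open_questions` helper are hypothetical names chosen to mirror the questions in the text, not a tool or schema described in the webinar.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DataAsset:
    """Hypothetical inventory entry; each field mirrors one question."""
    name: str
    source: Optional[str] = None          # Where is it from / acquired?
    contents: Optional[str] = None        # What does it contain?
    consent_status: Optional[str] = None  # What is the consent status?
    authorized_uses: Set[str] = field(default_factory=set)  # Authorized use cases


def open_questions(asset: DataAsset) -> List[str]:
    """List the inventory questions that have no recorded answer yet."""
    gaps = []
    if not asset.source:
        gaps.append("source")
    if not asset.contents:
        gaps.append("contents")
    if not asset.consent_status:
        gaps.append("consent_status")
    if not asset.authorized_uses:
        gaps.append("authorized_uses")
    return gaps


inventory = [
    DataAsset(
        name="legacy_crm_export",
        source="purchased list",
        contents="contact details",
    ),
    DataAsset(
        name="trial_signups",
        source="web form",
        contents="email addresses",
        consent_status="opt-in",
        authorized_uses={"trial recruitment"},
    ),
]

for asset in inventory:
    print(asset.name, open_questions(asset))
```

An asset with open questions is not necessarily a problem, but it marks where the existing risk is unknown, which is the understanding Michael suggests building before a framework is applied.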
Michael wanted to say something here about trust, a word that we throw around a lot: “Pharma is NOT a trusted industry, in the US at least”. In fact, it’s vilified. The amount of misinformation and mistrust out there about what Pharma does is staggering, which always leads to some cognitive dissonance. We, the Pharma and healthcare industry, are literally working day and night to try to make human lives better, and then we hear all sorts of baseless nonsense. You’ve all heard your share, no doubt.
The point is, Michael says, “we need to help ourselves here.” Michael Pierson, the Chair of Social Entrepreneurship at Fordham University, said, “If people don’t know what you value, they impute, cynically, that it’s only financial”.
Kirsten Martin, the Chair of Strategic Management and Public Policy at George Washington University, said, “When people don’t trust you, you start to lack legitimacy until you don’t have a leg to stand on to opine about what others should do…”
If you look at things like COVID-19, anti-vaxxers and the like, you’ll see a lot of this is real. Michael suggested that Pharma do itself a favor and get ahead of this issue.
Facebook survived its Cambridge Analytica scandal. But Michael wasn’t so sure it would go as well if something similar happened to Pharma or healthcare.