Health Data and AI: Key Drivers for Transforming Therapeutic Innovation in Oncology
Data quality, automation via LLMs, Early Access Programs: How can Artificial Intelligence help refine precision medicine?
Jan 7, 2026
5 minutes
Over the last decade, the number of oncology clinical trials has doubled, and therapeutic innovations are on the rise. To keep pace with this momentum, research and personalized medicine require ever more data to better characterize specific patient subpopulations.
However, this data presents numerous challenges: collection, structuring, quality, and interoperability. While existing administrative healthcare databases (such as the SNDS in France) are rich, they have limitations regarding the precision required by modern medicine. Integrating digital tools and AI into the value chain offers new perspectives for unlocking the full potential of real-world data.
Marco Fiorini, Executive Director of the FIAC (Artificial Intelligence & Cancer Network), shares his insights in this interview taken from our conference "Health Data: Challenges and Prospects for Accelerating Therapeutic Innovation". We thank him for sharing his vision of the future of data in oncology.
The pharmaceutical industry is a major player in therapeutic innovation. What are its specific health data needs today, and what limitations do you encounter with current tools?
In the pharmaceutical industry, we are mostly secondary users of the data, working with it once it has been produced.
We have extraordinary medical-administrative data at our disposal, such as the SNDS (National Health Data System) and the carte vitale health insurance card. The latter is very interesting, but was designed to reimburse healthcare costs, not for epidemiological purposes.
The pharmaceutical industry, the diagnostics industry, and the healthcare industry in general need to know what is happening in the field before implementing an innovation. They will want to test it to gain market access and find out how it will transform practices. In this sense, the SNDS is quite remarkable. But it does not do everything: we need more recent, more precise, and more accurate data.
For example, if your innovation is aimed at patients with BRAF V600E-mutated NSCLC, you will have very few cases. If you are unable to filter for them, you will find it difficult to trace these patients' journeys. This will become more complex with personalized medicine as treatment sequences become more refined. The data must therefore allow for precise focus in order to make accurate decisions. The coarser your vision, the less relevant your measurements will be. That is why we strongly believe in what Resilience and other players are doing with language models to transform the refinement and production of accurate, up-to-date data.
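As a purely illustrative sketch (not the actual pipeline of Resilience or any FIAC project), extracting such variables from free-text reports with a language model and then filtering the subpopulation might look like this in Python; the prompt, the JSON schema, and the `call_llm` placeholder are all assumptions.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a call to whichever language model is used
    (hosted or local); assumed to return a JSON string."""
    raise NotImplementedError

EXTRACTION_PROMPT = """From the clinical report below, return a JSON object with:
- "histology": the tumour histology as written in the report, or null
- "braf_v600e": true, false, or null if the mutation status is not stated

Report:
{report}
"""

def extract_variables(report_text: str) -> dict:
    """Ask the model for structured variables and parse its JSON answer."""
    return json.loads(call_llm(EXTRACTION_PROMPT.format(report=report_text)))

def is_braf_v600e_nsclc(variables: dict) -> bool:
    """Keep only patients whose extracted variables match the target
    subpopulation; ambiguous cases (null values) are excluded and can be
    routed to human review instead."""
    return (
        "non-small cell" in str(variables.get("histology") or "").lower()
        and variables.get("braf_v600e") is True
    )
```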
Ultimately, none of this is worth anything if we cannot be sure that the reported data is accurate and truly reflects what patients are experiencing. The less certain we are, the less valuable the data is, and this is true for all stakeholders. At FIAC, we are in dialogue with the High Authority for Health in France (HAS) to define an accuracy score that guarantees the reliability of the reported data. It is important for us to refine this score so that, once its threshold is exceeded, we can be sure the data no longer raises questions for the HAS or the French Health Products Economic Committee (CEPS) about its value.
The Artificial Intelligence & Cancer Network (FIAC) is currently conducting several field experiments. What specific types of projects are you deploying to leverage this real-world data?
Our rationale is to create prototype projects, in the industrial sense of the term, i.e. projects intended to go into production. Today, we have 14 prototype projects, all of which use real-world data. To give you an idea of the themes involved, we have projects focusing on:
- Patient pathways: the ability to see more clearly, with increasingly precise focus, what patients experience across the heterogeneity of therapeutic areas and the geography of France.
- Patient quality of life: can AI help us, in everyday clinical practice, to generate quality-of-life data that is recognized by regulatory authorities and by patients undergoing treatment?
- Prevention with AI: in colon cancer, for example, can AI, with a risk score and a simple blood test, get you to go for screening if you haven't already done so? Will this change medical practice in private offices? How do practitioners behave when faced with an AI solution? Today, human oversight is an imperative: a panel of experts must periodically review, on a sampling basis, whether what the AI produces is relevant to their practice (a minimal sketch of such sampling follows this list). Paradoxically, in testing this AI-based prevention, we realized that the greatest risk is that practitioners place complete trust in the AI and no longer verify its results. This is something interesting that I did not expect when we launched this project, which was designed by Roche Diagnostics.
- Early access: use cases utilizing language models.
- Automatic report generation: we also have some interesting projects here. For example, if I talk to you and there is an automatic transcript, the same can happen when a practitioner talks to their patient. We can even imagine three reports: one for epidemiological research, one for care, and one for the patient and their caregivers.
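Below is a minimal Python sketch of the sampling-based oversight described in the prevention item above. It is illustrative only, not the Roche Diagnostics project itself, and the field names and review fraction are assumptions: a random sample of the AI's outputs is routed to the expert panel, and their agreement rate is tracked over time.

```python
import random

def sample_for_review(ai_outputs: list, review_fraction: float = 0.1,
                      seed: int = 0) -> list:
    """Draw a random sample of AI outputs for periodic expert review."""
    rng = random.Random(seed)
    n = max(1, round(review_fraction * len(ai_outputs)))
    return rng.sample(ai_outputs, n)

def agreement_rate(reviewed_cases: list) -> float:
    """Share of sampled cases where the expert confirmed the AI output;
    each case is a dict expected to carry an 'expert_agrees' boolean."""
    return sum(case["expert_agrees"] for case in reviewed_cases) / len(reviewed_cases)
```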
These things are not out of reach. We just need to work out how to deploy them and think about acceptance, because none of this will work if people don't buy into it. And they won't buy into it if it doesn't save them time and add value to what they do, in medical terms and, why not, in economic terms if we reuse the data.
To conclude, I think that this easier production of high-quality data with language models can also be transposed to imaging. For example, we want to build cohorts by extracting data directly with LLMs (large language models). Except that the source is not always text, so we also want to use imaging-based models, because AI models that work on images perform very well.
Historically, imaging may even have been the best approach until the explosion of ChatGPT. So perhaps tomorrow's research cohorts will be built from text with language models and from imagery.
France benefits from an advantageous system with its Early Access Programs. How can AI and Large Language Models address the data collection challenges within this specific framework?
Early access is an opportunity for France, as it provides access to innovations before they are brought to market. Data from early access is valuable, as it shows how innovative molecules behave in real life. But when this data is entered manually, it is very incomplete. We have read about completion rates of around 40%, but when we look at what is actually happening, the figure is even lower, at around 20 to 25%. We therefore asked ourselves: can small, medium, and large language models help us automate data reporting during these early access programs? Several colleagues from pharmaceutical laboratories have designed a project aimed at comparing two arms:
- a “traditional” arm, for data collection to monitor early access;
- an arm involving the same patients, but whose data would be generated automatically by language models of any kind.
We therefore want to compare, across around thirty concepts, what is reported by AI and what is reported by traditional collection methods. We have set the bar quite high, as we are fortunate to have three batches of early access. We will collate the data and see how they compare, in order to assess an accuracy score, co-defined with the HAS, as an element of value for the data reported by AI. We are talking about early access here, but once this kind of practice is implemented at the French national scale, it can be used for many other things.
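As an illustration only (the real score is still being co-defined with the HAS, and the patient identifiers, concept names, and equality-based comparison below are assumptions), a per-concept concordance between the two arms could be computed along these lines in Python:

```python
def concordance_by_concept(traditional: dict, ai_generated: dict,
                           concepts: list) -> dict:
    """For each concept, the share of patients whose AI-generated value
    matches the traditionally collected one. Both inputs map
    patient_id -> {concept_name: value}."""
    scores = {}
    for concept in concepts:
        matches, total = 0, 0
        for patient_id, reference in traditional.items():
            if concept not in reference:
                continue  # no reference value for this patient, skip
            total += 1
            ai_value = ai_generated.get(patient_id, {}).get(concept)
            matches += (ai_value == reference[concept])
        scores[concept] = matches / total if total else float("nan")
    return scores

# A single headline figure can then be the mean over the ~30 concepts:
# overall_score = sum(scores.values()) / len(scores)
```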
To conclude, what are the essential drivers to ensure the adoption of these new tools and their real impact on our healthcare system?
I think the key issue is the adoption of these new tools. If you save practitioners time, improve the relevance of epidemiological research and the quality of life of patients, and if you bring something to the people who are living through it, whether in their practice or in their own flesh, then AI models will be adopted.
But this is not an end in itself; these are just tools that must serve a purpose. If we manage to be agile enough to do this, the tremendous decentralization and accessibility of these models will create a truly modern system. Let's not forget, however, about centralization. At the state level, there need to be major projects to try to provide a common vision of what we can do. So here are the important elements: usefulness for practitioners and patients; a decentralized approach with decentralized access to AI tools; and also a centralizing effort, with shared visions, particularly on shared use cases.
Finally, one last point: we must not serve only the patient or only the practitioner, start-ups, the pharmaceutical industry, or the CEPS. We must try to see how data can be valuable so that we all have a more accurate picture of what is happening in our healthcare ecosystem. If we are facing problems with the sustainability of our healthcare system, it is also because we have a coarse-grained view of it. And that is because we do not have the data. The more the data serves to provide a clear picture of what is happening, the more effective our decisions will be in ultimately achieving something modern and efficient. This is truly a battle we must all fight together: to ensure that AI serves to produce more detailed data in an agile and accurate manner, in order to gain a more relevant view of what is happening in our healthcare ecosystem. Today, you would be surprised to learn the analytical basis for certain decisions that are, by nature, made from fairly coarse-grained data.



