A new generation of clinical research is emerging, thanks to a profusion of health data and data-powered technologies such as in silico modeling.

More data, by itself, does not guarantee better results or more accurate predictions from in silico modeling. Those depend on the quality and completeness of the source data, the accuracy of the algorithms, and the rigor of the analyses applied to the algorithms' output.

Each kind of data has limitations. Clinical studies provide data collected under rigorously controlled conditions, but often don’t capture the variability found among actual populations. Real-world data – of increasing interest and value to industry and regulators – go some way to addressing that variability, but come with their own weaknesses, such as uncontrolled inclusion criteria and inconsistent quality.

Understanding the drawbacks of different data types is critical to assessing the strength and quality of in silico model outputs or predictions.

Assigning data a “trustworthiness” score

The quality of an analysis or a model (meaning its predictive capability) depends on the quality of the input data. At Novadiscovery, the most accurate models – those which make the best predictions – are built using individual patient data. But useful models can also be built using aggregated data (distributions, means, medians, quantiles, standard deviations…). They may be less accurate, but that doesn’t mean they don’t tell us anything at all! Even imperfect data can sometimes produce helpful results.

The important thing is to assess how much better or worse a model will be as a function of the quality and type of data used to build and calibrate it.

This is where scoring data and the resulting models on their quality can be very useful. Scores provide researchers, engineers, clinicians and the general public with a sense of a model’s predictive accuracy. They offer a quantitative measure defined by a formal and explicit process.

Nova’s Strength of Evidence and predictivity scoring

At Nova, we assess and score the quality of the input data used to build our knowledge and computational models. By ‘quality’, we mean trustworthiness: our level of confidence in the data’s accuracy.

In building the knowledge model, we measure the Strength of Evidence of the research results, facts and scientific assertions (including numerical values that will be implemented directly in the model) that we extract from research papers. For instance, data or assertions that are corroborated across several well-designed experiments will score more highly than those based on fewer, or less watertight, studies. Then, when calibrating the computational version of the model, we ensure it meets certain requirements – expected system behaviors – defined from the literature data and, where possible, from previous clinical studies.

For example, suppose we know from a previous clinical study that, for a given patient profile, the drug’s half-life is 12 hours. The model earns a 100% predictivity score on that requirement if it predicts almost exactly that half-life; if its prediction is off by 30 minutes, it scores 50%; if it is off by 2 hours, it scores 0%. Each requirement is weighted according to the Strength of Evidence (the trustworthiness) of the piece of knowledge that allowed us to define it – in our example, the previous clinical study. The overall predictivity score of the model, computed over all the requirement scores weighted by their Strength of Evidence (the quality score of the input knowledge), gives a measure of the model’s predictive capability.
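To make the arithmetic concrete, here is a minimal sketch in Python of how such a requirement-based predictivity score could be computed. The piecewise-linear mapping from prediction error to score (passing through the 100%, 50% and 0% anchor points of the half-life example), the second requirement and its numbers, and the Strength of Evidence weights are all illustrative assumptions, not Novadiscovery’s actual implementation.

```python
import numpy as np

# Illustrative sketch only: the error-to-score mapping, the example requirements
# and the Strength of Evidence weights are assumptions, not Nova's actual rules.

def requirement_score(predicted, expected, anchors):
    """Score one requirement from the absolute prediction error.

    `anchors` is a list of (absolute error, score) points; the score is
    interpolated linearly between them and clamped beyond the last point.
    """
    error = abs(predicted - expected)
    errors, scores = zip(*anchors)
    return float(np.interp(error, errors, scores))

def predictivity_score(requirements):
    """Overall predictivity: mean of requirement scores, weighted by the
    Strength of Evidence of the knowledge each requirement was derived from."""
    total_weight = sum(r["strength_of_evidence"] for r in requirements)
    weighted_sum = sum(
        r["strength_of_evidence"]
        * requirement_score(r["predicted"], r["expected"], r["anchors"])
        for r in requirements
    )
    return weighted_sum / total_weight

# Half-life requirement from the text: 100% at an exact match, 50% at a
# 30-minute error, 0% at a 2-hour error. The second requirement (a peak
# concentration, say) is hypothetical and backed by weaker evidence.
requirements = [
    {"expected": 12.0, "predicted": 11.8,
     "anchors": [(0.0, 1.0), (0.5, 0.5), (2.0, 0.0)],
     "strength_of_evidence": 0.9},
    {"expected": 85.0, "predicted": 97.0,
     "anchors": [(0.0, 1.0), (10.0, 0.5), (25.0, 0.0)],
     "strength_of_evidence": 0.4},
]

print(f"Predictivity score: {predictivity_score(requirements):.0%}")
```

Because each requirement score is weighted by Strength of Evidence, a requirement derived from a single small study pulls the overall score around far less than one corroborated across several well-designed experiments.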

This scoring system allows us to appropriately weight the influence of different inputs on the system. The higher an input’s quality score, the more it counts in the algorithm or the analysis results.

Note that the reliability of the input data is not the only factor influencing the weight of a requirement we deduce from it: another is the importance, in biological terms, of the expected behavior within the whole system.
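How the two factors are combined is not spelled out here; one simple possibility, offered purely as an assumption, would be to multiply them, so that the requirement weight used in the sketch above becomes:

```python
# Hypothetical combination rule (an assumption, not Nova's documented method):
# the weight grows with both the Strength of Evidence of the underlying knowledge
# and the biological importance of the expected behavior, each in [0, 1].
def requirement_weight(strength_of_evidence, biological_importance):
    return strength_of_evidence * biological_importance
```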

Assessing and scoring data quality is likely to be critical to building understanding of, and trust in, this new era of data-driven health research. The concept can be generalized, with the appropriate qualification measures, to any kind of data or modeling results.

Data sharing initiatives and hubs sometimes provide a general, qualitative estimate of data quality. However, from a scientific perspective and for traceability purposes, it is important to provide a quantitative measure and, above all, a formal description of the underlying scoring process and calculation.

In order for in silico modeling to be trusted as an evidence source in scientific research and drug R&D, industry stakeholders and regulators must collaborate on a common system for scoring data quality, helping provide the transparency and continuous learning required for in silico approaches to reach their full potential.