Sunil Kumar Raja
In some ways, scientific discovery resembles a highly disciplined culinary craft, and scientists are like master chefs. They combine skill, precision and creativity, and they need an environment where everything is in the right place.
The more organised the kitchen, the more the chefs can focus on creating good food and new ideas. In the same way, well-organised data is essential in research: it lets scientists focus on their core job, which is innovation.
Just as it’s easier to cook in a kitchen with clearly labelled, accessible, and reliable ingredients, scientific work depends on data that is relevant, organised, and traceable. When those foundations are in place, scientists can direct their effort towards generating hypotheses and discovering new medicines that can help patients, instead of working around logistical data hurdles.
Even the most skilled scientist will find it difficult to generate new insights without a well-structured data infrastructure. Good infrastructure makes data easier to find, understand, and reuse, which increases efficiency and lowers risks and costs. It accelerates the research process and can be one of the deciding factors in whether a promising scientific question becomes a viable, impactful study.
Data is most powerful when different sources are combined. For example, a scientist analysed data from a 16-year study of more than a thousand children. When the study data were combined with other sources, such as lifestyle and social factors and hospital health records, new insights emerged. This integrated view revealed early signs of antibiotic resistance in a specific group, along with its probable cause. Importantly, these insights would not have been visible from any of the individual datasets on its own. The findings helped shape new research questions, supported the design of a larger clinical study, and ultimately contributed to the development of preventive strategies and new treatment options.
Machine learning and prediction tools can greatly improve how we understand diseases, but only when there is access to sufficiently large, high-quality, and relevant data. In one case, we had to stop a promising study that aimed to predict how a pneumonia-related respiratory disease might progress, because the right data wasn’t available at the depth, scale, and specificity required to train reliable predictive models. This is one of the major challenges in modern research: innovation is often constrained by gaps in data availability, standardisation, and readiness.
Examples from cancer and diabetes research show what is possible when data is integrated across institutions through global partnerships and collaborations. By pooling data, researchers have built models that predict the risk of relapse in cancer, or the progression from pregnancy-induced (gestational) diabetes to type 2 diabetes. In contrast, many rare diseases lack the data needed to make similar advances. This is where I see our greatest opportunity and responsibility: building data banks for rare diseases.
Our vision at CSL is to build a strong foundation of trusted, high-quality data. By carefully organising, standardising, and responsibly sharing data, we can turn small, fragmented datasets into collective knowledge. This enables better predictions, accelerates discoveries, and unlocks new treatment possibilities. In doing so, we’re shaping the future of medical innovation.
While technology accelerates progress, scientific insight still comes from people. It comes from the curiosity to ask the right questions, the expertise to interpret patterns, and the imagination to build predictive models that turn data into knowledge, and knowledge into a therapy or medicine.
It’s that impact that motivates me: knowing that every database we build, every platform we improve, and every dataset we connect contributes to a larger purpose, which is helping people live healthier lives.