A recent article in R&D Magazine reported researchers warning that a flood of unassembled genetic data is being produced far faster than current computers can turn it into useful information. The piece caught my eye because it highlights a critical issue: informatics is the real ‘dark matter’ of the genomic revolution. Just as in the early days of high-throughput drug discovery, when measurement and automation technologies generated mountains of data, informatics remains the difference between actively managing the data deluge and merely coping with it.
Informatics is not just about technology; it is the combination of people and IT. Not only do we need bigger and faster machines, we also need software that enables bioinformaticians to work more efficiently. Too often, highly skilled scientists spend their time fixing spreadsheets or hunting for files rather than focusing on the science, which is where they add the most value. Building their own systems to support themselves and their stakeholders is normal in the early stages of any new endeavour, but it is no long-term solution for effective data management.
The storage, provenance and sharing of scientific data as part of the analysis process are often overlooked, and this is a critical failure. Without the ability to contextualize and store files, orchestrate analyses and share the insights gained, ‘omics analysis remains transactional, creating vast amounts of rework and squandering innovation potential.
Commercial systems do exist to address these challenges. They drive efficiency and capture the fruits of bioinformaticians’ work so that its impact reaches further.
Before we rush to write the next great algorithm or build a quantum-based supercomputer, we must put in place a robust data management foundation that will support this information revolution and those who will drive it.