A new software collaboration makes a vast range of biological databases more accessible for exploitation
Data mining and visualisation software created to help plant scientists trawl diverse biological databases for clues to designing better crops can now support less specialist users across a range of disciplines, including human disease research.
KnetMiner, with a silent “K” and standing for Knowledge Network Miner, is a suite of open-source software tools developed at Rothamsted Research for integrating and visualising large biological datasets. Genestack, with whom Rothamsted is collaborating, provides a secure, commercial platform.
The software mines the myriad databases that describe an organism’s biology to present links between relevant pieces of information, such as genes, biological pathways, phenotypes or publications. The aim is to provide leads for scientists who are investigating the molecular basis of a particular trait or looking for ways to improve the organism’s performance.
“It takes the slog out of preliminary investigations when you need to explore multiple online resources to look for what might be happening,” says Chris Rawlings, head of computational and analytical sciences at Rothamsted. “It helps people to understand the complexity of the biology that underpins traits, of how different genes contribute to a phenotype.”
KnetMiner has proved itself at Rothamsted over the past five years, evolving from a visualisation component of Ondex, a data integration system developed at Rothamsted more than a decade ago.
“Genotype to phenotype analysis is at the core of what biologists do,” says Keywan Hassani-Pak, head of bioinformatics at Rothamsted and KnetMiner’s lead developer. “With KnetMiner, we have created software that enables biologists to take their own high-throughput experimental data and to see them in the context of all the public knowledge that is out there. This can help them to interpret their own data faster and more effectively.
“For a particular target species, such as a crop plant, KnetMiner integrates all the relevant genomics and multi-omics information that is present in more than 25 sources under a multitude of formats…and brings it together in the form of a heterogeneous knowledge network,” says Hassani-Pak.
He adds: “We don’t only integrate the data; we also create new relationships based, for example, on co-occurrences of genes and phenotypes in the scientific literature. We are the first in the UK to develop such detailed networks and make them mineable.”
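The idea of deriving new relationships from co-occurrences in the literature can be illustrated with a minimal Python sketch. It is not KnetMiner's actual method, which the article does not describe in detail. It assumes simple case-insensitive substring matching over abstracts, and the function name `cooccurrence_edges`, the `min_count` threshold, the gene names and the sample abstracts are all hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_edges(abstracts, genes, phenotypes, min_count=2):
    """Count how often each gene and phenotype are mentioned in the same abstract.

    Hypothetical illustration only: real text mining would need proper
    entity recognition rather than plain substring matching.
    """
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        found_genes = {g for g in genes if g.lower() in lowered}
        found_phenotypes = {p for p in phenotypes if p.lower() in lowered}
        # Every gene/phenotype pair found in this abstract counts once.
        counts.update(product(found_genes, found_phenotypes))
    # Keep only pairs seen together in at least min_count abstracts.
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Made-up example data, not taken from any real database.
abstracts = [
    "GENE_A expression rises under stress, improving drought tolerance.",
    "Knockouts of GENE_A reduce drought tolerance; GENE_B is unaffected.",
    "GENE_B variants are linked to grain size.",
]
genes = ["GENE_A", "GENE_B"]
phenotypes = ["drought tolerance", "grain size"]

edges = cooccurrence_edges(abstracts, genes, phenotypes)
print(edges)
```

Each surviving pair could then be added to the network as a new gene-to-phenotype link, with the count and the supporting abstracts kept as evidence.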
With security-conscious corporations keen to use the software, KnetMiner has now advanced from a research tool to a commercial product by joining the Genestack software platform, which is designed to overcome the challenges of bioinformatics in research enterprises.
“The Rothamsted researchers could spend months collecting all the data that was available for a particular organism, cleaning the data and writing scripts to transfer it into a format that was usable in KnetMiner and then presenting it so that other scientists could use the information,” says Misha Kapushesky, chief executive of Genestack.
After migrating KnetMiner onto the Genestack platform and automating the collection process, says Kapushesky: “It is now possible to simply ‘point and click’ on data that is in the public domain to create a network and then overlay your own data, using KnetMiner to visualise it.
“You can build your own network with collaborators in a secure environment. It is no longer a fixed set of data on the Rothamsted website but a dynamic tool that can be made commercially available,” says Kapushesky.
Genestack now hosts more than 40 plant and crop networks, as well as a prototype human disease network. Although the software originated in agri-research, network mining for gene discovery is generic and Genestack provides an environment for building and distributing these large-scale knowledge networks.
“There are a lot of tools out there that will return a list of ranked genes when you are conducting a gene candidate analysis, and of course KnetMiner also does that with its evidence-based gene rank algorithm. But most of them also stop there,” says Hassani-Pak.
“KnetMiner is unique as it allows users to see how and why the prediction was made. They can fully understand the results because the process is completely transparent and the provenance is visualised,” says Hassani-Pak. “There is no black box approach here.”
Hassani-Pak and Kapushesky say that this approach supports human-augmented knowledge discovery, which puts human experts – rather than machines – at the core of the decision-making process.
“We need to free [the human brain] from tedious tasks,” says Kapushesky. “By reducing the complexity, it makes it easier for researchers to see the patterns and links that push the frontiers of science further, and the tools also make it possible for others to apply the findings in a commercial environment.”
Rawlings adds: “This is a good example of how research software can be translated into a commercial platform for industry, with potential revenues, through royalty payments, returning to Rothamsted to fund further research.”