The pace of publication for new biomedical research makes it impossible for humans to keep up. Advanced algorithms using machine learning and natural language processing can help.
In 2016 alone, more than 1.2 million new citations were added to PubMed®, bringing the total number of citations to more than 27 million. While it would take many lifetimes for human readers to get through it all, computer algorithms can quickly sort through large scientific corpora.
People already rely on search engines to help them find information. However, conventional search engines match keywords to return results; they do not “read” or “understand” the content of the publications they are searching. As a result, the most relevant information is often buried deep within the search results.
Battelle Sematrix™ is different. It uses natural language processing and machine learning to extract meaning from unstructured sources and documents written in natural (human) language, such as scientific papers. Using ontology-based categories and axioms, it constructs a knowledge base that lets it draw meaning out of the documents it “reads.” An inference layer allows it to connect knowledge from different sources: for example, if one document says A=B and another says B=C, it can infer that A must also equal C.
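The A=B, B=C example can be illustrated with a small Python sketch. This is not how Sematrix is implemented; it is a toy, assuming that each document contributes simple "X equals Y" statements, and the `EquivalenceStore` class and its method names are hypothetical. It groups terms with a union-find structure so that equalities stated in separate sources can be combined.

```python
class EquivalenceStore:
    """Toy knowledge base that groups terms asserted to be equal.

    Hypothetical illustration only; not the Sematrix inference layer.
    """

    def __init__(self):
        # Maps each term to its parent; a root term is its own parent.
        self._parent = {}

    def _find(self, term):
        # Register unseen terms as their own group.
        self._parent.setdefault(term, term)
        # Walk up to the representative (root) of the term's group.
        root = term
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited term directly at the root.
        while self._parent[term] != root:
            self._parent[term], term = root, self._parent[term]
        return root

    def assert_equal(self, a, b):
        """Record a statement 'a = b' found in some document."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_a] = root_b

    def are_equal(self, a, b):
        """Return True if a = b follows from the recorded statements."""
        return self._find(a) == self._find(b)


kb = EquivalenceStore()
kb.assert_equal("A", "B")  # stated in one document
kb.assert_equal("B", "C")  # stated in another document

print(kb.are_equal("A", "C"))  # True: inferred, never stated directly
print(kb.are_equal("A", "D"))  # False: nothing links A and D
```

A real ontology-based system would handle many relation types beyond equality, but the principle is the same: facts from separate sources are merged into one structure, and new facts are derived from it.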
Using these methods, Sematrix can vastly reduce the time and effort required to review scientific literature, identify the most relevant citations for a specific research question, and find hidden connections between different studies or data sets.
Researchers at Battelle have already applied Sematrix to a number of knowledge management problems. One example is mining PubMed and other large corpora of scientific documents to find research relevant to developing new healthcare quality measures for the Centers for Medicare & Medicaid Services (CMS). For human readers, sorting through published research can take hundreds of hours per measure. Sematrix quickly identifies the most relevant documents, extracts the information needed to support measure development, and puts it into a format that human measure developers can use. This approach could ultimately reduce the time it takes to develop a new healthcare quality measure from months or years to weeks or even days.
To learn more about how Sematrix works and how it is being used, download the full white paper here.