Literature-based Medical Knowledge Discovery

Given the sheer volume of new research, it is nearly impossible for practicing physicians to keep up with the latest findings. This calls for automated knowledge discovery from the research literature. The problem is extremely difficult because the language is complex and highly specialized, but an AI-driven solution could keep medical databases automatically updated with state-of-the-art knowledge. Such databases can then be combined with clinical notes to improve diagnostics and patient care. Arguably, the most important knowledge in clinical practice is the relation between a drug and a disease or symptom, which can be broadly understood in terms of whether the drug is beneficial or harmful for a patient.

As part of his doctoral thesis, Banerjee designed a global inference system, modeled as a linear programming optimization, based on the pharmacodynamic similarities between drugs. With a prototype AI system, he demonstrated that even when the system has no prior knowledge of a newly studied drug class, it can identify drugs of that class as potentially beneficial for specific treatments. For example, for type-2 diabetes patients, the system identified the class "sodium glucose co-transporter 2" as potentially beneficial, even though no drug from this class was known to the system during its training phase. The key novelty of this work is that the similarity between medical entities was computed from their pharmacological actions, which were carefully extracted from a vast amount of research literature using state-of-the-art NLP methods [Banerjee 2015].
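The intuition behind similarity-based inference can be illustrated with a small sketch. It is not the linear programming formulation from the thesis. It is a much simpler similarity-weighted average, shown only to make concrete how a drug class unseen in training can inherit a score from known classes that share pharmacological actions. The action sets, the labels, and the names `ACTIONS`, `KNOWN`, `jaccard`, and `benefit_score` are all hypothetical, hand-written for illustration rather than extracted from literature.

```python
# Hypothetical toy data: drug class -> set of pharmacological actions.
# Hand-written for illustration; not taken from the thesis or its corpus.
ACTIONS = {
    "biguanide": {"lowers blood glucose", "reduces hepatic glucose output"},
    "sulfonylurea": {"lowers blood glucose", "stimulates insulin secretion"},
    "beta blocker": {"lowers heart rate", "lowers blood pressure"},
    "sglt2 inhibitor": {"lowers blood glucose",
                        "increases urinary glucose excretion"},
}

# Assumed training labels for type-2 diabetes: 1.0 = beneficial, 0.0 = not.
# "sglt2 inhibitor" is deliberately absent, mimicking an unseen drug class.
KNOWN = {"biguanide": 1.0, "sulfonylurea": 1.0, "beta blocker": 0.0}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def benefit_score(new_class):
    """Similarity-weighted average of the known labels for an unseen class."""
    weights = {k: jaccard(ACTIONS[new_class], ACTIONS[k]) for k in KNOWN}
    total = sum(weights.values())
    if total == 0:
        return 0.0  # no shared actions with any known class
    return sum(weights[k] * KNOWN[k] for k in KNOWN) / total


if __name__ == "__main__":
    print(benefit_score("sglt2 inhibitor"))
```

In this toy setting the unseen class shares an action only with the two classes labeled beneficial, so it receives a high score. A global inference system would instead optimize all such assignments jointly under constraints, which this local average does not attempt.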

Research Group

Research Products

[Banerjee 2015]
  • Ritwik Banerjee. Knowledge Extraction from Diverse Biomedical Corpora with Applications in Healthcare: Bridging the Translational Gap. Ph.D. Thesis, Stony Brook University, 2015.