Understanding human language plays a pivotal role in creating intelligent systems. With that in view, Banerjee's research spans multiple areas that bring together machine learning (ML) and natural language processing (NLP): biomedical knowledge discovery for better healthcare, misinformation analysis, and linguistics for security and privacy.
Language use varies considerably depending on the what (content), why (intent), who (speaker/writer and audience), and how (style). Natural language understanding can be improved based on these insights, and intelligent systems built using a deeper understanding of human language can be employed for immense social good. The language used in medical research, for instance, is highly specialized, as it is meant for technical comprehension by other researchers in that field; but a system capable of understanding it, and extracting useful information from it, can help healthcare practitioners and patients. Quotidian language use, on the other hand, is dictated by various aspects of individual and collective human behavior. An intelligent system can be refined to provide better help to individuals and/or social groups by interpreting (and drawing inferences from) language.
Prospective students, please see the section at the bottom of this page!
Current Research Projects
Privacy Compliance of Live Medical Data
Developers face hurdles understanding complex privacy laws in handling medical data. Our project seeks to simplify legal jargon, aiding app design for data safety. We aim to create tools offering transparency and user control in health apps, closing the knowledge gap for a secure digital health landscape.
(Project page under development)
Information Extraction in Clinical Nephrology
Chronic Kidney Disease (CKD) poses significant health risks, necessitating the identification of new early-stage risk factors. Major depression and post-traumatic stress disorder (PTSD), which are often linked, may be related to CKD progression. This study applies supervised machine learning and natural language processing to clinical notes and radiology reports to predict the connection between PTSD and CKD.
(Project page under development)
Fallacious Argumentation in Information Disorder
This study targets subtle misinformation that propagates through manipulative argumentation in the language of contemporary information ecosystems. It aims to develop computational models that detect such manipulative and fallacious argumentation, expanding beyond explicit falsehoods to address epistemic corruption.
(Project page under development)
Past Research Projects
Tracking Semantic Change in Medical Information
This project created algorithms and datasets to distinguish unchanged medical information from its distorted versions circulating online. We achieved this by identifying semantic differences between the information found in news articles and the corresponding information in the original scientific publications.
Extraction & Classification of Financial Information
A multi-year project focused on fine-grained document-type classification and information extraction from complex financial texts applied to scalable automation of a broad range of tasks.
(Technical details of this research are proprietary)
Semantic Similarity of Clinical Texts
Hospitals amass crucial textual data for healthcare, often in disorganized forms within Electronic Health Record (EHR) systems. Measuring semantic textual similarity (STS) between clinical texts mitigates the associated problems by streamlining data and reducing redundancy, while preserving valuable information and highlighting new information.
Literature-based Medical Knowledge Discovery
This research designed an AI-driven solution, and developed a prototype system, to automate updating medical databases with new findings. It uses pharmacodynamic similarities between drugs to identify potentially beneficial drugs and drug categories for specific diseases, symptoms, or syndromes, even when the model had no prior knowledge of such drugs during training.
Deception in Language
We think of language as a medium of conveying information, but unfortunately, it is also often used to deceive. This research explored the detection of such deception in online reviews using deep interpretable linguistic properties.
Healthcare faces a challenge in utilizing patient-specific data for personalized medicine. The AI developed in this research extracts relevant patient details from diverse data sources, facilitating tailored lab tests, detecting drug reactions, and linking symptoms to safe medications.
Network Outage Analysis
A collaboration with researchers in computer networks to extract network outage information and develop supervised machine learning approaches that classify outages into multiple causal categories.
If you are not a student at Stony Brook University, and want to join Banerjee's research team, please understand that
direct application for any position in Banerjee's research group is not currently possible for candidates outside Stony Brook University, and
temporary or short-term research positions cannot be accommodated.
Banerjee may be unable to respond to individual emails or queries regarding research positions. If you are interested in joining his research team as part of your M.S. or Ph.D., please apply to one of the following graduate programs offered by the Department of Computer Science at Stony Brook University:
Students at Stony Brook University
If you are a current Ph.D. student in the Department of Computer Science at Stony Brook University, and have a strong background in
mathematics and/or statistics, and
programming (mainly in Python, and particularly in the use of modern machine learning libraries like PyTorch or Keras),
feel free to contact Banerjee via email about potential opportunities within his research group.
Ph.D. students are expected to have the ability to formulate a research problem that they want to pursue. Please be prepared to start your journey by presenting a convincing proposal for your research.
Research opportunities in the form of CSE 487 or CSE 495/496 exist for exceptional B.S. students. Outstanding performance in coursework relevant to machine learning is a prerequisite (ideally, in CSE 353 and/or CSE 354).
If you are a current undergraduate student of computer science or a closely related area, feel free to contact Banerjee via email about these options.
At the end of the project/thesis, the goal is for your work to be publishable at a peer-reviewed research conference or journal. If you have made a significant contribution, you will be a co-author of the published work (subject to acceptance at a reputable venue).
If you are a student in the MS or 5-yr BS/MS program in the computer science department, and want to work with Banerjee for your advanced graduate project (CSE 523/524) or your graduate thesis (CSE 599), contact him via email with the following in mind:
A successful applicant will typically have good grades, a strong programming background in Python with knowledge of version control, and a good understanding of machine learning fundamentals. Experience with libraries like PyTorch or Keras is a strong plus.
As prerequisites, the graduate machine learning and/or natural language processing courses are strongly recommended, as they are an important indicator of your area of interest. If contacted, please be prepared for an initial technical interview on NLP/ML concepts. If your interview is satisfactory, you will be placed in a specific project team led by a Ph.D. student. Banerjee and his team are dedicated to their research, and expectations of team members include:
Approximately 10-12 hours of diligent work on a weekly basis.
Attending one weekly research group meeting. Individual and project progress will be discussed and assessed in these weekly meetings. They will also often include discussions and presentations of research papers.
Attending at least one weekly internal meeting with your project team. These meetings will be chaired by the lead Ph.D. student, and the specifics will be dictated by the project requirements.
At the end of the project/thesis, the work is expected to be of a caliber that is publishable at a peer-reviewed research conference or journal. If you have made a significant contribution, you will be a co-author of the published work (subject to acceptance at a reputable venue).