MSc Thesis Proposal by: Siyam Sajnan Chowdhury

Investigating Large Language Model Embeddings for Predicting High-Frequency Drug Side Effects

Organized by: School of Computer Science

Date and time

Thursday, May 9, 2024 · 11:30 am – 1:00 pm EDT

Location

401 Sunset Avenue Windsor, ON N9B 3P4 Canada

About this event

The School of Computer Science is pleased to present this MSc thesis proposal.


Date: Thursday, 09 May 2024

Time: 11:30 am – 1:00 pm

Location: Essex Hall, Room 122


Abstract:

Large language models have brought about a paradigm shift in natural language processing. Characterized by their large scale, deep architectures, and pre-training on massive amounts of data, they learn rich and nuanced representations of language and have demonstrated strong performance on natural language understanding tasks across many domains. This research investigates how well large language models perform in drug side-effect frequency prediction. Our methodology uses Galeano's dataset, a standard benchmark for drug side-effect frequency prediction. We used ChemBERTa, a large language model based on the BERT architecture, to embed the chemical structures of the drugs, and SimCSE, a BERT-based model trained with contrastive learning, to embed the side effects. Using these embeddings, we predicted high-frequency side effects with a deep learning model. Knowing how often a side effect occurs can inform the clinical assessment of a drug and help weigh its potential risks against its benefits. The key objective of this research is to evaluate the performance of various large language models for predicting the frequencies of drug side effects.
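The abstract pairs a ChemBERTa embedding of each drug with a SimCSE embedding of each side effect and feeds them to a deep learning model, but it does not specify that model. The following is therefore only a minimal, standard-library sketch under stated assumptions: each (drug, side effect) pair is represented by concatenating the two vectors, a hypothetical one-hidden-layer network (`PairMLP`) outputs the probability that the side effect is high-frequency for that drug, and small random vectors with synthetic labels stand in for the real embeddings and for Galeano's dataset. All names here are illustrative, not the author's code.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def pair_features(drug_vec, se_vec):
    """Assumption: a (drug, side effect) pair is the concatenation of both embeddings."""
    return list(drug_vec) + list(se_vec)


class PairMLP:
    """Hypothetical one-hidden-layer MLP scoring P(side effect is high-frequency | drug)."""

    def __init__(self, in_dim: int, hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        s1, s2 = 1.0 / math.sqrt(in_dim), 1.0 / math.sqrt(hidden)
        self.w1 = [[rng.gauss(0.0, s1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, s2) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # ReLU hidden layer followed by a single sigmoid output unit.
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, sigmoid(z)

    def predict(self, x) -> float:
        return self.forward(x)[1]

    def sgd_step(self, x, y: float, lr: float = 0.05) -> None:
        """One SGD step on binary cross-entropy for a single labelled pair."""
        h, p = self.forward(x)
        dz = p - y  # d(BCE)/d(logit) for a sigmoid output
        # Back-propagate to the hidden layer using the pre-update output weights.
        dh = [dz * w if hj > 0.0 else 0.0 for w, hj in zip(self.w2, h)]
        for j, hj in enumerate(h):
            self.w2[j] -= lr * dz * hj
        self.b2 -= lr * dz
        for j, row in enumerate(self.w1):
            for i, xi in enumerate(x):
                row[i] -= lr * dh[j] * xi
            self.b1[j] -= lr * dh[j]


def mean_bce(model, data) -> float:
    """Average binary cross-entropy over (features, label) pairs."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = model.predict(x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


if __name__ == "__main__":
    rng = random.Random(42)
    dim = 8  # toy size; real ChemBERTa/SimCSE vectors are much larger
    drugs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
    effects = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(10)]
    # Synthetic labels only (1 = "high-frequency"), from an arbitrary linear rule.
    data = []
    for d in drugs:
        for e in effects:
            y = 1.0 if d[0] + d[1] - e[0] > 0 else 0.0
            data.append((pair_features(d, e), y))

    model = PairMLP(in_dim=2 * dim)
    before = mean_bce(model, data)
    for _ in range(20):
        rng.shuffle(data)
        for x, y in data:
            model.sgd_step(x, y)
    print(f"mean BCE before={before:.3f} after={mean_bce(model, data):.3f}")
```

Concatenation is only one way to combine the two embeddings; element-wise products or separate drug and side-effect towers are common alternatives, and a real experiment would typically use a framework such as PyTorch rather than hand-written gradients.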


Keywords: Large Language Models, ChemBERTa, SimCSE, Drug Side Effect Frequency


Thesis Committee:

Internal Reader: Dr. Dan Wu

External Reader: Dr. Andrew Swan

Advisor: Dr. Alioune Ngom
