Job title
Black Leaders in Cancer PhD Programme (Ke Yuan & Crispin Miller)
Job reference
REQ00368
Date posted
21/10/2024
Application closing date
25/11/2024
Location
Glasgow
Salary
Stipend - £21,000 per annum
Package
All tuition fees will be covered
Contractual hours
Blank
Basis
Blank
Job category/type
PhD Students
Attachments
Blank
Job description
Novel Large Language Models for Biological Sequences and Their Interactions
Background
The identification of genomic alterations driving cancer progression has historically relied on identifying recurrent mutations across patient cohorts. While some driver mutations are well characterized due to their high recurrence, the vast majority of mutations are rare and shared by only a few patients. This limits the scope for functional studies and the discovery of potential therapeutic targets. Recent advances in large language models (LLMs) trained on vast collections of biological sequences offer a promising solution by enabling a deeper understanding of the structural and functional consequences of genomic alterations in DNA, RNA, and protein sequences. By leveraging LLMs, we can model the effects of even rare mutations in cancer in far greater detail than recurrence-based approaches allow.
This project aims to train and utilize LLMs to explore the impact of genomic mutations, specifically focusing on protein sequences, RNA sequences, and their interactions. These models will provide novel insights into the consequences of genomic alterations and pave the way for improved understanding and therapeutic targeting in cancer biology.
Research Question
Objective 1: Building LLMs for Protein Sequences and Protein-Protein Interactions
Mutations in protein sequences can have significant functional consequences, particularly in the context of protein-protein interactions. Current LLMs are trained on single protein sequences, limiting their ability to model these interactions. In this objective, we aim to build a novel protein language model specifically designed to learn and predict protein-protein interactions. This model will be applied to both mouse and patient data, enabling the study of mutations in the context of protein interaction networks.
Objective 2: Building LLMs for RNA Sequences and Protein-RNA Interactions
Mutations in RNA sequences, including those in non-coding regions, can have profound functional effects. In collaboration with RNA-focused research groups, we will develop LLMs to model interactions between RNA and proteins, as well as the functional consequences of mutations. With an existing RNA model and a wealth of data for fine-tuning and validation, we will explore how these interactions influence gene regulation and cancer progression.
Objective 3: Fine-Tuning LLMs for Mitochondrial Genomes
Mitochondrial mutations are increasingly recognized for their role in cancer and other diseases. Glasgow's ongoing initiative in mitochondrial genomics provides an ideal opportunity to fine-tune LLMs for mitochondrial DNA. These models will enhance our ability to assess the functional consequences of mitochondrial mutations and could potentially inform the design of genome-editing tools targeting the mitochondrial genome.
Skills/techniques that will be gained