Projects | SAGE Lab

Antibiotic resistance: We seek to build the next generation of machine learning models to interpret the genetic variation found in bacterial genomes. Genomic data can be represented as a string of letters – ACGT – but can also be represented by the information it encodes, such as protein sequences, protein three-dimensional structures, regulatory sequences, RNA sequences, and enzymes in biological pathways. We are seeking to augment our models of genomic data with additional information, to improve our ability to predict antibiotic resistance variants before they spread.

Dataset curation for machine learning on genomes: The ability to train, evaluate, and interpret foundation models of genomic data relies on the availability of diverse and meaningful tasks and data. Our lab is curating datasets of tasks related to bacterial genomics and evaluating how well genomic foundation models perform on these tasks.

ML-guided drug discovery: Based on our knowledge of the proteomes of pathogenic bacteria, can we determine which compounds will bind to and inhibit their proteins, pointing to possible novel antibiotics? Can we learn which chemical compounds will permeate bacterial membranes and ultimately have bactericidal activity?

Collaborations: We are always looking for new collaborations with machine learning, computational biology, and molecular biology labs in academia and industry. Please reach out to discuss!