
Selected Research
MBE: model-based enrichment estimation and prediction for differential sequencing data
Akosua Busia and Jennifer Listgarten
Genome Biology 2023
We introduce model-based enrichment (MBE) to overcome key shortcomings of current approaches to differential analysis using high-throughput sequencing data. MBE is based on sound theoretical principles, is easy to implement, and can trivially make use of advances in modern-day machine learning classification architectures or related innovations.
Optimal trade-off control in machine learning-based library design, with application to adeno-associated virus for gene therapy
Danqing Zhu, David H. Brookes, Akosua Busia, Ana Carneiro, Clara Fannjiang, Galina Popova, David Shin, Kevin C. Donohue, Edward F. Chang, Tomasz J. Nowakowski, Jennifer Listgarten, David V. Schaffer
bioRxiv 2021
We develop and showcase a machine learning-based method for systematically designing more effective adeno-associated virus capsid libraries: ones that have broadly good packaging capabilities while being as diverse as possible. Such carefully designed libraries stand to significantly increase the chance of success in engineering any property of interest.
A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization
David H. Brookes, Akosua Busia, Clara Fannjiang, Kevin Murphy, Jennifer Listgarten
Genetic and Evolutionary Computation Conference (GECCO) 2020
We show that a large class of Estimation of Distribution Algorithms (EDAs), including, but not limited to, Covariance Matrix Adaptation, can be written as a Monte Carlo Expectation-Maximization (EM) algorithm, and as exact EM in the limit of infinite samples. Because EM sits on a rigorous statistical foundation and has been thoroughly analyzed, this connection provides a new coherent framework with which to reason about EDAs.
A deep learning approach to pattern recognition for short DNA sequences
Akosua Busia, George E. Dahl, Clara Fannjiang, David H. Alexander, Elizabeth Dorfman, Ryan Poplin, Cory Y. McLean, Pi-Chuan Chang, Mark DePristo
bioRxiv 2019
Inferring properties of biological sequences, such as determining the species-of-origin of a DNA sequence or the function of an amino-acid sequence, is a core task in many bioinformatics applications. These tasks are often solved using string-matching to map query sequences to labeled database sequences or via Hidden Markov Model-like pattern matching. In the current work we describe and assess a deep learning approach which trains a deep neural network (DNN) to predict database-derived labels directly from query sequences. We demonstrate that this DNN performs at or above state-of-the-art levels on a difficult, practically important problem: predicting species-of-origin from short reads of 16S ribosomal DNA.
Next-Step Conditioned Deep Convolutional Neural Networks Improve Protein Secondary Structure Prediction
Akosua Busia, Navdeep Jaitly
Joint 25th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and 16th European Conference on Computational Biology (ECCB) 2017, Poster
Recently developed deep learning techniques have significantly improved the accuracy of various speech and image recognition systems. We adapt some of these techniques to create a chained convolutional architecture with next-step conditioning for improving performance on protein sequence prediction problems.