Dr. Faruck Morcos

Dr. Faruck Morcos is an Associate Professor in the Departments of Biological Sciences and Bioengineering (affiliate) and a member of the Center for Systems Biology at UT Dallas. Dr. Morcos directs the Evolutionary Information Lab, which focuses on solving problems at the interface between biology, computation, information theory, and biological physics. The Morcos Lab develops methods to extract and analyze biological information from sequence and genomic data to create models for molecular evolution, protein structure, function, and design, as well as to characterize biomolecular interactions. Recently, the Morcos Lab has been working on machine learning approaches to study protein sequence space, to predict the effects of mutations in protein families in order to design chimeric repressors and biosensors, and to discover sequence variants that could compensate for the effects of deleterious mutations in disease-related proteins.

Latent Generative Landscapes of protein families as roadmaps for protein functional diversity, evolution and design

Characterizing the enormous space of protein sequences is an arduous task, given the complex nature of the rules that determine the functional effects of sequence variability. The current availability of sequencing data, high-throughput experiments, and novel algorithms that model the joint probability of sequence composition is shifting the panorama from an intractable problem to the ability not only to characterize but also to generate novel functional sequences. One example of these learning models is the variational autoencoder (VAE), which, when applied to sequence data, can be used to classify members of a protein family and to generate diverse members of that family while still satisfying the higher-order statistics of the training data. In our current work, we evaluate the underlying latent manifold of VAEs in which sequence information is embedded. We use an amino-acid Potts model, as in direct coupling analysis (DCA), and its sequence Hamiltonian to investigate the properties of the latent manifold. Together, they constitute what we call a latent generative landscape (LGL). LGLs can be used to determine phylogenetic groupings as well as functional and fitness properties of distinct systems, including the globin family, β-lactamases, ion channels, and transcription factors. LGLs provide a guide to uncover the effects of sequence variability observed in experimental data and in the directed and natural evolution of proteins. We showcase applications for protein engineering and design that combine the generative properties and functional predictive power of variational autoencoders and coevolutionary analysis.
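For readers unfamiliar with the sequence Hamiltonian mentioned in the abstract, the following is a minimal sketch of the standard Potts energy, H(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), evaluated for one integer-encoded sequence. The function name, array layout, alphabet size of 21, and the random toy parameters are assumptions made for illustration only; they are not taken from the speaker's software, and real fields and couplings would be inferred from a multiple sequence alignment.

```python
import numpy as np

def potts_hamiltonian(seq, h, J):
    """Potts energy of one sequence (hypothetical helper, for illustration).

    seq : (L,) integer array, residue index at each position
    h   : (L, q) array of local fields h_i(a)
    J   : (L, L, q, q) array of pairwise couplings J_ij(a, b)
    """
    seq = np.asarray(seq)
    pos = np.arange(seq.shape[0])
    # Sum of local fields h_i(s_i) over all positions.
    field_term = h[pos, seq].sum()
    # pair[i, j] = J_ij(s_i, s_j) for every pair of positions.
    pair = J[pos[:, None], pos[None, :], seq[:, None], seq[None, :]]
    # Count each pair once (i < j) and skip the diagonal.
    coupling_term = np.triu(pair, k=1).sum()
    return -(field_term + coupling_term)

# Toy parameters (random, NOT inferred from any real protein family).
rng = np.random.default_rng(0)
L, q = 5, 21  # assumed: 5 positions, 20 amino acids plus a gap symbol
h = rng.normal(size=(L, q))
J = rng.normal(size=(L, L, q, q))
seq = rng.integers(0, q, size=L)
energy = potts_hamiltonian(seq, h, J)
```

In this convention, lower energies correspond to sequences that the model considers more compatible with the family. How such energies are mapped onto the VAE latent space to form an LGL is specific to the speaker's work and is not reproduced here.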