Latest software developments in cryo-EM

Conference: 2021: 71st ACA Annual Meeting
08/02/2021: 12:00 PM - 3:00 PM
Oral Session 


This session will highlight recent developments in cryo-EM software for image processing and structure analysis, including topics such as: on-the-fly processing pipelines; higher resolution single-particle and tomographic structure determination; heterogeneity analysis; 3D reconstruction in situ; molecular modeling from density maps; and micro-electron diffraction for crystallographic structure determination.


Welcome & Opening Remarks

Advances in heterogeneous reconstruction with cryoDRGN

Technological advances in cryo-electron microscopy (cryo-EM) have produced new opportunities to study the structural heterogeneity and dynamics of macromolecular complexes. However, this structural heterogeneity complicates 3D reconstruction and is traditionally addressed with discrete clustering approaches that fail to capture the full range of biomolecular dynamics. In this talk, I will give an overview of cryoDRGN, a heterogeneous reconstruction algorithm that leverages the expressive power of deep neural networks to reconstruct continuous distributions of cryo-EM density maps. Trained on single-particle cryo-EM images, cryoDRGN can reconstruct complex distributions that include both discrete compositional and continuous conformational changes. The openly available cryoDRGN software contains automated and interactive tools to inspect the volume ensemble, segment the dataset for further refinement in traditional tools, and filter impurities from datasets. Through a series of vignettes, I will highlight the advantages and disadvantages of this approach, its extensions, and new opportunities to extract the full spectrum of functionally relevant macromolecular states with single-particle cryo-EM.
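The idea of decoding a whole volume from a single latent vector can be illustrated with a coordinate-based network. The following is a minimal, untrained NumPy sketch, not cryoDRGN's code: the positional encoding, layer sizes, and the names `CoordinateDecoder` and `render_volume` are hypothetical, and the published method decodes Fourier-space values and is trained as a variational autoencoder on particle images.

```python
import numpy as np


def positional_encoding(coords, n_freqs=4):
    """Sinusoidal features of 3D coordinates: (N, 3) -> (N, 3 + 6 * n_freqs)."""
    feats = [coords]
    for k in range(n_freqs):
        scale = 2.0 * np.pi * (2 ** k)
        feats.append(np.sin(scale * coords))
        feats.append(np.cos(scale * coords))
    return np.concatenate(feats, axis=1)


class CoordinateDecoder:
    """Hypothetical untrained MLP mapping (encoded coordinate, latent z) -> density."""

    def __init__(self, z_dim=8, n_freqs=4, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 3 + 6 * n_freqs + z_dim
        self.n_freqs = n_freqs
        self.W1 = rng.normal(0, 1 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 1 / np.sqrt(hidden), (hidden, hidden))
        self.b2 = np.zeros(hidden)
        self.W3 = rng.normal(0, 1 / np.sqrt(hidden), (hidden, 1))

    def __call__(self, coords, z):
        enc = positional_encoding(coords, self.n_freqs)
        # The same z is appended to every coordinate: one latent vector = one volume
        zz = np.broadcast_to(z, (coords.shape[0], z.shape[0]))
        h = np.concatenate([enc, zz], axis=1)
        h = np.maximum(h @ self.W1 + self.b1, 0.0)  # ReLU
        h = np.maximum(h @ self.W2 + self.b2, 0.0)
        return (h @ self.W3)[:, 0]


def render_volume(decoder, z, box=16):
    """Evaluate the decoder on a box^3 grid in [-0.5, 0.5)^3 to get one volume."""
    axis = np.linspace(-0.5, 0.5, box, endpoint=False)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return decoder(grid.reshape(-1, 3), z).reshape(box, box, box)
```

Moving `z` through latent space and re-rendering yields a continuous family of volumes from one set of network weights, which is what makes this kind of representation suited to continuous heterogeneity.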

Abstract 766


Ellen Zhong, MIT Cambridge, MA 

Additional Author(s)

Bonnie Berger, MIT Cambridge, MA 
Joey Davis, MIT Cambridge, MA 

Singular value decomposition (SVD) of particle movements for motion analysis in cryoEM movies

Singular value decomposition (SVD) is an efficient method for finding patterns in data. The motions observed in cryoEM movies can be decomposed with SVD because aligning the frames of movies collected in cryoEM SPR provides a natural vectorization. The resulting SVD components indicate how many types of motion are present in a particular experiment.
We developed and implemented an efficient SVD to map motion features in space and time. SVD can be used as a guided restraint that allows larger motions at the start of the movie and dampens them at later stages. Another use is to filter out data with excessive or unusual types of motion, e.g., collapsing ice layers, enabling automatic data selection for subsequent steps of structure solution. Finally, SVD provides an unbiased, comprehensive, and dataset-specific estimate of the magnitude and character of the largest initial motions, which are driven by ice expansion and bulging. We observed that these highly detrimental initial motions depend on sample features, e.g., particle density and buffer components, and on imaging conditions. This approach significantly simplifies motion analysis and makes its interpretation more objective.
We will present the results of this analysis for selected cryoEM SPR reconstructions. 
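The vectorization step can be illustrated with a minimal sketch, assuming per-particle trajectories (x/y shift per frame) are already available from movie alignment. The array layout, the optional subtraction of the mean trajectory, the energy cutoff, and the function names are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def motion_svd(trajectories, subtract_mean=True):
    """Decompose per-particle motion trajectories with SVD.

    trajectories: (n_particles, n_frames, 2) array of per-frame x/y shifts.
    Returns singular values, per-particle weights (n_particles, k) and
    temporal motion modes (k, n_frames, 2).
    """
    n_particles, n_frames, _ = trajectories.shape
    # Vectorize: each particle's trajectory becomes one row of length 2 * n_frames
    X = trajectories.reshape(n_particles, 2 * n_frames)
    if subtract_mean:
        # Assumption: remove the trajectory shared by all particles before decomposing
        X = X - X.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    weights = U * s  # how strongly each particle expresses each mode
    modes = Vt.reshape(-1, n_frames, 2)  # each mode is an x/y time course
    return s, weights, modes


def n_motion_types(s, energy_fraction=0.95):
    """Smallest number of components explaining `energy_fraction` of the motion."""
    energy = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(energy, energy_fraction) + 1)
```

The temporal modes show when each type of motion happens (for example, a large early jump versus slow drift), while the per-particle weights show where in the micrograph it happens; particles with outlying weights are natural candidates for automatic rejection.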

Abstract 711


Raquel Bromberg, Ligo Analytics Dallas, TX 

Additional Author(s)

Dominika Borek, UT Southwestern Medical Center Dallas, TX 
Zbyszek Otwinowski, UT Southwestern Dallas, TX 

PySeg in Scipion: making easier template-free detection and classification of membrane-bound complexes in cryo-electron tomograms.

The cellular environment is characterized by the presence of many different molecular complexes, stable or transient, which underlie critical cellular functions. Cryo-electron tomography is uniquely suited to high-resolution, direct three-dimensional imaging of unperturbed cellular environments. The open-source software package PySeg enables a comprehensive analysis of cryo-tomograms for template-free detection and unsupervised classification of heterogeneous membrane-bound molecular complexes. PySeg has proved fundamental for analyzing membranous organelles with heterogeneous and/or sparse complex composition, such as the endoplasmic reticulum and synapses. However, PySeg is a package of Python functions and scripts, so using it effectively requires some programming skills. Here, we present the integration of PySeg as a plug-in for Scipion, a well-known image processing framework for electron microscopy, which facilitates input parametrization, results visualization, traceability, and communication with other software packages.

Abstract 760


Antonio Martinez-Sanchez, University of Oviedo Oviedo

Coffee Break

Beam image-shift accelerated data acquisition for near-atomic resolution single-particle cryo-electron tomography

Single-particle cryo-electron microscopy (SP cryo-EM) has been the method of choice for obtaining high-resolution structures by cryo-EM because of its fast data acquisition schemes and well-developed image processing tools. Cryo-electron tomography (cryo-ET) is the method of choice for in situ imaging: it acquires multiple tilted projections of each area to reconstruct a 3D subvolume of each particle or structure. The need to compensate for targeting errors introduced during mechanical navigation and tilting of the specimen significantly slows down tomographic data collection, to the point where acquiring datasets large enough for high-resolution reconstruction becomes too costly. Combined with the limited toolset for data processing, this means cryo-ET cannot consistently reach high resolution. Solving these limitations would bridge the gap between SP cryo-EM and cryo-ET and open the door to in situ structural biology.

Here, we introduce BISECT (beam image-shift electron cryo-tomography), a protocol for tilt-series acquisition that accelerates data collection by up to an order of magnitude. As in single-particle cryo-EM, we achieve this by using beam image-shift to multiply the number of areas imaged at each stage position, and we iteratively correct the geometrical constraints during imaging to achieve high-precision targeting at each area. Finally, by performing per-tilt astigmatic CTF estimation and data-driven exposure weighting, we improved the final map resolution. We validated the method by determining the structure of a low-molecular-weight target (~300 kDa) at 3.6 Å resolution, with clearly resolved density for individual side chains.
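Why image-shift targets need geometric correction during tilting can be seen from first-order geometry. The sketch below is hypothetical and models only that geometry: it assumes the tilt axis runs along x through the stage position at eucentric height, so a target at in-plane offset (x, y) moves to y·cos θ in-plane and gains a height change y·sin θ, which appears as a per-target defocus offset. The sign of the height change depends on the tilt convention, and BISECT's iterative correction of measured errors is not modeled.

```python
import numpy as np


def tilted_targets(offsets_um, tilt_deg):
    """Predict where image-shift targets land after tilting the stage.

    offsets_um: (n, 2) in-plane (x, y) offsets of each target from the
    stage position at 0 degrees tilt, in micrometres.
    Assumes the tilt axis is the x axis and passes through the stage position.
    Returns (n, 2) projected in-plane offsets and (n,) height changes (um).
    """
    theta = np.deg2rad(tilt_deg)
    x, y = offsets_um[:, 0], offsets_um[:, 1]
    # Rotating the point (x, y, 0) about the x axis gives (x, y cos(theta), y sin(theta))
    projected = np.stack([x, y * np.cos(theta)], axis=1)
    dz = y * np.sin(theta)  # appears as a defocus offset for this target
    return projected, dz
```

Targets lying on the tilt axis stay put, while targets far from it need both a shrinking image-shift and a growing defocus correction as the tilt increases, which is why errors must be tracked per area rather than only per stage position.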

Abstract 732


Jonathan Bouvette, NIEHS Durham, NC 

Additional Author(s)

Mario Borgnia, National Institute of Environmental Health Sciences
Hsuan-Fu Liu, Duke University Durham, NC 
Alberto Bartesaghi, Duke University Durham, NC 
Roel Schaaper, NIEHS Durham, NC 
Bradley Klemm, NIEHS Durham, NC 
Andrew Sikkema, NIEHS Durham, NC 
Juliana Da Fonseca Rezende E Mello, NIEHS Durham, NC 
Ye Zhou, Duke University Durham, NC 
Xiaochen Du, Duke University Durham, NC 
Rick Huang, Laboratory of Cell Biology/CCR/NCI/NIH Bethesda, MD 

Mapping atomic models back into cells - visual proteomics and in situ structure determination.

A new spin on an old computer vision technique, template matching, enables us to use the high-resolution details provided by macromolecular models determined by MX, NMR or cryoEM to determine the location and orientation of macromolecules in images of frozen-hydrated cells. Compared to classic detection approaches, such as 3D template matching in tomograms, the extra information used in our approach enhances the specificity of detection, raising the level of "surprise" that a false-positive detection would represent. Besides providing a means to determine where (and when) a particular complex may be in a cell, the approach also enables us to "fish" out interacting partners. For example, we can search a cell with a limited subset of stable 50S ribosomal proteins and RNAs and reconstruct volumes that contain information not found in the template, such as conformationally variable 30S components and other translational cofactors. Detection in situ is currently limited to targets with a molecular mass of at least ~300-400 kDa. I will present work in our lab to lower that mass limit by improving our radiation damage model and by incorporating a more complete description of inelastic scattering in the forward model used to generate templates for the approach.
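The link between template matching and "surprise" can be illustrated with a bare-bones 2D search. This is a generic sketch, not the authors' pipeline: it searches translations only (no orientations, defocus, or CTF), converts the correlation map to z-scores, and uses one common choice of threshold, the z-score at which about one false positive is expected across all search positions under a Gaussian noise assumption. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def match_template_2d(image, template):
    """Cross-correlate `template` at every translation of `image` via FFT.

    Returns a z-score map: correlation in units of the map's standard deviation.
    Entry [dy, dx] scores the template with its top-left corner at (dy, dx).
    """
    H, W = image.shape
    th, tw = template.shape
    t = np.zeros((H, W))
    t[:th, :tw] = template - template.mean()
    t /= np.linalg.norm(t)
    img = image - image.mean()
    # Circular cross-correlation: cc[d] = sum_n img[n + d] * t[n]
    cc = np.real(np.fft.ifft2(np.fft.fft2(img) * np.conj(np.fft.fft2(t))))
    return (cc - cc.mean()) / cc.std()


def detection_threshold(n_search, expected_false_positives=1.0):
    """Z-score above which ~`expected_false_positives` pure-noise peaks are expected."""
    return NormalDist().inv_cdf(1.0 - expected_false_positives / n_search)
```

A richer template (more high-resolution detail) concentrates more signal into the true peak relative to the noise, pushing it further above a threshold set by the size of the search, so that a false positive at that height becomes correspondingly less likely.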

Abstract 741


Benjamin Himes, Smithfield, RI

Statistical estimation of spatially-resolved heterogeneity from cryo EM images

There are many methods for characterizing the heterogeneity of an ensemble of particles from single-particle cryo EM images. This talk concerns a method [1-3] based on describing the electron scattering intensity of the particle as a Fourier series and describing the coefficients of the Fourier series as random variables that are independent and identically distributed from instance to instance of the particle. The heterogeneity is characterized by estimating the mean and variance of the coefficients from the image data with a maximum likelihood estimator. The mean gives a reconstruction, and the variance gives a spatially-resolved characterization of the heterogeneity of the ensemble of particles. When symmetry is present, the method allows each instance of the particle to lack symmetry while imposing the symmetry on the statistics of the particle. This avoids anomalous peaks in the variance map located on and near symmetry axes of the particle [4]. Imposing symmetry on the statistics also allows the computation of ensemble averages of the product of the electron scattering intensity at two different locations, which can be used to detect allosteric interactions between those locations. We demonstrate the method on bacteriophage HK97, where we show that binding of the maturation protease on the inner surface of the capsid has wide-ranging effects on the heterogeneity of the outer surface of the capsid.

[1] Y. Gong et al., J. Structural Biology, 193(3):188-195, March 2016.
[2] N. Xu et al., J. Structural Biology, 202(2):129-141, May 2018.
[3] N. Xu et al., IEEE Trans. Image Processing, 28(11):5479-5494, 2019.
[4] S. J. Ludtke, Methods in Enzymology, 579:159-189, 2016.
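Stripped to its statistical core, the mean/variance estimation can be illustrated with a toy model. The sketch below assumes each instance's coefficient vector is observed directly with additive Gaussian noise of known variance (no projection, orientation search, or CTF); in that setting the maximum likelihood estimates reduce to the sample mean and the noise-corrected sample variance, clipped at zero. The function name is hypothetical, and the published estimator works from 2D images and is considerably more involved.

```python
import numpy as np


def estimate_coefficient_stats(observations, noise_var):
    """Toy ML estimate of per-coefficient mean and variance.

    observations: (n_instances, n_coeffs) noisy coefficient vectors,
    modeled as y = c + n with c ~ N(mu, v) independently per instance
    and n ~ N(0, noise_var).
    Returns (mu_hat, v_hat).
    """
    mu_hat = observations.mean(axis=0)
    # ML (1/n) variance of y estimates v + noise_var for each coefficient
    var_y = observations.var(axis=0)
    # Subtract the known noise variance; variances cannot be negative
    v_hat = np.maximum(var_y - noise_var, 0.0)
    return mu_hat, v_hat
```

In this toy, `mu_hat` plays the role of the reconstruction and `v_hat` the role of the variance map: coefficients whose spread exceeds the noise level are the ones that vary between particle instances.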

Abstract 763


Peter Doerschuk, Cornell University Ithaca, NY 

Additional Author(s)

Yunye Gong, Cornell University Ithaca, NY 
Nan Xu, Cornell University Ithaca, NY 
John Johnson, The Scripps Research Institute La Jolla, CA 

Interrogating macromolecular complex assembly by systematically analyzing the composition of highly heterogeneous structural ensembles

Cryo-EM represents a unique and powerful opportunity to structurally characterize biomolecules at the single-particle level and to draw biological insights from the heterogeneity observed within structural ensembles. Doing so, however, is computationally challenging and requires improved methods for studying extremely heterogeneous datasets. Here, we present an approach that combines our recently published cryoDRGN method for reconstructing highly heterogeneous structural ensembles with a high-throughput compositional analysis that quantifies the presence or absence of individual domains or whole proteins across hundreds to thousands of cryo-EM density maps. This analysis produces a highly interpretable representation of the compositional heterogeneity within a dataset. Using this representation, we can identify cooperative and mutually exclusive occupancy relationships between subunits, extract subsets of particles for traditional high-resolution refinement, and define pathways of structural change, including complex assembly. We have applied this approach to understand the role of a universally conserved methyltransferase in biogenesis of the 30S ribosomal subunit. By comparing the structural ensembles observed in the presence and absence of this factor, we found that it performs a novel proofreading role in ribosome assembly. In sum, this work establishes a framework for systematically interrogating compositionally heterogeneous structural ensembles produced by tools such as cryoDRGN, and it highlights the value of this framework in illuminating underlying biological mechanisms.
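The compositional analysis can be illustrated with a hypothetical occupancy calculation: for each sampled map, measure the fraction of each subunit's mask that exceeds a density threshold, then correlate occupancies across maps. The masks, the threshold, the use of Pearson correlation, and the function names are assumptions for illustration, not necessarily what the authors do.

```python
import numpy as np


def occupancy_matrix(volumes, masks, threshold):
    """Fraction of each subunit's mask above `threshold`, per volume.

    volumes: (n_vols, D, D, D) maps sampled from the ensemble.
    masks: dict mapping subunit name -> boolean (D, D, D) mask.
    Returns (names, occ) with occ of shape (n_vols, n_subunits).
    """
    names = list(masks)
    occ = np.empty((volumes.shape[0], len(names)))
    for j, name in enumerate(names):
        # volumes[:, mask] gathers the masked voxels of every volume at once
        occ[:, j] = (volumes[:, masks[name]] > threshold).mean(axis=1)
    return names, occ


def occupancy_relationships(occ):
    """Pearson correlation of subunit occupancies across volumes.

    Strongly positive values suggest cooperative occupancy; strongly
    negative values suggest mutually exclusive occupancy.
    """
    return np.corrcoef(occ, rowvar=False)
```

The occupancy matrix (volumes by subunits) is also a convenient object to cluster, which is one way to group volumes into assembly states and pull out the corresponding particles for conventional refinement.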

Abstract 762


Laurel Kinman, Massachusetts Institute of Technology Cambridge, MA 

Additional Author(s)

Jingyu Sun, Center for Structural Biology, McGill University Montreal, Quebec 
Joaquin Ortega, Center for Structural Biology, McGill University Montreal, Quebec 
Joey Davis, MIT Cambridge, MA 

Scipion for tomography: An expansion of Scipion software framework towards integration, reproducibility and validation in cryo-electron tomography.

As was the case some years ago for cryo-EM SPA, image processing workflows in cryo-ET are far from well defined and do not yet provide a smooth user experience. One of the main reasons is the heterogeneity among software packages developed by different groups, each focused on different steps of data processing. Moreover, file formats are far from standardized. The Scipion framework was originally developed for cryo-EM SPA and is currently being extended with a set of tomography plugins (referred to as ScipionTomo hereafter) with the same purpose: to let users focus on data processing and analysis instead of dealing with multiple software installations, switching from one package to another, converting metadata files, managing possible incompatibilities, scripting, and so on. ScipionTomo is developed by a collaborative, multidisciplinary team of Scipion engineers, structural biologists, and some of the developers whose software packages have been integrated. The result is an extension that combines the experience gained in developing Scipion, close collaboration with other developers, and on-demand design of functionalities requested by end users. In this talk, we show the current state and some highlights of ScipionTomo. It is expected to be released in the coming months and will include further distinguishing features, such as alignment and picking consensus functionalities (currently under development) that combine the strengths of the best software packages to obtain a refined result.
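Picking consensus, as a general technique, keeps only the particle positions that several independent pickers agree on. The greedy clustering below is a minimal, hypothetical sketch of that idea, not ScipionTomo's implementation; the `radius` and `min_votes` parameters and the function name are illustrative.

```python
import numpy as np


def consensus_picks(pick_sets, radius, min_votes):
    """Combine coordinates from several pickers run on the same tomogram.

    pick_sets: list of (n_i, 3) coordinate arrays in voxels, one per picker.
    A coordinate joins the first cluster whose centroid lies within `radius`;
    clusters supported by at least `min_votes` distinct pickers are kept.
    Returns the (m, 3) centroids of the kept clusters.
    """
    clusters = []  # each cluster: {"points": [...], "pickers": set of picker ids}
    for picker_id, picks in enumerate(pick_sets):
        for p in np.asarray(picks, dtype=float):
            for c in clusters:
                if np.linalg.norm(np.mean(c["points"], axis=0) - p) <= radius:
                    c["points"].append(p)
                    c["pickers"].add(picker_id)
                    break
            else:  # no nearby cluster: start a new one
                clusters.append({"points": [p], "pickers": {picker_id}})
    kept = [np.mean(c["points"], axis=0)
            for c in clusters if len(c["pickers"]) >= min_votes]
    return np.array(kept).reshape(-1, 3)
```

Raising `min_votes` trades recall for precision; counting distinct pickers rather than raw points prevents one picker that reports duplicates from outvoting the others.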

Abstract 769


Federico de Isidro Gómez, Spanish National Research Council Madrid

Additional Author(s)

Jorge Jiménez de la Morena, CNB-CSIC
Pablo Conesa, CNB-CSIC
David Herreros, CNB-CSIC
Estrella Fernández-Giménez, CNB-CSIC
Yunior C. Fonseca, CNB-CSIC
David Strelak, CNB-CSIC
José Javier Conesa, CNB-CSIC
Ana Cuervo, CNB-CSIC
Patricia Losana, CNB-CSIC
Carlos Óscar Sánchez-Sorzano, CNB-CSIC
José María Carazo, CNB-CSIC

Advances in modelling continuous heterogeneity from single particle cryo-EM data

Single-particle cryo-EM excels in determining static structures of biological macromolecules such as proteins. However, many proteins are dynamic, with their motion inherently linked to their function. Recovering the continuous motion and detailed 3D structure of flexible proteins from cryo-EM data has remained an open challenge. In this talk, we describe two new algorithms that allow both the motion and the structure of flexible proteins to be resolved from cryo-EM data.
First, 3D variability analysis (3DVA) is an algorithm that fits a linear subspace model of conformational change to cryo-EM data at high resolution. 3DVA enables the resolution and visualization of detailed molecular motions of both large and small proteins, revealing new biological insight from single-particle cryo-EM data. Experimental results demonstrate the ability of 3DVA to resolve multiple flexible motions of α-helices in the sub-50 kDa transmembrane domain of a GPCR complex, bending modes of a sodium ion channel, five types of symmetric and symmetry-breaking flexibility in a proteasome, large motions in a spliceosome complex, and discrete conformational states of a ribosome assembly. 3DVA is implemented in the cryoSPARC software package.
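The linear subspace model writes each conformation as a mean map plus a weighted sum of variability components. In the toy below, maps are assumed to be observed directly in 3D (no projection, noise model, or CTF), so fitting the subspace reduces to PCA via SVD; 3DVA itself fits the model to 2D particle images, and the function names here are hypothetical.

```python
import numpy as np


def fit_linear_subspace(volumes, n_components):
    """Fit V_i ~ mean + sum_k z_ik * comp_k to fully observed toy maps.

    volumes: (n, D, D, D) array.
    Returns mean map (D, D, D), components (k, D, D, D), latents (n, k).
    """
    n = volumes.shape[0]
    X = volumes.reshape(n, -1)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    comps = Vt[:n_components]  # orthonormal variability components
    z = (X - mean) @ comps.T  # latent coordinate of each map along each component
    shape = volumes.shape[1:]
    return mean.reshape(shape), comps.reshape((n_components,) + shape), z


def volume_at(mean, comps, z):
    """Map at latent position z (length k) along the fitted subspace."""
    return mean + np.tensordot(z, comps, axes=1)
```

Sweeping one entry of `z` while holding the others at zero and rendering `volume_at` for each value is the usual way to visualize a single mode of motion from such a model.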
Second, 3D Flexible Refinement (3DFlex) is a motion-based deep neural network model of continuous heterogeneity. 3DFlex directly exploits the knowledge that conformational variability of a protein is often the result of physical processes that transport density over space and tend to conserve mass and preserve local geometry. From 2D image data, the 3DFlex model jointly learns a single canonical 3D map, latent coordinate vectors that specify positions on the protein's conformational landscape, and a flow generator that, given a latent position as input, outputs a 3D deformation field. This deformation field convects the canonical map into the appropriate conformations to explain experimental images. Applied to experimental data, 3DFlex learns non-rigid motion spanning several orders of magnitude while preserving high-resolution details of secondary structure elements. Further, 3DFlex resolves canonical maps that are improved relative to conventional refinement methods because particle images contribute to the maps coherently regardless of the conformation of the protein in the image. Together, the ability to obtain insight into motion in macromolecules, as well as the ability to resolve features that are usually lost in cryo-EM of flexible specimens, will provide new insight and allow new avenues of investigation into biomolecular structure and function.
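The idea of a canonical map deformed by a latent-dependent displacement field can be illustrated in a few lines. This is a simplified sketch under stated assumptions: the flow generator is a hypothetical linear combination of basis fields (3DFlex uses a neural network), and the map is deformed by backward trilinear interpolation, which only approximates mass-conserving convection for small displacements. All function names are hypothetical.

```python
import numpy as np


def trilinear_sample(vol, coords):
    """Sample `vol` at fractional voxel positions coords (3, ...), clamped at edges."""
    shape = vol.shape
    c = [np.clip(coords[d], 0, shape[d] - 1) for d in range(3)]
    # Lower corner of the enclosing cell, kept one voxel away from the far edge
    i0 = [np.minimum(np.floor(c[d]).astype(int), shape[d] - 2) for d in range(3)]
    f = [c[d] - i0[d] for d in range(3)]
    out = np.zeros(c[0].shape)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0])
                     * (f[1] if dy else 1 - f[1])
                     * (f[2] if dz else 1 - f[2]))
                out += w * vol[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return out


def flow_generator(z, basis_fields):
    """Hypothetical linear flow generator: u = sum_k z_k * basis_fields[k].

    basis_fields: (k, 3, D, D, D). Returns a displacement field (3, D, D, D).
    """
    return np.tensordot(z, basis_fields, axes=1)


def deform(canonical, displacement):
    """Backward warp: deformed(x) = canonical(x - u(x))."""
    grid = np.indices(canonical.shape).astype(float)  # (3, D, D, D) voxel coordinates
    return trilinear_sample(canonical, grid - displacement)
```

Because every particle image is explained by the same canonical map seen through its own deformation, signal from all conformations accumulates in one map instead of being split across classes, which is the intuition behind the improved canonical maps described above.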

Abstract 765


Ali Punjani, University of Toronto Toronto, ON 

Additional Author

David Fleet, University of Toronto Toronto, Ontario