Innovations in computational biology: RECOMB 2024 Special Issue

  Jian Ma(1) and Bonnie Berger(2,3)

  (1) Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA;
  (2) Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA;
  (3) Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

  Corresponding authors: jianma{at}cs.cmu.edu, bab{at}mit.edu

    The 28th International Conference on Research in Computational Molecular Biology (RECOMB 2024) took place from April 29 to May 2, 2024, in Cambridge, Massachusetts, USA. The conference received 352 full-paper submissions, of which 57 were accepted after a rigorous peer-review process involving at least three reviewers per paper. Authors had the option to submit full-length papers or concise versions to the conference proceedings while pursuing journal publication elsewhere. A subset of these papers was invited for further review at Genome Research, ultimately leading to the publication of this Special Issue.

    RECOMB is the leading international conference on algorithmic computational biology, bridging the computational, mathematical, statistical, and biological sciences. It is held annually in the spring, typically alongside its satellite conferences. It provides a scientific forum for cutting-edge theoretical advances in computational biology and their applications in molecular biology and medicine, with an emphasis on methodology, ranging from algorithmic innovations to developments in AI and machine learning.

    Genome Research has been a leading journal for publishing high-impact studies on genome structure, function, and their roles in evolution, population variation, and disease. The partnership between RECOMB and Genome Research is crucial in bringing novel computational methodologies to a broader audience, fostering their application to significant genomic challenges. This collaboration exemplifies how new computational methods presented at RECOMB can drive discoveries in genomics and molecular biology as a whole.

    In this Special Issue of Genome Research, we introduce a diverse collection of 20 papers from RECOMB 2024. These include algorithmic innovations in genomic variation analysis, privacy-preserving algorithms, DNA structural properties, cancer genomics, transcriptomic studies, gene regulatory networks, biomolecular representation learning, and metagenomic data analysis, all of which reflect the field's rapid evolution. Two cross-cutting algorithmic themes emerged this year: (1) clever sketching and summarization techniques, along with advanced data structures, that handle the massive data sets generated by modern genomic studies, including single-cell, metagenomic, and collaborative large-scale databases; and (2) more sophisticated probabilistic models, many inspired by recent advances in statistics and AI/ML, including large language models.
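
    For readers less familiar with sketching, the idea can be illustrated with a minimal bottom-s MinHash sketch over k-mers. This is a generic textbook construction, not the method of any paper in this issue; the function names, the k-mer length, the sketch size, and the choice of BLAKE2b as the hash are all assumptions made for the illustration.

```python
import hashlib

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer (illustrative choice of hash)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def bottom_sketch(seq, k=8, size=64):
    """Keep only the `size` smallest k-mer hashes as a fixed-size summary."""
    return sorted(hash64(km) for km in kmers(seq, k))[:size]

def jaccard_estimate(sketch_a, sketch_b, size=64):
    """Estimate k-mer Jaccard similarity from two bottom-s sketches.

    Take the `size` smallest hashes of the merged sketches and count
    the fraction that occur in both inputs.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

    Each sequence is reduced to at most `size` integers regardless of its length, so two very large data sets can be compared by comparing their sketches rather than their full k-mer sets.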

    The first five papers in this Special Issue focus on new algorithms for analyzing genomic variation. Chandra et al. (2024) introduce new algorithms for haplotype-aware sequence-to-pangenome graph alignment. Sens et al. (2024) present a novel framework to integrate multiple genetic risk factors to improve disease risk predictions. Jeong et al. (2024) develop SUM-RHE to estimate heritability using summary statistics, and Fu et al. (2024) introduce QuadKAST to detect epistasis using a scalable algorithm for large data sets. Burch et al. (2024) use matrix sketching to accelerate linear mixed model computations for genome-wide association studies.

    The importance of genomic privacy algorithms is highlighted in the next two papers. Hong et al. (2024) introduce SF-Relate, a secure federated algorithm for identifying genetic relatives across large, distributed genomic data sets while ensuring privacy. This paper received the “Best Student/Young Scientist Paper Award.” Goldenberg et al. (2024) present a new approach to improve both privacy and efficiency for inferring biological age from DNA methylation data.

    DNA structural properties are key to understanding genome function and aberrations. Yang et al. (2024) report SEM to estimate nucleosome positions and subtypes from MNase-seq data. Zhu et al. (2024) and Giurgiu et al. (2024) present algorithms, CoRAL and Decoil, respectively, that leverage long-read sequencing to resolve extrachromosomal DNA structure, with significant implications for cancer research.

    Transcriptomics and gene regulatory networks remain active areas of research. Zahin et al. (2024) develop TERRACE to assemble full-length circular RNAs from RNA-seq data using a splice graph model. Schrod et al. (2024) introduce SpaCeNet to model both intra- and intercellular molecular interactions from spatial transcriptomics data. DIISCO, by Park et al. (2024), infers dynamic cell–cell interactions from single-cell RNA-seq data, and BONOBO, by Saha et al. (2024), estimates sample-specific gene regulatory networks.

    New methods for biomolecular representation learning are also featured in this issue. Lal et al. (2024) present regLM, which uses autoregressive language models to design synthetic cis-regulatory elements. Li et al. (2024) propose the ANDES framework for improving gene set similarity analysis using a new gene embedding approach. Additionally, Iovino et al. (2024) introduce a novel approach for protein similarity search based on protein domain embeddings, and Zeng et al. (2024) leverage parameter-efficient fine-tuning of large protein language models to enhance signal peptide prediction.

    Advancements in metagenomic sequencing analysis are presented in the last two papers. Şapcı and Mirarab (2024) develop KRANK to optimize memory usage for k-mer selection in large genomic reference libraries, and Azizpour et al. (2024) introduce GraSSRep, which classifies DNA sequences as repetitive or nonrepetitive using graph neural networks and self-supervised learning in metagenomic assembly graphs.

    We would like to thank the authors, reviewers, and the Genome Research editorial team, especially Executive Editor Dr. Hillary Sussman, for their efforts and support for the RECOMB–Genome Research partnership. We hope readers enjoy these excellent RECOMB 2024 papers and look forward to future submissions from the computational biology community to RECOMB in the coming years.

    Competing interest statement

    J.M. served as Program Chair for RECOMB 2024 and had access to earlier versions of all papers included in this Special Issue of Genome Research prior to publication. B.B. is the Chair of the Steering Committee for the RECOMB series of conferences and was Co-chair of the Organizing Committee for RECOMB 2024.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
