A somatic hypermutation–based machine learning model stratifies individuals with Crohn's disease and controls

  1. Gur Yaari1,2,8
  1. The Alexander Kofkin Faculty of Engineering, Bar Ilan University, 5290002, Ramat Gan, Israel;
  2. Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, 5290002, Ramat Gan, Israel;
  3. Institute of Gastroenterology, Nutrition and Liver Diseases, Schneider Children's Medical Center of Israel, Petah Tikva 4920235, Israel;
  4. Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 6997801, Israel;
  5. Pediatric Gastroenterology Unit, Edmond and Lily Safra Children's Hospital, Sheba Medical Center, Ramat Gan 5262100, Israel;
  6. Institute of Pathology, Sheba Medical Center, Ramat Gan 5262100, Israel
  7. These authors contributed equally to this work.
  8. These authors contributed equally to this work.

  • Corresponding authors: dror.shouval{at}gmail.com, gur.yaari{at}biu.ac.il
  Abstract

    Crohn's disease (CD) is a chronic relapsing–remitting inflammatory disorder of the gastrointestinal tract that is characterized by altered innate and adaptive immune function. Although massively parallel sequencing studies of the T cell receptor repertoire have identified oligoclonal expansion of unique clones, much less is known about the B cell receptor (BCR) repertoire in CD. Here, we present a novel BCR repertoire sequencing data set of ileal biopsies from pediatric patients with CD and controls, and identify CD-specific somatic hypermutation (SHM) patterns revealed by a machine learning (ML) algorithm trained on BCR repertoire sequences. Moreover, ML classification of a second data set, from blood samples of adults with CD versus controls, showed that features based on V gene usage, clusters, or mutation frequencies classified the disease with excellent performance (F1 > 90%). In summary, we show that an ML algorithm can classify CD with high accuracy based on unique BCR repertoire features.
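    To make the reported metric concrete, the sketch below shows two generic building blocks: turning one repertoire's V gene calls into usage frequencies (one of the feature types named in the abstract), and computing the F1 score for binary labels (1 = CD, 0 = control). This is a minimal illustration under those assumptions, not the authors' pipeline; the function names `v_gene_usage` and `f1_score` are hypothetical, and the gene calls and labels are toy values rather than study data.

```python
from collections import Counter


def v_gene_usage(v_calls):
    """Return the fraction of sequences assigned to each V gene in one repertoire.

    v_calls: list of V gene names (e.g. "IGHV3-23"), one per sequence.
    Hypothetical helper for illustration only.
    """
    counts = Counter(v_calls)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy repertoire of five sequences (illustrative only).
    usage = v_gene_usage(["IGHV3-23", "IGHV3-23", "IGHV1-69", "IGHV4-34", "IGHV3-23"])
    print(usage)  # {'IGHV3-23': 0.6, 'IGHV1-69': 0.2, 'IGHV4-34': 0.2}

    # Toy labels: 1 = CD, 0 = control (not study data).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
    print(round(f1_score(y_true, y_pred), 3))  # 0.8
```

    In the toy labels, four of five predicted CD cases are correct (precision 0.8) and four of five true CD cases are found (recall 0.8), so F1 is 0.8; an F1 above 0.9 therefore requires both precision and recall to be high.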

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.276683.122.

    • Freely available online through the Genome Research Open Access option.

    • Received February 13, 2022.
    • Accepted December 7, 2022.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
