A somatic hypermutation–based machine learning model stratifies individuals with Crohn's disease and controls
- Modi Safra1,2,7,
- Lael Werner3,4,7,
- Ayelet Peres1,2,
- Pazit Polak1,2,
- Naomi Salamon5,
- Michael Schvimer6,
- Batia Weiss4,5,
- Iris Barshack4,6,
- Dror S. Shouval3,4,8 and
- Gur Yaari1,2,8
- 1The Alexander Kofkin Faculty of Engineering, Bar Ilan University, 5290002, Ramat Gan, Israel;
- 2Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, 5290002, Ramat Gan, Israel;
- 3Institute of Gastroenterology, Nutrition and Liver Diseases, Schneider Children's Medical Center of Israel, Petah Tikva 4920235, Israel;
- 4Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 6997801, Israel;
- 5Pediatric Gastroenterology Unit, Edmond and Lily Safra Children's Hospital, Sheba Medical Center, Ramat Gan 5262100, Israel;
- 6Institute of Pathology, Sheba Medical Center, Ramat Gan 5262100, Israel
Abstract
Crohn's disease (CD) is a chronic relapsing–remitting inflammatory disorder of the gastrointestinal tract that is characterized by altered innate and adaptive immune function. Although massively parallel sequencing studies of the T cell receptor repertoire have identified oligoclonal expansion of unique clones, much less is known about the B cell receptor (BCR) repertoire in CD. Here, we present a novel BCR repertoire sequencing data set from ileal biopsies of pediatric patients with CD and controls. We identify CD-specific somatic hypermutation (SHM) patterns, revealed by a machine learning (ML) algorithm trained on BCR repertoire sequences. We also applied ML classification to a separate data set from blood samples of adults with CD and controls. In that data set, features based on V gene usage, clusters, or mutation frequencies each classified disease status with high performance (F1 > 90%). In summary, we show that an ML algorithm can classify CD with high accuracy based on unique BCR repertoire features.
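The abstract does not specify how the repertoire features are built or which classifier is used. The sketch below is only a minimal illustration of the general idea under stated assumptions. It summarizes each repertoire as relative V gene usage frequencies and classifies samples with a nearest-centroid rule. That classifier is a simple stand-in, not the authors' model. It also shows how the reported F1 score (the harmonic mean of precision and recall) is computed. The gene names, repertoires, and labels are made-up toy data, and the helper names (`v_gene_usage`, `nearest_centroid_predict`, `f1_score`) are hypothetical.

```python
from collections import Counter

# Hypothetical V gene panel; a real analysis would use all observed IGHV genes.
GENES = ["IGHV1-2", "IGHV3-23", "IGHV4-34"]


def v_gene_usage(v_calls, genes):
    """Relative frequency of each gene in `genes` within one repertoire (assumed feature)."""
    counts = Counter(v_calls)
    total = sum(counts.values())
    return [counts[g] / total if total else 0.0 for g in genes]


def centroid(vectors):
    """Element-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_centroid_predict(x, centroids):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # also avoids division by zero when there are no positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy repertoires (lists of V gene calls); label 1 = CD, 0 = control. Not real data.
train = [
    (["IGHV1-2"] * 6 + ["IGHV3-23"] * 2 + ["IGHV4-34"] * 2, 1),
    (["IGHV1-2"] * 5 + ["IGHV3-23"] * 3 + ["IGHV4-34"] * 2, 1),
    (["IGHV1-2"] * 2 + ["IGHV3-23"] * 6 + ["IGHV4-34"] * 2, 0),
    (["IGHV1-2"] * 1 + ["IGHV3-23"] * 7 + ["IGHV4-34"] * 2, 0),
]
test = [
    (["IGHV1-2"] * 7 + ["IGHV3-23"] * 1 + ["IGHV4-34"] * 2, 1),
    (["IGHV1-2"] * 3 + ["IGHV3-23"] * 5 + ["IGHV4-34"] * 2, 0),
    (["IGHV1-2"] * 4 + ["IGHV3-23"] * 4 + ["IGHV4-34"] * 2, 1),
]

# Fit: one centroid of usage vectors per class label.
centroids = {
    label: centroid([v_gene_usage(calls, GENES) for calls, y in train if y == label])
    for label in (0, 1)
}

# Predict held-out samples and score them.
y_true = [y for _, y in test]
predictions = [nearest_centroid_predict(v_gene_usage(calls, GENES), centroids) for calls, _ in test]
print(predictions, f1_score(y_true, predictions))
```

On this toy input, all three test samples fall nearest their own class centroid. A real study would instead estimate F1 with cross-validation on many repertoires and a much richer feature set.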
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.276683.122.
- Freely available online through the Genome Research Open Access option.
- Received February 13, 2022.
- Accepted December 7, 2022.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.