GenTree, an integrated resource for analyzing the evolution and function of primate-specific coding genes
- Yi Shao1,2,14,
- Chunyan Chen1,2,14,
- Hao Shen3,15,
- Bin Z. He4,15,16,
- Daqi Yu1,2,
- Shuai Jiang5,6,
- Shilei Zhao2,7,
- Zhiqiang Gao2,8,
- Zhenglin Zhu9,
- Xi Chen10,11,
- Yan Fu2,8,
- Hua Chen2,7,12,
- Ge Gao5,6,
- Manyuan Long13,
- Yong E. Zhang1,2,12
- 1Key Laboratory of Zoological Systematics and Evolution and State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China;
- 2University of Chinese Academy of Sciences, Beijing 100049, China;
- 3College of Computers, Hunan University of Technology, Zhuzhou Hunan 412007, China;
- 4FAS Center for Systems Biology and Howard Hughes Medical Institute, Harvard University, Cambridge, Massachusetts 02138, USA;
- 5State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, China;
- 6Beijing Advanced Innovation Center for Genomics (ICG), Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing 100871, China;
- 7CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China;
- 8National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China;
- 9School of Life Sciences, Chongqing University, Chongqing 400044, China;
- 10Wuhan Institute of Biotechnology, Wuhan 430072, China;
- 11Medical Research Institute, Wuhan University, Wuhan 430072, China;
- 12CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China;
- 13Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois 60637, USA
Abstract
The origination of new genes contributes to phenotypic evolution in humans. Two major challenges in the study of new genes are the inference of gene ages and annotation of their protein-coding potential. To tackle these challenges, we created GenTree, an integrated online database that compiles age inferences from three major methods together with functional genomic data for new genes. Genome-wide comparison of the age inference methods revealed that the synteny-based pipeline (SBP) is best suited to recently duplicated genes, whereas the protein-family–based methods are useful for ancient genes. For SBP-dated primate-specific protein-coding genes (PSGs), we performed manual evaluation based on published PSG lists and showed that SBP generated a conservative data set of PSGs by masking less reliable syntenic regions. After assessing the coding potential based on evolutionary constraint and peptide evidence from proteomic data, we curated a list of 254 PSGs with different levels of protein evidence. This list also includes 41 candidate misannotated pseudogenes that encode primate-specific short proteins. Coexpression analysis showed that PSGs are preferentially recruited into organs with rapidly evolving pathways such as spermatogenesis, immune response, mother–fetus interaction, and brain development. For brain development, primate-specific KRAB zinc-finger proteins (KZNFs) are specifically up-regulated in the mid-fetal stage, which may have contributed to the evolution of this critical stage. Altogether, hundreds of PSGs are recruited either to processes under strong selection pressure or to processes supporting an evolving novel organ.
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.238733.118.
- Freely available online through the Genome Research Open Access option.
- Received April 19, 2018.
- Accepted January 29, 2019.
This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.











