RT Journal
A1 Raychaudhuri, Soumya
A1 Schütze, Hinrich
A1 Altman, Russ B.
T1 Using Text Analysis to Identify Functionally Coherent Gene Groups
JF Genome Research
JO Genome Research
YR 2002
FD October 01
VO 12
IS 10
SP 1582
OP 1590
DO 10.1101/gr.116402
UL http://genome.cshlp.org/content/12/10/1582.abstract
AB The analysis of large-scale genomic information (such as sequence data or expression patterns) frequently involves grouping genes on the basis of common experimental features. Often, as with gene expression clustering, there are too many groups to easily identify the functionally relevant ones. One valuable source of information about gene function is the published literature. We present a method, neighbor divergence, for assessing whether the genes within a group share a common biological function based on their associated scientific literature. The method uses statistical natural language processing techniques to interpret biological text. It requires only a corpus of documents relevant to the genes being studied (e.g., all genes in an organism) and an index connecting the documents to appropriate genes. Given a group of genes, neighbor divergence assigns a numerical score indicating how “functionally coherent” the gene group is from the perspective of the published literature. We evaluate our method by testing its ability to distinguish 19 known functional gene groups from 1900 randomly assembled groups. Neighbor divergence achieves 79% sensitivity at 100% specificity, comparing favorably to other tested methods. We also apply neighbor divergence to previously published gene expression clusters to assess its ability to recognize gene groups that had been manually identified as representative of a common function.