Translational genomics: The challenge of developing cancer biomarkers

  James D. Brooks
  Department of Urology, Stanford University, Stanford, California 94305, USA

    Abstract

    Early detection and definitive treatment of cancer have been shown to decrease death and suffering in epidemiologic and intervention studies. Application of genomic approaches to many malignancies has produced thousands of candidate biomarkers for detection and prognostication, yet very few have become established in clinical practice. Fundamental issues related to tumor heterogeneity, cancer progression, natural history, and biomarker performance have provided challenges to biomarker development. Technical issues involving assay detection limits, specificity, clinical deployment, and regulation have also slowed progress. The recent emergence of biomarkers and molecular imaging strategies for treatment selection and monitoring demonstrates the promise of cancer biomarkers. Organized efforts by interdisciplinary teams will spur progress in cancer diagnostics.

    Since the war on cancer was declared in 1971, cancer death rates, after rising for several decades, have begun to slowly decline (Siegel et al. 2011). This drop can be attributed to preventive efforts (e.g., smoking cessation), improved treatments for advanced disease, and early detection and treatment of localized cancers. Future progress in prevention and treatment of advanced disease requires a fundamental understanding of the underlying causes and mechanisms of cancer. Early detection, on the other hand, can be agnostic as to cause: it merely requires a method (imaging, cell collection, measurement of a bioanalyte) that correlates with a disease state, followed by the application of localized treatments (surgery, radiation, or tissue ablation) that have been developed and refined over the past century. Randomized trials of early detection and definitive local treatment have demonstrated improved survival for breast, colon, prostate, and lung cancers (Glass et al. 2007; Levin et al. 2008; Schroder et al. 2009; National Lung Screening Trial Research Team 2011). The results from these trials suggest that development of cancer diagnostic biomarkers is a desirable and attainable goal in the struggle to decrease deaths from cancer. Why, then, has progress been so slow, and where are the new cancer diagnostic biomarkers? Nearly all of the cancer biomarkers currently used in the clinic, such as prostate-specific antigen (PSA, encoded by KLK3) in prostate cancer, ERBB2 in breast cancer, MUC16 (also known as CA-125) in ovarian cancer, and alpha-fetoprotein (AFP) and beta-human chorionic gonadotropin (CGB) in testicular cancer, were discovered serendipitously. The advent of discovery-based approaches, such as array-based detection of gene expression and proteomic approaches using mass spectrometry, seemingly opened a fire hose of candidate cancer biomarkers over the last decade (McDermott et al. 2011).
Relatively complete cataloguing of the genomic alterations for most major malignancies is now under way using rapidly evolving ultra-high-throughput sequencing approaches in large international consortia including The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium. Therefore, there should be no shortage of new biomarkers available for clinical translation. A search of the literature for the term “cancer biomarker” results in thousands of candidates, and many have been tested on clinical samples and show some utility. There the story usually stops (Ludwig and Weinstein 2005; Ioannidis and Panagiotou 2011). In fact, the number of biomarkers approved by the FDA each year for clinical use is in the single digits. This drop-off rate shows that the development of biomarkers is every bit as difficult as the development and approval of a new drug.

    Features of biomarkers that affect their performance

    Many factors contribute to the failure of candidate biomarkers to realize clinical utility. The ideal cancer detection biomarker should be found uniquely in the malignant tissue and should generate a positive signal that can be measured without confounding noise from normal tissues or other non-malignant pathologies. PSA, for instance, has been criticized because it can be elevated by a variety of pathologies, making the positive predictive value (the chance that a man with an elevated PSA has prostate cancer) quite low, at 20%–40%. Therefore, many men undergo unnecessary biopsies with attendant side effects. An ideal test would limit false-positive tests, while enriching for cancers that are more likely to be lethal.
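    The low positive predictive value quoted for PSA reflects not only the test itself but also how common the disease is among the people tested. The sketch below applies Bayes' rule to show this dependence; the sensitivity, specificity, and prevalence values are hypothetical and are not estimates for PSA or any other real assay.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a person with a positive test truly has the disease (Bayes' rule)."""
    true_pos = sensitivity * prevalence                  # diseased and test-positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # healthy but test-positive
    return true_pos / (true_pos + false_pos)

# Hypothetical test characteristics held fixed while prevalence changes.
for prevalence in (0.20, 0.05, 0.01):
    ppv = positive_predictive_value(sensitivity=0.90, specificity=0.90, prevalence=prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}")
```

    With these illustrative numbers the same test falls from a PPV of roughly 69% to roughly 8% as prevalence drops from 20% to 1%, which is why a test that looks accurate in a case-control study can generate mostly false positives in a screened population.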

    While gene expression profiling has revealed hundreds or even thousands of genes expressed at higher levels in malignant compared with benign tissues, virtually no transcripts or proteins have been identified that are uniquely elevated in cancer. Many candidate biomarkers belong to pathways intrinsic to normal cells and tissues, such as those mediating proliferation, apoptosis, differentiation, angiogenesis, cell death, and inflammation (Hanahan and Weinberg 2011). Some biomarkers have failed because their cognate protein levels, which are the preferred analyte for most clinical assays, do not correlate with transcript levels. Other candidate transcripts or proteins show only a relative increase in expression in cancer compared with normal tissue and therefore fail as biomarkers because low-level expression from the parent normal tissue, from other organ sites, or from non-malignant pathologies effectively drowns out the signal from the malignancy. Candidate biomarkers localized to the nucleus or cytoplasm are often inaccessible to clinical assays; most biomarkers currently in use are cell-surface or secreted proteins.
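    The drowning-out effect of a merely overexpressed marker can be made concrete with a toy mass-balance model. The sketch below assumes, purely for illustration, that the circulating level is proportional to tissue mass times per-gram expression, with normal tissue contributing one arbitrary unit per gram; the organ mass, tumor mass, and fold change are hypothetical.

```python
def fold_rise_over_baseline(normal_mass_g: float, tumor_mass_g: float,
                            tumor_fold: float, other_sources: float = 0.0) -> float:
    """Circulating marker level with a tumor present, relative to the tumor-free baseline.

    Toy model (an assumption): normal tissue releases 1 unit of marker per gram,
    tumor tissue releases `tumor_fold` units per gram, and `other_sources` adds
    marker from other organs or benign conditions.
    """
    baseline = normal_mass_g + other_sources
    with_tumor = baseline + tumor_mass_g * tumor_fold
    return with_tumor / baseline

# Hypothetical: 0.5 g of tumor with tenfold overexpression in a 30-g organ.
print(round(fold_rise_over_baseline(30, 0.5, 10), 3))
print(round(fold_rise_over_baseline(30, 0.5, 10, other_sources=30), 3))
```

    Under these assumptions a tenfold overexpressed marker raises the circulating level by only about 17%, and by about 8% when other sources contribute as much as the organ itself; a rise of that size could plausibly be masked by ordinary biological variation, whereas a truly cancer-specific marker would have no such background.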

    One proposed way to limit false-positive tests is to limit screening to a high-risk population based on identified germline “high-risk” alleles. Unfortunately, large genome-wide association studies in several malignancies have thus far identified alleles that confer only small, clinically insignificant increases in cancer risk (Garcia-Closas et al. 2008; Kiemeney et al. 2008; Zheng et al. 2008; Song et al. 2009). Assays of these SNPs either singly or in groups often contribute little to risk assessment beyond asking a patient whether they have a family history of a particular malignancy (Zheng et al. 2008). Of course, there are notable exceptions, including BRCA1 and BRCA2, VHL, TP53, and other germline mutations. But as a class, these hereditary cancer syndromes constitute only a small fraction of patients in a population and usually occur in clinically recognizable familial clusters. Therefore, tests that assay either rare or low-risk alleles are not useful for screening a population.
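    To see why several low-risk alleles add up to little, the sketch below assumes a multiplicative (log-additive) model in which each risk allele multiplies the odds of disease by its per-allele odds ratio, with no interactions. The odds ratios and genotype are hypothetical values of modest size, not estimates from the cited studies.

```python
import math

def combined_odds_ratio(per_allele_odds_ratios, risk_allele_counts):
    """Odds ratio for a genotype relative to carrying no risk alleles,
    under a multiplicative (log-additive) model with no gene-gene interaction."""
    log_or = sum(count * math.log(odds_ratio)
                 for odds_ratio, count in zip(per_allele_odds_ratios, risk_allele_counts))
    return math.exp(log_or)

# Hypothetical per-allele odds ratios and a hypothetical genotype.
ors = [1.2, 1.15, 1.1, 1.25, 1.1]
counts = [1, 1, 2, 0, 1]  # risk alleles carried at each locus (0, 1, or 2)
print(round(combined_odds_ratio(ors, counts), 2))
```

    Even after combining five loci, this hypothetical genotype reaches an odds ratio of only about 1.84, and the comparison is against the genotype that carries no risk alleles at all.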

    Somatically acquired alterations in cancer DNA show potential for detection since they are unique to the cancer and detectable with available technologies. In serous ovarian cancer, for example, TCGA has recently reported that 96% of cancers possess a mutated TP53 gene (The Cancer Genome Atlas Research Network 2011). Point mutations in specific oncogenes have been identified in many other malignancies, such as KRAS mutations in colon cancer, and diagnostic tests have been designed to detect these mutations non-invasively (Dong et al. 2001). Cancer-specific hypermethylation of cytosine residues in CpG islands is common, and diagnostic tests to detect methylation at specific loci have been developed for several malignancies (Hoque et al. 2004, 2005, 2006; Topaloglu et al. 2004; Chen et al. 2005). Recent genome-wide methylation studies suggest that there may be many such methylation events and many biomarker candidates (Houshdaran et al. 2010; Kobayashi et al. 2011). Fusion transcripts, such as TMPRSS2-ERG in prostate cancer, represent another promising source of cancer-specific biomarkers (Tomlins et al. 2005). A urine-based test for TMPRSS2-ERG transcripts has been developed and is currently under evaluation in several patient cohorts (Salami et al. 2011).

    While these and other avenues are promising, comprehensive analyses of cancer genomes have revealed a great deal of heterogeneity in the spectrum of mutations and structural alterations within cancers of a single histologic type. Ovarian cancers show startling variation in DNA structural changes between tumors at more than 100 loci throughout the genome (The Cancer Genome Atlas Research Network 2011). In glioblastomas, genomic alterations are restricted to a few discrete pathways, yet the frequency of alterations for any single member of those pathways is relatively low, making it difficult to design strategies for detection (The Cancer Genome Atlas Research Network 2008). In prostate cancer, only half of tumors have TMPRSS2-ERG fusion transcripts, and no frequently occurring point mutations have been identified (Tomlins et al. 2005; Berger et al. 2011). Therefore, in virtually all malignancies, assays of any single structural alteration or mutation will identify only a fraction of prevalent cancers, necessitating development of multiplex assays to interrogate entire pathways or many chromosome loci. Even in cases in which a single gene is inactivated by mutation, such as TP53 in ovarian cancer, cost-effective assays will have to be developed for detection of the spectrum of mutations occurring in the gene. Furthermore, these detection strategies also must allow sensitive detection of mutated sequences against a background of wild-type sequences that are found in any clinical sample (e.g., blood or urine).
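    The case for multiplex assays can be quantified. The sketch below estimates the fraction of tumors carrying at least one alteration in a panel under two simplifying assumptions: alterations occur independently, or they are mutually exclusive (as alterations within a single pathway often tend to be). Apart from the roughly 50% TMPRSS2-ERG figure mentioned above, the frequencies are hypothetical.

```python
def panel_detection_rate(alteration_frequencies, mutually_exclusive=False):
    """Fraction of tumors carrying at least one alteration assayed by the panel."""
    if mutually_exclusive:
        # Alterations never co-occur, so their frequencies add (capped at 1).
        return min(1.0, sum(alteration_frequencies))
    # Alterations occur independently: a tumor is missed only if it lacks all of them.
    miss = 1.0
    for freq in alteration_frequencies:
        miss *= 1.0 - freq
    return 1.0 - miss

print(panel_detection_rate([0.5]))  # a single fusion present in about half of tumors
print(round(panel_detection_rate([0.5, 0.1, 0.1, 0.1]), 3))  # add three hypothetical 10% alterations
print(round(panel_detection_rate([0.5, 0.1, 0.1, 0.1], mutually_exclusive=True), 3))
```

    A single-alteration assay caps sensitivity at that alteration's frequency; adding three uncommon alterations lifts coverage to about 0.64 if they are independent or 0.80 if they are mutually exclusive, which is still short of detecting every prevalent cancer.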

    Biomarker challenges in the context of cancer biology

    To be effective, a screening strategy must detect malignant cells that are destined to grow, metastasize, and cause death. Unfortunately, little is known about the steps that lead transformed cells to become malignant and ultimately lethal, and this has major implications for biomarker performance. Cancers are complex tissues composed of many cell types. It is possible (and likely) that features of the host, such as the innate immune response to the malignancy, interactions of the malignant cells with the surrounding stroma, or stochastic factors that are not captured by any biomarker, are important in the progression of early lesions (Chaffer and Weinberg 2011). Therefore, focusing on mutations or structural alterations within the malignant cells alone will be of limited utility in predicting biological and clinical behavior. Cancers possess heterogeneous populations of malignant cells including small populations of cancer stem cells that might be the source of metastatic cells (Reya et al. 2001). Biomarkers developed against the bulk mass of the tumor could miss the attributes of the stem cells that ultimately determine the clinical course of a malignancy. In addition, many biomarkers fail because most malignancies display genomic instability and require multiple genetic hits to become metastatic (Liu et al. 2009; Shah et al. 2009; Stephens et al. 2011). Measurement of a biomarker at a particular time might not predict acquisition of those future genetic alterations that are the product of this underlying genomic instability.

    Performance of a diagnostic biomarker will be influenced by the natural history of the malignancy. Many cancers take years or even decades before they progress to lethality, suggesting that there might be a large window of opportunity for detection and eradication of transformed cells. In pancreatic cancer, for instance, high-throughput sequencing of primary and metastatic tumors suggests that 15–20 years transpire between initiation and cancer death (Yachida et al. 2010). In breast and prostate cancer, that window might be even longer (Liu et al. 2009; Shah et al. 2009). In current clinical practice using standard histology, however, that window is much shorter. For example, in serous ovarian cancer, there appears to be a 4-yr window from the moment cancer cells become histologically identifiable until they become metastatic (Brown and Palmer 2009). While this window appears to provide ample time for detection, the average size of these cancers at the time of metastasis is 9 mm in diameter. More importantly, to produce a 50% decrease in ovarian cancer mortality, it is estimated that tumors would need to be detected when they are 5 mm in diameter. Even if an ovarian cancer-specific protein is identified (one has not), detection of that protein from a 5-mm tumor diluted in a 5-L blood volume of a 60-kg woman is well beyond the sensitivity of currently available technologies.
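    The tumor sizes above correspond to very small amounts of tissue. The back-of-the-envelope sketch below assumes spherical tumors and the commonly used approximation of about 10^9 cells per cubic centimeter; that density is an assumption, not a figure from the cited studies.

```python
import math

CELLS_PER_CM3 = 1e9  # rule-of-thumb tumor cell density (an assumption)

def sphere_volume_cm3(diameter_mm: float) -> float:
    radius_cm = diameter_mm / 20.0  # diameter in mm -> radius in cm
    return 4.0 / 3.0 * math.pi * radius_cm ** 3

for d in (9, 5):
    v = sphere_volume_cm3(d)
    print(f"{d} mm tumor: {v:.3f} cm^3, roughly {v * CELLS_PER_CM3:.1e} cells")

print(f"volume ratio 9 mm / 5 mm: {sphere_volume_cm3(9) / sphere_volume_cm3(5):.1f}")
# Tumor volume per mL of a 5-L (5000 mL) blood volume.
print(f"tumor cm^3 per mL of blood: {sphere_volume_cm3(5) / 5000:.1e}")
```

    Because volume scales with the cube of diameter, a 5-mm tumor has under one-fifth the volume of a 9-mm tumor (about 0.065 cm^3 versus 0.38 cm^3), so whatever it sheds into the circulation is correspondingly smaller.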

    Further complicating cancer detection strategies are genetic and histologic changes that occur with aging. Autopsy studies show histologically identifiable cancer precursor lesions (dysplasias and frank neoplasias) in a high proportion of the population and in many organ sites (Henson and Albores-Saavedra 2001). This leads to a fundamental problem in cancer detection: If a biomarker is able to detect these initiated lesions, of which only a fraction will progress, how does one sort out which lesions need to be treated and which can be safely ignored? Inability to predict future cancer behavior (or prognosis) will inevitably lead to overtreatment. For prostate cancer, data from the European Randomized Study of Screening for Prostate Cancer suggested that 48 PSA screen-detected men must undergo surgery to save one man's life 9 yr after treatment (Schroder et al. 2009). Since prostate cancer treatments have life-altering consequences and financial costs, the U.S. Preventive Services Task Force has recently recommended against PSA screening since it leads to overtreatment of indolent prostate cancers (Sanda et al. 2008; Barry 2009). Undoubtedly, overdetection and overtreatment will occur in other malignancies as tests to detect early disease are developed (Bach et al. 2007). Therefore, successful development of a biomarker for cancer detection must be coupled with development of biomarkers of prognosis to avoid overtreatment of clinically indolent cancers.
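    A number needed to treat (NNT) is the reciprocal of the absolute risk reduction, so the ERSPC figure can be restated directly. The sketch below uses only the 48 cited above and reads it, as NNTs usually are, as an average over the trial's follow-up.

```python
def absolute_risk_reduction(number_needed_to_treat: float) -> float:
    """NNT is defined as 1 / ARR, so ARR = 1 / NNT."""
    return 1.0 / number_needed_to_treat

nnt = 48  # ERSPC figure cited above (Schroder et al. 2009)
arr = absolute_risk_reduction(nnt)
print(f"absolute risk reduction per treated man: {arr:.1%}")
print(f"men treated without averting a death, per death averted: {nnt - 1}")
```

    On average, each treated man gained about a 2.1-percentage-point reduction in the risk of prostate cancer death over that horizon, and for every death averted, 47 men underwent treatment with its attendant morbidity without that benefit.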

    Challenges in developing and commercializing biomarker assays

    Until recently, measurement of protein biomarkers has been constrained by the limits of detection of ELISAs (usually in the nanogram per milliliter range) or by the signal that can be generated with imaging approaches that exploit metabolic pathways (such as 18F-FDG-PET) or use affinity reagents such as tagged antibodies (Gao et al. 2011). Several clever strategies using microfluidic methods and nanofabrication have been devised for affinity detection, often with increases in sensitivity of several orders of magnitude (Gaster et al. 2009). However, both imaging and nanodetection strategies rely on high-quality affinity reagents, predominantly antibodies. Unfortunately, clinical translation of many candidate biomarkers is often stalled by the lack of well-characterized, highly specific monoclonal antibodies. Several biomarker assays are based on the detection of nucleic acids, such as the PCA3 test for prostate cancer (which entails detection of a non-coding RNA in the voided urine) and the Oncotype DX test (which assays a panel of transcripts in breast cancer tissue samples) (Lee et al. 2011; Tang et al. 2011). Each of these tests requires special handling of samples because of endogenous nucleases. These special handling procedures can limit dissemination of biomarker assays because routine clinical practice often does not accommodate them.
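    Mass concentrations such as nanograms per milliliter can be converted to molar concentrations and molecule counts, which helps put detection limits in perspective. The sketch below assumes a hypothetical 30-kDa protein; the molecular weight is illustrative, not that of any particular biomarker.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def ng_per_ml_to_molar(conc_ng_per_ml: float, molecular_weight_da: float) -> float:
    grams_per_liter = conc_ng_per_ml * 1e-9 * 1e3  # ng/mL -> g/L
    return grams_per_liter / molecular_weight_da   # (g/L) / (g/mol) -> mol/L

def molecules_per_ml(conc_ng_per_ml: float, molecular_weight_da: float) -> float:
    return conc_ng_per_ml * 1e-9 / molecular_weight_da * AVOGADRO

mw = 30_000  # hypothetical 30-kDa protein
for conc in (1.0, 1e-3):  # 1 ng/mL and 1 pg/mL
    print(f"{conc:g} ng/mL = {ng_per_ml_to_molar(conc, mw):.2e} M, "
          f"{molecules_per_ml(conc, mw):.2e} molecules/mL")
```

    For this hypothetical protein, 1 ng/mL is about 33 pM, or roughly 2 x 10^10 molecules per milliliter; each thousandfold gain in assay sensitivity lowers that floor by the same factor.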

    Economic and business considerations can slow cancer detection biomarker development. Many of the diagnostic and prognostic markers that have been reported in the literature are in the public domain and lack intellectual property protection. Companies have shied away from developing clinical tests absent those protections. Even in cases in which an assay is protected, companies face challenges in developing a clinical-grade assay with performance characteristics in reproducibility and accuracy that far exceed academic research standards. Assays have to meet stringent performance criteria under the Clinical Laboratory Improvement Amendments (CLIA), which are administered by the Centers for Medicare & Medicaid Services (CMS). FDA approval of a clinical diagnostic requires scrutiny of the assay and demonstration of effectiveness in affecting clinical decision-making in ways that favorably impact health. With the advent of evidence-based medicine, the standards for efficacy and approval have become more stringent. Cancer detection approaches need to show improved patient outcomes in morbidity or mortality, necessitating large, costly randomized clinical trials. The difficulty in designing, carrying out, and funding these types of trials has been highlighted in the failure of recent studies of “established” screening strategies (PSA screening and mammograms) to show patient benefit (Andriole et al. 2009; Gotzsche and Nielsen 2011). To complicate matters, rules governing traditional clinical assays are in the process of being rewritten as new, multiplexed assays are being brought forward for approval, leading to additional delays that can be costly for industry.

    Despite these challenges, favorable markets have emerged for several cancer biomarkers. Most of these new biomarkers target treatment selection, and most require tumor biopsy samples in order to be performed. Biomarker assays that predict tumor aggressiveness have been developed for breast and colon cancers (Oncotype DX and MammaPrint) and are being used in clinical practice to select patients for adjuvant chemotherapy in early-stage disease. Sequencing for EGFR mutations in lung cancer identifies a relatively small set of cases (5% of lung adenocarcinomas) that will respond (often dramatically) to the EGFR inhibitor gefitinib (Iressa) (Paez et al. 2004). In fact, this discovery rescued gefitinib after clinical trials showed a lack of effectiveness in unselected lung cancer patients and has demonstrated the utility of using biomarkers in new drug development (Giaccone et al. 2004; Herbst et al. 2004; Wang et al. 2011). By using biomarkers for rational selection of patients for cancer clinical trials, companies improve their chances of success and thereby speed the drug approval process. In patients with gastrointestinal stromal tumors (GISTs), response to therapy can be interrogated within days after administration of imatinib (Gleevec) by using 18F-FDG-PET imaging (Gayed et al. 2004). Many molecular imaging approaches are now under investigation for similar uses in other cancers. Use of molecular imaging or biomarkers to show lack of treatment efficacy allows patients and physicians to quickly abandon futile therapies in favor of other potentially effective treatments. While the clinical goals and measurement approaches for these types of trials differ in many ways from those required for early detection, successful application of molecular imaging and biomarkers for treatment selection and monitoring will lay important groundwork for future progress in cancer detection.

    Clinical considerations in biomarker development

    Discovery-based genomic studies have relied on samples of convenience—namely, tissue samples that have been banked from surgeries on individuals with relatively advanced disease. For example, virtually all of the serous ovarian cancer samples analyzed in TCGA were harvested from women with advanced (Stages III and IV) tumors (The Cancer Genome Atlas Research Network 2011). The number of genomic changes in these advanced cancers is extraordinary, making it difficult to identify critical early changes that could be used as diagnostic biomarkers. To be meaningful in a screened population, diagnostic biomarkers must be discovered in early-stage, non-metastatic cancers since biomarker expression can change over the course of a disease. In prostate cancer, for example, PSA expression per cancer cell usually decreases as tumors become dedifferentiated and metastatic, making PSA an unreliable predictor of therapeutic response in late stages of the disease (Eisenberger and Nelson 1996).

    The use of convenience samples can also affect the performance of prognostic biomarkers, leading to subsequent failure to validate the biomarker in another patient population (Ioannidis and Panagiotou 2011). Quite commonly, cancer prognostic biomarkers are tested in patient samples from cases with early treatment failure or death, and these are compared with cases without recurrence many years after their treatment. This design, however, pits the worst cases against the very best. Clinical practice encompasses patients with a spectrum of risk, and biomarkers developed on samples from the tails of the bell-shaped curve are destined to fail. Therefore, development and validation of biomarkers need to be performed in the context of a discretely defined clinical question, with appropriately selected patients and adequate statistical power.
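    The worst-versus-best design can be illustrated with a toy simulation. Everything in it is an assumption chosen for illustration: a latent "aggressiveness" variable drives both an imperfect biomarker and the time to treatment failure, recurrence is defined by an arbitrary threshold, and the extremes design compares the earliest 10% of failures with the best 10% of outcomes. The same marker is scored with the area under the ROC curve (AUC) under both designs.

```python
import random

def auc(case_scores, control_scores):
    """ROC AUC via the Mann-Whitney rank-sum statistic (continuous scores, so ties are ignored)."""
    pooled = sorted([(s, True) for s in case_scores] + [(s, False) for s in control_scores])
    rank_sum = sum(rank for rank, (_, is_case) in enumerate(pooled, start=1) if is_case)
    n_cases, n_controls = len(case_scores), len(control_scores)
    return (rank_sum - n_cases * (n_cases + 1) / 2) / (n_cases * n_controls)

rng = random.Random(2011)
cohort = []
for _ in range(5000):
    aggressiveness = rng.gauss(0, 1)                   # latent tumor behavior (hypothetical)
    marker = aggressiveness + rng.gauss(0, 1)          # imperfect biomarker of that behavior
    failure_score = aggressiveness + rng.gauss(0, 1)   # higher = earlier treatment failure
    cohort.append((failure_score, marker))

# Full-spectrum question: does the marker separate patients who recur from those who do not?
recur = [m for f, m in cohort if f > 1.0]
no_recur = [m for f, m in cohort if f <= 1.0]
full_spectrum_auc = auc(recur, no_recur)

# Extremes design: earliest 10% of failures versus the best 10% of outcomes.
ranked = sorted(cohort)  # sorted by failure_score
n_tail = len(ranked) // 10
extremes_auc = auc([m for _, m in ranked[-n_tail:]], [m for _, m in ranked[:n_tail]])

print(f"full-spectrum AUC: {full_spectrum_auc:.2f}")
print(f"extremes-only AUC: {extremes_auc:.2f}")
```

    In this toy setting the marker appears far more discriminating when only the tails are compared than when it is applied to the whole spectrum of patients, which mirrors the validation failures described above.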

    The behavior of a screening biomarker will also be influenced by how frequently the malignancy occurs in a population of individuals. Because the incidence rates for any single cancer are fairly low in the population, any screening tool must display relatively high sensitivity (the proportion of cancer cases that have a positive test) and specificity (the proportion of individuals without cancer that have a negative test). Tests with poor performance characteristics can miss cases (false negatives) or identify individuals as harboring the disease when they do not (false positives). False-negative tests have obvious consequences, since dangerous cancers will go undiagnosed. False-positive tests will lead to ancillary tests, such as imaging, laparoscopy, or biopsy, which expose the patient to anxiety and to the morbidity of those procedures. False-positive tests and ancillary testing also incur significant financial costs to the patient and the health care system. The potential for harm is great, and this necessitates development of tests with high sensitivities and specificities. For instance, ovarian cancer occurs at a rate of 10–14 cases per 100,000 women per year. If one sets a relatively low bar for an ovarian cancer screening test (e.g., a positive predictive value of 10%, meaning that 10% of women with a positive test will have ovarian cancer, while 90% will have a false-positive test), the test specificity must exceed 99% at a sensitivity of 80% (i.e., 20% of cases would be missed by the test) (Lutz et al. 2011). These performance characteristics are well beyond most biomarkers currently in use or in development. Application of a screening test to a population raises additional questions, including when to start screening, how frequently to screen, which population to screen, and how well the test performs in different ethnic populations.
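    The specificity requirement follows from solving the PPV formula, PPV = se·p / (se·p + (1 − sp)(1 − p)), for the specificity sp. The sketch below does this for the ovarian cancer figures above; treating the annual incidence as the prevalence among women screened is a simplifying assumption, and the result is not a reproduction of the calculation in Lutz et al. (2011).

```python
def minimum_specificity(target_ppv: float, sensitivity: float, prevalence: float) -> float:
    """Smallest specificity that yields at least `target_ppv`, from
    PPV = se*p / (se*p + (1 - sp)*(1 - p)) solved for sp."""
    max_false_positive_rate = (sensitivity * prevalence * (1 - target_ppv)
                               / (target_ppv * (1 - prevalence)))
    return 1 - max_false_positive_rate

# Assumption: annual incidence (10-14 per 100,000) used as the prevalence in screened women.
for cases_per_100k in (10, 14):
    p = cases_per_100k / 100_000
    sp = minimum_specificity(target_ppv=0.10, sensitivity=0.80, prevalence=p)
    print(f"{cases_per_100k} per 100,000: specificity >= {sp:.4%}")
```

    Under this assumption the required specificity is above 99.8% in both cases, consistent with the statement that it must exceed 99%.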

    A final challenge in clinical application of cancer biomarkers is with the end-users—the physicians who order the tests. Clinicians tend to use cancer screening tests in a binary fashion: A test is normal or abnormal based on whether it exceeds a cut-off value. However, emerging data suggest that existing cancer biomarkers should be used to assess risk as continuous variables, much as cholesterol is used to assess cardiovascular risk (Thompson et al. 2004). Several genomics-based tests, such as Oncotype DX and MammaPrint in breast cancer, provide a risk score that correlates with a meaningful clinical outcome such as the subsequent chance of developing metastases. Yet, discussion of the relative risks of a quantitative test outcome requires considerable time, often a scarce resource in a busy practice. These discussions also require high levels of sophistication since patients will usually ask questions about steps downstream from the diagnosis of cancer that require an understanding of tumor aggressiveness and choice of therapy. How to deliver this information to patients compassionately, thoroughly, and accurately in the context of the current care delivery system and in the context of our limited understanding of cancer biology is a significant challenge.
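    The difference between a binary cut-off and a continuous risk estimate can be shown with a hypothetical logistic model. The intercept, slope, and cut-off below are invented for illustration, are in arbitrary units, and do not correspond to any validated clinical test.

```python
import math

# Hypothetical logistic risk model; not a validated clinical calculator.
INTERCEPT, SLOPE = -4.0, 0.8
CUTOFF = 4.0  # hypothetical threshold for calling a result "abnormal"

def risk_from_marker(value: float) -> float:
    """Estimated probability of disease for a marker value (logistic function)."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + SLOPE * value)))

for value in (1.0, 3.9, 4.1, 8.0):
    label = "abnormal" if value > CUTOFF else "normal"
    print(f"marker {value:>4}: binary call {label:<8} estimated risk {risk_from_marker(value):.1%}")
```

    In this sketch, values of 3.9 and 4.1 carry nearly identical estimated risks (about 29% and 33%) but receive opposite binary calls, while 4.1 and 8.0 share the same call despite very different risks; reporting the continuous estimate preserves that information.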

    The future

    Development of cancer detection biomarkers will be propelled by technological improvements in how biomarkers are objectively measured (mutations, methylation, protein expression, molecular imaging). For example, as ultra-high-throughput sequencing technology improves and becomes more cost-effective, whole genome sequencing of germline DNA could identify rare, highly penetrant, high-risk alleles for many cancers that can be used to tailor cancer screening protocols to individuals at high risk. Development of robust sequencing protocols for use in small tissue samples or single cells will provide opportunities to investigate genomic changes early in the malignant process. Comparison of early- and late-stage samples (such as are being generated in TCGA) could help identify biomarkers associated with progressive disease. Whole transcriptome sequencing will likely reveal RNA splice variants and non-coding RNAs (like PCA3) that can be used in cancer detection (Prensner et al. 2011). Finally, very deep sequencing of cell-free DNA in the plasma or urine could be used to identify tumor-derived DNA fragments with mutations, gene fusions, DNA methylation changes, or structural rearrangements that are pathognomonic for specific malignancies.
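    Why sequencing depth matters for cell-free DNA can be sketched with a binomial model: if each read at a locus independently comes from a mutant fragment with probability equal to the mutant allele fraction, the chance of seeing enough mutant reads to call the mutation rises steeply with depth. The model ignores sequencing errors, PCR duplicates, and the limited number of genome copies in a plasma sample, and the allele fraction and calling threshold below are hypothetical.

```python
from math import comb

def prob_detect(depth: int, mutant_fraction: float, min_mutant_reads: int) -> float:
    """P(at least `min_mutant_reads` mutant reads among `depth` independently sampled reads)."""
    p_fewer = sum(comb(depth, k) * mutant_fraction ** k * (1 - mutant_fraction) ** (depth - k)
                  for k in range(min_mutant_reads))
    return 1.0 - p_fewer

# Hypothetical: 0.1% of fragments carry the mutation; 3 mutant reads needed to call it.
for depth in (100, 1_000, 10_000):
    print(depth, round(prob_detect(depth, mutant_fraction=0.001, min_mutant_reads=3), 3))
```

    With these assumptions detection is essentially hopeless at 100x, unlikely at 1,000x (about 8%), and nearly certain at 10,000x; in practice the sequencing error rate sets a floor on how small a mutant fraction can be distinguished from background, which this sketch does not model.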

    Moving forward, it is clear that progress will come only through team-based science. The work of consortia, most notably the Early Detection Research Network and the Canary Foundation, has taught us that those teams must include genomic scientists, molecular imaging and laboratory medicine experts, engineers, epidemiologists, clinicians, industrial partners, and patients. Given the complexity of the carcinogenic process, cancer heterogeneity, and tumor microenvironment, it is unlikely that any single diagnostic will be effective. Rather, diagnosis will be a staged process beginning with identification of individuals at risk, performance of a targeted screen using an easily accessed biosample such as blood or urine, followed by localization of the lesions with molecular imaging (Lutz et al. 2011). Once localized, tumors can be biopsied to assess risk and select therapy. Deployment of screening technologies in the clinical setting will depend on their ability to improve clinical outcomes in an efficient and cost-effective manner.
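    One way to see why a staged process helps is to combine test characteristics in series: a person is called positive only if both the fluid-based screen and the follow-up imaging are positive. The sketch below assumes the two stages' errors are independent given true disease status, a strong assumption, and uses hypothetical sensitivities, specificities, and prevalences.

```python
def serial_tests(sens1: float, spec1: float, sens2: float, spec2: float):
    """Overall (sensitivity, specificity) when both tests must be positive,
    assuming conditionally independent errors (an assumption)."""
    sensitivity = sens1 * sens2                   # a case must be caught by both stages
    specificity = 1 - (1 - spec1) * (1 - spec2)   # a false positive must pass both stages
    return sensitivity, specificity

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    true_pos = sensitivity * prevalence
    return true_pos / (true_pos + (1 - specificity) * (1 - prevalence))

# Hypothetical stages: a blood or urine screen followed by molecular imaging.
sens, spec = serial_tests(0.90, 0.95, 0.90, 0.95)
for prevalence in (1e-4, 1e-3):  # general population vs. an enriched high-risk group
    print(f"prevalence {prevalence:g}: sensitivity {sens:.2f}, specificity {spec:.4f}, "
          f"PPV {ppv(sens, spec, prevalence):.1%}")
```

    In this illustration, two moderately specific stages combine to a specificity of 99.75% at the cost of some sensitivity, and restricting screening to a group with tenfold higher prevalence raises the PPV from about 3% to about 24%.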

    Acknowledgments

    The NIH, Early Detection Research Network, 1U01CA152737-01, the Canary Foundation, and the Department of Defense (W81XWH-11-1-0447 and W81XWH-10-1-0510) are acknowledged for their support.
