Academia-Industry Collaboration: An Integral Element for Building “Omic” Resources

David E. Hill (1,10), Michael A. Brasch (2), Anthony A. del Campo (3), Lynn Doucette-Stamm (4), James I. Garrels (5), Judith Glaven (6), James L. Hartley (7), James R. Hudson, Jr. (8), Troy Moore (9), and Marc Vidal (1,10)

(1) Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA
(2) Atto Bioscience, Rockville, Maryland 20850, USA
(3) Office of Research and Technology Ventures, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
(4) Agencourt Biosciences Corporation, Beverly, Massachusetts 01915, USA
(5) Garbrook Associates, Beverly, Massachusetts 01915, USA
(6) Harvard Medical School, Boston, Massachusetts 02115, USA
(7) SAIC/NCI, Frederick, Maryland 21702, USA
(8) CityScapes, Huntsville, Alabama 35801, USA
(9) Open Biosystems, Huntsville, Alabama 35806, USA

The availability of ∼200 nearly completed genome sequences and >900 additional sequencing projects underway is changing the very fabric of biological research endeavors. With access to enormous amounts of sequencing data and rapidly expanding cloned gene collections, scientists have the opportunity to pursue research projects at any scale, from highly focused, one-gene-at-a-time studies to broader, more global genome and proteome-wide approaches. Although the former efforts are well within the standard purview of traditional research laboratories, global approaches necessitate a more complex collaborative environment involving multidisciplinary teams from academia, government, and industry. Such “large-scale science,” most recently demonstrated by the Human Genome Project, also demands open access to data and resources, regardless of where the primary data are generated, and a commitment to provide as complete a resource as is feasible. This special issue focuses on creating, improving, and using cloned “ORFeomes” and exemplifies successful partnerships between academia and industry. In this perspective we argue that long-term academia-industry collaborative relationships provide optimal solutions to the specific problems of discovery science.

From Blueprints to Finished Goods

The human genome sequence and those of various model organisms provide a necessary framework for a transition from molecular biology to systems biology. Although the human genome sequence is sometimes referred to as the “parts list,” it is crucial to realize that genome sequence annotations, as currently available, provide only rough drafts of the blueprints for the parts. Establishing the precise number of parts, namely, the encoded proteins and RNAs, their actual structures, and their respective interactions, requires a dedicated effort to convert the blueprints into an accessible warehouse of well-characterized manufactured parts.

This issue of Genome Research highlights recent developments in the generation of various genome-wide resource collections that are expected to contribute to a more integrated understanding of biological processes (Ideker et al. 2001; Vidal 2001). The efforts described herein (Dricot et al. 2004; Dupuy et al. 2004; Lamesch et al. 2004; Rual et al. 2004a,c), along with other established collections of cDNAs and ORFs (Hudson Jr. et al. 1997; Strausberg et al. 2002; Carninci et al. 2003; Reboul et al. 2003), constitute a foundation on which it will be possible to investigate and manipulate both specific genes and proteins and the global networks in which they participate. The creation of multiple types of genome resources, from large-insert genomic DNA libraries to specialized collections of individually cloned genes, cDNAs, and ORFs, and their use across multiple disciplines to understand biology from a systems perspective, is a direct consequence of highly collaborative, interdisciplinary efforts such as those required for the Human Genome Project. However, taking full advantage of the growing collection of genome sequences and associated databases requires focused and committed efforts to create comprehensive resource collections that are not encumbered in any way (Collins et al. 2003a).

The public availability of ∼200 genome sequences for humans, model organisms, and microbial species (Bernal et al. 2001) has provided tremendous impetus for the creation of large-scale sets of cloned genes, expanding by orders of magnitude the number of genes and proteins readily accessible for further study (for review, see Rual et al. 2004b). Efficient use of these gene resources will require robust and facile technologies for isolating full-length open reading frame (ORF) clones, to enable proteome-wide, protein-based studies, and for isolating the corresponding promoter regions, to enable transcriptional regulation and localization studies. As with the collaborative effort to sequence the human genome, “discovery science” will depend on collaborative arrangements involving both public and private partners (Committee on Large-Scale Science and Cancer Research 2003). A significant aspect of these collaborative enterprises is that the results and corresponding resources must be available to the public in an open-access setting, free of intellectual property (IP) entanglements.

Academia-Industry Collaborations: A Relationship Fostered by Governmental Action

Relationships among United States colleges and universities and commercial firms have existed since at least the 1860s, when the Morrill Act established the United States land-grant system of colleges, which fostered the transfer of new agricultural methods and technologies to farm operations (for review, see Hasselmo and McKinnell 2003). The Morrill Act also provided a mechanism for the federal government to fund investigator-initiated research projects (Committee on Large-Scale Science and Cancer Research 2003), a prelude to the subsequent development of extramural funding by the National Institutes of Health (NIH).

Throughout the 20th century, scientists relied on commercial firms to provide critical reagents, materials, and technical know-how for their investigator-initiated efforts. For example, Fisher Scientific, founded in Pittsburgh in 1902 by Chester Garfield Fisher, was one of the first commercial sources of equipment and reagents for United States laboratories, initially as a reseller of quality instruments imported from Europe (http://www.fisherscientific.com). Various products from Fisher were used in government laboratories during the Manhattan Project to build the atomic bomb, one of the first of many “big science” projects undertaken by the United States government.

Although academic research has relied on industry for consumables and technology, much of the intellectual foundation and many of the initial proofs of principle supporting a significant fraction of commercially available products derive from academic research. Both groups have developed seminal technologies (see below), and industry has provided the means by which individual discoveries become value-added reagents, quickly and efficiently disseminated to the entire research community. The wealth of antibody-based commercial products available online from over 250 suppliers (for a listing of companies with online antibody resources, see http://www.antibodyresource.com/) is directly attributable to the research of Kohler and Milstein (1975) and is just one example of this commercialization process. It can be argued that this plethora of antibody products exists because the inventors filed no patents on the initial hybridoma technology.

Commercialization Versus Public Access

Prior to 1980, commercialization of the knowledge arising from academic and governmental research in the United States was inefficient at best. In that year, the United States Congress passed the Bayh-Dole and Stevenson-Wydler Acts, which changed the landscape of academia-industry relations. Through these two acts, Congress set forth a policy to expedite commercialization of products resulting from the federal government's investment in basic research (Blumenthal 2003; Committee on Large-Scale Science and Cancer Research 2003; Hasselmo and McKinnell 2003). A major consequence of the Bayh-Dole Act was a shift in federal policy that allowed IP rights arising from federally funded research to remain with the academic centers where the innovations were made. This decentralization of IP management, combined with the resulting incentives for academic institutions, became a motivating factor in the transfer of technology to the commercial sector (Committee on Large-Scale Science and Cancer Research 2003).

Prior to embarking on a full-scale effort to sequence the human genome, NIH actively filed patent applications on expressed sequence tags (ESTs); this activity elicited concern among many scientists (Olson 2002; Sulston and Ferry 2002), and the ensuing public debate had an impact on the Human Genome Project (Sulston and Ferry 2002). Fortunately, a growing sentiment for public release of data had evolved from the collaborative efforts to sequence model organisms, notably the Caenorhabditis elegans sequencing effort, and this was coupled with a commitment by Merck to fund efforts leading to the public release of over 400,000 ESTs (Sulston and Ferry 2002). Once NIH abandoned its efforts to pursue patent protection on human genes, concern over ownership of, and restricted public access to, the human genome led the Human Genome Project to release DNA sequences to public databases daily (Olson 2002; Sulston and Ferry 2002). This very successful mechanism of public data release was subsequently adopted by the rat and mouse sequencing consortia (Waterston et al. 2002; Gibbs et al. 2004) and by the SNP Consortium to map human single nucleotide polymorphisms (SNPs; Holden 2002). The SNP Consortium is particularly noteworthy in that 13 companies provided funding to generate a SNP map in which all of the data were to be in the public domain with no IP entanglements. Clearly, the corporations funding the SNP project, and Merck in funding EST sequencing, concluded that the overall goals of the respective projects outweighed the potential monetary value of patent protection on a small fraction of the total data set.

Today, as a direct consequence of the Human Genome Project and the development of very high-throughput sequencing technologies, DNA sequencing has become a commodity that academic and industrial laboratories alike can “outsource” (Salisbury 2004). This trend toward commodity reagents and services for sequencing has enhanced collaboration between academia and industry, because the critical issue is now assigning functions to genes rather than simply sequencing them. Our own efforts at ORFeome cloning and interactome mapping have benefited from access to such IP-neutral relationships with corporations (Walhout et al. 2000a; Reboul et al. 2001, 2003; Lamesch et al. 2004; Li et al. 2004), albeit not without initial concerns over “ownership” issues that were eventually resolved by adhering to open access principles and practices (see below).

Large-Scale Resource Collections: Build-It-By-Collaboration

Public-private partnerships are considered critical elements for the future of basic and clinical research in the recently announced NIH “Roadmap” (http://nihroadmap.nih.gov/) and for successful outcomes of “large-scale science” projects, according to a recent report from the National Academy of Sciences (Committee on Large-Scale Science and Cancer Research 2003). However, building large-scale resource collections requires substantial and secure funding. Inevitably, these efforts entail extensive collaborations between public and private institutions, an approach that NIH has recently embraced through various initiatives such as the ENCODE Project (http://www.genome.gov/10005107). In academia, the traditional tenure-track, independent-investigator environment is not ideally suited to large-scale, resource-building projects, because such projects are perceived as not being driven by hypotheses (Committee on Large-Scale Science and Cancer Research 2003). Nevertheless, academia is insulated from the uncertainties of the business climate, which allows projects to be completed even when corporate partners are unable or unwilling to continue their involvement because of a change in strategic direction. Industry, on the other hand, is well suited to carry out large-scale, “production”-style work and can keep overall costs low through economies of scale. By incorporating industrial quality assurance and assessment practices and production methods, suitably funded academic laboratories can embark on large-scale resource-building projects and achieve a high degree of success. The major academic and institute sequencing centers that carried out a substantial portion of the full-scale sequencing of the human genome had adopted such production-style manufacturing processes (Hawkins et al. 1997; Huang 1999; Stojanovic et al. 2002; Collins et al. 2003b).

Completely Complete?

A key aspect of the Human Genome Sequencing Consortium of public and private agencies is that the project has forged ahead with completing the sequencing effort while making the data freely available as they are generated. Although completion of the sequencing effort is essential for building comprehensive cDNA and ORFeome resources, it is arguably the hardest aspect of any sequencing project. Annotation and reannotation of partial and completed genomes have become the rate-limiting steps for building comprehensive resources of cloned ORFs and cDNAs, as exemplified by the ongoing reannotation of genome sequences (Bernal et al. 2001; Gene Ontology Consortium 2001; Flybase Consortium 2002, 2003; Garrels 2002; Crowe et al. 2003; Daraselia et al. 2003; Kellis et al. 2003, 2004; Meyers et al. 2003; Stein et al. 2003; Gibbs et al. 2004; Imanishi et al. 2004; Lamesch et al. 2004). As we use genome sequences to move further into the “discovery science” phase of resource building, efforts comparable to genome reannotation, that is, iterative versions of cDNA, ORFeome, and promoterome collections, will be required to achieve an acceptable level of completeness (Lamesch et al. 2004). Furthermore, comprehensive databases that compile multiple features for each gene and protein, are regularly updated, and are readily accessible to the entire scientific community will be essential for moving “discovery science” and systems biology forward (Costanzo et al. 2000; Bernal et al. 2001; Gene Ontology Consortium 2001; Matthews et al. 2001; Garrels 2002; Flybase Consortium 2003; Prince et al. 2004). Our own efforts in generating the C. elegans ORFeome and promoterome have been guided by the observation that “single-pass” high-throughput cloning has a success rate of ∼65%, mainly because of limitations in gene annotation. Cloning the remaining 35% requires integrating data from multiple sources, including genome reannotation and comparative genomic analyses. These subsequent efforts to build on the initial successes are analogous to successive releases of computer software. As with multidisciplinary efforts to reannotate genomes, academia-industry collaborative endeavors to achieve completeness in ORFeome and promoterome resources will be essential.

The C. elegans ORFeome Project: A Microcosm of Genome-Wide Resource Building

The C. elegans ORFeome project could not have been undertaken without some form of collaboration, especially with respect to the actual cloning of ORFs and their subsequent structural analyses. In that regard, we were fortunate to have Research Genetics (ResGen), Life Technologies, Inc. (LTI), and Genome Therapeutics Corporation (GTC) as collaborators during the entire project. Each of the three collaborating institutions provided critical expertise to the overall project, and key individuals from each institution are coauthors on the various publications that resulted from this project.

We relied on ResGen to synthesize pairs of oligonucleotide primers and deliver them in 96-well format, at a time when few groups were even contemplating high-throughput synthesis of thousands of primers for PCR. Each primer was nearly 50 nucleotides long to accommodate the dual needs of being ORF specific and containing the elements necessary for recombinational cloning (Hartley et al. 2000; Walhout et al. 2000b; Marsischky and LaBaer 2004). ResGen also helped in the overall organization of the clone collections. Robotic liquid-handling systems are still rare in most academic laboratories, whereas companies such as ResGen had already developed processes for reliably handling large resource collections. Understanding those processes and learning from ResGen were important for our overall organization of the project.

Standard restriction endonuclease-based methods for cloning ORFs are adequate at small scale but inefficient when attempting to clone entire ORFeomes (see Brasch et al. 2004; Marsischky and LaBaer 2004). Fortunately, we had been working with researchers at LTI on the commercialization of yeast two-hybrid assay systems (Vidal et al. 1996). This established relationship subsequently evolved into a collaboration furthering the development of the Gateway recombinational cloning system. The collaborative efforts resulted in adapting Gateway for PCR-based cloning of C. elegans ORFs (Hartley et al. 2000; Walhout et al. 2000a) as the system was evolving from a basic corporate research and development project into a manufacturable product. Essentially, we were able both to access Gateway as a “beta tester” (MacNeil 2004), albeit for a very large project involving 19,000 reactions, and to expand the capabilities of the product. Although the specific components of Gateway can be produced successfully, at least in small quantities, in any well-equipped molecular biology laboratory, a quality-controlled, manufacturable process capable of producing large quantities of a critical reagent is a hallmark and strength of successful biotech companies. It made more sense to use the expertise at LTI than to try to cut corners by producing the critical reagents in-house.

The use of the Gateway technology did pose a potential problem, namely, the issue of clone ownership and the risk that there would be legal entanglements that would reduce or prevent open access to any gene cloned that way. For model organisms such as C. elegans, clone ownership was not an issue because it was very unlikely that a C. elegans gene might be the basis of a commercial product or human therapeutic agent, whereas for human genes, there was concern over such ownership issues. Invitrogen, which had acquired the rights to Gateway via acquisition of LTI in 2001, eventually responded to these concerns by publicly announcing a policy of open access to any gene cloned by using Gateway technology, most notably human genes obtained from the Mammalian Gene Collection (www.invitrogen.com/gateway/; http://home.businesswire.com/portal/site/google/index.jsp?ndmViewId=news_view&newsId=20040504006140&newsLang=en), thereby clearing the way for the development of Human ORFeome resource collections (Rual et al. 2004c), analogous to what was done for C. elegans (Reboul et al. 2003). Such open access to physical resources complements the basic principles enunciated by Collins et al. (2003a) for access to other resource collections and databases.

Any cloning project depends heavily on DNA sequencing. Our ability to experimentally verify predicted C. elegans genes and to correct predicted exon-intron structures relied on obtaining high-quality sequences in a high-throughput manner. GTC, one of only two corporate entities that participated in the sequencing efforts of the Human Genome Project (Lander et al. 2001), was able to provide sequencing services to the ORFeome project (and to the interactome mapping project as well) at costs well below the rates charged by academic core facilities. At the time, GTC was transitioning from high-throughput genome sequencing to providing smaller-scale custom sequencing services to multiple customers, both academic and industrial. Our need to sequence individual ORF clones coincided nicely with their need to adapt to a changing market. Thus, three separate and distinct collaborations were all focused on the single goal of obtaining a comprehensive version of the C. elegans ORFeome.

Collaborations Come and Go, the Science Stays

All academia-industry collaborative ventures are at risk of premature dissolution due to the vagaries of funding mechanisms and the business climate. This risk increases with the scale of the project, particularly when one partner provides a unique component. In the worst-case scenario, the loss of a single partner can derail the entire enterprise. Guarding against such an outcome in large-scale science projects requires one partner to serve as the “epicenter.” Because the business climate is generally less stable than academia, academic laboratories are the logical choice to be the focal point for such projects.

Ironically, all three of our corporate collaborators had “disappeared” prior to the publication of the C. elegans ORFeome (Reboul et al. 2003). The GenomeVision Sequencing Services of GTC was sold in 2003 to Agencourt Biosciences Corporation, with whom we now collaborate for our sequencing needs. During the creation of the C. elegans ORFeome, Invitrogen acquired our other industrial partners, ResGen and LTI. Because each partner was making significant contributions to the project, we were naturally concerned that the original arrangements might be abrogated in some fashion or that the project could be compromised or delayed. Fortunately, those fears were unfounded and we maintained our productive collaboration with all of the groups, most likely because the project had internal champions supported by strong interpersonal relationships in which “good faith” promises were every bit as important as any legally binding contracts. Had we been unable to access the technologies provided by our partners, the project would have been terminated.

As the ORFeome project continued toward its version 1.1 completion and the interactome mapping project moved forward, our relationship with Invitrogen, now the owner of Gateway cloning technology developed by LTI, evolved from a customer-based one to a more collaborative one, which helped secure our access to Gateway. We also maintained working relationships with those individual collaborators who moved into other ventures after the Invitrogen acquisitions.

Large-Scale Science Requires “Disruptive Technologies” That Arise Anywhere and Impact All

In the past 30 years, five major technologies have exploded across the entire breadth of biological research, disrupting how molecular biology was formerly done. These are the generation of stable hybridomas leading to monoclonal antibody production in 1975 (Kohler and Milstein 1975), the ability to directly sequence DNA in 1977 (Sanger et al. 1977), the development of PCR in 1985 (Saiki et al. 1985, 1986), the use of immobilized arrays of nucleic acids for analyzing gene expression profiles across large numbers of genes (Schena et al. 1995, 1996), and the capability to specifically knock down expression of any target protein through the use of small RNAs that direct the degradation of specific mRNAs (Fire et al. 1998; Montgomery et al. 1998; Timmons and Fire 1998; Timmons et al. 2001). Whereas hybridomas and DNA sequencing came out of publicly funded laboratories, the initial version of PCR was developed at Cetus and licensed to Roche. Today, PCR and related technologies are used in myriad ways to selectively amplify target nucleic acids. Although many of these methods were developed in academic and industrial laboratories, virtually all aspects of PCR are performed with commercially available reagents and equipment, the exception being the source of template, and even then commercial sources exist for many of the standard cDNA libraries used for PCR. Microarray technology and RNAi were both developed in academic laboratories, and published methods allow either technique to be carried out with standard reagents and minimal investment in equipment (Alizadeh et al. 1999; Cheung et al. 1999; Paddison et al. 2004). However, industry responded rapidly (Lipshutz et al. 1999): commercially available systems for conducting microarray studies have taken over the field, while suppliers of commodity reagents have blanketed the research landscape with RNAi-based products (for a partial listing of companies offering products for RNAi, see http://www.biocompare.com/nature/jump/1065/siRNA-Technology.html).

The above examples demonstrate that academic laboratories have had a complex relationship with industry throughout the 30-year history of the biotech revolution. This complexity can nevertheless be distilled into four distinct modes: (1) Discoveries made in academic laboratories lead to the creation of new companies, products, and technologies through licensing. These technology transfer activities are a direct consequence of the 1980 Bayh-Dole Act. (2) New technologies, products, and equipment developed in industry become key reagents, platforms, or assays for academic projects. Such products can be accessed via collaboration, “beta testing,” or direct commercial purchase. (3) Industry provides contract services to academic laboratories. The service provider and the academic customer may collaborate to improve the service and avoid encumbrances. (4) Large-scale projects require academic laboratories and industry to collaborate as full partners, with IP issues, project management, and staffing well established before the project begins.

Collaborations Beget Collaborations

The various “omic” efforts described in this special issue further demonstrate the utility of collaborative efforts between academic laboratories and industry. The C. elegans ORFeome, interactome, and promoterome projects (Reboul et al. 2003; Dupuy et al. 2004; Lamesch et al. 2004; Li et al. 2004) were accomplished because of active involvement by our collaborative partners. Particularly in the case of the C. elegans ORFeome project, our corporate collaborations exemplified all four modes described above, in that technology development, “beta testing,” contract sequencing, and project integration all played a role in the overall process. In addition, a successful collaboration generates more work for all parties. We continue to collaborate with nearly all of the individuals from industry who participated in the initial development of the C. elegans ORFeome (Reboul et al. 2001, 2003), even though many of them have changed companies. This continued interaction demonstrates that personal relationships among the partners are ultimately the critical factor in maintaining collaborations.

Acknowledgments

We thank our colleagues and friends throughout academia and industry who have provided support, critical evaluations, technologies, ideas and/or contributed to the various “omic” projects. We thank D. Allinger and J. Albala for critical reading of the manuscript and acknowledge the efforts of G. Lucier and U. Caney in fostering open access of clone resources. This work was supported by grants from the National Cancer Institute and the National Human Genome Research Institute awarded to M.V.

Footnotes

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2771404.

  • 10 Corresponding authors. E-MAIL marc_vidal{at}dfci.harvard.edu; FAX (617) 632-5739. E-MAIL david_hill{at}dfci.harvard.edu; FAX (617) 632-5739.

References

WEB SITE REFERENCES
