Data Sharing and Data Management
The Cancer Genome Atlas Project (TCGA) is yielding an unprecedented amount of information on human clinical biospecimens. The informatics component of TCGA, led by the TCGA Data Coordinating Center (DCC), involves developing the best ways to collect, store and distribute the clinical and genomic data generated by the project. TCGA selected SRA International, Arlington, Va., led by Project Director David Pot, Ph.D., to manage the DCC.
The issues under consideration include:
- protecting patient privacy and confidentiality through secure access to research and clinical information classified as controlled-access datasets
- developing data standards and controlled vocabularies
- establishing informatics pipelines for dataflow from production centers to a central repository
- developing new analytical and visualization technologies for different audiences to facilitate data analysis
Many of these issues are being addressed by the Genome Data Analysis Centers that were selected as part of the expansion of TCGA.
TCGA continues to leverage resources from the National Cancer Institute’s cancer Biomedical Informatics Grid® (caBIG®) to support the distribution of data and access to analytical tools for the genomic data being generated by the Biospecimen Core Resource, the Genome Sequencing Centers and the Cancer Genome Characterization Centers.
The TCGA Data Portal stores all data generated by TCGA. Most data in the Data Portal is publicly accessible without restriction; however, some data can be accessed only by users who have been certified for data access.
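The distinction between unrestricted data and data that requires user certification can be illustrated with a minimal sketch. This is not the Data Portal's actual implementation; the tier labels, the `Dataset` and `User` types, the `certified` flag and the `can_access` function are all hypothetical names introduced here only to show the idea.

```python
from dataclasses import dataclass

# Hypothetical tier labels: the text only distinguishes unrestricted data
# from data that requires user certification.
OPEN = "open"
CONTROLLED = "controlled"


@dataclass(frozen=True)
class Dataset:
    name: str
    tier: str  # OPEN or CONTROLLED


@dataclass(frozen=True)
class User:
    username: str
    certified: bool = False  # assumed flag, set once certification is granted


def can_access(user: User, dataset: Dataset) -> bool:
    """Return True if the user may retrieve the dataset."""
    if dataset.tier == OPEN:
        # Unrestricted data is available to everyone.
        return True
    if dataset.tier == CONTROLLED:
        # Controlled-access data requires a certified user.
        return user.certified
    raise ValueError(f"unknown access tier: {dataset.tier!r}")
```

In this sketch, an uncertified user would be able to retrieve any dataset labeled `OPEN` but would be refused a dataset labeled `CONTROLLED` until the `certified` flag is set.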