Uncovering What’s Inside Big Data
Today’s data is stored in data warehouses filled with high-performance computers that can process and analyze a sea of data in a short period of time.
By Ronda Wendler | Texas Medical Center News
Big data has been a big buzz phrase for the past few years. But what does big data really mean?
Essentially, it’s a ritzy term for a not-so-sexy concept: the idea that massive amounts of information can be analyzed and cross-referenced by high-performance supercomputers to reveal hidden patterns and correlations that can benefit humankind.
For example, genetics researchers seeking to link genetic mutations with diseases may have millions of DNA samples to wade through and cross-correlate.
Public health researchers looking for disease patterns across the country – determining where diabetes is most prevalent, for example – have page after page of epidemiology reports to sift through and compare.
High-performance computers, software and other data-crunching tools can take an otherwise behemoth-sized data set and present it in a compelling, easy-to-grasp way.
“More and more data from more and more sources is generated every day,” said Elmer Bernstam, M.D., professor and associate dean for research at the School of Biomedical Informatics at The University of Texas Health Science Center at Houston, and an internal medicine physician at the university’s Medical School. “Advancements in technology have led to incredibly clever algorithms that process, compare and visualize this data in insightful ways. This opens up promising new opportunities for research.”
In the past, a database was little more than a tall metal filing cabinet, and large sets of data were presented as tabular reports.
Big data does away with big booklets of statistics, facts and figures, and instead presents the information as interactive, color-coded visualizations, like maps, charts and scientific illustrations that can be manipulated to suit each researcher's needs. Data in the "big-data" world is available digitally and is shared by all.
“The goal is to help researchers translate data into knowledge that will advance discoveries and improve health, while reducing costs and redundancy,” said NIH Director Francis Collins, M.D., Ph.D.
The National Institutes of Health is inviting biomedical researchers and data scientists to submit proposals for big data projects that will become part of six to eight centers the NIH will fund.
“Big data should always be driven by the subject matter experts who are using the system, so we want to hear from the researchers themselves,” said Collins.
Each of the centers will be known as Big Data to Knowledge, or BD2K, Centers of Excellence. The NIH will fund the centers over a four-year period with up to $24 million per year.
The first funding deadline is Nov. 20, with additional deadlines to be announced next year.
“The BD2K Centers of Excellence will develop cutting-edge data management systems, and will support and help researchers with the software and tools used in sharing, integrating, analyzing and managing data. The centers will also help researchers to locate and use data generated by others,” said Eric Green, M.D., Ph.D., NIH acting associate director for data science and director of the National Human Genome Research Institute.
Training for students and researchers will also be provided by the centers, Green said.
The products that result from the centers will be shared and distributed broadly to the research community.
More details can be found at http://bd2k.nih.gov.