
Big Data Takes a Byte

Over half a century ago, American mathematician and philosopher Norbert Wiener stated that “information is information, not matter or energy,” effectively dubbing data the third constituent of the universe. The trend is evident: Information, now predominantly digital, impacts every aspect of our lives, and that effect will only continue to grow. Across public and private sectors, data scientists have found ways to leverage new data collection and processing tools to gain revolutionary insights and to facilitate unprecedented collaboration.

This profound shift in our way of life, spurred by the rapid proliferation of digital information, is nothing short of revolutionary. Indeed, the term “information revolution” was aptly coined to capture the new economic paradigm beginning in the late 1990s. This period has been driven by a radical shift from traditional methods of problem solving to data-driven decision making. Gary King, Director of Harvard’s Institute for Quantitative Social Science, predicts that “the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched.”

The tremendous effects of the information revolution can largely be attributed to the recent and rapid advancements in computing that make it possible to collect, store, and analyze increasingly large quantities of data. As a result, innovators can use this enormous amount of accessible information to create updated digital tools. Though information has played a pivotal role in every era of history, this cycle has led to an information explosion, with over 90 percent of all data in the world having been generated in the last two years.

Much of this data is derived from the digital footprint of individuals; the unprecedented amount of accessible behavioral information allows researchers to gain an incisive view into human nature and to use that newfound insight to advance private and public objectives. This explains why machine learning and Big Data engineers are two of the top emerging jobs on LinkedIn, and why positions for data scientists have grown over 650 percent since 2012. Across the board, industries have made the shift from traditional experience-based decision making to more robust, data-guided management. In fact, research shows that companies using data-driven decision making have achieved 5 to 6 percent higher productivity gains than those that do not.

Aside from harnessing Big Data to generate profits for businesses, data scientists are also addressing high-impact societal problems. Seth Stephens-Davidowitz, author and former data scientist at Google, reveals how trends in searches can illuminate complex social issues. Another example is United Nations Global Pulse, an initiative that applies Big Data to development and humanitarian action to support a range of projects, from using satellite images of roofs to measure poverty in Uganda to mining tweets to gain insights into the Indonesian food crisis. Despite widespread application of data science techniques to the world’s biggest challenges, there is one key sector that is lagging behind. Adoption of data-driven technologies in the spheres of medicine and public health has been concerningly slow, a consequence of ethical and privacy concerns. Still, data science holds great promise to positively transform healthcare by improving the performance and productivity of physicians.

The health sector is notoriously slow to adopt data-driven solutions, due in part to the difficulty of accessing medical datasets, which often contain sensitive information. In the United States, the Health Insurance Portability and Accountability Act (HIPAA) creates obstacles to working with patient data in non-medical settings. HIPAA serves a critical role in maintaining privacy but also makes it challenging to share datasets and crowdsource ideas from the public, a time- and cost-efficient way of problem solving often employed in other sectors.

Yet another difficulty in adopting these data-driven health policies is the knowledge gap between physicians and data scientists. Both are highly specialized professions with little crossover: Highly trained doctors are unlikely to know how to build a machine learning model, and data scientists aren’t usually well-versed in patient care. This gap blocks the flow of ideas in both directions: Data scientists racing to develop better fitness trackers or hard-to-navigate health record platforms aren’t addressing the most pressing health care concerns, and physicians, feeling threatened by the prospect of being replaced, are reluctant to pursue automation.

Nonetheless, even with these challenges, there is hope for Big Data in medicine, and progress is indeed being made. With greater processing power and more precise software, it’s now easier to anonymize, standardize, and share medical data across hospitals and research institutions. Easy access to related cases will allow doctors to improve patient care and enable them to make high-impact predictions without needing any HIPAA-protected information. For instance, the Centers for Disease Control and Prevention is able to predict flu outbreaks by leveraging mined data about purchasing patterns at pharmacies, and Google Flu Trends attempted to solve the same problem by using flu-related Google searches, though it was ultimately discontinued due to limited success.

With increased access to medical data, it’s important that physicians and data scientists engage in meaningful collaboration. Online data challenges are a great start: Medical organizations such as the Radiological Society of North America are beginning to post computer-aided detection competitions on Kaggle, the world’s largest online community of data scientists and machine learning enthusiasts, to promote the use of data science in medicine. Recently, a pneumonia detection challenge run in concert with the National Institutes of Health provided over 100,000 anonymized chest X-rays to the public for data analysis in order to create an automated solution to pneumonia diagnosis and treatment. In addition to these online challenges, long-term, in-person partnerships between data scientists and medical practitioners are valuable. One undergraduate research team at Brown University is working with radiologists from Rhode Island Hospital to create an algorithm to detect strokes. With over 85 percent accuracy, the project is just one example of the promising applications of Big Data in medical image analysis.

Even though Big Data has an auspicious and revolutionary future in health care, it’s important to recognize that the end goal is not to replace human physicians. Rather, tools such as computer-aided diagnosis aim to reduce errors and increase productivity, thus allowing doctors to spend more time doing research and building meaningful relationships with patients. Through newly developed technology, data scientists can help alleviate stress on physicians who are struggling with a field-wide burnout crisis, thereby improving quality of care for patients and quality of life for doctors.

Constant advancements in technology have become such a norm that it’s easy to forget we are living out a revolution. The information revolution is in full swing as unfathomably large amounts of data are being generated and analyzed every single day. With so much information, data scientists should focus on using Big Data to solve specific, high-impact issues. On that front, health care is a sector filled with untapped potential, which data science can help explore, one gigabyte at a time.
