- Like oil, data in its unrefined state is difficult to use; it must be cleaned and organized before it can be truly useful
- The Hubble Legacy Archive holds decades’ worth of images and data that researchers saved even before they had the technology to examine it
- By 2018, U.S. companies will need more than 1.5 million managers and analysts trained to collect and interpret data
“More data don’t guarantee better decisions. … The right data, however, do,” said Dr. Michael Hasler, program director for the master’s program in business analytics at the McCombs School of Business. The idea of “Big Data” is ubiquitous, and companies often believe they need to become part of the big data push without necessarily understanding why or how. But on Jan. 22 at the Texas Enterprise Speaker Series, Hasler reminded the audience gathered at the J.J. Pickle Research Campus that data’s real value isn’t in merely being collected, but in how it helps us make better decisions.
“Data is the new oil,” explained Hasler. “And just like oil, data in its unrefined state is really difficult to use. In its unrefined state, it’s a bunch of zeros and ones. We have to clean it and organize it in order to use it.”
Data Must Be Saved
Dr. Niall Gaffney, director of Data Intensive Computing at the Texas Advanced Computing Center (TACC) at The University of Texas at Austin, agrees. “Data unto itself is worthless,” he said. “It’s like that junk drawer that you’ve got in the kitchen. There’s a lot of useful stuff in there — if you had it organized and set up right.”
And Gaffney knows a little bit about big data. Not only is he on the leadership team at TACC, which houses Stampede — one of the biggest supercomputers in the world — but he is also a former Hubble Space Telescope data scientist who was instrumental in the development of the Hubble Legacy Archive (HLA), a project that catalogued 23 years’ worth of Hubble images and information to make that data available for open research.
The HLA was an ambitious endeavor and one that would not have been possible had decades’ worth of researchers not made one crucial decision: to save their data. Even when the technology didn’t exist to examine it — even before they knew exactly what questions they wanted to ask of it — they preserved it.
“For data to work for you,” Gaffney explains, “you can’t lose it. … It can’t be recreated. You never know how you’re going to use it in the long run.”
Data Must Be Organized
Once scientists began combing through the half a petabyte (524,288 gigabytes) of data that had been amassed, they noticed something surprising: the distant supernovae they were plotting appeared dimmer than they should have at such great distances. The only explanation they could find was that the universe is expanding faster today than it did in the past.
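The gigabyte figure in parentheses appears to count in binary units, where each prefix step (petabyte to terabyte to gigabyte) is a factor of 1,024 rather than 1,000. Assuming that convention, the conversion works out as:

```latex
\[
\tfrac{1}{2}\ \text{PB}
  = \tfrac{1}{2} \times 1024\ \text{TB}
  = 512\ \text{TB}
  = 512 \times 1024\ \text{GB}
  = 524{,}288\ \text{GB}
\]
```

In decimal (SI) units, the same half petabyte would instead be 500,000 gigabytes.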
“So when you come up with something like that, you come up with something called Dark Energy,” Gaffney said. “And when you come up with something called Dark Energy, you win one of these: a Nobel Prize.”
Gaffney’s point is simple and applicable to all fields, from astronomy and physics to business or medicine: Data saved is data used — whether today or in 20 years — and we can’t always predict how that data will be applied.
The HLA has simplified the research of three recent Nobel laureates and countless other astronomers, and it has also contributed to something you might have on your smartphone right now: Google Sky Map. During Gaffney’s time at the Space Telescope Science Institute, he and his fellow scientists approached Google and asked if the tech giant had considered turning the view of Google Earth outward, toward space. His team, along with other groups working to collect and organize astronomical data, contributed their findings from the HLA so Google could build an accurate, responsive, and interactive view of our galaxy and beyond.
“You may think that it’s a really hard thing to put together a Google Earth — and it is — but it didn’t start as just Google Earth. It’s a complex set of simple questions that go into building these things, and how you really harness the power of data is by asking a lot of little questions that you can answer, and assembling that into a grand-scale answer,” says Gaffney.
Data Must Provide Insights
To be useful, data must be stored and organized. To be valuable, data must provide insights. Whether it’s studying deep space supernovae or analyzing consumer-buying habits, the right data enable us to make better decisions.
But the data-driven world is changing rapidly. In 1993, 100 terabytes of data were transferred over the Internet. In 2013, 200 terabytes of data were transferred across the Internet … per second. The volume, velocity, and variety of data are increasing exponentially, and this can make it very challenging to sift through the data and find relevant correlations that produce usable results, especially for managers who are not accustomed to working with large, dynamic data sets.
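The jump from 1993 to 2013 is easier to grasp on a common time scale. The sketch below assumes, since the article does not say, that the 1993 figure of 100 terabytes is a total for the whole year, and it converts the 2013 rate of 200 terabytes per second into a yearly total over a 365-day year. The function name `annual_volume_tb` is hypothetical, chosen for illustration.

```python
# Hedged sketch: puts the two Internet-traffic figures quoted above on a yearly basis.
# Assumption (not stated in the article): the 1993 figure of 100 TB is a
# whole-year total, while the 2013 figure of 200 TB is a per-second rate.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year: 31,536,000 seconds


def annual_volume_tb(rate_tb_per_second: int) -> int:
    """Hypothetical helper: turn a sustained rate in TB/s into a yearly total in TB."""
    return rate_tb_per_second * SECONDS_PER_YEAR


traffic_1993_tb = 100                     # assumed yearly total for 1993
traffic_2013_tb = annual_volume_tb(200)   # 200 TB/s sustained for a full year

growth_factor = traffic_2013_tb / traffic_1993_tb

print(f"2013 yearly volume: {traffic_2013_tb:,} TB")
print(f"Growth factor vs. 1993: {growth_factor:,.0f}x")
```

Under that assumption, the 2013 yearly volume comes to about 6.3 billion terabytes, roughly 63 million times the 1993 total.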
But learning to do so, argues Hasler, is essential. Big data can mean big value: hundreds of billions of dollars, in fact. Geotracking and geofencing technologies, which rely on big data, have the potential to be worth $600 billion annually to the global economy. That’s the kind of decision-making power data can bring to corporations, but it also requires professionals who can do the work.
The McKinsey Global Institute reports “that there will be a shortage of talent necessary for organizations to take advantage of big data” in the years ahead. How much of a shortage? By 2018, U.S. organizations will need more than 1.5 million managers and analysts who can turn data into decisions.
Hasler encourages business leaders to ask themselves the following questions before jumping blindly into the numbers:
- What data do I already have?
- What other data do I need?
- How can I get that data?
- What skills and tools do I need?
- How do I store and protect my data, and how do I protect my customers’ privacy?
After all, he says, it’s not the data itself that is crucial for success — it’s knowing how and why to apply it.
“Analytics do not begin with data,” he said. “They begin with problems and opportunities.”