The data governance mandates of GDPR, the quest for AI-driven analytics and the pull of cloud computing set much of the tone for the efforts of data management and big data teams in 2018. These and related data management trends will further affect the work of data professionals in 2019, according to industry analysts.
For example, organizations increasingly are emphasizing data privacy protections when they put big data applications into production use. That was spurred by the advent of GDPR, the European Union privacy law that took effect in May 2018. A harbinger of the need for stronger data governance, GDPR was followed in June by the approval of a state law with similar intent in California; the California Consumer Privacy Act’s compliance requirements are due to go into effect at the start of 2020.
The new laws make effective governance of data a priority for data managers and corporate executives, said William McKnight, president of McKnight Consulting Group in Plano, Texas. “People are going to have to gain a better understanding of data lineage, data quality and data access,” McKnight said. “Those shops that have featured data governance are far ahead [in doing so].”
A more orderly data lake
Things are changing even for the Hadoop data lake, once seen as a place in which to toss unsorted raw data for potential analytics uses.
“The renewal of data governance affects the data lake,” McKnight said. “You can’t just throw data there, even if that is what the data scientist wants to do.”
Data governance for the data lake has spawned a heightened focus on data catalogs and metadata tagging processes, he added. The role of data pros is also changing as part of those initiatives and other data management trends — a change that is reflected in the rise of DataOps, McKnight said.
DataOps is an offshoot of DevOps methods that strive to streamline application development. Under DataOps, McKnight noted, data management teams work to maintain consistent treatment of data and to see that none of it is left behind in the push to full-scale production use across distributed data architectures.
Wayne Eckerson, founder and principal consultant at Eckerson Group in Hingham, Mass., also pointed to DataOps as an emerging discipline that’s likely to become more prevalent in 2019.
“A lot of user organizations are trying to wrap their minds around DataOps with more agile processes,” Eckerson said. “They’re looking at lean version control and testing — doing all the good software development practices and applying them to the data environment.”
Data management teamwork expands
As they pursue DataOps-oriented practices, data management teams are also working more closely with the business, to the point where their place in the traditional IT structure is changing.
“We’re seeing a continuing disintermediation of central IT and a related change in data team dynamics,” McKnight said. “Many organizations are acknowledging this as the new way, and it’s reflected in the composition of their IT shops.”
Also notable among data management trends is the growing need for data management and analytics teams to work collaboratively on new types of advanced analytics that take advantage of AI technology, according to Doug Henschen, an analyst at Constellation Research Inc. in Monte Vista, Calif.
Working in unified teams is a key to moving machine learning and deep learning models into production at large scale, Henschen said. It’s a 2018 trend that he expects to see carry through to 2019.
“What’s needed are team-based approaches that knit together data scientists, data analysts, data engineers, developers and business leaders in order to embed models into business applications at scale with ongoing monitoring and optimization,” Henschen said.
For businesses, competitive differentiation will be based partly on how broadly they can make use of data and harness cutting-edge analytics techniques — and “whether they can do it with solid governance and compliance,” Henschen continued. “Data is only as valuable as it is trusted.” He, too, sees data catalogs growing in use as a way to make data more governable.
Look to the cloud as a data platform
Cloud computing is hardly a new trend, but it reached a crescendo of sorts on the data management side in 2018. Cloud databases and data warehouses were central to technology and acquisition moves by AWS, IBM, Microsoft and Oracle. The prospect of big data systems moving to the cloud also lurked behind the merger that Hadoop vendors Cloudera and Hortonworks agreed to in October.
Both Cloudera and Hortonworks faced technology and business challenges as they pursued cloud initiatives that allowed users to swap out the Hadoop Distributed File System (HDFS) for cloud object storage technologies, according to James Curtis, an analyst at 451 Research in New York.
In fact, neither company was touting the term Hadoop much at the time the merger was announced, reflecting the diminished role of HDFS and the MapReduce processing engine and programming framework — the big data platform’s original core components.
“The original underlying processing engine and file format in Hadoop are being superseded. But, while Hadoop will be the elephant not in the room, what Hadoop did is not going away,” Curtis said.
What Hadoop effectively did, he added, was usher in a broad big data ecosystem of open source software components and allow users to apply different processing techniques to different data workloads — two data management trends that are expected to continue with or without Hadoop itself.
The cloud-based data warehouse first appeared as a technology to watch years ago. As with big data systems, the shift of more data warehouses to the cloud now looks inevitable.
“The cloud is where data warehouses are going. The only drawback is the huge volumes of data on premises, but there are tools for that, too,” Eckerson said, referring to software that can help users move their data to the cloud.
Fast deployment, always-on operations and the ability to more easily handle spiky performance are some of the deciding factors in going to the cloud, Eckerson said. He added that more and more often, IT and data management teams are also glad to hand over data infrastructure responsibilities to cloud providers.
Change: Deal with it
The turn of the year is an arbitrary delimiter for measuring the progress or retreat of technology and data management trends. But it’s crucial that organizations set themselves up to deal with the change that new and emerging trends bring, McKnight said. In 2019, they’ll definitely have more of it to deal with, he predicted.
“We’re going to see accelerating change in the world of data, and we’re also going to see resistance to that change,” McKnight said.
The latter part isn’t new: There always has been a “resistance factor” in organizations, McKnight continued. He said, though, that leading-edge companies increasingly are addressing the internal resistance and “being more progressive in meeting their data needs.”