Data has become the fuel the economy is run on, and its security and storage are becoming an increasingly integral part of any enterprise’s infrastructure. Understanding the pros and cons of each data storage option and any potential benefits which can be derived from each is immensely important to the future success of any firm.
In order to explore the various industrial data storage options out there and find out what firms can do to better position themselves for 2019, DATAx turned to Manoj Vig, head of the clinical data repository and clinical data lake at IQVIA and speaker at DATAx New York. Below is the first of a two-part series of conversations about all things storage.
DATAx: What differences do people need to be aware of when it comes to data lakes and data repositories?
Manoj Vig: There are many variations on this terminology: data lakes, data warehouses and data repositories, to name a few. Sometimes their functionalities overlap and sometimes they complement each other.
Modern data lakes offer unlimited data storage capability and can support a variety of data formats such as structured and unstructured data, text, images and genomic data. Data lakes store data in a secure and compliant way, with sufficient replication and failover support.
Data lakes also enable their data consumers to model data according to their own needs, so the same data can support each individual use case. For example, one decision-maker may want to process a dataset and visualize it in a dashboard, while another may want to run a machine-learning algorithm on the same dataset. A third may want to include a search function on the same dataset using natural language processing (NLP). These are all possible using data lakes, as they allow you to decouple data, and actually model and reuse the same data in myriad ways.
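A minimal Python sketch of that reuse, assuming a small in-memory list of records stands in for the lake. The record fields, values and the three function names are hypothetical illustrations and are not drawn from any system described in this interview; the keyword search is only a placeholder for real NLP.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records as they might sit in a lake; every field is invented.
RAW_RECORDS = [
    {"site": "A", "age": 34, "dose_mg": 10, "note": "mild headache after dose"},
    {"site": "A", "age": 51, "dose_mg": 20, "note": "no adverse events"},
    {"site": "B", "age": 47, "dose_mg": 20, "note": "headache and nausea"},
    {"site": "B", "age": 62, "dose_mg": 30, "note": "no adverse events"},
]


def dashboard_view(records):
    """Consumer 1: mean dose per site, the kind of figure a dashboard shows."""
    by_site = defaultdict(list)
    for record in records:
        by_site[record["site"]].append(record["dose_mg"])
    return {site: mean(doses) for site, doses in by_site.items()}


def fit_dose_by_age(records):
    """Consumer 2: least-squares line dose = slope * age + intercept."""
    ages = [record["age"] for record in records]
    doses = [record["dose_mg"] for record in records]
    mean_age, mean_dose = mean(ages), mean(doses)
    covariance = sum((a - mean_age) * (d - mean_dose) for a, d in zip(ages, doses))
    variance = sum((a - mean_age) ** 2 for a in ages)
    slope = covariance / variance
    return slope, mean_dose - slope * mean_age


def search_notes(records, term):
    """Consumer 3: naive whole-word search over notes (stand-in for NLP)."""
    wanted = term.lower()
    return [r for r in records if wanted in r["note"].lower().split()]


if __name__ == "__main__":
    # All three consumers read the same records; none of them modifies the data.
    print(dashboard_view(RAW_RECORDS))
    print(fit_dose_by_age(RAW_RECORDS))
    print(search_notes(RAW_RECORDS, "headache"))
```

Each function models the same stored records in its own way without changing them, which is the decoupling described above in miniature.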
A data repository, on the other hand, is more like a data product on top of a data lake that processes data in a specific way, integrates data into standard models and presents data to consumers through many different channels. It hides all the complexities of data processing, data quality and integration from a consumer and allows decision-makers to use the processed data for their needs.
In my view, a data repository is a collection of data capabilities, many of which are self-serve, thereby allowing various kinds of users to leverage data to make impactful decisions. This supports human users in the form of reports, dashboards and alerts, but also supports machine users, including IoT devices.
At the end of the day, the difference between these concepts should be measured based on how they complement each other, and how they help us make better and faster business decisions.
DATAx: What technical improvements do you see disrupting the way we store data in 2019?
MV: I think the more important factor will be how we use stored data: the reusability of data, how fast actual decision-makers can access stored data, how much value we can get out of stored data before its value diminishes and how data in motion can be used to improve the healthcare ecosystem.
In my view, 2019 will bear witness to a massive shift in data storage and computing strategies which will be driven by cloud systems such as AWS and Azure.
Not only do these platforms provide data storage capabilities that are limitless when it comes to the amount of data they store, which will continue to be a very important factor for many businesses and use cases, but they also provide several pre-engineered turnkey capabilities that are hard to replicate in local data centers without significant investments.
A cloud platform’s capability to store petabytes of data in a geographically replicated fashion with many different data centers across a number of regions is just amazing. On top of that, these capabilities offer pre-engineered compliance and regulation abilities, tailored to specific countries and industries.
This allows organizations to launch their data and analytics systems in various geolocations quickly and cheaply, providing early data and analytics access to decision-makers across the planet. That fosters better collaboration, information sharing and collective decision-making, which will be a game changer in the near future for all kinds of businesses.
There are many different ways we can define the benefit of cloud services and it will take a long time to discuss all of those benefits. However, if I had to choose one specific trend that would most likely change how we store, distribute and compute data, I would go with cloud platforms.
Manoj Vig will be on a panel on Day Two of the AI & Big Data for Pharma Summit, part of DATAx New York, taking place on December 12–13 at the Hilton Midtown. To attend and hear more great insights from other data experts from some of the biggest and most influential organizations, register here today before it’s too late.