25 Sep Data Fabric “Tapestry or Burlap”
A data fabric is all the threads of architecture and technology woven together to mitigate the complexity of managing the myriad varieties of digital data. It uses multiple database management systems, across a variety of platforms, to orchestrate the integration of siloed, complex digital data.
The data fabric is a new methodology for managing and integrating data that promises to unlock the power of data in ways that shatter the limits of previous generations of technology, such as data warehouses and data lakes. Because it is based on a graph data model, the data fabric is able to absorb, integrate, and maintain vast quantities of data in any format.
“Data fabric is an architecture and set of data services that provide consistent capabilities across a choice of endpoints spanning on-premises and multiple cloud environments. A data fabric simplifies and integrates data management across cloud and on-premises environments to accelerate digital transformation.”
Now let’s examine the “threads that make up this fabric”: AI, APIs, analytics, microservices, Kubernetes, Docker, mixed clouds, big data, and edge IoT. Each of these topics deserves its own blog post, so I will discuss each one in greater detail, focusing on one topic or thread per post. Today we will focus on the data fabric as a concept.
Today’s enterprise has data deployed everywhere: a variety of structured, unstructured, and semi-structured data types residing on-premises and in multi-cloud (public, private, and hybrid) environments, in flat files, tagged files, SQL and NoSQL databases, big data repositories, graph databases, and more. This expanding variety of tools, technologies, platforms, and data types makes it difficult to manage the processing, access, security, and integration of data across multiple platforms.
Technology is emerging that creates a converged platform supporting the storage, processing, analysis, and management of diverse data. Data maintained in existing files, tables, streams, objects, images, IoT feeds, and container-based applications can all be accessed through different standard interfaces.
A data fabric makes it possible for applications and tools to access data through many interfaces, such as:
- NFS (Network File System)
- POSIX (Portable Operating System Interface)
- REST APIs (Representational State Transfer)
- HDFS (Hadoop Distributed File System)
- ODBC (Open Database Connectivity)
- Apache Kafka (for real-time streaming data)
The data fabric must also support future standards as they develop.
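To make the multi-interface idea concrete, here is a minimal sketch (all class and function names are hypothetical, not from any specific data fabric product) of an application reading the same records through two different interfaces, a POSIX file path and a REST endpoint, without the caller knowing or caring which one backs the data:

```python
import json
import urllib.request
from pathlib import Path

class PosixSource:
    """Reads newline-delimited JSON records from a local (POSIX) file path."""
    def __init__(self, path):
        self.path = Path(path)

    def read_records(self):
        lines = self.path.read_text().splitlines()
        return [json.loads(line) for line in lines if line.strip()]

class RestSource:
    """Reads the same records as a JSON array from a REST endpoint."""
    def __init__(self, url):
        self.url = url

    def read_records(self):
        with urllib.request.urlopen(self.url) as resp:
            return json.loads(resp.read())

def read_all(source):
    # The application code is identical regardless of the interface
    # behind the source; that uniformity is the point of the fabric.
    return source.read_records()
```

The design choice worth noting is that the interfaces converge on a common record shape, so tools written against `read_all` keep working as storage back ends change.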
There are a number of requirements a data fabric must address:
- Speed, scale, and reliability: access to data maintained within the data fabric must meet business requirements for speed, scale, and reliability across multiple computing environments, without requiring trade-offs.
- Centralized service-level management: SLAs for response time, availability, reliability, and risk containment must be measured, monitored, and managed through the same process for all data.
- Consolidated data protection: data security, backup, and disaster recovery (DR) methods are built into the data fabric framework and applied consistently across the infrastructure for all data, whether it resides in a public, private, or hybrid cloud or on premises.
- Infrastructure elasticity: decoupling data management processes and practices from specific deployment technologies makes the infrastructure more resilient when adopting edge IoT or any future technology innovation.
- Multiple locations: access to data from the network edge, the enterprise data center, and multi-cloud (public, private, or hybrid) environments.
- Unified data management: a single framework to manage data across multiple, disparate deployments, reducing the complexity of data management. In particular:
  - files must be easy to locate and access
  - security must be maintained at the highest level
  - files must be compressed to reduce storage needs
  - snapshots of the data must be provided for backups
  - multi-tenant (multiple-company) computing environments must be supported
- High reliability and availability: the environment must be highly reliable, self-managing, and self-healing, providing highly available services to meet mission-critical needs.
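The snapshot and compression requirements above can be illustrated with a small sketch using only Python’s standard library (function names are my own, not part of any fabric product): a compressed, timestamped snapshot of a file is taken for backup, then decompressed for disaster recovery.

```python
import gzip
import shutil
import time
from pathlib import Path

def snapshot(source: Path, snapshot_dir: Path) -> Path:
    """Copy `source` into `snapshot_dir` as a gzip-compressed,
    timestamped snapshot and return the snapshot's path."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = snapshot_dir / f"{source.name}.{stamp}.gz"
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target

def restore(snap: Path, dest: Path) -> None:
    """Decompress a snapshot back to `dest` (disaster recovery)."""
    with gzip.open(snap, "rb") as src, dest.open("wb") as dst:
        shutil.copyfileobj(src, dst)
```

In a real fabric these operations run policy-driven across every deployment; the sketch only shows the per-file mechanics of the compression-plus-snapshot requirement.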
The complexity of today’s enterprise data management is growing at an ever-accelerating rate, driven by new technologies, new kinds of data, and new platforms. This data is increasingly distributed across on-premises and mixed cloud environments. The processes of moving, storing, protecting, and accessing data can become fragmented depending on where the data is located and which technologies are used. Having to update data management methods with each technological change is difficult, disruptive, and expensive.
As technology innovation accelerates, that approach quickly becomes unsustainable. Data fabric solutions can minimize the disruption by creating a highly adaptable data management environment that can be adjusted quickly as technology evolves.
So the answer to the question “Tapestry or Burlap” is: It depends.
It depends on the threads you choose and the loom you use. The loom is the data fabric platform you choose, such as Azure Service Fabric (a Platform as a Service) or Cambridge Semantics’ Enterprise Data Fabric. The threads you have available are Docker, Kubernetes, APIs, microservices, mixed clouds, big data, analytics, edge IoT, and AI. How you weave them together determines the effectiveness of your solution.
The key to long-term success is staying open to the new and disruptive technologies that are fast approaching. As more and more organizations deploy data fabric solutions, more of the holes will be exposed, and the solutions created to fill them will make an ever tighter and more lustrous fabric.
The goal of making “all data” available for “any purpose,” at “any time,” “anywhere” is in sight, but we are not there yet.
Legacy systems and other creators of data silos, combined with the different types of data and the increasing variety of data usage, seem to be constantly moving the goalposts.
There is hope that AI and better-designed APIs will mitigate the complexity of modern data management, allowing better automation of the huge number of tasks required to make all of this look easy, so that we don’t scare the people writing the checks.