
Creating a cloud solution for data analysis is always a time-consuming and complex process, whether it is based on a modern data warehouse or a data lakehouse. The implementation of such projects can be described with a simple acronym: ISASA.
- The Generic Metadata Framework – why does it work?
- The concept of zones used in the Generic Metadata Framework
- Summary
ISASA outlines the steps that must be performed to develop such a solution:
- Ingest – collect data from the sources.
- Store – store the collected data.
- Analyze – analyze the data.
- Surface – present the prepared data (e.g., in reports and dashboards).
- Act – act on the results.
The team that develops the solution is responsible for the first four phases, while the client is in charge of the final step, Act: based on the results of the data analysis (e.g., reports, dashboards), the customer makes decisions and takes actions aimed at improving business operations. The first four phases include both purely technical tasks that require no domain knowledge and tasks, such as model building, that cannot be completed without it.
The Generic Metadata Framework – why does it work?
The Generic Metadata Framework automates repetitive operations such as data collection and the construction of data lakes. This allows the developer of a solution to concentrate on the core of a particular problem, i.e., on building a model that satisfies the established business requirements.
The Generic Metadata Framework helps the solution's developer focus on the business aspects of the solution. Additionally, it automates and simplifies:
- data loading processes (supporting both full and incremental loads; a hypothetical metadata entry driving such a load is sketched after this list),
- building data lakes (structure definition, data partitioning),
- initial data transformation (transformation of the input data),
- building delta lakes (structure definition and data partitioning),
- building data warehouses (defining the model as views over the data).
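For illustration, a single metadata entry driving such an automated load could look like the sketch below. All field names (source name, load mode, watermark column, zone paths) are assumptions made for this example and are not the framework's actual schema.

```python
# Hypothetical metadata entry that could drive a metadata-based load.
# Field names and values are illustrative only.
source_entity = {
    "source_name": "sales_db",           # logical name of the SQL Server source
    "schema": "dbo",
    "table": "Orders",
    "load_mode": "incremental",          # "full" or "incremental"
    "watermark_column": "ModifiedDate",  # timestamp used for incremental loads
    "bronze_path": "abfss://bronze@datalake.dfs.core.windows.net/sales_db/Orders/",
    "silver_path": "abfss://silver@datalake.dfs.core.windows.net/sales_db/Orders/",
    "scd_type": 2,                       # history tracking applied in the silver zone
}
```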
Key features:
- Architectural flexibility – data analysis systems can be built on top of a modern data warehouse or a data lakehouse.
- Automation of repetitive tasks such as data collection.
- A complete architecture that includes security, monitoring, data governance, and other components, making it possible to build scalable systems.
- Building blocks available in Azure.
- Flexible data access and simple integration with Power BI.
- Extensibility – the framework can easily be extended, for example with new data source types.
The areas covered by the framework are shown in the diagram below:
In the first stage, data is gathered from the sources, including on-premises systems, and saved in an Azure Data Lake Gen2 data lake in its native format. Currently, the supported data source is SQL Server, with mechanisms that enable incremental data loading based on timestamps and change tracking. The metadata gathered from the source system is the foundation of the whole configuration process.
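A minimal sketch of such a timestamp-based incremental load is shown below. It is written in PySpark against a hypothetical SQL Server source and reuses the illustrative paths and names from the configuration example above; the framework's actual loader is metadata-driven and more general.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical connection details and watermark; in the framework these would
# come from the metadata gathered from the source system.
jdbc_url = "jdbc:sqlserver://onprem-sql:1433;databaseName=sales_db"
last_watermark = "2023-01-01 00:00:00"   # highest ModifiedDate loaded so far

# Pull only rows changed since the last load (timestamp-based incremental load).
incremental_query = (
    f"(SELECT * FROM dbo.Orders WHERE ModifiedDate > '{last_watermark}') AS src"
)

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", incremental_query)
    .option("user", "loader")
    .option("password", "***")
    .load()
)

# Land the extract in the bronze container of Azure Data Lake Gen2,
# partitioned by load date (Parquet is used here only for simplicity).
(
    df.withColumn("load_date", F.current_date())
    .write.mode("append")
    .partitionBy("load_date")
    .parquet("abfss://bronze@datalake.dfs.core.windows.net/sales_db/Orders/")
)
```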
In the next stage, the gathered data is preprocessed: a history of data changes is built (SCD 2) and saved in the data lake in Delta format. The process is carried out with Spark on Databricks, using the configuration and the metadata loaded in the previous stage.
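The SCD 2 step can be sketched with the Delta Lake merge API as below. The key column (OrderID), the change check on a single attribute (Status), and the paths are assumptions for this example; the framework's own implementation is driven by metadata rather than hard-coded column lists.

```python
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

bronze_path = "abfss://bronze@datalake.dfs.core.windows.net/sales_db/Orders/"
silver_path = "abfss://silver@datalake.dfs.core.windows.net/sales_db/Orders/"

updates = spark.read.parquet(bronze_path)         # newly ingested rows
target = DeltaTable.forPath(spark, silver_path)   # change history kept in Delta format

# Rows whose tracked attributes changed: their current version must be closed
# and a new version inserted.
changed = (
    updates.alias("u")
    .join(
        target.toDF().filter("is_current = true").alias("t"),
        F.col("u.OrderID") == F.col("t.OrderID"),
    )
    .filter(F.col("u.Status") != F.col("t.Status"))   # illustrative change check
    .select("u.*")
)

# Classic SCD 2 merge trick: changed rows appear twice in the staged set -
# once with a null merge key (forcing an insert of the new version) and once
# with the real key (matching and closing the old version).
staged = (
    changed.withColumn("merge_key", F.lit(None).cast("string"))
    .unionByName(updates.withColumn("merge_key", F.col("OrderID").cast("string")))
)

(
    target.alias("t")
    .merge(staged.alias("s"), "t.OrderID = s.merge_key AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.Status <> s.Status",
        set={"is_current": "false", "valid_to": "current_timestamp()"},
    )
    .whenNotMatchedInsert(
        values={
            "OrderID": "s.OrderID",
            "Status": "s.Status",
            "valid_from": "current_timestamp()",
            "valid_to": "CAST(null AS timestamp)",
            "is_current": "true",
        }
    )
    .execute()
)
```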
The next stage is building the model. The model itself is stored as views on Spark, together with additional configuration that specifies, for example, how the model is to be fed and whether SCD 1 or SCD 2 should be used. The data is sourced from the tables in the so-called curated zone.
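A sketch of what such a view-based model could look like is shown below, issued as Spark SQL through PySpark. The database, table, and view names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical model definition: the model is expressed as views over tables
# in the curated zone; names are illustrative only.
spark.sql("""
    CREATE OR REPLACE VIEW gold.dim_customer AS
    SELECT
        customer_id,
        customer_name,
        valid_from,
        valid_to,
        is_current               -- history columns kept when SCD 2 is chosen
    FROM silver.curated_customers
""")

spark.sql("""
    CREATE OR REPLACE VIEW gold.fact_orders AS
    SELECT o.order_id, o.customer_id, o.order_date, o.amount
    FROM silver.curated_orders AS o
    WHERE o.is_current = true    -- only the latest version when SCD 1 is chosen
""")
```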
The data flow of a solution built on the Generic Metadata Framework is depicted in the diagram below.
The concept of zones used in the Generic Metadata Framework
The bronze, silver, and gold zones used in the Generic Metadata Framework closely follow the solution-building approach recommended by Databricks. Thanks to the split into zones, data can be separated not only at the logical level but also at the physical level (dedicated containers in the data lake). The split also provides a precise definition of the input and output of every phase of data processing. The data in each zone can be accessed through Azure Synapse Serverless.
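The physical separation can be pictured as one container per zone in the same storage account, as in the short sketch below; the account and container names are assumptions for illustration.

```python
# Illustrative zone layout: each zone is a separate container in the same
# Azure Data Lake Gen2 account, so data is separated physically, not only logically.
STORAGE_ACCOUNT = "datalake"  # hypothetical account name

ZONES = {
    "bronze": f"abfss://bronze@{STORAGE_ACCOUNT}.dfs.core.windows.net/",  # raw data in native format
    "silver": f"abfss://silver@{STORAGE_ACCOUNT}.dfs.core.windows.net/",  # Delta tables with SCD 2 history
    "gold":   f"abfss://gold@{STORAGE_ACCOUNT}.dfs.core.windows.net/",    # model exposed as views
}
```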
The Generic Metadata Framework consists of four modules:
- Data Loader – responsible for loading data from the data sources and persisting it in the data lake.
- Data Preprocessor – responsible for initial data processing and for building the delta lake.
- Data Warehouse – responsible for building and feeding the data model.
- Synapse Integrator – responsible for moving data from the data lakehouse to the Azure Synapse Dedicated Pool.
Modules 3 and 4 can function independently of one another, and module 4 is optional; in other words, a solution based on the Generic Metadata Framework can be built without using the Azure Synapse Dedicated Pool. By offering a self-serve platform, the Generic Metadata Framework also facilitates building solutions based on the data mesh methodology.
Summary
Using the Generic Metadata Framework significantly reduces the time required for migration projects by automating repetitive tasks, such as data collection, and by providing the framework needed to build solutions.
It ensures both scalability and a high degree of flexibility in terms of access to data.