By Vassil Dimov, Mateo Tošić and Sowmya Kameswaran
In the previous blog post, we introduced Db2 for z/OS Data Gate and described it in detail. In this post, we discuss the integration between Db2 Data Gate and Watson Knowledge Catalog and highlight its business value.
In today's business world, modernizing data and using AI are key to success. The guiding principles of the AI Ladder, defined by IBM, help organizations with business transformation in the four key areas listed below:
- Collect — Make data simple and accessible (All data sources contribute to this pillar)
- Organize — Create a business-ready analytics foundation (Data governance services like Watson Knowledge Catalog)
- Analyze — Build and scale AI with trust and transparency (Watson Studio)
- Infuse — Operationalize AI throughout the business (this is what customers do with the data in their own products)
In this blog we will discuss how Db2 Data Gate and Watson Knowledge Catalog, representing the first two pillars of the AI Ladder, can help organizations to unlock the huge value of their Z data in the cloud.
About Db2 Data Gate
Db2 Data Gate enables modern high-volume, high-frequency hybrid cloud applications that need read-only access to valuable enterprise data from Db2 for z/OS. It plays a key role in the Collect pillar by moving data from Db2 for z/OS into the Cloud Pak for Data platform. Because data is synchronized between the source Db2 for z/OS and the target IBM Db2 or IBM Db2 Warehouse, applications get access to current data. To learn more about IBM Db2 for z/OS Data Gate, please read “What is Db2 Data Gate? Db2 Data Gate Blog Series Part 1”.
About Watson Knowledge Catalog
Watson Knowledge Catalog (WKC) is an enterprise data catalog management platform that forms the core of the Organize pillar of the Cloud Pak for Data platform. A catalog connects people to the data and knowledge that they need. WKC is the key enabler for building the enterprise data catalog on Cloud Pak for Data, which lets platform users find, prepare, understand, and use data as needed. Its data governance framework ensures that data access and data quality comply with your business rules and standards.
WKC unites all information assets into a single metadata-rich catalog, based on Watson’s understanding of relationships between assets and how they’re being used and socialized among users in existing projects. It is integrated with an enterprise data governance platform that merges the analytics capabilities of Watson Studio. The data catalog assists data scientists in easily finding, preparing, understanding and using the data as needed.
Data protection has gained importance in recent years, so it matters that WKC protects data from misuse and enables sharing of assets with automated, dynamic masking of sensitive data elements. This helps avoid violating data protection regulations. For instance, when handling healthcare data in the USA, companies need to be aware of HIPAA (Health Insurance Portability and Accountability Act), a set of rules on how personally identifiable information maintained by the healthcare and healthcare insurance industries should be protected from fraud and theft. Moreover, any company based in the EU or offering services to people in the EU must comply with GDPR (General Data Protection Regulation), which has a much broader scope and governs the use of all personal data.
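To make the idea of dynamic masking more concrete, here is a minimal Python sketch. It is not WKC's rule format or API: the `MASKING_POLICY` dictionary, the column names, and the `mask_row` helper are all hypothetical and only illustrate masking sensitive values at read time while leaving the stored data untouched.

```python
import hashlib

# Hypothetical policy (an assumption, not WKC's rule format):
# column name -> masking method to apply when the row is read.
MASKING_POLICY = {
    "SSN": "redact",
    "EMAIL": "hash",
}


def mask_value(value, method):
    """Return a masked form of value according to the chosen method."""
    if value is None:
        return None
    if method == "redact":
        # Replace every character so only the length survives.
        return "X" * len(str(value))
    if method == "hash":
        # A stable digest hides the raw value but stays consistent across rows.
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]
    raise ValueError(f"unknown masking method: {method}")


def mask_row(row, policy, user_is_exempt=False):
    """Apply the policy at read time; exempt users get an unmasked copy."""
    if user_is_exempt:
        return dict(row)
    return {
        column: mask_value(value, policy[column]) if column in policy else value
        for column, value in row.items()
    }


row = {"CUSTOMER_ID": 1001, "SSN": "123-45-6789", "EMAIL": "a@example.com"}
masked = mask_row(row, MASKING_POLICY)
print(masked)
```

Because the masking happens when a row is served, the same asset can be shared with many users while each one only sees what the policy allows.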
Db2 Data Gate — Watson Knowledge Catalog integration highlights
Watson Knowledge Catalog provides fine-grained control over which users can access data from various sources. While administrators have the most permissions, data scientists and developers can only access data that is published to catalogs. Business analysts can additionally view data quality and access information asset views, while data engineers and data stewards can discover assets, import metadata, and access governance artifacts. The different user personas benefit in different ways.
With the combination of Data Gate and WKC, data scientists and software engineers can explore the most important enterprise data coming from the mainframe and use all the tools they are familiar with in the cloud for analysis, modeling, and prototyping. They can benefit from tools such as schema structure discovery to further accelerate the development of models and applications. They do not even need to look for connection metadata, since all assets are cataloged and accessible in just a few clicks.
Data stewards, on the other hand, can easily work on data quality using governance artifacts in WKC, such as business terms, the business glossary, classifications, and automatic data profiling. They can define which columns from Db2 for z/OS are visible to whom in the cloud. More importantly, they can take care of the regulations mentioned above (GDPR, HIPAA, etc.). This is especially important for data coming from Db2 for z/OS, a data store that holds the most sensitive customer data. On top of that, they can use rules such as automatic data deletion, triggered once data on Z is deleted (e.g., an analysis related to a particular customer that must be deleted once that customer leaves the company).
In addition to the above, another key benefit is the ability to understand and track data lineage: the journey made by the data from the source, through any transformations, all the way to its usage. Data lineage is very important for making sure that the data comes from the right source, is handled by the right people, undergoes the right transformations, and lands in the right target. When Db2 for z/OS data is brought into the platform by Db2 Data Gate and then discovered and imported into the catalog, the data lineage can easily be maintained, allowing data custodians to keep track of data all the way from the source. Last but not least, Db2 Data Gate makes it possible to discover schema changes, which can be maintained in the data lineage of the data assets in WKC.
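The lineage idea can be sketched as a small directed graph plus a schema comparison. This is only an illustration under stated assumptions: `LineageGraph`, `diff_schema`, and the asset names are hypothetical, and nothing here reflects how WKC or Db2 Data Gate store lineage internally.

```python
class LineageGraph:
    """Records flows between assets; each edge runs from upstream to downstream."""

    def __init__(self):
        # Maps a downstream asset to a list of (upstream asset, mechanism).
        self._upstream = {}

    def add_flow(self, source, target, via):
        self._upstream.setdefault(target, []).append((source, via))

    def trace_to_origin(self, asset):
        """Walk upstream from asset and return the hops, nearest hop first.

        For simplicity this follows only the first recorded upstream edge,
        so it assumes each asset has a single parent.
        """
        hops = []
        seen = {asset}
        current = asset
        while current in self._upstream:
            source, via = self._upstream[current][0]
            if source in seen:  # guard against accidental cycles
                break
            hops.append((source, via, current))
            seen.add(source)
            current = source
        return hops


def diff_schema(old, new):
    """Compare two {column: type} mappings and report what changed."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(c for c in shared if old[c] != new[c]),
    }


# Hypothetical asset identifiers, for illustration only.
lineage = LineageGraph()
lineage.add_flow("db2z:SALES.ORDERS", "db2wh:SALES.ORDERS", via="Db2 Data Gate sync")
lineage.add_flow("db2wh:SALES.ORDERS", "project:orders_model", via="notebook")
path = lineage.trace_to_origin("project:orders_model")

old_schema = {"ORDER_ID": "INTEGER", "AMOUNT": "DECIMAL(9,2)"}
new_schema = {"ORDER_ID": "BIGINT", "AMOUNT": "DECIMAL(9,2)", "CURRENCY": "CHAR(3)"}
changes = diff_schema(old_schema, new_schema)
print(path)
print(changes)
```

Tracing from the model back to the mainframe table is what lets a data custodian answer where a value originally came from, and the schema diff is the kind of information that could be attached to the lineage when the source table changes.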
Steps to connect Db2 Data Gate with WKC
Create a new catalog
Give your catalog a name and, optionally, a description. The connections and assets will be added to this catalog.
Create the source connection
Choose Db2 for z/OS and type in the credentials and other parameters (host, port, etc.). WKC will use this metadata to access your database when you add assets. You can click “Test Connection” before creating the connection.
Add a source data asset
Choose the schema and the table you want from the newly created source connection. In the Assets tab you can see a preview of the data.
Create the target connection and add a target asset
Repeat the process from the two previous steps. Instead of Db2 for z/OS, choose Db2 or Db2 Warehouse, according to your target database.
Add relationships to the data assets and connections
IsSameAs can be used to mark the source connection as the same as the target connection, and also to mark the source data asset as the same as the replicated target data asset. IsContainedIn can be used to mark a data asset as contained in a connection (or Contains, in the opposite direction).
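The relationship types described in this step can be modeled as a small data structure. The sketch below is hypothetical: `RelationshipStore` and the asset names are made up for illustration, and only the relationship names IsSameAs, IsContainedIn, and Contains come from the step itself.

```python
# Each relationship has an inverse; IsSameAs is symmetric.
INVERSE = {
    "IsSameAs": "IsSameAs",
    "IsContainedIn": "Contains",
    "Contains": "IsContainedIn",
}


class RelationshipStore:
    """Keeps relationships as (subject, relation, object) triples."""

    def __init__(self):
        self._edges = set()

    def relate(self, subject, relation, obj):
        if relation not in INVERSE:
            raise ValueError(f"unknown relationship: {relation}")
        # Store both directions so either side can be queried.
        self._edges.add((subject, relation, obj))
        self._edges.add((obj, INVERSE[relation], subject))

    def related(self, subject, relation):
        return sorted(o for s, r, o in self._edges if s == subject and r == relation)


# Hypothetical names for the connections and data assets.
store = RelationshipStore()
store.relate("source_connection", "IsSameAs", "target_connection")
store.relate("source_orders", "IsSameAs", "target_orders")
store.relate("source_orders", "IsContainedIn", "source_connection")
store.relate("target_orders", "IsContainedIn", "target_connection")

print(store.related("target_orders", "IsSameAs"))
print(store.related("source_connection", "Contains"))
```

Storing the inverse alongside each relationship mirrors the remark that Contains is simply IsContainedIn seen from the other side.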
Create Data Profile
(Benefit 1 — for Data Steward)
Data Profiles include generated metadata and statistics about the content of a data asset. An asset profile helps data stewards understand what actions to take to improve the data quality.
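As a rough idea of what such generated statistics can look like, here is a minimal Python sketch. It does not reproduce WKC's profiling; `profile_column`, `profile_table`, and the sample rows are assumptions chosen for illustration.

```python
from collections import Counter


def profile_column(values):
    """Compute simple statistics for one column's values."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dictionaries."""
    columns = list(rows[0].keys()) if rows else []
    return {c: profile_column([r.get(c) for r in rows]) for c in columns}


# Made-up sample rows standing in for a replicated table.
sample_rows = [
    {"REGION": "EU", "AMOUNT": 120},
    {"REGION": "US", "AMOUNT": None},
    {"REGION": "EU", "AMOUNT": 75},
]
profile = profile_table(sample_rows)
print(profile)
```

A high null count or an unexpected minimum in a profile like this is the kind of signal a data steward would act on.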
Use the data assets in Watson Studio
Create a project and add data assets
(Benefit 2 — Data Scientist / Software Engineer)
If you go to Watson Studio and create a new project (or use an existing one), you can add data assets from this catalog to it.
Create a notebook and load an asset into a data frame
Then you can use that asset for data analysis and modeling. In a Python/R notebook you will get an automatically generated block of code. Watson Studio will ask WKC for data and WKC will use the connection metadata to retrieve the data from the database. You can use the loaded data as a data frame.
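The generated code itself is not reproduced here. The sketch below only illustrates the retrieval flow described above, using an in-memory SQLite database as a stand-in for the target database. The `CATALOG` dictionary, `open_connection`, `load_asset`, and the table contents are all hypothetical and are not the code Watson Studio generates.

```python
import sqlite3

# Hypothetical catalog entry: asset name -> stored connection metadata.
CATALOG = {
    "ORDERS": {"database": ":memory:", "table": "ORDERS"},
}


def open_connection(metadata):
    """Stand-in for opening the target database from stored metadata."""
    conn = sqlite3.connect(metadata["database"])
    # Seed a few demo rows so the sketch runs on its own.
    conn.execute("CREATE TABLE ORDERS (ORDER_ID INTEGER, REGION TEXT, AMOUNT REAL)")
    conn.executemany(
        "INSERT INTO ORDERS VALUES (?, ?, ?)",
        [(1, "EU", 120.0), (2, "US", 75.5)],
    )
    return conn


def load_asset(asset_name):
    """Resolve the asset in the catalog and return (columns, rows)."""
    entry = CATALOG[asset_name]
    conn = open_connection(entry)
    try:
        table = entry["table"]
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [description[0] for description in cursor.description]
        rows = cursor.fetchall()
    finally:
        conn.close()
    return columns, rows


columns, rows = load_asset("ORDERS")
# One dictionary per row; in a notebook this tabular result would be
# handed to a data frame instead.
records = [dict(zip(columns, row)) for row in rows]
print(records)
```

The point of the flow is that the notebook author only names the asset; the connection details stay in the catalog.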
We have briefly described Db2 Data Gate, introduced Watson Knowledge Catalog, and pointed out the benefits of their integration. The step-by-step video shows how to set up the integration yourself. To make the benefits more concrete, we went through a couple of example scenarios that are likely to match your own usage.
For further reading, please see Daniel's blog post “Use Current Db2 for z/OS Data on Cloud, Without Direct Mainframe Access and Without Loosing Control Over Your Data”.