Sowmya Kameswaran's Posts (2)

By Vassil Dimov, Mateo Tošić and Sowmya Kameswaran

Introduction

In the previous blog post we introduced Db2 for z/OS Data Gate and described it in detail. In this post we discuss the integration between Db2 Data Gate and Watson Knowledge Catalog and highlight its business value.

AI Ladder

In today's business world, modernizing data and using AI are key to success. The guiding principles of the AI Ladder, defined by IBM, help organizations with business transformation in the four key areas listed below:

  1. Collect — Make data simple and accessible (All data sources contribute to this pillar)
  2. Organize — Create a business-ready analytics foundation (Data governance services like Watson Knowledge Catalog)
  3. Analyze — Build and scale AI with trust and transparency (Watson Studio)
  4. Infuse — Operationalize AI throughout the business (this is what customers do with the data in their own products)

In this blog we will discuss how Db2 Data Gate and Watson Knowledge Catalog, representing the first two pillars of the AI Ladder, can help organizations to unlock the huge value of their Z data in the cloud.

About Db2 Data Gate

Db2 Data Gate enables modern high-volume, high-frequency hybrid cloud applications that need read-only access to valuable enterprise data from Db2 for z/OS. It plays a key role in the Collect pillar by enabling movement of data from Db2 for z/OS into the Cloud Pak for Data platform. With data synchronization between the source Db2 for z/OS and the target IBM Db2 or IBM Db2 Warehouse, applications get access to current data. To learn more about IBM Db2 for z/OS Data Gate, please read “What is Db2 Data Gate? Db2 Data Gate Blog Series Part 1”.

About Watson Knowledge Catalog

Watson Knowledge Catalog (WKC) is an enterprise data catalog management platform that forms the core of the Organize pillar of the Cloud Pak for Data platform. A catalog connects people to the data and knowledge that they need. WKC is the key enabler for building the enterprise data catalog on Cloud Pak for Data, which lets platform users find, prepare, understand, and use the data as needed. The data governance framework ensures that data access and data quality are compliant with your business rules and standards.

WKC unites all information assets into a single metadata-rich catalog, based on Watson’s understanding of relationships between assets and how they’re being used and socialized among users in existing projects. It is integrated with an enterprise data governance platform that merges the analytics capabilities of Watson Studio. The data catalog assists data scientists in easily finding, preparing, understanding and using the data as needed.

Data protection has gained importance in recent years. That is why it is so important that WKC protects data from misuse and enables sharing of assets with automated, dynamic masking of sensitive data elements. This avoids violating various data protection regulations. For instance, when handling healthcare data in the USA, companies need to be aware of HIPAA (Health Insurance Portability and Accountability Act), a set of rules on how personally identifiable information maintained by the healthcare and healthcare insurance industries should be protected from fraud and theft. Moreover, any company based in the EU or offering services to people in the EU must comply with GDPR (General Data Protection Regulation), which has a much broader scope and governs the use of all personal data.

Db2 Data Gate — Watson Knowledge Catalog integration highlights

Watson Knowledge Catalog provides fine-grained control of data from various sources to users who need access to them. While administrators have the most permissions, data scientists and developers can only access data that is published to catalogs. Business analysts can, in addition to that, view data quality and access information asset views, while data engineers and data stewards can discover assets, import metadata, and access governance artifacts. The benefits are numerous for different user personas.

With the combination of Data Gate and WKC, data scientists and software engineers can explore the most important enterprise data coming from the mainframe and use all the tools they are familiar with in the cloud for analysis, modeling, and prototyping. They can benefit from tools such as schema structure discovery to further accelerate the development of models and applications. They do not even need to look for connection metadata, since all assets are cataloged and accessible in just a few clicks.

Data stewards, on the other hand, can easily work on data quality using governance artifacts, such as business terms, business glossary, classifications, and automatic data profiling in WKC. They can define which columns from Db2 for z/OS are visible to whom in the cloud. More importantly, they can take care of the regulations mentioned above (GDPR, HIPAA, etc.). This is especially significant for data coming from Db2 for z/OS, a data store that contains the most sensitive customer data. On top of that, they can use rules such as automatic data deletion, triggered once data on Z is deleted (e.g., a customer-related analysis that needs to be deleted once the customer leaves the company).
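To make the idea of per-persona column visibility more concrete, here is a minimal Python sketch. It is only an illustration of the concept and not how WKC implements its data protection rules; the rule table, role names, column names, and the mask_row helper are all hypothetical.

```python
# Hypothetical rules: column name -> roles allowed to see the clear value.
# This is a toy model, not the WKC rule format.
VISIBILITY_RULES = {
    "CUSTOMER_ID": {"data_steward", "data_scientist"},
    "NAME": {"data_steward"},
    "IBAN": {"data_steward"},
}


def mask_row(row, role, rules=VISIBILITY_RULES, placeholder="XXXXXX"):
    """Return a copy of row where columns the role may not see are masked."""
    masked = {}
    for column, value in row.items():
        allowed_roles = rules.get(column)
        # Columns without a rule are treated as non-sensitive in this sketch.
        if allowed_roles is None or role in allowed_roles:
            masked[column] = value
        else:
            masked[column] = placeholder
    return masked


# Made-up sample row standing in for a replicated Db2 for z/OS record.
row = {"CUSTOMER_ID": 42, "NAME": "Jane Doe", "IBAN": "DE00123456", "REGION": "EU"}

print(mask_row(row, "data_scientist"))
print(mask_row(row, "data_steward"))
```

In this sketch a data scientist sees the customer ID and region but only placeholders for the name and IBAN, while a data steward sees the full row.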

In addition to the above, another key benefit is the ability to understand and track data lineage: the journey the data makes from its source, through any transformations, all the way to its usage. Data lineage is very important for making sure the data comes from the right source, is handled by the right people, undergoes the right transformations, and lands in the right target. When Db2 for z/OS data is brought into the platform by Db2 Data Gate and then discovered and imported into the catalog, the data lineage can easily be maintained, allowing data custodians to keep track of data all the way from the source. Last but not least, Db2 Data Gate makes it possible to discover schema changes, which can be maintained in the data lineage of the data assets in WKC.
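As a rough illustration of what tracking lineage "all the way from the source" means, the following Python sketch walks a chain of upstream hops back to the originating table. The asset names, the LINEAGE table, and the trace_to_source helper are hypothetical and are not part of WKC or Db2 Data Gate.

```python
# Each entry maps an asset to (upstream asset, step that produced it).
# All names are hypothetical and only illustrate the concept.
LINEAGE = {
    "DB2WH.CUSTOMER": ("ZOS.CUSTOMER", "Db2 Data Gate synchronization"),
    "project/customer_features": ("DB2WH.CUSTOMER", "notebook feature engineering"),
}


def trace_to_source(asset, lineage=LINEAGE):
    """Walk upstream from asset and return the hops as (upstream, step) pairs."""
    path = []
    seen = {asset}
    while asset in lineage:
        upstream, step = lineage[asset]
        if upstream in seen:
            # Guard against malformed lineage data that loops back on itself.
            raise ValueError(f"cycle in lineage at {upstream}")
        path.append((upstream, step))
        seen.add(upstream)
        asset = upstream
    return path


for upstream, step in trace_to_source("project/customer_features"):
    print(f"<- {upstream} via {step}")
```

The last hop returned is the original Db2 for z/OS table, which is the information a data custodian needs when verifying where a derived asset came from.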

Steps to connect DG with WKC

Create a new catalog

Give your catalog a name and, optionally, a description. The connections and assets will be added to this catalog.

Create the source connection

Choose Db2 for z/OS and type in the credentials and other parameters (host, port, etc.). WKC will use this metadata to access your database when you add assets. You can click on “Test Connection” before creating the connection.

Add a source data asset

Choose the schema and the table you want from the newly created source connection. In the Assets tab you can see a preview of the data.

Create the target connection and add a target asset

Repeat the process from the two previous steps. Instead of Db2 for z/OS, choose Db2 or Db2 Warehouse according to your target database.

Add relationships to the data assets and connections

IsSameAs can be used to mark the source connection as the same as the target connection, and also to mark the source data asset as the same as the replicated target data asset. IsContainedIn can be used to mark a data asset as contained in a connection (or Contains in the opposite direction).
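The relationship types described above can be pictured as labeled edges between catalog entries. The Python sketch below is a toy in-memory model, not the WKC API; the RelationshipGraph class and the connection and asset names are hypothetical. It shows how IsSameAs is symmetric while IsContainedIn and Contains are inverses of each other.

```python
from collections import defaultdict

# Inverse relationship names; IsSameAs is symmetric, so it is its own inverse.
INVERSE = {
    "IsSameAs": "IsSameAs",
    "IsContainedIn": "Contains",
    "Contains": "IsContainedIn",
}


class RelationshipGraph:
    """Toy in-memory store of catalog relationships (not the WKC API)."""

    def __init__(self):
        self._edges = defaultdict(set)

    def relate(self, subject, relation, target):
        # Record the relationship and its inverse so both directions can be queried.
        self._edges[subject].add((relation, target))
        self._edges[target].add((INVERSE[relation], subject))

    def related(self, subject, relation):
        # Return all entries linked to subject by the given relation, sorted by name.
        return sorted(t for r, t in self._edges[subject] if r == relation)


graph = RelationshipGraph()
# Hypothetical connection and asset names.
graph.relate("zos_connection", "IsSameAs", "db2wh_connection")
graph.relate("ZOS.CUSTOMER", "IsSameAs", "DB2WH.CUSTOMER")
graph.relate("ZOS.CUSTOMER", "IsContainedIn", "zos_connection")
graph.relate("DB2WH.CUSTOMER", "IsContainedIn", "db2wh_connection")

print(graph.related("DB2WH.CUSTOMER", "IsSameAs"))
print(graph.related("zos_connection", "Contains"))
```

Querying from the target side returns the source asset, and querying a connection with Contains returns the assets that were marked IsContainedIn it.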

Create Data Profile

(Benefit 1 — for Data Steward)

Data Profiles include generated metadata and statistics about the content of a data asset. An asset profile helps data stewards understand what actions to take to improve the data quality.

Use the data assets in Watson Studio

Create a project and add data assets

(Benefit 2 — Data Scientist / Software Engineer)

If you go to Watson Studio and create a new project (or use an existing one), you can add data assets from this catalog to it.

Create a notebook and load an asset into a data frame

Then you can use that asset for data analysis and modeling. In a Python/R notebook you will get an automatically generated block of code. Watson Studio will ask WKC for data and WKC will use the connection metadata to retrieve the data from the database. You can use the loaded data as a data frame.

Conclusion

We have briefly described Db2 Data Gate and introduced Watson Knowledge Catalog, and we pointed out the benefits of their integration. The step-by-step videos showed how to set up the integration yourself. To make the benefits more tangible, we went through a couple of example scenarios that are likely to match your own usage flow.

For further reading, please check Daniel's blog post, “Use Current Db2 for z/OS Data on Cloud, Without Direct Mainframe Access and Without Loosing Control Over Your Data”.


By Sowmya Kameswaran and Jens Müller

 

As we all know, Db2 for z/OS has one of the largest footprints in the enterprise database world. Your organization may have all or most of its business-critical data on Db2 for z/OS (you are not alone, and we wouldn't recommend that you change a thing about that!). We realize, though, that many organizations are experimenting with hybrid cloud, or rearchitecting and extending their infrastructure to take advantage of it.
As hybrid cloud grows in importance, modern cloud-based applications need easy, secure access to this data. IBM Db2 Data Gate for z/OS makes data from Db2 for z/OS readily accessible on the IBM Cloud Pak for Data platform for business users and application developers.
 
As-is scenario
We have embarked on an exciting era of compelling modern application development. There is a surge in both reporting applications requiring read-only access to transactional data as well as data-intensive analytics applications requiring access to historical data. Organizations are developing new compelling applications for differentiating services delivered to their customers. Since much of this data originates in Db2 for z/OS, many organizations have built custom ETL (extract, transform, load) jobs to extract and load this data into other databases to support their application needs. While this approach may work in the short term, some of the problems with this approach are:
 
  1. Expensive to create and maintain over the course of time (due to complexity, and the costs of synchronizing source and target databases, and ensuring transactional consistency if necessary)
  2. Data security concerns once data is moved from where it originates
  3. Increased operational processes and cost on IBM Z
 
Why Db2 Data Gate
  • It is an integrated solution to securely access data from Db2 for z/OS on the cloud without the need for direct access to Db2 for z/OS.
  • Avoids significant investment in building and maintaining custom ETL solutions to move Db2 for z/OS data.
  • Provides better data currency via the Integrated Synchronization feature that replicates data from Db2 for z/OS to IBM Cloud Pak for Data.
  • Significantly reduces the operational cost of data replication on the mainframe, since 96% of the underlying data synchronization technology is zIIP eligible.
  • Enables modernization and transformation in your enterprise's Journey to Cloud.
  • With Integrated Synchronization, the data availability for applications accessing data from the source is not affected (source tables are fully online for reading and writing) while data is replicated to the target.
  • High availability and disaster recovery (HA/DR) are built directly into IBM Cloud Pak for Data.

 

Architecture highlights


Db2 Data Gate is based on Db2 (row store) or Db2 Warehouse (column store) as the target database within IBM Cloud Pak for Data. This model makes it suitable for supporting applications that require row-level access as well as analytical applications that benefit from a column-based data store. Only one Db2 for z/OS database can be used as the data source. The key aspect of the architecture is the Integrated Synchronization feature, which is optimized to replicate data from Db2 for z/OS to Db2 running under IBM Cloud Pak for Data.
 
With Db2 Data Gate and the target database running on the IBM Cloud Pak for Data platform, the solution works wherever the platform is able to run – private, public or hybrid cloud implementations, thus making relevant data readily available to application developers and business users where they need it.
 
Lab performance benchmarks
Db2 Data Gate boasts unrivaled performance when compared to any other data synchronization tool synchronizing data from Db2 for z/OS to Db2 (Warehouse).
 
With IBM Cloud Pak for Data and Db2 Data Gate installed on Linux on IBM Z, and with Db2 Warehouse as the target database on hostPath data storage:
 
  • Peak load performance (for making initial copy): 2.1 TB/h
  • Peak synchronization performance: 200k rows/sec at 1.2 secs peak latency
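To put these peak figures in perspective, the short Python calculation below derives a best-case duration for an initial copy and an hourly row throughput. It assumes the lab peak rates are sustained for the whole run, which real environments may not achieve; the 10 TB data size and the helper names are made up for illustration.

```python
# Peak figures taken from the lab benchmark above.
PEAK_LOAD_TB_PER_HOUR = 2.1
PEAK_SYNC_ROWS_PER_SEC = 200_000


def min_initial_load_hours(data_tb):
    """Best-case hours for the initial copy, assuming the peak load rate is sustained."""
    return data_tb / PEAK_LOAD_TB_PER_HOUR


def max_rows_per_hour():
    """Upper bound on rows synchronized per hour at the peak synchronization rate."""
    return PEAK_SYNC_ROWS_PER_SEC * 3600


# Hypothetical 10 TB source: 10 / 2.1 is roughly 4.8 hours at best.
print(f"{min_initial_load_hours(10):.1f} h")
# 200,000 rows/sec * 3,600 sec = 720,000,000 rows per hour.
print(max_rows_per_hour())
```

Both results are optimistic bounds derived from peak lab numbers, so they are better suited to a first sizing estimate than to capacity guarantees.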
 
Db2 Data Gate does not:
  • Replace traditional transactional systems since it only provides read-only data access to source data
  • Serve as a replacement for other data replication technologies because it supports only one source and one target and also does not support bidirectional replication or data transformation
  • Support data versioning and hence is not an operational data store
  • Guarantee data currency for high-volume transactional workloads. It is not a carbon copy of the source data, so applications requiring absolute currency should access data at its point of origin.
 
User Interface snapshots
 
Db2 Data Gate provisioning
Select the target database type and deployment, resource allocation and network routing to proceed with creation of the Db2 Data Gate instance.
 
Setting up source
Once the instance is created, the first step is to point the Db2 Data Gate instance to the Db2 for z/OS subsystem to be used as data source.
 
 
Select and add tables from source to target
The next step is to select the tables to synchronize data from source to target.
 
 
Db2 Data Gate dashboard
Overview of status and activities associated with the provisioned Db2 Data Gate instance.
 
 
Db2 Data Gate in action
The video below demonstrates the Db2 Data Gate end user experience. One of the key aspects to note is that when the source tables are added, loaded, and set up for synchronization with Db2 Data Gate, there is no impact on concurrent workloads executing on the source tables. The source tables are fully online for reading and writing while Db2 Data Gate makes the copy and starts synchronizing.
 
 
Our next blog ...
Next time we will look at Db2 for z/OS Data Gate and Watson Knowledge Catalog integration.