IBM Db2 - The Ultimate Database for Cloud, Analytics & Mobile
Part 2: IBM Db2 Analytics Accelerator
What’s Driving Digital Transformation?
Customers are not as loyal as they used to be. As a customer, if you don’t get the experience you are looking for, you look for alternatives. Your customers will judge your digital transformation by that experience. Customers who get it stay with you and help drive your business.
The next driver is how quickly you can deliver something new to the market. You might have all the data, but you need to make it available. You might have collected and consolidated data from different places (customer data, credit card information) to get a 360° view. As you add more and more data to this kind of repository, the challenge becomes how fast you can make the data available in a format that lets you draw conclusions from it and shape the customer experience.
Last but not least, this applies not just to your customer base but also to your work with partners (business partners or others): you want to simplify and streamline the whole workflow around data provisioning.
Machine learning helps drive improved customer service through personalization and greater employee productivity by supporting the best decision at the right time, and it serves as a foundation for innovation by uncovering customer behaviors and product uses that you may not be aware of.
According to a Forrester Research study: “Insight-driven businesses bring insight, not just data, into every decision, and they know exactly how to use them for the greatest advantage across the entire customer life cycle. For these firms, digital insights and what they do with them are their secret weapons --- to disrupt your market and steal your customers”.
That statement really encapsulates what Machine Learning can bring to your organization.
But What Is Machine Learning, Really?
Let’s start with the basics. It feels like you can hardly listen to a podcast or read a blog or technical publication without coming across Machine Learning. It powers everyday services we use like Watson, Siri, Facebook, and Google. But what is it? In short, it’s a way for computers to learn without explicitly being programmed. Or said another way, it’s software that can write software.
Not exactly a new idea: Santiago Ramón y Cajal described the networks of neurons in the brain around 1900, and AI followed in the 1950s. But it finally works! Or at least, it works in specialized forms: language translation, language understanding, speech transcription, object detection, face recognition, and more.
Machine Learning is typically applied to three kinds of problems:
Classification, where data points are labeled and are used to predict a category. There can be two classes or multiple classes. Typical examples are fraud detection (fraud vs non-fraud) and spam email detection (spam vs non-spam).
Regression, where a value is predicted, for instance stock prices.
Clustering, where data points are not labeled. The goal is then to group the data into clusters to organize it better.
To accomplish these calculations, Machine Learning uses a huge variety of mathematical algorithms to process the input information. These algorithms are classified into three groups:
Supervised Learning, which is task driven and uses Regression and Classification techniques.
Unsupervised Learning, which is data driven and uses Clustering techniques.
And Reinforcement Learning, where algorithms learn how to react to their environment.
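To make the difference between the first two groups concrete, here is a minimal sketch in plain Python. It is only an illustration under simple assumptions (one numeric feature, tiny made-up transaction amounts, and helper names such as fit_line, nearest_centroid and two_means that are invented for this example); it is not how Machine Learning for z/OS or any particular library implements these techniques.

```python
from statistics import mean


def fit_line(xs, ys):
    """Supervised, regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def nearest_centroid(points, labels):
    """Supervised, classification: learn one centroid per label from labeled points."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    centroids = {label: mean(values) for label, values in groups.items()}

    def predict(point):
        # Predict the label whose centroid is closest to the new point.
        return min(centroids, key=lambda label: abs(point - centroids[label]))

    return predict


def two_means(points, iterations=10):
    """Unsupervised, clustering: 1-D k-means with k=2. No labels are used.

    Assumes the points contain at least two distinct values.
    """
    low, high = min(points), max(points)
    for _ in range(iterations):
        left = [p for p in points if abs(p - low) <= abs(p - high)]
        right = [p for p in points if abs(p - low) > abs(p - high)]
        low, high = mean(left), mean(right)
    return sorted(left), sorted(right)


# Made-up transaction amounts, used only for illustration.
amounts = [12.0, 15.0, 14.0, 950.0, 1020.0, 980.0]
labels = ["ok", "ok", "ok", "fraud", "fraud", "fraud"]

classify = nearest_centroid(amounts, labels)   # needs the labels
clusters = two_means(amounts)                  # finds the groups without labels
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

The classifier and the regression both need known answers to learn from, while the clustering function is handed the amounts alone and still separates the small values from the large ones.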
IBM Machine Learning for z/OS
Now let’s focus specifically on IBM Machine Learning for z/OS.
Data science efforts represent a significant investment in skills, time, hardware and software. Data scientists, and the business teams that depend on them, want to get the most out of the models that they develop.
To help data scientists build, deploy and monitor behavioral models, IBM introduced Machine Learning for z/OS. Machine Learning for z/OS is an enterprise-grade, collaborative and extensible machine learning offering. It runs on IBM Z and benefits from all of the advantages mentioned before.
As it provides faster model development, deployment and monitoring, Machine Learning for z/OS offers a quick return on investment.
Through a hybrid cloud approach to model life-cycle management and collaboration, it gives data science teams the flexibility to train and evaluate their models on their platform of choice. Organizations that develop models on other platforms, using Spark or Python, can easily deploy these models on IBM Z, where the majority of their transactions occur and most of their enterprise data originates.
Scoring can be easily integrated with transactional applications, without significant overhead, enabling real time insight at the point of interaction.
Machine Learning for z/OS is built on open source components but delivers far more than open source does. IBM's unique patented features enable data science teams to collaborate productively.
A data science team could certainly do a great deal of this themselves, but it would demand a significant investment in time and money. They would be wasting time and effort building software instead of building models that help you transform and innovate.
Model Development Tools for Both Coders and Non-Coders
Machine Learning for z/OS provides development environments for both coders and non-coders, because data scientists come from varied backgrounds.
Some data scientists, especially those with a computer science or mathematics background, need more powerful tools and usually have their favorite libraries for data processing, visualization and more. They prefer a programming IDE.
For coders, Machine Learning for z/OS provides Jupyter Notebook, an open source tool that is probably the most popular IDE among data scientists. Jupyter Notebook supports interactive programming, which is critical because the work of data scientists is iterative. It supports multiple languages, such as Scala and Python, and it supports tables, charts and graphs for visualization.
Last but not least, it supports import and export so people can share their work.
Others prefer a wizard or canvas for creating models by drag and drop, because they don’t have to learn a programming language that way, or because they can quickly create a prototype.
For non-coders, in addition to Notebook, Machine Learning for z/OS provides Visual Model Builder. Users follow a wizard to process data and create a Spark ML model without any coding. They tell the tool where the data is, what the features and label are, and which algorithm to use, and a model is created. Very simple.
Utilities to Accelerate Every Stage of Machine Learning
Data scientists can also use the libraries packaged in Machine Learning for z/OS to accelerate the most time-consuming work.
For instance, data preparation is painful for most data scientists. They have to spend time fixing data quality problems: filling in missing values in the attributes that matter, and figuring out how to encode or index string data as numeric data, because many algorithms accept only numeric input. Machine Learning for z/OS provides an auto data preparation (ADP) utility that automates this work and saves data scientists’ time.
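The two preparation steps mentioned above can be sketched in a few lines of plain Python. This is a hand-written illustration, not the ADP utility or its API: the function names fill_missing and index_strings are invented for this example, missing values are assumed to be represented as None, and the mean is just one of several reasonable fill strategies.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the values that are present."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def index_strings(values):
    """Map each distinct string to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return [mapping[value] for value in values], mapping


# Made-up columns, used only for illustration.
amounts = fill_missing([10.0, None, 30.0])
card_codes, card_mapping = index_strings(["visa", "amex", "visa"])
```

The returned mapping has to be kept alongside the model, because the same string-to-number encoding must be applied to new records at scoring time.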
Machine Learning for z/OS also provides an automated modelling tool, Cognitive Assistant for Data Scientists (CADS). CADS helps data scientists find the best-performing model among dozens or hundreds of candidates. It does not simply evaluate every model and pick the best one. It takes a smarter approach: it evaluates a model on a small dataset to predict that model’s performance on a larger dataset. In this way, CADS saves much of the time and resources that evaluating hundreds of models in full would take. The same methodology can be applied to hyperparameter tuning, and the utility that automates it is called HPO. Both come from IBM Research and can significantly help data scientists pick the best model with less time and fewer resources.
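The general idea of using a small dataset to avoid evaluating every candidate in full can be sketched as follows. To be clear, this is not the CADS algorithm, which is not described here. It is a deliberately simplified stand-in that ranks candidates on a small sample and fully evaluates only the best few. The k-nearest-neighbour candidates, the synthetic data and all function names are assumptions made for this example.

```python
import random
from statistics import mean


def make_knn(k):
    """Return a fit function for a 1-D k-nearest-neighbour regressor."""
    def fit(train):
        def predict(x):
            nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
            return mean(y for _, y in nearest)
        return predict
    return fit


def validation_error(fit, rows):
    """Train on the first 80% of rows; return mean absolute error on the rest."""
    cut = int(len(rows) * 0.8)
    predict = fit(rows[:cut])
    return mean(abs(predict(x) - y) for x, y in rows[cut:])


def shortlist_then_confirm(candidates, rows, sample_size, keep):
    """Rank all candidates on a small sample, then fully evaluate only the top few."""
    sample = rows[:sample_size]
    ranked = sorted(candidates,
                    key=lambda name: validation_error(candidates[name], sample))
    finalists = ranked[:keep]
    full_scores = {name: validation_error(candidates[name], rows)
                   for name in finalists}
    return min(full_scores, key=full_scores.get), full_scores


# Synthetic data: y is roughly 2 * x plus noise. The seed keeps the run repeatable.
rng = random.Random(0)
rows = [(x, 2 * x + rng.gauss(0, 5)) for x in range(400)]
rng.shuffle(rows)

candidates = {f"knn-{k}": make_knn(k) for k in (1, 3, 5, 25)}
best, full_scores = shortlist_then_confirm(candidates, rows, sample_size=50, keep=2)
```

Only two of the four candidates are ever trained on the full dataset here. With hundreds of candidates and large datasets, that is where the savings in time and resources come from.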
Highly Available Online Scoring Services
An online scoring service makes predictions instantly and is usually called within a transaction. The performance, availability and scalability of the online scoring service are therefore very important for operationalizing machine learning.
Machine Learning for z/OS supports PMML, Spark ML and scikit-learn models. PMML is an industry standard that almost all vendors and open source tools support. Spark ML is the format that the Spark machine learning library generates. Scikit-learn is the most popular machine learning library in the Python community. So, we have good coverage of the mainstream machine learning frameworks.
Machine Learning for z/OS has a scoring engine for each model type. PMML and Spark ML models use JVM-based scoring engines deployed in Liberty. Scikit-learn models use a Python-based scoring engine deployed with Flask, a Python web framework, and uWSGI, a Python application server.
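For readers unfamiliar with how a Python scoring engine sits behind a server such as uWSGI, here is a minimal sketch of a WSGI application that scores one JSON record per request. It is not the Machine Learning for z/OS scoring engine or its request format: the feature names, the hard-coded weights standing in for a trained model, and the call_locally helper are all hypothetical, and only the Python standard library is used.

```python
import io
import json
import math

# Hypothetical logistic-regression weights standing in for a trained model.
WEIGHTS = {"amount": 0.004, "foreign": 1.5}
BIAS = -4.0


def score(features):
    """Return a fraud probability for one record."""
    z = BIAS + sum(WEIGHTS[name] * float(features[name]) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def app(environ, start_response):
    """WSGI entry point: read a JSON record from the request body, reply with its score."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    record = json.loads(environ["wsgi.input"].read(length))
    body = json.dumps({"probability": score(record)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_locally(record):
    """Invoke the WSGI app in-process, without a server, for a quick check."""
    payload = json.dumps(record).encode("utf-8")
    environ = {"REQUEST_METHOD": "POST",
               "CONTENT_LENGTH": str(len(payload)),
               "wsgi.input": io.BytesIO(payload)}
    chunks = app(environ, lambda status, headers: None)
    return json.loads(b"".join(chunks))


risky = call_locally({"amount": 1000, "foreign": 1})
routine = call_locally({"amount": 10, "foreign": 0})
```

Because the model is loaded once and each request only evaluates it, the per-call work stays small, which is what makes scoring inside a transaction practical.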
Machine Learning for z/OS also supports a JVM-based scoring service deployed in a CICS region. This is a new feature of CICS TS 5.3 and 5.4. It can deliver a significant performance benefit, and it simplifies the work of COBOL developers making scoring calls.
As with all services on Z, high availability is a basic requirement. The Machine Learning for z/OS online scoring service takes advantage of Liberty’s HA architecture.
To recap the benefits of Machine Learning for z/OS: it moves Machine Learning capability to the platform where the most valuable data resides, it integrates real-time predictive analytics with transactions, and it leverages the superior reliability, availability and security of z/OS.
Part 4: Operational Decision Manager