UDF for Regular Expressions in Db2 12 for z/OS without IDAA

Introduction:

I read the great article "Pattern matching using regular expression in DB2?" by Jane Man
(https://www.worldofdb2.com/profiles/blogs/pattern-matching-using-re...).
In essence, she used the built-in XPath function fn:matches which handles regular expressions nicely.
(Might be helpful to read her article before continuing.)
Her conclusion: "Now, we can do pattern matching using regular expression in DB2!!"
Yes - but may I add:

It would help if you know a bit of XML, know how to code XPath expressions.

Since version 12, Db2 for z/OS offers five new built-in functions named REGEXPR_xxxx() which enable you to work with regular expressions more easily (without XML skill).

Yet, these new functions require the IDAA!

Goal:

In my class on version 12 a student (whose company does not have IDAA) inspired me to develop a UDF, which does not rely on IDAA, but still enables someone without XPath skills to easily employ regular expressions in Db2 for z/OS.

Db2 means:

At first glance, it seemed an easy task: Just create a UDF with two parameters - for the two arguments of the fn:matches function. But: The second argument of fn:matches must be a literal, so you cannot simply pass the UDF parameters!

Hence, dynamic SQL is needed. And luckily: Since version 12, Db2 for z/OS supports dynamic SQL within a UDF!

The UDF:

With these means ( XPath, dynamic SQL ) I created the

UDF "islike_regexpr()" which yields 1 (else 0)

if the first parameter satisfies the regular expression, provided as second parameter.

First, let's see how you can use this UDF

(whereby the second parameter, the regular expression, must be enclosed in apostrophes). In the following example, I used the regular expression '.*[0-9].*'  - meaning I'm looking for department names containing at least one digit ( [0-9] ) with any number [*] of any other character [.] before of after the digit(s).

[Remark: long statement, if you use LIKE instead.]

(a) UDF used in the SELECT clause:

---------+---------+---------+---------+---------+---------+
SELECT islike_regexpr(deptname, '.*[0-9].*' ) AS match
, deptname
FROM DSN81210.DEPT
#
---------+---------+---------+---------+---------+---------+
MATCH DEPTNAME
---------+---------+---------+---------+---------+---------+
    0 SPIFFY COMPUTER SERVICE DIV.
    0 PLANNING
.. etc ...
    0 OPERATIONS
    0 SOFTWARE SUPPORT
    1 BRANCH OFFICE F2
    1 BRANCH OFFICE G2
    1 BRANCH OFFICE H2
    1 BRANCH OFFICE I2
    1 BRANCH OFFICE J2
DSNE610I NUMBER OF ROWS DISPLAYED IS 14
---------+---------+---------+---------+---------+---------+

(b) UDF primarily used in the WHERE clause (and the SELECT clause, too):

---------+---------+---------+---------+---------+------
SELECT islike_regexpr(deptname, '.*[0-9].*' ) AS match
, deptname
FROM DSN81210.DEPT
WHERE islike_regexpr(deptname, '.*[0-9].*' ) = 1
#
---------+---------+---------+---------+---------+------
MATCH DEPTNAME
---------+---------+---------+---------+---------+------
    1 BRANCH OFFICE F2
    1 BRANCH OFFICE G2
    1 BRANCH OFFICE H2
    1 BRANCH OFFICE I2
    1 BRANCH OFFICE J2
DSNE610I NUMBER OF ROWS DISPLAYED IS 5
---------+---------+---------+---------+---------+------


Now have a look at the UDF - comments, explanations will follow.
(This UDF only fulfills the basic function, i.e.,
- it is not optimized
- it does not handle inappropriate input (see enhancement further down)
- the VARCHAR lengths are arbitrary/not tested
- ... it should just give you the rough idea)

---------+---------+---------+---------+----
CREATE FUNCTION islike_regexpr
( parm1 VARCHAR(256)
, regexpr VARCHAR(256) )
--RETURNS VARCHAR(1000)
RETURNS SMALLINT
LANGUAGE SQL
READS SQL DATA
BEGIN
DECLARE result SMALLINT ;
DECLARE stmt_str VARCHAR(1000) ;
DECLARE s1 STATEMENT ;
DECLARE c1 CURSOR FOR s1 ;


SET stmt_str =


'WITH temp1 (parm1_as_xml_col) AS '                 CONCAT
' ( SELECT XMLQUERY (''<doc>{$p1}</doc>'' '     CONCAT
' PASSING '''                                                        CONCAT
parm1                                                                  CONCAT
''' AS "p1" ) '                                                         CONCAT
' FROM SYSIBM.SYSDUMMY1 ) '                          CONCAT
' SELECT SMALLINT(COUNT(*)) FROM temp1 '    CONCAT
' WHERE XMLEXISTS (''$px1[fn:matches(.,"'        CONCAT
regexpr                                                              CONCAT
'" ) ] '' PASSING parm1_as_xml_col AS "px1" ) ' ;


PREPARE s1 FROM stmt_str ;
OPEN c1 ;
FETCH c1 INTO result ;
CLOSE c1 ;
--RETURN stmt_str ;
RETURN result ;
END
#
---------+---------+---------+---------+----

Explanations:


- I used a slightly different SELECT statement (with a common table expression, see below)
  compared to the statement Jane Man proposed.
- How does the generated dynamic statement string (stored in the variable stmt_str) look like?
  For the first DEPT row, this statement string with the two parameters
  a) deptname =  'SPIFFY COMPUTER SERVICE DIV.'
  b) the regular expression: '^(S).*(V).*'
  evaluates to:

WITH temp1 (parm1_as_xml_col) AS
( SELECT XMLQUERY ('<doc>{$p1}</doc>'
                PASSING 'SPIFFY COMPUTER SERVICE DIV.' AS "p1" )
   FROM SYSIBM.SYSDUMMY1 )
SELECT SMALLINT(COUNT(*))
   FROM temp1
  WHERE XMLEXISTS ('$px1[fn:matches(.,"^(S).*(V).*" ) ] '
                PASSING parm1_as_xml_col AS "px1" )

Testing:


How can you test/verify that the statement string generated in the UDF really evaluates to the above query?
You let the UDF display the generated statement string by changing the UDF at three places:
1. Uncomment: RETURNS VARCHAR(1000)
   and comment: RETURNS SMALLINT
2. Comment the four lines
   PREPARE s1 FROM stmt_str ;
   OPEN c1 ;
   FETCH c1 INTO result ;
   CLOSE c1 ;
3. Uncomment: RETURN stmt_str ;
   and comment: RETURN result ;
- If you then invoke the UDF in the SELECT clause of a query, it displays the stmt_str.
- This un-/commenting might help you, if you modify the SELECT statement and want to test your modification.

Enhancement:


- When using EMP.LASTNAME instead of DEPT.DEPTNAME as input

  (and the simpler regular expression '^(H).*'),

  the UDF produces an error:

---------+---------+---------+---------+---------+---------+---------+-------
SELECT lastname
FROM DSN81210.EMP
WHERE islike_regexpr(lastname, '^(H).*' ) = 1 
---------+---------+---------+---------+---------+---------+---------+-------
LASTNAME
---------+---------+---------+---------+---------+---------+---------+-------
HAAS
HENDERSON
DSNE610I NUMBER OF ROWS DISPLAYED IS 2
DSNT408I SQLCODE = -104, ERROR: ILLEGAL SYMBOL "' AS parm1_as_col FROM
SYSIBM.SYSDUMMY1 ) , temp2 (parm1_as_xml_col) AS ( SELECT XMLQUERY
('". SOME SYMBOLS THAT MIGHT BE LEGAL ARE: , FROM INTO
---------+---------+---------+---------+---------+---------+---------+-------


- Observation: One LASTNAME is "O'CONNEL" - which contains an apostrophe.
- Therefore the input parameter needs some pre-processing, the UDF must be enhanced:
  Add the following SET parm1 = ... statement

  (between DECLARE c1 .. and SET stmt_str ..) :

  DECLARE c1 CURSOR FOR s1 ;
  SET parm1 = REPLACE(parm1,'''','''''') ;
  SET stmt_str = ..

- Then the statement string in the UDF evaluates to

  (note the two apostrophes inside O''CONNELL):

WITH temp1 (parm1_as_xml_col) AS
( SELECT XMLQUERY ('<doc>{$p1}</doc>'
              PASSING 'O''CONNELL' AS "p1" )
   FROM SYSIBM.SYSDUMMY1 )
SELECT SMALLINT(COUNT(*))
   FROM temp1
  WHERE XMLEXISTS ('$px1[fn:matches(.,"^(H).*" ) ] '
                   PASSING parm1_as_xml_col AS "px1" )


- and the above query produces the correct result without error:

---------+---------+---------+---------+---------+---------+---------+-------
LASTNAME
---------+---------+---------+---------+---------+---------+---------+-------
HAAS
HENDERSON
HEMMINGER
DSNE610I NUMBER OF ROWS DISPLAYED IS 3
---------+---------+---------+---------+---------+---------+---------+-------

Performance considerations:


1.
- This UDF should perform well enough for small or medium large tables -

  but probably not for large tables.
- One approach: Shrink the original table by first applying some LIKE predicate(s),
  so the UDF islike_regexpr() only operates on a subset of all table rows.

2.
- I tried to use a parameter marker (?) instead of "parm1" in the statement string.
- Together with UDF option CONCENTRATE STATEMENTS WITH LITERALS,

  this should help with the DSC.

But when I replaced parm1 by '?', I received:

-418 A STATEMENT STRING TO BE PREPARED CONTAINS AN INVALID USE OF PARAMETER MARKERS
Explanation [abbreviated]:
The statement cannot be executed because a parameter marker has been used in an invalid way.
Examples of places where parameter markers cannot be used:
- As an argument to an XMLEXISTS predicate.

I'm afraid, the same is true for XMLQUERY, so I stopped further inquiries/trials.

Closing:


If you are more experienced with dynamic SQL, you may find it challenging to find a solution for using parameter markers?! - would be great!

Hopefully the proposed UDF can help you!

Views: 54

Add a Comment

You need to be a member of The World of DB2 to add comments!

Join The World of DB2

Bringing Db2 enthusiasts together virtually. Expert or novice, distributed or mainframe, this is the place for everything DB2.

Forum

Introducing IBM Db2 for z/OS Developer Extension for Microsoft Visual Studio Code

Started by Calene Janacek in Application Development and DB2 Jul 30. 0 Replies

We are excited to announce that the first iteration of IBM Db2 for z/OS Developer Extension is available now as a free downloadable extension in the…Continue

QMF Governor

Started by Maitena Gallastegi Ginea in Application Development and DB2. Last reply by Maitena Gallastegi Ginea Jul 30. 4 Replies

Hi,We are using QMF Governor to limit the QMF queries of users.We have configured correctly and it is working OK. We want to get statistics of those queries canceled by QMF Governor but we are not able to discover where that information is stored.…Continue

© 2020   Created by Surekha Parekh.   Powered by

Badges  |  Report an Issue  |  Terms of Service