Common DB2 for z/OS RUNSTATS collection mistakes

The DB2 optimizer exploits the catalog statistics to choose an access path for a given SQL statement, and RUNSTATS is what populates the catalog. It is therefore important to collect a good base of statistics to give the optimizer a fighting chance of choosing well-performing access paths for your SQL.

Here we will discuss some common mistakes made when running RUNSTATS.

Mistake #1 – Collecting statistics on indexes and tablespaces at different times

It is common to collect inline statistics as part of a REORG. When a tablespace is REORG’d, all indexes are also rebuilt, and statistics collection therefore establishes a consistent view of the tablespace and its associated indexes. However, index REORGs may also be scheduled separately to rebuild a disorganized index. For index REORGs, collecting inline statistics can result in inconsistencies if data volumes have changed since the last RUNSTATS on the tablespace or on the other indexes for that table. Such inconsistencies can impact the access path choice, since the optimizer does not attempt to resolve statistics conflicts between related objects.

It is therefore recommended to execute RUNSTATS (or inline statistics) on the tablespace and its associated indexes at the same time, and not to collect on an individual index alone during a REORG.
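As a minimal sketch, a consistent collection might look like the following (DBNAME.TSNAME is a placeholder for your own tablespace):

RUNSTATS TABLESPACE DBNAME.TSNAME
  TABLE(ALL)
  INDEX(ALL)

Because the tablespace, table and index statistics are gathered in one execution, the catalog reflects a single point in time for all related objects.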

Mistake #2 – Misunderstanding NUMCOLS specification for index RUNSTATS

I often see the following usage of NUMCOLS as part of RUNSTATS on an index:

RUNSTATS TABLE(ALL)
  INDEX(ALL) FREQVAL NUMCOLS 5 COUNT 10

This will collect the top 10 most frequently occurring values for the combination of the first 5 columns of an index. The mistake, however, is the assumption that RUNSTATS will cascade the collection to all leading-column combinations – frequencies on the first 4 columns, the first 3 columns, the first 2 columns, and finally the first column. This is incorrect, as you only get what you ask for with this specification. You ask for 5 columns – you get 5 columns.

The solution is to additionally specify FREQVAL clauses for NUMCOLS 4, NUMCOLS 3, NUMCOLS 2 and NUMCOLS 1 (the full specification is shown under the next mistake). However, this takes us to that next mistake – such statistics are rarely used.

Mistake #3 – Excessive NUMCOLS collection as a default

Those who understand the syntax for multi-column frequencies often use the following default:

RUNSTATS TABLE(ALL) INDEX(ALL) KEYCARD
  FREQVAL NUMCOLS 1 COUNT 10
  FREQVAL NUMCOLS 2 COUNT 10
  FREQVAL NUMCOLS 3 COUNT 10
  FREQVAL NUMCOLS 4 COUNT 10
  FREQVAL NUMCOLS 5 COUNT 10

This is itself a mistake – simply because multi-column frequency statistics are rarely useful, and default collection for all indexes is therefore excessive. While these are efficient for RUNSTATS to collect, they add unnecessary rows to the catalog (SYSIBM.SYSCOLDIST) and increase bind/prepare time, since the optimizer must read and process them.

For the optimizer to take advantage of multi-column frequencies, the query needs equals predicates with literals on all columns in that group. So for an index on COL1, COL2, COL3, COL4, COL5, you need 5 equals predicates, all with literals (no host variables or parameter markers). This is not common.
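As an illustration (the table name is hypothetical), only a query of the following shape could exploit the 5-column frequencies collected above:

SELECT *
  FROM MYTABLE            -- hypothetical table with the 5-column index
 WHERE COL1 = 'A'         -- equals predicate with a literal
   AND COL2 = 'B'
   AND COL3 = 10
   AND COL4 = 'Y'
   AND COL5 = 2015;

Replace any one of those literals with a host variable or parameter marker and the multi-column frequency for that group can no longer be matched at bind/prepare time (REOPT options that allow the optimizer to see the runtime values are the exception, as discussed in the comments below).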

There will be some situations where multi-column frequencies are beneficial – but it is rarely worth collecting them by default. Instead, collect them when an individual query is identified that will benefit from them. If you choose to stop collecting multi-column frequencies, then read on to the next mistake.

Mistake #4 – Failing to clean up or recollect “one-off” statistics

When additional frequency, histogram or other colgroup statistics are collected, it is common to cease collection if there was no observed benefit. The question then is – did you remove them? Unfortunately the answer is often NO.

When RUNSTATS executes, it only replaces existing statistics with new values – it does not clear out other statistics previously collected on that tablespace or index. For example, if you collect NUMCOLS 2 COUNT 10, but the next RUNSTATS execution does NOT specify this clause, then the statistics from the prior collection remain. Those leftover statistics become stale, which can lead to access path issues.

The simple way to determine whether there are older (potentially stale) statistics is to query SYSIBM.SYSCOLDIST looking for distinctly different STATSTIMEs for the same table. One exception – since frequency statistics are stored as a percentage of the data, these only become stale if the stored percentages are no longer representative. It is feasible to collect single-column frequencies once and not recollect them regularly – so be careful cleaning up single-column frequency entries (TYPE=’F’ and NUMCOLUMNS=1).
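For example, a quick check of one table (the schema and table names are placeholders) could be:

SELECT TBOWNER, TBNAME, NAME, TYPE, NUMCOLUMNS, STATSTIME
  FROM SYSIBM.SYSCOLDIST
 WHERE TBOWNER = 'MYSCHEMA'   -- placeholder schema
   AND TBNAME  = 'MYTABLE'    -- placeholder table
 ORDER BY STATSTIME DESC;

Rows whose STATSTIME lags well behind the rest of the table’s statistics are cleanup candidates – subject to the single-column frequency caution above.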

DB2 11 delivers a new RUNSTATS option – RESET ACCESSPATH – to reset the access path statistics on the named objects to -1. This is an effective way to remove old statistics, but ensure you follow up by recollecting statistics with your standard RUNSTATS options. Alternatively, frequency statistics can be removed by specifying COUNT 0 – for example, if you previously collected NUMCOLS 5 COUNT 10, you can remove those specific statistics by using NUMCOLS 5 COUNT 0. This approach does not apply to histogram or other colgroup statistics. Another alternative is to simply issue a DELETE statement to remove the unnecessary rows from SYSIBM.SYSCOLDIST.
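As a sketch of these three approaches (the object names are placeholders, and deleting catalog rows requires the appropriate authority):

Reset the access path statistics, then recollect (DB2 11):

RUNSTATS TABLESPACE DBNAME.TSNAME RESET ACCESSPATH

Remove previously collected 5-column frequencies with COUNT 0:

RUNSTATS TABLE(ALL)
  INDEX(ALL) FREQVAL NUMCOLS 5 COUNT 0

Or delete the unwanted frequency rows directly:

DELETE FROM SYSIBM.SYSCOLDIST
 WHERE TBOWNER = 'MYSCHEMA'
   AND TBNAME  = 'MYTABLE'
   AND TYPE    = 'F'
   AND NUMCOLUMNS = 5;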

Tooling such as IBM Optim Query Workload Tuner can help identify stale or conflicting statistics, and can also identify additional statistics that would benefit your workload. DB2 11 for z/OS also helps identify statistics that are missing based upon your SQL, as well as statistics conflicts that exist. As mentioned in mistakes #1 and #4, conflicts often arise because statistics were collected at different times – which points to an error in your procedures – or because statistics were collected once and never recollected or deleted. But you shouldn’t wait until they introduce a statistics conflict before you address these.

Comments

  • Today there is no way to persist the dynamic statement cache. Allowing dynamic SQL to stabilize (storing its structures in the catalog, similar to static SQL) is something that is planned for a future release of DB2. But it is too early to confirm when that will be delivered. My hope is that within the next few months we can provide clarity.

  • Thanks so much!!! One more question, of course... Can the dynamic statement cache be made persistent across recycling a subsystem?

    We do weekly reorgs across our SAP subsystems on Monday thru Thursday evenings, but recycle the subsystems late Saturday night. That loses the cache, and everything has to rebind using current stats. Tables that reorg every week on Tuesday or Wednesday likely don't have the best Real Time Statistics at that point. Normally, they rebind on the first execution after the reorg, which gives good stats for the bind. We've had issues with bad first binds in the past. It seems like the problems always happen from completely losing the cache.

  • Bill, SAP uses REOPT(ONCE) for dynamic SQL in DB2 for z/OS, and thus the optimizer sees the literal values at first execution for choosing the access path. Therefore, what you observed is very possible for an SAP environment.

  • I recently had an SAP join of two large tables that was performing badly and getting a "bad" access path.  Everything else looked good, so I looked into SYSCOLDIST.  Sure enough, we had old rows for both tables.  I deleted the oldest rows first, one table at a time. The SQL gradually improved.  With all the old rows gone, it now has a "good" access path and is running 4+ times faster (7-8 minutes elapsed, instead of 30-45 minutes).

    However... This was dynamic SQL and used all parameter markers. Based on your article, it should not have improved. I'm wondering about the join predicates... Could DB2 have been comparing the SYSCOLDIST stats between the two tables being joined, deciding they did not match well, and making a bad decision because of that, even though no literals were present?

  • This is very good and useful information, and it would be even more useful if information on which generic parameters I should use for a DB2 subsystem with 394 production databases and 8173 tables were added :-)

  • This is useful info and very easy to catch. I will have to change one of our automated processes for reorgs.

  • This is very well written and useful info. Thank you.

  • This is pretty cool Terry – 105 views in less than 10 minutes – people must find this information valuable.
