Licence Creative Commons

IBM DB2: how to get a compressed table

29. 04. 2014

I needed to recreate an issue which concerned contention of access to the compression dictionary of an IBM DB2 table. Inevitably, need arose to get myselves a compressed table with minimal effort and with minimal system requirements: inserting millions of rows "to get the compression to occur somehow" is not a "clean" way of doing this.

Therefore, I spent some time studying the DB2 compression etc., and to "whoever may be concerned", I hereby present quick and simple way how to create a minimalistic compressed DB2 table (and comp dictionary).

First, obviously, you must create the table with the "COMPRESS YES" statement. Don't forget to create unique index on it!

Kód:
SET CURRENT SQLID= 'AUTHC72C'; 
COMMIT;
DROP DATABASE DBC72C;
COMMIT;
CREATE DATABASE DBC72C; 
CREATE TABLESPACE TSC72C IN DBC72C COMPRESS YES; 
CREATE TABLE TC72C(PKEY INT NOT NULL PRIMARY KEY GENERATED ALWAYS 
 AS IDENTITY (START WITH 1 INCREMENT BY 1), 
 TEXTDATA VARCHAR(996) 
 WITH DEFAULT 'ARMADILLOISARMEDMEDICALARMSDEALER')
IN DBC72C.TSC72C; 
Create unique index IC72C ON TC72C(PKEY);
COMMIT;

Now, why did I make a table with 1000B long rows? Because of the way the compression algorithm works. DB2 uses the LZ78 algorithm, which creates an adaptive dictionary of the longest string read so far. Which means that if you have data like "ARMED ARMADILLO", dictionary would have a single record for the string "ARM" and substitute it in the ARMadillo onwards.

But... DB2 compression would only activate if the compression would result in more rows fitting inside a single page of data.

Which means that if you have your typical 4kB page, and 4 1000B records... You'll need to compress just 226 bytes (or 22%) of each TEXTDATA column to trigger toe compression. With the LZ78 algorithm, that is completely feasible, ergo: way to go!

Not so fast. There is one more condition for the compression to activate: the table size must be larger than "lower threshold." Fortunately for you, I did the resarch, so I could assure you that this threshold is usually no greater than 2MB.

Now, you could happily proceed forward and flood your table with armadillos or any other animals of choice. Using the DB2's capability of inner selects helps a lot:

Kód:
INSERT INTO AUTHC72C.TC72C(TEXTDATA) VALUES(
'ARMADILLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDRATARMADILLOISNOTAN'||
'ARMOREDRATARMADILLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDRATARMADI'||
'LLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDRATARMADILLOISNOTANARMORE'||
'DRATARMADILLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDRATARMADILLOISN'||
'OTANARMOREDRATARMADILLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDRATAR'||
'MADILLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDRATARMADILLOISNOTANAR'||
'MOREDRATARMADILLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDRATARMADILL'||
'OISNOTANARMOREDRATARMADILLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDR'||
'ATARMADILLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDRATARMADILLOISNOT'||
'ANARMOREDRATARMADILLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDRATARMA'||
'DILLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDRATARMADILLOISNOTANARMO'||
'REDRATARMADILLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDRATARMADILLOI'||
'SNOTANARMOREDRATARMADILLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDRAT'||
'ARMADILLOISNOTANARMOREDRATARMADILLOISNOTANARMOREDRATARMADILLOISNOTAN'||
'ARMOREDRATARMADILLOISNOTANARMOREDRAT');
COMMIT;
INSERT INTO AUTHC72C.TC72C(TEXTDATA)
VALUES(SELECT TEXTDATA FROM AUTHC72C.TC72C WHERE PKEY=1); 
COMMIT; 

And repeat the latter insert cca. 2050x.

Oh, one more thing: compression is not turned on if all the rows in your table are identical. So just throw some more random garbage in -- less than 10 rows should be enough.

The compression is not going to turn on just like that, however: you must run either LOAD or REORG utility. And since LOAD (without preceeding UNLOAd, that is) would deprive you of your collection of armed armadillos, run REORG.

Now, based on what we know about DB2, you should have gotten yourselves a nicely compressed table. However, testing isn't about guessing. How do you know for sure?

I'll tell you just in the beginning so that you wouldn't lose your time: forget about DSN0COMP. That is only useful as an "advisory utility", but it would be of no help to you as to whether your table has been compressed or not.

What you really need is the DSN1PRNT utility. Provide it with your table's VSAM's DSN (don't worry: usual path to this dataset is "<your subsystem name>.*.<your DB name>"), don't forget to give it PARM='FORMAT' and off you go. Since DSN1PARM is an IBM utility, it should be finished before you manage exit the JCL editor. Now, open the SYSPRINT of your job and try to FIND the string PGSFLAG (not PGFLAGS -- these are useless to you). Did you find any? If your PGSFLAGS'es values equals '04'X (or, possibly, 08hex, or maybe others -- it's IBM, right?), then congratulations: you indeed do have a compressed table.

Also, when you unload and then reload compressed table into another one, the IBM LOAD utility is nice enough to give you a human readable report on how much space you have just saved using the compression (as you can see, my armed armadillos are quite compression-friendly, as intended):

Kód:
141 KB WITHOUT COMPRESSION
20 KB WITH COMPRESSION
85 PERCENT OF THE BYTES SAVED FROM COMPRESSED DATA ROWS

100 PERCENT OF THE LOADED ROWS WERE COMPRESSED

70 BYTES FOR AVERAGE UNCOMPRESSED ROW LENGTH
12 BYTES FOR AVERAGE COMPRESSED ROW LENGTH

39 PAGES REQUIRED WITHOUT COMPRESSION
9 PAGES REQUIRED WITH COMPRESSION
76 PERCENT OF THE DB2 DATA PAGES SAVED USING COMPRESSED DATA

To sum it up: if you want compressed rows in a IBM DB2 table, following conditions must be met:

  1. the table must have been created or altered to have the COMPRESS YES clause
  2. the rows in a page would not be compressed unless their compression using LZ78 would enable at least one more row to fit onto that page
  3. rows in the table are not all the same
  4. the table contains the minimum of 2MB of data

Good luck and happy hunting.




Vzkaz autoru

Na článek můžete reagovat použitím následujícího formuláře.
Váš komentář:

Váš e-mail (očekáváte-li reakci z mé strany):

Níže prosím opište dvě zdeformovaná slova z obrázku do rámečku pod nimi.
Jedná se o ochranu proti zasílání spamu, protože počítačoví roboti ona slova nedokáží rozluštit. Pokud je nedokážete rozluštit ani vy, kliknutím na ikonku reproduktoru se vám přehrají.