Wising up



We've been attempting to tune some datwarehouse operations this week and I came across some behaviour that i wasn't expecting so took to otn to try and get an explanation of what was happening and to confirm what i wanted to do was indeed possible.

First up some background on the job we are running in the warehouse - its actually a very simple statement where we take all the rows from table A and table B that match on a join key and dump the results in a 3rd table C.

The only thing that makes it interesting is that both tables are quite large

Table A = 1 million rows
Table B = 1 billion rows

The end result of the join being 500000000 rows.

Both tables are interval partitioned on the join column on some arbitrary value of chunks of 20000 numbers - so we have quite a large number of partitions per table too.

As the tables are equi partitioned i expected the optimal way to get the result set would be to do a parallel partition wise join.

Just to clarify as it's not a particularly widely used feature a partition wise join (PWJ) uses a single process to read from chunk A in table A and chunk A in table B - that way the process is only ever looking at rows that could be joined to each other and we don't end up with processes randomly reading in data and spewing it to multiple other recipients to then process - making the whole thing more efficient. This is only ever possible with equipartitioned tables.

So now that's explained I'll come on to the problem- which is basically it didn't seem to work for interval partitioning.

As the example was not really practical to let people investigate and reproduce i came up with a simple test case using tables created as copies of DBA_OBJECTS.

The test case is as follows:

-- interval case
  CREATE TABLE "SB_DWH_IN"."TEST1"
  TABLESPACE "SB_DWH_INTEGRATION"
  PARTITION BY RANGE ("OBJECT_ID") INTERVAL (10000)
(PARTITION "LESS_THAN_ZERO"  VALUES LESS THAN (0)  TABLESPACE "SB_DWH_INTEGRATION" )
as select * from DBA_OBJECTS where object_id is not null;

  CREATE TABLE "SB_DWH_IN"."TEST2"
  TABLESPACE "SB_DWH_INTEGRATION"
  PARTITION BY RANGE ("OBJECT_ID") INTERVAL (10000)
(PARTITION "LESS_THAN_ZERO"  VALUES LESS THAN (0)  TABLESPACE "SB_DWH_INTEGRATION" )
as select * from DBA_OBJECTS where object_id is not null;

--hash case
  CREATE TABLE "SB_DWH_IN"."TEST1"
  TABLESPACE "SB_DWH_INTEGRATION"
  PARTITION BY HASH ("OBJECT_ID") partitions 8
store in ( "SB_DWH_INTEGRATION" )
as select * from DBA_OBJECTS where object_id is not null;

  CREATE TABLE "SB_DWH_IN"."TEST2"
  TABLESPACE "SB_DWH_INTEGRATION"
  PARTITION BY HASH ("OBJECT_ID") partitions 8
store in ( "SB_DWH_INTEGRATION" )
as select * from DBA_OBJECTS where object_id is not null;

After these tables are created the SQL used to access them is as follows:


select /*+ PARALLEL(TEST2,8) PARALLEL(TEST1,8) */ *
from "SB_DWH_IN"."TEST2","SB_DWH_IN"."TEST1"
where TEST1.object_id=test2.object_id

The interval partitioned one produced a plan like this

nonPWJ.PNG

And the hash one produced this

pwjenabled.PNG

You'll note the plans are 'similar' - the key difference which shows it as being a PWJ in the hash case is there is only one set of slaves (see the icons on the bottom left in blue) - in the interval case you can see a red set and a blue set being shown and messages being spewed between them.

So what's going on - is interval partitioning not supported for PWJ?

After some interesting back ad forth i decided to try the range case (which is essentially interval anyway to see what that produced.

--range case


  CREATE TABLE "SB_DWH_IN"."TEST5"
  TABLESPACE "SB_DWH_INTEGRATION"
  PARTITION BY RANGE ("OBJECT_ID")
(PARTITION x1 VALUES LESS THAN (50000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x2 VALUES LESS THAN (100000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x3 VALUES LESS THAN (150000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x4 VALUES LESS THAN (200000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x5 VALUES LESS THAN (250000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x6 VALUES LESS THAN (300000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x7 VALUES LESS THAN (350000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x8 VALUES LESS THAN (MAXVALUE)  TABLESPACE "SB_DWH_INTEGRATION" )
as select * from DBA_OBJECTS where object_id is not null;




  CREATE TABLE "SB_DWH_IN"."TEST6"
  TABLESPACE "SB_DWH_INTEGRATION"
  PARTITION BY RANGE ("OBJECT_ID")
(PARTITION x1 VALUES LESS THAN (50000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x2 VALUES LESS THAN (100000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x3 VALUES LESS THAN (150000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x4 VALUES LESS THAN (200000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x5 VALUES LESS THAN (250000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x6 VALUES LESS THAN (300000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x7 VALUES LESS THAN (350000)  TABLESPACE "SB_DWH_INTEGRATION" ,
PARTITION x8 VALUES LESS THAN (MAXVALUE)  TABLESPACE "SB_DWH_INTEGRATION" )
as select * from DBA_OBJECTS where object_id is not null;

--query to run

  select /*+ PARALLEL(TEST5,8) PARALLEL(TEST6,8) */ *
from "SB_DWH_IN"."TEST5","SB_DWH_IN"."TEST6"
where TEST5.object_id=test6.object_id

This produced a PWJ

range.PNG

At this point i was getting very confused....

Then Jonathan announced that it was working fine for him with the same test case and i got even more confused - what was wrong for me?

It then came about that it's the CBO doing it's thing here and deciding in some cases that PWJ is not the best optimization seemingly based on the number of partitions!

In his test case he'd ended up with 16 partitions which fitted nicely in some calculation with running parallel 8, however in his next test he got 15 partitions and no PWJ!

In my range and hash case i also had 8 so that's why it decided to work.

I'm still not clear what te actual nature of the calculation is but i managed to get my statements to run in test with PWJ with the following query

select /*+ PARALLEL(TEST2,41) PARALLEL(TEST1,41)  */ *
from "SB_DWH_IN"."TEST2","SB_DWH_IN"."TEST1"
where TEST1.object_id=test2.object_id

So using 41 ( so tantalizingly close to 42 i know... :-)) it worked fine and it's related to the fact that i had 123 partitions in my test case - into which 41 is divisible. Every other degree of paralleism i tried would not PWJ!

So the lesson here is that the number of partitions can be important in determining if PWJ can be used or not - not something i really expected.

As the intervals keep getting added you could flip between a non PWJ and PWJ plans which wouldn't be great....

There is no real resolution for this i don't think - we plan to change our partitioning to hash and use factors of 2 as the hash number - i.e. 128 or 256 etc.... which should always be able to PWJ as the maths is all 'nice'.

An interesting little puzzle - just glad i had help to resolve it!

Comments