How histogram can affect query plan change hence performance.

1.Let me create a test case.I am loading the table with following distribution.Obviously it means data is somewhat skewed.

Value:-1
Range:-10

Value:-2
Range:-100

Value:-3
Range:-1000

Value:-4
Range:-10000

Value:-5
Range:-100000

C##TEST@TESTDB1> create table tt1(c1 number,c2 char(10));

Table created.

C##TEST@TESTDB1> declare
begin
for j in 1..10 loop
2 3 4 insert into tt1 values(1,’A’);
5 end loop;
6 for j in 1..100 loop
7 insert into tt1 values(2,’A’);
8 end loop;
9 for j in 1..1000 loop
10 insert into tt1 values(3,’A’);
11 end loop;
12 for j in 1..10000 loop
13 insert into tt1 values(4,’A’);
end loop;
14 15 for j in 1..100000 loop
insert into tt1 values(5,’A’);
16 17 end loop;
18 commit;
19 end;
20 /

PL/SQL procedure successfully completed.

2.Gather statistics.

C##TEST@TESTDB1> EXEC DBMS_STATS.gather_table_stats(‘C##TEST’, ‘TT1’);

PL/SQL procedure successfully completed.

3.The query shows the statistics collection did not gather histogram.

C##TEST@TESTDB1> COLUMN column_name FORMAT A20

SELECT column_id,
column_name,
histogram
FROM user_tab_columns
WHERE table_name = ‘TT1’
ORDER BY column_id;

COLUMN_ID COLUMN_NAME HISTOGRAM
———- ——————– —————
1 C1 NONE
2 C2 NONE

4.Let me create index on C1 column.

C##TEST@TESTDB1> create index I1_TT1 on TT1(c1);

Index created.

5.Checking plan with c1=1.It must select 10 rows but query estimator will estimate the query would resturn 22222 rows (Total_rows/5)

C##TEST@TESTDB1> set serveroutput off;

C##TEST@TESTDB1> select /*+ gather_plan_statistics */ count(1) from tt1 where c1=1;

COUNT(1)
———-
10

C##TEST@TESTDB1> SELECT *
FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>’ALLSTATS LAST’)); 2

PLAN_TABLE_OUTPUT
————————————————————————————————————————————————————————————————————————————————————————————————————
SQL_ID cvywn9uwakwgx, child number 0
————————————-
select /*+ gather_plan_statistics */ count(1) from tt1 where c1=1

Plan hash value: 1950687611

————————————————————————————–
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
————————————————————————————–
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 2 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 2 |
|* 2 | INDEX RANGE SCAN| I1_TT1 | 1 | 22222 | 10 |00:00:00.01 | 2 |
————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

2 – access(“C1″=1)

19 rows selected.

6.5.Checking plan with c1=5.It must select 10 rows but query estimator will estimate the query would resturn 22222 rows (Total_rows/5)

C##TEST@TESTDB1> select /*+ gather_plan_statistics */ count(1) from tt1 where c1=5;

COUNT(1)
———-
100000

C##TEST@TESTDB1> SELECT *
FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>’ALLSTATS LAST’)); 2

PLAN_TABLE_OUTPUT
————————————————————————————————————————————————————————————————————————————————————————————————————
SQL_ID af3u4z2rmuq3s, child number 0
————————————-
select /*+ gather_plan_statistics */ count(1) from tt1 where c1=5

Plan hash value: 1950687611

———————————————————————————————–
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
———————————————————————————————–
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.15 | 198 | 197 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.15 | 198 | 197 |
|* 2 | INDEX RANGE SCAN| I1_TT1 | 1 | 22222 | 100K|00:00:00.01 | 198 | 197 |
———————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

2 – access(“C1″=5)

19 rows selected.

7.Now let me collect frequency based histogram using below command.

C##TEST@TESTDB1> EXEC DBMS_STATS.gather_table_stats(ownname=>’C##TEST’,tabname=>’TT1′,method_opt=>’FOR COLUMNS C1′);

PL/SQL procedure successfully completed.

8.Check bucket distribution using following query.

C##TEST@TESTDB1> select
table_name
, column_name,
histogram,num_distinct,num_buckets from user_tab_col_statistics
Where table_name = ‘TT1’ and column_name = ‘C1’; 2 3 4 5

TABLE_NAME COLUMN_NAME HISTOGRAM NUM_DISTINCT NUM_BUCKETS
—————————— ——————– ————— ———— ———–
TT1 C1 FREQUENCY 5 5

C##TEST@TESTDB1> SELECT endpoint_value,
endpoint_number,
endpoint_number – LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_value) AS frequency
FROM user_tab_histograms
WHERE table_name = ‘TT1’
AND column_name = ‘C1’
ORDER BY endpoint_value; 2 3 4 5 6 7

ENDPOINT_VALUE ENDPOINT_NUMBER FREQUENCY
————– ————— ———-
1 10 10
2 110 100
3 1110 1000
4 11110 10000
5 111110 100000

8.Checking plan with c1=1 again (Please re-parse the statement ).

It must select 10 rows and query estimator will also estimate the query would resturn 10 rows (due to histogram)

C##TEST@TESTDB1> select /*+ gather_plan_statistics */ count(1) from tt1 where c1= 1;

COUNT(1)
———-
10

C##TEST@TESTDB1> SELECT *
FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>’ALLSTATS LAST’)) 2
3 ;

PLAN_TABLE_OUTPUT
——————————————————————————–
SQL_ID 61sr5p68bawwz, child number 0
————————————-
select /*+ gather_plan_statistics */ count(1) from tt1 where c1= 1

Plan hash value: 1950687611

——————————————————————————–
——

| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buf
fers |

——————————————————————————–
——

| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 |
2 |

| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 |
2 |

|* 2 | INDEX RANGE SCAN| I1_TT1 | 1 | 10 | 10 |00:00:00.01 |
2 |

——————————————————————————–
——

Predicate Information (identified by operation id):
—————————————————

2 – access(“C1″=1)

9.Checking plan with c1=5 again (Please re-parse the statement ).

It must select 100K rows and query estimator will also estimate the query would resturn 100K rows (due to histogram).
Also note It now goes for INDEX FAST FULL SCAN rather than INDEX RANGE SCAN.

C##TEST@TESTDB1> select /*+ gather_plan_statistics */ count(1) from tt1 where c1= 5;

COUNT(1)
———-
100000

C##TEST@TESTDB1> SELECT *
FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>’ALLSTATS LAST’)) 2 ;

PLAN_TABLE_OUTPUT
————————————————————————————————————————————————————————————————————————————————————————————————————
SQL_ID 14pmbpytcgv7y, child number 0
————————————-
select /*+ gather_plan_statistics */ count(1) from tt1 where c1= 5

Plan hash value: 3548606959

——————————————————————————————
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
——————————————————————————————
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 226 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 226 |
|* 2 | INDEX FAST FULL SCAN| I1_TT1 | 1 | 100K| 100K|00:00:00.01 | 226 |
——————————————————————————————

Predicate Information (identified by operation id):
—————————————————

2 – filter(“C1″=5)

Leave a Reply

Your email address will not be published. Required fields are marked *