Oracle 12c Gather Statistics Deep Dive

Some of this material references my favourite blog, Tim Hall's ORACLE-BASE.

A. Basics of Optimizer Statistics

The optimizer cost model relies on statistics collected about the objects involved in a query, and the database and host where the query runs. Statistics are critical to the optimizer’s ability to pick the best execution plan for a SQL statement.

For each object type, the following lists what influences its statistics and the views where those statistics can be seen.

Table statistics
  1. Number of rows
  2. Number of blocks
  3. Average row length
  Views: DBA_TABLES, DBA_TAB_STATISTICS

Column statistics
  1. Number of distinct values
  2. Number of nulls
  3. Data distribution (histograms)
  4. Extended statistics
  Views: DBA_TAB_COLS, DBA_TAB_COL_STATISTICS, USER_PART_COL_STATISTICS, USER_SUBPART_COL_STATISTICS

Index statistics
  1. Number of distinct values
  2. Number of leaf blocks
  3. Clustering factor
  Views: DBA_INDEXES, DBA_IND_STATISTICS

Histograms
  Data distribution in columns
  View: DBA_TAB_HISTOGRAMS

Extended statistics
  Relationships between data stored in different columns of the same table
  View: DBA_TAB_COL_STATISTICS

System statistics
  I/O performance and utilization
  CPU performance and utilization
  View: SYS.AUX_STATS$
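As a quick illustration, the views above can be queried directly. A sketch follows, using a hypothetical analyzed table SCOTT.EMP (any table with statistics gathered will do):

```sql
-- Table-level statistics: rows, blocks, average row length.
SELECT num_rows, blocks, avg_row_len
FROM   dba_tab_statistics
WHERE  owner = 'SCOTT' AND table_name = 'EMP';

-- Column-level statistics: distinct values, nulls, histogram type.
SELECT column_name, num_distinct, num_nulls, histogram
FROM   dba_tab_col_statistics
WHERE  owner = 'SCOTT' AND table_name = 'EMP';

-- Index-level statistics: distinct keys, leaf blocks, clustering factor.
SELECT index_name, distinct_keys, leaf_blocks, clustering_factor
FROM   dba_ind_statistics
WHERE  owner = 'SCOTT' AND table_name = 'EMP';
```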

 

B. Configuring Automatic Optimizer Statistics Collection Using Cloud Control

Go to "Administration" -> "Oracle Scheduler" -> "Automated Maintenance Tasks".

 

You can see the three default automatic maintenance jobs. Choose "Optimizer Statistics Gathering".

Now click on "Optimizer Statistics Gathering".

You can click "Global Statistics Gathering Options" to view the configured parameters.

 

Use the "Configure" button to enable or disable the task or to change its maintenance window.

You can also check the status of automatic optimizer statistics collection from SQL*Plus:
SQL> COL CLIENT_NAME FORMAT a31
SQL> SELECT client_name, status
  2  FROM   dba_autotask_client
  3  WHERE  client_name = 'auto optimizer stats collection';

CLIENT_NAME                     STATUS
------------------------------- --------
auto optimizer stats collection ENABLED
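The same task can be disabled and re-enabled from SQL*Plus as well. A minimal sketch using the DBMS_AUTO_TASK_ADMIN package (passing NULL for operation and window applies the change to all windows):

```sql
-- Disable the automatic optimizer statistics collection task.
BEGIN
  DBMS_AUTO_TASK_ADMIN.DISABLE(
    client_name => 'auto optimizer stats collection',
    operation   => NULL,
    window_name => NULL);
END;
/

-- Re-enable it.
BEGIN
  DBMS_AUTO_TASK_ADMIN.ENABLE(
    client_name => 'auto optimizer stats collection',
    operation   => NULL,
    window_name => NULL);
END;
/
```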

C. Standard Parameters for Manual Statistics Gathering

Each preference below is listed with its description, its default value (11gR2), its scope (G = Global, D = Database, S = Schema, T = Table) and the version in which it was introduced.
CASCADE
Determines if index stats should be gathered for the current table (TRUE, FALSE, AUTO_CASCADE).
DBMS_STATS.AUTO_CASCADE
G, D, S, T
10gR1+
DEGREE
Degree of parallelism (integer or DEFAULT_DEGREE).
DBMS_STATS.DEFAULT_DEGREE
G, D, S, T
10gR1+
ESTIMATE_PERCENT
Percentage of rows to sample when gathering stats (0.000001-100 or AUTO_SAMPLE_SIZE).
DBMS_STATS.AUTO_SAMPLE_SIZE
G, D, S, T
10gR1+
METHOD_OPT
Controls column statistics collection and histogram creation.
FOR ALL COLUMNS SIZE AUTO
G, D, S, T
10gR1+
NO_INVALIDATE
Determines if dependent cursors should be invalidated as a result of new stats on objects (TRUE, FALSE or AUTO_INVALIDATE).
DBMS_STATS.AUTO_INVALIDATE
G, D, S, T
10gR1+
AUTOSTATS_TARGET
Determines which objects have stats gathered (ALL, ORACLE, AUTO).
AUTO
G
10gR2+
GRANULARITY
The granularity of stats to be collected on partitioned objects (ALL, AUTO, DEFAULT, GLOBAL, 'GLOBAL AND PARTITION', PARTITION, SUBPARTITION).
AUTO
G, D, S, T
10gR2+
PUBLISH
Determines if gathered stats should be published immediately or left in a pending state (TRUE, FALSE).
TRUE
G, D, S, T
11gR2+
INCREMENTAL
Determines whether incremental stats will be used for global statistics on partitioned objects, rather than generated using table scans (TRUE, FALSE).
FALSE
G, D, S, T
11gR2+
CONCURRENT
Should objects statistics be gathered on multiple objects at once, or one at a time (MANUAL, AUTOMATIC, ALL, OFF).
OFF
G
12cR1+
GLOBAL_TEMP_TABLE_STATS
Should stats on global temporary tables be session-specific or shared between sessions (SHARED, SESSION).
SESSION
G, D, S
12cR1+
INCREMENTAL_LEVEL
Which level of synopses should be collected for incremental partitioned statistics (TABLE, PARTITION).
PARTITION
G, D, S, T
12cR1+
INCREMENTAL_STALENESS
How is staleness of partition statistics determined (USE_STALE_PERCENT, USE_LOCKED_STATS, NULL).
NULL
G, D, S, T
12cR1+
TABLE_CACHED_BLOCKS
The number of blocks cached in the buffer cache during calculation of the index clustering factor. Jonathan Lewis recommends 16 as a sensible value.
1
G, D, S, T
12cR1+
OPTIONS
Used for the OPTIONS parameter of the GATHER_TABLE_STATS procedure (GATHER, GATHER AUTO).
GATHER
G, D, S, T
12cR1+

D. Standard Statistics Gathering Examples

Gather statistics:-

EXEC DBMS_STATS.gather_database_stats;
EXEC DBMS_STATS.gather_database_stats(estimate_percent => 15);
EXEC DBMS_STATS.gather_database_stats(estimate_percent => 15, cascade => TRUE);
EXEC DBMS_STATS.gather_schema_stats('SCOTT');
EXEC DBMS_STATS.gather_schema_stats('SCOTT', estimate_percent => 15);
EXEC DBMS_STATS.gather_schema_stats('SCOTT', estimate_percent => 15, cascade => TRUE);
EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES');
EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES', estimate_percent => 15);
EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES', estimate_percent => 15, cascade => TRUE);
EXEC DBMS_STATS.gather_table_stats(ownname => 'C##TEST', tabname => 'TT1', method_opt => 'FOR COLUMNS C1');

Delete statistics:-

EXEC DBMS_STATS.delete_database_stats;
EXEC DBMS_STATS.delete_schema_stats('SCOTT');
EXEC DBMS_STATS.delete_table_stats('SCOTT', 'EMP');
EXEC DBMS_STATS.delete_column_stats('SCOTT', 'EMP', 'EMPNO');
EXEC DBMS_STATS.delete_index_stats('SCOTT', 'EMP_PK');

E. Determining When Optimizer Statistics Are Stale

A. First create a demo table and load some data.

SQL> set linesize 300;

SQL> create table tab1(c1 number(10),c2 char(100));

Table created.

SQL> insert into tab1 select rownum,'A' from dual connect by rownum<10000;

9999 rows created.

SQL> commit;

Commit complete.

B. At this point statistics have not been populated, so the query returns no rows.

SQL> select NUM_ROWS,BLOCKS,STALE_STATS from dba_tab_statistics where table_name='TAB1';

NUM_ROWS     BLOCKS STA
---------- ---------- ---

C. Let me gather statistics now.

 

SQL> exec dbms_stats.gather_table_stats('TEST','TAB1');

PL/SQL procedure successfully completed.

 

SQL> select NUM_ROWS,BLOCKS,STALE_STATS from dba_tab_statistics where table_name='TAB1';

NUM_ROWS     BLOCKS STA
---------- ---------- ---
      9999        244 NO

D. Now load more data again.

SQL> insert into tab1 select rownum,'A' from dual connect by rownum<10000;

9999 rows created.

SQL> commit;

Commit complete.

E. DBA_TAB_STATISTICS still shows STALE_STATS = NO.

SQL> select NUM_ROWS,BLOCKS,STALE_STATS from dba_tab_statistics where table_name='TAB1';

NUM_ROWS     BLOCKS STA
---------- ---------- ---
      9999        244 NO

 

F. Now flush the database monitoring information from memory to disk.

SQL> BEGIN
  2    DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
  3  END;
  4  /

PL/SQL procedure successfully completed.

 

G. Now the statistics show as stale.

SQL> select NUM_ROWS,BLOCKS,STALE_STATS from dba_tab_statistics where table_name='TAB1';

NUM_ROWS     BLOCKS STA
---------- ---------- ---
      9999        244 YES

F. Gathering Statistics for Fixed Objects

Fixed objects are the dynamic performance tables (X$ tables) and their indexes. These objects record current database activity. Unlike other database tables, the database does not automatically use dynamic statistics for SQL statements referencing X$ tables when optimizer statistics are missing. Instead, the optimizer uses predefined default values. These defaults may not be representative and could potentially lead to a suboptimal execution plan, so it is important to keep fixed object statistics current.
To gather fixed object statistics using GATHER_FIXED_OBJECTS_STATS:
  1. Start SQL*Plus, and connect to the database with the appropriate privileges for the procedure that you intend to run.
  2. Run the DBMS_STATS.GATHER_FIXED_OBJECTS_STATS procedure, specifying the desired parameters.
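A minimal run, plus a sanity check afterwards, might look like this (best executed while the database is under representative load, so the X$ contents are typical):

```sql
-- Gather statistics for the X$ fixed objects.
BEGIN
  DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
END;
/

-- Confirm some fixed tables now have statistics.
SELECT owner, table_name, last_analyzed
FROM   dba_tab_statistics
WHERE  object_type = 'FIXED TABLE'
AND    last_analyzed IS NOT NULL
FETCH FIRST 10 ROWS ONLY;
```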

G. Gathering Statistics for Volatile Tables Using Dynamic Statistics

A. Suppose table TAB2 is extremely volatile, so we want statistics collected dynamically at parse time rather than having the optimizer rely on stale statistics.

First let me create the demo table.

SQL> create table tab2(c1 number(10),c2 char(100));

Table created.

Let me gather statistics now.

SQL> exec dbms_stats.gather_table_stats('TEST','TAB2');

PL/SQL procedure successfully completed.

 

SQL> select NUM_ROWS,BLOCKS,STALE_STATS from dba_tab_statistics where table_name='TAB2';

NUM_ROWS     BLOCKS STA
---------- ---------- ---
         0          0 NO

Now I need to load data again

SQL> insert into tab2 select rownum,'A' from dual connect by rownum<10000;

9999 rows created.

SQL> commit;

Commit complete.

Now check that the execution plan does not show the proper cardinality (1 estimated row versus 9999 actual rows):

SQL> set autotrace traceonly;

SQL> select * from tab2;

9999 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2156729920

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   115 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TAB2 |     1 |   115 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Now we delete the statistics and lock them. Locking prevents statistics from being gathered on the table again, so the optimizer will fall back to dynamic sampling.

sqlplus test/test

SQL> exec dbms_stats.delete_table_stats('TEST','TAB2');

PL/SQL procedure successfully completed.

 

SQL> exec dbms_stats.lock_table_stats('TEST','TAB2');

PL/SQL procedure successfully completed.

Now, as there are no statistics, the optimizer uses dynamic sampling.

Note the following about dynamic statistics:

  • If dynamic statistics are not disabled, then the database may choose to use dynamic statistics when a SQL statement uses parallel execution.

  • If the OPTIMIZER_ADAPTIVE_FEATURES initialization parameter is true, then the optimizer uses dynamic statistics when relevant SQL plan directives exist. The database maintains the resulting statistics in the server result cache, making them available to other queries.

SQL> set autotrace traceonly;

SQL>  select * from tab2;

9999 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2156729920

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  8435 |   947K|    68   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TAB2 |  8435 |   947K|    68   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

H. Enabling Concurrent Statistics Gathering

Oracle Database 11g Release 2 (11.2.0.2) introduces a new statistics gathering mode, 'concurrent statistics gathering'. The goal of this new mode is to enable a user to gather statistics on multiple tables in a schema (or database), and multiple (sub)partitions within a table, concurrently. Gathering statistics on multiple tables and (sub)partitions concurrently can reduce the overall time it takes to gather statistics by allowing Oracle to fully utilize a multi-processor environment.

To enable concurrent statistics gathering, use the DBMS_STATS.SET_GLOBAL_PREFS procedure to set the CONCURRENT preference. Possible values are as follows:
  • MANUAL - Concurrency is enabled only for manual statistics gathering.
  • AUTOMATIC - Concurrency is enabled only for automatic statistics gathering.
  • ALL - Concurrency is enabled for both manual and automatic statistics gathering.
  • OFF - Concurrency is disabled for both manual and automatic statistics gathering. This is the default value.
This section explains how to enable concurrent statistics gathering.

Create a partitioned table.

SQL> create table t1 (id number, sometext varchar2(50),my_date date) tablespace data partition by hash (id) partitions 16;

Please note that concurrency is OFF.

SQL> SELECT DBMS_STATS.GET_PREFS('CONCURRENT') FROM DUAL;

DBMS_STATS.GET_PREFS('CONCURRENT')
--------------------------------------------------------------------------------
OFF

SQL> ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'DEFAULT_PLAN';

System altered.

Check that there are sufficient job queue processes:

SQL> show parameter job;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes                  integer     1000

Now change concurrency to 'ALL':

SQL> BEGIN
  2    DBMS_STATS.SET_GLOBAL_PREFS('CONCURRENT','ALL');
  3  END;
  4  /

PL/SQL procedure successfully completed.

SQL> SELECT DBMS_STATS.GET_PREFS('CONCURRENT') FROM DUAL;

DBMS_STATS.GET_PREFS('CONCURRENT')
--------------------------------------------------------------------------------
ALL

Create procedures to insert data in parallel (30 job sessions will insert data concurrently):

SQL> create or replace procedure manyinserts as
  v_m number;
begin
  for i in 1..1000000 loop
    select round(dbms_random.value() * 44444444444) + 1 into v_m from dual;
    insert /*+ new2 */ into t1 values (v_m, 'DOES THIS'||dbms_random.value(), sysdate);
  end loop;
  commit;
end;
/

Procedure created.

 

SQL> create or replace procedure manysessions as
  v_jobno number := 0;
begin
  for i in 1..30 loop
    dbms_job.submit(v_jobno, 'manyinserts;', sysdate);
  end loop;
  commit;
end;
/

Procedure created.

SQL> exec manysessions;

PL/SQL procedure successfully completed.

Now gather the schema statistics in another session:

SQL> EXEC DBMS_STATS.GATHER_SCHEMA_STATS('TEST');

PL/SQL procedure successfully completed.

While the gather is running, check its status:

SQL> SET LINESIZE 1000
SQL> COLUMN TARGET FORMAT a8
SQL> COLUMN TARGET_TYPE FORMAT a25
SQL> COLUMN JOB_NAME FORMAT a14
SQL> COLUMN START_TIME FORMAT a40
SQL> SELECT target, target_type, job_name,
  2         TO_CHAR(start_time, 'dd-mon-yyyy hh24:mi:ss')
  3  FROM   dba_optstat_operation_tasks
  4  WHERE  status = 'IN PROGRESS'
  5  AND    opid = (SELECT MAX(id)
  6                 FROM   dba_optstat_operations
  7                 WHERE  operation = 'gather_schema_stats');

TARGET   TARGET_TYPE               JOB_NAME       TO_CHAR(START_TIME,'DD-MON-YY
-------- ------------------------- -------------- ------------------------------
TEST.T1  TABLE                                    20-feb-2018 02:22:41
TEST.T1. TABLE PARTITION                          20-feb-2018 02:22:41
SYS_P786

I. Incremental Statistics

In a typical data warehousing environment, huge partitioned tables are common, and gathering statistics on them is a challenging task. For partitioned tables there are two types of statistics: global statistics and partition-level statistics. Gathering global statistics is an expensive, resource-consuming operation because it scans the whole table, so people often reduce estimate_percent to less than 1 percent. This reduces the time taken to gather stats, but the sample may not be sufficient to represent the data distribution. Gathering partition-level statistics is not so expensive, as it covers only the partitions where data has changed.
Traditionally, statistics are gathered in two phases:
  1. Scan the complete table to gather global statistics.
  2. Scan only the partitions where data has changed.
Some global stats can be derived from partition-level stats; for example, the number of rows at the table level is simply the sum of the rows in all partitions. But global stats like NDV (number of distinct values), which is very important in calculating cardinality, cannot be derived so easily; the only way to derive them exactly is to scan the whole table. Incremental statistics address this by maintaining a synopsis (a compact summary of the distinct values) for each partition, so global NDV can be derived by merging partition synopses instead of rescanning the table.
These synopsis data are stored in the WRI$_OPTSTAT_SYNOPSIS$ and WRI$_OPTSTAT_SYNOPSIS_HEAD$ tables in the SYSAUX tablespace. WRI$_OPTSTAT_SYNOPSIS$ can grow enormously, as an individual synopsis is created for each hash, proportional to the distinct values at table, partition and column level. WRI$_OPTSTAT_SYNOPSIS_HEAD$ holds one record for every table, partition and column combination. In the 11.1 release, gathering incremental statistics could take a long time on wide tables with many partitions, due to a delete statement working on WRI$_OPTSTAT_SYNOPSIS$. In 11.2 this issue was resolved by range-hash partitioning the WRI$_OPTSTAT_SYNOPSIS$ table.

Create a range-partitioned table with daily interval partitioning:

CREATE TABLE t_range
 (ID NUMBER,
  CREATE_DATE DATE,
  NAME CHAR(100))
PARTITION BY RANGE(CREATE_DATE)
INTERVAL(NUMTODSINTERVAL(1,'DAY')) STORE IN (tbs1,tbs2,tbs3,tbs4)
 (PARTITION part_old VALUES LESS THAN (TO_DATE('04-FEB-2018', 'DD-MON-YYYY')) TABLESPACE tbs1
);

Insert data into the table:

SQL> Declare
Begin
For i in 1..10 loop
For j in 1..1000 loop
Insert into t_range values(j, sysdate+i, 'DEBASIS'||to_char(i));
End loop;
End loop;
end;
/

PL/SQL procedure successfully completed.

Let me gather the statistics now:

SQL> exec dbms_stats.gather_table_stats('SYS','T_RANGE');

Check the last-analyzed date of each partition:

SQL> SELECT partition_name,
  2         to_char( last_analyzed, 'DD-MON-YYYY, HH24:MI:SS' ) last_analyze,
  3         num_rows
  4  FROM   DBA_TAB_PARTITIONS
  5  WHERE  table_name = 'T_RANGE'
  6  ORDER  BY partition_position;

PARTITION_NAME                 LAST_ANALYZE                     NUM_ROWS
------------------------------ ------------------------------ ----------
PART_OLD                       21-FEB-2018, 22:38:05                   0
SYS_P463                       21-FEB-2018, 22:38:05                1000
SYS_P464                       21-FEB-2018, 22:38:05                1000
SYS_P465                       21-FEB-2018, 22:38:05                1000
SYS_P466                       21-FEB-2018, 22:38:05                1000
SYS_P467                       21-FEB-2018, 22:38:05                1000
SYS_P468                       21-FEB-2018, 22:38:05                1000
SYS_P469                       21-FEB-2018, 22:38:05                1000
SYS_P470                       21-FEB-2018, 22:38:05                1000
SYS_P471                       21-FEB-2018, 22:38:05                1000
SYS_P472                       21-FEB-2018, 22:38:05                1000

Check the initial settings of the statistics-related preferences:

SQL> SELECT dbms_stats.get_prefs('INCREMENTAL','SYS','T_RANGE') "INCREMENTAL" FROM dual;

INCREMENTAL
--------------------------------------------------------------------------------
FALSE

SQL> SELECT dbms_stats.get_prefs('PUBLISH','SYS','T_RANGE') "PUBLISH" FROM dual;

PUBLISH
--------------------------------------------------------------------------------
TRUE

SQL> SELECT dbms_stats.get_prefs('ESTIMATE_PERCENT','SYS','T_RANGE') "ESTIMATE_PERCENT" FROM dual;

ESTIMATE_PERCENT
--------------------------------------------------------------------------------
DBMS_STATS.AUTO_SAMPLE_SIZE

SQL> SELECT dbms_stats.get_prefs('GRANULARITY','SYS','T_RANGE') "GRANULARITY" FROM dual;

GRANULARITY
--------------------------------------------------------------------------------
AUTO

Now set the INCREMENTAL preference to TRUE for the demo table:

exec dbms_stats.set_table_prefs('SYS','T_RANGE','INCREMENTAL','TRUE');

After the next statistics gather, check the synopsis tables:

SELECT o.name         "Table Name",
       p.subname      "Part",
       c.name         "Column",
       h.analyzetime  "Synopsis Creation Time"
FROM   WRI$_OPTSTAT_SYNOPSIS_HEAD$ h,
       OBJ$ o,
       USER$ u,
       COL$ c,
       ( ( SELECT TABPART$.bo#  BO#,
                  TABPART$.obj# OBJ#
           FROM   TABPART$ tabpart$ )
         UNION ALL
         ( SELECT TABCOMPART$.bo#  BO#,
                  TABCOMPART$.obj# OBJ#
           FROM   TABCOMPART$ tabcompart$ ) ) tp,
       OBJ$ p
WHERE  u.name = 'SYS' AND
       o.name = 'T_RANGE' AND
       tp.obj# = p.obj# AND
       h.bo# = tp.bo# AND
       h.group# = tp.obj# * 2 AND
       h.bo# = c.obj#(+) AND
       h.intcol# = c.intcol#(+) AND
       o.owner# = u.user# AND
       h.bo# = o.obj#
ORDER  BY 4,1,2,3
/
Table Name           Part                 Column               Synopsis
-------------------- -------------------- -------------------- ---------
T_RANGE              PART_OLD             CREATE_DATE          21-FEB-18
T_RANGE              PART_OLD             ID                   21-FEB-18
T_RANGE              PART_OLD             NAME                 21-FEB-18
T_RANGE              SYS_P463             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P463             ID                   21-FEB-18
T_RANGE              SYS_P463             NAME                 21-FEB-18
T_RANGE              SYS_P464             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P464             ID                   21-FEB-18
T_RANGE              SYS_P464             NAME                 21-FEB-18
T_RANGE              SYS_P465             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P465             ID                   21-FEB-18
T_RANGE              SYS_P465             NAME                 21-FEB-18
T_RANGE              SYS_P466             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P466             ID                   21-FEB-18
T_RANGE              SYS_P466             NAME                 21-FEB-18
T_RANGE              SYS_P467             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P467             ID                   21-FEB-18
T_RANGE              SYS_P467             NAME                 21-FEB-18
T_RANGE              SYS_P468             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P468             ID                   21-FEB-18
T_RANGE              SYS_P468             NAME                 21-FEB-18
T_RANGE              SYS_P469             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P469             ID                   21-FEB-18
T_RANGE              SYS_P469             NAME                 21-FEB-18
T_RANGE              SYS_P470             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P470             ID                   21-FEB-18
T_RANGE              SYS_P470             NAME                 21-FEB-18
T_RANGE              SYS_P471             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P471             ID                   21-FEB-18
T_RANGE              SYS_P471             NAME                 21-FEB-18
T_RANGE              SYS_P472             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P472             ID                   21-FEB-18
T_RANGE              SYS_P472             NAME                 21-FEB-18
T_RANGE              SYS_P473             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P473             ID                   21-FEB-18
T_RANGE              SYS_P473             NAME                 21-FEB-18

Now load data into only one partition:

SQL> Declare
Begin
For j in 1..1000 loop
Insert into t_range values(j, sysdate+1, 'DEBASIS'||to_char(j));
End loop;
end;
/

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

Let me gather stats again

SQL> exec dbms_stats.gather_table_stats('SYS','T_RANGE');

PL/SQL procedure successfully completed.

Notice that only the partition where data was loaded (SYS_P463) was re-analyzed, avoiding a costly full scan of all partitions:

SQL> SELECT partition_name,
  2         to_char( last_analyzed, 'DD-MON-YYYY, HH24:MI:SS' ) last_analyze,
  3         num_rows
  4  FROM   DBA_TAB_PARTITIONS
  5  WHERE  table_name = 'T_RANGE'
  6  ORDER  BY partition_position;

PARTITION_NAME                 LAST_ANALYZE                     NUM_ROWS
------------------------------ ------------------------------ ----------
PART_OLD                       21-FEB-2018, 22:46:58                   0
SYS_P473                       21-FEB-2018, 22:46:58                1000
SYS_P463                       21-FEB-2018, 22:50:15                2000
SYS_P464                       21-FEB-2018, 22:46:58                1000
SYS_P465                       21-FEB-2018, 22:46:58                1000
SYS_P466                       21-FEB-2018, 22:46:58                1000
SYS_P467                       21-FEB-2018, 22:46:58                1000
SYS_P468                       21-FEB-2018, 22:46:58                1000
SYS_P469                       21-FEB-2018, 22:46:58                1000
SYS_P470                       21-FEB-2018, 22:46:58                1000
SYS_P471                       21-FEB-2018, 22:46:58                1000
SYS_P472                       21-FEB-2018, 22:46:58                1000

The INCREMENTAL preference is still TRUE:

SQL> SELECT dbms_stats.get_prefs('INCREMENTAL','SYS','T_RANGE') "INCREMENTAL" FROM dual;

INCREMENTAL
--------------------------------------------------------------------------------
TRUE

Now turn INCREMENTAL back to FALSE:

SQL> exec dbms_stats.set_table_prefs('SYS','T_RANGE','INCREMENTAL','FALSE');

PL/SQL procedure successfully completed.

SQL> SELECT dbms_stats.get_prefs('INCREMENTAL','SYS','T_RANGE') "INCREMENTAL" FROM dual;

INCREMENTAL
--------------------------------------------------------------------------------
FALSE
SQL> SELECT o.name         "Table Name",
       p.subname      "Part",
       c.name         "Column",
       h.analyzetime  "Synopsis Creation Time"
FROM   WRI$_OPTSTAT_SYNOPSIS_HEAD$ h,
       OBJ$ o,
       USER$ u,
       COL$ c,
       ( ( SELECT TABPART$.bo#  BO#,
                  TABPART$.obj# OBJ#
           FROM   TABPART$ tabpart$ )
         UNION ALL
         ( SELECT TABCOMPART$.bo#  BO#,
                  TABCOMPART$.obj# OBJ#
           FROM   TABCOMPART$ tabcompart$ ) ) tp,
       OBJ$ p
WHERE  u.name = 'SYS' AND
       o.name = 'T_RANGE' AND
       tp.obj# = p.obj# AND
       h.bo# = tp.bo# AND
       h.group# = tp.obj# * 2 AND
       h.bo# = c.obj#(+) AND
       h.intcol# = c.intcol#(+) AND
       o.owner# = u.user# AND
       h.bo# = o.obj#
ORDER  BY 4,1,2,3
/
Table Name           Part                 Column               Synopsis
-------------------- -------------------- -------------------- ---------
T_RANGE              PART_OLD             CREATE_DATE          21-FEB-18
T_RANGE              PART_OLD             ID                   21-FEB-18
T_RANGE              PART_OLD             NAME                 21-FEB-18
T_RANGE              SYS_P464             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P464             ID                   21-FEB-18
T_RANGE              SYS_P464             NAME                 21-FEB-18
T_RANGE              SYS_P465             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P465             ID                   21-FEB-18
T_RANGE              SYS_P465             NAME                 21-FEB-18
T_RANGE              SYS_P466             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P466             ID                   21-FEB-18
T_RANGE              SYS_P466             NAME                 21-FEB-18
T_RANGE              SYS_P467             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P467             ID                   21-FEB-18
T_RANGE              SYS_P467             NAME                 21-FEB-18
T_RANGE              SYS_P468             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P468             ID                   21-FEB-18
T_RANGE              SYS_P468             NAME                 21-FEB-18
T_RANGE              SYS_P469             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P469             ID                   21-FEB-18
T_RANGE              SYS_P469             NAME                 21-FEB-18
T_RANGE              SYS_P470             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P470             ID                   21-FEB-18
T_RANGE              SYS_P470             NAME                 21-FEB-18
T_RANGE              SYS_P471             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P471             ID                   21-FEB-18
T_RANGE              SYS_P471             NAME                 21-FEB-18
T_RANGE              SYS_P472             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P472             ID                   21-FEB-18
T_RANGE              SYS_P472             NAME                 21-FEB-18
T_RANGE              SYS_P473             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P473             ID                   21-FEB-18
T_RANGE              SYS_P473             NAME                 21-FEB-18
T_RANGE              SYS_P463             CREATE_DATE          21-FEB-18
T_RANGE              SYS_P463             ID                   21-FEB-18
T_RANGE              SYS_P463             NAME                 21-FEB-18

36 rows selected.
SQL> commit;
Commit complete.

Let me gather statistics again:

SQL> exec dbms_stats.gather_table_stats('SYS','T_RANGE');

PL/SQL procedure successfully completed.

Note that statistics were gathered for all partitions again, which means a costly full scan of every partition:

SQL> SELECT partition_name,
  2         to_char( last_analyzed, 'DD-MON-YYYY, HH24:MI:SS' ) last_analyze,
  3         num_rows
  4  FROM   DBA_TAB_PARTITIONS
  5  WHERE  table_name = 'T_RANGE'
  6  ORDER  BY partition_position;

PARTITION_NAME                 LAST_ANALYZE                     NUM_ROWS
------------------------------ ------------------------------ ----------
PART_OLD                       21-FEB-2018, 23:06:47                   0
SYS_P473                       21-FEB-2018, 23:06:47                1000
SYS_P463                       21-FEB-2018, 23:06:47                3000
SYS_P464                       21-FEB-2018, 23:06:47                1000
SYS_P465                       21-FEB-2018, 23:06:47                1000
SYS_P466                       21-FEB-2018, 23:06:47                1000
SYS_P467                       21-FEB-2018, 23:06:47                1000
SYS_P468                       21-FEB-2018, 23:06:47                1000
SYS_P469                       21-FEB-2018, 23:06:47                1000
SYS_P470                       21-FEB-2018, 23:06:47                1000
SYS_P471                       21-FEB-2018, 23:06:47                1000
SYS_P472                       21-FEB-2018, 23:06:47                1000

12 rows selected.

J. SQL Plan Directives

SQL Plan Directives are one of the functionalities that make up Adaptive Query Optimization in Oracle Database 12c. SQL plan directives are like "extra notes" for the optimizer, reminding it that it previously selected a suboptimal plan, typically because of incorrect cardinality estimates. Incorrect cardinality estimates are often caused by issues like missing statistics, stale statistics, complex predicates or complex operators. Unlike SQL profiles, which are statement-specific, SQL plan directives are linked to query expressions, so they can be used by several statements containing matching query expressions. Situations like missing histograms or missing extended statistics may result in SQL plan directives being generated.
The database manages SQL plan directives internally. Situations like automatic reoptimization may result in SQL plan directives being written to the SGA and later persisted to the SYSAUX tablespace, at which point they can be displayed using the DBA_SQL_PLAN_DIRECTIVES and DBA_SQL_PLAN_DIR_OBJECTS views. Alternatively, existing SQL plan directives can be persisted manually using the DBMS_SPD package.
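Existing directives can be flushed to disk and inspected manually; a sketch (the TEST owner here matches the demo schema below):

```sql
-- Persist any directives still held in the SGA to SYSAUX
-- (this normally happens automatically on a periodic basis).
EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE;

-- List directives recorded against objects owned by TEST.
SELECT d.directive_id, o.object_name, d.type, d.state, d.reason
FROM   dba_sql_plan_directives d
JOIN   dba_sql_plan_dir_objects o ON o.directive_id = d.directive_id
WHERE  o.owner = 'TEST'
ORDER  BY d.directive_id;
```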

Let me create the test setup first:

sqlplus test/test

SQL> DROP TABLE tab1 PURGE;

Table dropped.

SQL> CREATE TABLE tab1 (
id               NUMBER,
gender           VARCHAR2(1),
has_y_chromosome VARCHAR2(1),
CONSTRAINT tab1_pk PRIMARY KEY (id),
CONSTRAINT tab1_gender_chk CHECK (gender IN ('M', 'F')),
CONSTRAINT tab1_has_y_chromosome_chk CHECK (has_y_chromosome IN ('Y', 'N'))
);

Table created.

SQL> INSERT /*+ APPEND */ INTO tab1
SELECT level, 'M', 'Y'
FROM   dual
CONNECT BY level <= 10;

10 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT /*+ APPEND */ INTO tab1
SELECT 10+level, 'F', 'N'
FROM   dual
CONNECT BY level <= 90;

90 rows created.

SQL> COMMIT;

Commit complete.

SQL> CREATE INDEX tab1_gender_idx ON tab1(gender);

Index created.

SQL> CREATE INDEX tab1_has_y_chromosome_idx ON tab1(has_y_chromosome);

Index created.

SQL> EXEC DBMS_STATS.gather_table_stats(USER, 'TAB1');

PL/SQL procedure successfully completed.

Note that no histograms have been collected yet:

SQL> COLUMN column_name FORMAT A20
SQL> SELECT column_id,
       column_name,
       histogram
FROM   user_tab_columns
WHERE  table_name = 'TAB1'
ORDER BY column_id;
 COLUMN_ID COLUMN_NAME          HISTOGRAM
———- ——————– —————
         1 ID                   NONE
         2 GENDER               NONE
         3 HAS_Y_CHROMOSOME     NONE

Note the cardinality misestimate in the execution plan below. A cardinality misestimate often leads the optimizer to a suboptimal plan and hence to poor performance.

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
       *
FROM   tab1
WHERE  gender = 'M'
AND    has_y_chromosome = 'Y';

SQL> SET LINESIZE 200 PAGESIZE 100
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format => 'allstats last'));
        ID G H
———- – –
         1 M Y
         2 M Y
         3 M Y
         4 M Y
         5 M Y
         6 M Y
         7 M Y
         8 M Y
         9 M Y
        10 M Y
10 rows selected.
PLAN_TABLE_OUTPUT
—————————————————————————————————————————————–
SQL_ID  5t8y8p5mpb99j, child number 1
————————————-
SELECT /*+ GATHER_PLAN_STATISTICS */        * FROM   tab1 WHERE  gender
= 'M' AND    has_y_chromosome = 'Y'
Plan hash value: 1552452781
—————————————————————————————————————–
| Id  | Operation                           | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
—————————————————————————————————————–
|   0 | SELECT STATEMENT                    |                 |      1 |        |     10 |00:00:00.01 |       4 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TAB1            |      1 |     25 |     10 |00:00:00.01 |       4 |
|*  2 |   INDEX RANGE SCAN                  | TAB1_GENDER_IDX |      1 |     50 |     10 |00:00:00.01 |       2 |
—————————————————————————————————————–
Predicate Information (identified by operation id):
—————————————————
   1 - filter("HAS_Y_CHROMOSOME"='Y')
   2 - access("GENDER"='M')
21 rows selected.
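The E-Rows value of 25 in the plan above comes from the optimizer's independence assumption: with no histograms or column group statistics, it simply multiplies the individual predicate selectivities (each defaulting to 1/num_distinct). A minimal sketch of that arithmetic, reproduced from the dictionary statistics (this query is illustrative, not an Oracle API):

```sql
-- Independence assumption: selectivities are multiplied, so with 100 rows
-- and 2 distinct values per column the estimate is 100 * 1/2 * 1/2 = 25,
-- versus 10 actual rows, because the two columns are perfectly correlated.
SELECT t.num_rows *
       (1 / g.num_distinct) *    -- gender = 'M'
       (1 / y.num_distinct)      -- has_y_chromosome = 'Y'
       AS estimated_rows
FROM   user_tables t,
       user_tab_col_statistics g,
       user_tab_col_statistics y
WHERE  t.table_name = 'TAB1'
AND    g.table_name = 'TAB1' AND g.column_name = 'GENDER'
AND    y.table_name = 'TAB1' AND y.column_name = 'HAS_Y_CHROMOSOME';
```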

We can also observe that Oracle has flagged this SQL statement for reoptimization:

SQL> COLUMN sql_text FORMAT A40
SQL> COLUMN is_reoptimizable FORMAT A16
SQL> SELECT sql_text, is_reoptimizable
FROM   v$sql
WHERE  sql_id = '5t8y8p5mpb99j';
SQL_TEXT                                 IS_REOPTIMIZABLE
—————————————- —————-
SELECT /*+ GATHER_PLAN_STATISTICS */     Y
    * FROM   tab1 WHERE  gender = ‘M’ AN
D    has_y_chromosome = ‘Y’
SELECT /*+ GATHER_PLAN_STATISTICS */     Y
    * FROM   tab1 WHERE  gender = ‘M’ AN
D    has_y_chromosome = ‘Y’

At this point, the optimizer has written SQL plan directives into the SGA, but they are not visible to us as they have not been persisted to the SYSAUX tablespace.

SQL> SET LINESIZE 200
SQL> COLUMN dir_id FORMAT A20
SQL> COLUMN owner FORMAT A10
SQL> COLUMN object_name FORMAT A10
SQL> COLUMN col_name FORMAT A10

SQL> SELECT TO_CHAR(d.directive_id) dir_id, o.owner, o.object_name,
       o.subobject_name col_name, o.object_type, d.type, d.state, d.reason
FROM   dba_sql_plan_directives d, dba_sql_plan_dir_objects o
WHERE  d.directive_id = o.directive_id
AND    o.owner = 'TEST'
ORDER BY 1,2,3,4,5;

no rows selected

We can wait for the database to persist the SQL plan directives, or manually persist them using the DBMS_SPD package.

SQL> EXEC DBMS_SPD.flush_sql_plan_directive;
PL/SQL procedure successfully completed.

If we repeat the previous query, the SQL plan directives will be visible.

SQL> SET LINESIZE 200
COLUMN dir_id FORMAT A20
COLUMN owner FORMAT A10
COLUMN object_name FORMAT A10
COLUMN col_name FORMAT A10
SQL> SELECT TO_CHAR(d.directive_id) dir_id, o.owner, o.object_name,
       o.subobject_name col_name, o.object_type, d.type, d.state, d.reason
FROM   dba_sql_plan_directives d, dba_sql_plan_dir_objects o
WHERE  d.directive_id = o.directive_id
AND    o.owner = 'TEST'
ORDER BY 1,2,3,4,5;
DIR_ID               OWNER      OBJECT_NAM COL_NAME   OBJECT TYPE             STATE      REASON
——————– ———- ———- ———- —— —————- ———- ————————————
796134444163473829   TEST       TAB1       GENDER     COLUMN DYNAMIC_SAMPLING USABLE     SINGLE TABLE CARDINALITY MISESTIMATE
796134444163473829   TEST       TAB1                  TABLE  DYNAMIC_SAMPLING USABLE     SINGLE TABLE CARDINALITY MISESTIMATE
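If a directive is no longer wanted, for example after suitable extended statistics have been created, it can be dropped manually. A hedged sketch using the DBMS_SPD package and the directive id reported above:

```sql
-- Sketch: drop a specific SQL plan directive by its id.
-- The id below is the one shown in the query output above.
EXEC DBMS_SPD.drop_sql_plan_directive(796134444163473829);
```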

Now let me rerun the query; the execution plan shows the correct cardinality with the help of the SQL plan directive.

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
       *
FROM   tab1
WHERE  gender = 'M'
AND    has_y_chromosome = 'Y';

SQL> SET LINESIZE 200 PAGESIZE 100
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format => 'allstats last'));
        ID G H
———- – –
         1 M Y
         2 M Y
         3 M Y
         4 M Y
         5 M Y
         6 M Y
         7 M Y
         8 M Y
         9 M Y
        10 M Y
10 rows selected.
PLAN_TABLE_OUTPUT
—————————————————————————————————————————————–
SQL_ID  5t8y8p5mpb99j, child number 2
————————————-
SELECT /*+ GATHER_PLAN_STATISTICS */        * FROM   tab1 WHERE  gender
= 'M' AND    has_y_chromosome = 'Y'
Plan hash value: 1552452781
—————————————————————————————————————–
| Id  | Operation                           | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
—————————————————————————————————————–
|   0 | SELECT STATEMENT                    |                 |      1 |        |     10 |00:00:00.01 |       4 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TAB1            |      1 |     10 |     10 |00:00:00.01 |       4 |
|*  2 |   INDEX RANGE SCAN                  | TAB1_GENDER_IDX |      1 |     10 |     10 |00:00:00.01 |       2 |
—————————————————————————————————————–
Predicate Information (identified by operation id):
—————————————————
   1 - filter("HAS_Y_CHROMOSOME"='Y')
   2 - access("GENDER"='M')
Note
—–
   – dynamic statistics used: dynamic sampling (level=2)
   – statistics feedback used for this statement
   – 1 Sql Plan Directive used for this statement
27 rows selected.

K.Extended Statistics  FOR gather statistics oracle 

In real-world data, there is often a relationship or correlation between the data stored in different columns of the same table. For example, consider a customers table where the values in a cust_state_province column are influenced by the values in a country_id column, because the state of California is only going to be found in the United States. If the Oracle Optimizer is not aware of these real-world relationships, it could potentially miscalculate the cardinality estimate if multiple columns from the same table are used in the where clause of a statement. With extended statistics you have an opportunity to tell the optimizer about these real-world relationships between the columns.
By creating extended statistics on a group of columns, the optimizer can determine a more accurate cardinality estimate when the columns are used together in the where clause of a SQL statement. You can use DBMS_STATS.CREATE_EXTENDED_STATS to define the column group you want to have statistics gathered on as a whole. Once the group has been established, Oracle will automatically maintain the statistics on that column group whenever statistics are gathered on the table.
Continuing with the customers example: when the value of cust_state_province is 'CA', we know the value of country_id will be 52790 (the USA). There is also a skew in the data in these two columns, because the company used in the SH schema is based in San Francisco, so the majority of rows in the table have the values 'CA' and 52790. Both the correlation between the columns and the skew in the data can make it difficult for the optimizer to calculate the cardinality of these columns correctly when they are used together in a query.
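As an aside, the column group can also be created and gathered in a single step by naming it in METHOD_OPT, instead of calling CREATE_EXTENDED_STATS first. A hedged sketch of that alternative:

```sql
-- Sketch: create the (country_id, cust_state_province) column group and
-- gather its statistics (with histograms) in one call.
BEGIN
  DBMS_STATS.gather_table_stats(
    ownname    => user,
    tabname    => 'CUSTOMERS',
    method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY ' ||
                  'FOR COLUMNS (country_id,cust_state_province) SIZE SKEWONLY');
END;
/
```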

Let us connect to the SH schema:

SQL> select count(1) from customers where cust_state_province='CA' and country_id=52790;
  COUNT(1)
———-
      3341

It is evident from the execution plan that the cardinality estimate is incorrect (128 estimated versus 3341 actual).

SQL> select * from table(dbms_xplan.display_cursor());
PLAN_TABLE_OUTPUT
——————————————————————————–
SQL_ID  f3rfbwhagqcss, child number 0
————————————-
select count(1) from customers where cust_state_province='CA' and
country_id=52790
Plan hash value: 296924608
——————————————————————————–
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
——————————————————————————–
|   0 | SELECT STATEMENT   |           |       |       |   423 (100)|          |
|   1 |  SORT AGGREGATE    |           |     1 |    16 |            |          |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |   128 |  2048 |   423   (1)| 00:00:01 |
——————————————————————————–
Predicate Information (identified by operation id):
—————————————————
   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))
20 rows selected.

Prior to Oracle Database 11g, the only option open to us would be to make the optimizer aware of the data skew in both the country_id column (most rows have 52790 as the value) and the cust_state_province column (most rows have 'CA' as the value). We can do this by gathering histograms on the skewed columns.

SQL> exec dbms_stats.gather_table_stats(user,'CUSTOMERS',method_opt=>'for all columns size skewonly');
PL/SQL procedure successfully completed.

Let me create extended statistics on the two columns used in the where clause:

SQL> select dbms_stats.create_extended_stats(null,'customers','(country_id,cust_state_province)') from dual;
SQL> col COLUMN_NAME format a30;
SQL> col HISTOGRAM format a30;
SQL> select column_name,histogram from user_tab_col_statistics where table_name='CUSTOMERS';
COLUMN_NAME                    HISTOGRAM
—————————— ——————————
SYS_STUJGVLRVH5USVDU$XNV4_IR#4 FREQUENCY
CUST_STATE_PROVINCE_ID         FREQUENCY
COUNTRY_ID                     FREQUENCY
CUST_MAIN_PHONE_NUMBER         HYBRID
CUST_INCOME_LEVEL              FREQUENCY
CUST_CREDIT_LIMIT              FREQUENCY
CUST_EMAIL                     HYBRID
CUST_TOTAL                     FREQUENCY
CUST_TOTAL_ID                  FREQUENCY
CUST_SRC_ID                    NONE
CUST_EFF_FROM                  FREQUENCY
CUST_EFF_TO                    NONE
CUST_VALID                     FREQUENCY
CUST_ID                        HYBRID
CUST_FIRST_NAME                HYBRID
CUST_LAST_NAME                 HYBRID
CUST_GENDER                    FREQUENCY
CUST_YEAR_OF_BIRTH             FREQUENCY
CUST_MARITAL_STATUS            FREQUENCY
CUST_STREET_ADDRESS            HYBRID
CUST_POSTAL_CODE               HYBRID
CUST_CITY                      HYBRID
CUST_CITY_ID                   HYBRID
CUST_STATE_PROVINCE            FREQUENCY
SQL> select count(1) from customers where cust_state_province='CA' and country_id=52790;
  COUNT(1)
———-
      3341

We can now see that dynamic sampling and a SQL plan directive were used, and the estimate (3424) is close to the actual count:

SQL> select * from table(dbms_xplan.display_cursor());
PLAN_TABLE_OUTPUT
—————————————————————————————————————————————–
SQL_ID  f3rfbwhagqcss, child number 0
————————————-
select count(1) from customers where cust_state_province='CA' and
country_id=52790
Plan hash value: 296924608
——————————————————————————–
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
——————————————————————————–
|   0 | SELECT STATEMENT   |           |       |       |   423 (100)|          |
|   1 |  SORT AGGREGATE    |           |     1 |    16 |            |          |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |  3424 | 54784 |   423   (1)| 00:00:01 |
——————————————————————————–
Predicate Information (identified by operation id):
—————————————————
   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))
Note
—–
   – dynamic statistics used: dynamic sampling (level=2)
   – 1 Sql Plan Directive used for this statement
25 rows selected.

Optionally, we can view the column usage report for the table:

SQL> set long 100000
SQL> set lines 120
SQL> set pages 0
SQL> select dbms_stats.report_col_usage(user,'CUSTOMERS') from dual;
LEGEND:
…….
EQ         : Used in single table EQuality predicate
RANGE      : Used in single table RANGE predicate
LIKE       : Used in single table LIKE predicate
NULL       : Used in single table is (not) NULL predicate
EQ_JOIN    : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER     : Used in single table FILTER predicate
JOIN       : Used in JOIN predicate
GROUP_BY   : Used in GROUP BY expression
…………………………………………………………………….
###############################################################################
COLUMN USAGE REPORT FOR SH.CUSTOMERS
………………………………
  1. COUNTRY_ID : EQ EQ_JOIN
  2. CUST_CITY_ID : EQ_JOIN
  3. CUST_ID : EQ_JOIN
  4. CUST_STATE_PROVINCE : EQ
  5. CUST_STATE_PROVINCE_ID : EQ_JOIN
  6. CUST_TOTAL_ID : EQ_JOIN
  7. SYS_STUJGVLRVH5USVDU$XNV4_IR#4 : EQ
  8. (CUST_STATE_PROVINCE, COUNTRY_ID) : FILTER
###############################################################################
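The report above is also the basis for automatic column group detection: after seeding column usage monitoring and running a representative workload, calling CREATE_EXTENDED_STATS with no extension argument creates column groups for the recorded usage. A hedged sketch of that workflow:

```sql
-- Sketch: let Oracle propose column groups from observed workload usage.
-- 1. Record column usage for the next 300 seconds.
EXEC DBMS_STATS.seed_col_usage(null, null, 300);

-- 2. Run the representative workload queries during that window, then
--    create column groups for all recorded usage on CUSTOMERS.
SELECT dbms_stats.create_extended_stats(user, 'CUSTOMERS') FROM dual;
```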

To see the effect of the extended statistics in isolation, I also need to turn off optimizer dynamic sampling and the adaptive features.

SQL> alter system set optimizer_dynamic_sampling=0;
SQL> alter system set optimizer_adaptive_features=false;
SQL> select count(1) from customers where cust_state_province='CA' and country_id=52790;
  COUNT(1)
———-
      3341
SQL> select * from table(dbms_xplan.display_cursor());
PLAN_TABLE_OUTPUT
——————————————————————————–
SQL_ID  f3rfbwhagqcss, child number 2
————————————-
select count(1) from customers where cust_state_province='CA' and
country_id=52790
Plan hash value: 296924608
——————————————————————————–
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
——————————————————————————–
|   0 | SELECT STATEMENT   |           |       |       |   424 (100)|          |
|   1 |  SORT AGGREGATE    |           |     1 |    35 |            |          |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |    13 |   455 |   424   (1)| 00:00:01 |
——————————————————————————–
Predicate Information (identified by operation id):
—————————————————
   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))

Let me drop the extended statistics

SQL> exec dbms_stats.drop_extended_stats(null,'customers','(country_id,cust_state_province)');
PL/SQL procedure successfully completed.

I will recreate the extended statistics:

SQL> select dbms_stats.create_extended_stats(null,'customers','(country_id,cust_state_province)') from dual;
DBMS_STATS.CREATE_EXTENDED_STATS(NULL,'CUSTOMERS','(COUNTRY_ID,CUST_STATE_PROVINCE)')
—————————————————————————————————————————————–
SYS_STUJGVLRVH5USVDU$XNV4_IR#4

Let me gather statistics again

SQL> exec dbms_stats.gather_table_stats(user,'CUSTOMERS');
PL/SQL procedure successfully completed.

It will still show a poor cardinality estimate (1115 versus 3341):

SQL> select count(1) from customers where cust_state_province='CA' and country_id=52790;
  COUNT(1)
———-
      3341
SQL> select * from table(dbms_xplan.display_cursor());
PLAN_TABLE_OUTPUT
—————————————————————————————————————————————–
SQL_ID  f3rfbwhagqcss, child number 2
————————————-
select count(1) from customers where cust_state_province='CA' and
country_id=52790
Plan hash value: 296924608
——————————————————————————–
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
——————————————————————————–
|   0 | SELECT STATEMENT   |           |       |       |   423 (100)|          |
|   1 |  SORT AGGREGATE    |           |     1 |    16 |            |          |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |  1115 | 17840 |   423   (1)| 00:00:01 |
——————————————————————————–
Predicate Information (identified by operation id):
—————————————————
   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))
20 rows selected.

Now let me gather statistics again with the skewonly histogram option. The plan then shows the correct estimate (3341):

SQL> exec dbms_stats.gather_table_stats(user,'CUSTOMERS',method_opt=>'for all columns size skewonly');
PL/SQL procedure successfully completed.
SQL> select count(1) from customers where cust_state_province='CA' and country_id=52790;
  COUNT(1)
———-
      3341
SQL> select * from table(dbms_xplan.display_cursor());
PLAN_TABLE_OUTPUT
—————————————————————————————————————————————–
SQL_ID  74w1jdh78t4w8, child number 0
————————————-
select count(1) from customers where cust_state_province='CA' and
country_id=52790
Plan hash value: 296924608
——————————————————————————–
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
——————————————————————————–
|   0 | SELECT STATEMENT   |           |       |       |   423 (100)|          |
|   1 |  SORT AGGREGATE    |           |     1 |    16 |            |          |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |  3341 | 53456 |   423   (1)| 00:00:01 |
——————————————————————————–
Predicate Information (identified by operation id):
—————————————————
   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))

L.How to restore old statistics from history  FOR gather statistics oracle

Let me check which old statistics were collected during the previous test cases:

SQL> COL TABLE_NAME FORMAT a10
SQL> SELECT TABLE_NAME,
       TO_CHAR(STATS_UPDATE_TIME,'YYYY-MM-DD:HH24:MI:SS') AS STATS_MOD_TIME
FROM DBA_TAB_STATS_HISTORY
WHERE TABLE_NAME='CUSTOMERS'
AND OWNER='SH'
ORDER BY STATS_UPDATE_TIME DESC;
TABLE_NAME STATS_MOD_TIME
———- ——————-
CUSTOMERS  2018-02-22:03:30:57
CUSTOMERS  2018-02-22:03:24:07
CUSTOMERS  2018-02-22:03:20:27
CUSTOMERS  2018-02-22:03:00:37
CUSTOMERS  2018-02-22:02:16:48
CUSTOMERS  2018-02-22:02:01:26
CUSTOMERS  2018-02-22:01:42:21
CUSTOMERS  2018-02-07:12:15:38
8 rows selected.
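Restoring is only possible within the statistics history retention window (31 days by default). A hedged sketch for checking how far back restore is possible and extending the retention:

```sql
-- Sketch: the oldest restorable timestamp, the current retention in days,
-- and a call to extend retention to 60 days.
SELECT DBMS_STATS.get_stats_history_availability FROM dual;
SELECT DBMS_STATS.get_stats_history_retention FROM dual;
EXEC DBMS_STATS.alter_stats_history_retention(60);
```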

Let me restore the older statistics now:

SQL> BEGIN
  DBMS_STATS.RESTORE_TABLE_STATS('SH','CUSTOMERS',
    TO_TIMESTAMP('2018-02-22:03:24:07','YYYY-MM-DD:HH24:MI:SS'));
END;
/

PL/SQL procedure successfully completed.

Now if we check the execution plan, it has gone back to the previous estimate:

SQL> select count(1) from customers where cust_state_province='CA' and country_id=52790;
  COUNT(1)
———-
      3341
SQL> select * from table(dbms_xplan.display_cursor());
PLAN_TABLE_OUTPUT
—————————————————————————————————————————————–
SQL_ID  f3rfbwhagqcss, child number 2
————————————-
select count(1) from customers where cust_state_province='CA' and
country_id=52790
Plan hash value: 296924608
——————————————————————————–
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
——————————————————————————–
|   0 | SELECT STATEMENT   |           |       |       |   423 (100)|          |
|   1 |  SORT AGGREGATE    |           |     1 |    16 |            |          |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |  1115 | 17840 |   423   (1)| 00:00:01 |
——————————————————————————–
Predicate Information (identified by operation id):
—————————————————
   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))
20 rows selected.

M.Online Statistics  FOR gather statistics oracle

In Oracle Database 12c, online statistics gathering "piggybacks" statistics gathering on direct-path data loading operations such as create table as select (CTAS) and insert as select (IAS). Gathering statistics as part of the data loading operation means no additional full data scan is required to have statistics available immediately after the data is loaded.
Online statistics gathering does not gather histograms or index statistics, as these types of statistics require additional data scans, which could have a large impact on the performance of the data load. To gather the necessary histogram and index statistics without re-gathering the base column statistics, use the DBMS_STATS.GATHER_TABLE_STATS procedure with the new options parameter set to GATHER AUTO. Note that for performance reasons, GATHER AUTO builds histograms using a sample of rows rather than all rows in the table.
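A hedged sketch of the whole flow (the table name sales_copy is illustrative): the CTAS gathers base column statistics online, and GATHER AUTO then tops up histograms and index statistics only.

```sql
-- Base column statistics are gathered automatically during the
-- direct-path CTAS load (no extra scan needed).
CREATE TABLE sales_copy AS SELECT * FROM sales;

-- Columns gathered online are flagged STATS_ON_LOAD in the NOTES column.
SELECT column_name, notes
FROM   user_tab_col_statistics
WHERE  table_name = 'SALES_COPY';

-- Gather histograms and index statistics without re-gathering base stats.
EXEC DBMS_STATS.gather_table_stats(user, 'SALES_COPY', options => 'GATHER AUTO');
```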

About the Author

debasis maity

12+ years of rich experience on Database Administrations and on Infrastructure Solution Architect. AWS Certified Solution Architect and Senior Oracle DBA
