While setting up a baseline for my XMLDB performance tests, I noticed that my “count(*)” on a Binary XML table (using Securefile LOB storage) called “WIKI_STAGE” took an awful long time. So long, that I even had to kill the SQL*Plus session, that was executing the “count(*)”. I started wondering. Why did it take so long to come up with the result?
In the end, due to good advice from Jonathan Lewis, I came up with a solution (although probably unsupported) and a better understanding off the mechanics involved. Also, as a side effect, it triggered a really good discussion on the “Oracle-L” freelist, regarding “counting”.
But lets start from the beginning…
The Setup
As you may still know, after reading “Setting Up an XMLDB Performance “Baseline” Environment“, I am busy trying to figure out the best baseline environment to do some serious XML data loading and testing on different setup’s. Its “hard” work, every load session (within my current setup) takes roundabout 20 hours to complete. The original data, an XML dumpfile from Mediawiki (English) is 17 Gb in size (on Windows NTFS) and contains approximately 7 million records.
I have a Oracle schema called “wiki”, granted the dba and xdbadmin role, with a default tablespace called MEDIAWIKI_STAGE. The rest of the details and / or RDA output will follow when I post testing results.
This tablespace MEDIAWIKI_STAGE has the following properties in a Oracle 11.1.0.6.0 Enterprise Edition database, situated on Oracle Enterprise Linux (32bit) Version 5 (base release):
SQL> select * from v$version;
BANNER
----------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE 11.1.0.6.0 Production
TNS for Linux: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production
5 rows selected.
SQL> set pages 5000
SQL> set long 10000000
SQL> col "COL" for a200
SQL> select dbms_metadata.get_ddl('TABLESPACE','MEDIAWIKI_STAGE')
2 as "COL" from dual;
COL
----------------------------------------------------------------------
CREATE BIGFILE TABLESPACE "MEDIAWIKI_STAGE" DATAFILE
'/u02/oracle/oradata/TEST/mediawiki_stage.dbf' SIZE 26843545600
AUTOEXTEND ON NEXT 1073741824 MAXSIZE 46080M
LOGGING ONLINE PERMANENT BLOCKSIZE 8192
EXTENT MANAGEMENT LOCAL AUTOALLOCATE SEGMENT SPACE MANAGEMENT AUTO
ALTER DATABASE DATAFILE
'/u02/oracle/oradata/TEST/mediawiki_stage.dbf' RESIZE 28991029248
1 row selected.
The table “WIKI_STAGE” which is build on this tablespace, has the following characteristics, after it had been filled with the XML data and the Securefile lob was compressed with “COMPRESS HIGH”:
SQL> select dbms_metadata.get_ddl('TABLE','WIKI_STAGE')
2 as "COL"
3 from dual;
COL
----------------------------------------------------------------------
CREATE TABLE "WIKI"."WIKI_STAGE" OF "SYS"."XMLTYPE"
OIDINDEX ( PCTFREE 10 INITRANS 2 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "MEDIAWIKI_STAGE" )
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "MEDIAWIKI_STAGE"
XMLTYPE COLUMN "SYS_NC_ROWINFO$" STORE AS SECUREFILE BINARY XML (
TABLESPACE "MEDIAWIKI_STAGE" ENABLE STORAGE IN ROW CHUNK 8192
NOCACHE LOGGING DECRYPT COMPRESS HIGH KEEP_DUPLICATES
STORAGE(INITIAL 106496 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 BUFFER_POOL DEFAULT)) ALLOW NONSCHEMA DISALLOW ANYSCHEMA
1 row selected.
After the loading process had been successfully executed, I gathered the following information:
SQL> @segment
-- an old script off mine, based on dba_segments
Segments:
Enter value for segment_name:
Enter value for segment_type:
Enter value for owner:
Enter value for extents:
Enter value for tablespace_name: MEDIAWIKI_STAGE
Segment Type Bytes Tot.Ext IniSize NextSize %Inc MinE MaxE Owner TableSpace
------------------------------ --------------- ------------ -------- ------------ ------------ ----- ----- ------- --------------- --------------------
SYS_LOB0000070946C00003$$ LOBSEGMENT 18909364224 456 106496 1 ####### WIKI MEDIAWIKI_STAGE
WIKI_STAGE TABLE 8319401984 309 65536 1 ####### WIKI MEDIAWIKI_STAGE
SYS_C009650 INDEX 352321536 113 65536 1 ####### WIKI MEDIAWIKI_STAGE
SYS_IL0000070946C00003$$ LOBINDEX 65536 1 65536 1 ####### WIKI MEDIAWIKI_STAGE
------------
sum 27581153280
SQL> select dbms_metadata.get_ddl('INDEX','SYS_C009650','WIKI') as "COL"
2 from dual;
COL
----------------------------------------------------------------------
CREATE UNIQUE INDEX "WIKI"."SYS_C009650" ON "WIKI"."WIKI_STAGE" ("SYS_NC_OID$")
PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "MEDIAWIKI_STAGE"
1 row selected.
.
Count(*) – A Special Case?
I am “an Oldtimer” regarding Oracle database usage and maybe one off the myths I remember and wondered about while assessing the problem was that I believed that a “count(*)” is special. Special in the way that the Oracle database does smart things with it, performance wise. Mind you I am not talking about “count(1)” or “count(column)” but only about the special case: “count(*)”.
While doing a simple “select count(*) from wiki_stage”, I noticed that it took far to long as I would have expected, and killed the session after a few hours.
12:26:24 SQL> select count(*) from wiki_stage;
select count(*) from wiki_stage
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [%s], [%s], [%s], [%s], [%s], [%s],
[%s], [%s]
.
I tried to figure out what was going on so I did an explain plan:
15:47:55 SQL> explain plan for
15:48:24 2 select count(*) from wiki_stage;
Explained.
Elapsed: 00:00:00.02
15:48:31 SQL> SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE',NULL,'ALL'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
Plan hash value: 3207986117
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 275K (1)| 00:55:03 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| WIKI_STAGE | 6921K| 275K (1)| 00:55:03 |
-------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / WIKI_STAGE@SEL$1
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) COUNT(*)[22]
Note
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
- dynamic sampling used for this statement
24 rows selected.
Elapsed: 00:00:00.02
.
Hmmm, full table access.
That explains why it took so long. But why is is doing a full table access in the first place, I thought?
There is a smarter way to do the count, so why doesn’t use the Cost Based Optimizer (CBO) the index available…
The “WIKI_STAGE” table has a hidden OID column called SYS_NC_OID$ and on this column is a UNIQUE index called “SYS_C009650”. If the CBO would have used it the following would have happened:
15:14:43 SQL> select count(SYS_NC_OID$) from wiki_stage;
COUNT(SYS_NC_OID$)
------------------
6996002
Elapsed: 00:03:05.32
-- to get more info I did a "set autotrace on"
15:41:25 SQL> select count(SYS_NC_OID$) from wiki_stage;
COUNT(SYS_NC_OID$)
------------------
6996002
Elapsed: 00:00:00.36
Execution Plan
----------------------------------------------------------
Plan hash value: 1000365655
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 11596 (1)| 00:02:20 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
| 2 | INDEX FAST FULL SCAN| SYS_C009650 | 6921K| 66M| 11596 (1)| 00:02:20 |
-------------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
1233 recursive calls
0 db block gets
42748 consistent gets
0 physical reads
0 redo size
431 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
24 sorts (memory)
0 sorts (disk)
1 rows processed
-- and also tried an "explain plan for"
15:47:52 SQL> SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE',NULL,'ALL'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
Plan hash value: 1000365655
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 11596 (1)| 00:02:20 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
| 2 | INDEX FAST FULL SCAN| SYS_C009650 | 6921K| 66M| 11596 (1)| 00:02:20 |
-------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / WIKI_STAGE@SEL$1
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) COUNT("SYS_NC_OID$")[22]
2 - "SYS_NC_OID$"[RAW,16]
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
25 rows selected.
.
I saw being mentioned “dynamic sampling used for this statement” in the output and thought, OK, probably that is the issue, I forgot to sample for statistics. So I retried it but now generating statistics first.
21:48:29 SQL> exec dbms_stats.gather_schema_stats('WIKI', cascade=>TRUE);
BEGIN dbms_stats.gather_schema_stats('WIKI', cascade=>TRUE); END;
PL/SQL procedure successfully completed.
Elapsed: 01:08:50.54
09:09:57 SQL> explain plan for
09:10:00 2 select count(*) from wiki_stage;
Explained.
Elapsed: 00:00:00.05
09:11:51 SQL> SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE',NULL,'ALL'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
Plan hash value: 3207986117
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 275K (1)| 00:55:03 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| WIKI_STAGE | 6996K| 275K (1)| 00:55:03 |
-------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
PLAN_TABLE_OUTPUT
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / WIKI_STAGE@SEL$1
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) COUNT(*)[22]
20 rows selected.
Elapsed: 00:00:00.14
09:11:59 SQL> explain plan for
09:12:31 2 select count(SYS_NC_OID$) from wiki_stage;
Explained.
Elapsed: 00:00:00.02
09:12:33 SQL> SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE',NULL,'ALL'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
Plan hash value: 1000365655
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 11564 (1)| 00:02:19 |
| 1 | SORT AGGREGATE | | 1 | 17 | | |
| 2 | INDEX FAST FULL SCAN| SYS_C009650 | 6996K| 113M| 11564 (1)| 00:02:19 |
-------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
PLAN_TABLE_OUTPUT
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / WIKI_STAGE@SEL$1
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) COUNT("SYS_NC_OID$")[22]
2 - "SYS_NC_OID$"[RAW,16]
21 rows selected.
Elapsed: 00:00:00.04
09:12:35 SQL> set autotrace on
09:15:54 SQL> select count(SYS_NC_OID$) from wiki_stage;
COUNT(SYS_NC_OID$)
------------------
6996002
Elapsed: 00:00:21.34
Execution Plan
----------------------------------------------------------
Plan hash value: 1000365655
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 11564 (1)| 00:02:19 |
| 1 | SORT AGGREGATE | | 1 | 17 | | |
| 2 | INDEX FAST FULL SCAN| SYS_C009650 | 6996K| 113M| 11564 (1)| 00:02:19 |
-------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
42373 consistent gets
5207 physical reads
0 redo size
431 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
.
Nope, same effect. That is, the “select count(*) on wiki_stage” still used a full table access to do the count on the table and ignores the fact that there is an index that could be used to achieve the same.
Finding Answers via Oracle-L
While being stuck for answers, I sought help via my peers on the “Oracle-L” Freelist. As said, the discussion progressed far beyond the context of my initial discussion, but is definitely a good read regarding the internal mechanics involved. I have to read up on a Masterclass from Steve Adams I once followed, to bring me up to speed again on the context off dumping block segments, but most of it could make sense regarding an opportunity for the CBO to maximize a smart mechanism for doing a “count(*)”.
Anyway, the following applies to this situation:
Re: Performance off “count(*)”
* From: “Jonathan Lewis” [jonathan @xxxxxxxxxxxxxxxxxx]
* To: [oracle -l@xxxxxxxxxxxxx]
* Date: Fri, 18 Jul 2008 15:20:11 +0100In principle the optimizer knows that count(*) means:
“how many rows in the table” and will choose the smallest object that could return the result to do the scan. This means it may choose to do a fast-full scan of an index if at least one column of the index is declared as not null.
Depending on version, the choice between indexes of identical cost may simply be based on alphabetical ordering of the names – although number of distinct keys is supposed to have an effect as well in recent versions
Regards
Jonathan Lewis
http://jonathanlewis.wordpress.comAuthor: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.htmlThe Co-operative Oracle Users’ FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
OK, that sounded reasonable (and who am I to doubt Jonathan). 😉
A little bit later in in the discussion thread, Jonathan suggested the following:
Re: Performance off “count(*)”
* From: “Jonathan Lewis” [jonathan @xxxxxxxxxxxxxxxxxx]
* To: [oracle -l@xxxxxxxxxxxxx]
* Date: Sat, 19 Jul 2008 09:33:44 +0100Marco,
XMLType table are a version of object tables, which means they have a hidden OID column called sys_nc_oid$.
The problem seems to be that the sys_nc_oid$ column (the objects unique object id) does not get declared with a not null constraint. So the unique index on this column cannot be used for the count.
I can’t think of a mechanism that could (legally) ever allow the OID to be null, so it seems that you should be safe issuing
alter table XXX modify sys_nc_oid$ not null;
If you do, then Oracle will automatically use the index in the count(*).
I’d raise an SR to get confirmation that this is safe before doing it on live data though.
Regards
Jonathan Lewis
http://jonathanlewis.wordpress.comAuthor: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.htmlThe Co-operative Oracle Users’ FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
Unsupported Solutions to a “Count(*)”
For this database version and for this special case, being an XMLType table based on Binary XML, I can not see (yet) a reason why the CBO should not use a SYS_NC_OID$ column with the extra restriction “not null”.
While doing the unsupported (on my test database!), I got the following result:
22:57:16 SQL> alter table WIKI_STAGE
11:38:42 2 modify sys_nc_oid$ not null;
Table altered.
Elapsed: 01:07:28.80
12:46:11 SQL> exec dbms_stats.gather_schema_stats('WIKI',cascade=>TRUE);
PL/SQL procedure successfully completed.
Elapsed: 01:06:37.50
21:27:44 SQL> explain plan for
21:27:50 2 select count(*) from wiki_stage;
Explained.
Elapsed: 00:00:00.18
21:27:56 SQL> SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE',NULL,'ALL'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------
Plan hash value: 1000365655
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11108 (1)| 00:02:14 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| SYS_C009650 | 6996K| 11108 (1)| 00:02:14 |
-----------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / WIKI_STAGE@SEL$1
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) COUNT(*)[22]
20 rows selected.
Elapsed: 00:00:00.32
21:30:09 SQL> select count(*) from wiki_stage;
COUNT(*)
----------
6996002
Elapsed: 00:00:48.35
Execution Plan
----------------------------------------------------------
Plan hash value: 1000365655
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11108 (1)| 00:02:14 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| SYS_C009650 | 6996K| 11108 (1)| 00:02:14 |
-----------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
42373 consistent gets
14470 physical reads
0 redo size
421 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
.
Worked like clockwork, as it should do anyway following the theory, but then again, I have no knowledge of the internal mechanisms involved regarding Binary XML. I have also no idea if the “not null” restriction can always be applied in this (special) case. I will make a case with the XMLDB Development team, via a Service Request, and see if this could be an enhancement to the database and/or the Cost Based Optimizer.
HTH
Marco
Related Posts
Small update: the request has been honored and should be shipped with the next mayor release of the Oracle database software.