This post will show you some of the first numbers I collected regarding “Loading XML data”, while making use of different XMLType “physical storage containers”.
I also have done some initial testing with Object Relational XMLType storage, but because this method of storage has many options and extra features, I won’t describe them yet here. This topic is interesting enough to earn its on post.
If you need some background on Oracle XMLType Storage option than have a read through the “Binary-, CLOB, Object Relational Storage” Category option in the menu, the Oracle XMLDB Developers Manual or a short intro via Oracle 11g – XMLType Storage Options.
After having created an environment as described in “XMLDB Performance: Environment, Set-up, Procedure“, the following results were gathered by me while keeping values constant, for example the values for “connection.xml“, as described in the “XMLDB Performance: Environment, Set-up, Procedure” . Only the WIKI_STAGE create statements are different.
My Goal
My goal here is to see how these XMLType storage structures perform, in this case, what happens while loading XML data. Another is issue is finding out how to create them and can I deduct some extra “concepts” or rules of numb out of these testing procedures. While setting the test environment up and loading the data I noticed that there are some issues, for example, regarding altering structures or counting the data (XMLDB Performance: The Side Effects of a Simple “count(*)”).
During this learning process, beside insight in, I also had some new questions that have to be answered to understand the mechanics involved. One of this questions I don’t have an answered yet for is “What are the internal differences in bookkeeping between a Smallfile and Bigfile tablespace, besides ROWID structures” (and what is the effect on XMLType storage structures).
Create Table Statements
In this first report, regarding my loading XML data test results, I compared the following structures:
- Binary XMLType Table using Basicfile storage
- Binary XMLType Table using Securefile, not compressed, storage
- Binary XMLType Table using Securefile, compressed high, storage
The results shown were measured while using the following SQL settings for create table WIKI_STAGE
Binary XML Basicfile (XMLType Table)
The following create WIKI_STAGE create table statement was used:
select dbms_metadata.get_ddl('TABLE','WIKI_STAGE','WIKI') from dual;
DBMS_METADATA.GET_DDL('TABLE','WIKI_STAGE','WIKI')
--------------------------------------------------------------------------------
CREATE TABLE "WIKI"."WIKI_STAGE" OF "SYS"."XMLTYPE"
OIDINDEX ( PCTFREE 10 INITRANS 2 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "MEDIAWIKI_STAGE" )
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS1 BUFFER_POOL DEFAULT)
TABLESPACE "MEDIAWIKI_STAGE"
XMLTYPE COLUMN "SYS_NC_ROWINFO$" STORE AS BASICFILE BINARY XML
(TABLESPACE "MEDIAWIKI_STAGE" ENABLE STORAGE IN ROW CHUNK 8192
NOCACHE LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT))
ALLOW NONSCHEMA DISALLOW ANYSCHEMA
.
Binary XML Securefile (XMLType Table) – NO COMPRESS
The following create WIKI_STAGE create table statement was used:
select dbms_metadata.get_ddl('TABLE','WIKI_STAGE','WIKI') from dual;
DBMS_METADATA.GET_DDL('TABLE','WIKI_STAGE','WIKI')
--------------------------------------------------------------------------------
CREATE TABLE "WIKI"."WIKI_STAGE" OF "SYS"."XMLTYPE"
OIDINDEX ( PCTFREE 10 INITRANS 2 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "MEDIAWIKI_STAGE" )
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "MEDIAWIKI_STAGE"
XMLTYPE COLUMN "SYS_NC_ROWINFO$" STORE AS SECUREFILE BINARY XML
(TABLESPACE "MEDIAWIKI_STAGE" ENABLE STORAGE IN ROW CHUNK 8192
NOCACHE LOGGING DECRYPT NOCOMPRESS KEEP_DUPLICATES
STORAGE(INITIAL 106496 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 BUFFER_POOL DEFAULT))
ALLOW NONSCHEMA DISALLOW ANYSCHEMA
.
Binary XML Securefile (XMLType Table) – COMPRESS HIGH
The following create WIKI_STAGE create table statement was used:
select dbms_metadata.get_ddl('TABLE','WIKI_STAGE','WIKI') from dual;
DBMS_METADATA.GET_DDL('TABLE','WIKI_STAGE','WIKI')
--------------------------------------------------------------------------------
CREATE TABLE "WIKI"."WIKI_STAGE" OF "SYS"."XMLTYPE"
OIDINDEX ( PCTFREE 10 INITRANS 2 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "MEDIAWIKI_STAGE" )
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "MEDIAWIKI_STAGE"
XMLTYPE COLUMN "SYS_NC_ROWINFO$" STORE AS SECUREFILE BINARY XML (
TABLESPACE "MEDIAWIKI_STAGE" ENABLE STORAGE IN ROW CHUNK 8192
NOCACHE LOGGING DECRYPT COMPRESS HIGH KEEP_DUPLICATES
STORAGE(INITIAL 106496 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 BUFFER_POOL DEFAULT))
ALLOW NONSCHEMA DISALLOW ANYSCHEMA
.
Segment Findings
All these storage structures, these binary XMLType tables, contain:
- A table segment
- An index segment
- A LOB segment
- A LOB index segment
While using the create table statements above, this resulted in the following used amount of bytes per segment type after 6.996.742 Mediawiki XML records where loaded.
Table Segments
Click on the picture to enlarge
Index Segments
Click on the picture to enlarge
LOB Segments
Click on the picture to enlarge
LOB Index Segments
Click on the picture to enlarge
Size Totals
Would you compare the original size on disk of these 6.996.742 Mediawiki XML files on Windows NTFS against the amount of bytes used in the database, then you would could see the following differences.
Totals Overview – Segments
Click on the picture to enlarge
and while looking at file size alone…
Totals Overview – Size on Disk, Size in the Database
Click on the picture to enlarge
A simple conclusion, while looking at the graphs, is that “standard” XML storage using Basicfile or Securefile CLOB storages are bigger that the XML size on Windows disk (while using NTFS). The other conclusion in this case is that you will regain 4.2 Gigabyte while compressing it in Securefile Binary XML format, compared to the Mediawiki XML Windows flat file.
To my surprise, I am now aware of the fact that a “Securefile” structure in Oracle 11.1.0.6.0 can be bigger that its alternative the “Basicfile” structure.
I also encountered a bug. A bug I had completely forgotten. One that I had encountered and reported during the end of beta testing of Oracle version 11.1.0.6.0 in 2007. My bug number 6364855, relates to “LOB SIZE USING SECUREFILE IS VERY LARGE”.
See the following alteration statement:
SQL> alter table wiki_stage modify lob (xmldata) (compress high)
Table altered
.
The symptoms are that if you compress or de-compress Securefile storage, the internal storage size increases, despite using “compress”. The bug was very quickly solved and will be packaged in patch version 11.1.0.7.0. So if needed, ask for a back port for Oracle database version 11.1.0.6.0.
Duration
An interesting comparison is, of course, also: “How much time was needed the load the XML data?”. Are there consequences while using one or the other storage model? And there are, as expected…
Click on the picture to enlarge
Conclusions?
What is new for me is that there seems to be a small load time trade off between the old “Basicfile” structure (the structure also used before 11g) and the new Securefile ( no compress) structure in the 11.1.0.6.0 (Oracle Enterprise Edition 32 bit for Linux), which I can’t explain. I did not yet trace / count, for instance, database server CPU time. It could be that storing data, while making us off Securefile structures in general, would consume more CPU cycles, but because I didn’t trace on this, also can’t prove this… As said, just a gut feeling.
The other differences can be explained because the can be explained via the information given in the XMLDB Developers Guide and Storage Guide
XMLType storage will be checked on XML Well-Formedness (this is default behavior) . This takes time (Memory and CPU cycle consumption)
Binary XML is a post parse storage model, this means that XML tags and information, in labor terms, will be cut in to pieces, sorted, ordered and categorized regarding “same elements”, “the same information”, usage, etc. Also the XML data will be stripped from unnecessary whitespace and the tree structure, meta data info will be optimized regarding storage, its accessibility and maintenance. All this information will used when the information is requested again and has to be reconstructed into its original form (with the exception regarding whitespace). The internal mechanism regarding the “How” is only known to Oracle.
Compressing (data) takes extra time, most of it needed for CPU processing. This explains the difference between the Securefile storage via “NOCOMPRESS” and the Securefile storage via “COMPRESS HIGH”.
Other conclusions…?
Securefile storage is smaller, so as advertised, while using compression, in the Oracle Enterprise Edition version 11.1.0.6.0. I hope that patch 11.1.0.7.0 will be released this year so I can re-test some of the features for Securefile. One of my questions, still unanswered, is “How much time is needed to “compress high” a Securefile LOB segment, using a default Securefile LOB segment (NOCOMPRESS) as a starting point”. Some internal information about Basicfile and Securefile Binary XML would be useful and could be used to explain the differences between seen in table, lob segment and lob index segments.
There are still some questions I didn’t answer yet that are related to the described storage models here, such as “What happens if the Binary XML storage model depends on XML Schemata” or “What happens if the Binary XML storage uses features like ‘de-duplication’ or different ‘schema controlled encoding’ “…
What I have described here wasn’t “exact” science. Uncontrolled functionality like AWR mechanisms were still enabled and in place, but I think some comparisons can be made and give a “statement of direction”. On the other hand, just doing it (setup, testing, loading data, thinking about “what does this figure mean…”), made me a lot wiser…
Although I believe that I only scratched the surface of some of the internals, dealing with XML storage etc., I hope that people can use this information to their advantage.
😎
Related Posts
Hi Marco!
Very interesting comparison! Looks like high compression brings quite some space savings!
Could you add the queries you used to get the segments and database sizes? Thanks!
Thomas
There is a dedicated presentation dealing with securefile during Oracle Open World (if you are going). Search in the scheduler on keyword “securefile”.
Scripting, segments:
If appropriate for detailed LOB info, I used: