I just read Thomas Kyte's blog post "Read This", which deals with the content of a blog post by Cary Millsap. As Tom phrased it:
I liked what Cary Millsap just said:
I don’t mean “show and tell,” where someone claims he has improved performance at hundreds of customer sites by hundreds of percentage points [sic], so therefore he’s an expert.
I mean show your work, which means documenting a relevant baseline measurement, conducting a controlled experiment, documenting a second relevant measurement, and then showing your results openly and transparently so that your reader can follow along and even reproduce your test if he wants to.
This is more or less funny, because I read Cary's post, but apparently I didn't really read it… I can certainly relate to it now.
I am in the middle of setting up an XMLDB test environment to test, among other things, load times for different kinds of XMLType storage: CLOB, Object Relational and Binary XML based (with the Basicfile / Securefile options). And although I am working in a VMware environment, I noticed that it isn't that easy to set up a "controlled experiment". What makes it harder is that I am using the Mediawiki English XML dump file, which contains roughly 7 million records (17 GB of ASCII data). This makes it more interesting, and the effects clearer, but it also means everything takes much more time.
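To give an idea of what is being compared, the three storage variants boil down to something like the following sketch. The table names and the schema URL are illustrative placeholders, not my actual setup, and the Object Relational variant of course requires that the XML schema has been registered first:

```sql
-- Sketch of the three XMLType storage variants under test.
-- Table names and the schema URL are illustrative, not my actual setup.

-- 1) CLOB storage: the document is kept "as is"
CREATE TABLE wiki_clob OF XMLType
  XMLTYPE STORE AS BASICFILE CLOB;

-- 2) Object Relational storage: shredded against a registered XML schema
CREATE TABLE wiki_or OF XMLType
  XMLTYPE STORE AS OBJECT RELATIONAL
  XMLSCHEMA "http://www.mediawiki.org/xml/export-0.3/" ELEMENT "mediawiki";

-- 3) Binary XML storage (here on a SecureFile LOB)
CREATE TABLE wiki_binary OF XMLType
  XMLTYPE STORE AS SECUREFILE BINARY XML;
```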
I use an old (5 or 6 years of age) client desktop with 1.5 GB of RAM to do the client work, and a VMware environment with Oracle Enterprise Linux V5 and Oracle 11gR1 (11.1.0.6.0) as the database backend.
The first test didn't "succeed" because I had forgotten to switch off the anti-virus program that runs at night on my desktop. Although the load itself actually completed, this does not fit my sense of a "controlled experiment".
The second test didn't succeed because the two threads I used for the XML SAX loader both crashed after encountering two XML instances that were not well-formed.
The next test did succeed. I started an XML SAX loader session based on four simultaneous threads. All my XML data was loaded into an XMLType Binary XML based table built on a dedicated SMALLFILE tablespace with ASSM.
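The dedicated tablespace looked more or less like this (a sketch; name, path and sizes are illustrative):

```sql
-- Illustrative sketch of the dedicated tablespace; SEGMENT SPACE MANAGEMENT
-- AUTO is what enables ASSM.
CREATE SMALLFILE TABLESPACE binary_xml_ts
  DATAFILE '/u01/oradata/orcl/binary_xml01.dbf' SIZE 1G
  AUTOEXTEND ON NEXT 100M MAXSIZE UNLIMITED
  EXTENT MANAGEMENT LOCAL
  SEGMENT SPACE MANAGEMENT AUTO;
```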
To see whether a compressed SecureFile LOB would be smaller than the actual file on disk, I tried to compress it with the "HIGH" option. The database would not allow it: for historical reasons (I had once tried to set up a clean database environment) the installation turned out to be a Standard Edition database, which does not support SecureFile compression.
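What I was after was roughly the following (table name illustrative); Standard Edition refuses this because SecureFile compression requires Enterprise Edition with the Advanced Compression option:

```sql
-- A compressed SecureFile Binary XML table; refused on Standard Edition.
-- Table name is illustrative.
CREATE TABLE wiki_binary_comp OF XMLType
  XMLTYPE STORE AS SECUREFILE BINARY XML (COMPRESS HIGH);
```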
I thought, OK, skip this for the moment, and do a rerun of the loading process, but now into an XMLType CLOB based storage table. This didn't succeed because the loading process crashed due to a lack of space in the tablespace. Although I had set the MAXSIZE on datafile level to UNLIMITED for the SMALLFILE tablespace, I had not taken into account that the overall size needed to store the data could be more than 40 GB, while I had used only one datafile: a SMALLFILE datafile is capped at roughly 4 million blocks, which with the default 8 KB block size comes down to about 32 GB, no matter what MAXSIZE says.
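The fix would have been as simple as adding a second datafile (path and sizes illustrative):

```sql
-- A second datafile lifts the tablespace beyond the ~32 GB per-file ceiling
-- of a SMALLFILE tablespace. Path and sizes are illustrative.
ALTER TABLESPACE clob_xml_ts
  ADD DATAFILE '/u01/oradata/orcl/clob_xml02.dbf' SIZE 1G
  AUTOEXTEND ON NEXT 100M MAXSIZE UNLIMITED;
```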
One of the reasons this happens is that CLOB based XMLType storage keeps the XML document intact "as is", including all whitespace, and therefore ends up bigger than the post-parse Binary XML storage structure, even without SecureFile compression in play.
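Once both loads have finished, the footprint difference can be checked along these lines (a sketch, assuming each storage variant got its own tablespace, with the illustrative names used above):

```sql
-- Compare storage footprints per variant; assumes one tablespace per
-- storage variant, with the illustrative names used above.
SELECT tablespace_name,
       ROUND(SUM(bytes) / 1024 / 1024 / 1024, 1) AS size_gb
FROM   dba_segments
WHERE  tablespace_name IN ('CLOB_XML_TS', 'BINARY_XML_TS')
GROUP  BY tablespace_name;
```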
I then cleaned out the complete software installation, dropped all database files and structures, and started completely from scratch, installing the Enterprise Edition this time. I also tried to get a better grip on the "controlled" part and deactivated the automatic memory management of the 11gR1 database by manually setting up all the memory pool and cache structures.
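Disabling Automatic Memory Management comes down to something like this (the sizes here are illustrative, not the values I used):

```sql
-- Switch off AMM (memory_target) and ASMM (sga_target), then size the pools
-- by hand. Values are illustrative; an instance restart is needed afterwards.
ALTER SYSTEM SET memory_target        = 0    SCOPE = SPFILE;
ALTER SYSTEM SET sga_target           = 0    SCOPE = SPFILE;
ALTER SYSTEM SET shared_pool_size     = 256M SCOPE = SPFILE;
ALTER SYSTEM SET db_cache_size        = 512M SCOPE = SPFILE;
ALTER SYSTEM SET large_pool_size      = 64M  SCOPE = SPFILE;
ALTER SYSTEM SET java_pool_size       = 64M  SCOPE = SPFILE;
ALTER SYSTEM SET pga_aggregate_target = 256M SCOPE = SPFILE;
```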
The load process succeeded again, but this time it was much slower while using the same configuration for the SAX loader process: more or less twice as slow (gut feeling, I still have to check the figures) as the run on the Standard Edition database. I thought this might be caused by the fact that I used a BIGFILE tablespace this time.
Ouch.
I just did an "init 3" via the root account, getting rid of the X server CPU overhead I don't need anyway, and stopped the dbconsole from interfering with the database via "emctl stop dbconsole". So I will have to rerun this one again… Not that that is a big problem, because I wanted to do a rerun anyway, but now with "nmon" and Statspack running along during the XML data load.
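Bracketing the load with Statspack snapshots should be enough to quantify the next run (a sketch, assuming Statspack is installed under the usual PERFSTAT account):

```sql
-- Take a snapshot before and after the load, then report on the interval.
-- Assumes Statspack is installed (PERFSTAT schema).
EXEC statspack.snap;   -- before starting the XML SAX loader

-- ... run the load ...

EXEC statspack.snap;   -- after the load completes
-- Generate the report for the two snapshot ids:
-- @?/rdbms/admin/spreport.sql
```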
I realize now that setting up an environment as described by Cary Millsap at the beginning of this post is not as "easy" as I thought, if only because of the time aspect: every run takes a minimum of 12 hours with my current set-up.