XML Data – To be stored or not to be stored, and beyond…

This is a backup of my post, with the same title, on the AMIS Technology Blog site

It is nagging…

I got an email from that you are able to download the handouts from ODTUG until half September (download them now, 1/2 September they will be archived and you will need an account to access them). Because I am highly interested on XML / XMLDB stuff, I scrolled through the list. My eye caught the presentation of Mr. Ken Atkins. As far as I can see from his picture on his website he looks like a nice, cool guy, but I can’t resist making some remarks here about the slides from his presentation: “XML Data Into and Out of Oracle – Using PL/SQL”. Making these remarks is difficult, I wasn’t there, I don’t know the exact context in which they are said in, but…

Anyway. Hereby some of my pointers, based on this presentation…

XML is not Relational

All in all it looks like a fairly decent presentation. The remark that caught my sight, and I don’t agree with, are the remarks pointed out after: “XML is *NOT* a good format for data access!” (repeated on slide 11 and 12). As far as I can decipher from the presentation, this is the basis of his way of dealing with XML data.

XML is *NOT* a good format for data access!

  • If you need to access the data, store it in relational tables!
  • XML is not really useful as a database (in spite of what many seem to think!)

After reading the first presentation slides, I discovered a while ago on the net, from Mr. Atkins, the ones from his presentations on OPP 2007, I can only guess that he shifted his general opinion, about how to handle XML. Trying to avoid XMLType storage. Being a “relational guy”, brought up with relational models and Chris Date et all, I more or less can agree, but having to deal XML has nothing to do with “relational data”.

The XML world is a different one compared to the Relational world. I my opinion (spoken from a relational viewpoint), XML data is “not data”, it is unstructured and should be transformed to Relational data. If you want to see it as “XML data” then you should deal with it from a XML viewpoint and therefore store and use it as is… XML.

If you store XML into relational tables, you will lose information: information about (at least) the relations between the entities…(if not while inserting, you will loose information retrieving the data). The other way can also be true: adding information, or implying relations, while storing it relational, which was not there in the first place when the XML data was stored.

Think about all the problems if dealing with relational topics like nullology, or the time aspect in data warehousing (what dates can I and can’t I compare to result in meaningful data), to get my drift. At current state the two, the relational model and XML, do not mix. Handling relational data has been refined over more then 30 years (and is still being refined), on the contrary to XML…

Store XML as XML

So from my viewpoint, treat XML as XML and therefore, if it has to be stored, use the Oracle XMLDB storage options (XMLType, XMLType Table) and not relational tables. Store XML as XML, preferably do this using Oracle 10 R2 or later database versions. From this point on, IMHO, the XMLDB storage functionality has enough options and flexibility to store XML as required by the business AND to delivery this data fast without loosing information, but…if (and only if) you do it the correct way… Presenting relational stored data as XML, for example via XMLType views or via an object orientated way (or via …), to the outside world is all about handling structured data a different way and therefore should be designed carefully without loosing to much information.

“How can I retrieve data in fast efficient way from an Oracle database when it is stored in a table?”, “Which factors make a difference?”. Everyone knows these are not easy questions to answer in a relational database setting like Oracle. It often depends. Handling XML data, being “unstructured data”, is even more challenging.

Logical and Physical

Using the datatype XMLType will keep the logical XML structure in tact, depending on the needs.

Regarding the physical nature, XMLType CLOB storage will keep the structure intact, XMLType Object Relational and Binary storage will shred (loosing whitespace “information”) the XML data, but keeps the rest of the XML intact. If data can be accessed fast or not, depends (as always) on correct applied structure and access paths, for example, via good use of using indexes. Oracle XMLDB functionality has the ability to do this; Fast. I dare to say, in some situations faster then the methods shown / demonstrated in the slides by Mr. Atkins.

If applied XMLType datatypes correctly with the proper physical structure, for example, a XMLType Column (CLOB storage) used with an XMLIndex (Oracle 11g) or an XMLType table (Object Relational storage) manually created with balanced tree indexes on the proper places (/nodes) as can be realized in Oracle 10g (preferably 10gR2), you WILL get “a good format for data access”. It will be fast and a flexible solution, the best you can get in the XML world, existing in a relational (database) world.

While using XMLType storage methods, you are also able to use all the advantages and improvements, based on standards (or drafts), like XML evolution, the repository, the protocol server, events, binary xml storage, de-duplication.

XMLType data has been matured enough to be able to function as transactional data, can be used in cases of highly queried data or in large amounts (slide 12). I have been there, I have done it, It works. On the other hand it also “depends”, as in: “you should not mix OLTP and data warehousing in the same database” (you will have to compromise on performance on one or the other).

A BIG amount of data in XML, is something different than speaking about a big of data in the relational world.

Do NOT kid yourself – I am still not speaking about relational modeling of data – I deal with XML data as is, with all its limitations (and advantages). It has no 30+ years history yet of being redesigned / redefined. As always, good design is the first step to success.

The Use of Resources

If people make use of the Document Object Models (DOM), etc, you should take into account that this could cause or implies validation, parsing and in memory processing and memory consumption… (the performance differences between using SQL/XML, XMLGEN, DOM – Slide 49). This can consume a lot of memory (and CPU). With XMLType storage, I am able to take this out of the equation. Sometimes we just don’t need validation, for example, when the data was entered via a web form (check here, avoid a check in the database).

This will mean gaining speed regarding data retrieval and insertion. Let Oracle optimize the shredding (via XMLType storage) and don’t invent your own method. Slide 140 even supports my viewpoint regarding performance, even (and I am not sure) if Mr. Atkins is referring to XMLType views in “XMLType – Parse Only” contrary to XMLType storage. The more it has to be validated or parsed, the more processing power (and memory) this will need (the difference show in slide 93).

Before you make remarks like: “Hey! And what about binary (OR shredding / ) processing”. Yep that can be also be the downside of using XMLType storage (compared to relational), but believe me, if applied properly, it will be really fast (relational database like fast) AND you won’t have to store it the relation way (which it isn’t in the first place). Store XML data as XML and relational data as relational data. If you can’t live with the current build in limitations of XML then just don’t use XML. Also instead of you doing the “code maintenance” (aka your defined programmatically shredding), let this be done via the Storage Model or via the use of an XML Schema.

Native, Relational and Beyond

“Why store XML in an Oracle / Relational database, in the first place?”, you might ask. “Why not using a “native” XML database?” Being not experienced enough with native XML databases, I am careful regarding commenting on this. Being a relational educated guy and reasoning from a relational viewpoint, I guess, I dare to say something about storing XML in a relational database (being aware that I continue on a slippery slope). Despite its XML shortcomings, wouldn’t it be nice to being able to imply (some of) the relational methods defined over last 30 year on handling XML and solving this way problems handling XML data. I can see object relational storage as a first step in this direction. SimpleTypes becoming my columns and GlobalTypes being my table definitions… Relations defined by…? Hmmm, to blasphemous? Anyway…

Existing in an Oracle relational database has also has it advantages. Oracle already supports an awful lot of W3C XML standards in the database, but you are not only bound to those W3C XML standards. You are able to use functionality that has been there in the relational database for ages, like using Partitioning, Advanced Queuing, Compression or Bitmap indexes. Despite the shortcomings of XML, you have also the tools at hand to solve problems “outside the box”, for instance, via using Java, PL/SQL, C(++) or .Net, and IMHO this is an advantage…

That said…

Here also some small remarks, for instance, on slide 23 from the handout of Mr. Atkins. The extract function has been often used as a way to pretty print the XML output. This was apparently something that wasn’t intended. This pretty print extract functionality will be deprecated in 11g. Another limitation lifted in Oracle 11g, is the 64k limit (slide 49). The advantages of being a beta participant…

My conslusion

Treat XML as XML and Relational data as Relational data. Be very careful in what you are doing, as said IMHO, while converting from or to XML or relational can result in incorrect or the loss of information. Let Oracle do what it is best at. Don’t re-invent “shredding” while being natively implemented.

Who knows I will be proven otherwise (via commenting on the post). I think handling XML, dealing with XML, storing XML are a difficult topics. Most issues are introduced by just using XML, so this is still something I have to figure out for myself. Who knows, I will make my ramblings public and post them here, if I filtered out some details out of my own thinking processes regarding: “why you should use XML in the first place”.

All together, I hope, I at least presented here some food of thought.

😎

Marco Gralike Written by: