Working for years with Oracle’s relational environment, XML DB is a completely different kind of beast. You’re confronted with all the languages and techniques the XML environment has to offer – XML, XML Schema, XSLT, Xpath, DTD, DOM, SAX, etc,etc,etc – plus the ones you know from Oracle’s (object) relational (database) world, SQL, PL/SQL, java, OCI, java, etc.
I am currently working for a customer, who implemented the XML DB functionality in a new system. The total amount of XML data is roundabout 200 Gb. The XML data is split-up in 4 tables, 2 tables containing an XMLType column (XML Schema based, CLOB storage) and 2 XMLType tables (XML Schema based, Object Relational storage).
Some XML documents are more than 200 pages printed data and fairly complex in their structure , sometimes as big as 3 to 4 Mb. So it’s a real challenge to get the data you want, in a decent time period. Even simple questions like “give me a count of ?” or “summarize the following numbers?” can be tricky if you are dealing with complextype constructions/elements. Usage of COUNT and SUM can be tricky in an XML environment.
The following will give you more insight in Oracle SQL/XPath solutions and the need for a clear understanding of your XML Schema’s and the solutions presented by the Oracle database.
You can use Oracle’s XMLType functions to query and/or extract data. From Oracle’s XML DB developers Guide it states:
You can query XMLType data and extract portions of it using the existsNode(), extract(), or extractValue() functions. These functions use a subset of the W3C XPath recommendation to navigate the document.
- existsNode()
- extract()
- extractValue()
The extract() function returns the node or nodes that match the XPath expression. Nodes are returned as an instance of XMLType. The results of extract() can be either a document or documentFragment.
SELECT extract(object_value,'/PurchaseOrder/Reference') "Reference"
FROM PURCHASEORDER;
Reference
------------------------------
SBELL-2002100912333601PDT
1 row selected.
The extractValue() function returns the value of the text node or attribute value that matches the supplied XPath expression. The value is returned as a SQL scalar datatype. This means that the XPath expression passed to extractValue() must uniquely identify a single text node or attribute value within the document.
SELECT extractValue(object_value,'/PurchaseOrder/Reference')
as "Reference"
FROM PURCHASEORDER;
Reference
------------------------------
SBELL-2002100912333601PDT1 row selected.
The existsNode function evaluates whether or not a given document contains a node which matches a W3C XPath expression. The existsNode() function returns true (1) if the document contains the node specified by the XPath expression supplied to the function and false (0) if it does not.
SELECT COUNT(*)
FROM PURCHASEORDER
WHERE existsNode(object_value,'/PurchaseOrder/Reference') = 1;
COUNT(*)
---------
132
1 row selected.
SELECT count(*)
FROM PURCHASEORDER
WHERE existsNode(object_value,
'/PurchaseOrder[Reference="SBELL-2002100912333601PDT"]')=1;
COUNT(*)
---------
1
1 row selected.
These examples (see “Example 3-23 Searching XML Content Using the existsNode() Function” ) from the XML DB Developers Guide can be misleading if you focus to much on the count(*) expression.
The count is only correct (in the last example) because there is an explicit “where Reference = {value}” XPath declaration in it. If you would base your count, as in the statement above, in XPath terms on “where there exists a node called reference”, then the XPath expression will probably not give you the expected result.
This is because the existsNode only validates the existence off the node, not how many nodes there are of this node in this branch. If it finds a node, it validates the condition as valid – and exits the branch.
This would give you the “expected” result for the count(*), in the example below = 2 records, if the structure would be a structure like:
SBELL-2002100912333601PDT
SBELL-2004100912389972PDT
The statement would give you an “unexpected” result for the count(*), in the example below = 1 record, containting repeating elements like a structure as described below:
SBELL-2004100912333601PDT
SBELL-2004100912389972PDT
Look at the following XML Schema structure:
Based on this schema we now create a table and insert some rows.
set lines 50
set long 10000
set trimspool on
drop table amis_xml;
create table amis_xml (data xmltype);
insert into amis_xml
values
(xmltype('
Marco
Gralike
2345
'));
insert into amis_xml
values
(xmltype('
BertJan
Meinders
3456
125
'));
insert into amis_xml
values
(xmltype('
Chris
Gralike
4567
'));
insert into amis_xml
values
(xmltype('
Karin
Kriebisch
'));
commit;
describe AMIS_XML
Name NULL? Type
------- -------- --------
DATA XMLTYPE
select t.data from AMIS_XML t;
DATA
------------------------------
Marco
Gralike
2345
BertJan
Meinders
3456
125
Chris
Gralike
4567
Karin
Kriebisch
4 rows selected.
select * from amis_xml t
where existsnode(t.data,'/PAYMENT/PERSON/SALARY')=1;
DATA
------------------------------
Marco
Gralike
2345
BertJan
Meinders
3456
125
Chris
Gralike
4567
3 rows selected.
select count(*) from amis_xml t
where existsnode(t.data,'/PAYMENT/PERSON/SALARY')=1;
COUNT(*)
---------
3
1 row selected.
Instead off the (maybe) expected 4 rows selected, existsnode only returns 3. The two salary nodes of Mr. Meinders are counted as one.
The solution to this problem is presented by the XMLSequence function.
The XMLSequence() function makes it possible to take an XMLType containing a fragment and perform SQL operations on it. It generates a collection of XMLType objects from an XMLType containing a fragment. The collection contains one XMLType for each of the root elements in the fragment. This collection of XMLType objects can then be converted into a virtual table using the SQL table() function. Converting the fragment into a virtual table makes it easier to use SQL to process the results of an extract() function that returned multiple nodes.
The examples as shown in the XML DB Developers Guide, show us examples why and when we should use XMLSequence (mark the XXX records selected).
The described structure:
SBELL-2004100912333601PDT
SBELL-2004100912389972PDT
Would give with the statement:
SELECT extract(object_value,’/PurchaseOrder/Reference’) “Reference”
FROM PURCHASEORDER;
Reference
——————————
1 record selected
SELECT extract(value(v),’/Reference’) “Reference”
FROM PURCHASEORDER t
, TABLE(XMLSequence(extract(object_value, ‘/PurchaseOrder/Reference’))) v
;
Reference
——————————
2 records selected
Knowing this, we now can re-write our SQL/XPath statements.
select extract(value(v),'/SALARY') "Salary"
from amis_xml t
, TABLE(XMLSequence(extract(t.data,'/PAYMENT/PERSON/SALARY'))) v;
Salary
------------------------------
2345
3456
125
4567
4 rows selected.
select count(*)
from amis_xml t
, TABLE(XMLSequence(extract(t.data,'/PAYMENT/PERSON/SALARY'))) v;
COUNT(*)
----------
4
1 row selected.
select extractvalue(value(v),'/SALARY') "Salary"
from amis_xml t
, TABLE(XMLSequence(extract(t.data,'/PAYMENT/PERSON/SALARY'))) v;
Salary
-------
2345
3456
125
4567
4 rows selected.
select sum(extractvalue(value(v),'/SALARY')) "Total"
from amis_xml t
, TABLE(XMLSequence(extract(t.data,'/PAYMENT/PERSON/SALARY'))) v;
Total
------
10493
1 row selected.
The next release of the Oracle database, version 10g version 2, will hopefully present more flexible tools like XQuery and or better XPath possibilities like:
select extract(t.data,'count(/PAYMENT/PERSON/SALARY)')
from amis_xml t ;
ERROR:
ORA-31012: Given XPATH expression not supported
good article…thanks!!
Be aware that time passes on. From of database version 10.2.x you should use the XMLTABLE syntax instead of the table(xmlsequence(extract())) construct.
Hi,
I have a problem using XML. I have a following XML
Now I want to parse the xml with PL/sql and store the value of client and message_type in two variable like v_client and v_msg_typ.
Could you please help me sending the code of that?
Thanks in advance
Mithun
I know the Oracle OTN Forums experiencing some problems, but don’t you think you will be more succesfull there ( http://forums.oracle.com/forums/forum.jspa?forumID=34 ) solving your problem?
My site scrambled your XML…so I have nothing to go on…
Grz
Marco
Good article, it helped me. Thanks!
select extract(x,’/Product/qty/no/text()’).getStringVal() into p from dual;
and
select extract(x,’/Product/qty/no/text()’).getStringVal() into p from dual where existsNode(x,’/Product/qty[no=”2″]’) = 1;
returns–> 12
Please let me know where I am going wrong
My XML:
——
1
2
My XML used in the above comment is
’12’
Use TABLE(XMLSequence(extract())) in the FROM clause to unnest the “strings”
Hi there,
When I try with “select extractvalue” it returns “ORA-19025. EXTRACTVALUE returns value of only one node” to me. Do you know why it happens?
Anyway, thanks a lot for the usefull information!
EXTRACTVALUE can only return a scalar value. To find out where you get XML fragments or multiple values temporarily replace the EXTRACTVALUE with EXTRACT…
Hello!
Thank you a lot for your article! It was exactly what I was looking for!
Although, I do have a problem there. By using
SELECT count(‘/*/row’)
FROM SAP_XML t , TABLE(XMLSequence(extract(t.xml_file,’/*/row’))) v
where …;
I always get “12” as return. I use the 11g Version. I understand I have to “use the XMLTABLE syntax instead of the table(xmlsequence(extract())) construct” but how?
Thank you for your time in advance!
i can’t understand this expresion i hope any one explain it
SELECT COUNT(*) FROM amis_xml t
WHERE existsnode(t.DATA,’/PAYMENT/PERSON/SALARY’)=1;
COUNT(*)
———
3
1 ROW selected.
Find all nodes where exists a node SALARY under /PAYMENT/PERSON which is not without value/empty and then count the results.