After creating the initial structures to scrape some JSON content of Wikipedia, as mentioned in “HOWTO: Building a JSON Database API (1)“, it is now time to create some views and accompanying PL/SQL code.
Using JSON_TABLE…
Although you would be able to see the JSON content if it had been stored in a CLOB column, it is not that easy to do via a BLOB column. In this case, as a simple workaround, we can use the JSON_TABLE function to make all JSON BLOB content visible up to the setting for the MAX_STRING_SIZE database parameter in Oracle 12c.
--
CREATE OR REPLACE FORCE VIEW SCRAPED_JSON_4000_MAX
as
select sc.id,
jt.json_content
from json_scraped_content sc
, JSON_TABLE(sc.json FORMAT JSON,
'$'
NULL ON ERROR
COLUMNS (json_content VARCHAR2(4000)
FORMAT JSON
WITH CONDITIONAL WRAPPER
PATH '$'
)) jt;
--
or
--
CREATE OR REPLACE FORCE VIEW SCRAPED_JSON_4000_MAX
as
select sc.id,
jt.json_content
from json_scraped_content sc
, JSON_TABLE(sc.json FORMAT JSON,
'$'
COLUMNS (json_content VARCHAR2(4000)
FORMAT JSON
WITH CONDITIONAL WRAPPER
PATH '$'
NULL ON ERROR
)) jt;
--
FORMAT JSON
First of all column JSON of table JSON_SCRAPED_CONTENT is a BLOB datatype. To signal this to the JSON_TABLE function you will have to add “FORMAT JSON” just after the column call and before the JSON Path Expression…
, JSON_TABLE(sc.json FORMAT JSON, ...
ERROR CLAUSE
Then, when the JSON content is bigger than 4000 characters in the varchar2(4000) column (or 32K for the new maximum size), we return NULL when it is bigger via the “ON ERROR” clause. The JSON_TABLE error clause, “NULL ON ERROR”, comes after the JSON Path Expression, as in the first example of the statement.
, JSON_TABLE(sc.json FORMAT JSON,
'$'
NULL ON ERROR
COLUMNS ...
The same goes for the second example of the statement, but in such a case it will depend what happens in the COLUMNS section, that is if you use a JSON_VALUE, JSON_QUERY or JSON_EXISTS, which might have different values or syntactically different, then the JSON_TABLE function. A brief overview in the documentation can be found here: “Error Clause for Oracle SQL Functions for JSON“
During testing your statement, it is very useful to always add the “ERROR ON ERROR” clause. The default is “NULL ON ERROR” which means even if something goes wrong (and not intended to be so), the function will not signal this, but always returns NULL.
In the mentioned use case, attempt via a simple JSON_TABLE construct to create a view which enables us to show some of the JSON content in the BLOB column, this is okay for now.
WRAPPER CLAUSE
The default representation of the result value is “as is”, also called “WITHOUT WRAPPER”. If by accident you needed JSON as a result or, for example, the output is not the expected “scalar” value, then you will get an error notifying the problem. You can circumvent this with the WRAPPER CLAUSE.
FORMAT JSON
WITH CONDITIONAL WRAPPER
PATH '$'
Have a look at the following JSON Examples from the manual
So for our view called SCRAPED_JSON_4000_MAX, to be on the save side, I used the WITH CONDITIONAL WRAPPER clause. This keeps JSON content “as is” and if it signals an error, that is being not valid JSON, the wrapper clause will create valid JSON content by embedding it in square brackets / creating a JSON array.
JSON_TABLE
In all I like the JSON_TABLE function. It reminds me bit of the XMLTABLE functionality; their is way more under the covers, then a first look will reveal. JSON_QUERY, JSON_VALUE, etc. can be used within the JSON_TABLE function which makes this a very powerful function. Given. The syntax might become more complex due to this as well.
The next step was to create a view with “show me the parent value”, “the child value”, “if there is a warning” and how the content looks if I escape the URL control parameters ( &, |, ?, \ ), with their HTML encoded values…
As a reminder, this is the content we are working with (forcing an error message by removing the ‘rawcontinue‘ parameter from the URL):
My current JSON_TABLE statement for view SCRAPED_CONTENT_CONTINUE is defined as:
--
CREATE OR REPLACE FORCE VIEW SCRAPED_CONTENT_CONTINUE
AS
SELECT id AS pkid ,
ai.rno ,
ai.parent_name ,
ai.child_name ,
REPLACE(REPLACE(REPLACE(REPLACE(ai.child_name,'\',''),'&','%26'),'|','%7C'),'?','%3F') AS child_name_escaped ,
ai.warning AS url_construct_warning
FROM json_scraped_content,
JSON_TABLE(json FORMAT JSON,
'$'
NULL ON ERROR
COLUMNS
( rno FOR ordinality ,
parent_name VARCHAR2(4000) PATH '$.query.allimages[*].name' ,
child_name VARCHAR2(4000) PATH '$."query-continue".allimages.aicontinue' ,
warning VARCHAR2(4000) PATH '$.warnings.main.*')
) AS ai;
--
There might be more URL control parameter I not yet know of, and or I have the content wrong, therefore I made the choice to create a view with a column to determine the actual value for the “next image” instead of initially try to get everything right in the PL/SQL package code.
You might notice the double quotes in “query-continue”
child_name VARCHAR2(4000) PATH '$."query-continue".allimages.aicontinue'
Currently using REPLACE as a fast solution although a Regular Expression might be a better one. The FOR ORDINALITY clause which generates row numbers, has been added by me, in this test case, to see if there are unexpected multiple rows. Just like adding the additional ERROR ON ERROR – clause during testing/development, to check on unforeseen “logical” data corruption (content is man made after all)…
Apparently IÂ accidentally discovered a JSON parser behavior inconsistency, that I nor the development team was aware of… Always happy to improve the product, but also a good example why working with man-made-live content is way more rewarding.
If you are with me this far…
the following is almost the full statement to see the JSON content in relational format via a crafted view called “SCRAPED_CONTENT”.
--
CREATE OR REPLACE FORCE VIEW SCRAPED_CONTENT
AS
SELECT id as pkid
, ai.rno
, ai.name
, ai.source_timestamp
, ai.username
, ai.picture_size
, ai.picture_width
, ai.picture_height
, ai.picture_comment
, ai.canonicaltitle
, ai.url
, ai.descriptionurl
, ai.metadata
, ai.commonmetadata
, ai.extmetadata
, ai.sha1
, ai.mime
, ai.mediatype
, ai.bitdepth
, ai.ns
, ai.title
FROM json_scraped_content,
JSON_TABLE(json FORMAT JSON,
'$.query'
NULL ON ERROR
COLUMNS
( rno for ordinality
, name VARCHAR2(4000) PATH '$.allimages[*].name'
, source_timestamp VARCHAR2(4000) PATH '$.allimages[*].timestamp'
, username VARCHAR2(4000) PATH '$.allimages[*].user'
, picture_size VARCHAR2(4000) PATH '$.allimages[*].size'
, picture_width VARCHAR2(4000) PATH '$.allimages[*].width'
, picture_height VARCHAR2(4000) PATH '$.allimages[*].height'
, picture_comment VARCHAR2(4000) PATH '$.allimages[*].comment'
, canonicaltitle VARCHAR2(4000) PATH '$.allimages[*].canonicaltitle'
, url VARCHAR2(4000) PATH '$.allimages[*].url'
, descriptionurl VARCHAR2(4000) PATH '$.allimages[*].descriptionurl'
, metadata VARCHAR2(4000) FORMAT JSON WITH CONDITIONAL WRAPPER PATH '$.allimages[*].metadata'
, commonmetadata VARCHAR2(4000) FORMAT JSON WITH CONDITIONAL WRAPPER PATH '$.allimages[*].commonmetadata'
, extmetadata VARCHAR2(4000) FORMAT JSON WITH CONDITIONAL WRAPPER PATH '$.allimages[*].extmetadata'
, sha1 VARCHAR2(4000) PATH '$.allimages[*].sha1'
, mime VARCHAR2(4000) PATH '$.allimages[*].mime'
, mediatype VARCHAR2(4000) PATH '$.allimages[*].mediatype'
, bitdepth VARCHAR2(4000) PATH '$.allimages[*].bitdepth'
, ns VARCHAR2(4000) PATH '$.allimages[*].ns'
, title VARCHAR2(4000) PATH '$.allimages[*].title'
)
) AS ai;
--
Mark the “FORMAT JSON” again (BLOB column in use). If using a CLOB or VARCHAR2 etc. column, this is not additionally needed.
FROM json_scraped_content,
JSON_TABLE(json FORMAT JSON,
You can dig deeper in the JSON content, for example via the EXTMETADATA bits and pieces,
--
CREATE OR REPLACE FORCE VIEW SCRAPED_JSON_EXTMETADATA
AS
SELECT sc.id as pkid
, ai.DateTime
, ai.ObjectName
, ai.CommonsMetadataExtension
, ai.Categories
, ai.Assessments
, ai.ImageDescription
, ai.Artist
, ai.Credit
, ai.LicenseShortName
, ai.UsageTerms
, ai.Attribution
, ai.LicenseUrl
, ai.Copyrighted
FROM json_scraped_content sc,
JSON_TABLE(sc.json FORMAT JSON,
'$.query.allimages[*].extmetadata'
NULL ON ERROR
COLUMNS
( rno for ordinality
, DateTime VARCHAR2(4000) PATH '$.DateTime.value'
, ObjectName VARCHAR2(4000) PATH '$.ObjectName.value'
, CommonsMetadataExtension VARCHAR2(4000) PATH '$.CommonsMetadataExtension.value'
, Categories VARCHAR2(4000) PATH '$.Categories.value'
, Assessments VARCHAR2(4000) PATH '$.Assessments.value'
, ImageDescription VARCHAR2(4000) PATH '$.ImageDescription.value'
, Artist VARCHAR2(4000) PATH '$.Artist.value'
, Credit VARCHAR2(4000) PATH '$.Credit.value'
, LicenseShortName VARCHAR2(4000) PATH '$.LicenseShortName.value'
, UsageTerms VARCHAR2(4000) PATH '$.UsageTerms.value'
, Attribution VARCHAR2(4000) PATH '$.Attribution.value'
, LicenseUrl VARCHAR2(4000) PATH '$.LicenseUrl.value'
, Copyrighted VARCHAR2(4000) PATH '$.Copyrighted.value'
)
--
or create master-detail sections or if you want, or create code making use of JSON nested-column sections.
Marc Bleron wrote a nice post about this functionality in his post called “JSON_TABLE chaining“.
The last post of this series, will finish up with showing the final outcome/result, security stuff and PL/SQL code.
You can use UTL_URL API to escape/unescape url reserved characters :
http://docs.oracle.com/database/121/ARPLS/u_url.htm#ARPLS073
Thanks Marc.
Didn’t know about that one yet and, indeed, this would remove the ugly replace/replace/etc code in my example with a more neat call based on UTL_URL.
M.