Tuesday, January 17, 2006

A starting point for the research on Null Values in XML documents

In the morning, I went to Prof. Kintigh's office. We discussed some problems in the implementation. Everything looks OK so far. We will prepare for the workshop next week. I think I had better attend it.
At noon, I had a meeting with Dr. Candan about our research on Null Values in XML documents. The discussion made the basic problems and some potential solutions a little clearer. I don't believe these ideas are mature yet, because I still have some questions, but at least it's a start. The following are some conclusions we reached in the short meeting.

  1. One direction of the research is to convert XML documents into a relational database and then use suitable marks to represent the Null Values from the XML documents.
  2. We could represent each node in an XML tree as a 3-tuple (node-id, tag, parent-id):

1) node-id is an identifier for each node in the XML document. It may be a simple number without any meaning, which is easily scalable. A range or path code containing structural information about the XML tree could also be used, but it is costly to maintain when nodes are added or deleted. Alternatively, node-id might be replaced by a query (such as an XQuery expression) that yields a set of node-ids, or by one of the marks that capture different senses of Null Values.

2) tag describes the value contained in the node. In most cases it is a string, but it could also be one of the marks defined in Dr. Candan's report, A Unified Treatment of Null Values using Constraints.

3) parent-id identifies the parent of the node; it carries the structural information in the XML document related to that node. As far as we know, a constraint composed of Boolean operators and XQuery expressions might be used to represent a Null Value occurring at this point. The open questions are what to define and how to define it. We should be careful not to make things too complex.
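A minimal sketch of the 3-tuple representation in item 2, under several assumptions not fixed in the discussion: it uses Python's standard xml.etree.ElementTree parser, assigns plain sequential integers as node-ids (the simplest option in 1)), stores the element name as the tag, and uses None as the root's parent-id. The function name shred is hypothetical. It does not model Null Value marks or the query- and constraint-based alternatives mentioned above.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# One row per XML element: (node_id, tag, parent_id).
NodeTuple = Tuple[int, str, Optional[int]]


def shred(xml_text: str) -> List[NodeTuple]:
    """Flatten an XML document into (node_id, tag, parent_id) tuples.

    Hypothetical helper: node_id is a meaningless sequential integer
    assigned in document (pre-order) order, tag is the element name,
    and the root's parent_id is None (an assumed stand-in, not a
    Null Value mark from the report).
    """
    root = ET.fromstring(xml_text)
    rows: List[NodeTuple] = []
    next_id = 0

    def visit(elem: ET.Element, parent_id: Optional[int]) -> None:
        nonlocal next_id
        node_id = next_id
        next_id += 1
        rows.append((node_id, elem.tag, parent_id))
        # Iterating over an Element yields its child elements in order.
        for child in elem:
            visit(child, node_id)

    visit(root, None)
    return rows


if __name__ == "__main__":
    doc = "<book><title>XML</title><author/><author>Smith</author></book>"
    for row in shred(doc):
        print(row)
```

Because these ids carry no structural information beyond the parent link, adding a node only requires a fresh id, which is why this option scales more easily than a range or path code.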

Honestly, we don't think this is the final version of the node representation. Moreover, the current definition of Null Value in XML documents is not yet clear enough.


From points 1 and 2, we have the following concerns:
  1. What is the definition of a Null Value in an XML document?
  2. How should Null Values be represented in XML documents?
  3. Is it necessary to convert an XML document containing Null Values into a relational database in order to process those Null Values?
  4. If the answer to 3 is yes, how should the conversion be done, and how does it affect XML operations?
  5. If the answer to 3 is no, how should Null Values be handled, and how can we give a formal representation that makes Null Values in XML documents easy to process?
