Friday, September 08, 2006

At 10 AM, Dr. Candan, Prof. Sapino, and I had a meeting. We discussed possible schemes for matching between different schemas or ontologies. 
I read through some of the papers cited in our ICDE 2007 paper; most of them are related to matching.
(The ontology is assumed to be a hierarchical structure composed of nodes and edges. Each node in the hierarchy represents a concept, while edges reflect the relationships among nodes.)
When integrating different ontologies, we usually consider three cases in which matching might happen:
1) Node - Node
* If two nodes from different ontologies are matched, they could represent the same concept. In the context of our paper, they would be merged into a single node, and the relationship between their parents can be treated as exclusive-or in the resulting integrated ontology. 

2) Node - Group of Nodes
* There are several ways to define a group of nodes in the ontology (or hierarchical structure):
(1) It is an explicit set of nodes. 
(2) It is a sub-tree structure or a path segment.
(3) It is composed of the ancestors/descendants/siblings of one node.
(4) It is the result of a declarative query. 
(5) ... 

3) Group of Nodes - Group of Nodes
* The definition of a group of nodes is the same as in the previous case. 
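
The ways of defining a group of nodes listed above could be expressed as small helper functions over a tree. This is only a minimal sketch, assuming the hierarchy is stored as a child-to-parent dictionary; the sample tree and all names (PARENT, ancestors, descendants, siblings, subtree, query) are made up for illustration and do not come from our paper.

```python
# Hypothetical sample hierarchy, stored as child -> parent (the root has parent None).
PARENT = {"a": None, "b": "a", "c": "a", "d": "b", "e": "b", "f": "c"}


def children(node):
    """Direct children of a node, in sorted order."""
    return sorted(n for n, p in PARENT.items() if p == node)


def ancestors(node):
    """Group (3): all nodes from the node's parent up to the root."""
    result = []
    current = PARENT[node]
    while current is not None:
        result.append(current)
        current = PARENT[current]
    return result


def descendants(node):
    """Group (3): all nodes below the node (the node itself excluded)."""
    result = []
    stack = children(node)
    while stack:
        current = stack.pop()
        result.append(current)
        stack.extend(children(current))
    return sorted(result)


def siblings(node):
    """Group (3): the other nodes that share the node's parent."""
    return [n for n in children(PARENT[node]) if n != node]


def subtree(node):
    """Group (2): the sub-tree rooted at the node, as a set of nodes."""
    return {node} | set(descendants(node))


def query(predicate):
    """Group (4): the nodes selected by a declarative condition."""
    return {n for n in PARENT if predicate(n)}
```

An explicit set, case (1), needs no helper: it is simply a set such as {"d", "f"}. A call like query(lambda n: PARENT[n] == "a") selects the children of a.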

In the meeting, we first discussed node-based matching. 
In the figure, nodes b and f are merged, and a new node (represented by the rectangle) is introduced for them. All of their children become children of the new node, and their parents are exclusive candidates for the parent of the new node. 
Note that we have to consider the 
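
The node-based merge described above can be sketched roughly as follows. This is an illustration under my own assumptions, not the algorithm from our paper: each hierarchy is a child-to-parent dictionary, and the function name merge_nodes, the key parent_xor, and the two small sample hierarchies are hypothetical (they are not the hierarchies in the figure).

```python
def merge_nodes(parent1, parent2, n1, n2, merged):
    """Merge node n1 (hierarchy 1) with node n2 (hierarchy 2).

    Each hierarchy is a child -> parent dictionary. Returns a description
    of the new node: its name, its children, and its candidate parents.
    """
    # All children of either matched node become children of the new node.
    kids = sorted(c for c, p in parent1.items() if p == n1)
    kids += sorted(c for c, p in parent2.items() if p == n2)
    # The original parents are exclusive candidates: exactly one of them
    # is meant to end up as the parent of the new node.
    candidates = [p for p in (parent1[n1], parent2[n2]) if p is not None]
    return {"node": merged, "children": kids, "parent_xor": candidates}


# Hypothetical example: b (under a) is matched with f (under e).
h1 = {"a": None, "b": "a", "c": "b", "d": "b"}
h2 = {"e": None, "f": "e", "g": "f"}
result = merge_nodes(h1, h2, "b", "f", "bf")
```

Here the new node bf receives the children c, d, and g, and the parents a and e are kept as exclusive alternatives rather than being resolved automatically.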

Next we turned to path-based (structural) matching.
Paths p1 and p2 are matched, even though they have different structures. According to Dr. Candan, one possible way to merge the two parts is to put them together as a single structure, but the ancestor-descendant relationships among these nodes remain undetermined unless the user gives an assignment. I still don't feel comfortable with this idea, though. I need to think it over. 
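
To make the idea above concrete for myself, here is a very rough sketch. It rests on strong assumptions that we did not settle in the meeting: nodes with the same label are treated as the same node, and the user's assignment is taken to be a complete ancestor-first ordering of the merged nodes. The names merge_paths and apply_assignment are hypothetical.

```python
def merge_paths(p1, p2):
    """Put two matched paths together as one unordered group of nodes."""
    return set(p1) | set(p2)


def apply_assignment(group, order):
    """Turn a user-supplied ordering (ancestor first) into parent-child edges."""
    if set(order) != group or len(order) != len(group):
        raise ValueError("the assignment must list every merged node exactly once")
    return list(zip(order, order[1:]))


# Hypothetical paths: both run from a to c but through different nodes.
group = merge_paths(["a", "b", "c"], ["a", "x", "c"])
edges = apply_assignment(group, ["a", "b", "x", "c"])
```

Until apply_assignment is called, the merged group carries no ancestor-descendant information at all, which is exactly the part of the idea I am unsure about.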

Wednesday, September 06, 2006

In the morning, Dr. Candan, Prof. Sapino, and I had a meeting. We discussed the demo for ICDE 2007 and our upcoming research work. 
1. The demo for ICDE 2007
1) Our goal is to implement a data integration system composed of several components: import, export, matching, conflict rendering, query processing, result visualization, user interaction, etc. 
In the current (partial) implementation, we have only built a simple component that supports conflict rendering. The other parts should be addressed as well in the final version of the implementation.

2) Additionally, the eventual implementation should provide an interactive interface, so that users can work with the system comfortably.

2. The research for the SIGMOD 2007 paper

The SIGMOD 2007 deadline is Nov. 19. We will target this conference with one paper. 
(If our ICDE 2007 submission is rejected, we plan to resubmit it to WWW 2007. Thus, we could have two papers to work on at almost the same time.)

The content of this paper will be based on the paper we submitted to ICDE 2007. In that paper, we focused on data integration by resolving the value-based and structural conflicts that arise. We also built a model to support this goal. 

In the new paper, we will add a matching procedure to the existing architecture. 
The basic idea is as follows: 
0) The input should include more than one ontological structure (what's that?), each being, say, a hierarchical structure. 
1) The first step is to match these structures. As a result, one combined hierarchical structure is obtained, and the user may be able to modify the structure as needed or add some constraints. 
2) Following the algorithm we proposed in the previous paper, a model (with extensions if necessary) is constructed to represent the constraints and the combined structure. 
3) The model can be translated into a set of assertions, which serve as intermediate metadata that can be stored as XML or RDF. 
4) Based on the constructed model, query processing can be discussed, and some execution issues can be addressed as well. 
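
Step 3 above could look roughly like the following. This is only a sketch under my own assumptions: the combined structure is represented as a dictionary from each node to its list of candidate parents, and the assertion kinds ("parent", "xor-parent") and the XML element and attribute names are invented for illustration. They are not the assertion language of our paper, and real RDF would need a proper vocabulary.

```python
import xml.etree.ElementTree as ET


def to_assertions(parent_candidates):
    """Translate node -> list of candidate parents into assertion tuples."""
    assertions = []
    for node, parents in sorted(parent_candidates.items()):
        # One candidate: a plain parent assertion. Several: an exclusive choice.
        kind = "parent" if len(parents) == 1 else "xor-parent"
        for parent in parents:
            assertions.append((kind, node, parent))
    return assertions


def to_xml(assertions):
    """Serialize the assertions as a small XML document (made-up schema)."""
    root = ET.Element("assertions")
    for kind, node, parent in assertions:
        ET.SubElement(root, "assertion", kind=kind, node=node, parent=parent)
    return ET.tostring(root, encoding="unicode")


# Hypothetical combined structure: bf has two exclusive parent candidates.
combined = {"bf": ["a", "e"], "c": ["bf"]}
document = to_xml(to_assertions(combined))
```

Keeping the assertions as plain tuples before serialization would let the same list be written out as XML or as RDF later, depending on which format we settle on.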

3. Tasks for the following days:
Because our previous ICDE paper had almost nothing about matching, we have to make a general survey of matching. Following Dr. Candan's suggestion, we should read some papers and decide on a direction before we go ahead. 
We have at least two options: 
1) Integrating other people's work into ours, if it is suitable;
2) Changing our model. 
 

In the morning, I had a meeting with Dr. Candan. The main topic was my research. 
First, I showed him the simple implementation I made for the potential ICDE 2007 demo. He gave me some good suggestions:
1. Use RDF as the file format for storing important information in the application, such as the assertions, the graph, etc. 
2. It is better to display all vertices of the graph in a hierarchical layout, like a tree. For the moment, the JUNG package does not provide the relevant support. However, there are some solutions. 
For example, we could use an MST (minimum spanning tree) algorithm to construct a virtual tree from the original graph, and use this virtual tree as the input to the tree-visualization algorithm provided by JUNG. The resulting vertex positions could then be applied to the original graph to make it look tree-like.
When using the MST, we need to be careful in selecting the root of the virtual tree. (Maybe the fan-in or fan-out can be a clue!)
On the other hand, we also talked about how to display more than one sub-tree in the graph. 

3. Dr. Candan assigned me two papers, which are in the references of the CleanDB paper. 
4. He also asked me to prepare the slides for his presentation of our paper at CleanDB.
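
The spanning-tree layout idea from suggestion 2 above can be sketched as follows. This is plain Python for thinking the idea through, not JUNG's API. I assume an unweighted directed graph given as adjacency lists; since every spanning tree of an unweighted graph has the same total weight, a breadth-first spanning tree stands in for the MST here. Choosing the vertex with the largest fan-out as root is just the guess from the meeting, and all function names are hypothetical.

```python
from collections import deque


def pick_root(graph):
    """Pick the vertex with the largest fan-out (ties: smallest name)."""
    return min(graph, key=lambda v: (-len(graph[v]), v))


def spanning_tree(graph, root):
    """Breadth-first spanning tree of the vertices reachable from root.

    Returns a vertex -> list of children dictionary.
    """
    tree = {root: []}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in tree:
                tree[w] = []
                tree[v].append(w)
                queue.append(w)
    return tree


def layout(tree, root):
    """Assign (x, y) positions: y is the depth, leaves get consecutive x
    values, and each inner vertex is centered between its first and last child."""
    pos = {}
    next_x = 0

    def place(v, depth):
        nonlocal next_x
        if not tree[v]:
            pos[v] = (float(next_x), depth)
            next_x += 1
        else:
            for child in tree[v]:
                place(child, depth + 1)
            xs = [pos[c][0] for c in tree[v]]
            pos[v] = ((xs[0] + xs[-1]) / 2, depth)

    place(root, 0)
    return pos


# Hypothetical graph: the extra edge b -> c makes it a non-tree.
graph = {"a": ["b", "c"], "b": ["d", "c"], "c": [], "d": []}
root = pick_root(graph)
tree = spanning_tree(graph, root)
positions = layout(tree, root)
```

The positions are computed on the virtual tree only; the original graph would then be drawn with these coordinates, including the edges that the spanning tree left out. Vertices that cannot be reached from the chosen root are not placed by this sketch.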