Wednesday, December 06, 2006

In the morning, we had a meeting about how the research should proceed. The upcoming work has two focuses: one is the demo, and the other is the journal paper for VLDB.
  1. Research paper
    1. A journal paper should be written based on the CleanDB paper. Since the CleanDB paper is relatively brief because of the space limit, the journal paper should provide more details, such as:
      1. Motivation
      2. Extensions
      3. Expressive power
      4. Evaluation
    2. Some new aspects need attention, such as:
      1. To deal with more complex queries (not only path queries but also branch queries ...), we should improve query processing by introducing algorithms that find ranked candidates for branch queries. (One way is to exploit spanning trees to handle the branch queries; the other is to apply a join operation to the path results.)
      2. For now, we can only handle a subset of XPath queries. In the future, full XPath, and even XQuery, should be considered.
      3. The results of one path query could have different sources or sinks, since many nodes can have the same tag or value. This is not very hard to handle if virtual nodes are used as the common source and sink. One problem in this case might be related to the identification of conflicts. (??? I am still considering what it is ???)
  2. Implementation of the demo: three main parts make up the whole process: matching, merging, and query evaluation. Furthermore, some components could be defined for each part. We also have some questions about the implementation.
    1. What data model will we use in our demo? In other words, what is the input of the system? One candidate is OWL, which could be used for ontologies from multiple sources.
    2. Matching: 
      1. Text-similarity-based matching (please find the right algorithm ...)
      2. The user can provide additional information to assist the matching among nodes.
      3. Structure-based matching, where we could make use of Kim's work (?? how)
    3. Merging algorithm:
      1. For the moment, we use the scheme presented in the paper (submitted to SIGMOD) to merge matched nodes.
    4. Zone identification and agreement computation
      1. From the integrated view of the input ontologies, the zonal graph should be constructed.
      2. Based on the zonal graph, zones are identified and the agreements of zonal choices are computed.
      3. An edge-labeled directed graph is then generated to serve as the agreement-based graph.
    5. Path instance enumeration for XPath
      1. Based on the user's XPath query, candidate results are enumerated and ranked by their path agreements. (Here, we could introduce a k-spanning-tree search algorithm to extend the supported queries from path to branch ...)
    6. Constraint evaluation and fuzzy program optimization
      1. This part is domain specific.
      2. The degree of conflict needs to be measured.
The first three issues above and the last one are domain specific; the rest are relatively independent of the application domain.
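
For the text-similarity-based matching step, the algorithm has not been chosen yet. As a placeholder to experiment with, here is a minimal sketch that uses the ratio from Python's standard `difflib.SequenceMatcher` on node labels. The function names, the lowercase normalization, and the 0.8 threshold are all assumptions for illustration, not decisions from the meeting.

```python
from difflib import SequenceMatcher


def label_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] for two node labels (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_nodes(left, right, threshold=0.8):
    """Pair each label in `left` with its best-scoring label in `right`.

    A pair is kept only if its score reaches `threshold` (an assumed cut-off).
    Returns a list of (left_label, right_label, score) tuples.
    """
    matches = []
    for a in left:
        best = max(right, key=lambda b: label_similarity(a, b), default=None)
        if best is not None:
            score = label_similarity(a, best)
            if score >= threshold:
                matches.append((a, best, score))
    return matches


if __name__ == "__main__":
    for pair in match_nodes(["Author", "Title"], ["author", "publisher", "title"]):
        print(pair)
```

Pairs that fall below the threshold could be handed to the user, which is where the additional user-provided information would come in.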

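
The idea of a virtual common source and sink for path queries, combined with ranking by path agreement, can be sketched as follows. This is only a toy under stated assumptions: the agreement-based graph is a dictionary of labeled edges, the query is a plain list of tags (child steps only), and the agreement of a path is taken to be the product of its edge agreements. That scoring rule and the names `SOURCE`, `SINK`, and `ranked_path_instances` are hypothetical; the real definition of path agreement should come from the paper.

```python
SOURCE, SINK = "__source__", "__sink__"  # hypothetical names for the virtual nodes


def add_virtual_endpoints(edges, tags, first_tag, last_tag):
    """Return a copy of `edges` with a virtual source and sink attached.

    edges: {node: [(child, agreement), ...]}
    tags:  {node: tag}
    Virtual edges carry agreement 1.0 so they do not change the ranking.
    """
    g = {n: list(out) for n, out in edges.items()}
    g[SOURCE] = [(n, 1.0) for n, t in tags.items() if t == first_tag]
    for n, t in tags.items():
        if t == last_tag:
            g.setdefault(n, []).append((SINK, 1.0))
    return g


def ranked_path_instances(edges, tags, query, k=None):
    """Enumerate instances of the tag path `query`, highest agreement first."""
    g = add_virtual_endpoints(edges, tags, query[0], query[-1])
    results = []

    def walk(node, step, path, score):
        if step == len(query):
            # Every query step is matched; accept if the virtual sink is reachable.
            if any(child == SINK for child, _ in g.get(node, [])):
                results.append((score, path))
            return
        for child, agreement in g.get(node, []):
            if child != SINK and tags.get(child) == query[step]:
                # Assumed scoring: multiply edge agreements along the path.
                walk(child, step + 1, path + [child], score * agreement)

    walk(SOURCE, 0, [], 1.0)
    results.sort(key=lambda r: -r[0])
    return results if k is None else results[:k]


if __name__ == "__main__":
    edges = {
        "b1": [("a1", 0.9), ("a2", 0.4)],
        "b2": [("a2", 0.8)],
        "a1": [("n1", 0.5)],
        "a2": [("n2", 1.0)],
    }
    tags = {"b1": "book", "b2": "book", "a1": "author",
            "a2": "author", "n1": "name", "n2": "name"}
    for score, path in ranked_path_instances(edges, tags, ["book", "author", "name"]):
        print(round(score, 2), path)
```

With the virtual endpoints, every candidate becomes a single source-to-sink path, so one enumeration covers all start and end nodes that share a tag.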

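
For branch queries, the second option mentioned above (a join on top of path results) might look roughly like this. The sketch assumes each path result is a `(score, path)` pair, that the two branches share their branching node at a known position in the path, and that the combined score is the product of the two branch scores. The function name `join_on_branch_node` and all three assumptions are mine for illustration; nothing here is settled.

```python
def join_on_branch_node(left_paths, right_paths, branch_index=0):
    """Join two lists of (score, path) results on the node at `branch_index`.

    Two paths are combined when they pass through the same branching node.
    Returns (combined_score, left_path, right_path) tuples, best first.
    """
    # Index the right-hand results by their branching node (a simple hash join).
    by_node = {}
    for score, path in right_paths:
        by_node.setdefault(path[branch_index], []).append((score, path))

    joined = []
    for l_score, l_path in left_paths:
        for r_score, r_path in by_node.get(l_path[branch_index], []):
            # Assumed combination rule: product of the two branch scores.
            joined.append((l_score * r_score, l_path, r_path))

    joined.sort(key=lambda r: -r[0])
    return joined


if __name__ == "__main__":
    # Toy results for the two branches of a query such as book[author]/title.
    authors = [(0.9, ["b1", "a1"]), (0.5, ["b2", "a2"])]
    titles = [(0.8, ["b2", "t2"]), (0.3, ["b1", "t1"])]
    for row in join_on_branch_node(authors, titles):
        print(row)
```

The spanning-tree option would instead search for the branch pattern directly in the graph, which avoids materializing all path results first.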