Thursday, September 14, 2006

From the perspective of data integration, QUEST takes two ontologies from different sources as its input. Before continuing, we define some notation for simplicity.
1. The two ontologies (or schemas) are A and B.
2. map(A, B): defines a set of mapping rules between A and B, which describe how elements or structures in A relate to those in B. These rules also record conflicts that arise during matching or merging.
3. comb(A, B): gives the integrated view of both A and B. Mapping rules may be merged into this integrated representation.
4. tran(A): converts the ontology/schema A into the graphic representation of QUEST.
5. tran(R): integrates the mapping rule R into the graphic representation of QUEST.

We consider three cases for the data integration of A and B:
1) A + B + map(A, B) --> comb(A, B) --> tran(comb(A, B))
The two ontologies A and B are matched by some algorithm, which also generates the mapping rules between A and B. All of these are merged into a single integrated view, comb(A, B).
To translate comb(A, B) into the QUEST model in our paper, we would probably have to adapt comb(A, B) to fit our existing model.
The papers "A graph theoretical foundation for integrating RDF ontologies", "A graph-oriented model for articulation of ontology interdependencies", and "Semantic data integration in hierarchical domains" are good examples of work that could provide an intermediate, integrated form.
?? Could we use such a form? The model of comb(A, B) would need modifications.

2) T(A) + T(B) + T(map(A, B))
(Here T(X) is shorthand for tran(X).)
Many papers about schema matching do not resort to an integrated view of schemas from different sources. Instead, they focus on constructing the mapping rules that link those schemas. Converting a schema like A into the QUEST model (T(A)) with the algorithm in our paper seems easy. To handle the mapping rules (map(A, B)), however, our model would probably have to be extended.

3) A --> T(A) + B --> T(B) + map(T(A), T(B))
In this case, the matching operation is performed on the QUEST representations of A and B, that is, T(A) and T(B). Usually the user will be asked for input during the matching process. Because it has to represent null values, the QUEST model is somewhat more complex, so it is not straightforward for an ordinary user, as opposed to an expert, to complete the matching work.

The conclusion is that 2) might be preferable, given the difficulties we could run into when modifying the existing algorithm that generates the QUEST model.
Thus, in the following we focus on the case: T(A) + T(B) + T(map(A, B))
If the ontology A is a fairly standard input, it resembles a hierarchical structure and should be easy to convert into QUEST form with our algorithm.
map(A, B) yields a set of mapping rules describing the relationships between schemas from different sources. We divide these rules into two categories:
(1) mapping rules
This category covers three situations:
a. node - node:
The two nodes, from different schemas, are equivalent with respect to some matching concept (that is, both represent the same concept).
One method is to merge the two nodes into one in the final QUEST model.
The other is to introduce a new node that represents the single concept and takes the two nodes as its children.

b. node - group of nodes
A group of nodes consists of more than one node, and we distinguish two subcategories based on the structure of the group.
If the group has no structure, or its structure can be ignored, it is a simple set of nodes.
* Merging the nodes or introducing a new concept node is the basic method for dealing with these cases.

Otherwise, the nodes in the group are organized in a structured form, such as a tree, a graph, or some pattern.
* This case is a little more complex.

c. group of nodes - group of nodes
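
The two basic methods named above (merging the matched nodes, or introducing a new concept node) can be sketched as follows. This is only an illustration under assumptions, not the QUEST algorithm itself: each schema is a plain parent -> children dictionary rather than the actual QUEST graph, node names are assumed to be unique across the two schemas, any structure inside a matched group is ignored, and the function names `merge_nodes` and `add_concept_parent` are hypothetical.

```python
def _union(trees, rename=lambda node: node):
    """Union of parent -> children maps, optionally renaming nodes on the way."""
    result = {}
    for tree in trees:
        for parent, children in tree.items():
            bucket = result.setdefault(rename(parent), [])
            for child in children:
                if rename(child) not in bucket:
                    bucket.append(rename(child))
    return result


def merge_nodes(tree_a, tree_b, matched, merged_name):
    """Method 1: collapse all matched nodes into one node called merged_name."""
    return _union((tree_a, tree_b),
                  lambda node: merged_name if node in matched else node)


def add_concept_parent(tree_a, tree_b, matched, concept):
    """Method 2: keep the matched nodes and make them children of a new concept node."""
    result = _union((tree_a, tree_b))
    result[concept] = sorted(matched)
    return result


# Hypothetical schemas; "A.name" and "B.name" are matched as one concept.
schema_a = {"A.person": ["A.name", "A.phone"]}
schema_b = {"B.customer": ["B.name", "B.email"]}
match = {"A.name", "B.name"}

merged = merge_nodes(schema_a, schema_b, match, "name")
linked = add_concept_parent(schema_a, schema_b, match, "name")
```

In `merged` the single node "name" appears under both "A.person" and "B.customer". In `linked` the original nodes stay where they are and additionally become children of the new "name" node, so the result is a graph rather than a tree. The same two functions cover an unstructured node - group match by passing more than two names in `matched`.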

(2) constraints (see paper "A graph theoretical foundation for integrating RDF ontologies")
a. Horn constraints: r1 ^ r2 ^ ... ^ rn -> t

b. Negative constraints: <> etc.
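
To make the Horn constraint form r1 ^ r2 ^ ... ^ rn -> t concrete, here is a minimal sketch of how such rules could be applied by simple forward chaining. It is an assumption about how the constraints might be used, not the procedure from the cited paper: relations are treated as opaque strings, and the function name `horn_closure` is hypothetical. Negative constraints are not covered.

```python
def horn_closure(facts, rules):
    """Return all relations derivable from facts.

    rules is a list of (premises, conclusion) pairs, read as
    premise_1 ^ ... ^ premise_n -> conclusion.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule once all of its premises are known.
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules: r1 ^ r2 -> t, and t -> u.
rules = [(("r1", "r2"), "t"), (("t",), "u")]
derived = horn_closure({"r1", "r2"}, rules)
```

With both r1 and r2 given, `derived` also contains t and u; with only r1 given, nothing new is derived because the first rule never fires.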

Wednesday, September 13, 2006

From the perspective of data integration, QUEST takes two ontologies from different sources as its input. Before continuing, we define some notation for simplicity.
1. The two ontologies (or schemas) are A and B.
2. map(A, B): defines a set of mapping rules between A and B, which describe how elements or structures in A relate to those in B. These rules also record conflicts that arise during matching or merging.
3. comb(A, B): gives the integrated view of both A and B. Mapping rules may be merged into this integrated representation.
4. tran(A): converts the ontology/schema A into the graphic representation of QUEST.
5. tran(R): integrates the mapping rule R into the graphic representation of QUEST.

We consider three cases for the data integration of A and B:
1) A + B + map(A, B) --> comb(A, B) --> tran(comb(A, B))
The two ontologies A and B are matched by some algorithm, which also generates the mapping rules. All of these are merged into a single integrated view, comb(A, B).
The QUEST model is then obtained by translating the integrated view comb(A, B).


2)

3)




Tuesday, September 12, 2006

I've read some papers about data/information integration. Most of them focus on the matching operation in the integration procedure. My impression is that they don't put much effort into the merging of different sources. There may be a couple of reasons for this. First, many applications of data integration do not require an integrated view of the different sources; e-commerce, for example, pays more attention to communication among peers. Second, in a data integration system the matching part seems more important than the merging part, because matching is the fundamental operation in data integration. Merging has to be done on the basis of the matching result.

Let's face the truth. We must have a component in the potential system that does the matching work. "A survey of approaches to automatic schema matching" and "Generic Schema Matching with Cupid" are two good references for work in the area of matching. For a simple implementation of our algorithm, we may borrow ideas from others' work. According to Dr. Candan, our focus is on the integration part: how to integrate ontologies from different sources into one global view.