Wednesday, December 13, 2006

Learning OWL (2)

(http://www.w3.org/TR/owl-guide/)

The OWL Web Ontology Language is a language for defining and instantiating Web ontologies. Ontology is a term borrowed from philosophy that refers to the science of describing the kinds of entities in the world and how they are related.

An OWL ontology may include descriptions of classes, properties and their instances.

In addition, OWL makes an open world assumption. That is, descriptions of resources are not confined to a single file or scope. While class C1 may be defined originally in ontology O1, it can be extended in other ontologies. The consequences of these additional propositions about C1 are monotonic. New information cannot retract previous information. New information can be contradictory, but facts and entailments can only be added, never deleted.

A class definition has two parts: a name introduction or reference and a list of restrictions. Each of the immediate contained expressions in the class definition further restricts the instances of the defined class. Instances of the class belong to the intersection of the restrictions. So far we have only seen examples that include a single restriction, forcing the new class to be a subclass of some other named class.

The rdfs:label entry provides an optional human readable name for this class. Presentation tools can make use of it. The "lang" attribute provides support for multiple languages. A label is like a comment and contributes nothing to the logical interpretation of an ontology.

Basic Elements

Most of the elements of an OWL ontology concern classes, properties, instances of classes, and relationships between these instances.

I. Simple Classes and Individuals

A. Simple Named Classes: Class, rdfs:subClassOf

1. The most basic concepts in a domain should correspond to classes that are the roots of various taxonomic trees. Every individual in the OWL world is a member of the class owl:Thing. Thus each user-defined class is implicitly a subclass of owl:Thing. Domain specific root classes are defined by simply declaring a named class. OWL also defines the empty class, owl:Nothing.

B. Individuals

1. In addition to classes, we want to be able to describe their members. We normally think of these as individuals in our universe of things. An individual is minimally introduced by declaring it to be a member of a class.
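
As a small illustration of A and B, the sketch below declares a named class with an rdfs:label and introduces an individual as a member of that class. The use of rdflib in Python and all of the names are my own assumptions for illustration; the OWL Guide itself writes plain RDF/XML.

```python
# Minimal sketch (assumed toolkit: rdflib; hypothetical names).
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, OWL

EX = Namespace("http://example.org/wine#")
g = Graph()
g.bind("ex", EX)

# A. a simple named class; every class is implicitly a subclass of owl:Thing
g.add((EX.Winery, RDF.type, OWL.Class))
g.add((EX.Winery, RDFS.label, Literal("winery", lang="en")))

# B. an individual, minimally introduced by declaring it a member of a class
g.add((EX.ExampleWinery, RDF.type, EX.Winery))

print(g.serialize(format="xml"))
```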

C. Design and use

1. In certain contexts something that is obviously a class can itself be considered an instance of something else.

2. It is very easy to confuse the instance-of relationship with the subclass relationship.

II. Simple Properties

A. Properties let us assert general facts about the members of classes and specific facts about individuals.

B. Definition

1. A property is a binary relation. Two types of properties are distinguished:

a) datatype properties, relations between instances of classes and RDF literals and XML Schema datatypes

b) object properties, relations between instances of two classes. Note that the name object property is not intended to reflect a connection with the RDF term rdf:object.

2. When we define a property there are a number of ways to restrict the relation. The domain and range can be specified. The property can be defined to be a specialization (subproperty) of an existing property. Properties, like classes, can be arranged in a hierarchy.

C. Properties and Datatypes

1. We distinguish properties according to whether they relate individuals to individuals (object properties) or individuals to datatypes (datatype properties). Datatype properties may range over RDF literals or simple types defined in accordance with XML Schema datatypes.
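
A quick sketch of B and C: an object property with a domain and range, a subproperty, and a datatype property whose range is an XML Schema datatype. Again, rdflib and all of the names are assumptions for illustration.

```python
# Sketch only; rdflib and the property/class names are hypothetical.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS, OWL, XSD

EX = Namespace("http://example.org/wine#")
g = Graph()

# an object property relating instances of two classes
g.add((EX.madeFromGrape, RDF.type, OWL.ObjectProperty))
g.add((EX.madeFromGrape, RDFS.domain, EX.Wine))
g.add((EX.madeFromGrape, RDFS.range, EX.WineGrape))

# properties, like classes, can be arranged in a hierarchy
g.add((EX.madeFromRedGrape, RDFS.subPropertyOf, EX.madeFromGrape))

# a datatype property ranging over an XML Schema datatype
g.add((EX.yearValue, RDF.type, OWL.DatatypeProperty))
g.add((EX.yearValue, RDFS.range, XSD.positiveInteger))
```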

D. Properties of Individuals

III. Property Characteristics

A. TransitiveProperty: P(x,y) and P(y,z) implies P(x,z)

B. SymmetricProperty: P(x,y) iff P(y,x)

C. FunctionalProperty: P(x,y) and P(x,z) implies y = z

D. inverseOf: P1(x,y) iff P2(y,x)

E. InverseFunctionalProperty: P(y,x) and P(z,x) implies y = z
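
These characteristics are asserted as extra types on a property (or, for D, as a property-to-property triple). A small sketch with rdflib and hypothetical property names:

```python
# Sketch only; rdflib and the property names are hypothetical.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, OWL

EX = Namespace("http://example.org/wine#")
g = Graph()

g.add((EX.locatedIn, RDF.type, OWL.TransitiveProperty))            # A
g.add((EX.adjacentRegion, RDF.type, OWL.SymmetricProperty))        # B
g.add((EX.hasVintageYear, RDF.type, OWL.FunctionalProperty))       # C
g.add((EX.hasMaker, OWL.inverseOf, EX.producesWine))               # D
g.add((EX.hasCatalogId, RDF.type, OWL.InverseFunctionalProperty))  # E
```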

IV. Property Restrictions

In addition to designating property characteristics, it is possible to further constrain the range of a property in specific contexts in a variety of ways.

A. allValuesFrom and someValuesFrom;

1. local to their containing class definition.

B. Cardinality

C. hasValue

1. specify classes based on the existence of particular property values. Hence, an individual will be a member of such a class whenever at least one of its property values is equal to the hasValue resource.
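
A sketch of a hasValue restriction written as an anonymous class and attached with rdfs:subClassOf (rdflib, hypothetical names). someValuesFrom and allValuesFrom are stated the same way, with a class in place of the value:

```python
# Sketch only; rdflib and the names are hypothetical.
from rdflib import Graph, Namespace, BNode
from rdflib.namespace import RDF, RDFS, OWL

EX = Namespace("http://example.org/wine#")
g = Graph()

restriction = BNode()                      # the anonymous restriction class
g.add((restriction, RDF.type, OWL.Restriction))
g.add((restriction, OWL.onProperty, EX.hasColor))
g.add((restriction, OWL.hasValue, EX.Red))

# every RedWine has EX.Red among its hasColor values
g.add((EX.RedWine, RDFS.subClassOf, restriction))
```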

Ontology Mapping

In order for ontologies to have the maximum impact, they need to be widely shared. It is important to realize that much of the effort of developing an ontology is devoted to hooking together classes and properties in ways that maximize implications. It will be challenging to merge a collection of ontologies.

I. Equivalence between Classes and Properties

A. equivalentClass is used to indicate that two classes have precisely the same instances.

B. equivalentProperty

II. Identity between Individuals

A. sameAs

III. Different Individuals

A. differentFrom

B. AllDifferent
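
Since this mapping vocabulary is exactly what our merging step needs to emit, here is a sketch of the constructs above with two hypothetical ontologies O1 and O2 (rdflib; all names assumed):

```python
# Sketch only; rdflib and the ontology/term names are hypothetical.
from rdflib import Graph, Namespace, BNode
from rdflib.namespace import RDF, OWL
from rdflib.collection import Collection

O1 = Namespace("http://example.org/onto1#")
O2 = Namespace("http://example.org/onto2#")
g = Graph()

g.add((O1.Wine, OWL.equivalentClass, O2.Vin))              # same instances
g.add((O1.hasMaker, OWL.equivalentProperty, O2.producer))
g.add((O1.MargauxEstate, OWL.sameAs, O2.ChateauMargaux))   # same individual
g.add((O1.Red, OWL.differentFrom, O1.White))

# owl:AllDifferent over a list of pairwise-distinct individuals
members = BNode()
Collection(g, members, [O1.Red, O1.White, O1.Rose])
alldiff = BNode()
g.add((alldiff, RDF.type, OWL.AllDifferent))
g.add((alldiff, OWL.distinctMembers, members))
```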

Complex Classes

OWL provides additional constructors with which to form classes. These constructors can be used to create so-called class expressions. Note that Class expressions can be nested without requiring the creation of names for every intermediate class. This allows the use of set operations to build up complex classes from anonymous classes or classes with value restrictions.

I. Set Operators

A. Intersection: intersectionOf

B. Union: unionOf

C. Complement: complementOf

II. Enumerated Classes

A. oneOf

III. Disjoint Classes

A. disjointWith
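
A sketch of the set operators and the enumerated class above (rdflib; names hypothetical). The operands of intersectionOf, unionOf and oneOf are RDF lists, which rdflib builds with Collection:

```python
# Sketch only; rdflib and the class names are hypothetical.
from rdflib import Graph, Namespace, BNode
from rdflib.namespace import RDF, OWL
from rdflib.collection import Collection

EX = Namespace("http://example.org/wine#")
g = Graph()

# intersectionOf: WhiteBurgundy is exactly Burgundy AND WhiteWine
operands = BNode()
Collection(g, operands, [EX.Burgundy, EX.WhiteWine])
g.add((EX.WhiteBurgundy, RDF.type, OWL.Class))
g.add((EX.WhiteBurgundy, OWL.intersectionOf, operands))

# complement and disjointness
g.add((EX.NonWine, OWL.complementOf, EX.Wine))
g.add((EX.RedWine, OWL.disjointWith, EX.WhiteWine))

# oneOf: an enumerated class listing all of its members
colours = BNode()
Collection(g, colours, [EX.Red, EX.White, EX.Rose])
g.add((EX.WineColour, OWL.oneOf, colours))
```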

Monday, December 11, 2006

Issues to Consider in the Implementation of the Demo

We have some steps to follow in the demo; they are listed below one by one:
  1. Input of Ontologies

A. OWL files are taken as the input

1. Parse OWL file

a. Learn the definition of OWL

b. Get all classes, instances, and relationships (a small parsing sketch follows this step)

2. Represent entities and relationships with a hierarchical graphic view,

a. for instance, an E-R model, so that a label depicting the relationship can be associated with each edge.

B. Transformation

1. OWL --(input, parsing)--> Hierarchical Graphic View (such as an E-R model) --(export)--> OWL

2. If there are integrity constraints, they should be defined in some formal way, so that the program can understand them. (???)
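
A minimal sketch of step 1: parse the OWL file and pull out classes, individuals, and labeled edges for the hierarchical graphic view. It assumes rdflib as the parser and a hypothetical file name; the post does not fix the demo's actual toolkit.

```python
# Sketch only: rdflib is an assumed parser, "input.owl" a hypothetical file.
from rdflib import Graph
from rdflib.namespace import RDF, RDFS, OWL

g = Graph()
g.parse("input.owl", format="xml")

classes = set(g.subjects(RDF.type, OWL.Class))
individuals = {s for s, _, o in g.triples((None, RDF.type, None)) if o in classes}

# Labeled edges for the hierarchical graphic view (E-R style):
# one edge per subclass link and per object-property domain/range pair.
edges = []
for sub, sup in g.subject_objects(RDFS.subClassOf):
    edges.append((sub, "subClassOf", sup))
for prop in g.subjects(RDF.type, OWL.ObjectProperty):
    for dom in g.objects(prop, RDFS.domain):
        for rng in g.objects(prop, RDFS.range):
            edges.append((dom, g.qname(prop), rng))

print(len(classes), "classes,", len(individuals), "individuals,", len(edges), "edges")
```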

  2. Matching Ontologies.

A. Automatic matching:

1. text-similarity-based matching, working directly on the content of the OWL files (a rough sketch follows the table below).

2. When a string occurs multiple times across ontologies, it is not clear how to match the occurrences. For instance, 'name' may occur several times in one ontology and more than once in another ontology. Possible solutions:

a. User decides (for now)

b. Leave it alone

c. Context-based

3. visualize the matching in some suitable way.

a. A table might be used to display all text-similar matches, like:

        Ontology 1     Ontology 2       ......    Ontology n
   1    Large Mam      Large Mammmal              --
   2    Baron          --                         Baro
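
A rough sketch of the text-similarity matching in 1 above. The post does not name a similarity measure, so difflib's ratio and the 0.8 threshold are assumptions; the labels reuse the table's example data.

```python
# Sketch only: the difflib ratio and the threshold are assumptions.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_labels(labels1, labels2, threshold=0.8):
    """Return candidate 1:1 matches above the threshold, best first."""
    candidates = []
    for l1 in labels1:
        for l2 in labels2:
            score = similarity(l1, l2)
            if score >= threshold:
                candidates.append((round(score, 2), l1, l2))
    return sorted(candidates, reverse=True)

print(match_labels(["Large Mam", "Baron"], ["Large Mammmal", "Baro"]))
# both pairs exceed the threshold and are proposed as candidate matches
```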

B. Manual matching: (human interaction involved)

1. matching types:

a. node-node (1:1)

b. node-set (1:m)

c. set-set (n:m)

2. the challenge here is to implement matching operations directly on the graphic view (in other words, a friendly interface assisting the user in finishing the matching work).

  3. Merging Ontologies

A. The zonal graph is generated based on the results of the previous step and the merging rules defined in our paper; in other words, by introducing exclusive choices and coordinated choices into the hierarchical graphic view above.

1. The following three different cases should be handled:

a. 1:1 matching:

b. 1:n matching:

c. m:n matching:

2. If the zonal graph is treated as the intermediate form of the integration, it may be necessary to store it in some form for later use.

B. Some preparations for the following steps

1. Zones could be recorded or stored somewhere for the computation of agreement values, since each zone could be identified at this step.

2. Moreover, the agreement of each connection could be computed at this step as well.

  4. Zone Identification and Agreement Computation

A. Get all zones caused by merging.

B. Compute the agreement of each edge representing one relationship in the OWL file.

1. For each zone, compute the number of models.

2. Calculate the agreement of each source-sink pair in the zone.

C. Produce the agreement-based integrated graphic view, whose edges are associated with their agreements.

1. The graph generated here is more general, as an edge-labeled directed graph.

  5. Query Processing

A. For now, we can only handle simple path queries that do not contain branches. The basic components in the query include:

1. a/b

2. a//b

3. a/*/b

4. //* or *// is equivalent to // (it seems that more than one pass over the query expression is required for parsing; a tokenizing sketch follows this list).
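
A sketch of turning a query string into the 'a1 t1 a2 t2 ...' token form used in B below, collapsing //* and *// to // as noted in item 4. The grammar is an assumption based only on the four cases listed.

```python
# Sketch only: the query grammar is assumed from the four cases above.
import re

def normalize(query: str) -> str:
    """Rewrite //* and *// to //, as in item 4, repeating until stable."""
    prev = None
    while prev != query:
        prev = query
        query = re.sub(r"//\*|\*//", "//", query)  # //* or *// -> //
        query = re.sub(r"/{3,}", "//", query)      # clean up leftover slashes
    return query

def tokenize(query: str):
    """Split a path query into the 'a1 t1 a2 t2 ...' token sequence."""
    return [t for t in re.split(r"(//|/)", query) if t]

print(tokenize(normalize("a//*/b")))   # ['a', '//', 'b']
print(tokenize(normalize("a/*/b")))    # ['a', '/', '*', '/', 'b'] (unchanged)
```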

B. Algorithm: when the path query is represented as 'a1 t1 a2 t2 a3 t3 ......', how do we process it? (Here each ai is a node and each ti is one of {/, //, *}.)

1. Direction

2. Splitting and Processing and Joining etc. (looks not so easy)

3. The results should be ranked based on the path agreement.

C. Display the results to the user in some suitable way, so that the user can provide feedback.

1. What kinds of feedback are possible?

a. change the ranking

b. eliminate some results

c. ask more results

d. ???

  6. Conflict Analysis and Agreement Update

A. Definition of conflicts and constraints according to our paper

1. zonal conflicts

2. path integrity conflicts

3. acyclicity conflicts

4. user's feedback

5. other global / integrity constraints

6. user-defined constraints???

B. Generation of fuzzy program to capture validity postulates

1. Construct the fuzzy program

2. solve the program with Lingo API

3. analyze the results from Lingo

C. Display the conflicts solution and the agreement update

1. the edges whose agreement is changed should be redrawn in the integrated graphic view.

D. How do we reflect the user's feedback back into the original ontology? In other words, how do we connect the user's feedback and the system update explicitly?

Sunday, December 10, 2006

Issues to Consider in the Implementation of the Demo
We have some steps to follow in the demo; they are listed below one by one:
  1. Input of Ontologies
    1. OWL files are taken as the input of the demo
      1. Parse OWL file
        1. Learn the definition of OWL
      2. Represent entities and relationships with a hierarchical graphic view, 
        1. for instance, an E-R model.
    2. Transformation
      1. OWL -- parsing --> Hierarchical Graphic View (such as E-R model) -- export --> OWL
  2. Matching Ontologies. 
    1. Automatic matching:
      1. text similarity based matching, from the content of OWL directly.
      2. mostly 1:1 matching
      3. visualize the matching in some suitable way.
    2. Manual matching: (human interaction involved)
      1. matching types: 
        1. node-node (1:1) 
        2. node-set (1:m)
        3. set-set (n:m)
      2. matching operations on the graphic view are required (a friendly interface assisting the user in finishing the matching work).
  3. Merging Ontologies
    1. The zonal graph is generated based on the results of the previous step, according to the merging rules defined in our paper. In other words, introduce exclusive choices and coordinated choices into the hierarchical graphic view above.
      1. We could also use the merging rules from the example used in the paper. The basic idea follows these rules:
        1. algorithm please !?!?
      2. Is there a way to transform the zone graph into an OWL file? How can the concept of a zone be expressed in OWL?
    2. Some preparations for the following steps
      1. Zones could be recorded or stored somewhere for the computation of agreement values which can even be calculated at this step.
  4. Zone Identification and Agreement Computation
    1. Get all zones caused by merging.
    2. Compute the agreement of each edge representing one relationship in OWL file. 
      1. For each zone, compute the number of models.
      2. Calculate the agreement of each source-sink pair in the zone.
    3. Produce the agreement-based integrated graphic view, whose edges are associated with their agreements. 
  5. Query Processing
    1. For now, we can only handle simple path queries, without branches. The basic components in the query include:
      1. a/b
      2. a//b
      3. a/*/b (? how)
      4. //* or *// is equal to // (it seems that more than one parsing pass over the query expression is required.)
    2. Algorithm please: when the path query is represented as 'a1 t1 a2 t2 a3 t3 ......', how to process it?
      1. Direction
      2. Splitting and Processing and Joining etc. (looks not so easy)
      3. At the same time, the results should be ranked based on the path agreement. 
    3. Display the results to the user in some suitable way, so that the user can provide feedback.
      1. What kinds of feedback do we have?
        1. change the ranking
        2. eliminate some results
        3. ask more results
  6. Conflicts Analysis and Agreement Update
    1. Definition of conflicts and constraints according to our paper
      1. zonal conflicts
      2. path integrity conflicts
      3. acyclicity conflicts
      4. user's feedback
      5. other global / integrity constraints
    2. Generation of fuzzy program to capture validity postulates
      1. solve the program with Lingo API
      2. analyze the results from Lingo
    3. Display the conflicts solution and the agreement update
      1. the edges whose agreement is changed should be redrawn in the integrated graphic view. 
    4. How do we reflect the user's feedback back into the original ontology?

Thursday, December 07, 2006

Some Issues to Consider in Our Research
There are two parts to next month's work: one is research, and the other is the implementation of the demo. Now, let's take a close look at what we should do for the research work first. (The other part will be examined soon.)
  1. Query extension - XPath & XQuery: we could introduce the evaluation of complex queries in our work. We have two choices:
    1. Define the intermediate representation of the query. Then, transformation from XPath or XQuery to the intermediate form should be defined. 
    2. Another is to equivalently transform an XQuery into a single XPath expression or a set of XPath queries, etc.
  2. Dealing with the branch queries: (necessary, since we want to handle more complex queries in XPath)
    1. In our current work, we can process the simple path query by means of k-shortest paths algorithm in the directed graph. 
    2. When branch queries are concerned, we could make use of existing algorithms like k-shortest paths to finish the evaluation.
      1. The assumption is that we should have a way to measure the 'agreement' of a branch, like that of a path. (A more general definition of agreement.)
      2. Another question is how to get the instances satisfying the branch query from the directed graph. Possible solutions include:
        1. JOIN method: splitting the branch query into path queries, evaluating each path query, and finally joining all instances from each path query (a rough sketch follows this list).
        2. some holistic query-evaluation algorithm could be used to process branch queries; for instance, the Prüfer sequence gives a way to evaluate the branch query holistically.
       One problem is that most holistic algorithms require a tree-like data model, not a directed graph. Therefore, algorithms to find a spanning tree in the directed graph may first need to be exploited in this case.
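
A very rough sketch of the JOIN idea in 2.2.1, under the assumption that every path of the branch query starts at the branching node, so per-path instances can be joined on their first node. evaluate_path() is a placeholder for the existing k-shortest-paths based path evaluation; nothing here is fixed by the post.

```python
# Sketch only: evaluate_path() is a placeholder, and the join key (the first
# node of each path instance) is an assumption about how instances are shaped.
from collections import defaultdict
from itertools import product

def evaluate_path(graph, path_query):
    """Placeholder: return path instances as tuples of node ids."""
    raise NotImplementedError

def evaluate_branch(graph, path_queries):
    """Evaluate each path query, then join instances sharing the branch node."""
    per_path = [evaluate_path(graph, q) for q in path_queries]
    grouped = []
    for instances in per_path:
        by_branch_node = defaultdict(list)
        for inst in instances:
            by_branch_node[inst[0]].append(inst)   # key on the branching node
        grouped.append(by_branch_node)
    # keep only branch nodes that appear in every path's results
    shared = set.intersection(*(set(g.keys()) for g in grouped))
    results = []
    for key in shared:
        results.extend(product(*(g[key] for g in grouped)))
    return results
```
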
  3. The last issue is about the definition of conflicts in our paper. It might need improving in the new situation where complex queries, compared with the path query, are involved.
  4. In sum, we (actually, I) should do something to try to solve the problems above. I have a very simple plan, as follows:
    1. XPath and XQuery 
      1. write a report on the basic knowledge of these two concepts.
    2. Path query --> Branch query
      1. implement the Join algorithm to evaluate the branch query
      2. implement the holistic query evaluating algorithm to evaluate the branch query.
    3. Understand the current definition of conflicts in our paper, and summarize the properties of the conflicts. It should be able to handle more complex cases about conflicts, like those among path instances that have different sources or sinks. 

Wednesday, December 06, 2006

In the morning, we had a meeting about the progress of the research. In the following work, we have two focuses: one is the demo, and the other is the journal paper for VLDB.
  1. Research paper
    1. Based on the CleanDB paper, a journal paper should be compiled. Since the CleanDB paper is relatively simple because of the space limit, more details should be provided in this journal paper, such as:
      1. Motivation
      2. Extensions
      3. Expressive power
      4. Evaluation
    2. Some new aspects need attention, such as
      1. In order to deal with more complex queries (not only path queries, but branch queries ...), we should improve the query processing by introducing algorithms to find ranked candidates for the branch query. (One way is to exploit a spanning tree to handle branch queries; the other is to use a join operation on top of the path results.)
      2. Now, we can only deal with part of XPath. In the future, full XPath, and even XQuery, should be considered.
      3. The results of one path query could have different sources or sinks, since many nodes can have the same tag or value. This is not very hard if virtual nodes are used as the common source and sink. One problem in this case might be related to the identification of conflicts. (??? I am still considering what it is ???)
  2. Implementation of Demo: Three main parts make up the whole process: matching, merging and query evaluation. Furthermore, some components could be defined for each part. We also have some questions about the implementation.
    1. What data model will we use in our demo? In other words, what is the input of the system? Candidates include OWL, which could be used for ontologies from multiple sources.
    2. Matching: 
      1. Text similarity based matching; (please find the right algorithm ...)
      2. User can provide additional information to assist the matching among nodes
      3. Structure based matching, where we could make use of Kim's work (?? how)
    3. Merging algorithm, 
      1. for the moment, we use the scheme presented in the paper (submitted to SIGMOD) to merge matched nodes.
    4. Zone identification and agreement computation
      1. From the integrated view of the input ontologies, the zone graph should be constructed.
      2. Based on the zonal graph, zones are identified and the agreements of zonal choices are computed.
      3. An edge-labeled directed graph is then generated as the agreement-based graph.
    5. Path instances enumeration for XPath
      1. Based on the user's XPath query, candidate results are enumerated and ranked based on the path agreements. (Here, we could introduce a k-spanning-tree search algorithm to extend the complexity of queries, from path to branch ...)
    6. Constraints evaluation and fuzzy program optimization
      1. this part is domain specific.
      2. degree of conflicts needs measuring.
The first 3 issues above and the last one are domain specific, and the rest are relatively independent of the application domain.


Tuesday, December 05, 2006

Semantic Heterogeneity in Global Information Systems: The Role of Metadata, Context and Ontologies
This paper concentrates on approaches to tackling the semantic heterogeneity problem in the context of Global Information Systems (GIS). At first I didn't think it would be very useful as a survey, because it looked ordinary and I hardly got much out of it, and that impression was largely confirmed in the end, although some of the content in the paper is helpful for our research. Now let's take a close look at the survey paper.

1. In the beginning of the paper, the author motivates the need for approaches to semantic heterogeneity. The problem of semantic heterogeneity is defined as the identification of semantically related objects in different databases and the resolution of schematic differences among them. The approaches brought up in the paper have the key objective of reducing the problem of knowing the contents and structure of each of a huge number of information repositories to the significantly smaller problem of knowing the contents of domain-specific ontologies, which a user familiar with the domain is likely to know or easily understand.

2. What is Metadata?
The basic idea of the paper is to construct metadata from the original database. Metadata is data or information about data. In the paper, it is categorized as follows:
    1. Content-independent metadata: does not depend on the content of the document with which it is associated; examples include location, modification date, etc.
    2. Content-dependent metadata: depends on the content of the document with which it is associated, such as the size of the document, the number of rows, etc. This category can be divided into smaller classes:
      1. Direct content-based metadata: based directly on the content; examples include full-text indices, inverted trees, document vectors, etc.
      2. Content-descriptive metadata: describes the contents without directly using the contents of the document. Furthermore, it can be divided into:
        1. Domain-independent metadata: captures information present in the document independent of the application or subject domain of the information. The HTML document type definition is a good example in this category.
        2. Domain-specific metadata: described in a manner specific to the application or subject domain of the information. Thus issues of vocabulary are important, because terms have to be chosen in a domain-specific manner.
According to the paper, domain-specific metadata is the most important for dealing with issues of semantic heterogeneity, since it can capture information about a specific application or domain, and it can be viewed as a vocabulary of terms for the construction of more domain-specific metadata descriptions.

In the paper, two kinds of contexts are introduced: metadata contexts and conceptual contexts (c-contexts).
    1. Metadata contexts mainly abstract the representational details;
    2. Conceptual contexts capture domain knowledge. Together with the structured data, the paper uses these contexts to capture the information content.

3. Constructing c-contexts from ontological terms. 
The method used in the paper is to use terms from domain-specific ontologies as the vocabulary to characterize the information. In other words, the metadata of a document can be represented as a vector of attribute-value pairs. Based on this definition, the mechanism of reasoning and manipulation is also discussed. In order to map contextual descriptions to the database schema, a set of projection rules is defined and used.
Finally, some issues about language and ontology in context representation are presented. 
    1. About the value of context attributes:
Context here is a collection of contextual coordinates and their values. There are some basic requirements for the definition of these values: (1) they should be declarative in nature, helping to perform inferences on the context; (2) they should express the context as a collection of contextual coordinates, each describing a specific aspect of the information present in the database or requested by a query, consistent with the representation of the c-context; (3) they should have primitives in the model world and the ontology.
    2. About the ontology: the scalability of the ontology is the main concern at this step. The paper presents two approaches to combining various ontologies.
      1. The common one 
        1. build an extensive global ontology.
        2. exploit the semantics of a single problem domain.
      2. Re-use of existing ontologies/classifications
        1. combining different existing ontologies, but some problems should be noticed, like the overlap between different ontologies. 

4. Semantic interoperability using terminological relationships, like synonyms, hyponyms, hypernyms.
    1. When discussing the use of synonyms to interoperate across ontologies, the paper uses OBSERVER as an example.
      1. An architecture for interoperation is presented. It is mainly composed of a query processor, an ontology server, an interontology relationships manager (IRM), and the ontologies. (This structure may be useful for our work; at least, we could compare it with our own architecture. In some sense, I don't think it would work very well, because the IRM component takes on too much work and is complex enough to intimidate any designer, although the IRM helps a lot with the scalability of the whole procedure.)
    2. Using hyponyms and hypernyms to interoperate across ontologies: this section seems very useful because it describes the problems we have in our project. In reality, hierarchical relationships like hypernyms and hyponyms are more common than synonymy. The solution introduced by the paper is to substitute a non-translated term with the intersection of its immediate parents or the union of its immediate children. The point is that synonyms, hyponyms and hypernyms must first be identified within and between the user and target ontologies.




Saturday, December 02, 2006

A Survey on Information Systems Interoperability
In this survey, some basic concepts are presented, and it is good to go over them in this paper. The structure is clear and most of the content is easy to understand. The author has rich experience in applying information systems techniques to agriculture. I think it is a system to manage multiple databases containing agricultural information, so it might be quite helpful for our work, since our goal is to construct a data-integration system to manage multiple archeological databases.

Now, let's take a look at his paper section by section (the easiest way to review).
  1. In the first section, the motivation and the basic method used are discussed respectively. 
    1. the goal is the construction of a data warehouse (or materialized views) integrating several kinds of data sources, particularly for scientific applications in agriculture. Interesting, because it is a little similar to our goal in KADIS; the only difference is that our application is about archeology.
    2. The background of the application is that distinct data sources may be maintained independently; the research focuses on semantic data heterogeneity, and the paper suggests an incremental and modularized approach to deal with the issues of data integration.
  2. Information System Interoperability
    1. The paper suggests that the only way to reach interoperability is by publishing the interfaces, schemas and formats used for information exchange, making their semantics as explicit as possible, so that they can be properly handled by the cooperative systems. 
    2. Three viewpoints must be considered about the information systems' interoperability: application domain, conceptual design and software system technology. For each viewpoint, interoperability should be achieved.
  3. Data System Interoperability
    1. in this section, the definitions of centralized database systems and heterogeneous database systems are first presented.
    2. Two categories of approaches to enable integrated access to multiple physical databases: schema integration and the federated approach. 
    3. Web databases are also mentioned in this section; the challenge of Web database querying research is the construction of a unified and simple interface.
  4. Data Integration
    1. The basic procedure of data integration concerned here includes the resolution of heterogeneity conflicts and the transformation of source data to accommodate it in the integrated view.
    2. The kinds of data to be integrated and the heterogeneity conflicts should first be categorized.
    3. The structure of the data is discussed: structured data and semi-structured data.
    4. Data heterogeneity, or conflicts, is summarized. Two ways are used to define data conflicts or data heterogeneity:
      1. representational conflict and semantic conflict;
      2. based on different levels of abstraction, such as instance, schema, and data model. The conflicts can be classified as: data conflicts, schema conflicts, data versus schema conflicts, and data model conflicts.
    5. Some proposals are brought up to solve these conflicts in relational databases and semi-structured data. Some surveys need reading here.
    6. Another way to solve the conflicts is the construction of a standard to describe the semantics (common semantics).
    7. A series of procedures is suggested in the paper to generate the unified view of heterogeneous data. It's inspiring for our work, I think.
  5. Building blocks to integrate data in cooperative systems
    1. In this section, the author describes the software framework, modules, and techniques that could be used to build the integrated data views.
  6. The semantic web
    1. A simple digest is presented here, giving us a general idea of the semantic Web standards and technologies. 
      1. Character Encoding + URI 
      2. XML + Schema
      3. RDF + RDFS
      4. Ontology
  7. Web services

Semantic Integration Research in the Database Community: A Brief Survey

This survey first presents the applications that require semantic integration and brings up the difficulties of the integration process. Schema matching and data matching, which play a very important role in semantic integration, are the focus of the following content. Finally, some other open issues are discussed.
Applications mentioned in the survey include:
  1. schema integration: merging a set of given schemas into a single global schema. The basic procedure is match - merge.
  2. Translate data between multiple databases. Data coming from multiple sources must be transformed to data conforming to a single target schema.
  3. Data integration system: provide the user with a uniform query interface (mediated schema) to a multitude of data sources. It's very inspiring, because our work on KADIS will belong to this category.
  4. P2P, peer data management: allow peers to query and retrieve data directly from each other, without building a mediated schema.
  5. Model management : create tools for easily manipulating models of data. 
  6. Many factors increase the need for the applications above, such as the rapid development of the Internet, the widespread adoption of XML as a standard syntax for sharing data, the growth of the Semantic Web, etc.

However, semantic integration is an extremely difficult problem, because of the challenges below:
  1. It's hard to infer the semantics of the involved elements, unless the creators of the data or domain experts are available.
  2. The clues used for matching schema elements are often unreliable. Sometimes the clues are not even complete.
  3. The global nature of matching makes it costly to find the BEST matching elements in different sources.
  4. Depending upon the application, the matching is often subjective. That means the matching criteria might change as the application changes.

Next, the survey discusses the progress in schema matching and data matching, respectively. Three aspects are considered:
  1. Matching techniques:
    1. Rule-based solutions: hand-crafted rules to match schemas (domain-independent, and domain-specific)
      1. Benefits: inexpensive; fast, because they typically operate on schemas; work very well in certain types of applications; capture valuable user knowledge about the domain.
      2. Drawbacks: cannot exploit data instances effectively; cannot exploit previous matching efforts to assist in the current matching.
    2. Learning-based solutions: neural networks, Naive Bayes, etc. are used in this category; external evidence, like past matches, is exploited; domain-specific schemas and matches, and even the users, are involved in this solution.
  2. Architecture of matching solutions: a module-based multi-matcher architecture, in which each module exploits a certain type of information well to predict matches.
    1. On the other hand, domain-specific information is taken as global information to be handled after the matching phase. 
    2. Related work in knowledge-intensive domains.
  3. Types of semantic matches:
    1. one-to-one match
    2. complex matches: need domain knowledge; the correct complex match is often not the top-ranked match, but somewhere in the top few predicted matches.

Data matching is discussed next: this part is important, because the main goal of KADIS is to match and query data from multiple sources. The techniques used in data matching are similar to those in schema matching, even though the focus is on tuple matching.

Finally, the paper also discusses the open issues related to the whole semantic integration process, beyond matching schemas and data. These issues include:
  1. User interaction: the key point is how to reduce the user's burden during the user interaction with the system;
  2. Formal foundations:  develop formal semantics of matching and try to formally explain the mechanism of matching.
  3. Industrial-strength schema matching: apply algorithms in real-world settings. This helps to better understand the applicability of current research and suggest future directions.
  4. Mapping maintenance: dynamically maintain the semantic mappings between different sources, because of the changes of the source data.
  5. Reasoning with imprecise matches on a large scale: use and evaluate the system where parts of the mapping always remain unverified and potentially incorrect because of the overwhelming size of the information.
  6. Schema integration: construct a global schema through matching and merging schemas from multiple data sources, which is about the high-level operations, like model management etc. 
  7. Data translation: elaborate matches into mappings, to enable the translations of queries and data across schemas;
  8. Peer-to-Peer data management: