Qyan's Brave New World

Semantic Heterogeneity in Global Information Systems: The Role of Metadata, Context and Ontologies

This paper focuses on approaches to tackling the semantic heterogeneity problem in the context of Global Information Systems (GIS). At first, I didn't think it would be very useful to me as a survey, because it looked ordinary and I didn't expect to get much out of it. That impression mostly held up in the end, although some of the content in the paper is helpful for our research. Now let's take a close look at the survey paper.

1. At the beginning of the paper, the author explains why approaches to semantic heterogeneity are needed. The problem of semantic heterogeneity is defined as the identification of semantically related objects in different databases and the resolution of schematic differences among them. The key objective of the approaches brought up in the paper is to reduce the problem of knowing the contents and structure of each of a huge number of information repositories to the significantly smaller problem of knowing the contents of domain-specific ontologies, which a user familiar with the domain is likely to know or easily understand.

2. What is Metadata?

The basic idea of the paper is to construct metadata from the original database. Metadata is data or information about data. In the paper, it's categorized as follows:

Content-independent metadata: doesn't depend on the content of the document with which it is associated; examples include location, modification date etc.
Content-dependent metadata: depends on the content of the document with which it is associated, like the size of the document, number of rows etc. This category can be divided into smaller classes:

Direct content-based metadata: based directly on the content; examples include full-text indices, inverted tree, document vectors etc.
Content-descriptive metadata: describes the contents without directly using the contents of the document. It can be further divided into:

Domain-independent metadata: captures information present in the document independent of the application or subject domain of the information. An HTML document type definition is a good example in this category.
Domain-specific metadata: described in a manner specific to the application or subject domain of the information. Issues of vocabulary are therefore important, because terms have to be chosen in a domain-specific manner.

According to the paper, domain-specific metadata matters most for the issues related to semantic heterogeneity, since it can capture information specific to an application or a domain. It can also be viewed as a vocabulary of terms for constructing more domain-specific metadata descriptions.

In the paper, two kinds of metadata are introduced: metadata contexts (m-contexts) and conceptual contexts (c-contexts).

Metadata contexts mainly abstract the representational details;
Conceptual contexts capture domain knowledge. Together with the structured data, the paper uses this metadata to capture the information content.

3. Constructing c-contexts from ontological terms.

The method used in the paper is to take terms from domain-specific ontologies as the vocabulary for characterizing the information. In other words, the metadata of a document can be represented as a vector of attribute-value pairs. Based on this definition, the mechanisms for reasoning and manipulation are also discussed. In order to map contextual descriptions to the database schema, a set of projection rules is defined and used.

Finally, some issues about language and ontology in context representation are presented.

About the value of context attributes:

Context here is a collection of contextual coordinates and their values. There are some basic requirements for the definition of these values. They should (1) be declarative in nature, which helps when performing inferences on the context; (2) express the context as a collection of contextual coordinates, each describing a specific aspect of the information present in the database or requested by a query, which is consistent with the representation of the c-context; (3) have primitives in the model world and ontology.
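To make the idea of "contextual coordinates and their values" concrete for myself, here is a minimal Python sketch. Everything in it is my own assumption and not taken from the paper: the coordinate names (`subject`, `region`), the example values, and the `matches` function, which is only a naive overlap check and not the reasoning mechanism the paper describes.

```python
# Hypothetical c-context: each contextual coordinate maps to a set of
# ontology terms. Coordinate names and values are invented for illustration.
document_context = {
    "subject": {"land_cover"},
    "region": {"Georgia", "Florida"},
}

# A query is expressed with the same kind of coordinates.
query_context = {
    "region": {"Georgia"},
}


def matches(query, context):
    """Return True when every coordinate constrained by the query is present
    in the context and shares at least one value with it."""
    return all(
        coordinate in context and bool(values & context[coordinate])
        for coordinate, values in query.items()
    )


print(matches(query_context, document_context))
```

Because both the document and the query are described declaratively with the same coordinates, comparing them reduces to comparing values coordinate by coordinate.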

About the ontology: scalability of the ontology is the main concern at this step. The paper presents two approaches to combining various ontologies.

The common one

Build an extensive global ontology.
Exploit the semantics of a single problem domain.

Re-use of existing ontologies/classifications

Combine different existing ontologies, while watching out for problems such as the overlap between different ontologies.

4. Semantic interoperability using terminological relationships, like synonyms, hyponyms, hypernyms.

When discussing the use of synonyms to interoperate across ontologies, the paper uses OBSERVER as an example.

An architecture for interoperation is presented. It's mainly composed of a query processor, an ontology server, an inter-ontology relationships manager (IRM) and the ontologies. (This structure may be useful for our work. At the least, we could compare it with our own architecture. In some sense, I don't think it could work very well, because the IRM component takes on too much work, and it's complex enough to intimidate any designer, although the IRM helps a lot with the scalability of the whole procedure.)

Using hyponyms and hypernyms to interoperate across ontologies: this section seems very useful because it describes the problems we have in our project. In reality, hierarchical relationships like hypernyms and hyponyms are more common than synonym relationships. The solution introduced by the paper is to substitute a non-translated term with the intersection of its immediate parents or the union of its immediate children. The point is that synonyms, hyponyms and hypernyms should first be identified within and between the user and target ontologies.
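The substitution rule can be sketched in a few lines of Python. This is only my simplified reading of it, under several assumptions: the ontology fragments are plain dictionaries, the term names are invented, synonyms are assumed to be already resolved so that terms can be compared by name, and the lookup goes only one level up or down instead of recursing as a real system would have to.

```python
def substitute(term, target_terms, parents, children):
    """Translate a user-ontology term into target-ontology terms.

    Returns a (kind, terms) pair, or None when no substitution is found:
      "exact"        - the term itself exists in the target ontology
      "intersection" - replace the term by the intersection of its immediate parents
      "union"        - replace the term by the union of its immediate children
    """
    if term in target_terms:
        return ("exact", {term})

    # Hypernym route: every immediate parent must be known to the target.
    term_parents = parents.get(term, set())
    if term_parents and term_parents <= target_terms:
        return ("intersection", set(term_parents))

    # Hyponym route: every immediate child must be known to the target.
    term_children = children.get(term, set())
    if term_children and term_children <= target_terms:
        return ("union", set(term_children))

    return None


# Hypothetical ontology fragments, invented for illustration.
parents = {"textbook": {"book", "teaching_material"}}
children = {"periodical": {"journal", "magazine"}}
target_terms = {"book", "teaching_material", "journal", "magazine"}

print(substitute("textbook", target_terms, parents, children))
print(substitute("periodical", target_terms, parents, children))
```

The intersection of parents is a broader concept than the original term and the union of children is a narrower one, so either substitution is an approximation of the original term.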


Tuesday, December 05, 2006
