Re: Reviving Semantic Relationships in Beagle



"The Semantic Web is a vision for the future of the Web in which information is given explicit meaning, making it easier for machines to automatically process and integrate information available on the Web. The Semantic Web will build on XML's ability to define customized tagging schemes [XML] and RDF's flexible approach to representing data [RDF Concepts]. The next element required for the Semantic Web is a web ontology language which can formally describe the semantics of classes and properties used in web documents. In order for machines to perform useful reasoning tasks on these documents, the language must go beyond the basic semantics of RDF Schema [RDF Vocabulary]."
http://www.w3.org/TR/webont-req/

Lucene is essentially keyword counting: it indexes the terms found in documents, and that index is optimized on the assumption that human vocabulary is finite. Keywords work best for free-text searches where the syntax of the data is unknown. Limiting the vocabulary further to a specific ontology would shrink the search space and should increase lookup speed.
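
To make "keyword counting" concrete, here is a minimal sketch of a term index in Python. It is not Lucene's actual implementation or API; the function names and sample documents are made up for illustration, and real engines add tokenization, scoring and on-disk storage on top of this idea.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Map each term to {doc_id: occurrence count} (a toy inverted index)."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index

def search(index, term):
    """Return the ids of documents containing the term, most occurrences first."""
    postings = index.get(term.lower(), Counter())
    return [doc_id for doc_id, _ in postings.most_common()]

# Hypothetical sample documents.
docs = {
    "mail-1": "beagle indexes mail and beagle indexes files",
    "note-1": "lucene stores an index of terms",
    "note-2": "beagle uses lucene",
}
index = build_index(docs)
print(search(index, "beagle"))  # mail-1 ranks first: it mentions the term twice
```

The index grows with the number of distinct terms, which is why a smaller, controlled vocabulary should keep it compact.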

With syntax you can represent "facts" about objects.

The semantic web emphasizes the use of syntax over free-text keywords.
RDF represents data as triples: subject, predicate, object.
MySQL is optimized for storing relational data on a file system.
Many projects that operate on large RDF graphs store them in a relational database such as MySQL (for example Beagle++ and Jena).
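
As a rough illustration of triples kept in a relational table, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the table layout, the `ex:` predicates and the sample facts are assumptions for illustration, not the schema Beagle++ or Jena actually use.

```python
import sqlite3

# In-memory database standing in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
# Index on the columns we filter by, so lookups need not scan the whole table.
conn.execute("CREATE INDEX idx_pred_obj ON triples (predicate, object)")

# Hypothetical facts about desktop objects.
facts = [
    ("file:report.odt", "ex:author", "person:alice"),
    ("file:report.odt", "ex:attachedTo", "mail:42"),
    ("mail:42", "ex:sender", "person:alice"),
    ("file:notes.txt", "ex:author", "person:bob"),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", facts)

def subjects_with(predicate, obj):
    """Return every subject that has the given predicate/object pair."""
    rows = conn.execute(
        "SELECT subject FROM triples "
        "WHERE predicate = ? AND object = ? ORDER BY subject",
        (predicate, obj),
    )
    return [row[0] for row in rows]

print(subjects_with("ex:author", "person:alice"))  # ['file:report.odt']
```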

Beagle claims that its stack is essentially equivalent to a database.
Just as other projects have built standard semantic web interfaces to their internal representations of data, Beagle would need to build such an interface. This would be an RDF store. The semantic web vision is for this interface to use a standard format, most likely RDF.

On the search end of things:
Xesam, the query interface Beagle supports, is roughly a lite version of the SPARQL query language.
SPARQL is an SQL-like query language for searching RDF graphs.
SPARQL matches triple relations, which is broader than keywords.
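
To show what matching triple relations means compared with matching keywords, here is a minimal sketch of triple-pattern matching in plain Python. It is not a SPARQL engine and uses no SPARQL library; the function names, the `ex:` predicates and the sample graph are made up for illustration. Terms starting with "?" are variables, as in SPARQL.

```python
def match_pattern(pattern, triple, bindings):
    """Match one triple against one pattern; return extended bindings or None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def query(graph, patterns):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

# Hypothetical desktop graph.
graph = [
    ("file:report.odt", "ex:author", "person:alice"),
    ("mail:42", "ex:sender", "person:alice"),
    ("file:notes.txt", "ex:author", "person:bob"),
]

# "Files whose author has also sent a mail": a relation no single keyword expresses.
results = query(graph, [("?file", "ex:author", "?who"),
                        ("?mail", "ex:sender", "?who")])
print(results)
```

The shared variable ?who is what joins the two patterns; a keyword search for "alice" would return both items but could not tell you how they are related.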

LARQ is a project to combine Lucene and SPARQL.
http://jena.sourceforge.net/ARQ/lucene-arq.html

OWL adds some commonsense about types of relations, to help automated inference over RDF graphs.
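
One example of such a relation type is owl:TransitiveProperty. Here is a minimal sketch of the inference it allows, in plain Python; it is not a real OWL reasoner, and the `ex:` names and the sample triples are made up for illustration.

```python
def infer_transitive(triples):
    """Add the triples implied by properties declared owl:TransitiveProperty."""
    inferred = set(triples)
    transitive = {s for s, p, o in inferred
                  if p == "rdf:type" and o == "owl:TransitiveProperty"}
    changed = True
    while changed:  # repeat until no new triple can be derived
        changed = False
        for s1, p1, o1 in list(inferred):
            if p1 not in transitive:
                continue
            for s2, p2, o2 in list(inferred):
                # (a p b) and (b p c) imply (a p c) when p is transitive.
                if p2 == p1 and s2 == o1 and (s1, p1, o2) not in inferred:
                    inferred.add((s1, p1, o2))
                    changed = True
    return inferred

# Hypothetical graph: ex:partOf is declared transitive.
graph = [
    ("ex:partOf", "rdf:type", "owl:TransitiveProperty"),
    ("ex:chapter1", "ex:partOf", "ex:thesis"),
    ("ex:thesis", "ex:partOf", "ex:homeFolder"),
]
closure = infer_transitive(graph)
print(("ex:chapter1", "ex:partOf", "ex:homeFolder") in closure)  # True
```

The last triple was never stated; it follows only because the ontology says what kind of relation ex:partOf is.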

An ontology is a description of a technical language. Commonsense is required to infer across ontologies. A well-defined ontology facilitates this inference by removing ambiguity.

To use an OWL internal representation, each Beagle filter would have to not only tag a word count with a keyword, but also define the meaning of those keywords via an ontology that describes the concept relations within that filter's domain.

For further commonsense data:
WordNet and OpenCyc are available in OWL format.
