Web Intelligence Technologies, Natural Language Processing Group, Department of Computer Science, University of Sheffield
The Armadillo architecture is a knowledge mining system that extracts information from multiple sources. It is designed to be generic enough to cater for most domains. The underlying idea is to build the system around Semantic Web Services (SWS): its functions are distributed across an environment (normally the web), and each function accepts an input and returns some form of output.
We go a step further by requiring that these services be semantically enabled, i.e. their inputs and outputs are semantically typed and can therefore refer to anything that has some sort of meaning (a concrete object, an abstract concept, etc.).
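As a minimal sketch of what "semantically typed" inputs and outputs could look like, the following Python wraps each value with the ontology concept it denotes and lets a service refuse input of the wrong concept. The class names (`Typed`, `SemanticService`), the `ex:` concept names and the example.org data are assumptions made for illustration; they are not part of Armadillo.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Typed:
    """A value tagged with the (hypothetical) ontology concept it denotes."""
    concept: str
    value: Any


@dataclass(frozen=True)
class SemanticService:
    """A function whose input and output are typed by ontology concepts."""
    name: str
    input_concept: str
    output_concept: str
    func: Callable[[Any], Any]

    def __call__(self, arg: Typed) -> Typed:
        # Reject input whose semantic type does not match the declared one.
        if arg.concept != self.input_concept:
            raise TypeError(
                f"{self.name} expects {self.input_concept}, got {arg.concept}"
            )
        return Typed(self.output_concept, self.func(arg.value))


def can_chain(first: SemanticService, second: SemanticService) -> bool:
    """Two services compose if the first's output concept feeds the second."""
    return first.output_concept == second.input_concept


# Toy data and services, invented for this example only.
DIRECTORY = {"Example University": "http://example.org"}

homepage_of = SemanticService(
    "homepage_of", "ex:University", "ex:WebSite", lambda name: DIRECTORY[name]
)
staff_page_of = SemanticService(
    "staff_page_of", "ex:WebSite", "ex:StaffPage", lambda url: url + "/staff"
)

university = Typed("ex:University", "Example University")
staff_page = staff_page_of(homepage_of(university))
```

Because the services declare concepts rather than raw data types, a caller can check with `can_chain` whether two services fit together before invoking either of them.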
The task of the system is to take some information about a particular domain as input and return to the user a structured view of all semantic relations branching from the input data.
In the first step of the processing loop, possible annotations in a document are identified using an existing lexicon (e.g. the one associated with the ontology).
These are only potential annotations and must be confirmed using some strategy (e.g. disambiguation or multiple evidence). Other annotations not provided by the lexicon are then identified, e.g. by learning from the contexts in which the known ones were found.
All new annotations must also be confirmed; once confirmed, they can be used to learn further annotations, and they become part of the lexicon.
Finally, all annotations are integrated (e.g. some entities are merged) and stored in a database.
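The loop described above can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not Armadillo's actual algorithm: the "context" is reduced to the single word preceding a known entry, candidates are two-word capitalised strings, and confirmation means being seen in at least `min_evidence` documents. All function names are hypothetical, and the final integration and storage step is left out.

```python
import re
from collections import defaultdict

# Toy pattern for a candidate: two capitalised words (an assumption).
NAME = r"[A-Z][a-z]+ [A-Z][a-z]+"


def left_contexts(docs, lexicon):
    """Learn contexts: the word immediately preceding each known entry."""
    contexts = set()
    for text in docs:
        for term in lexicon:
            for m in re.finditer(r"(\S+) " + re.escape(term), text):
                contexts.add(m.group(1))
    return contexts


def propose(docs, contexts, lexicon):
    """Find unknown candidates after a learned context; record supporting docs."""
    support = defaultdict(set)
    for i, text in enumerate(docs):
        for ctx in contexts:
            for m in re.finditer(re.escape(ctx) + " (" + NAME + ")", text):
                if m.group(1) not in lexicon:
                    support[m.group(1)].add(i)
    return support


def bootstrap(docs, seed_lexicon, min_evidence=2, max_rounds=5):
    """Grow the lexicon until no new candidate can be confirmed."""
    lexicon = set(seed_lexicon)
    for _ in range(max_rounds):
        contexts = left_contexts(docs, lexicon)
        support = propose(docs, contexts, lexicon)
        # Confirmation by multiple evidence: enough distinct documents.
        confirmed = {n for n, d in support.items() if len(d) >= min_evidence}
        if not confirmed:
            break
        lexicon |= confirmed  # confirmed annotations join the lexicon
    return lexicon


docs = [
    "Prof. Ada Smith and Prof. Bob Jones lead the group.",
    "Contact Prof. Bob Jones for details.",
    "Prof. Cy Young visited once.",
]
lexicon = bootstrap(docs, {"Ada Smith"})
```

In this example the seed entry teaches the context `Prof.`, which proposes two new candidates; only the one supported by two documents is confirmed and added, while the candidate seen once stays out of the lexicon.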
Armadillo employs the following methodologies:
– Adaptive Information Extraction from texts (IE): used to spot information and to learn further new instances.
– Information Integration (II): used (1) to discover an initial set of information that seeds learning for IE and (2) to confirm newly extracted information, e.g. using multiple evidence from different sources. For example, a new piece of information is confirmed if it is found in different (linguistic or semantic) contexts.
– Web Services: the architecture is based on the concept of "services". Each service is associated with some part of the ontology (e.g. a set of concepts and/or relations) and works independently. Each service can use other services (including external ones) to perform sub-tasks. For example, a service for recognising researchers' names in a university web site will use a Named Entity Recognition system as a sub-service; the latter recognises potential names (i.e. generic people's names), which the service then confirms as real researchers' names (e.g. as opposed to secretaries' names) using some internal strategies.
– RDF repository: where the extracted information is stored and its link to the source pages is maintained.
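To make the interplay of these components more concrete, here is a hedged Python sketch of a service that delegates to a generic name recogniser, confirms candidates by multiple evidence, and emits RDF-style triples that keep the link to the source pages. Everything in it is an assumption for illustration: the regular-expression recogniser stands in for a real Named Entity Recognition system, "found on at least two pages" stands in for the internal confirmation strategies, and the `ex:` vocabulary and plain tuples stand in for a real RDF repository.

```python
import re


def generic_ner(text):
    """Stand-in sub-service: two capitalised words count as a potential name."""
    return re.findall(r"[A-Z][a-z]+ [A-Z][a-z]+", text)


def researcher_service(pages, ner=generic_ner, min_contexts=2):
    """Confirm potential names seen on enough pages and return triples.

    pages maps a page URL to its text. Each confirmed name yields one type
    triple plus one provenance triple per page where it was found.
    """
    seen = {}
    for url, text in pages.items():
        for name in ner(text):
            seen.setdefault(name, set()).add(url)

    triples = []
    for name, urls in sorted(seen.items()):
        if len(urls) >= min_contexts:  # multiple evidence confirms the name
            triples.append((name, "rdf:type", "ex:Researcher"))
            triples.extend((name, "ex:foundOn", u) for u in sorted(urls))
    return triples


# Invented pages, for illustration only.
pages = {
    "http://example.org/staff": "Ada Smith leads the lab. Jane Doe is the secretary.",
    "http://example.org/pubs": "Paper by Ada Smith.",
}
triples = researcher_service(pages)
```

Passing the recogniser in as the `ner` parameter mirrors the idea that a service may rely on other, possibly external, services for its sub-tasks: a different recogniser can be substituted without changing the confirmation logic.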
