OntoFlex is a Windows-based tool for ontology construction, maintenance, and foreign-language document search. It supports translation needs in dialects or languages for which no machine translation (MT) resources exist, and it is meant to be tailored in the field by Analyst/Translator pairs. It is designed to perform foreign-language document “triage”: separating out, from a large group of documents, those that should be fully translated given the tactical situation at the time. To search the repository of foreign-language documents, it uses an ontology of English terms and phrases with high-quality translations into foreign languages.
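The triage idea described above can be illustrated with a minimal sketch. This is not OntoFlex's actual matcher: the class `TriageSketch`, its method names, the scoring rule (a plain count of phrase occurrences), and the sample Spanish terms are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of document triage; not the OntoFlex implementation.
final class TriageSketch {

    /** Counts non-overlapping occurrences of phrase in text. */
    static int countOccurrences(String text, String phrase) {
        if (phrase.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = text.indexOf(phrase);
        while (from >= 0) {
            count++;
            from = text.indexOf(phrase, from + phrase.length());
        }
        return count;
    }

    /** Sums the occurrences of every foreign-language phrase in one document. */
    static int score(Map<String, List<String>> translations, String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        int total = 0;
        for (List<String> phrases : translations.values()) {
            for (String phrase : phrases) {
                total += countOccurrences(lower, phrase.toLowerCase(Locale.ROOT));
            }
        }
        return total;
    }

    /** Returns the names of documents with at least one match, best score first. */
    static List<String> triage(Map<String, List<String>> translations,
                               Map<String, String> documents) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            int s = score(translations, doc.getValue());
            if (s > 0) {
                scores.put(doc.getKey(), s);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Stable sort: documents with equal scores keep their input order.
        ranked.sort((a, b) -> Integer.compare(scores.get(b), scores.get(a)));
        return ranked;
    }

    public static void main(String[] args) {
        // English term -> assumed foreign-language translations (sample data).
        Map<String, List<String>> translations = new LinkedHashMap<>();
        translations.put("bridge", List.of("puente"));
        translations.put("road", List.of("carretera", "camino"));

        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("a.txt", "El puente queda cerrado. El otro puente sigue abierto.");
        documents.put("b.txt", "Receta de sopa de tomate.");
        documents.put("c.txt", "La carretera al puente norte y el camino viejo.");

        System.out.println(triage(translations, documents)); // prints [c.txt, a.txt]
    }
}
```

Documents with no matching phrase are dropped from the result, which is the "separate out" step; the remaining ones are ordered so that a translator can start with the most relevant document.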
The ontology is structured as a hyperlinked, directed acyclic graph.
The graph includes "Is-a", "Peer-to" and other links between terms and phrases. The metadata in the ontology consists of links between terms and data associated with the links and terms/phrases.
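A minimal sketch of such a graph follows, assuming that every link type is stored as a directed edge and that a link is rejected when it would close a cycle. The class `OntologyGraphSketch`, its methods, and the sample terms are hypothetical and are not taken from the OntoFlex code base.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a typed, directed acyclic term graph.
final class OntologyGraphSketch {

    /** A typed, directed link between two terms, with free-form metadata. */
    static final class Link {
        final String from;
        final String type;
        final String to;
        final Map<String, String> metadata = new HashMap<>();

        Link(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    private final Map<String, List<Link>> outgoing = new HashMap<>();

    /** Adds a link unless it would close a cycle; returns whether it was added. */
    boolean addLink(String from, String type, String to) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>());
        outgoing.computeIfAbsent(to, k -> new ArrayList<>());
        if (from.equals(to) || reachable(to, from)) {
            return false; // the new edge would make the graph cyclic
        }
        outgoing.get(from).add(new Link(from, type, to));
        return true;
    }

    /** Depth-first search: is target reachable from start over any link type? */
    boolean reachable(String start, String target) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String term = stack.pop();
            if (term.equals(target)) {
                return true;
            }
            if (!seen.add(term)) {
                continue;
            }
            for (Link link : outgoing.getOrDefault(term, List.of())) {
                stack.push(link.to);
            }
        }
        return false;
    }

    /** Breadth-first walk over links of one type, e.g. all "Is-a" ancestors. */
    List<String> follow(String term, String type) {
        List<String> result = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(term);
        while (!queue.isEmpty()) {
            for (Link link : outgoing.getOrDefault(queue.remove(), List.of())) {
                if (link.type.equals(type) && !result.contains(link.to)) {
                    result.add(link.to);
                    queue.add(link.to);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        OntologyGraphSketch graph = new OntologyGraphSketch();
        graph.addLink("truck", "Is-a", "vehicle");
        graph.addLink("vehicle", "Is-a", "equipment");
        graph.addLink("truck", "Peer-to", "lorry");
        System.out.println(graph.addLink("equipment", "Is-a", "truck")); // prints false
        System.out.println(graph.follow("truck", "Is-a")); // prints [vehicle, equipment]
    }
}
```

Checking reachability before each insertion keeps the graph acyclic at all times, at the cost of one traversal per added link. Whether "Peer-to" should instead be treated as a symmetric relation is left open here.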
- Phrase Matching
- Word Matching
- Ranking
- File Reading
- Results Output
- Read from RSS/ATOM
- Read from a list of web-sites
- CLI
- CLI API discovery
- CLI Update Progress (back to front-end)
- Chunk Matching
- GUI; uses the existing .NET UI as a front-end.
- Web/Tomcat
- Hadoop (see hadoop.md); proof-of-concept, not end-to-end yet.
- Swing/UI
mvn clean install
- Tika Project 1.14
- Commons Lang 3.5
- Commons IO 2.5
- Commons Collections 4.1
- Jackson JSON 2.8.8 - referenced, unused.
- Commons BeanUtils 1.9.3
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
- Swing/UI layout
- FX web browser for displaying results as HTML.
- Hadoop