OntoFlex ported to Java from C#/.NET

OntoFlex is a Windows-based tool for ontology construction, maintenance, and foreign-language document search. It supports translation needs in dialects or languages for which no machine translation (MT) resources exist. It is meant to be tailorable in the field by Analyst/Translator pairs. It is designed to perform foreign-language document “triage”: separating out, from a large group of documents, those that should be fully translated based on the tactical situation at the time. It uses an ontology of English terms and phrases with high-quality translations into foreign languages to search the repository of foreign-language documents.
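The triage idea described above can be illustrated with a minimal sketch. This is not OntoFlex's actual matching or ranking code: the class `TriageSketch`, its method names, and the scoring rule (a plain count of case-insensitive substring hits) are all assumptions made for illustration, and the Spanish sample terms are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these names come from the OntoFlex code base.
final class TriageSketch {

    // Counts non-overlapping, case-insensitive occurrences of phrase in text.
    static int countOccurrences(String text, String phrase) {
        if (phrase.isEmpty()) {
            return 0;
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = phrase.toLowerCase(Locale.ROOT);
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // Assumed scoring rule: total number of hits across all translations.
    static int score(String documentText, Map<String, List<String>> translationsByTerm) {
        int total = 0;
        for (List<String> translations : translationsByTerm.values()) {
            for (String translation : translations) {
                total += countOccurrences(documentText, translation);
            }
        }
        return total;
    }

    // Returns the names of documents with at least one hit, best score first.
    // Ties are broken alphabetically by document name.
    static List<String> rank(Map<String, String> documents,
                             Map<String, List<String>> translationsByTerm) {
        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            int s = score(doc.getValue(), translationsByTerm);
            if (s > 0) {
                scores.put(doc.getKey(), s);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        // English ontology terms mapped to (made-up) foreign-language translations.
        Map<String, List<String>> terms = new HashMap<>();
        terms.put("weapon", Arrays.asList("arma"));
        terms.put("bridge", Arrays.asList("puente"));

        Map<String, String> docs = new HashMap<>();
        docs.put("a.txt", "El puente está cerrado. Hay armas cerca del puente.");
        docs.put("b.txt", "El mercado abre mañana.");
        docs.put("c.txt", "Un arma fue encontrada.");

        // Documents worth a full translation, most relevant first.
        System.out.println(rank(docs, terms));
    }
}
```

Substring counting is deliberately naive (it matches "arma" inside "armas", for example); it stands in for whatever word and phrase matching the real tool performs.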

The ontology is structured as a hyperlinked, directed acyclic graph.

The graph includes "Is-a", "Peer-to" and other links between terms and phrases. The metadata in the ontology consists of links between terms and data associated with the links and terms/phrases.
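A directed acyclic graph with typed links could be modelled roughly as follows. This is a sketch under stated assumptions, not the OntoFlex data model: the class `OntologyGraphSketch`, the `LinkType` values, and the choice to treat every link (including "Peer-to") as a one-way edge that is rejected if it would close a cycle are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from the OntoFlex code base.
final class OntologyGraphSketch {

    enum LinkType { IS_A, PEER_TO }

    // A typed, directed link pointing at another term.
    static final class Link {
        final LinkType type;
        final String target;

        Link(LinkType type, String target) {
            this.type = type;
            this.target = target;
        }
    }

    // Outgoing links for every known term or phrase.
    private final Map<String, List<Link>> outgoing = new HashMap<>();

    void addTerm(String term) {
        outgoing.computeIfAbsent(term, k -> new ArrayList<>());
    }

    // Adds a directed link. Returns false, leaving the links unchanged,
    // if the new link would create a cycle.
    boolean addLink(String from, LinkType type, String to) {
        addTerm(from);
        addTerm(to);
        if (reachable(to, from, new HashSet<>())) {
            return false;
        }
        outgoing.get(from).add(new Link(type, to));
        return true;
    }

    // Depth-first search: can goal be reached from start along existing links?
    private boolean reachable(String start, String goal, Set<String> seen) {
        if (start.equals(goal)) {
            return true;
        }
        if (!seen.add(start)) {
            return false;
        }
        for (Link link : outgoing.get(start)) {
            if (reachable(link.target, goal, seen)) {
                return true;
            }
        }
        return false;
    }

    // All terms reachable from term by following IS_A links only.
    Set<String> ancestors(String term) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(term);
        while (!stack.isEmpty()) {
            for (Link link : outgoing.getOrDefault(stack.pop(), List.of())) {
                if (link.type == LinkType.IS_A && result.add(link.target)) {
                    stack.push(link.target);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        OntologyGraphSketch graph = new OntologyGraphSketch();
        graph.addLink("rifle", LinkType.IS_A, "firearm");
        graph.addLink("firearm", LinkType.IS_A, "weapon");
        graph.addLink("rifle", LinkType.PEER_TO, "pistol");
        // This link would close a cycle, so it is rejected.
        System.out.println(graph.addLink("weapon", LinkType.IS_A, "rifle"));
        System.out.println(graph.ancestors("rifle"));
    }
}
```

Checking reachability before each insertion keeps the graph acyclic at all times, at the cost of a traversal per added link; per-link and per-term metadata could be carried as extra fields on `Link` and in a second map keyed by term.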

What works:

  • Phrase Matching
  • Word Matching
  • Ranking
  • File Reading
  • Results Output
  • Read from RSS/ATOM
  • Read from a list of websites
  • CLI
  • CLI API discovery
  • CLI Update Progress (back to front-end)

What doesn't work:

  • Chunk Matching
  • GUI; the existing .NET UI is used as the front-end instead.

To Do:

  • Web/Tomcat
  • Hadoop (see hadoop.md); proof-of-concept, not end-to-end yet.
  • Swing/UI

Compile:

mvn clean install

Dependencies (Automatically Handled by Maven):

Tika Project

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.

Proof of Concept

  • Swing/UI layout
  • FX web browser for displaying results as HTML.
  • Hadoop