Topic-Model-Experiment-Tookit

Toolkit for text analysis using stm. The tools include dependency extractors that extract subject-verb-object pairs from texts.

Dependencies

Create six directories named "S", "V", "O", "SV", "VO", and "SO" under the classpath.
Create a source directory under the classpath and put all the text files in it.
Run the dependency generator, which takes the name of the source directory as its command-line argument. The dependency pairs are generated under the six directories. For example, if the source directory is named "courtDoc", run:
```
java DependencyGenerator courtDoc
```
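
The directory setup in the steps above can also be scripted. The following is a minimal sketch and not part of the toolkit: the class name `SetupDirs` and the method `createOutputDirs` are hypothetical, and it assumes the six output directories sit directly under a base directory given as the first argument (defaulting to the current directory).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not shipped with the toolkit.
public class SetupDirs {
    // The six output directory names listed in this README.
    static final String[] PAIR_DIRS = {"S", "V", "O", "SV", "VO", "SO"};

    // Creates each output directory under base; existing directories are left untouched.
    static List<Path> createOutputDirs(Path base) throws IOException {
        List<Path> created = new ArrayList<>();
        for (String name : PAIR_DIRS) {
            created.add(Files.createDirectories(base.resolve(name)));
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the base directory is the classpath root you run the tools from.
        Path base = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : createOutputDirs(base)) {
            System.out.println("ready: " + p);
        }
    }
}
```

Running it a second time is harmless, since `Files.createDirectories` does not fail when a directory already exists.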

Run LDACMatrix to create the term-document matrix and vocabulary for stm. LDACMatrix takes one argument: the base directory that contains the six directories. The matrix file matXXX.ldac and the vocabulary file vocabXXX are created under the classpath, where XXX is the directory name (for example, S).
Modify Experiment.sh under stmData to call the R script (you may also call the function directly in R). For example, if the matrix file is named matS.ldac and the vocabulary file is named vocabS:
```
Rscript ./Experiment.R S
```
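
For readers unfamiliar with the `.ldac` extension, the sketch below shows how one document is typically encoded in the LDA-C sparse format: the number of distinct terms, followed by `termId:count` pairs. This is an illustration only. The class `LdacLine` and the method `format` are hypothetical, and it is an assumption that LDACMatrix writes exactly this layout with zero-based term ids that index into the vocabulary file.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the LDA-C line layout; not the toolkit's LDACMatrix.
public class LdacLine {
    // Formats one document, given as a sequence of term ids, as
    // "<number of distinct terms> <id>:<count> <id>:<count> ...".
    static String format(int[] termIds) {
        // TreeMap keeps the term ids in ascending order.
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int id : termIds) {
            counts.merge(id, 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder().append(counts.size());
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            sb.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A document containing term 0 once, term 2 twice, and term 5 once.
        System.out.println(format(new int[]{0, 2, 2, 5})); // prints "3 0:1 2:2 5:1"
    }
}
```

Each line of the matrix file would then describe one document, and the term ids would refer to positions in the matching vocabulary file.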
