handyc/aks

aks

This program is a utility for extracting n-grams from texts. It extracts every contiguous string from a collection of texts, from length 1 up to a maximum length set by the user. The included scripts then sort the resulting n-gram files to determine which strings occur most frequently. This method is especially useful for texts written in languages that do not put spaces between individual words.
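To make the idea concrete, here is a minimal Python sketch of the technique described above. It is not the code that aks itself uses. The function names `extract_ngrams` and `most_frequent` are hypothetical, and the sketch assumes that the unit of an n-gram is a single Unicode code point, which may differ from how aks handles each language.

```python
from collections import Counter


def extract_ngrams(text: str, max_n: int) -> Counter:
    """Count every contiguous substring of length 1..max_n in text.

    Assumption: one "unit" is one Unicode code point. This is an
    illustration only, not the method used by aks.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n across the text.
        for start in range(len(text) - n + 1):
            counts[text[start:start + n]] += 1
    return counts


def most_frequent(counts: Counter, n: int, top: int = 5):
    """Return the `top` most common n-grams of exactly length n."""
    of_length_n = Counter({g: c for g, c in counts.items() if len(g) == n})
    return of_length_n.most_common(top)


if __name__ == "__main__":
    sample = "abababc"  # hypothetical sample text
    counts = extract_ngrams(sample, max_n=3)
    print(most_frequent(counts, n=2, top=1))
```

Because no word boundaries are assumed, frequent substrings surface on their own, which is why this approach suits texts written without spaces. Note that the number of extracted strings grows roughly with the text length times the maximum n, so large values of n on large collections produce large output.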

usage:

./aks [language] [maximum n value] [source directory]

./processmasters [maximum n value] [source directory]

examples:

./aks tibetan_roman 32 /home/handyc/texts

./aks tibetan_uchen 32 /home/handyc/texts

./aks chinese 32 /home/handyc/texts

./aks sanskrit_unicode 32 /home/handyc/texts

You may need to change the permissions on the scripts (for example, with chmod +x) before you can run them.

The best way to reach me currently is through my SDF email account.
