- option for shading in rolling.classify()
- performance.measures() greatly improved
- supervised classifiers updated, to be compliant with cross-validation
- SVM output fixed
- bugs in rolling.classify() fixed
- bugs in load.corpus() causing codepages mismatches fixed
- general code cleanup
- perform.svm() improved to work with R >4.0.0
- oppose() no longer requires at least 2 texts per set
- better color management in rolling.classify()
- CPU performance improvements
- fixes required by CRAN to meet R >3.6.3 requirements
- CPU performance improvements
- improvements in performance.measures()
- confusion matrices fixed
- oppose() update, to allow having just one text per set
- improvements in crossv(): confusion matrix fully operational
- new function performance.measures(), providing recall, precision, f1, etc.
- performance measures made available via classify()
- new function size.penalize() to assess minimal sample size
- extension to the generic plot() function, to plot size.penalize() results
- Unicode (UTF-8) made the default encoding, also for Windows
- check.encoding() and change.encoding() introduced
- GUI allows for changing the working directory with one click
- metadata handling through a dedicated variable
- {Steffen Pielström joins!}
- support for CJK (Chinese-Japanese-Korean) significantly improved
- a fix for exporting networks to Gephi ver. 0.9.2
- support for rmarkdown: stylo(), classify(), oppose()
- supports the following taggers: TaKIPI (for Polish), Alpino (Dutch)
- the Imposters method reimplemented, via the new function imposters()
- fine tuning the parameters of the Imposters method via imposters.optimize()
- Cosine Delta implemented and available via GUI
- Min-Max distance implemented
- Entropy distance implemented
- support for interactive network visualisations via stylo.network()
- corrected Spanish pronouns
- fixes in documentation
- countless minor fixes
- citation hint updated; to see the changes type: citation("stylo")
- the impostors method almost implemented, see help(perform.impostors)
- confusion table for supervised classification via classify()
- a separate function for cross-validation, see help(crossv)
- a significant change in SVM wrapper: the procedure automatically gets rid of the variables with all 0s in the training set
- the file inst/CITATION updated to meet recent CRAN requirements
- man files for perform.delta, perform.svm etc. updated: new executable examples added, so that one can perform a supervised test without any corpus
- perform.knn(), perform.svm() etc. improved, in order to handle custom vectors of classes provided by a user
- an improved output of the oppose() function
- significant performance improvement in make.table.of.frequencies()
- PCA values (rotation, explained variance, etc.) saved in final results
- the package 'stringi' used to optimize n-gram computing
- three datasets added to the package
- data(novels), a collection of 9 novels by the Bronte sisters and Jane Austen (full text)
- data(galbraith), a table of frequencies of 26 novels by 5 authors, including Galbraith's "The Cuckoo's Calling"
- data(lee), a table of frequencies of 28 American novels by 8 authors, including the new novel by Harper Lee
- new version of make.table.of.frequencies(), which speeds up the tasks radically
- delete.markup(), delete.stop.words(), make.samples(), make.frequency.list(), txt.to.features(), txt.to.words.ext() remodelled so that they can be applied to single texts and/or to corpora
- countless improvements in most of the functions
- UTF-8 issue in txt.to.words.ext() fixed, according to the CRAN's request
- support for Georgian
- plot size in rolling.classify() improved
- distance measure engine thoroughly restructured
- custom distance measures allowed
- cosine distance introduced
- new functions: dist.cosine(), dist.delta(), dist.argamon(), dist.eder(), dist.simple()
- extracting POS tags via the function parse.pos.tags()
- support for Coptic
- customizable graphs size in rolling.classify()
- custom graph filename
- integration with CLARIN-PL stylometric infrastructure
- non-ASCII chars in the source code neutralized (required by CRAN)
- random sampling substantially improved
- bug fixes: options for assign.plot.colors()
- bug fixes: 'start.at' parameter in stylo()
- bug fixes (mostly: colors on dendrograms)
- new sequential methods available: rolling SVM, rolling NSC, and rolling Delta
- bug in load.corpus.and.parse() fixed
- bug in rolling.delta() fixed
- network related bug in stylo() neutralized
- classification procedures as separate functions: perform.delta(), perform.svm(), perform.knn(), perform.naivebayes(), perform.nsc()
- classification output enhanced
- doc files for new functions added
- culling implemented as a separate function
- custom stop words deletion: delete.stop.words()
- a thoroughly re-written oppose() to use the same tokenizing, corpus loading, sampling etc. functions as stylo() and classify()
- zeta.chisquare(), zeta.craig(), and zeta.eder() derived as separate functions
- gui.oppose() derived as a separate function
- distinctive words visualization in oppose() improved
- draw.polygons derived as a separate function (hidden from the end user, though)
- cross-validation in classify() improved
- fixed bug in cross-validation for naivebayes
- a very unpleasant bug in oppose() fixed: the preferred and avoided words were calculated using the I set only
- help files significantly improved
- support for Unicode on Windows
- support for a few non Latin scripts
- experimental support for CJK (Chinese-Japanese-Korean)
- the function txt.to.words() remodelled
- loading corpus files improved
- printing variables on screen improved
- better class inheritance
- an issue with hclust and "ward", "ward.D" fixed
- man files extended and updated
- cross-validation in classify()
- lots of bugs fixed
- tSNE implemented
- preserve.case option
- more flexible function for splitting input text
- custom regular expressions to tokenize input texts
- support for external corpora or frequencies
- support for external set of features (e.g. frequent words)
- class "stylo.results" for formatting final results
- class "stylo.corpus" for formatting loaded corpora
- class "stylo.data" for formatting tables and vectors
- PCA coordinates piped to final results
- optional choice between relative/raw frequencies
- xml support improved (bug fixed)
- codepage bug in oppose() fixed
- CRAN-related issue with .Rbuildignore fixed
- network analysis support significantly improved
- improvements in man pages
- bug fixes, minor improvements
- different options for k-NN and SVM
- submitted to CRAN for the first time (!)
- batch mode improved
- several clustering algorithms available
- man pages revised and improved
- poster presentation at DH2013 (Lincoln, NE)
- minor improvements
- namespace issues solved
- documentation corrected (typos)
- arguments can be passed from command-line
- man pages cleaned and extended
- global variables abandoned
- innumerable minor improvements
- thousands of changes and improvements
- documentation improved and augmented
- stylo R package (un)officially released
- changes in names of some functions
- code cleaning, improvements, improvements, ...
- first prototype of an R package
- first attempt to port the stylo script into R package
- code OS-independent
- minor cleaning
- experimental support for network analysis (output to Gephi)
- bugs fixed
- added option to dump samples for closer post-analysis inspection
- customizable plot area, font size, etc.
- thoroughly rewritten code for margins assignment
- scatterplots represented either by points, or by labels, or by both (customizable label offset)
- saving the words (features) actually used
- saving the table of actually used frequencies
- new output/input extensions: optional custom list of files to be analyzed, saving distance table(s) to external files
- support for TXM Textometrie Project
- color cluster analysis graphs (at last!)
- code revised, cleaned, bugs fixed
- added 2 new PCA visualization flavors
- new GUI written
- added functionality for normal sampling
- support for Dutch added
- {Mike Kestemont joins!}
- option for choosing corpus files
- code cleaned; bugs fixed
- the core code rewritten
- I/II set division abandoned
- GUI remodeled
- GUI tooltips added
- different input formats supported (xml etc.)
- config options loaded from external file
- the code forked into (1) the Stylo script, supporting exploratory analyses (MDS, Cons. Trees, ...), (2) the Classify script for machine-learning methods (Delta, SVM, NSC, Bayes)
- feature selection (word and character n-grams)
- three ways of splitting words in English
- bugs fixed
- GUI code rearranged and simplified
- better output
- better text files uploading
- new options for culling and ranking of candidates
- the official world-premiere, at DH2011 (Stanford, CA)
- the code simplified; minor cleaning
- uploading wordlist from external source
- thousands of improvements
- the code simplified
- skip top frequency words option added
- better graphs
- attempt at better graph layout
- more graphic options
- dozens of improvements
- module for color graphs
- module for PCA
- module for uploading corpus files improved
- the core code simplified and improved (faster!)
- reordered GUI
- minor cleaning
- the z-scores module improved
- better counter of "good guesses"
- option for randomly generated samples
- minor improvements
- platform-independent outputfile saving
- GUI thoroughly integrated with initial variables
- corrected MFW display in graph
- more analysis description in outputfile
- auto graphs for MDS and CA
- remodeled GUI
- GUI: radiobuttons, checkbuttons
- language-determined pronoun selection
- dialog box (GUI)
- {Jan Rybicki joins!}
- module for different distance measures
- thousands of improvements (I/O, interface, etc.)
- numerous little improvements
- deleting pronouns
- module for culling
- module for bootstrapping
- module for uploading plain text files
- innumerable improvements
- the code simplified
- {this version was completed on a train from Leipzig to Krakow (a looong trip...), after a very successful R course taught by Stefan Gries at ESU "C&T", Leipzig, Germany (26-31/08/2009)}
- loop for different MFW settings
- some bash and awk scripts translated into R