Perform distribution analysis on heavy-tailed data
Raw data are expected to be positive integers. Maximum likelihood estimation (MLE) will be performed to fit the following models to the data:
- Exponential distribution
- Power-law distribution
- Power-law distribution with exponential cutoff
- Pairwise power-law distribution
- Poisson distribution
- Yule–Simon distribution
- Lognormal distribution
- Truncated lognormal distribution
- Shifted power-law distribution with exponential cutoff
- Truncated shifted power-law distribution
An optimizer based on sequential least squares programming (SLSQP) is applied to maximize the likelihood function. (Initially, it was based on L-BFGS-B, but L-BFGS-B cannot handle inequality constraints, which are used to avoid overflow.)
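As a rough illustration of constrained likelihood maximization with SLSQP, the sketch below fits a discrete exponential model to the tail of some data using `scipy.optimize.minimize`. The model form, the function names `neg_log_likelihood` and `fit_exponential`, and the constraint limits are assumptions chosen for this example. They are not taken from the package's source.

```python
import numpy as np
from scipy.optimize import minimize


def neg_log_likelihood(params, x, xmin):
    """Mean negative log-likelihood of an assumed discrete exponential tail.

    Assumed model: P(x) = (1 - exp(-lam)) * exp(-lam * (x - xmin)), x >= xmin.
    """
    lam = params[0]
    return -(np.log1p(-np.exp(-lam)) - lam * np.mean(x - xmin))


def fit_exponential(x, xmin, lam0=0.5):
    """Fit the assumed model with SLSQP; returns (lam_hat, log_likelihood)."""
    x = np.asarray(x, dtype=float)
    x = x[x >= xmin]
    # Inequality constraints (fun(p) >= 0) keep lam in a range where the
    # likelihood neither diverges nor overflows. The limits are illustrative.
    constraints = [
        {"type": "ineq", "fun": lambda p: p[0] - 1e-6},
        {"type": "ineq", "fun": lambda p: 50.0 - p[0]},
    ]
    result = minimize(
        neg_log_likelihood,
        x0=[lam0],
        args=(x, xmin),
        method="SLSQP",
        constraints=constraints,
    )
    return result.x[0], -result.fun * x.size


# Synthetic data for illustration only: offsets 0..8 from xmin, repeated.
data = [25 + k for k in range(9)] * 50
lam_hat, log_likelihood = fit_exponential(data, xmin=25)
```

For this simple model the estimate can be checked against the closed form `log(1 + 1 / mean(x - xmin))`. The other models listed above generally have no closed form, which is where a numerical optimizer is needed.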
The model with the minimum AIC (equivalently, the largest Akaike weight) is selected as the best fit.
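The relationship between AIC and Akaike weights can be sketched as follows. The helper names `aic` and `akaike_weights` and the example numbers are hypothetical and only serve to show the arithmetic. They are not the package's API or real results.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Map {model name: AIC} to {model name: Akaike weight}."""
    best = min(aic_values.values())
    # Relative likelihood of each model: exp(-delta_i / 2), delta_i = AIC_i - AIC_min
    relative = {name: math.exp(-0.5 * (value - best))
                for name, value in aic_values.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}


# Made-up log-likelihoods and parameter counts, for illustration only.
aic_values = {
    "exponential": aic(log_likelihood=-505.4, n_params=1),
    "power_law": aic(log_likelihood=-500.2, n_params=1),
    "lognormal": aic(log_likelihood=-498.0, n_params=2),
}
weights = akaike_weights(aic_values)
best_model = max(weights, key=weights.get)
```

Because the weights are a monotonically decreasing function of AIC, the model with the smallest AIC always has the largest weight.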
The analysis focuses mainly on the tail of the distribution. The start of the tail is determined by minimizing the Kolmogorov–Smirnov (K-S) distance between the fitted models and the empirical distribution.
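The idea of choosing the tail start by K-S distance can be sketched for a single model. The example below assumes a discrete exponential tail (whose MLE has a closed form) and scans a list of candidate starting points. The functions `ks_distance_exponential` and `select_xmin` are hypothetical helpers written for this illustration, not functions provided by the package.

```python
import numpy as np


def ks_distance_exponential(x, xmin):
    """K-S distance between the empirical tail CDF and an assumed
    discrete exponential model fitted to the values x >= xmin."""
    tail = np.sort(x[x >= xmin])
    if tail.size == 0:
        return np.inf
    mean_offset = np.mean(tail - xmin)
    if mean_offset <= 0:
        return np.inf
    # Closed-form MLE for P(x) = (1 - exp(-lam)) * exp(-lam * (x - xmin))
    lam = np.log1p(1.0 / mean_offset)
    # Compare the CDFs on every integer from xmin to the largest observation
    grid = np.arange(xmin, tail[-1] + 1)
    empirical_cdf = np.searchsorted(tail, grid, side="right") / tail.size
    model_cdf = 1.0 - np.exp(-lam * (grid - xmin + 1))
    return float(np.max(np.abs(empirical_cdf - model_cdf)))


def select_xmin(x, candidates):
    """Return the candidate xmin with the smallest K-S distance,
    together with the distances for all candidates."""
    x = np.asarray(x)
    distances = {int(c): ks_distance_exponential(x, int(c)) for c in candidates}
    best = min(distances, key=distances.get)
    return best, distances
```

A call such as `select_xmin(data, range(1, 50))` would scan the candidates 1 to 49, where `data` is an integer array. In a full analysis the same scan would be repeated for each candidate model.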
Installation:
pip install heavytailed
or
conda install -c wangxiangwen heavytailed
Example Usage:
from heavytailed import compare
compare.comparison('testdata/raw_25_bets.dat', xmin=25)
The negative log-likelihood minimized during MLE may be non-convex, so it is advisable to try several different initial values for the distribution parameters to avoid getting stuck in a local minimum.
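One generic way to try several initial values is a simple multi-start loop that keeps the best converged result. The sketch below uses `scipy.optimize.minimize` directly on a toy non-convex function. The helper `multistart_minimize` and the toy objective are assumptions for illustration and do not reflect how the package exposes its initial values.

```python
import numpy as np
from scipy.optimize import minimize


def multistart_minimize(objective, initial_guesses, **kwargs):
    """Run SLSQP from each initial guess and keep the lowest converged result."""
    best = None
    for x0 in initial_guesses:
        result = minimize(objective, x0=np.atleast_1d(float(x0)),
                          method="SLSQP", **kwargs)
        if result.success and (best is None or result.fun < best.fun):
            best = result
    return best


def toy_objective(p):
    """Toy function with two local minima (near x = -1.04 and x = 0.96).
    It stands in for a non-convex negative log-likelihood."""
    x = p[0]
    return (x ** 2 - 1.0) ** 2 + 0.3 * x


# A single start near x = 1 may stop in the shallower minimum;
# several starts make it more likely that the deeper one is found.
best = multistart_minimize(toy_objective, initial_guesses=[-1.0, 0.0, 1.0])
```

The same pattern applies when the objective is a negative log-likelihood: each start yields a candidate fit, and the one with the highest likelihood is kept.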
If you find this package useful in your publication, please consider citing the following two articles:
- Wang, X., & Pleimling, M. (2017). Foraging patterns in online searches. Physical Review E, 95(3), 032145.
- Wang, X., & Pleimling, M. (2018). Behavior analysis of virtual-item gambling. Physical Review E, 98(1), 012126.