This project is a statistical analysis of Italian firms, developed for educational purposes as part of the Statistical Methods for Data Science course (A.Y. 2017-2018) at the Università di Pisa.
We analyzed Italian firms by addressing questions that are common in the economic and statistical literature:
- which measure of size best describes firm size;
- how strongly the different measures of firm size are correlated;
- how firm sizes are distributed;
- how firm growth is distributed;
- whether the mean growth is statistically different from zero;
- whether the growth distribution is symmetric or asymmetric.
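As a minimal sketch of how the last two questions could be approached in R, the snippet below computes log growth rates from simulated sizes, tests whether their mean differs from zero with a one-sample t-test, and computes the sample skewness as a rough indicator of asymmetry. The simulated data, the names `size_t0` and `size_t1`, and the helper `sample_skewness` are hypothetical and are not taken from the scripts in this repository.

```r
# Hypothetical example data: firm sizes in two consecutive years (simulated).
set.seed(42)
n <- 500
size_t0 <- rlnorm(n, meanlog = 3, sdlog = 1)
size_t1 <- size_t0 * exp(rnorm(n, mean = 0, sd = 0.2))

# Growth rate defined as the difference of log sizes (an assumption of this sketch).
growth <- log(size_t1) - log(size_t0)

# One-sample t-test of H0: mean growth = 0.
mean_test <- t.test(growth, mu = 0)

# Sample skewness (third standardized moment); close to 0 for symmetric data.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

print(mean_test$p.value)
print(sample_skewness(growth))
```

A small p-value would suggest a mean growth different from zero, while a skewness far from zero would point towards an asymmetric growth distribution. A formal symmetry test would need more than this descriptive measure.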
We also compared the behaviour of distinct subsamples of the dataset: distinct subsectors, distinct years, and distinct firm sizes.
All analyses were performed in R (version 3.5.0).
We used the dplyr package for data manipulation, the poweRlaw package for power law distributions, and mostly the ggplot2 package for plotting. Any other required package is listed in the packages.txt file.
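The following is a minimal sketch of how the packages listed in packages.txt could be checked before running the scripts. It assumes that packages.txt contains one package name per line, which is not confirmed here. The helpers `read_package_list` and `missing_packages` are hypothetical names, not functions from this repository.

```r
# Assumption: packages.txt lists one package name per line.
read_package_list <- function(path) {
  pkgs <- trimws(readLines(path, warn = FALSE))
  pkgs[nzchar(pkgs)]  # drop empty lines
}

# Return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Usage: report missing packages if the file is available.
if (file.exists("packages.txt")) {
  to_install <- missing_packages(read_package_list("packages.txt"))
  if (length(to_install) > 0) {
    message("Missing packages: ", paste(to_install, collapse = ", "))
  }
}
```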
A brief description of the directories and files in this repository:
- the data directory contains the RData files with our original data;
- the files directory contains:
  - distrResults, which contains the RData files with the results of the distributions fitted on distinct (sub)samples;
  - images, which contains all the plots, confidence intervals, etc.;
- utils.R is an R script with general utilities (e.g. loading the needed packages, loading the datasets into the current workspace);
- functions.R is an R script that contains several useful functions for analysis purposes;
- first_analysis.R contains a general analysis of the whole dataset, e.g. basic statistics of the distinct features;
- correlation.R contains correlation analysis and linear regression for Employee and Revenue attributes;
- test_distr.R contains the analysis of the firm size distribution;
- powerlaw.R further analyzes the power law hypothesis on firm size, using the Employee attribute;
- growth_rate_dist.R and all the remaining files whose names start with "growth" (one for each (sub)sample) contain the analysis of the growth of Italian firms;
- distributionResultsAnalysis contains our analysis of the results stored in "files/distrResults";
- packages.txt contains a list of the packages needed to perform the analysis.
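To illustrate the kind of analysis described for correlation.R, here is a minimal sketch of a correlation analysis and a log-log linear regression between the Employee and Revenue attributes. The data are simulated, and the data frame name `firms` and the simulation parameters are hypothetical. This is not the code of correlation.R.

```r
# Hypothetical simulated data standing in for the real dataset.
set.seed(1)
n <- 300
Employee <- round(rlnorm(n, meanlog = 2, sdlog = 1)) + 1
Revenue <- 50 * Employee^0.9 * exp(rnorm(n, sd = 0.5))
firms <- data.frame(Employee = Employee, Revenue = Revenue)

# Pearson correlation on the log scale and rank-based Spearman correlation.
pearson_log <- cor(log(firms$Employee), log(firms$Revenue))
spearman <- cor(firms$Employee, firms$Revenue, method = "spearman")

# Linear regression of log(Revenue) on log(Employee).
fit <- lm(log(Revenue) ~ log(Employee), data = firms)

print(c(pearson_log = pearson_log, spearman = spearman))
print(coef(fit))
```

Working on the log scale is a choice of this sketch, motivated by the heavy right tail that size measures typically show. The slope of the fitted line can then be read as an elasticity of Revenue with respect to Employee.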
For deeper and clearer explanations about the procedures and the results, please read our final report.