Statistics ideas for master's thesis #63

Open
jonkri opened this issue Oct 3, 2024 · 3 comments

Comments

@jonkri

jonkri commented Oct 3, 2024

I'm about to do a master's thesis in Software Engineering. I would like to apply (Bayesian) statistics and, ideally, conduct some kind of experiment. I posted a message on Haskell-Cafe about it yesterday. I have also asked the Hackage administrator to see if I could have access to the Hackage metadata.

I was wondering if you have any suggestions for statistical questions that I could look into that would be of interest from a PVP point of view, for example some kind of analysis related to dependencies or breakages.

Thanks!

jonkri changed the title from "PVP statistics ideas for master's thesis" to "Statistics ideas for master's thesis" Oct 3, 2024
@hasufell
Member

hasufell commented Oct 3, 2024

I think it would be interesting to know:

  • how many maintainers violate PVP (e.g. by missing a major bump despite API-breaking changes)... also mind the corner case of re-exporting other packages' APIs, which is a disaster in its own right
  • how many maintainers do lazy major bumps although there was no API breakage (I'm looking at you @michaelpj 😁)
  • come up with some rough estimates of the man-hours spent on updating one's package for one dependency (major bump) and then calculate the total man-hours wasted in the entire ecosystem per, say, year (bonus points if you include GHC)
  • what are the most common bump patterns (for both major and minor)
  • what do people use the 4th and 5th etc. version components for

All the things I proposed kind of require an understanding of the package's API as well, not just the metadata.

I'm not sure that's within your scope. But it can be done statically.
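For the bump-pattern and version-component questions, which need only the metadata, a first pass could look roughly like the sketch below. It assumes PVP's A.B.C layout (A.B is the major version, C the minor version, anything after that is patch-level) and a chronologically ordered list of release version strings for one package. The function names and the sample data are hypothetical, and padding with zeros is a simplification, since PVP treats 1.2 and 1.2.0 as distinct versions.

```python
from collections import Counter


def parse_version(text):
    """Split a dotted version string such as '1.2.3.4' into a list of ints."""
    return [int(part) for part in text.split(".")]


def classify_bump(old, new):
    """Classify the step from `old` to `new` under PVP's A.B.C layout."""
    a, b = parse_version(old), parse_version(new)
    width = max(len(a), len(b), 3)
    # Simplification: pad with zeros so '1.2' and '1.2.0' compare equal.
    a = a + [0] * (width - len(a))
    b = b + [0] * (width - len(b))
    if a == b:
        return "none"
    if b < a:
        return "downgrade"
    if a[:2] != b[:2]:
        return "major"  # A or B changed
    if a[2] != b[2]:
        return "minor"  # C changed
    return "patch"      # only the 4th or later components changed


def bump_histogram(versions):
    """Count bump kinds over consecutive releases of one package."""
    return Counter(classify_bump(o, n) for o, n in zip(versions, versions[1:]))


def component_counts(versions):
    """Count how many version components each release uses."""
    return Counter(len(parse_version(v)) for v in versions)


# Hypothetical release history, oldest first.
releases = ["0.1.0.0", "0.1.0.1", "0.1.1.0", "0.2.0.0", "1.0.0.0"]
print(bump_histogram(releases))
print(component_counts(releases))
```

Summing such histograms over all packages would give the overall distribution of bump kinds, and the component counts show how often a 4th or 5th component is used at all. What those extra components mean to each maintainer would still need manual inspection.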

@jonkri
Author

jonkri commented Oct 3, 2024

Very interesting! Thank you, @hasufell!

I wonder what would be a good way of determining the API of packages. 🤔 Could GHCi's :browse command (with and without *) suffice, perhaps? Or would I need to dig deeper, perhaps parsing .hi files?
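However the API listing ends up being extracted, the comparison step could be sketched roughly as follows. This assumes each release's API is available as a set of text lines, one per exported name with its type; that input format and the function name are assumptions for illustration, not the actual output of :browse or of any existing tool. A changed signature shows up as one removed and one added line, so it counts as breaking. Instances, re-exports and other subtleties are ignored here.

```python
def suggest_bump(old_api, new_api):
    """Suggest the smallest PVP bump for a change between two API listings.

    Each argument is an iterable of strings, one per exported entity
    (hypothetical format, e.g. 'lookup :: Int -> [a] -> Maybe a').
    """
    old_api, new_api = set(old_api), set(new_api)
    removed = old_api - new_api
    added = new_api - old_api
    if removed:
        return "major"  # something disappeared or changed its signature
    if added:
        return "minor"  # purely additive change
    return "patch"      # exported API is textually identical


# Hypothetical API listings for two releases of one package.
v1 = {"size :: [a] -> Int", "lookup :: Int -> [a] -> Maybe a"}
v2 = {"size :: [a] -> Int", "lookup :: Int -> [a] -> a"}
print(suggest_bump(v1, v2))
```

Comparing the suggested bump with the bump the maintainer actually made would give a per-release label (missed major bump, unnecessary major bump, consistent) that could feed a statistical model.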

@ulysses4ever

@jonkri at the Cabal project, we are looking into API checking to ensure PVP compliance, based on https://github.com/Kleidukos/print-api. This package is in early development, so be warned :-) There are downsides to it (though they're probably inherent to any tool based on the GHC API), which you can read about here: haskell/cabal#10259
