
JOSS review #21

Closed

gdalle opened this issue Nov 10, 2024 · 14 comments

@gdalle
Contributor

gdalle commented Nov 10, 2024

Hi, and congrats on your JOSS submission! I'm one of your reviewers and I'll be writing my remarks below as the review progresses. You can find my checklist here.

Paper

  • The paper is longer than 1000 words, which is the recommended limit. However I don't think shortening it would be helpful, because it does contain useful mathematical background.
  • There are missing and invalid DOIs among the references.
  • L15: The right citation for the Julia language is usually the SIAM Review paper, not the Arxiv preprint you cited.
  • L30-32: How do the following ingredients come into play inside your package? This is not at all explained in the paper.
    • symbolic differentiation
    • optimization
    • numerically stable computation
  • L30-32: Why do you use symbolic differentiation (with Symbolics.jl) and not algorithmic differentiation (e.g. with Zygote.jl or Enzyme.jl)?
  • L32: The citation you picked for "numerically stable computation" does not seem related to Julia but to R? In fact, I would assume that this third ingredient is also widely available in other languages as well.
  • L33: What kind of benchmarks allow you to state that "ExpFamilyPCA.jl delivers speed"?
  • L54: It would be useful to define the exponential family in general (as well as its natural and mean parameter) before diving into the details of the link function.
  • L61: The link called "appendix" points to a documentation page, which may become out of date if you change the structure of your docs. The same goes for other external links in the paper.
  • L81: Is it standard to introduce regularization around a $\mu_0$, or did you suggest it yourselves? How do you pick $\mu_0$ in practice?
  • L88: Which figure do you recreate? Can you give more details on what these "belief profiles" represent?
  • L88: Is the figure recreation reproducible?
  • L91: "PCA struggles even with 10 basis" is missing the word "components".
  • L104: It would help to clarify the types of the objects that fit!, compress and decompress work with.
@gdalle
Contributor Author

gdalle commented Nov 10, 2024

Code

  • You should probably run CI on the latest Julia version (called 1 in the GitHub action you use) in addition to 1.10. (A workflow sketch covering this and the Codecov token follows this list.)
  • You may want to turn your example from the index into a README test to make sure it doesn't get outdated.
  • It is useful to display what percentage of your code is covered by the test suite. At the moment, the Codecov part of the test workflow is failing because you did not specify a token. See this tutorial to set it up properly.
  • Did you profile the code to see where it spends most of its time and avoid classic performance pitfalls (type instability for example)? If you care about optimal speed, you can take a look at the Julia performance tips for numerous ways to accelerate your code.
  • Are there any benchmarks of your package against competitors?
  • For optimization, you seem to be using the default algorithm of Optim.jl, which is the zero-order Nelder-Mead algorithm. Is there any reason not to pick a first- or second-order method? With automatic differentiation you can get at least gradients basically for free.
  • The documentation on EPCA objectives mentions 7 ways to construct the EPCA object, but the src/constructors folder only shows 4. Any reason why those four get special treatment?
  • It seems like CompressedBeliefMDPs.jl functionality is not useful for the majority of potential users, so maybe it could be a package extension instead of a hard dependency?
  • Different results for two consecutive calls to fit! on the same data #22
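
A sketch of the relevant workflow pieces for the first and third points above (assuming the standard julia-actions and codecov/codecov-action setup; names, versions, and file layout are illustrative, not the repository's actual workflow):

```yaml
# .github/workflows/CI.yml (sketch)
jobs:
  test:
    strategy:
      matrix:
        version: ['1.10', '1']        # pinned release plus the latest stable Julia
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
        with:
          version: ${{ matrix.version }}
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1
      - uses: julia-actions/julia-processcoverage@v1
      - uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}   # repository secret; needs a repo/org admin to add it
          files: lcov.info
```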

@gdalle
Contributor Author

gdalle commented Nov 10, 2024

Documentation

  • You may want to turn your example from the home page (as well as any other bits of code in the docs) into Documenter doctests to make sure they don't get outdated.
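
For reference, Documenter.jl treats fenced blocks labeled `jldoctest` as tests: it runs the `julia>` lines and compares the printed output against what is written below them (executed via `makedocs` or `doctest(ExpFamilyPCA)` in the test suite). A minimal sketch, with a purely illustrative snippet:

```jldoctest
julia> 2 + 2
4
```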

Math

Bregman divergences

$f(\mu) = \nabla_\mu F(\mu)$ is the convex conjugate (defined later) of $F$

I'm not sure I understand. Isn't this just the gradient?

Similarly, we also have $\theta = f(\mu)$

How do $f$ and $F$ relate to the exponential family defined by $h$ and $G$? This is not yet specified.

The last line is equivalent to $B_F(x \Vert \mu)$ up to a constant

Why is that the case?

EPCA objectives

Recall from the introduction that the regularized EPCA objective aims to minimize the following expression:

The hyperparameter $\mu$ was called $\mu_0$ in other places.

Given that $g$ is strictly increasing (as $G$, the conjugate of $F$, is strictly convex and differentiable), we can compute $g$ numerically.

You probably mean $g^{-1}$?

In summary, the EPCA objective function and the decompression function $g$ can be derived from various components.

It would be nice to add a table summing up how specific versions like BernoulliEPCA are defined (using one of the combinations 1 to 7).

Constructors

Gamma

A_upper: The upper bound for the matrix A, default is -eps().

Why is the default upper bound negative? Wouldn't typemax make more sense as a catch-all upper bound?

API documentation

  • It would be good to use doctests to prevent docstring examples from getting out of sync with the code.
  • The package DocStringExtensions.jl can also be helpful here.

metaprogramming::Bool: Enables metaprogramming for symbolic calculus conversions. Default is true.

This is not very clear to an average user, even to one familiar with autodiff.

@FlyingWorkshop
Collaborator

FlyingWorkshop commented Nov 18, 2024

Thank you again for all the incredibly detailed feedback @gdalle! Working on addressing the feedback as I go.

  • L15: Verify the citation for the Julia language; it should be the SIAM Review paper, not the Arxiv preprint.

  • L30-32: Explain how the following ingredients are integrated into your package: Added a footnote of explanation in the paper

    • Symbolic differentiation
    • Optimization
    • Numerically stable computation
  • L30-32: Justify the use of symbolic differentiation (with Symbolics.jl) instead of algorithmic differentiation (e.g., Zygote.jl or Enzyme.jl). Also addressed in the new footnote

  • L32: Review the citation for "numerically stable computation"; it appears to reference R instead of Julia and might be more broadly applicable to other languages. Also addressed in the new footnote

  • L33: Provide benchmarks to substantiate the claim that "ExpFamilyPCA.jl delivers speed." (responded in other comment)

  • L54: Define the exponential family in general, including its natural and mean parameters, before discussing the link function. (Added definition for the exponential family. The natural and mean parameters are discussed in the link function section).

  • L61: Update the "appendix" link to avoid pointing to documentation that may become outdated; ensure other external links are similarly reviewed. -- updated all the links to link to v1.1.0 documentation

  • L81: Clarify if introducing regularization around a specific $\mu_0$ is standard practice or your own contribution. Explain how $\mu_0$ is chosen in practice. (added clarification about regularization and some context about $\mu_0$).

  • L88: Provide more details about the figure recreation, including:

    • Which figure is being recreated. -- added clarification in a footnote
    • What the "belief profiles" represent -- a belief profile is a probability profile for a belief distribution. I updated the language to say "belief distribution" instead. I agree that this is clearer.
    • Whether the figure recreation is reproducible --- reproduction is possible here
  • L91: Add the missing word "components" to the sentence "PCA struggles even with 10 basis."

  • L104: Clarify the object types that fit!, compress, and decompress work with. -- added a comment in the snippet to clarify that X and Y are matrices of data. I think it's ok to leave it a comment rather than spelling it out explicitly b/c (1) Julia allows multiple dispatch so methods can support different signatures and (2) the documentation is more explicit (see here). (A rough usage sketch follows this list.)
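
To illustrate the matrix-in/matrix-out point, a rough usage sketch based on my reading of the docs (the `PoissonEPCA(indim, outdim)` constructor signature and the exact return values are assumptions here, not guarantees):

```julia
using ExpFamilyPCA

n, indim, outdim = 200, 30, 5
X = rand(0:20, n, indim)                          # n × indim matrix of count data

epca = PoissonEPCA(indim, outdim)                 # assumed constructor signature
X_compressed = fit!(epca, X)                      # assumed: returns the n × outdim scores

Y = rand(0:20, 10, indim)                         # new observations, same feature dimension
Y_compressed = compress(epca, Y)                  # 10 × outdim
Y_reconstructed = decompress(epca, Y_compressed)  # 10 × indim, back on the data scale
```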

@FlyingWorkshop
Collaborator

Answering these questions as I have time. Will continue to update this post as I answer more questions.

  • You should probably run CI on the latest Julia version (called 1 in the GitHub action you use) in addition to 1.10.
  • You may want to turn your example from the index into a README test to make sure it doesn't get outdated.
  • It is useful to display what percentage of your code is covered by the test suite. At the moment, the Codecov part of the test workflow is failing because you did not specify a token. See this tutorial to set it up properly.

Agree to all the above, working on these.

  • Did you profile the code to see where it spends most of its time and avoid classic performance pitfalls (type instability for example)? If you care about optimal speed, you can take a look at the Julia performance tips for numerous ways to accelerate your code.
  • Are there any benchmarks of your package against competitors?

No. Besides traditional PCA, there are no other EPCA implementations in Julia, so not sure if benchmarking would make sense for most of the distributions. That said, I did do some testing with MultivariateStats.jl's implementation of traditional PCA which is faster than our implementation of Gaussian EPCA. I haven't looked at their source code, but I suspect it's because they use the closed form solution for PCA whereas we use the same general iterative optimization procedure for Gaussian EPCA that we use for all EPCA objectives. I suspect it would be very hard (and not entirely the package's focus) to implement a faster version of PCA than MultivariateStats.jl's.

  • For optimization, you seem to be using the default algorithm of Optim.jl, which is the zero-order Nelder-Mead algorithm. Is there any reason not to pick a first- or second-order method? With automatic differentiation you can get at least gradients basically for free.

I did some crude benchmarking using BenchmarkTools while designing the optimization pipeline. Surprisingly, I found that higher order methods performed roughly the same as or much slower than NM.

  • The documentation on EPCA objectives mentions 7 ways to construct the EPCA object, but the src/constructors folder only shows 4. Any reason why those four get special treatment?

There are in fact many ways to derive the EPCA objectives, more than the 7 ways I listed in the documentation. The 4 I ended up picking were the ones that I believed would be most useful and efficient in practice. EPCA1 through EPCA4 each represent an 'archetypal' specification of the EPCA objective. For example, EPCA1 represents the EPCA objective using $F$ and $g$. However, we can derive $g$ from $G$, so we can equally well specify the EPCA objective from $F$ and $G$, but this specification is marginally slower than the former because it requires symbolic differentiation. Thus, $F$ and $g$ is a better 'archetypal' struct than $F$ and $G$. We can make similar arguments for the other numbered EPCA structs. In short, each numbered EPCA struct is a way to specify the EPCA objective that is quick in practice. The other methods housed in each file are just more involved ways to specify the same archetype.

Let me know if this explanation makes sense. Happy to clarify.
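
As an illustration of the 'derive $g$ from $G$' step, here is a sketch with Symbolics.jl (not the package's actual code; the Poisson log-partition $G(\theta) = e^\theta$ is just a convenient example):

```julia
using Symbolics

@variables θ
G = exp(θ)                                              # log-partition of the Poisson family
g_sym = expand_derivatives(Differential(θ)(G))          # g(θ) = G′(θ); here exp(θ)
g = build_function(g_sym, θ; expression = Val(false))   # compile the symbolic result to a callable
g(0.0)                                                  # ≈ 1.0
```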

  • It seems like CompressedBeliefMDPs.jl functionality is not useful for the majority of potential users, so maybe it could be a package extension instead of a hard dependency?

Agree, looking into this.

Will try to add a test for this. Also saw your previous comment on changing similar to copy. That was a great catch and your analysis was very helpful; it's highly appreciated. Thank you.

@gdalle
Contributor Author

gdalle commented Nov 20, 2024

No. Besides traditional PCA, there are no other EPCA implementations in Julia, so not sure if benchmarking would make sense for most of the distributions. That said, I did do some testing with MultivariateStats.jl's implementation of traditional PCA which is faster than our implementation of Gaussian EPCA. I haven't looked at their source code, but I suspect it's because they use the closed form solution for PCA whereas we use the same general iterative optimization procedure for Gaussian EPCA that we use for all EPCA objectives. I suspect it would be very hard (and not entirely the package's focus) to implement a faster version of PCA than MultivariateStats.jl's.

Fair enough.

I did some crude benchmarking using BenchmarkTools while designing the optimization pipeline. Surprisingly, I found that higher order methods performed roughly the same as or much slower than NM.

That is rather surprising. I think it might have been tied to the way you compute gradients or Hessians. It is definitely something I can help with if you want to collaborate, but it is not a barrier to paper acceptance by any stretch.
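
If it helps, a minimal sketch of the comparison I have in mind (a stand-in smooth objective, not the package's EPCA loss; `autodiff = :forward` asks Optim.jl to build gradients with ForwardDiff):

```julia
using Optim

f(x) = sum(abs2, x .- 1)                                  # stand-in smooth objective
x0 = zeros(10)

res_nm = optimize(f, x0, NelderMead())                    # zero-order
res_fo = optimize(f, x0, LBFGS(); autodiff = :forward)    # first-order, AD-supplied gradient

Optim.minimum(res_nm), Optim.minimum(res_fo), Optim.iterations(res_nm), Optim.iterations(res_fo)
```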

In short, each numbered EPCA struct is a way to specify the EPCA objective that is quick in practice.

Maybe you could add a summary in the docs of which construction schemes (among the 7 you list) are fast and which are slightly slower?

@FlyingWorkshop
Collaborator

@gdalle, thanks again for all the thorough feedback! I've just finished responding to the bullets.

@FlyingWorkshop
Collaborator

Regarding the README test: this is a great idea, but the snippets currently use sample_from_poisson, which is a generic helper that is not implemented, so the snippet cannot be tested as-is. I believe our current test suite does cover constructions similar to the README's though (see here).

Regarding codecov, I don't have the admin privileges in the SISL GitHub org needed to set up codecov.

@gdalle
Contributor Author

gdalle commented Dec 11, 2024

Regarding the README test: this is a great idea, but the snippets currently use sample_from_poisson, which is a generic helper that is not implemented, so the snippet cannot be tested as-is.

Regardless of whether the README is tested during CI, it seems weird to put code in a code example that people can't run. Can you maybe replace sample_from_poisson(...) with rand(Distributions.Poisson(lambda), ...)?
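
Something along these lines (lambda and the dimensions are placeholders):

```julia
using Distributions

lambda = 4.0
X = rand(Poisson(lambda), 100, 20)   # 100 × 20 matrix of Poisson(lambda) counts
```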

Regarding codecov, I don't have the admin privileges in the SISL GitHub org needed to set up codecov.

It's a shame, codecov is very useful to figure out if some parts of your code are untested. Can you ask one of the admins to set it up?

Since we don't have codecov yet, I looked at the tests to understand what they cover. Why do you run a full test on some distributions and only a smoke test on others?

@gdalle
Contributor Author

gdalle commented Dec 11, 2024

I don't think you answered any of my remarks on the documentation?

@FlyingWorkshop
Collaborator

Here's the cleaned up documentation:

Math

Bregman Divergences

  • $f$ is indeed the ordinary gradient of $F$. Fixed the typo.
  • $\theta = f(\mu)$ is beautiful, but admittedly obtuse at first glance. The equality follows from symmetry in the dual spaces. Since $f$ inverts the link function $g$, applying $f$ to $\mu = g(\theta)$ yields $f(\mu) = \theta$. Added a remark/reference to the appendix section on why $f$ inverts $g$.
  • $f$ and $F$ (and also $g$ and $G$) are related to the exp family through convex conjugation and the Bregman loss. -- I've also rewritten the Bregman divergences section to hopefully be much clearer on all fronts. (A short sketch of these relationships follows this list.)
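
For concreteness, the relationships as I understand them (standard exponential-family conventions; a sketch, not a substitute for the rewritten section):

$$
p_\theta(x) = h(x)\exp\bigl(\langle x, \theta \rangle - G(\theta)\bigr), \qquad
\mu = g(\theta) = \nabla G(\theta), \qquad
F(\mu) = \sup_{\theta}\bigl(\langle \mu, \theta \rangle - G(\theta)\bigr).
$$

Since $F$ and $G$ are convex conjugates, $f = \nabla F = (\nabla G)^{-1} = g^{-1}$, hence $\theta = f(\mu)$. Expanding the definitions also gives

$$
-\log p_\theta(x) = B_F(x \,\Vert\, \mu) - F(x) - \log h(x),
$$

so the negative log-likelihood equals $B_F(x \Vert \mu)$ up to terms that do not depend on $\theta$.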

EPCA Objectives

  • change $\mu$ to $\mu_0$
  • the decompression function is indeed $g$. This can be tricky (I had to double check myself), but it's easiest to see from the objective that $X \approx g(\Theta)$, so it is the proper function to recover our decompressed data.
  • I actually debated including a similar table to the one you propose in the original documentation but ultimately decided against it because the parametric EPCA models use performance hacks that don’t fit neatly into categories like EPCA1 vs EPCA2. Creating a similar table for EPCA objectives 1–7 is also tricky since (1) they’re theoretically equivalent, and (2) multiple objectives can work for the same distribution. I prefer not to prescribe which distributions align with which objective. That said, if you feel strongly about it, I’m happy to create a table.

Gamma Constructor

  • The gamma constructor was one of the trickiest to create b/c small instability in the solver kept yielding negative log errors. The solution was to widen the bound relative to the theoretical one (obvious in hindsight). The bound that worked for gammas was -eps(). I also created the NegativeDomain and PositiveDomain with more generous bounds so that people don't have to worry about this for quick usage. My informal analysis also suggested that these bounds had a negligible impact on performance.

API Documentation

  • Added an admonition clarifying the metaprogramming flag.

@FlyingWorkshop
Collaborator

FlyingWorkshop commented Dec 12, 2024

It's a shame, codecov is very useful to figure out if some parts of your code are untested. Can you ask one of the admins to set it up?

Was able to add a codecov badge!

Since we don't have codecov yet, I looked at the tests to understand what they cover. Why do you run a full test on some distributions and only a smoke test on others?

This is a very insightful question and it was the largest roadblock while developing the package. The reason that not all distributions are tested under all objectives is domain errors: distributions whose objective functions have restricted domains are infeasible/unstable to optimize under certain specifications, so we don't test every specification. However, we still run a baseline sanity check, smoke_test, to make sure each model behaves sensibly even if it can't be tested under every specification.

@FlyingWorkshop
Collaborator

@gdalle, I've also just pushed another update to improve the codecov :) was a very good idea to set it up!

@FlyingWorkshop
Collaborator

FlyingWorkshop commented Jan 5, 2025

Regarding the README test: this is a great idea, but the snippets currently use sample_from_poisson which is a generic helper that is not implemented, so cannot test the method.

Regardless of whether the README is tested during CI, it seems weird to put code in a code example that people can't run. Can you maybe replace sample_from_poisson(...) with rand(Distributions.Poisson(lambda), ...)?

@gdalle, I added a note to the README that should hopefully clear this up. I don't want to replace sample_from_poisson even if it is an unimplemented dummy function b/c I think it would harm code legibility.

@gdalle
Contributor Author

gdalle commented Jan 6, 2025

Thanks for the changes, I'm satisfied with those and will close the issue

gdalle closed this as completed Jan 6, 2025