
JOSS review #21

Closed

gdalle opened this issue Nov 10, 2024 · 14 comments

@gdalle
Contributor

gdalle commented Nov 10, 2024

Hi, and congrats on your JOSS submission! I'm one of your reviewers and I'll be writing my remarks below as the review progresses. You can find my checklist here.

Paper

  • The paper is longer than 1000 words, which is the recommended limit. However I don't think shortening it would be helpful, because it does contain useful mathematical background.
  • There are missing and invalid DOIs among the references.
  • L15: The right citation for the Julia language is usually the SIAM Review paper, not the Arxiv preprint you cited.
  • L30-32: How do the following ingredients come into play inside your package? This is not at all explained in the paper.
    • symbolic differentiation
    • optimization
    • numerically stable computation
  • L30-32: Why do you use symbolic differentiation (with Symbolics.jl) and not algorithmic differentiation (e.g. with Zygote.jl or Enzyme.jl)?
  • L32: The citation you picked for "numerically stable computation" does not seem related to Julia but to R? In fact, I would assume that this third ingredient is also widely available in other languages as well.
  • L33: What kind of benchmarks allow you to state that "ExpFamilyPCA.jl delivers speed"?
  • L54: It would be useful to define the exponential family in general (as well as its natural and mean parameter) before diving into the details of the link function.
  • L61: The link called "appendix" points to a documentation page, which may become out of date if you change the structure of your docs. The same goes for other external links in the paper.
  • L81: Is it standard to introduce regularization around a $\mu_0$, or did you suggest it yourselves? How do you pick $\mu_0$ in practice?
  • L88: Which figure do you recreate? Can you give more details on what these "belief profiles" represent?
  • L88: Is the figure recreation reproducible?
  • L91: "PCA struggles even with 10 basis" is missing the word "components".
  • L104: It would help to clarify the types of the objects that fit!, compress and decompress work with.
@gdalle
Contributor Author

gdalle commented Nov 10, 2024

Code

  • You should probably run CI on the latest Julia version (called 1 in the GitHub action you use) in addition to 1.10. (A workflow sketch covering this and the Codecov token follows this list.)
  • You may want to turn your example from the index into a README test to make sure it doesn't get outdated.
  • It is useful to display what percentage of your code is covered by the test suite. At the moment, the Codecov part of the test workflow is failing because you did not specify a token. See this tutorial to set it up properly.
  • Did you profile the code to see where it spends most of its time and avoid classic performance pitfalls (type instability for example)? If you care about optimal speed, you can take a look at the Julia performance tips for numerous ways to accelerate your code.
  • Are there any benchmarks of your package against competitors?
  • For optimization, you seem to be using the default algorithm of Optim.jl, which is the zero-order Nelder-Mead algorithm. Is there any reason not to pick a first- or second-order method? With automatic differentiation you can get at least gradients basically for free.
  • The documentation on EPCA objectives mentions 7 ways to construct the EPCA object, but the src/constructors folder only shows 4. Any reason why those four get special treatment?
  • It seems like CompressedBeliefMDPs.jl functionality is not useful for the majority of potential users, so maybe it could be a package extension instead of a hard dependency?
  • Different results for two consecutive calls to fit! on the same data #22
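
A sketch of the relevant workflow pieces for the first and third points above (assuming the standard julia-actions and codecov/codecov-action setup; names, versions, and file layout are illustrative, not the repository's actual workflow):

```yaml
# .github/workflows/CI.yml (sketch)
jobs:
  test:
    strategy:
      matrix:
        version: ['1.10', '1']        # pinned release plus the latest stable Julia
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
        with:
          version: ${{ matrix.version }}
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1
      - uses: julia-actions/julia-processcoverage@v1
      - uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}   # repository secret; needs a repo/org admin to add it
          files: lcov.info
```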

@gdalle
Contributor Author

gdalle commented Nov 10, 2024

Documentation

  • You may want to turn your example from the home page (as well as any other bits of code in the docs) into Documenter doctests to make sure they don't get outdated.
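
For reference, Documenter.jl treats fenced blocks labeled `jldoctest` as tests: it runs the `julia>` lines and compares the printed output against what is written below them (executed via `makedocs` or `doctest(ExpFamilyPCA)` in the test suite). A minimal sketch, with a purely illustrative snippet:

```jldoctest
julia> 2 + 2
4
```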

Math

Bregman divergences

$f(\mu) = \nabla_\mu F(\mu)$ is the convex conjugate (defined later) of $F$

I'm not sure I understand. Isn't this just the gradient?

Similarly, we also have $\theta = f(\mu)$

How do $f$ and $F$ relate to the exponential family defined by $h$ and $G$? This is not yet specified.

The last line is equivalent to $B_F(x \Vert \mu)$ up to a constant

Why is that the case?

EPCA objectives

Recall from the introduction that the regularized EPCA objective aims to minimize the following expression:

The hyperparameter $\mu$ was called $\mu_0$ in other places.

Given that $g$ is strictly increasing (as $G$, the conjugate of $F$, is strictly convex and differentiable), we can compute $g$ numerically.

You probably mean $g^{-1}$?

In summary, the EPCA objective function and the decompression function $g$ can be derived from various components.

It would be nice to add a table summing up how specific versions like BernoulliEPCA are defined (using one of the combinations 1 to 7).

Constructors

Gamma

A_upper: The upper bound for the matrix A, default is -eps().

Why is the default upper bound negative? Wouldn't typemax make more sense as a catch-all upper bound?

API documentation

  • It would be good to use doctests to prevent docstring examples from getting out of sync with the code.
  • The package DocStringExtensions.jl can also be helpful here.

metaprogramming::Bool: Enables metaprogramming for symbolic calculus conversions. Default is true.

This is not very clear to an average user, even to one familiar with autodiff.

@FlyingWorkshop
Collaborator

FlyingWorkshop commented Nov 18, 2024

Thank you again for all the incredibly detailed feedback @gdalle! Working on addressing the feedback as I go.

  • L15: Verify the citation for the Julia language; it should be the SIAM Review paper, not the Arxiv preprint.

  • L30-32: Explain how the following ingredients are integrated into your package: Added a footnote of explanation in the paper

    • Symbolic differentiation
    • Optimization
    • Numerically stable computation
  • L30-32: Justify the use of symbolic differentiation (with Symbolics.jl) instead of algorithmic differentiation (e.g., Zygote.jl or Enzyme.jl). Also addressed in the new footnote

  • L32: Review the citation for "numerically stable computation"; it appears to reference R instead of Julia and might be more broadly applicable to other languages. Also addressed in the new footnote

  • L33: Provide benchmarks to substantiate the claim that "ExpFamilyPCA.jl delivers speed." (responded in other comment)

  • L54: Define the exponential family in general, including its natural and mean parameters, before discussing the link function. (Added definition for the exponential family. The natural and mean parameters are discussed in the link function section).

  • L61: Update the "appendix" link to avoid pointing to documentation that may become outdated; ensure other external links are similarly reviewed. -- updated all the links to link to v1.1.0 documentation

  • L81: Clarify if introducing regularization around a specific $\mu_0$ is standard practice or your own contribution. Explain how $\mu_0$ is chosen in practice. (added clarification about regularization and some context about $\mu_0$).

  • L88: Provide more details about the figure recreation, including:

    • Which figure is being recreated. -- added clarification in a footnote
    • What the "belief profiles" represent -- a belief profile is a probability profile for a belief distribution. I updated the language to say "belief distribution" instead. I agree that this is clearer.
    • Whether the figure recreation is reproducible --- reproduction is possible here
  • L91: Add the missing word "components" to the sentence "PCA struggles even with 10 basis."

  • L104: Clarify the object types that fit!, compress, and decompress work with. -- added a comment in the snippet to clarify that X and Y are matrices of data. I think it's ok to leave it a comment rather than spelling it out explicitly b/c (1) Julia allows multiple dispatch so methods can support different signatures and (2) the documentation is more explicit (see here). (A rough usage sketch follows this list.)
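
To illustrate the matrix-in/matrix-out point, a rough usage sketch based on my reading of the docs (the `PoissonEPCA(indim, outdim)` constructor signature and the exact return values are assumptions here, not guarantees):

```julia
using ExpFamilyPCA

n, indim, outdim = 200, 30, 5
X = rand(0:20, n, indim)                          # n × indim matrix of count data

epca = PoissonEPCA(indim, outdim)                 # assumed constructor signature
X_compressed = fit!(epca, X)                      # assumed: returns the n × outdim scores

Y = rand(0:20, 10, indim)                         # new observations, same feature dimension
Y_compressed = compress(epca, Y)                  # 10 × outdim
Y_reconstructed = decompress(epca, Y_compressed)  # 10 × indim, back on the data scale
```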

@FlyingWorkshop
Collaborator

Answering these questions as I have time. Will continue to update this post as I answer more questions.

  • You should probably run CI on the latest Julia version (called 1 in the GitHub action you use) in addition to 1.10.
  • You may want to turn your example from the index into a README test to make sure it doesn't get outdated.
  • It is useful to display what percentage of your code is covered by the test suite. At the moment, the Codecov part of the test workflow is failing because you did not specify a token. See this tutorial to set it up properly.

Agree to all the above, working on these.

  • Did you profile the code to see where it spends most of its time and avoid classic performance pitfalls (type instability for example)? If you care about optimal speed, you can take a look at the Julia performance tips for numerous ways to accelerate your code.
  • Are there any benchmarks of your package against competitors?

No. Besides traditional PCA, there are no other EPCA implementations in Julia, so not sure if benchmarking would make sense for most of the distributions. That said, I did do some testing with MultivariateStats.jl's implementation of traditional PCA which is faster than our implementation of Gaussian EPCA. I haven't looked at their source code, but I suspect it's because they use the closed form solution for PCA whereas we use the same general iterative optimization procedure for Gaussian EPCA that we use for all EPCA objectives. I suspect it would be very hard (and not entirely the package's focus) to implement a faster version of PCA than MultivariateStats.jl's.

  • For optimization, you seem to be using the default algorithm of Optim.jl, which is the zero-order Nelder-Mead algorithm. Is there any reason not to pick a first- or second-order method? With automatic differentiation you can get at least gradients basically for free.

I did some crude benchmarking using BenchmarkTools while designing the optimization pipeline. Surprisingly, I found that higher order methods performed roughly the same as or much slower than NM.

  • The documentation on EPCA objectives mentions 7 ways to construct the EPCA object, but the src/constructors folder only shows 4. Any reason why those four get special treatment?

There are in fact many ways to derive the EPCA objectives, more than the 7 ways I listed in the documentation. The 4 I ended up picking were the ones that I believed would be most useful and efficient in practice. EPCA1 through EPCA4 each represent an 'archetypal' specification of the EPCA objective. For example, EPCA1 represents the EPCA objective using $F$ and $g$. However, we can derive $g$ from $G$, so we can equally well specify the EPCA objective from $F$ and $G$, but this specification is marginally slower than the former because it requires symbolic differentiation. Thus, $F$ and $g$ is a better 'archetypal' struct than $F$ and $G$. We can make similar arguments for the other numbered EPCA structs. In short, each numbered EPCA struct is a way to specify the EPCA objective that is quick in practice. The other methods housed in each file are just more involved ways to specify the same archetype.

Let me know if this explanation makes sense. Happy to clarify.
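
As an illustration of the 'derive $g$ from $G$' step, here is a sketch with Symbolics.jl (not the package's actual code; the Poisson log-partition $G(\theta) = e^\theta$ is just a convenient example):

```julia
using Symbolics

@variables θ
G = exp(θ)                                              # log-partition of the Poisson family
g_sym = expand_derivatives(Differential(θ)(G))          # g(θ) = G′(θ); here exp(θ)
g = build_function(g_sym, θ; expression = Val(false))   # compile the symbolic result to a callable
g(0.0)                                                  # ≈ 1.0
```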

  • It seems like CompressedBeliefMDPs.jl functionality is not useful for the majority of potential users, so maybe it could be a package extension instead of a hard dependency?

Agree, looking into this.

Will try to add a test for this. Also saw your previous comment on changing similar to copy. That was a great catch and your analysis was very helpful; it's highly appreciated. Thank you.

@gdalle
Contributor Author

gdalle commented Nov 20, 2024

No. Besides traditional PCA, there are no other EPCA implementations in Julia, so not sure if benchmarking would make sense for most of the distributions. That said, I did do some testing with MultivariateStats.jl's implementation of traditional PCA which is faster than our implementation of Gaussian EPCA. I haven't looked at their source code, but I suspect it's because they use the closed form solution for PCA whereas we use the same general iterative optimization procedure for Gaussian EPCA that we use for all EPCA objectives. I suspect it would be very hard (and not entirely the package's focus) to implement a faster version of PCA than MultivariateStats.jl's.

Fair enough.

I did some crude benchmarking using BenchmarkTools while designing the optimization pipeline. Surprisingly, I found that higher order methods performed roughly the same as or much slower than NM.

That is rather surprising. I think it might have been tied to the way you compute gradients or Hessians. It is definitely something I can help with if you want to collaborate, but it is not a barrier to paper acceptance by any stretch.
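
If it helps, a minimal sketch of the comparison I have in mind (a stand-in smooth objective, not the package's EPCA loss; `autodiff = :forward` asks Optim.jl to build gradients with ForwardDiff):

```julia
using Optim

f(x) = sum(abs2, x .- 1)                                  # stand-in smooth objective
x0 = zeros(10)

res_nm = optimize(f, x0, NelderMead())                    # zero-order
res_fo = optimize(f, x0, LBFGS(); autodiff = :forward)    # first-order, AD-supplied gradient

Optim.minimum(res_nm), Optim.minimum(res_fo), Optim.iterations(res_nm), Optim.iterations(res_fo)
```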

In short, each numbered EPCA struct is a way to specify the EPCA objective that is quick in practice.

Maybe you could add a summary in the docs of which construction schemes (among the 7 you list) are fast and which are slightly slower?

@FlyingWorkshop
Collaborator

@gdalle, thanks again for all the thorough feedback! I've just finished responding to the bullets.

@FlyingWorkshop
Collaborator

Regarding the README test: this is a great idea, but the snippets currently use sample_from_poisson, which is a generic helper that is not implemented, so the snippet cannot be tested as-is. I believe our current test suite does cover constructions similar to the README's though (see here).

Regarding codecov, I don't have the admin privileges in the SISL GitHub org needed to set up codecov.

@gdalle
Contributor Author

gdalle commented Dec 11, 2024

Regarding the README test: this is a great idea, but the snippets currently use sample_from_poisson, which is a generic helper that is not implemented, so the snippet cannot be tested as-is.

Regardless of whether the README is tested during CI, it seems weird to put code in a code example that people can't run. Can you maybe replace sample_from_poisson(...) with rand(Distributions.Poisson(lambda), ...)?
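
Something along these lines (lambda and the dimensions are placeholders):

```julia
using Distributions

lambda = 4.0
X = rand(Poisson(lambda), 100, 20)   # 100 × 20 matrix of Poisson(lambda) counts
```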

Regarding codecov, I don't have the admin privileges in the SISL GitHub org needed to set up codecov.

It's a shame, codecov is very useful to figure out if some parts of your code are untested. Can you ask one of the admins to set it up?

Since we don't have codecov yet, I looked at the tests to understand what they cover. Why do you run a full test on some distributions and only a smoke test on others?

@gdalle
Contributor Author

gdalle commented Dec 11, 2024

I don't think you answered any of my remarks on the documentation?

@FlyingWorkshop
Collaborator

Here's the cleaned up documentation:

Math

Bregman Divergences

  • $f$ is indeed the ordinary gradient of $F$. Fixed the typo.
  • $\theta = f(\mu)$ is beautiful, but admittedly obtuse at first glance. The equality follows from symmetry in the dual spaces. Since $f$ inverts the link function $g$, applying $f$ to $\mu = g(\theta)$ yields $f(\mu) = \theta$. Added a remark/reference to the appendix section on why $f$ inverts $g$.
  • $f$ and $F$ (and also $g$ and $G$) are related to the exp family through convex conjugation and the Bregman loss. -- I've also rewritten the Bregman divergences section to hopefully be much clearer on all fronts. (A short sketch of these relationships follows this list.)
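
For concreteness, the relationships as I understand them (standard exponential-family conventions; a sketch, not a substitute for the rewritten section):

$$
p_\theta(x) = h(x)\exp\bigl(\langle x, \theta \rangle - G(\theta)\bigr), \qquad
\mu = g(\theta) = \nabla G(\theta), \qquad
F(\mu) = \sup_{\theta}\bigl(\langle \mu, \theta \rangle - G(\theta)\bigr).
$$

Since $F$ and $G$ are convex conjugates, $f = \nabla F = (\nabla G)^{-1} = g^{-1}$, hence $\theta = f(\mu)$. Expanding the definitions also gives

$$
-\log p_\theta(x) = B_F(x \,\Vert\, \mu) - F(x) - \log h(x),
$$

so the negative log-likelihood equals $B_F(x \Vert \mu)$ up to terms that do not depend on $\theta$.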

EPCA Objectives

  • change $\mu$ to $\mu_0$
  • the decompression function is indeed $g$. This can be tricky (I had to double check myself), but it's easiest to see from the objective that $X \approx g(\Theta)$, so it is the proper function to recover our decompressed data.
  • I actually debated including a similar table to the one you propose in the original documentation but ultimately decided against it because the parametric EPCA models use performance hacks that don’t fit neatly into categories like EPCA1 vs EPCA2. Creating a similar table for EPCA objectives 1–7 is also tricky since (1) they’re theoretically equivalent, and (2) multiple objectives can work for the same distribution. I prefer not to prescribe which distributions align with which objective. That said, if you feel strongly about it, I’m happy to create a table.

Gamma Constructor

  • The gamma constructor was one of the trickiest to create b/c small instability in the solver kept yielding negative log errors. The solution was to widen the bound relative to the theoretical one (obvious in hindsight). The bound that worked for gammas was -eps(). I also created the NegativeDomain and PositiveDomain with more generous bounds so that people don't have to worry about this for quick usage. My informal analysis also suggested that these bounds had a negligible impact on performance.

API Documentation

  • Added an admonition clarifying the metaprogramming flag.

@FlyingWorkshop
Collaborator

FlyingWorkshop commented Dec 12, 2024

It's a shame, codecov is very useful to figure out if some parts of your code are untested. Can you ask one of the admins to set it up?

Was able to add a codecov badge!

Since we don't have codecov yet, I looked at the tests to understand what they cover. Why do you run a full test on some distributions and only a smoke test on others?

This is a very insightful question and it was the largest roadblock while developing the package. The reason that not all distributions are tested under all objectives is domain errors: distributions whose objective functions have restricted domains are infeasible/unstable to optimize under certain specifications, so we don't test every specification. However, we still run a baseline sanity check, smoke_test, to make sure each model behaves sensibly even if it can't be tested under every specification.

@FlyingWorkshop
Collaborator

@gdalle, I've also just pushed another update to improve the codecov :) was a very good idea to set it up!

@FlyingWorkshop
Collaborator

FlyingWorkshop commented Jan 5, 2025

Regarding the README test: this is a great idea, but the snippets currently use sample_from_poisson which is a generic helper that is not implemented, so cannot test the method.

Regardless of whether the README is tested during CI, it seems weird to put code in a code example that people can't run. Can you maybe replace sample_from_poisson(...) with rand(Distributions.Poisson(lambda), ...)?

@gdalle, I added a note to the README that should hopefully clear this up. I don't want to replace sample_from_poisson even if it is an unimplemented dummy function b/c I think it would harm code legibility.

@gdalle
Contributor Author

gdalle commented Jan 6, 2025

Thanks for the changes, I'm satisfied with those and will close the issue

gdalle closed this as completed Jan 6, 2025