WIP: Energy score estimators #29

Open · wants to merge 14 commits into main
Conversation

simon-hirsch
Contributor

Hi @sallen12, this is for #25. It's an early-stage work in progress, but I'm happy to get early feedback.

The gufuncs look good in my opinion; the backend-based functions are a bit harder. For numpy, it's quite straightforward to use np.random.permutation or to subset by a sampled index. For the other backends, I'm not sure what the appropriate way is to shuffle the ensemble members once. Do you have an idea here?
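For illustration, a minimal sketch of what I mean for numpy and torch (the (observations, members) layout and the variable names are just placeholders, not the actual scoringrules arrays):

    import numpy as np
    import torch

    # Toy forecast arrays; the (observations, members) layout is assumed for illustration.
    fct_np = np.random.randn(100, 21)
    fct_th = torch.randn(100, 21)

    # numpy: subset by a sampled index (one permutation applied to all rows)
    idx = np.random.permutation(fct_np.shape[-1])
    fct_np_shuffled = fct_np[..., idx]

    # torch: the analogous index-based approach via randperm
    idx_th = torch.randperm(fct_th.shape[-1])
    fct_th_shuffled = fct_th[..., idx_th]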

@frazane
Owner

frazane commented May 31, 2024

I would put the permutation under the public API function so it's more transparent, and since it can be quite expensive I would also give the option to skip it. Also, I think using np.random.shuffle (with the transposed array if we want it applied to the last dimension) should be more performant because it operates in place without copies, but that's worth testing with a micro-benchmark.
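Something like this rough micro-benchmark sketch could settle it (the array sizes and layout are made up for illustration):

    import timeit
    import numpy as np

    fct = np.random.randn(10_000, 51)  # (observations, ensemble members), sizes made up

    def by_permutation():
        # fancy indexing with a sampled index returns a shuffled copy
        idx = np.random.permutation(fct.shape[-1])
        return fct[..., idx]

    def by_shuffle():
        # in place on the transposed view, no copy of the data buffer
        np.random.shuffle(fct.T)
        return fct

    print("permutation:", timeit.timeit(by_permutation, number=100))
    print("shuffle:    ", timeit.timeit(by_shuffle, number=100))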

We need to add shuffle to the backends 👍

@simon-hirsch
Contributor Author

simon-hirsch commented Jun 3, 2024

Adding it to the backend seems like a good option. I'll have a look at shuffle vs. permutation, and also at how torch and tf implement these. We could also have something like:

B.shuffle(array, axis, inplace=False)

so that we avoid any confusion between in-place and not-in-place operations.
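As a rough sketch of what that could look like for the numpy backend (the class and method names here are illustrative, not the actual backend API):

    import numpy as np

    class NumpyBackend:
        def shuffle(self, array, axis=-1, inplace=False):
            # Apply one permutation along `axis` (the same reordering for every slice).
            if inplace:
                # legacy np.random.shuffle works in place along the first axis,
                # so move the target axis to the front via a view
                np.random.shuffle(np.moveaxis(array, axis, 0))
                return array
            # otherwise return a shuffled copy via index subsetting
            idx = np.random.permutation(array.shape[axis])
            return np.take(array, idx, axis=axis)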

Won't find time this week, but the week after 👍🏽

@simon-hirsch
Contributor Author

So, roll is quite straightforward, but shuffle is actually tricky because we need to think about setting a seed.

We can set the seed in the function call, e.g. shuffle(x, axis, seed=123), or somewhere in the backend, e.g. as self.seed = 123, and then use it from there. I'm slightly inclined to have it in the function call (it's more explicit, and usually you evaluate forecasts only once), but it would get tricky once a function is called more than once. @frazane, what do you think?

I acknowledge that the current version is somewhat inconsistent because I did jax last; all the other packages allowed me to ignore the issue in this first draft :D
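For example, a sketch of the seed-in-the-call option for numpy and jax (function names are illustrative only; jax is where it matters because it needs an explicit key):

    import numpy as np
    import jax

    def numpy_shuffle(x, axis=-1, seed=None):
        # seed feeds a fresh Generator, so repeated calls with the same seed agree
        rng = np.random.default_rng(seed)
        idx = rng.permutation(x.shape[axis])
        return np.take(x, idx, axis=axis)

    def jax_shuffle(x, axis=-1, seed=0):
        # jax has no global state, so the seed is turned into a PRNG key
        key = jax.random.PRNGKey(seed)
        return jax.random.permutation(key, x, axis=axis)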

@frazane
Owner

frazane commented Jun 18, 2024

I would also be in favor of having the seed in the function call.

@simon-hirsch
Contributor Author

Hi there, here we go:

  • Move setting the seed to backend.shuffle(..., seed)
  • Adjust the backend and gufunc scoring rules
  • Add a few sentences to the docs
  • Add a simple test that asserts that the iid estimator is <10% worse than the full-sample estimator and that the k-band estimator with $k=100$ is less than 5% worse than the full-sample estimator (a sketch is below this list). I admit that these bounds are made up on the fly.
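A minimal sketch of that test (the energy_score argument order and the estimator keyword are assumptions about the API proposed here, not the final interface; the tolerance mirrors the 10% bound above):

    import numpy as np
    import scoringrules as sr

    def test_iid_estimator_close_to_full_sample():
        rng = np.random.default_rng(42)
        obs = rng.normal(size=(500, 3))        # (observations, variables), made-up data
        fct = rng.normal(size=(500, 100, 3))   # (observations, members, variables)

        # argument order and the `estimator` keyword are assumptions, not the final API
        es_full = np.mean(sr.energy_score(obs, fct))
        es_iid = np.mean(sr.energy_score(obs, fct, estimator="iid"))

        # made-up tolerance: iid estimator within 10% of the full-sample estimator
        assert abs(es_iid - es_full) / es_full < 0.10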

I think we're ready to roll - feedback and criticism welcome.

PS: Actually, the last point got me thinking. At some point, showing the speed vs. estimation-precision trade-off might be an interesting case study for an example notebook.

@simon-hirsch
Contributor Author

Hi @sallen12 @frazane, any feedback on this? :)
