Bounds estimation for produced photons / electrons #66

Open
JelleAalbers opened this issue Mar 23, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@JelleAalbers (Member)

If you set max_sigma high enough, flamedisx will give accurate results regardless of how bad the bounds estimation is. Empirically, we found that max_sigma = 5 is reasonable for our current bounds estimation.

It would be good to redo the derivations of the bounds estimation formulas and see if we can improve them. This would improve our speed/accuracy trade-off.

@JelleAalbers JelleAalbers added the enhancement New feature or request label Mar 23, 2020
@JelleAalbers JelleAalbers changed the title Bounds estimation derivation and improvements Bounds estimation for produced photons / electrons Aug 8, 2020
@JelleAalbers (Member, Author)

Specifically, the problem is estimating the number of produced [photons/electrons] N, given a known detection efficiency p and the number of [photons/electrons] detected, k.

There is a stackexchange post on this here. The accepted answer suggests a procedure involving the negative binomial. Unless I made a mistake while implementing it, it has an unacceptably high false-exclusion rate (low coverage) for low N and p. This is for 99% intervals:

(figure: stackoverflow — false-exclusion rate for 99% intervals)
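For reference, here is my reading of that accepted answer: under a flat prior on N, the number of undetected quanta N - k is negative binomial with k + 1 "successes" at probability p. A minimal sketch of the resulting interval (the helper name nbinom_bounds is mine; only scipy.stats.nbinom is assumed):

```python
from scipy import stats

def nbinom_bounds(k, p, cl=0.99):
    """Interval on produced N, given k detected at efficiency p.

    Reading of the stackexchange answer: with a flat prior on N,
    N - k ~ NegativeBinomial(k + 1 successes, success probability p).
    """
    alpha = 1 - cl
    lo = k + stats.nbinom.ppf(alpha / 2, k + 1, p)
    hi = k + stats.nbinom.ppf(1 - alpha / 2, k + 1, p)
    return lo, hi
```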

You could also imagine an oversimplified solution like this:

  • Start with the N for which the observed k is the most likely outcome: Nm = k/p.
  • Find the relative error on k-values, given a binomial with that N: relative_error = sigma / mu = sqrt(Nm p (1-p)) / (Nm p) = sqrt((1-p) / k).
  • Assume this is also the relative error of Nm; i.e. report the interval Nm * (1 +- max_sigma * relative_error) on N.

The performance for 99% intervals is similar to the stackoverflow result:

(figure: oversimplified_gaussian_argument — false-exclusion rate for 99% intervals)
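In code, that recipe is only a few lines (a sketch; gaussian_bounds is a made-up name):

```python
import numpy as np

def gaussian_bounds(k, p, max_sigma=5):
    """Oversimplified interval: take Nm = k / p and scale by the
    binomial relative error at that N, applied symmetrically."""
    nm = k / p
    # sigma / mu of Binomial(Nm, p); simplifies to sqrt((1-p) / k)
    rel_err = np.sqrt(nm * p * (1 - p)) / (nm * p)
    return nm * (1 - max_sigma * rel_err), nm * (1 + max_sigma * rel_err)
```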

Currently, flamedisx uses a mysterious equation here, leading to an interval of Nm +- max_sigma * (q + (q**2 + 4 * Nm * q)**0.5)/2, with q=1/p. I'd have to dig up how this was derived; likely in a handwavy / semi-empirical / let's-add-factors-of-two-until-it-works way. However, it does seem to be sufficiently conservative for low p-values:

(figure: mystery_equation — false-exclusion rate for 99% intervals)
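For comparison, here is the mystery interval as a sketch, with q left as a parameter so variants are easy to try (mystery_bounds is a hypothetical name):

```python
import numpy as np

def mystery_bounds(k, p, max_sigma=5, q=None):
    """Interval Nm +- max_sigma * sigma, with
    sigma = (q + sqrt(q**2 + 4 * Nm * q)) / 2."""
    if q is None:
        q = 1 / p  # the current flamedisx choice
    nm = k / p
    sigma = (q + np.sqrt(q**2 + 4 * nm * q)) / 2
    return nm - max_sigma * sigma, nm + max_sigma * sigma
```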

Still, you can see in the figure that the asymptote is no longer correct. For higher p-values the mystery interval is much too conservative -- the false-exclusion rate is in the 10^-4 range around N = 4, then drops even further. This makes flamedisx slower, but at least its values are not wrong.

For the moment, we could keep the mystery equation, but it is a bit embarrassing and makes things unnecessarily slow. Perhaps we could instead use one of the motivated solutions above, with an 'empirical fix' for the low-p, low-N regime? Finally, an option is to compute and cache exact confidence intervals using the Neyman construction in the low (N, p) regime; we'd have to do it for several p values and interpolate somehow. Fortunately, this is all in the numpy part of flamedisx, specifically the annotation stage, so we can be creative.
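For the exact option, a brute-force Neyman construction is short enough to sketch (names are mine; k's discreteness means a careful construction would tune the acceptance region, so treat this as a starting point):

```python
import numpy as np
from scipy import stats

def neyman_bounds(k, p, cl=0.99, n_max=10_000):
    """Central confidence interval on N by belt inversion:
    N is accepted if the observed k lies inside the central
    (cl) acceptance region of Binomial(N, p)."""
    alpha = 1 - cl
    n_grid = np.arange(n_max + 1)
    k_lo = stats.binom.ppf(alpha / 2, n_grid, p)
    k_hi = stats.binom.ppf(1 - alpha / 2, n_grid, p)
    accepted = n_grid[(k >= k_lo) & (k <= k_hi)]
    return accepted.min(), accepted.max()
```

Caching these for a grid of p values and interpolating, as suggested above, would sit on top of something like this.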

@pelssers (Member)

We should definitely redo the derivation of the mystery formula, but there might be a simple fix in the meantime.
Setting q = (1-p)/p in the mystery formula instead of the original q = 1/p (inspired by the stackoverflow post you linked), I get the result below:
(figure: mystery_equation_solved — false-exclusion rate with q = (1-p)/p)
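In terms of the mystery_bounds sketch above, this is just mystery_bounds(k, p, q=(1 - p) / p).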

pelssers added a commit that referenced this issue Aug 11, 2020
@pelssers pelssers mentioned this issue Aug 14, 2020
@JelleAalbers (Member, Author)

JelleAalbers commented Aug 9, 2021

Believe it or not, exactly a year after I wrote the long post above, the derivation of the mystery equation has washed back up from the seas of time...

Let me try to translate; remember it's a hand-wavy argument, though apparently a somewhat successful one:

  • A binomial(N, p) has variance / mean = (N p (1-p)) / (N p) = (1-p) = q
  • We're after the inverse problem, estimating N given an observed k. There are two issues:
    A. For a given N, the possible k values are spread out;
    B. Our best estimate of N (Nm = k/p) has an error (which we want to find out)
  • We know A gives a variance / mean of q, i.e. σ^2 = mean * q
  • To consider B, we replace mean -> mean + σ to find the +1 sigma bound to get σ^2 = (mean + σ) * q
  • Solving for σ, σ = 0.5 (q +- sqrt(q^2 + 4 mean q)); we can discard the - sign solution since σ must be positive.

This is indeed the 1 sigma bound used in the mystery equation, assuming I made a typo in coding up q=1/p instead of q=1-p. Why q = (1-p)/p = 1/p-1 worked better for @pelssers I have no idea; maybe q=1-p is even better ;-)

I also didn't consider that the bounds might be asymmetric. Repeating the exercise with mean -> mean - σ gives σ = 0.5 (-q +- sqrt(q^2 + 4 mean q)); clearly the + sign gives a positive σ, but one that is smaller than in the mean + σ case.
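Both quadratics are quick to check symbolically (a sketch, assuming sympy; the positivity assumptions let solve discard the negative roots):

```python
import sympy as sp

sigma, m, q = sp.symbols('sigma m q', positive=True)

# Upper bound: sigma**2 = (m + sigma) * q
#   -> sigma = (q + sqrt(q**2 + 4*m*q)) / 2
print(sp.solve(sp.Eq(sigma**2, (m + sigma) * q), sigma))

# Lower bound: sigma**2 = (m - sigma) * q
#   -> sigma = (-q + sqrt(q**2 + 4*m*q)) / 2, the smaller (asymmetric) bound
print(sp.solve(sp.Eq(sigma**2, (m - sigma) * q), sigma))
```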

@CristianAntochi (Collaborator)

Hi @JelleAalbers!
I think I encountered a similar problem in a textbook exercise on computing the 1-sigma confidence belt for a binomial p, so the argument makes sense to me. However, in the toy MCs for that exercise I also got q ~ 1/p at some point; I think it comes from an inversion related to Nm = k/p.
