fitting overhaul #305
```diff
@@ -2,7 +2,7 @@
 import functools
 import collections
 from typing import Dict, List, Optional, Tuple, Union
-from scipy import optimize
+from scipy import optimize, stats
 
 __all__ = ['FitBase']
```
```diff
@@ -39,9 +39,9 @@ def __init__(
         :param initial_values: dictionary specifying initial parameter values to use in
             the fit. These override the values found by :meth:`estimate_parameters`.
         """
-        self._x = x
-        self._y = y
-        self._y_err = y_err
+        self._x = None if x is None else np.asarray(x)
+        self._y = None if y is None else np.asarray(y)
+        self._y_err = None if y_err is None else np.asarray(y_err)
 
         self._validate_param_names(fixed_params)
         if fixed_params is not None:
```
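The coercion pattern in the hunk above can be shown in isolation (a standalone sketch with a hypothetical helper name, not code from the PR):

```python
import numpy as np


def coerce_optional(arr):
    # Mirror the pattern used in __init__ above: leave None untouched,
    # convert anything array-like (list, tuple, ndarray) to an ndarray.
    return None if arr is None else np.asarray(arr)


print(coerce_optional(None))                      # None passes through
print(type(coerce_optional([1.0, 2.0])).__name__)
```

This keeps "no data supplied" (`None`) distinguishable from "empty data" while guaranteeing that any supplied data supports NumPy arithmetic downstream.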
```diff
@@ -171,6 +171,31 @@ def residuals(self):
         """Returns the fit residuals."""
         return self._y - self._func_free(self._x, self._p)
 
+    @property
+    def chi2(self) -> Tuple[float, float]:
+        """Chi-squared test.
+
+        Assumes the values of `y` are independent and normally distributed.
+        Returns the tuple (chi-squared, p-value).
```

**Comment on lines +178 to +179**

> I haven't had a closer look at this at all, but it might be valuable to offer some guidance here to users regarding the fact that our data is pretty much never actually normally distributed when working with ion readout data.

> mmm...I'm not sure that I have much intelligent to say here (but that probably mainly speaks to my limited knowledge of statistics) beyond "if the errors are well-approximated by normal then the Chi2 p-value is probably a decent estimator for the fit significance." A couple of other comments here:

```diff
+
+        The p-value is the probability that the observed deviations from the fit
+        could arise by chance, assuming the data is normally distributed (values
+        close to unity indicate high-quality fits).
+        """
+        if self._y_err is None:
+            raise ValueError("Cannot calculate chi square without knowing y_err")
+
+        n = len(self._x) - len(self._free_params)
+        if n < 1:
+            raise ValueError("Cannot calculate chi squared with "
+                             f"{len(self._free_params)} fit parameters and only "
+                             f"{len(self._x)} data points.")
+
+        y_fit = self.evaluate(self._x)[1]
+        chi_2 = np.sum(np.power((self._y - y_fit) / self._y_err, 2))
+        p = stats.chi2.sf(chi_2, n)
```

> Todo: check the standard definition of

```diff
+
+        return chi_2, p
+
     def _calculate_derived_params(self):
         """
         Updates fit results with values and uncertainties for derived parameters.
```
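As a standalone illustration of the statistic computed in the diff above (the data values, the straight-line fit, and the parameter count here are all hypothetical, invented for the example):

```python
import numpy as np
from scipy import stats

# Hypothetical data: five points with known 1-sigma errors, compared
# against a straight-line "fit" y = x with 2 free parameters
# (slope and intercept).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.1, 1.9, 3.2, 3.9])
y_err = np.full_like(y, 0.1)
y_fit = x
n_free_params = 2

# Chi-squared statistic and degrees of freedom, as in the diff above.
chi_2 = np.sum(((y - y_fit) / y_err) ** 2)
dof = len(x) - n_free_params

# The survival function (1 - CDF) of the chi-squared distribution gives
# the probability of seeing deviations at least this large by chance.
p = stats.chi2.sf(chi_2, dof)
print(chi_2, dof, p)
```

Note the degrees of freedom are the number of data points minus the number of free fit parameters, matching `n = len(self._x) - len(self._free_params)` in the diff; a p-value near zero suggests the model (or the error estimates) do not describe the data well.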
> 🚨 Potentially contentious... 🚨
>
> If no analysis identifier is given we attempt to generate a unique identifier by mangling together the fit type with the result channels (there is more info we could in principle add here -- even the entire fit description -- but this is pretty good). The generated identifier mangles the names of the result channels fed into the fit (in practice this means the channels fed into `y` and `y_err`, since only 1D fits are supported at present).
>
> Result channels are given a name in the fragment they are originally defined in, but can be renamed in the context they are used in (e.g. subscans rename result channels by appending a prefix to avoid collisions). In the original code, the local names within the scan context were used, which seems sensible. However, a side effect is that if the user doesn't explicitly give us an analysis identifier, we cannot generate one until `describe_online_analyses`.
>
> This commit introduces a change in behaviour by generating the analysis identifier from the original channel names. The motivation was that (as far as I can tell, but I may be wrong) the result channels need to be defined in the `DefaultAnalysis.__init__` method, which does not have access to the scan context. I wanted to generate `p` and `chi2` result channels for the analysis, and it seemed sensible to name them based on the analysis identifier, which then required a change in behaviour.
>
> The new behaviour is more likely to cause an accidental collision, but it's not obvious to me how much of a practical concern this is (what real use-case would it affect?). So I think it's fine, but I also haven't got my head fully around the details, so I may be missing something.
>
> NB in many cases the result channels defined in the original fragment will just be the readout `p` and `p_err`, so there isn't much value in mangling those into the name since they don't tell us much...
>
> In any case, the tests now pass, but that may say more about the test coverage than the correctness of the design here.
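A rough sketch of the mangling scheme described in the comment above (the function name and exact join format are hypothetical, not ndscan's actual implementation):

```python
def make_analysis_identifier(fit_type, channel_names):
    # Join the fit type with the (original, pre-rename) result channel
    # names to build a hopefully-unique analysis identifier.
    # Collisions are possible if two analyses share a fit type and
    # channel names -- the trade-off discussed in the comment above.
    return "_".join(["fit", fit_type] + list(channel_names))


print(make_analysis_identifier("sinusoid", ["p", "p_err"]))
```

Because the identifier is built from names known at `__init__` time, it can be generated without access to the scan context, at the cost of the collision risk noted above.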
> In hindsight, a better design here would probably have been to use an `ExportedResult` to export `chi2` or `p`, rather than the implementation here. That way we could revert all of the changes around `analysis_identifier`. It's also a bit more explicit and more consistent with the rest of the interface. It requires a little bit of special casing in `DefaultAnalysis` to handle, but is probably the better design overall. TODO!
>
> Nit pick: would `FitResult` be a more descriptive name than `ExportedResult`?