
fitting overhaul #305

Closed
wants to merge 27 commits
d16b6ad
add new ndscan.fitting module
hartytp Aug 8, 2022
41dccc6
use new fitting framework
hartytp Aug 8, 2022
815f6c3
NamedFit -> FitDescription, remove the Analysis base class
hartytp Aug 9, 2022
5973bef
fix move to FitDescription, yapf, flake
hartytp Aug 9, 2022
d51fd7f
add oitg fitting compatibility layer
hartytp Aug 9, 2022
7b6f536
remove accidentally committed poetry files, change fitting imports to…
hartytp Aug 9, 2022
5838636
fitting.oitg: handle the fact that some fit functions do not provide …
hartytp Aug 9, 2022
2ce7830
test_experiment_entrypoint: update after fitting changes
hartytp Aug 9, 2022
a7d19ee
test_experiment_entrypoint: fix typo
hartytp Aug 9, 2022
6ed7271
ndscan.fitting.oitg: handle the fact that the oitg code does not spec…
hartytp Aug 9, 2022
050aaec
test: update to new fitting code
hartytp Aug 9, 2022
5e1484f
fitting: add a kind parameter to FitDescription
hartytp Aug 10, 2022
ca2daba
FitDescription: add from_dict method
hartytp Aug 10, 2022
5393a03
fitting: fix scale factors, sinusoid: add x0 parameter
hartytp Aug 10, 2022
7062548
Specify an x-axis scale rather than manually specifing scale factors for
hartytp Aug 10, 2022
1beaf6b
fitting: add y-axis scale factor as well
hartytp Aug 10, 2022
6fa674a
fitting.sinusoid: docs
hartytp Aug 10, 2022
6e587a6
Infer fit scale factors from dataset rather than from parameter/result
hartytp Aug 10, 2022
5478678
fitting: add get_default_scale_factors back, sinusoid: give scale fac…
hartytp Aug 10, 2022
8fc2938
fitting: improve sinusoid parameter estimator, don't use abolute sigm…
hartytp Aug 10, 2022
fb47dde
fitting.sinusoid: fix phase estimation and phase wrapping
hartytp Aug 10, 2022
7261abe
fitting.sinusoid: document parameters
hartytp Aug 11, 2022
15702b4
fitting.sinusoid: finish initial implementation
hartytp Aug 11, 2022
9548134
fitting: clip initial values to lie within bounds
hartytp Aug 11, 2022
03c9b36
fitting: add rabi flop fit (untested)
hartytp Aug 11, 2022
f3d9222
fitting: add chi-squared results, bug fixes
hartytp Aug 12, 2022
f2266d5
fix tests
hartytp Aug 12, 2022
75 changes: 46 additions & 29 deletions ndscan/experiment/default_analysis.py
@@ -16,14 +16,13 @@
Both can produce annotations; particular values or plot locations highlighted in the
user interface.
"""
import numpy as np
import logging
from typing import Any, Callable, Dict, List, Iterable, Optional, Set, Tuple, Union
import collections
import dataclasses

from .parameters import ParamHandle
from .result_channels import ResultChannel, FloatChannel
from .result_channels import ResultChannel, FloatChannel, OpaqueChannel
from .. import utils

__all__ = [
@@ -58,6 +57,9 @@ class FitDescription():
parameters. If not specified, the defaults from the fit class are used.
initial_values: dictionary specifying initial parameter values to use in
the fit. These override the values found by the fit's parameter estimator.
export_chi2: If True and a y-axis error is provided, the fit Chi-Squared
and p-value are exported as result channels named
``{analysis_identifier}_chi2`` and ``{analysis_identifier}_p``.
"""

fit_class_name: str
@@ -66,6 +68,7 @@ class FitDescription():
param_bounds: Dict[str, Tuple[float, float]]
fixed_params: Dict[str, float]
initial_values: Dict[str, float]
export_chi2: bool = True
kind: str = dataclasses.field(init=False, default="fit_description")

@classmethod
@@ -327,8 +330,8 @@ class OnlineFit(DefaultAnalysis):
dictionaries mapping coordinate names to fit result names. If ``None``,
the defaults provided by the fit function will be used.
:param analysis_identifier: Optional explicit name to use for online analysis.
Defaults to ``fit_<fit_type>``, but can be set explicitly to allow more than one
fit of a given type at a time.
Defaults to ``fit_{fit_module}.{fit_class}``, but can be set explicitly to allow
more than one fit of a given type at a time.
:param constants: dictionary specifying constant values for any non-floated
parameters. If not specified, the defaults from the fit class are used.
:param initial_values: dictionary specifying initial parameter values to use in
@@ -340,6 +343,9 @@
:param fit_module: python module containing the fit class. Currently defaults
to `ndscan.fitting.oitg`, which wraps the oitg fitting functions; this
default will change to `ndscan.fitting` in the future.
:param export_chi2: If True and a y-axis error is provided, the fit
Chi-Squared and p-value are exported as result channels named
``{analysis_identifier}_chi2`` and ``{analysis_identifier}_p``.
"""
def __init__(self,
fit_class: str,
@@ -350,15 +356,21 @@ def __init__(self,
initial_values: Optional[Dict[str, Any]] = None,
bounds: Optional[Dict[str, Tuple[float, float]]] = None,
exported_results: Optional[List[ExportedResult]] = None,
fit_module: str = 'ndscan.fitting.oitg'):
fit_module: str = 'ndscan.fitting.oitg',
export_chi2: bool = True):

if analysis_identifier is None:
analysis_identifier = f"fit_{fit_module}.{fit_class}"

# TODO: store a FitDescription object instead of all of these attributes...
self.fit_class_name = fit_class
self.fit_module = fit_module
self.data = data
self.annotations = annotations
self.analysis_identifier = analysis_identifier
self.initial_values = initial_values or {}
self.exported_results = exported_results or []
self.export_chi2 = export_chi2

self._result_channels = []
for result in self.exported_results:
@@ -372,6 +384,14 @@ def __init__(self,
display_hints={"error_bar_for": channel.path})
self._result_channels.append(err_channel)

if self.export_chi2 and 'y_err' in data:
self._result_channels.append(
OpaqueChannel(path=f"{analysis_identifier}_chi2",
description="Fit Chi-Squared value"))
self._result_channels.append(
OpaqueChannel(path=f"{analysis_identifier}_p",
description="Fit p-value"))

duplicate_channels = [
channel.path
for channel, count in collections.Counter(self._result_channels).items()
@@ -413,17 +433,9 @@ def describe_online_analyses(
if isinstance(v, ResultChannel)
]

analysis_identifier = self.analysis_identifier
Contributor Author
🚨 Potentially contentious... 🚨

If no analysis identifier is given, we attempt to generate a unique identifier by mangling together the fit type with the result channels (there is more info we could in principle add here -- even the entire fit description -- but this is pretty good). The generated identifier mangles in the names of the result channels fed into the fit (in practice this means the channels fed into y and y_err, since only 1D fits are supported at present).

Result channels are given a name in the fragment in which they are originally defined, but can be renamed in the context they are used in (e.g. subscans rename result channels by appending a prefix to avoid collisions). In the original code, the local names within the scan context were used, which seems sensible. However, a side effect is that if the user doesn't explicitly give us an analysis identifier, we cannot generate one until describe_online_analyses.

This commit introduces a change in behaviour by generating the analysis identifier using the original channel names. The motivation was that (as far as I can tell, but I may be wrong) the result channels need to be defined in the DefaultAnalysis.__init__ method which does not have access to the scan context. I wanted to generate p and chi2 result channels for the analysis and it seemed sensible to name them based on the analysis identifier, which then required a change in behaviour.

The new behaviour is more likely to cause an accidental collision, but it's not obvious to me how much this is a practical concern (what real use-case would this affect?). So, I think it's fine, but I also haven't got my head fully around the details so may be missing something

NB in many cases the result channels defined in the original fragment will just be the readout p and p_err so there isn't much value in mangling that into the name since they don't tell us much...

In any case, the tests now pass, but that may say more about the test coverage than the correctness of the design here.

Contributor Author

In hindsight, a better design here would probably have been to have used an ExportedResult to export chi2 or p rather than the implementation here. That way we could revert all of the changes around analysis_identifier. It's also a bit more explicit and more consistent with the rest of the interface. It requires a little bit of special casing in DefaultAnalysis to handle, but is probably the better design overall. TODO!

nit pick: would FitResult be a more descriptive name than ExportedResult?
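The fallback mangling described above can be sketched standalone (the channel names below are hypothetical placeholders, not taken from the PR):

```python
def default_analysis_identifier(fit_module: str, fit_class_name: str,
                                channels: list) -> str:
    # Mirror the fallback in describe_online_analyses: mangle the fit type
    # and the result channel names into a pseudo-unique identifier.
    return "fit_" + f"{fit_module}.{fit_class_name}" + "_" + "_".join(channels)


# Hypothetical channels fed into y and y_err of a 1D fit:
identifier = default_analysis_identifier("ndscan.fitting", "sinusoid",
                                         ["readout_p", "readout_p_err"])
# identifier == "fit_ndscan.fitting.sinusoid_readout_p_readout_p_err"
```

As the comment notes, two analyses sharing both fit type and input channels would collide under this scheme; passing an explicit `analysis_identifier` avoids that.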

if analysis_identifier is None:
# By default, mangle fit type and channels into a pseudo-unique identifier,
# which should work for the vast majority of cases (i.e. unless the user
# creates needlessly duplicate analyses).
analysis_identifier = ("fit_" + f"{self.fit_module}.{self.fit_class_name}" +
"_" + "_".join(channels))

def analysis_ref(key):
return AnnotationValueRef("online_result",
analysis_name=analysis_identifier,
analysis_name=self.analysis_identifier,
result_key=key)

fit_params = self.fit_klass.get_params()
@@ -452,18 +464,17 @@ def analysis_ref(key):
parameters={"associated_channels": channels}))

analysis = {
analysis_identifier:
FitDescription(
fit_class_name=self.fit_class_name,
fit_module=self.fit_module,
data={
name: context.describe_coordinate(obj)
for name, obj in self.data.items()
},
param_bounds=self.bounds,
fixed_params=self.constants,
initial_values=self.initial_values,
)
self.analysis_identifier:
FitDescription(fit_class_name=self.fit_class_name,
fit_module=self.fit_module,
data={
name: context.describe_coordinate(obj)
for name, obj in self.data.items()
},
param_bounds=self.bounds,
fixed_params=self.constants,
initial_values=self.initial_values,
export_chi2=self.export_chi2)
}
return [a.describe(context) for a in annotations], analysis

@@ -485,15 +496,21 @@ def execute(
# TODO: Generalise to higher-dimensional fits.
x = axis_data[self.data['x']._store.identity]
y = result_data[self.data['y']]
y_err_key = self.data.get('y_err')
y_err = result_data[y_err_key] if y_err_key is not None else None

fit = self.fit_klass(
x=np.asarray(x),
y=np.asarray(y),
y_err=None,
x=x,
y=y,
y_err=y_err,
param_bounds=self.bounds,
fixed_params=self.constants,
initial_values=self.initial_values,
)
if self.export_chi2 and y_err is not None:
chi2, fit_p = fit.chi2
self._result_channels[f"{self.analysis_identifier}_chi2"].push(chi2)
self._result_channels[f"{self.analysis_identifier}_p"].push(fit_p)

for result in self.exported_results:
p, p_err = getattr(fit, result.fit_parameter)
channel = self._result_channels[result.result_name]
33 changes: 29 additions & 4 deletions ndscan/fitting/fitting.py
@@ -2,7 +2,7 @@
import functools
import collections
from typing import Dict, List, Optional, Tuple, Union
from scipy import optimize
from scipy import optimize, stats

__all__ = ['FitBase']

@@ -39,9 +39,9 @@ def __init__(
:param initial_values: dictionary specifying initial parameter values to use in
the fit. These override the values found by :meth:`estimate_parameters`.
"""
self._x = x
self._y = y
self._y_err = y_err
self._x = None if x is None else np.asarray(x)
self._y = None if y is None else np.asarray(y)
self._y_err = None if y_err is None else np.asarray(y_err)

self._validate_param_names(fixed_params)
if fixed_params is not None:
@@ -171,6 +171,31 @@ def residuals(self):
"""Returns the fit residuals."""
return self._y - self._func_free(self._x, self._p)

@property
def chi2(self) -> Tuple[float, float]:
""" Chi-Squared test.

Assumes the values of `y` are independent and normally distributed.
Returns the tuple (Chi-Squared, p-value)
Comment on lines +178 to +179
Member

I haven't had a closer look at this at all, but it might be valuable to offer some guidance here to users regarding the fact that our data is pretty much never actually normally distributed when working with ion readout data.

Contributor Author

mmm...I'm not sure that I have anything intelligent to say here (but that probably mainly speaks to my limited knowledge of statistics) beyond "if the errors are well-approximated by normal then the Chi2 p-value is probably a decent estimator for the fit significance."

A couple of other comments here:

  1. I was thinking of perhaps adding an additional get_significance (returns a number between 0 and 1) method to FitBase which would default to returning the Chi Squared p-value. That leaves the option open for people to do something cleverer if desired
  2. I was also thinking of splitting FitBase.fit into two parts: a fit initialisation part and a part that does the fitting. The latter would default to least-squares fitting but would allow people to do some other form of parameter estimation if desired.
  3. I was thinking of also letting ExportedResult take a result channel and a fit significance parameter. If the fit significance is above the threshold then the linked dataset is updated. This could also be done through a post-fit call back inside the Analysis class
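To put the reviewer's concern in concrete terms, a small standalone sketch (the helper name is hypothetical, not PR code): thresholded readout counts are binomial, and the normal approximation underlying the chi-squared test degrades near p = 0 or p = 1.

```python
import math


def readout_std_err(p: float, n_shots: int) -> float:
    # Standard error of a binomial probability estimate from n_shots shots.
    return math.sqrt(p * (1.0 - p) / n_shots)


# 100 shots at p = 0.5: N*p = N*(1-p) = 50, the normal approximation is fine.
print(readout_std_err(0.5, 100))  # 0.05
# 100 shots at p = 0.99: N*(1-p) = 1, the distribution is strongly skewed,
# so a Gaussian y_err (and hence the chi-squared p-value) is unreliable.
print(readout_std_err(0.99, 100))
```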


The p-value is the probability of observing deviations from the fit at
least this large by chance, assuming the data is normally distributed
(values close to unity indicate high-quality fits).
"""
if self._y_err is None:
raise ValueError("Cannot calculate chi square without knowing y_err")

n = len(self._x) - len(self._free_params)
if n < 1:
raise ValueError("Cannot calculate chi squared with "
f"{len(self._free_params)} fit parameters and only "
f"{len(self._x)} data points.")

y_fit = self.evaluate(self._x)[1]
chi_2 = np.sum(np.power((self._y - y_fit) / self._y_err, 2))
p = stats.chi2.sf(chi_2, n)
Contributor Author
TODO: check the standard definition of p (is it the cdf or the sf? i.e. if all points lie on the curve, is p 1 or 0).

return chi_2, p
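Regarding the TODO above, a quick standalone check (not part of the PR) suggests `sf` is the right choice: the survival function is 1 - cdf, so a perfect fit (chi-squared of zero) gives p = 1.

```python
import numpy as np
from scipy import stats

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_fit = y.copy()                 # all points lie exactly on the curve
y_err = np.full_like(y, 0.1)
n_free_params = 2                # hypothetical number of floated parameters
dof = len(y) - n_free_params

chi_2 = float(np.sum(((y - y_fit) / y_err) ** 2))  # 0.0 for a perfect fit
p = float(stats.chi2.sf(chi_2, dof))
print(chi_2, p)  # 0.0 1.0
```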

def _calculate_derived_params(self):
"""
Updates fit results with values and uncertainties for derived parameters.