fitting overhaul #305

Closed
wants to merge 27 commits

27 commits:
d16b6ad
add new ndscan.fitting module
hartytp Aug 8, 2022
41dccc6
use new fitting framework
hartytp Aug 8, 2022
815f6c3
NamedFit -> FitDescription, remove the Analysis base class
hartytp Aug 9, 2022
5973bef
fix move to FitDescription, yapf, flake
hartytp Aug 9, 2022
d51fd7f
add oitg fitting compatibility layer
hartytp Aug 9, 2022
7b6f536
remove accidentally committed poetry files, change fitting imports to…
hartytp Aug 9, 2022
5838636
fitting.oitg: handle the fact that some fit functions do not provide …
hartytp Aug 9, 2022
2ce7830
test_experiment_entrypoint: update after fitting changes
hartytp Aug 9, 2022
a7d19ee
test_experiment_entrypoint: fix typo
hartytp Aug 9, 2022
6ed7271
ndscan.fitting.oitg: handle the fact that the oitg code does not spec…
hartytp Aug 9, 2022
050aaec
test: update to new fitting code
hartytp Aug 9, 2022
5e1484f
fitting: add a kind parameter to FitDescription
hartytp Aug 10, 2022
ca2daba
FitDescription: add from_dict method
hartytp Aug 10, 2022
5393a03
fitting: fix scale factors, sinusoid: add x0 parameter
hartytp Aug 10, 2022
7062548
Specify an x-axis scale rather than manually specifing scale factors for
hartytp Aug 10, 2022
1beaf6b
fitting: add y-axis scale factor as well
hartytp Aug 10, 2022
6fa674a
fitting.sinusoid: docs
hartytp Aug 10, 2022
6e587a6
Infer fit scale factors from dataset rather than from parameter/result
hartytp Aug 10, 2022
5478678
fitting: add get_default_scale_factors back, sinusoid: give scale fac…
hartytp Aug 10, 2022
8fc2938
fitting: improve sinusoid parameter estimator, don't use abolute sigm…
hartytp Aug 10, 2022
fb47dde
fitting.sinusoid: fix phase estimation and phase wrapping
hartytp Aug 10, 2022
7261abe
fitting.sinusoid: document parameters
hartytp Aug 11, 2022
15702b4
fitting.sinusoid: finish initial implementation
hartytp Aug 11, 2022
9548134
fitting: clip initial values to lie within bounds
hartytp Aug 11, 2022
03c9b36
fitting: add rabi flop fit (untested)
hartytp Aug 11, 2022
f3d9222
fitting: add chi-squared results, bug fixes
hartytp Aug 12, 2022
f2266d5
fix tests
hartytp Aug 12, 2022
308 changes: 206 additions & 102 deletions ndscan/experiment/default_analysis.py

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion ndscan/experiment/scan_generator.py
@@ -92,7 +92,6 @@ def __init__(self,

    def has_level(self, level: int) -> bool:
        ""

        def num_points(limit):
            return np.floor(abs(self.centre - limit) / self.spacing)
10 changes: 8 additions & 2 deletions ndscan/experiment/utils.py
@@ -1,6 +1,8 @@
 import json
 import numpy
 from typing import Any, Iterable, Optional
+import dataclasses
+import enum


 def path_matches_spec(path: Iterable[str], spec: str) -> bool:
@@ -20,22 +22,26 @@ def is_kernel(func) -> bool:
     return meta.core_name is not None and not meta.portable


-class NumpyToVanillaEncoder(json.JSONEncoder):
+class Encoder(json.JSONEncoder):
     def default(self, obj):
         if isinstance(obj, numpy.integer):
             return int(obj)
         if isinstance(obj, numpy.floating):
             return float(obj)
         if isinstance(obj, numpy.ndarray):
             return obj.tolist()
+        if dataclasses.is_dataclass(obj):
Member:
I don't dislike this (although the dump_json docstring would need adapting), but I wonder whether we can come up with some sort of best practices for using dataclasses/… in PYON. (I used JSON here instead of PYON on purpose, to make it easier to load the result files from e.g. Mathematica or Julia, but that could be adapted.)

Contributor Author:
> but I wonder whether we can come up with some sort of best practices for using dataclasses/… in PYON

You mean best practices within ndscan, or best practices for PYON in general? If we do go down the route of replacing the dictionary schema with something like dataclasses, I was wondering about having a Schema (or SerialisableDataClass, or whatever) base class to help formalise some of this. That could then provide to_dict and from_dict (or describe and decode, or whatever) methods to help keep things uniform.
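As a rough sketch of that idea (the names Schema, to_dict and from_dict are hypothetical, as in the comment above):

import dataclasses


class Schema:
    """Hypothetical base class for serialisable dataclass schemata."""

    def to_dict(self) -> dict:
        return dataclasses.asdict(self)

    @classmethod
    def from_dict(cls, data: dict):
        # Assumes a flat dataclass; nested dataclasses would need recursion.
        return cls(**data)


@dataclasses.dataclass
class ExampleDescription(Schema):
    kind: str
    params: dict


desc = ExampleDescription.from_dict({"kind": "sinusoid", "params": {}})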

+            return dataclasses.asdict(obj)
+        if isinstance(obj, enum.Enum):
+            return obj.value
         return json.JSONEncoder.default(self, obj)


 def dump_json(obj: Any) -> str:
     """Serialise ``obj`` as a JSON string, with NumPy numerical/array types encoded as
     their vanilla Python counterparts.
     """
-    return json.dumps(obj, cls=NumpyToVanillaEncoder)
+    return json.dumps(obj, cls=Encoder)


 def to_metadata_broadcast_type(obj: Any) -> Optional[Any]:
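With the extended encoder, a dataclass holding NumPy and enum values now serialises to plain JSON. For example (illustrative types, not part of this diff):

import dataclasses
import enum

import numpy

from ndscan.experiment.utils import dump_json


class Shape(enum.Enum):
    SINE = "sine"


@dataclasses.dataclass
class Record:
    shape: Shape
    samples: numpy.ndarray


print(dump_json(Record(Shape.SINE, numpy.array([1.0, 2.0]))))
# -> {"shape": "sine", "samples": [1.0, 2.0]}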
6 changes: 6 additions & 0 deletions ndscan/fitting/__init__.py
@@ -0,0 +1,6 @@
from .fitting import *
from .sinusoid import *

__all__ = []
__all__.extend(fitting.__all__)
__all__.extend(sinusoid.__all__)
312 changes: 312 additions & 0 deletions ndscan/fitting/fitting.py
@@ -0,0 +1,312 @@
import numpy as np
import functools
import collections
from typing import Dict, List, Optional, Tuple, Union
from scipy import optimize, stats

__all__ = ['FitBase']


# TODO: type annotations! (What's the best way of annotating np arrays?)
Contributor:
This is something I find annoying too -- I'd like to be able to say what dimensions I expect the array to have.

I will send you a link to how I've handled this in an OxIonics project, but afaik there's nothing built into NumPy for this. You'd just have to use np.ndarray.

Member:
I've had a look at this as well in the context of NAC3, and came to the same conclusion; this isn't nicely supported in NumPy at all. There is some support for type annotations now (e.g. you can do ndarray[int32, 2]), but IIRC it isn't even clearly defined what the index types should even be.
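For reference, recent NumPy versions do provide numpy.typing, which lets annotations pin down the element type, though still not the dimensionality; a small sketch:

import numpy as np
import numpy.typing as npt


def normalise(x: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    # The dtype is visible to static checkers; the shape is not.
    return x / np.max(np.abs(x))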

# TODO: docstring formatting for sphinx
class FitBase:
    def __init__(
        self,
        x=None,
        y=None,
        y_err=None,
        param_bounds: Optional[Dict[str, Tuple[float, float]]] = None,
        fixed_params: Optional[Dict[str, float]] = None,
        initial_values: Optional[Dict[str, float]] = None,
    ):
        """
        If `x` and `y` are provided, we fit the provided dataset. `x`, `y` and
        `y_err` may be updated post-construction to fit other datasets
        (cf. :meth fit:).

        Fit results are exposed as class properties. For example, a fit parameter
        'foo' is accessed via 'fit.foo', which returns a tuple with the fitted value
        of 'foo' and its estimated uncertainty.

        :param x: array of x-axis values.
        :param y: array of y-axis values.
        :param y_err: array of y-axis uncertainties.
        :param param_bounds: dictionary of tuples containing the lower and upper
            bounds for each parameter. If not specified, the defaults from
            :meth get_default_bounds: are used.
        :param fixed_params: dictionary specifying constant values for any
            non-floated parameters. If not specified, the defaults from
            :meth get_default_fixed_params: are used.
        :param initial_values: dictionary specifying initial parameter values to use
            in the fit. These override the values found by
            :meth _estimate_parameters:.
        """
        self._x = None if x is None else np.asarray(x)
        self._y = None if y is None else np.asarray(y)
        self._y_err = None if y_err is None else np.asarray(y_err)

        self._validate_param_names(fixed_params)
        if fixed_params is not None:
            self._fixed_params = fixed_params
        else:
            self._fixed_params = self.get_default_fixed_params()
        self._free_params = [
            param for param in self.get_params()
            if param not in self._fixed_params.keys()
        ]

        self._validate_param_names(param_bounds)
        self._param_bounds = self.get_default_bounds()
        self._param_bounds.update(param_bounds or {})
        self._param_bounds = {
            param: bound
            for param, bound in self._param_bounds.items()
            if param in self._free_params
        }

        self._validate_param_names(initial_values, self._free_params)
        self._initial_values = initial_values or {}

        self._x_scale = None
        self._y_scale = None
        self._scale_factors = None
        self._p = {}
        self._p_err = {}

        def getter(name, instance):
            return instance._p[name], instance._p_err[name]

        cls = type(self)
        for param in self.get_params() + self.get_derived_params():
            setattr(cls, param, property(functools.partial(getter, param)))

        if self._x is not None and self._y is not None:
            self.fit()

    def fit(self) -> Tuple[Dict[str, float], Dict[str, float]]:
        """
        Fit the dataset and return the fitted parameter values and uncertainties.
        """
        valid_pts = np.logical_and(np.isfinite(self._x), np.isfinite(self._y))
        x = self._x[valid_pts]
        y = self._y[valid_pts]
        y_err = None if self._y_err is None else self._y_err[valid_pts]

        self._x_scale = np.max(np.abs(self._x))
        self._y_scale = np.max(np.abs(self._y))
        self._scale_factors = self.get_default_scale_factors()
        assert set(self._scale_factors) == set(self.get_params())

        initial_values = self._estimate_parameters()
        initial_values.update(self._initial_values)

        for param in self.get_free_params():
            param_bounds = self._param_bounds[param]
            initial_values[param] = np.clip(initial_values[param], param_bounds[0],
                                            param_bounds[1])

        lower_bounds = [self._param_bounds[param][0] for param in self._free_params]
        upper_bounds = [self._param_bounds[param][1] for param in self._free_params]

        p0 = [initial_values[param] for param in self._free_params]
        p0 = [max(estimate, lower) for estimate, lower in zip(p0, lower_bounds)]
        p0 = [min(estimate, upper) for estimate, upper in zip(p0, upper_bounds)]

        scale_factors = [
            self._scale_factors.get(param, 1) for param in self._free_params
        ]
        scale_factors = np.asarray(scale_factors)

        p0 = np.asarray(p0) / scale_factors
        lower_bounds = np.asarray(lower_bounds) / scale_factors
        upper_bounds = np.asarray(upper_bounds) / scale_factors

        p, p_cov = optimize.curve_fit(
            f=functools.partial(self._func_free, scale_factors),
            xdata=x,
            ydata=y,
            p0=p0,
            sigma=y_err,
            absolute_sigma=y_err is not None,
            bounds=(lower_bounds, upper_bounds),
        )

        p *= scale_factors
        p_cov *= scale_factors**2

        self._p_cov = p_cov
        p_err = np.sqrt(np.diag(p_cov))

        self._p = {param: value for param, value in zip(self._free_params, p)}
        self._p_err = {param: value for param, value in zip(self._free_params, p_err)}
        self._p.update(self._fixed_params)
        self._p_err.update({param: 0 for param in self._fixed_params.keys()})

        self._calculate_derived_params()
        assert set(self.get_params() + self.get_derived_params()) == set(self._p.keys())
        return self._p, self._p_err

    def _func_free(self, scale_factors, x, *args):
        """Evaluate the fit function with given values for the free parameters."""
        params = dict(self._fixed_params)

        p = np.asarray(args)
        p *= scale_factors
        params.update({k: v for k, v in zip(self._free_params, p)})

        return self.func(x, params)

    def evaluate(self, x_fit: Union[np.ndarray, int] = 100):
        """Evaluates the function along `x_fit` and returns the tuple
        (x_fit, y_fit) with the results.

        `x_fit` may either be a scalar or an array. If it is a scalar, it gives the
        number of equally spaced points between `min(self.x)` and `max(self.x)`.
        Otherwise it gives the x-axis to use.
        """
        if np.isscalar(x_fit):
            x_fit = np.linspace(np.min(self._x), np.max(self._x), x_fit)
        y_fit = self.func(x_fit, self._p)
        return x_fit, y_fit
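For example, assuming a fitted instance `fit` (hypothetical variable name), the two call styles look like:

x_dense, y_dense = fit.evaluate(500)  # 500 evenly spaced points across the data
x_some, y_some = fit.evaluate(np.array([0.0, 0.5, 1.0]))  # explicit x-axis values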

    @property
    def residuals(self):
        """Returns the fit residuals."""
        return self._y - self.func(self._x, self._p)

    @property
    def chi2(self) -> Tuple[float, float]:
        """Chi-squared test.

        Assumes the values of `y` are independent and normally distributed.
        Returns the tuple (chi-squared, p-value).
Member:
I haven't had a closer look at this at all, but it might be valuable to offer users some guidance here, given that our data is pretty much never actually normally distributed when working with ion readout data.

Contributor Author:
mmm... I'm not sure that I have anything very intelligent to say here (but that probably mainly speaks to my limited knowledge of statistics) beyond "if the errors are well approximated by a normal distribution, then the chi-squared p-value is probably a decent estimator of the fit significance."

A couple of other comments here:

  1. I was thinking of perhaps adding an additional get_significance method (returning a number between 0 and 1) to FitBase, which would default to returning the chi-squared p-value. That leaves the option open for people to do something cleverer if desired (see the sketch after this list).
  2. I was also thinking of splitting FitBase.fit into two parts: a fit-initialisation part and a part that does the fitting. The latter would default to least-squares fitting but would allow people to use some other form of parameter estimation if desired.
  3. I was thinking of also letting ExportedResult take a result channel and a fit-significance threshold. If the fit significance is above the threshold, the linked dataset is updated. This could also be done through a post-fit callback inside the Analysis class.
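A minimal sketch of point 1, assuming a hypothetical get_significance method (not part of this PR):

    def get_significance(self) -> float:
        """Returns a number between 0 and 1 quantifying the fit quality.

        Defaults to the chi-squared p-value; subclasses may override this when
        their noise statistics call for something cleverer.
        """
        return self.chi2[1]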


        The p-value is the probability that deviations from the fit at least as
        large as those observed could arise by chance, assuming the data is
        normally distributed (values close to unity indicate high-quality fits).
        """
        if self._y_err is None:
            raise ValueError("Cannot calculate chi square without knowing y_err")

        n = len(self._x) - len(self._free_params)
        if n < 1:
            raise ValueError("Cannot calculate chi squared with "
                             f"{len(self._free_params)} fit parameters and only "
                             f"{len(self._x)} data points.")

        y_fit = self.evaluate(self._x)[1]
        chi_2 = np.sum(np.power((self._y - y_fit) / self._y_err, 2))
        p = stats.chi2.sf(chi_2, n)
Contributor Author:
Todo: check the standard definition of p (is it cdf or sf? i.e. if all points lie on the curve, is p 1 or 0?).
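On that todo, for what it's worth: scipy.stats.chi2.sf is the survival function (1 - cdf), so with the sf convention used here a chi-squared of zero (all points exactly on the curve) gives p = 1, matching the "values close to unity indicate high-quality fits" wording in the docstring:

from scipy import stats

stats.chi2.sf(0.0, 10)   # 1.0 -- perfect agreement gives p = 1
stats.chi2.cdf(0.0, 10)  # 0.0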

        return chi_2, p

    def _calculate_derived_params(self):
        """
        Updates fit results with values and uncertainties for derived parameters.
        """
        pass

    @staticmethod
    def func(x, params):
        """The fit function."""
        raise NotImplementedError

    @staticmethod
    def get_params() -> List[str]:
        """Returns a list of fit parameters."""
        raise NotImplementedError

    def get_free_params(self) -> List[str]:
        """Returns a list of the free (not held constant) fit parameters."""
        return self._free_params

    @staticmethod
    def get_derived_params() -> List[str]:
        """Returns a list of derived parameters."""
        return []

    @staticmethod
    def get_default_fixed_params() -> Dict[str, float]:
        """Returns a dictionary mapping names of parameters which are not floated by
        default to their values.
        """
        return {}

    @staticmethod
    def get_default_bounds() -> Dict[str, Tuple[float, float]]:
        """
        Returns a dictionary mapping parameter names to a tuple of (lower, upper)
        bounds.
        """
        raise NotImplementedError

    @staticmethod
    def get_default_annotations():
        """Returns a dictionary of default annotations for the fit."""
        return {}

    def get_default_scale_factors(self) -> Dict[str, float]:
        """Returns a dictionary of default parameter scale factors."""
        return {param: 1 for param in self.get_params()}

    def _estimate_parameters(self) -> Dict[str, float]:
        """
        Returns a dictionary of estimates for the parameter values for the current
        dataset.
        """
        raise NotImplementedError

    def _validate_param_names(self, param_names, valid_params=None):
        if param_names is None:
            return

        valid_params = set(
            valid_params if valid_params is not None else self.get_params())

        duplicates = [
            param for param, count in collections.Counter(param_names).items()
            if count > 1
        ]
        if duplicates:
            raise ValueError(f"Duplicate parameters: {','.join(duplicates)}")

        invalid = set(param_names) - valid_params
        if invalid:
            raise ValueError(f"Invalid parameters: {','.join(invalid)}")

    @property
    def x(self):
        """Dataset x-axis values."""
        return self._x

    @x.setter
    def x(self, x):
        self._x = np.asarray(x)
        self._p = {}
        self._p_err = {}
        self._x_scale = None
        self._y_scale = None
        self._scale_factors = None

    @property
    def y(self):
        """Dataset y-axis values."""
        return self._y

    @y.setter
    def y(self, y):
        self._y = np.asarray(y)
        self._p = {}
        self._p_err = {}
        self._x_scale = None
        self._y_scale = None
        self._scale_factors = None

    @property
    def y_err(self):
        """Dataset y-axis uncertainties."""
        return self._y_err

    @y_err.setter
    def y_err(self, y_err=None):
        self._y_err = np.asarray(y_err) if y_err is not None else None
        self._p = {}
        self._p_err = {}
        self._x_scale = None
        self._y_scale = None
        self._scale_factors = None
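To illustrate the FitBase interface defined above, here is a minimal sketch of a concrete fit (a hypothetical straight-line fit, not part of this PR):

import numpy as np
from ndscan.fitting import FitBase


class Line(FitBase):
    """y = a * x + b"""

    @staticmethod
    def func(x, params):
        return params["a"] * x + params["b"]

    @staticmethod
    def get_params():
        return ["a", "b"]

    @staticmethod
    def get_default_bounds():
        return {"a": (-np.inf, np.inf), "b": (-np.inf, np.inf)}

    def _estimate_parameters(self):
        # Crude two-point estimate; good enough as a starting point.
        a = (self._y[-1] - self._y[0]) / (self._x[-1] - self._x[0])
        return {"a": a, "b": self._y[0] - a * self._x[0]}


x = np.linspace(0.0, 1.0, 20)
fit = Line(x=x, y=3.0 * x + 2.0)  # fits on construction
a, a_err = fit.a  # each parameter is exposed as a (value, error) property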