
fitting overhaul #305

Closed · wants to merge 27 commits

Conversation

@hartytp (Contributor) commented Aug 8, 2022

  • adds a new ndscan.fitting module with Sinusoid as an example fit class
  • adds support for online analyses using fit classes from any module accessible from the current environment
  • allows the user to automatically generate result channels from online fits

Example of the new framework

import numpy as np
import artiq.experiment as aq
import ndscan.experiment as nd


class TestFitting(nd.ExpFragment):
    def build_fragment(self):
        self.setattr_param(
            "t", nd.FloatParam, description="time", default=1 * aq.us, unit="us"
        )
        self.setattr_param(
            "t0", nd.FloatParam, description="time offset", default=1 * aq.us, unit="us"
        )
        self.setattr_param(
            "f", nd.FloatParam, description="frequency", default=1 * aq.MHz, unit="MHz"
        )
        self.setattr_result("y", nd.FloatChannel)

    def run_once(self):
        self.y.push(np.sin(2 * np.pi * self.f.get() * (self.t.get() - self.t0.get())))

    def get_default_analyses(self):
        # creates result channels 'omega', 'a', 'omega_err' and 'a_err'
        return [
            nd.OnlineFit(
                fit_module="ndscan.fitting",
                fit_class="Sinusoid",
                annotations={'t0': {'x': 'x0'}},
                constants={'phi': 0},
                data={"x": self.t, "y": self.y},
                exported_results=[nd.ExportedResult("omega"), nd.ExportedResult("a")],
            )
        ]


ScanTestFitting = nd.make_fragment_scan_exp(TestFitting)

To do before merge:

  • fix backwards compatibility with existing code. Will likely require a small modification to oitg.fitting and a bit of special casing in ndscan
  • sort out type annotations in ndscan.fitting (how should we annotate arrays?)
  • fix docstring formatting
  • fix tests
  • add tests for new code (how thorough do we want to be on this?)

One other nice consequence of this is that the error channels it creates are properly configured; with CustomAnalysis there is no way of telling ndscan that one channel is the error bar for another.

Why replace oitg.fitting?

I've replaced oitg.fitting.FitBase with a new class ndscan.fitting.FitBase, which is a complete rewrite from scratch. While the idea behind oitg.fitting.FitBase is great, I've always found the implementation rather convoluted and unergonomic. The implementation here is (hopefully!) simpler and easier to use while still providing all of the functionality we want.

I also found the quality of the fits in oitg.fitting to be a bit variable, e.g. why do we have so many functions which all essentially fit sinusoids, but only one has a decent heuristic? AFAICT this is partly because the way the package is structured (exporting functions rather than classes) makes inheritance / code reuse harder.

Given how important online fits are to ndscan, it always felt odd to me that ndscan relies on oitg to provide this functionality. Moving the fit base class into ndscan removes this coupling. In so doing, it makes it easier to break backward compatibility between the old and new fit functions. We also allow the fit classes to provide metadata such as default annotations.

Edit: another thing I don't like about the current oitg code is the way it handles parameter scaling. It normalises the parameters by their initial (estimated) values. This has a habit of blowing up if the initial guess is small. I've chosen to go down a different route here and allow the fit functions to define how each parameter scales with the x / y units, which should be more robust.
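
To make that concrete, a rough sketch of the kind of thing I mean (names and signatures here are illustrative only, not the final ndscan.fitting API):

from ndscan.fitting import FitBase

class Sinusoid(FitBase):
    @staticmethod
    def get_scale_factors(x_scale, y_scale):
        # Each parameter declares how it scales with the units of the data, so
        # the optimiser always works with O(1) quantities regardless of whether
        # the x axis is in seconds or microseconds.
        return {
            "a": y_scale,          # amplitude scales with y
            "y0": y_scale,         # offset scales with y
            "omega": 1 / x_scale,  # angular frequency scales inversely with x
            "phi": 1.0,            # phase is dimensionless
        }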

Using arbitrary modules

My expectation is that ndscan.fitting will grow to contain a small number of high-quality basic fitting functions (probably porting a subset of oitg.fitting), which can be supplemented by user code, but the bar for getting code into this should be high.

After this PR we allow users to supplement these built-in fits with their own library of fit classes derived from ndscan.fitting.FitBase. One nice feature of this is that if your applet is run from an environment with your fragment library in it, the applets can run any fit code defined alongside the fragments. This makes it really easy to define small modifications to fit functions (e.g. to do a bit of pre/post-processing, add a derived result, etc.) by creating derived classes.
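
For example (purely illustrative; the exact FitBase hooks may differ from what is sketched here), a user library might extend the built-in sinusoid with a derived result:

import numpy as np
from ndscan.fitting import Sinusoid

class SinusoidWithPiTime(Sinusoid):
    # Hypothetical user-side subclass living alongside the fragment library;
    # the applet can run it because it only needs the module and class names.
    def get_derived_results(self):
        # Assumed hook for derived quantities: add the pi-time computed from
        # the fitted angular frequency.
        results = super().get_derived_results()
        results["t_pi"] = np.pi / results["omega"]
        return results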

Exporting results

One of the things that has always frustrated me is the amount of boilerplate it takes to get fit results into result channels. This PR creates an interface which makes this ergonomic and simple -- no more CustomAnalysis just to duplicate the online fit!

Related issues

#251
#141

Re #272 (comment): the new NamedFit object gives an example of how this kind of additional structure might look...

ndscan/fitting/__init__.py (outdated review thread, resolved)

klass = utils.import_class(self.fit_module, self.fit_class_name)
self.fit_klass = klass
self.bounds = bounds if bounds is not None else klass.get_default_bounds()
hartytp (author) commented:

Hmm... should the bounds replace the defaults completely or just override specific ones? 🤔 Same in the fitting code, and the same question for constants etc.
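
For the record, the "override specific entries" option would look something like this (a minimal sketch; resolve_bounds is just a hypothetical helper, reusing the get_default_bounds call shown above):

def resolve_bounds(defaults, overrides=None):
    # Start from the fit class defaults and replace only the keys the user
    # actually supplied.
    bounds = dict(defaults)
    bounds.update(overrides or {})
    return bounds

# e.g. self.bounds = resolve_bounds(klass.get_default_bounds(), bounds)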

param_bounds: Optional[Dict[str, Tuple[float, float]]] = None,
fixed_params: Optional[Dict[str, float]] = None,
initial_values: Optional[Dict[str, float]] = None,
scale_factors: Optional[Dict[str, float]] = None,
hartytp (author) commented:

We could allow fit classes to specify default scale factors. This isn't so useful for general purpose fits like the sinusoid but might be for more specialised fit classes. I haven't bothered to add it for an initial implementation as it's easy to add later.

parameters. If not specified, the defaults from the fit class are used.
initial_values: dictionary specifying initial parameter values to use in
the fit. These override the values found by the fit's parameter estimator.
scale_factors: dictionary specifying scale factors for parameters. The
hartytp (author) commented:

I did briefly consider having a mechanism that allows the user to get default scale factors from the axis units (e.g. if I'm scanning a time in us then 1e6 is probably a good scale factor) but this felt like more effort than it's worth! We could also do something like the default scale factor estimation in oitg.fitting, but AFAICT this can do nasty things if the initial parameter guess is close to (but not equal to) zero -- although I might be wrong, I haven't fully understood that code!

@lochsh (Contributor) left a comment:

Super quick first pass on basic style issues -- I'll do another pass to look at the design 👍

Comment on lines 19 to 23
import numpy as np
import logging
from typing import Any, Callable, Dict, List, Iterable, Optional, Set, Tuple, Union
import collections
from enum import Enum
import dataclasses
Contributor commented:

Suggested change
import numpy as np
import logging
from typing import Any, Callable, Dict, List, Iterable, Optional, Set, Tuple, Union
import collections
from enum import Enum
import dataclasses
import collections
import dataclasses
from enum import Enum
import logging
from typing import Any, Callable, Dict, List, Iterable, Optional, Set, Tuple, Union
import numpy as np

I don't know about ndscan, but generally we'd arrange the imports in 3 groups:

  • standard library
  • third party
  • internal

and it's generally considered good practice to alphabetise within the groups, to discourage deliberate ordering of imports to mask dependency issues.

Member commented:

Personally, I don't see the point of differentiating between packages coming from the standard library and those coming from other sources; it just adds an artificial visual distinction where there is no actual semantic difference (e.g. ARTIQ is, for ndscan.experiment, in effect the "standard library").

But yes, please do try to keep things roughly grouped and in alphabetical order. (I'm sure there are places where that's not quite followed in ndscan; I've long outsourced caring about mechanical stylistic issues to tools, but I don't currently have one for imports in the CI setup, although I wouldn't be opposed in principle to adding something like isort.)

Comment on lines 54 to 56
def __post_init__(self):
    if type(self) == NamedFit:
        self.kind = AnalysisType.NAMED_FIT
Contributor commented:

Please move this to the __post_init__ of NamedFit, without the conditional. Unless there was some reason for it to be here?

Contributor commented:

I'm actually unsure about this whole construct, but I'll think about that later.

hartytp (author) commented:

I'm actually unsure about this whole construct, but I'll think about that later.

Yeah, this was a bit of an experiment.

Comment on lines 252 to 240
raise ValueError(f"Duplicate analysis result channel name '{name}' " +
f"in analysis for axes [{axes}]")
Contributor commented:

Suggested change
raise ValueError(f"Duplicate analysis result channel name '{name}' " +
f"in analysis for axes [{axes}]")
raise ValueError(f"Duplicate analysis result channel name '{name}' "
f"in analysis for axes [{axes}]")

No need for the + here.

Member commented:

Please keep to the existing coding style. Implicit string concatenation is evil; I'm still hoping the Python core devs will at some point see the light and remove it (e.g. as suggested by Guido himself as well).

from scipy import optimize


# TODO: type annotations! (What's the best way of annotating np arrays?)
Contributor commented:

This is something I find annoying too -- I'd like to be able to say what dimensions I expect the array to be.

I will send you a link to how I've handled this in an OxIonics project, but afaik there's nothing built into numpy for this. You'd just have to use np.ndarray.

Member commented:

I've had a look at this as well in the context of NAC3, and came to the same conclusion; this isn't nicely supported in NumPy at all. There is some support for type annotations now (e.g. you can do ndarray[int32, 2]), but IIRC it isn't even clearly defined what the index types should be.
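
(For reference, what recent NumPy versions do offer is numpy.typing.NDArray, which can express the dtype but not the shape or dimensionality:)

import numpy as np
import numpy.typing as npt

# The dtype can be annotated, but there is no standard way to say
# "1D array of length n" in the annotation itself.
def evaluate(x: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    return np.sin(x)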

@lochsh (Contributor) commented Aug 9, 2022

@hartytp could you explain the design in default_analysis.py? The Analysis and DefaultAnalysis classes don't seem to have any relation (inheritance-wise), neither do OnlineFit and NamedFit. From the names alone this raises questions about the design for me.

I note Analysis only has one child, and the AnalysisType enum only has one entry. What's the intention behind both of these classes' existence?

The docstring for NamedFit says "function used in online analyses", but there are no callable attributes.

@hartytp (author) commented Aug 9, 2022

@lochsh all good questions!

I note Analysis only has one child, and the AnalysisType enum only has one entry. What's the intention behind both of these classes' existence?

This was essentially just trying to mimic the structure that was already used in ndscan but replacing dictionaries with objects to improve discoverability.

Some background...previously the code in default analysis looked like

return [a.describe(context) for a in annotations], {
    analysis_identifier: {
        "kind": "named_fit",
        "fit_type": self.fit_type,
        "data": {
            name: context.describe_coordinate(obj)
            for name, obj in self.data.items()
        },
        "constants": self.constants,
        "initial_values": self.initial_values
    }
}

Since the online fits are done in applets, which only communicate with the master through datasets, all the scan data gets serialised and stuffed inside a dataset, which happens inside TopLevelRunner. Because everything is dictionaries in the current ndscan I always find the dataflow here rather confusing (everything is just called "schema" which makes searching for things essentially impossible, so you kind of need to know how it all fits together to find anything). On the applet side, the dictionaries get picked up here. This is then used inside online analysis e.g.

self._fit_type = self._schema["fit_type"]

Coming back to your questions

I note Analysis only has one child, and the AnalysisType enum only has one entry

Because the current ndscan code is written in a way that only allows one analysis type at present (otherwise we hit the warning I posted above), but seems designed specifically to allow the possibility of future types. As to whether we'd be happy committing to only having one analysis type going forwards, that's a question for @dnadlinger -- I don't have enough perspective on the plans for this package to comment on that. If we were happy committing to only having named fits, I agree that we could simplify the design here (scrap the enum etc.).

Edit: putting it another way, I think this is an instance where I'm struggling because the structure/intent of what's going on here is rather implicit. The move to data classes here was trying to make it all a bit more explicit.

@hartytp (author) commented Aug 9, 2022

The docstring for NamedFit says "function used in online analyses", but there's no callable attributes.

Maybe the docstring here could be improved (any suggestions?). The intention is that this dataclass describes the fit object, giving all the information we need to perform the fit later on. The fit object can be any class available on the Python path that inherits from ndscan.fitting.FitBase.

One might think "why not pass the fit object in directly rather than just describe it?". The motivation here is that we need the applets to be able to do their own local fitting in separate processes. So we need to give them enough information to instantiate the fit object for themselves.
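
Roughly speaking, and as an illustration of the intent rather than the exact dataclass in this PR, the description carries just enough for an applet process to rebuild the fit object from its own environment:

import dataclasses
import importlib
from typing import Dict

@dataclasses.dataclass
class FitDescription:
    # Everything an applet needs to re-instantiate the fit in its own process.
    fit_module: str
    fit_class: str
    data: Dict[str, str]          # result/axis identifiers, not live objects
    constants: Dict[str, float]
    initial_values: Dict[str, float]

    def load_fit_class(self):
        # Import the class by name; it must be importable from the applet's
        # environment (e.g. the user's fragment library).
        module = importlib.import_module(self.fit_module)
        return getattr(module, self.fit_class)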

@hartytp (author) commented Aug 9, 2022

The Analysis and DefaultAnalysis classes don't seem to have any relation (inheritance-wise), neither do OnlineFit and NamedFit. From the names alone this raises questions about the design for me.

Agreed, these names could be clearer! I think @dnadlinger was considering renaming DefaultAnalysis to Analysis at some point in the future, which would make sense to me anyway!

I struggled a bit with the naming of the Analysis enum since I couldn't really tell from reading the code what functionality we needed to provide. As I mentioned above, it may be that we can just scrap it. Then maybe we rename DefaultAnalysis to Analysis (possibly with an alias for backward compatibility with existing code).

As to OnlineFit v NamedFit: the former describes all of the information that ndscan needs for an online fit and also provides functions like execute. The latter is a small amount of information that describes the fit function itself and is intended to be serializable so the applets can use it as well (serializing the entire OnlineFit data structure would make less sense IMHO).

@hartytp (author) commented Aug 9, 2022

Thinking about this more, I wonder if a better design would be to scrap the AnalysisType enum entirely and then have Analysis (or whatever we call it after we've clarified its functionality) provide a from_dict static method. I also like that because it would allow us to remove

kind = schema["kind"]
if kind == "named_fit":
    del schema["kind"]
    analysis = experiment.NamedFit(**schema)
    self._online_analyses[name] = OnlineNamedFitAnalysis(analysis, self)
else:
    logger.warning("Ignoring unsupported online analysis type: '%s'", kind)

so we localise the parts of the codebase which need to know about the Analysis subclasses.

Side note: right now a lot of the serialisation in ndscan feels a bit ad hoc. Some -- but not all -- of the classes which get serialised have describe methods. I wonder if it would be better to have a Serializable base class which provides as_dict and from_dict methods that can be overridden to recursively serialise / deserialise objects in a more principled way? I haven't thought about this too much so maybe it's a bad idea and, in any case, it would be way outside the scope of this PR and would need separate discussion / buy-in from @dnadlinger. Just raising it while it's on my mind.
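
Very much a sketch of what I mean rather than a concrete proposal:

import dataclasses
from typing import Any, Dict

class Serializable:
    # Hypothetical mixin for @dataclasses.dataclass schema classes, giving
    # them uniform (de)serialisation; classes with nested Serializable fields
    # would override from_dict to rebuild them recursively.
    def as_dict(self) -> Dict[str, Any]:
        return dataclasses.asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Serializable":
        return cls(**data)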

@AUTProgram commented Aug 9, 2022

I agree that cleaning up fitting routines and having a few robust, high-quality fit functions that work under a wide variety of conditions is a good idea. Fitting data is a very general task; is there a reason to make this specific to ndscan? People might want to use these classes/functions for data analysis in other contexts, too. For example, they might be used in jupyter notebooks for general-purpose fitting. Having a dependency on ndscan (which probably requires artiq) seems like a lot of overhead in this case. Could FitBase and derived classes not live outside ndscan (similar to oitg now, but with improved quality) in a more general repo?

@hartytp (author) commented Aug 9, 2022

  • rebased on upstream master

@lochsh I thought more about your feedback. Since I am not aware of any need for a case where kind will not be named_fit I've removed the Analysis class and corresponding enum and replaced the NamedFit dataclass with FitDescription. This feels like a nice simplification which cleans up the internal API. This is all internal API so easy to modify if there are other use cases I'm missing but the simpler version feels nicer for now.

@hartytp (author) commented Aug 9, 2022

An only marginally cursed compatibility layer has been added to make this compatible with existing code.

@hartytp (author) commented Aug 9, 2022

There are now only two test failures, both in test_experiment_entrypoint.FragmentScanExpCase, failing in check_unprocessed_arguments. Once this is resolved I'll test again.

@hartytp (author) commented Aug 9, 2022

I think this PR is now substantively complete in its first iteration. The main outstanding item is deciding what new tests we want and implementing them. Other than that, it's waiting for reviews.

@hartytp (author) commented Aug 9, 2022

I agree that cleaning up fitting routines and having a few robust, high-quality fit functions that work under a wide variety of conditions is a good idea. Fitting data is a very general task; is there a reason to make this specific to ndscan? People might want to use these classes/functions for data analysis in other contexts, too. For example, they might be used in jupyter notebooks for general-purpose fitting. Having a dependency on ndscan (which probably requires artiq) seems like a lot of overhead in this case. Could FitBase and derived classes not live outside ndscan (similar to oitg now, but with improved quality) in a more general repo?

Thanks for raising that @AUTProgram! This is one of the design choices I made here that I'm not totally sure about. I completely agree that we should expect our user-written fit code to be used outside of artiq experiments, e.g. in post-analysis notebooks as you say.

I feel that the FitBase class is closely coupled to ndscan (e.g. it provides default annotations for plots) so it makes sense to have it in ndscan. But I can also see that I'm probably biased in this assessment, having just written FitBase as part of an overhaul of ndscan's fitting.

So, what are our options?

  1. Put the new FitBase in oitg.
  • need to manage having new and old fitting code in oitg
  • my current understanding is that oitg is a bit of a dumping ground for any code that the uni group wants to make public, but isn't involved enough to get its own repo. As such, it feels like a shame to build such a key part of ndscan around it, but maybe that's a silly concern?
  • changes to FitBase will affect ndscan, so we'd need to keep oitg and ndscan in sync, but again, probably not a big deal
  2. Keep it in ndscan.fitting.
  • downside is that any user analysis built on this framework then needs to pull in ndscan and its dependencies. How bad is that? It's a bit of bloat, but maybe acceptable. I expect, for example, that our analysis repo will end up with ndscan as a dependency anyway because we'll want to be able to use ndscan_show etc.
  3. Put it in a new repo, e.g. ndscan_fitting or ndscan_utils. Feels a bit overkill unless there are more ndscan-related tools that we want to make available.
  4. Make a second Python package (ndscan_fitting or similar) in the same repo as ndscan which holds the fit code, so it can be installed without artiq etc.
  5. Something else?

I'm not totally sure which I prefer. (1) is probably the path of least resistance, though, even if it's not my preference.

In the end, I think this is going to depend somewhat on what the fate of this PR turns out to be. I'm not sure if it's likely to end up getting merged upstream. If not, and we push some version of it onto our fork only, that changes some of my thinking about the best place to put this sort of code.

@dnadlinger (Member) commented:

Regarding dependencies, code in the other subpackages shouldn't have dependencies into ndscan.experiment to keep the "client-side" part usable from analysis environments without ARTIQ (we might want to split up the Python packages in the future once we properly add the ARTIQ dependency).

@hartytp (author) commented Aug 11, 2022

Regarding dependencies, code in the other subpackages shouldn't have dependencies into ndscan.experiment to keep the "client-side" part usable from analysis environments without ARTIQ (we might want to split up the Python packages in the future once we properly add the ARTIQ dependency).

ack. It's easy to move some classes around to avoid this. I just wasn't sure where the best place for them to be was.

@hartytp (author) commented Aug 11, 2022

Thanks for the PR. As always in open-source development, for larger-scale contributions that involve questions about the wider design, project scope, etc., it is generally a good idea to raise these separately first before starting work on an implementation.

I do largely agree with that. The flipside is that sometimes it's useful to build a strawman implementation to get a clear feel for how the design choices will play out in detail.

Small improvements to the same can go into oitg, and we can always add the option to specify custom code somehow for online fits (e.g. in module/class form, as suggested here).

Am I wrong in thinking that this PR could be cast into that form by moving the bits from ndscan.fitting into oitg.fitting2 or similar?

If we want to supply an actual fitting library with ndscan, then I think the bar should be a bit higher and we should take a step back and look at what typical use cases actually look like. For instance, one of the things that's sorely missing currently for writing efficient servos/… is some goodness-of-fit mechanism.

agreed, that would be great. Re goodness of fit, that feels like something which would naturally be built on top of this PR, right? (e.g. add a get_chi_squared or whatever to FitBase and expose that as a result channel).

We might also want to provide some more structural support for calibration experiments (where failing fits can be responded to, etc.), like some of the other ARTIQ libraries out there do.

Interesting. I'm not fully sure I know what you have in mind here. Could you provide a link to relevant code / a point in the right direction?

@hartytp (author) commented Aug 11, 2022

Added Rabi flop fits, which serve the purpose of the oitg.fitting rabi flop and detuned square pulse functions. The only sine-ish functionality provided by oitg.fitting but not ndscan.fitting in this PR is now dipole_bsb_car_rsb, which will probably become LaserFlop (todo: bikeshed the name) at some stage in the future.

@hartytp (author) commented Aug 11, 2022

Anyway, I'm going to leave this here for now, pending feedback

@hartytp (author) commented Aug 12, 2022

For instance, one of the things that's sorely missing currently for writing efficient servos/… is some goodness-of-fit mechanism.

Good call! Implemented, that's much cleaner than the way we were doing things.

@@ -413,17 +433,9 @@ def describe_online_analyses(
if isinstance(v, ResultChannel)
]

analysis_identifier = self.analysis_identifier
hartytp (author) commented:

🚨 Potentially contentious... 🚨

If no analysis identifier is given, we attempt to generate a unique identifier by mangling together the fit type with the result channels (there is more info we could in principle add here -- even the entire fit description -- but this is pretty good). The generated identifier mangles the names of the result channels fed into the fit (in practice this means the channels fed into y and y_err, since only 1D fits are supported at present).

Result channels are given a name in the fragment in which they are originally defined, but can be renamed in the context they are used in (e.g. subscans rename result channels by appending a prefix to avoid collisions). In the original code, the local names within the scan context were used, which seems sensible. However, a side effect is that if the user doesn't explicitly give us an analysis identifier we cannot generate one until describe_online_analyses.

This commit introduces a change in behaviour by generating the analysis identifier using the original channel names. The motivation was that (as far as I can tell, but I may be wrong) the result channels need to be defined in the DefaultAnalysis.__init__ method which does not have access to the scan context. I wanted to generate p and chi2 result channels for the analysis and it seemed sensible to name them based on the analysis identifier, which then required a change in behaviour.

The new behaviour is more likely to cause an accidental collision, but it's not obvious to me how much of a practical concern this is (what real use-case would it affect?). So, I think it's fine, but I also haven't got my head fully around the details so may be missing something.

NB in many cases the result channels defined in the original fragment will just be the readout p and p_err so there isn't much value in mangling that into the name since they don't tell us much...

In any case, the tests now pass, but that may say more about the test coverage than the correctness of the design here.

hartytp (author) commented:

In hindsight, a better design here would probably have been to use an ExportedResult to export chi2 or p rather than the implementation here. That way we could revert all of the changes around analysis_identifier. It's also a bit more explicit and more consistent with the rest of the interface. It requires a little bit of special casing in DefaultAnalysis to handle, but is probably the better design overall. TODO!

nit pick: would FitResult be a more descriptive name than ExportedResult?


y_fit = self.evaluate(self._x)[1]
chi_2 = np.sum(np.power((self._y - y_fit) / self._y_err, 2))
p = stats.chi2.sf(chi_2, n)
hartytp (author) commented:

Todo: check the standard definition of p (is it cdf or sf? i.e. if all points lie on the curve is p 1 or 0).
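
(For what it's worth: with scipy's conventions, stats.chi2.sf is the survival function, i.e. 1 - cdf, so a chi-squared of zero, with all points on the curve, gives p = 1, and a very large chi-squared gives p close to 0.)

from scipy import stats

stats.chi2.sf(0.0, 10)    # 1.0: all points lie on the curve
stats.chi2.sf(100.0, 10)  # effectively 0: data very unlikely under the model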

@dnadlinger (Member) commented:

Just saw I missed this in the above (although I still don't really have the bandwidth for any work on this):

Because the current ndscan code is written in a way that only allows one analysis type at present (otherwise we hit the warning I posted above), […]

I'm not sure whether this is based on a misunderstanding of the architecture, or I am just misunderstanding what you mean here. The point of this is precisely to accommodate other online analysis types (such as ones based on custom code loaded at runtime, as proposed here), but of course, the applet somehow needs to know how to deal with them. Since the design of ndscan is such that the only interface between experiment and applets or other postprocessing code is through the well-defined schema and associated dataset contents (doing anything different would pretty much be going against the ARTIQ model), there needs to be some registry that maps string descriptions to the code handling this on the applet side. The if statement (+ fallthrough warning) you linked is precisely that registry.

For instance, there is no immediate need for an oitg.fitting compatibility layer; we can just keep the old named_fit online analyses in place and add all the new hotness (custom annotations/parameters/…) for an alternative type.

@@ -127,7 +170,7 @@ def required_axes(self) -> Set[ParamHandle]:

def describe_online_analyses(
self, context: AnnotationContext
-) -> Tuple[List[Dict[str, Any]], Dict[str, Dict[str, Any]]]:
+) -> Tuple[List[Dict[str, Any]], Dict[str, FitDescription]]:
Member commented:

This probably shouldn't be a FitDescription, as there might be other online analysis types.

hartytp (author) commented:

ack. I wasn't sure quite how to name this class. Maybe it would be better to have an AnalysisSchema (or AnalysisDescription, or bikeshed any other name) base class to go alongside Analysis? The idea being that the Analysis object lives experiment-side, while the AnalysisSchema is broadcast to the applets etc. Then we have a natural FitAnalysisSchema object here to go with the new FitAnalysis class (OnlineFit replacement).


self._recompute_in_progress = False
self.updated.emit()


-def _run_fit(fit_type, xs, ys, y_errs, constants, initial_values):
+def _run_fit(fit_obj):
Member commented:

Does this actually work? I kept the interface painfully scalar since I figured that the arguments would need to be transferred out of process to the executor somehow, but perhaps the fit object is pickled/unpickled just fine?

hartytp (author) commented:

yes! The only thing that doesn't work is the fit result properties that I dynamically create during __init__ (and, in any case, I'm not sure those are really a good design anyway).

...although I'm still not convinced that this is the right way of handling it. I was thinking of changing this to pass the FitDescription into the executor and then have the code here do exactly the same thing as the applet.

@@ -114,7 +115,7 @@ def d(key):
for i in range(len(timestamps)):
prev = 0.0 if i == 0 else timestamps[i - 1]
cur = timestamps[i]
-self.assertGreater(cur, prev)
+self.assertGreaterEqual(cur, prev)
Member commented:

Did you actually run into an issue here? I used strict inequality on purpose to reject the case where e.g. all timestamps are just set to the same value by accident, since the granularity of time.monotonic should be considerably better than microseconds.

hartytp (author) commented:

I did, but it's possible that it was unrelated and I didn't dig into it properly (e.g. could have been an artiq version conflict?). This is a bit sloppy anyway. I'll revert this change and open an issue if I hit it again.

def default(self, obj):
    if isinstance(obj, numpy.integer):
        return int(obj)
    if isinstance(obj, numpy.floating):
        return float(obj)
    if isinstance(obj, numpy.ndarray):
        return obj.tolist()
    if dataclasses.is_dataclass(obj):
Member commented:

I don't dislike this (although the dump_json docstring would need adapting), but I wonder whether we can come up with some sort of best practices for using dataclasses/… in PYON. (I used JSON here instead of PYON on purpose to make it easier to load the result files from e.g. Mathematica or Julia, but that could be adapted.)

hartytp (author) commented:

but I wonder whether we can come up with some sort of best practices for using dataclasses/… in PYON

You mean best practices within ndscan or best practices for general PYON? If we do go down the route of replacing the dictionary schema with something like dataclasses, I was wondering about having a Schema (or SerialisableDataClass or whatever) base class which helps to formalise some of this. That could then provide to_dict and from_dict (or describe and decode or whatever) methods to help keep things uniform.

Comment on lines +178 to +179
Assumes the values of `y` are independent and normally distributed.
Returns the tuple (Chi-Squared, p-value)
Member commented:

I haven't had a closer look at this at all, but it might be valuable to offer some guidance here to users regarding the fact that our data is pretty much never actually normally distributed when working with ion readout data.

hartytp (author) commented:

mmm...I'm not sure that I have much intelligent to say here (but that probably mainly speaks to my limited knowledge of statistics) beyond "if the errors are well-approximated by normal then the Chi2 p-value is probably a decent estimator for the fit significance."

A few other comments here:

  1. I was thinking of perhaps adding an additional get_significance method (returning a number between 0 and 1) to FitBase which would default to returning the Chi-Squared p-value (rough sketch after this list). That leaves the option open for people to do something cleverer if desired.
  2. I was also thinking of splitting FitBase.fit into two parts: a fit initialisation part and a part that does the fitting. The latter would default to least-squares fitting but allows people to do some other form of parameter estimation if desired.
  3. I was thinking of also letting ExportedResult take a result channel and a fit significance threshold. If the fit significance is above the threshold then the linked dataset is updated. This could also be done through a post-fit callback inside the Analysis class.
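
On point 1, the kind of default implementation I have in mind (sketch only; attribute names such as _free_parameters are assumptions rather than the current code):

import numpy as np
from scipy import stats

class FitBase:
    ...

    def get_significance(self) -> float:
        # Default: chi-squared p-value, assuming independent, normally
        # distributed y errors. Subclasses can override this with something
        # better suited to their statistics (e.g. binomial readout data).
        y_fit = self.evaluate(self._x)[1]
        chi_2 = np.sum(np.power((self._y - y_fit) / self._y_err, 2))
        n = len(self._x) - len(self._free_parameters)  # degrees of freedom
        return stats.chi2.sf(chi_2, n)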

from concurrent.futures import ProcessPoolExecutor
from pyqtgraph import SignalProxy
from qasync import QtCore
from typing import Any, Dict
from ...utils import FIT_OBJECTS
from ...experiment import FitDescription
Member commented:

As mentioned in the discussion thread, I'd be keen to keep the clean separation between experiment and the other packages. experiment is allowed to import ARTIQ, including the whole dependency on the coredevice stuff (pulling in compiler dependencies like llvmlite, etc.), whereas the rest isn't (well, except for artiq.{dashboard, gui} for the dashboard, and artiq.applets.simple for the applet).

I'm not opposed to documenting the structure of the metadata (schemata/…) more explicitly through dataclasses, but imho those should be in some sort of extra package. Currently, ndscan.utils is the dumping ground for shared things, but for the schema stuff, an extra package may be more appropriate.

hartytp (author) commented:

👍 I wonder if it's worth having a schemata package then?

@hartytp (author) commented Aug 15, 2022

Because the current ndscan code is written in a way that only allows one analysis type at present (otherwise we hit the warning I posted above), […]

I'm not sure whether this is based on a misunderstanding of the architecture, or I am just misunderstanding what you mean here. The point of this is precisely to accommodate other online analysis types (such as ones based on custom code loaded at runtime, as proposed here), but of course, the applet somehow needs to know how to deal with them. Since the design of ndscan is such that the only interface between experiment and applets or other postprocessing code is through the well-defined schema and associated dataset contents (doing anything different would pretty much be going against the ARTIQ model), there needs to be some registry that maps string descriptions to the code handling this on the applet side. The if statement (+ fallthrough warning) you linked is precisely that registry.

For instance, there is no immediate need for an oitg.fitting compatibility layer; we can just keep the old named_fit online analyses in place and add all the new hotness (custom annotations/parameters/…) for an alternative type.

Thanks. I think it's a little of me misunderstanding the architecture and a little of me not expressing myself clearly.

I agree that other analysis types can be supported through the current design in principle. But, it's not something that a user can easily do at present since it needs modification to ndscan (as you point out, the applet-side code needs to know what to do with them, so the options end up having to be baked into ndscan).

@hartytp (author) commented Aug 15, 2022

Thinking about what we do with this PR...

I'm not opposed to documenting the structure of the metadata (schemata/…) more explicitly through dataclasses, but imho those should be in some sort of extra package. Currently, ndscan.utils is the dumping ground for shared things, but for the schema stuff, an extra package may be more appropriate.

There is a pretty well-contained piece of work here on starting to implement this. It isn't required for the rest of this PR; it's just here because I found myself needing to write documentation about the schema to understand how things fit together, and a simple dataclass felt like a good way of doing that. Since there isn't an objection to doing this in principle, I'll open a discussion thread to discuss it further.

Since the design of ndscan is such that the only interface between experiment and applets or other postprocessing code is through the well-defined schema and associated dataset contents (doing anything different would pretty much be going against the ARTIQ model), there needs to be some registry that maps string descriptions to the code handling this on the applet side. The if statement (+ fallthrough warning) you linked is precisely that registry.

Other than that, this feels like the big thing we need to decide on: should the functionality added by this PR be implemented by adding another option to the registry (or by replacing the old one and adding a compatibility layer, as I've opted for here, if we feel that's a less crufty approach)? Or should we add a more flexible interface for things that want to add content to plots?

hartytp mentioned this pull request Aug 15, 2022
@hartytp (author) commented Aug 21, 2022

Some further thoughts on this...

Validation

I had a look at the NIST scanning framework as prior art.

  • The NIST framework uses a three-step validation process
  • There is both validation on the input data and validation (standard and "strong") on the fitted parameters
  • There is a mini language to describe validation tests

My initial thoughts:

  • There are two separate things we might want to check for here: did we find a good fit to the data; and, does the data we're getting look plausible given our knowledge of the system
  • Conceptually, I don't like the idea of validating either of these based on looking at the fitted parameter values. AFAICT it's generally better to only validate based on a statistical test for quality of fit (such as Chi-Squared). If one has expectations about the data it's probably best to encode those in parameter bounds for the fit. That way, the approach is "let's constrain the parameters as tightly as we can given our prior knowledge of the system, let the code find the best fit it can and perform a statistical test to decide if that's consistent with the data"

Reacting to fit results

  • In the NIST framework, the fit + validation results are passed into a after_fit callback which the user can use e.g. to update datasets
  • One of the things I'd like to see in an ndscan implementation is the ability for ndscan to automatically update datasets. Ideally, either by passing in a dataset name or by passing in a parameter handle and updating the linked dataset. The kind of interface I had in mind here is that the (to be renamed) OnlineAnalysis takes a fit_significance parameter (optional float between 0 and 1) and each (also to be renamed) ExportedResult takes an optional destination parameter (bikeshed names). If the fit_significance is not None and the fit significance (Chi-Squared or whatever) is greater than the specified value we update the dataset values with the fitted parameter value. If the fit significance is below the specified value we should do something sensible by default (probably log a warning rather than raise an exception). This behaviour should be customisable through some sensible means like passing a post_fit callback into the OnlineAnalysis (default is to loop over all the exported results and update them or log the warning).
  • I did consider callbacks per parameter (i.e. specified in the ExportedResult as well as / instead of in the OnlineAnalysis) but I didn't see a use case for that. Easy to add this kind of thing later so leave out of any initial implementation

Split fit from model

  • This is something I'm a bit on the fence about, but I'm beginning to wonder whether it makes sense to separate the fitting class from the fitted model. Inheriting the model from the fit class doesn't really add anything useful anyway.
  • By splitting them we make it easy for the user to implement something with whatever statistics they want.
  • The fit class then exposes (among other things) a fit method and a fit_significance property. We'll provide a least-squares implementation where fit_significance comes from Chi-Squared, but it's then easy to implement an MLE fit for binomial data or whatever one wants.
  • The online analysis then takes in both fit_class and fit_model_class parameters (generally the user only touches the latter); a rough sketch of the split follows below.
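
A rough sketch of the split described above; none of these names exist yet, this is just to show the shape of the idea:

class FitModel:
    # Describes the function being fitted: parameters, heuristics, defaults.
    def func(self, x, param_values):
        raise NotImplementedError

    def estimate_parameters(self, x, y):
        raise NotImplementedError


class LeastSquaresFitter:
    # Owns the statistics: how the model is fitted and how fit quality is
    # assessed. An MLE fitter for binomial data would be a sibling class.
    def __init__(self, model: FitModel):
        self.model = model

    def fit(self, x, y, y_err):
        ...  # e.g. scipy.optimize.curve_fit against model.func

    @property
    def fit_significance(self) -> float:
        ...  # e.g. chi-squared p-value for the least-squares case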

Anyway, as David points out, mostly this is stuff I can play around with by implementing a new analysis class so I'll do that on our fork. The one thing we need to agree here is how to handle things on the applet side.

@hartytp (author) commented Aug 21, 2022

Oh, one other comment here

  • I want to be able to write a reusable ExpFragment that collects some data and fits the result. I want to be able to use this fragment for multiple purposes. For example: writing a servo, manual debugging, embedding in some more complex experiment as part of a subscan
  • Each of these use cases may require modifications to how the analysis is configured. e.g. different parameter bounds, doing different things with the fitted data (the servo will want to update datasets but the other use cases probably just want to push the fitted parameter value to ResultChannels)
  • To do that with something like the current Fragment / OnlineAnalysis implementation, one would need to override get_default_analyses and modify the results, which feels quite unergonomic.
  • I haven't put too much thought into this yet, but I'm beginning to wonder if something like ExpFragment.setattr_analysis would be useful (rough sketch after this list). The class attributes would be objects which provide a simple interface designed to make it easy to modify fit properties. The default get_default_analyses implementation would then group these together.
  • This isn't something which requires any modification to ndscan. I'm commenting here to provide a bit more context about how I see the use cases evolving...
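
To give a flavour of what I mean, an entirely hypothetical sketch (setattr_analysis does not exist in ndscan today, and the fragment, channel and analysis names are made up):

import ndscan.experiment as nd


class RabiFlop(nd.ExpFragment):
    def build_fragment(self):
        ...
        # Hypothetical: declare the analysis as a named attribute so derived
        # classes can tweak it without re-implementing get_default_analyses().
        self.setattr_analysis(
            "flop_fit",
            nd.OnlineFit(fit_module="ndscan.fitting", fit_class="RabiFlop",
                         data={"x": self.t, "y": self.p}),
        )


class RabiFlopServo(RabiFlop):
    def build_fragment(self):
        super().build_fragment()
        # Modify the inherited analysis rather than redefining it wholesale.
        self.flop_fit.param_bounds["t_pi"] = (0.0, 100e-6)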

@hartytp (author) commented Aug 31, 2022

Thanks for the feedback everyone. Here is how I suggest we proceed:

Fitting library

On reflection it makes more sense for this to live outside of ndscan. I've tidied up the fitting code here and moved it into its own library. I'll release it publicly and post a link here shortly.

Interfaces

It sounds like there is consensus that a more concretely defined interface between the experiment-side and applet-side code would be useful. I have some thoughts about how this could look and will spin up a proof-of-concept PR. One thing I'd like is to build in explicit support for custom implementations of the interface classes (Analysis, Annotation, etc.). That way, the kind of functionality I wanted to implement here can be done entirely in user code without modifications to ndscan.

Fitting framework

The original aim of this PR was to extend the current ndscan online fitting functionality in various ways: use the cleaned-up fitting code; allow online fits with user-defined fit functions; automatically create result channels; validate fits and act on results (e.g. pushing results to linked datasets, callbacks for fit failure); support fits with multiple y channels (e.g. multi-ion data); etc.

Another thing on my radar is a potential extension of ExpFragment to make it easier for derived classes to modify analyses defined by their parents. For example, a base experiment may want to define an analysis and just push the results to result channels. A servo experiment which derives from this experiment may want to also push the fit result to the dataset associated with the scanned parameter. This requires the servo experiment class to be able to modify the behaviour of the base experiment's analysis class. I'd like a simple framework that allows this kind of thing to be done ergonomically and with minimal boilerplate.

Once the work on fitting code / interfaces is done, this becomes something that can be done entirely in user code. I think the best path forward might be for me to start by getting something up and running in our fragment library. Once we're basically happy with that we can have a discussion about where this falls between: not of general interest, should stay in our codebase; general interest but not suitable for merging into ndscan (e.g. could publish a small ndscan_extensions library for this kind of thing); general interest, suitable for merging into ndscan so we open a PR.

hartytp closed this Aug 31, 2022
hartytp mentioned this pull request Aug 31, 2022
@hartytp (author) commented Sep 3, 2022

@dnadlinger what would you think about a minimal change here to allow custom applet-side analyses? Something like:

diff --git a/ndscan/plots/model/__init__.py b/ndscan/plots/model/__init__.py
index e9d5e16..dcf28f0 100644
--- a/ndscan/plots/model/__init__.py
+++ b/ndscan/plots/model/__init__.py
@@ -25,6 +25,8 @@ situations.)

 import logging
 import numpy
+import importlib
+import inspect
 from qasync import QtCore
 from typing import Any, Callable, Dict, List, Optional
 from .online_analysis import OnlineNamedFitAnalysis
@@ -164,6 +166,24 @@ class SinglePointModel(Model):
         raise NotImplementedError


+def import_class(module_name: str, class_name: str):
+    """
+    Imports a named class from a module in the python path or raises an exception.
+    """
+    try:
+        module = importlib.import_module(module_name)
+    except ImportError:
+        raise ValueError(f'Cannot import module "{module_name}"')
+
+    module_classes = [
+        name for name, obj in inspect.getmembers(module) if inspect.isclass(obj)
+    ]
+    if class_name not in module_classes:
+        raise ValueError(f'Class "{class_name}" not in module "{module_name}"')
+
+    return getattr(module, class_name)
+
+
 class ScanModel(Model):
     points_rewritten = QtCore.pyqtSignal(dict)
     points_appended = QtCore.pyqtSignal(dict)
@@ -251,6 +271,11 @@ class ScanModel(Model):
             kind = schema["kind"]
             if kind == "named_fit":
                 self._online_analyses[name] = OnlineNamedFitAnalysis(schema, self)
+            elif kind == "custom":
+                module_name = schema["module_name"]
+                module_class = schema["module_class"]
+                cls = import_class(module_name, module_class)
+                self._online_analyses[name] = cls(schema, self)
             else:
                 logger.warning("Ignoring unsupported online analysis type: '%s'", kind)

Edit: also needs custom annotation item handling in

if a.kind == "computed_curve":
