\section{Introduction} % 1 page
\label{sec:introduction}
%A common problem in data analysis is to determine the parameters of
%a narrow signal which occurs on a wide, smoothly varying background.
%In high energy physics, this is often achieved using a maximum likelihood technique~\cite{ref:Fisher01011922}
%in which separate parametric models for the signal and background processes are constructed and
%``fit'' to the data.
%If, however, the shape of the background is not known {\it a priori}, then there will be
%some uncertainty in the signal parameters resulting from the uncertainty in
%the function used. This issue is exacerbated in the case of a small signal to
%background ratio.
A common problem in data analysis is that the underlying physics parameters of the model used to describe a dataset, or of components of that model,
are not known. In high energy physics, determination of signal parameters is
often achieved using a maximum likelihood technique~\cite{ref:Fisher01011922}
in which parametric models for the signal and background processes are constructed and
``fit'' to the data. However, in certain circumstances the exact parameterisation of the underlying models, not just the parameter
values, is not known {\it a priori}. Consequently, there is some uncertainty in the signal parameters which results from the uncertainty in the choice of function.
A common approach to assessing this systematic uncertainty is to fit several plausible functions and
determine the spread of the values of the parameters of interest obtained with these functions.
However, such methods tend to involve some degree of arbitrariness, and so
a new approach is presented in this paper.
This new method was developed as part of the analysis of data at the CMS experiment
following the discovery of the Higgs
boson~\cite{ref:introduction:atlasdis,ref:introduction:cmsdis}.
It was applied to the analysis of Higgs boson decays to two photons, which
result in a narrow signal on a large
background~\cite{ref:introduction:legacy}.
The method presented is less
arbitrary and treats the uncertainty associated with the
background parameterisation in a way
which is more comparable with the treatment of other
uncertainties associated with the measurement; the choice of background
function results in a systematic uncertainty
which is handled as a nuisance parameter~\cite{ref:intro:nusiances}.
There are two major new components to this approach, namely the method for
treating the choice of function as a nuisance parameter, and the method for comparing
functions with different numbers of parameters.
The concept is described in Section~\ref{sec:concept}.
The application of the method to functions with the same number of parameters
is described in Section~\ref{sec:functions} and to functions with different
numbers of parameters in Section~\ref{sec:correction}. Further discussion on
the method, namely its practical application to the real-world problem of
the Higgs boson measurements, is given in Section~\ref{sec:conclusions},
along with the conclusions.
Within this paper, twice the negative logarithm of the likelihood ratio
is denoted by \nll. The data are binned and the
likelihood model used for each bin is the ratio of the Poisson likelihood to the best
possible likelihood given the observed data, i.e.
\begin{equation}
%{\rm \nll}_i = \nu_i - n_i + n_i \ln\left(\frac{n_i}{\nu_i}\right),
\nll = 2%\cdot
\sum_{i} \left[ \nu_i - n_i + n_i \ln\left(\frac{n_i}{\nu_i}\right) \right],
\label{eqn:introduction:def2NLL}
\end{equation}
where for the $i^{\rm th}$ bin,
$n_{i}$ is the observed and $\nu_{i}$ is the expected number of events
given a particular background model.
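As a brief sketch of where Eq.~\ref{eqn:introduction:def2NLL} comes from, assuming
the bin contents are independent and Poisson distributed (the symbol $\mathcal{L}$
is introduced here for illustration only), the likelihood for expectations $\nu_i$ is
\[
\mathcal{L}(\nu) = \prod_{i} \frac{\nu_i^{n_i} e^{-\nu_i}}{n_i!},
\]
which is maximised bin by bin at $\nu_i = n_i$. Taking the ratio to this best possible
likelihood, the factorials cancel and
\[
-2\ln\frac{\mathcal{L}(\nu)}{\mathcal{L}(n)}
= -2\sum_{i}\left[ n_i\ln\nu_i - \nu_i - n_i\ln n_i + n_i \right]
= 2\sum_{i}\left[ \nu_i - n_i + n_i\ln\left(\frac{n_i}{\nu_i}\right) \right].
\]
For a bin with $n_i = 0$ the logarithmic term is taken to be zero, its limiting value.
As a numerical illustration, a single bin with $n_i = 10$ and $\nu_i = 8$
contributes $2\left[8 - 10 + 10\ln(1.25)\right] \approx 0.46$.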
For the purposes of fitting and generating datasets, the statistical package
``RooFit'' is used throughout this paper~\cite{ref:roofit}.
%FREQUENTIST UNLESS OTHERWISE STATED (ALSO SEE DISCUSSION)?