Skip to content

SHRIMP: Sub WtdLinCorr

sbodorkos edited this page Jun 21, 2016 · 4 revisions

SQUID 2.50 Sub: WtdLinCorr

[The following is a generalised function, used and reused by SQUID 2.50 on at least a couple of different occasions during a 'typical' data-reduction. As such, the inputs and outputs of the functions have generalised names. The identities of the specific inputs and outputs are defined when the function is called.]

The function calculates EITHER the error-weighted linear regression of Dodson-interpolated nuclide ratios (or Task equations) vs time, with the regression-error calculated at analysis midtime OR the error-weighted mean of Dodson-interpolated nuclide ratios (or Task equations) and its error, independent of time.

Note that in BOTH cases, the error-weighting relates solely to the y-errors (x-errors associated with timestamps, if applicable, are assumed zero throughout), and that in BOTH cases, the calculations use index-adjacent ratio error-correlations as described by Ludwig (2009; http://dx.doi.org/10.1016/j.chemgeo.2009.07.004).

Once the value, its error, and its MSWD and probability of fit have been determined, they are tested against a threshold for statistical coherence (probability of fit > 0.1). If this test is failed, each of the constituent data-points are deleted (sequentially, one at a time) and the effect of each rejection on the MSWD derived from the remaining points is assessed. If a deletion decreases the MSWD of the remaining points by a specified factor (which depends on the number of data-points in the input, and the type of calculation specified) AND statistical coherence is attained, then the (single) rejection is permitted. If the number of data-points in the input is relatively large (>8 for weighted average, >9 for linear regression), then a search for a second rejection is permitted.

Usage

WtdLinCorr(Avg1LinRegr2, N, Y, SigRho, MSWD, Probfit, WLrej, Intercept, InterceptSigma, Bad, Slope, SlopeSigma, CovSlopeInter, X)

Mandatory variables

Avg1LinRegr2: This is a 'integer Boolean' input specifying the type of calculation to be performed (note the integer value assigned is used in some of the following equations). As the name suggests, permitted values are 1 (for calculation of a time-independent error-weighted average and its error) or 2 (for calculation of an error-weighted linear regression vs time, with the regression-error calculated at analysis midtime).

N: Integer defining the number of Dodson-interpolated data-points submitted to the calculation, as an input.

Y: A vector array of length N, containing the y-values of the Dodson-interpolated data-points submitted to the calculation, as an input.

SigRho: A N x N array defining the sigma-rho matrix associated with the vector Y, as an input. The diagonal elements are the sigma values associated with each Y value, and the (symmetric) immediate off-diagonal elements are the correlation coefficients between time-adjacent Y values. All other elements are zero; more details below.

WLrej: Integer (= 0 at start of each 'ratio of interest' and each analysis) incremented each time a valid rejection is identified.

Intercept: The calculated 'mean' y-value calculated (EITHER error-weighted linear regression to analysis midtime (NOT the y-intercept at x = 0!) OR as error-weighted average independent of time), as an output.

InterceptSigma: The calculated 1-sigma absolute error of the Intercept, as an output.

Bad: Boolean for error-trapping (=FALSE at start).

MSWD: The calculated Mean Square of Weighted Deviates associated with the Intercept, as an output.

ProbFit: The calculated probability of fit associated with the MSWD and (N - Avg1LinRegr2) degrees of freedom, as an output.

Optional variables

[These apply ONLY when Avg1LinRegr2 = 2]
X: A vector array of length N, containing the x-values (e.g. time-stamps) of the data-points, as input to the calculation.

Slope: The calculated slope of the linear regression, as an output.

SlopeSigma: The calculated 1-sigma absolute error of the Slope, as an output.

CovSlopeInter: The calculated covariance of the Slope and the y-intercept (i.e. at x = 0) of the linear regression, as an output.


Definition of variables

Value of type Boolean
Bad, LinReg

Values of type Integer
i, j, MaxRej, MinIndex, Nw, Pass, WLrej

Values of type Double
f, MaxProb, MinProb, MinMSWD, MswdRat, MswdRatToler

Vector of length N (addressed 1 to N) comprising values of type Boolean)
Rej

Vectors of length N (addressed 1 to N) comprising values of type Double
x1, x2, y1, y2, SigmaY

Vectors of length N + 1 (addressed 0 to N) comprising values of type Double
InterW, InterSigmaW, MswdW, Prob, ProbW, SlopeW, SlopeSigmaW, CovSlopeInterW

Arrays of size N x N (addressed [1 to N, 1 to N]) comprising values of type Double
SigRho1, SigRho2


'Entry test'

In theory, WtdLinCorr should never be invoked unless N > 1. But in addition to this, Ludwig specified that linear regression to analysis midtime calculation is not permitted if N < 4, regardless of what is specified in the user-preferences (i.e. if the user has requested linear regression to analysis midtime, but N = 2 or 3, then the calculation is performed as error-weighted average instead). These scenarios are best handled by logic preceding the call of WtdLinCorr (and that is indeed how the code continues immediately beyond the current extent of my documented Step 3), but the WtdLinCorr routine itself contains a check, as a (theoretically obsolete) backup.

If (Avg1LinRegr2 = 1 and N < 2) Or (AvgLinRegr2 = 2 and N < 4) Then
  --[error-message: Not enough data-points for requested calculation]    
  Exit Sub  
End If  

WtdLinCorr

The first step is to define the LinReg Boolean, and to select the threshold value of MswdRatToler (which is the factor by which MSWD must improve following the deletion of a point, for that deletion to be considered a valid improvement to the data). For LinReg:

Bad = False  

If Avg1LinRegr = 2
  LinReg = True
Else
  LinReg = False  
End If  

(I know that code-blocks like this are significantly more long-winded than necessary, but I am keen to keep my documentation of the code as explicit and human-readable as possible, particularly with respect to Booleans. Ludwig made extensive use of abbreviated, implicit syntax for Booleans, and the result is that sections of the VBA code are very difficult to understand, particularly in combination with logic tests.)

The candidate values for MswdRatToler are hard-wired (see MswdRatList below), with the selected value dependent on N and (for N values at the lower end of the range) Avg1LinRegr2, as follows:

If N > 7  
  MswdRatToler = 0.3  
Else  
  MswdRatList = [0, 0.1, 0.15, 0.2, 0.2, 0.25]  
  MswdRatToler = MswdRatList[N - Avg1LinRegr2]
End If  

Next, establish the maximum number of rejections permitted from the set of Dodson-interpolated values, as a function of both the number of values and the type of calulcation desired. The VBA code verbatim is:

MaxRej = 1 + (N - Avg1LinRegr2) \ 8  

In Visual Basic, it seems that this final term is defined as a division that yields an integer result, with any remainder discarded (https://msdn.microsoft.com/en-us/library/0e16fywh.aspx), i.e.

MaxRej = 1 + rounddown( (N - Avg1LinRegr2) / 8 )  

Also define the minimum probability of fit (above which there is no need to test the effect of deleting each data-point in sequence), and associated parameters:

MinProb = 0.1  
WLrej = 0  
Pass = 0  
MinIndex = -1  

Fill the y1, y2, SigmaY, SigRho1, SigRho2, and (if required) x1 and x2 matrices from the input Y, SigRho, and X:

For i = 1 To N  
  y1[i] = Y[i]  
  y2[i] = Y[i]  
  SigmaY[i] = SigRho[i, i]  --diagonal elements of SigRho  

  If LinReg = True  
    x1[i] = X[i]  
    x2[i] = X[i]  
  End If  

  For j = 1 To N  
    SigRho1[i, j] = SigRho[i, j]  
    SigRho2[i, j] = SigRho[i, j]  
  Next j  
Next i  

This is followed by a curious little manipulation in which all elements of SigmaY less than the median are replaced with the median, for the purpose of defining the diagonal elements of SigRho1 and SigRho2:

f = max( median(SigmaY), 1E-10 )  

For i = 1 To N  
  SigRho1[i, i] = max( SigRho1[i, i], f)  
  SigRho2(i, i) = SigRho1[i, i]  
Next i  

This is followed by quite a long "do... loop", in which the linear regressions (or weighted means) are calculated firstly with all points included, tested against minimum criteria for statistical coherence (MinProb > 0.1), and if that test is failed, the calculation is repeated with each single data-point rejected sequentially. If the deletion of a single data-point improves the statistical coherence of the calculation significantly, one rejection is permitted (and a second is possible if N >= 8 or 9, depending on calculation type):

Do  
  For i = 0 To N  
    If i = 0  
      Nw = N  --first pass calculation includes all points  
    Else  
      DeletePoint LinReg, N, y1, y2, SigRho1, SigRho2, i, x1, x2  
      --Subroutine 'DeletePoint' documented separately  
      Nw = N - 1  
    End If  

    If Nw = 1 And LinReg = False 
      --The VBA code verbatim does not include "LinReg =": I have inferred it,  
      --but it makes sense, and I see no other obvious Boolean candidates.  
      
      ProbW[i] = 1  
      MswdW[i] = 0  
      InterSigmaW[i] = 1  
      InterW[i] = 1  
    ElseIf LinReg = True --perform the linear regression calculation  
      WeightedLinearCorr x2, y2, Nw, SigRho2, SlopeW[i], InterW[i], MswdW[i],...   
        ProbW[i], SigmaSlopeW[i], SigmaInterW[i], CovSlopeInterW[i], Bad  
      --Subroutine 'WeightedLinearCorr' documented separately  
    Else --perform the error-weighted average calculation  
      WtdAvCorr y2, SigRho2, Nw, ...  
        InterW[i], SigmaInterW[i], MswdW[i], ProbW[i], True, Bad  
      --Subroutine 'WtdAvCorr' documented separately  
    End If  

    If i = 0  
      If ProbW[0] > 0.1 --no-rejection calculation is statistically coherent  
        MinIndex = 0  
        MinMSWD = MswdW[0]  
        Exit For --no need to trial sequential rejections 
      End If  
      
      MaxProb = Prob[0]  
    End If  
  Next i  

  If MinIndex = 0 --no-rejection calculation is statistically coherent  
    Exit Do  
  End If  

  MinIndex = 0 --no-rejection calculation 'best' until proven otherwise  
  MinMSWD = MswdW[0]  

  For i = 1 To N  
    MswdRat = MswdW[i] / max( 1E-32, MswdW[0] )
    --Three criteria must be met for a rejection to be permitted:  

    If MswdRat < MswdRatToler And MswdW[i] < MinMSWD And ProbW[i] > MinProb  
      Rej(i) = True  
      WLrej = 1 + WLrej  
      MinIndex = i  
      MaxProb = ProbW[i]  
      MinMSWD = MswdW[i]  
    End If  
  Next i  

  Pass = 1 + Pass  

  If Pass > 0 And (MinIndex = 0 Or Pass = MaxRej Or MaxProb > 0.1)  
    --MinIndex = 0 means EITHER the no-rejection calculation is coherent OR  
    --no single rejection yields enough improvement to be justified.  
    --Pass = MaxRej means the rejection 'ceiling' has been reached.  
    --MaxProb > 0.1 means statistical coherence has been achieved, EITHER  
    --via no-rejection calculation OR within permitted number of rejections.

    Exit Do  
  End If  
  
  --If a rejection IS permitted BUT rejection 'ceiling' not yet reached  
  --AND statistical coherence not yet achieved, try another rejection:  

  DeletePoint LinReg, N, y1, y2, SigRho1, SigRho2, MinIndex, x1, x2  
  N = N - 1  
    
  --Redefine as vectors of length N (addressed 1 to N): x1, y1 
  --Redefine as vectors of length N + 1 (addressed 0 to N): Prob, Xbar  
  --Redefine as an array of size N x N (addressed [1 to N, 1 to N] ): SigRho1  

  For i = 1 To N
    y1[i] = y2[i]  

    If LinReg = True  
      x1[i] = x2[i]  
    End If  
  
    For j = 1 To N - 1
      SigRho1[i, j] = SigRho2[i, j]
    Next j  
  Next i  

Loop  

Assign values obtained from EITHER the WeightedLinearCorr OR the WtdAvCorr subroutine, as follows:

Intercept = InterW(MinIndex)  
SigmaIntercept = SigmaInterW(MinIndex)  
MSWD = MswdW(MinIndex)  
Probfit = ProbW(MinIndex)    

If LinReg = True And MinIndex > 0 --need to check WHY only when MinIndex > 0
  Slope = SlopeW(MinIndex)  
  SigmaSlope = SigmaSlopeW(MinIndex)  
  CovSlopeInter = CovSlopeInterW(MinIndex)  
End If  

If Probfit < 0.05  
  SigmaIntercept = SigmaIntercept * sqrt(MSWD)  

  If LinReg = True  
    SigmaSlope = SigmaSlope * sqrt(MSWD)  
  End If  

End If

End Sub

Clone this wiki locally