-
Notifications
You must be signed in to change notification settings - Fork 16
SHRIMP: Sub ExtractGroup
This documentation has been (temporarily) abandoned incomplete as of 2018-04-27, because it is not clear what effect (if any) these calculations can have on calibration constants. This subroutine and its sub-subroutines will be revisited at the time it becomes necessary to deal with Grouping of Sample analyses.
The following is a generalised subroutine, so the inputs and outputs have generalised names, and the identities of the specific inputs and outputs are defined when the function is called. The subroutine is used and reused by SQUID 2.50 in at least two different contexts during a 'typical' data-reduction:
- An iteration of outlier detection in calibration-constants (Std = StdCalc = TRUE)
- Extraction of a statistically coherent sub-population from a set of Grouped-sample unknowns (Std = StdCalc = FALSE)
In both contexts, it is possible to re-invoke ExtractGroup in 'Redo' (RedoOnly) mode, which essentially iterates the calculation taking into account any manual (user-defined) inclusion and exclusion of data-rows, in the wake of the original, automated calculation.
This subroutine extracts a statistically coherent age group (i.e. one with probability of fit >= Mprob), and (in Excel-speak) constructs its range addresses, as well as placing the results on the active sheet. (The SQUID 2.50 VBA code also constructs a weighted-average chart inset, but that aspect of its function is not documented here.)
This procedure was probably written long ago; once again, it makes extensive use of Ranges in contexts where newer code-modules, executing similar operations, have explicitly defined and utilised double-precision arrays. VBA code dealing with arrays tends to be much more intelligible than that dealing with spreadsheet-ranges, but the latter is what we've got.
I have documented this subroutine in detail, because it is also invoked when assessing populations of sample/unknown analyses. Its functionality is very useful to analysts trying to extract some sort of "rock age" from a high-n dataset complicated by lots of inherited zircon, Pb loss, etc.
ExtractGroup Std, Mprob, t, RedoOnly, StdCalc, AgeResult, TypeCol, DpNum
Std: A Boolean specifying whether the subroutine has been invoked on a StandardData sheet (TRUE) or not (FALSE). The former case encompasses invocation as part of WtdMeanAcalc; the latter as part of the grouped-sample "Group Me" process.
Mprob: Double-precision value (0 <= Mprob <= 1) defining the minimum (threshold) value for probability of equivalence required to satisfy the test of "statistical coherence" in the extracted group. In the 'first-pass' calculation in "Group Me", Mprob = 0.05 is a common setting. In other contexts (i.e. first-pass WtdMeanAcalc for calibration constants, and in RedoOnly mode), it is set to 0.
t: In Excel, the 2-column cell-range (in practice, a double-precision array) containing the input values (first column) and uncertainties (second column). No explicit assumptions are made about whether the uncertainties are (1sigma or 2sigma) or (percentage or absolute).
RedoOnly: Boolean input indicating whether this invocation of ExtractGroup is being run 'from scratch'/'first-pass' on the input cell-range (RedoOnly = FALSE), or is a 're-run' of an pre-existing weighted-mean calculation (RedoOnly = TRUE) aimed at investigating the effect of user-defined rejections and/or inclusions of data-rows since the previous iteration of ExtractGroup. DEFAULT value = FALSE.
StdCalc: Boolean input indicating whether ExtractGroup has been invoked on Reference Materials/StandardData calibration-constant value-uncertainty pairs (TRUE) or not (FALSE). In all invocations, its value is set (either explicitly or implicitly) equal to the mandatory Boolean 'Std' above. DEFAULT value = FALSE.
AgeResult: Double-precision value of the Age extracted by ExtractGroup (if applicable). Used only for 'first-pass' calculations on grouped-samples via Group Me.
TypeCol: Double-precision value of the Age extracted by ExtractGroup (if applicable). Used only for 'first-pass' calculations on grouped-samples via Group Me. Possibly unused in the final implementation of the routine?
DpNum: For StandardData calibration-constants only (Std = StdCalc = TRUE), the integer value indicating whether ExtractGroup is being invoked for the 'primary' daughter-parent ratio (DpNum = 1) or the alternate ratio (DpNum = 2). DEFAULT value = 1. Its values are based on our Perms, as follows:
- Perm1 calibration constants: 206Pb/238U = DpNum = 1; 208Pb/232Th = NOT APPLICABLE
- Perm2 calibration constants: 206Pb/238U = DpNum = 1; 208Pb/232Th = DpNum = 2
- Perm3 calibration constants: 206Pb/238U = NOT APPLICABLE; 208Pb/232Th = 1
- Perm4 calibration constants: 206Pb/238U = DpNum = 2; 208Pb/232Th = DpNum = 1
Values of type Boolean
NoReject, NoUPbConstAutoReject, pbCanDriftCorr, pbU, pbTh, [Task]Switch.DirectAltPD
Values of type Integer
c,DauParNum, ErN, h, i, j, LargeErRegN, Npts, Nrej, NtotRej, NumDauPar, r, rw1, rwn
Values of type Double
ExtPerr, ExtPtSigma, Lambda, MedianEr, MSWD, Nmadd, Prob, pscLm2, pscLm8, StdThPbRatio, StdUPbRatio, WtdMean, WtdMeanErr
Values of type String
s, Ele, t1, t2, t4, t5
Vectors of type Integer
LargeErRej, Rejected
Vectors of type Double
Acol, Aecol, Arange, Aerange, ErrVals
Arrays of type Double
Adatrange
Arrays of type Variant/Mixed
wW
The function commences with some definitions, and also some cleanup, remembering that this subroutine can be executed in both 'first-pass' and 'redo' mode:
--Reinitialise AgeResult:
If IsMissing( AgeResult ) = FALSE --i.e. AgeResult exists
AgeResult = 0
End If --IsMissing( AgeResult )
--Define Boolean DoAll:
If RedoOnly = TRUE
DoAll = FALSE
Else
DoAll = TRUE
End If --RedoOnly = TRUE
--Define string version of DpNum:
If Std = TRUE
Dp = fsS(DpNum) --fsS converts double-precision number to string
Else
Dp = "" --Excel-speak for "NULL string"
End If --Std = TRUE
Recalling that the input data t is defined as a single contiguous range (on a spreadsheet), it is necessary to define where on the spreadsheet the range is located, as well as the locations of the important components of the range:
With t
ar1 = t.Row --address of the FIRST row of the range
ArN = t.Rows.Count --number of rows in the range
ArL = ar1 + ArN - 1 --derived address of the LAST row of the range
arc = t.Column --address of the LEFT (value/age) column of the range
End With
Ludwig also checks to see whether the WtdAv being calculated is a 207Pb/206Pb date or not. This matters because other types of date (i.e. 206Pb/238U and 208Pb/232Th) usually need the external error of the Reference material data added in quadrature at the end, whereas a 207Pb/206Pb date does not. Ludwig does this simply by examining the string comprising the column-header immediately above the first column of the two-column array (which has address [ar1 - 1, arc]); hopefully Squid3 will be a bit more robust!
If InStr( Cells[ar1 - 1, arc], fsS( "207Pb/206Pb" ) ) > 0 --Instr = "In-String"
--Looks in cell above leftmost column for the specified string. If TRUE then found
Rad76age = TRUE
Else
Rad76age = FALSE
End If
Next, Ludwig defines vectors of length ArN for the input values and their uncertainties... and he chooses to use different vector-names depending on whether the values are Unknowns (X and Xerr), or Reference Materials (i.e. calibration-constants: rX and RxE). He also defines a fifth vector, vOK, which comprises Booleans as the result of validating each row of the input.
ReDim X[1 To ArN], Xerr[1 To ArN], rX[1 To ArN], RxE[1 To ArN], vOK[1 To ArN]
Then he rechecks the addresses of the header-rows, the first data-row and the last data-row on each of the StandardData and SampleData sheets (because they do have different addresses in Excel), using "-Std" as a numeric index to discriminate between the two cases. (Std is a Boolean, but in VBA, TRUE = -1 and FALSE = 0; Ludwig exploits this logical-numerical relationship for array-addressing and even arithmetic in some places, even though this is not good practice!). Ludwig also finds and sets the cell-range 'Hours' corresponding to the input ExtractGroup range, by searching the header-row for the string "Hours", and selecting the vector directly beneath. Next is some addressing of the column-header row, the first row of data and the last row of data on any given worksheet, bearing in mind that the addresses of these rows can differ between the StandardData and SampleData sheets:
HdrRowGrp = flHeaderRow[-Std] --recalling Std = TRUE has numeric value -1, FALSE = 0
FirstGrpDatRw = plaFirstDatRw[-Std]
LastGrpDatRw = plaLastDatRw[-Std]
FindStr "Hours", , piHoursCol, HdrRowGrp --"Find String"
--piHoursCol = index of column hosting string "Hours"
Set Hours = frSr(FirstGrpDatRw, piHoursCol, LastGrpDatRw)
--Defines Hours as the cell-range in beneath string "Hours", corresponding to full
--range of ExtractGroup input data-rows.
Now validate the input data, and store the validation results in vOK. As far as I can tell, this validation is only enacted on 'first-pass' when Extracting a Group for weighted mean date calculation on unknowns; presumably if it is a manual Redo of a first-pass calculation, or a calculation on Reference Materials calibration constants, then the input has been pre-validated (either by the subroutine, or by a different one).
Then populate the arrays rX and RxE (or X and Xerr), according to the Boolean inputs of ExtractGroup, as well as the presence of numeric data in the 'values' and 'errors' columns of range t:
k = 0
N = ArN
If (RedoOnly = TRUE) OR (StdCalc = TRUE)
SetArrayVal TRUE, vOK --i.e. set all values of vector vOK to TRUE
--Presumably these two Booleans indicate data has been pre-validated.
Else
For i = 1 To ArN
tB = TRUE
vOK[i] = FALSE
For j = 1 To 2
tB = tB AND (IsNumeric(t[i, j] = TRUE) AND (IsEmpty(t[i, j]) = FALSE)
Next j
If tB = TRUE --i.e. value and error for input-row both numeric
k = 1 + k
If StdCalc = TRUE --dealing with Ref Material calibration-constants
rX[k] = t[i, 1]
RxE[k] = t[i, 2]
Else --dealing with a weighted mean date for a Grouped-sample Unknown
X[k] = t[i, 1]
Xerr[k] = t[i, 2]
End If --StdCalc = TRUE
vOK[i] = TRUE
End If --tB = TRUE
Next i
Nn = k
BadCt = N - Nn
OkCt = Nn
End If --(RedoOnly = TRUE) OR (StdCalc = TRUE)
If N < 2
Exit Sub --guards against largely non-numeric input-data
End If --N < 2
Having isolated the numeric subset of the input data, how the code proceeds depends on what it is trying to do. If StdCalc = TRUE, then the business of identifying and rejecting outliers is largely covered elsewhere (i.e. in WtdMeanAcalc), and if RedoOnly = TRUE, then an attempt at extracting a group has already been made, and the code is merely assessing the effect of modifications manually applied by the user (i.e. rejections/unrejections as defined in SQUID 2.50 by the presence/absence of Strikethrough font) since the last ExtractGroup attempt.
If (RedoOnly = TRUE) OR (StdCalc = TRUE)
Nn = 0
j = 0
k = 0
ReDim rX[1 To ArN] --Values column, for Stds. Not sure this is necessary?
--No equivalent ReDim on RxE!!
For i = 1 To ArN
Set Tt = t[i, 1] ----range Tt is the 'value', row-by-row, in Excel cell-range t.
--This Set enables access to all the VBA-controllable properties of that cell.
If (Len(Tt.Text) > 0) AND (IsNumeric(Tt) = TRUE)
--functionally equivalent to the tB-based condition in previous block.
If Val(Tt) > 0 AND (Tt.Font.Strikethrough = FALSE) --to avoid divide-by-zero
--when Tt = 0, and also checks that Tt has not previously been excluded from
--arithmetical consideration, either automatically or user-manually
Nn = 1 + Nn
rX[Nn] = Tt
RxE[Nn] = t[i, 2] * Tt / 100 --renders calibration-constant uncertainties
--as ABSOLUTE, to unify with weighted mean dates on Unknowns, where the
--uncertainties are always absolute.
--NOTE that this code can happen when RedoOnly = TRUE, even when StdCalc =
--FALSE, which seems very odd to me. In the previous loop, rX and RxE are
--populated only when StdCalc = TRUE.
End If --Val(Tt) > 0 AND (Tt.Font.Strikethrough = FALSE)
End If --(Len(Tt.Text) > 0) AND (IsNumeric(Tt) = TRUE)
Next i
OkCt = Nn
BadCt = ArN - OkCt
ElseIf (StdCalc = FALSE) --this ElseIf clause looks redundant; I think you can only get
--to it if (RedoOnly = FALSE) AND (StdCalc = FALSE) in any case, so I think it should
--simply read "Else". Anyway, entering this ElseIf means we are dealing with a
--first-pass calculation of a weighted mean date for a Grouped-sample unknown.
--FindCoherentGroup starts the heavy lifting:
FindCoherentGroup N, Nn, BadCt, BadRwIndx, X, Xerr, rX, RxE, MeanVal, MeanErr, MSWD,
Prob, Mprob, pdMinFract, OkVal --subroutine documented separately
If OkVal = FALSE --as dictated by FindCoherentGroup, then write string:
--"No coherent age group" 2 rows beneath final data-row, in the first column:
CFs LastGrpDatRw + 2, arc, "No coherent age group"
Exit Sub
End If --OkVal = FALSE
Else --I think this is superfluous/inaccessible; see above
If Nn > 2
OkVal = TRUE
Else
OkVal = FALSE
End If --Nn > 2
End If --(RedoOnly = TRUE) OR (StdCalc = TRUE)
There follows a block of text aimed primarily at formatting (I think). Picking it up again at the point where a ±2sigma (absolute) error-column is inserted adjacent to (and to the right of) the ±1sigma (absolute) error column:
Code documentation temporarily abandoned at this point 2018-04-27
If StdCalc = TRUE
Er2SigCol = 4 + t.Column
--4 columns to the right of the calib. const. 'values' column, because
--the intervening columns are calib. const. %err, Age (Ma), and
--Age (Ma) ±1sigma, and Er2SigCol is the address for Age (Ma) ±2sigma.
Else --all other cases
Er2SigCol = 2 + t.Column
--2 columns to the right of the 'values' column, because the only
--intervening column is ±1sigma, and Er2SigCol is the address for ±2sigma.
End If
If RedoOnly = FALSE And StdCalc = FALSE And Cells[HdrRowGrp, Er2SigCol] <> ""
For i = FirstGrpDatRw To LastGrpDatRw
Cells[i, Er2SigCol] = "=2*" & Cells[i, Er2SigCol - 1].Address(0, 0)
Next i
End If
If Nn = 0
MsgBox "Can't reduce this data."
End
End If
OkCt = 0
BadCt0 = BadCt
BadCt = 0
Tct = t.Rows.Count --can't see how this differs from ArN
If BadCt0 > 0
ReDim TmpBadRwIndx[1 To BadCt0]
End If
NokGrps = 0
NbadGrps = 0
StartNewGrp = FALSE
For j = 1 To Tct
OkCt0 = OkCt
For i = 1 To UBound(rX) --i.e. length (rX)
Set Tt = t[j, 1]
If IsNumber(Tt.Text) = TRUE And Tt.Font.Strikethrough = FALSE
If Tt <> 0 And Tt = rX[i]
Set Ra1 = Tt[1, 0]
Set Ra4 = Tt[1, 1]
Set Ra5 = Tt[1, 2]
If StdCalc = TRUE
Set Ra2 = Tt[1, 3]
Set Ra3 = Tt[1, 5]
Else
Set Ra2 = Tt[1, 1]
Set Ra3 = Tt[1, 3]
End If
Set Ra6 = Range(Ra4, Ra5)
Set Ra7 = Range(Ra1, Ra4)
Set ur1 = Union(Ra1, Ra2)
With Ra6
.Font.Strikethrough = False
End With
OkCt = 1 + OkCt
OK[OkCt] = rX[i]
OKrwIndx[OkCt] = j
If StdCalc = TRUE
OkVals[OkCt, 1] = Ra1
OkVals[OkCt, 2] = Ra4
OkErrVals[OkCt] = Ra5 --1sigma absolute errors
End If
OkAgeVals[OkCt, 1] = Ra1
OkAgeVals[OkCt, 2] = Ra2
OkAgeErrVals[OkCt] = Ra3 --2sigma absolute errors
If OkCt = 1
NokGrps = 1
ReDim OkAgeAddr[1 To 1], OkAgeErAddr[1 To 1], OkBreak[1 To 1]
Set OKspots = Ra7
Set OkErrs = Ra5
Set OkAge = ur1
Set OkAgeErrs = Ra3
ElseIf StartNewGrp = TRUE
NokGrps = 1 + NokGrps
ReDim Preserve OkAgeAddr[1 To NokGrps], OkAgeErAddr[1 To NokGrps],
OkBreak[1 To NokGrps]
StartNewGrp = FALSE
Set OkAge = ur1
Set OkAgeErrs = Ra3
Else
Set OkAge = Union(OkAge, ur1)
Set OkAgeErrs = Union(OkAgeErrs, Ra3)
End If
Set OKspots = Union(OKspots, Ra7)
Set OkErrs = Union(OkErrs, Ra5)
OkAgeAddr[NokGrps] = OkAge.Address
OkAgeErAddr[NokGrps] = OkAgeErrs.Address
OkBreak[NokGrps] = j
Exit For
End If --Tt <> 0 And Tt = rX[i]
End If --IsNumber(Tt.Text) = TRUE And Tt.Font.Strikethrough = FALSE
Next i
If OkCt = OkCt0
BadCt = 1 + BadCt
BadRwIndx[BadCt] = j
End If
Next j
If BadCt > 0
k = Max( BadCt, BadCt0 )
ReDim Preserve BadRwIndx[1 To k], TmpBadRwIndx[1 To k]
ReDim TmpBadRwIndx[1 To k]
NbadGrps = 0
StartNewGrp = FALSE
For i = 1 To BadCt
TmpBadRwIndx[i] = BadRwIndx[i]
Next i
k = 0
For i = 1 To BadCt
m = TmpBadRwIndx[i]
With Range( t[m, 1], t[m, 2] )
.Font.Strikethrough = True
.Interior.Color = vbYellow
End With
Set Tt = t[m, 1]
If IsNumber(Tt.Text) = TRUE
If Val(Tt) <> 0
k = 1 + k
Set Ra1 = Tt(1, 0)
If StdCalc = TRUE
Set Ra2 = Tt[1, 3]
Set Ra3 = Tt[1, 5]
Else
Set Ra2 = Tt[1, 1]
Set Ra3 = Tt[1, 3]
End If
Set Ra7 = Union(Ra1, Ra2)
BadRwIndx[k] = m
BadAgeVals[k, 1] = Ra1
BadAgeVals[k, 2] = Ra2
BadAgeErrVals[k] = Ra3
If k = 1
NbadGrps = 1
ReDim BadAgeAddr[1 To 1], BadAgeErAddr[1 To 1], BadBreak[1 To 1]
Set BadAge = Ra7
Set BadAgeErrs = Ra3
ElseIf StartNewGrp = TRUE
NbadGrps = 1 + NbadGrps
ReDim Preserve BadAgeAddr[1 To NbadGrps], BadAgeErAddr[1 To NbadGrps],
BadBreak[1 To NbadGrps]
StartNewGrp = TRUE
Set BadAge = Ra7
Set BadAgeErrs = Ra3
Else
Set BadAge = Union(BadAge, Ra7)
Set BadAgeErrs = Union(BadAgeErrs, Ra3)
End If
BadAgeAddr[NbadGrps] = BadAge.Address
BadAgeErAddr[NbadGrps] = BadAgeErrs.Address
BadBreak[NbadGrps] = m
End If --Val(Tt) <> 0
End If --IsNumber(Tt.Text) = TRUE
Next i
BadCt = k
If NbadGrps > 0
If BadBreak[NbadGrps] = 0
BadBreak[NbadGrps] = m
End If
End If
If BadCt > 0
ReDim Preserve BadRwIndx[1 To k]
End If
End If --BadCt > 0