Skip to content

SHRIMP: Sub ExtractGroup

sbodorkos edited this page Sep 6, 2019 · 11 revisions

SQUID 2.50 Sub: ExtractGroup

The following is a generalised subroutine, so the inputs and outputs have generalised names, and the identities of the specific inputs and outputs are defined when the function is called. The subroutine is used and reused by SQUID 2.50 in at least two different contexts during a 'typical' data-reduction:

  1. An iteration of outlier detection in calibration-constants (Std = StdCalc = TRUE)
  2. Extraction of a statistically coherent sub-population from a set of Grouped-sample unknowns (Std = StdCalc = FALSE)

In both contexts, it is possible to re-invoke ExtractGroup in 'Redo' (RedoOnly) mode, which essentially iterates the calculation taking into account any manual (user-defined) inclusion and exclusion of data-rows, in the wake of the original, automated calculation.

This subroutine extracts a statistically coherent age group (i.e. one with probability of fit >= Mprob), and (in Excel-speak) constructs its range addresses, as well as placing the results on the active sheet. (The SQUID 2.50 VBA code also constructs a weighted-average chart inset, but that aspect of its function is not documented here.)

This procedure was probably written long ago; once again, it makes extensive use of Ranges in contexts where newer code-modules, executing similar operations, have explicitly defined and utilised double-precision arrays. VBA code dealing with arrays tends to be much more intelligible than that dealing with spreadsheet-ranges, but the latter is what we've got.

I have documented this subroutine in detail, because it is also invoked when assessing populations of sample/unknown analyses. Its functionality is very useful to analysts trying to extract some sort of "rock age" from a high-n dataset complicated by lots of inherited zircon, Pb loss, etc.

Usage

ExtractGroup(Std, Mprob, t, RedoOnly, StdCalc, AgeResult, TypeCol, DpNum)

Mandatory variables

Std: A Boolean specifying whether the subroutine has been invoked on a StandardData sheet (TRUE) or not (FALSE). The former case encompasses invocation as part of WtdMeanAcalc; the latter as part of the grouped-sample "Group Me" process.

Mprob: Double-precision value (0 <= Mprob <= 1) defining the minimum (threshold) value for probability of equivalence required to satisfy the test of "statistical coherence" in the extracted group. In the 'first-pass' calculation in "Group Me", Mprob = 0.05 is a common setting. In other contexts (i.e. first-pass WtdMeanAcalc for calibration constants, and in RedoOnly mode), it is set to 0.

t: In Excel, the 2-column cell-range (in practice, a double-precision array) containing the input values (first column) and uncertainties (second column). No explicit assumptions are made about whether the uncertainties are (1sigma or 2sigma) or (percentage or absolute).

Optional variables

RedoOnly: Boolean input indicating whether this invocation of ExtractGroup is being run 'from scratch'/'first-pass' on the input cell-range (RedoOnly = FALSE), or is a 're-run' of an pre-existing weighted-mean calculation (RedoOnly = TRUE) aimed at investigating the effect of user-defined rejections and/or inclusions of data-rows since the previous iteration of ExtractGroup. DEFAULT value = FALSE.

StdCalc: Boolean input indicating whether ExtractGroup has been invoked on Reference Materials/StandardData calibration-constant value-uncertainty pairs (TRUE) or not (FALSE). In all invocations, its value is set (either explicitly or implicitly) equal to the mandatory Boolean 'Std' above. DEFAULT value = FALSE.

AgeResult: Double-precision value of the Age extracted by ExtractGroup (if applicable). Used only for 'first-pass' calculations on grouped-samples via Group Me.

TypeCol: Double-precision value of the Age extracted by ExtractGroup (if applicable). Used only for 'first-pass' calculations on grouped-samples via Group Me. Possibly unused in the final implementation of the routine?

DpNum: For StandardData calibration-constants only (Std = StdCalc = TRUE), the integer value indicating whether ExtractGroup is being invoked for the 'primary' daughter-parent ratio (DpNum = 1) or the alternate ratio (DpNum = 2). DEFAULT value = 1. Its values are based on our Perms, as follows:

  • Perm1 calibration constants: 206Pb/238U = DpNum = 1; 208Pb/232Th = NOT APPLICABLE
  • Perm2 calibration constants: 206Pb/238U = DpNum = 1; 208Pb/232Th = DpNum = 2
  • Perm3 calibration constants: 206Pb/238U = NOT APPLICABLE; 208Pb/232Th = 1
  • Perm4 calibration constants: 206Pb/238U = DpNum = 2; 208Pb/232Th = DpNum = 1

Definition of variables - NEEDS UPDATING FOR EXTRACTGROUP

Values of type Boolean
OkVal, Rad76age, tB

Values of type Integer
ar1, arc, ArL, ArN, BadCt, cc, FirstGrpDatRw, HdrRowGrp, i, Indx, j, k, LastGrpDatRw, m, N, Nn, Nok, Nrej, OkCt, Rw

Values of type Double
ExtPtSigma, Mean, MeanErr, MSWD, pdMinFract, Prob, WtdAvg, WtdAvgErr

Values of type String
Dp, p, RejSpots, tmp, tW

Arrays of type Integer
BadRwIndx

Arrays of type Double
rX, RxE, X, Xerr

Arrays of type Variant/Mixed
vOK, wW

Ranges
ColOne, Col2, InpR, OutpR, Tt, Tt2


The function commences with some definitions, and also some cleanup, remembering that this subroutine can be executed in both 'first-pass' and 'redo' mode:

--Reinitialise AgeResult:
If IsMissing( AgeResult ) = FALSE --i.e. AgeResult exists 
  
  AgeResult = 0  

End If --IsMissing( AgeResult ) = FALSE

--Define string version of DpNum:
If Std = TRUE

  Dp = fsS( DpNum ) --fsS converts double-precision number to string

Else

  Dp = "" --Excel-speak for "NULL string"

End If --Std = TRUE 

Recalling that the input data t is defined as a single contiguous range (on a spreadsheet), it is necessary to define where on the spreadsheet the range is located, as well as the locations of the important components of the range:

With t

  ar1 = t.Row --address of the FIRST row of the range
  ArN = t.Rows.Count --number of rows in the range
  ArL = ar1 + ArN - 1 --derived address of the LAST row of the range
  arc = t.Column --address of the LEFT (value/age) column of the range

End With

Ludwig also checks to see whether the WtdAv being calculated is a 207Pb/206Pb date or not. This matters because other types of date (i.e. 206Pb/238U and 208Pb/232Th) usually need the external error of the Reference material data added in quadrature at the end, whereas a 207Pb/206Pb date does not. Ludwig does this simply by examining the string comprising the column-header immediately above the first column of the two-column array (which has address [ar1 - 1, arc]); hopefully Squid3 will be a bit more robust!

If InStr( Cells[ar1 - 1, arc], fsS( "207Pb/206Pb" ) ) > 0 --Instr = "In-String"
--Looks in cell above leftmost column for the specified string. If TRUE then found

  Rad76age = TRUE

Else

  Rad76age = FALSE  

End If

Next, Ludwig defines vectors of length ArN for the input values and their uncertainties... and he chooses to use different vector-names depending on whether the values are Unknowns (X and Xerr), or Reference Materials (i.e. calibration-constants: rX and RxE).

ReDim X[1 To ArN], Xerr[1 To ArN], rX[1 To ArN], RxE[1 To ArN]  

Then he rechecks the addresses of the header-rows, the first data-row and the last data-row on each of the StandardData and SampleData sheets (because they do have different addresses in Excel), using "-Std" as a numeric index to discriminate between the two cases. (Std is a Boolean, but in VBA, TRUE = -1 and FALSE = 0; Ludwig exploits this logical-numerical relationship for array-addressing and even arithmetic in some places, even though this is not good practice!). Ludwig also finds and sets the cell-range 'Hours' corresponding to the input ExtractGroup range, by searching the header-row for the string "Hours", and selecting the vector directly beneath.

Next is some addressing of the column-header row, the first row of data and the last row of data on any given worksheet, bearing in mind that the addresses of these rows can differ between the StandardData and SampleData sheets:

HdrRowGrp = flHeaderRow[-Std]  --recalling Std = TRUE has numeric value -1, FALSE = 0
FirstGrpDatRw = plaFirstDatRw[-Std]
LastGrpDatRw = plaLastDatRw[-Std]

FindStr "Hours", , piHoursCol, HdrRowGrp --"Find String"
--piHoursCol = index of column hosting string "Hours"

Set Hours = frSr(FirstGrpDatRw, piHoursCol, LastGrpDatRw)
--Defines Hours as the cell-range in beneath string "Hours", corresponding to full
--range of ExtractGroup input data-rows.

Now validate the input data. As far as I can tell, this validation is only enacted on 'first-pass' when Extracting a Group for weighted mean date calculation on unknowns; presumably if it is a manual Redo of a first-pass calculation, or a calculation on Reference Materials calibration constants, then the input has been pre-validated (either by the subroutine, or by a different one).

Then populate the arrays rX and RxE (or X and Xerr), according to the Boolean inputs of ExtractGroup, as well as the presence of numeric data in the 'values' and 'errors' columns of range t:

k = 0
N = ArN

If (RedoOnly = TRUE) OR (StdCalc = TRUE)

  --DO NOTHING: DATA PRE-VALIDATED
  
Else --i.e. ONLY IF (RedoOnly = FALSE) AND (StdCalc = FALSE)

  For i = 1 To ArN

    tB = TRUE

    For j = 1 To 2

      tB = tB AND (IsNumeric(t[i, j] = TRUE) AND (IsEmpty(t[i, j]) = FALSE)
      
    Next j

    If tB = TRUE --i.e. value and error for input-row both numeric
  
      k = 1 + k
  
      If StdCalc = TRUE --NOT POSSIBLE, ALWAYS GOES TO "ELSE"

        rX[k] = t[i, 1]
        RxE[k] = t[i, 2]

      Else --must be a 'first-pass' weighted mean date for a Grouped-sample Unknown
      --This IS useful: X and Xerr are populated, for 'first-pass' on unknowns
       
        X[k] = t[i, 1]
        Xerr[k] = t[i, 2]
        
      End If --StdCalc = TRUE
  
    End If --tB = TRUE

  Next i

  Nn = k
  BadCt = N - Nn
  OkCt = Nn

End If --(RedoOnly = TRUE) OR (StdCalc = TRUE)

If (N < 2)
  
  Exit Sub --guards against largely non-numeric input-data

End If --(N < 2)

Having isolated the numeric subset of the input data, how the code proceeds depends on what it is trying to do:

  • If StdCalc = TRUE, then the business of identifying and rejecting outliers is covered elsewhere (i.e. in WtdMeanAcalc).
  • If RedoOnly = TRUE, then an attempt at extracting a group has already been made, and the code is merely assessing the effect of modifications manually applied by the user (i.e. rejections/unrejections as defined in SQUID 2.50 by the presence/absence of Strikethrough font) since the last ExtractGroup attempt.

Strangely enough, the wiki won't allow a code-block immediately after a bulleted list, so this is a dummy line of text before the code resumes:

If (RedoOnly = TRUE) OR (StdCalc = TRUE)

  Nn = 0
  j = 0
  k = 0
  ReDim rX[1 To ArN], RxE[1 To ArN] --note that SQUID 2.50 omits ReDim of RxE

  For i = 1 To ArN
    Set Tt = t[i, 1] --range Tt is the 'value', row-by-row, in Excel cell-range t.
    --This Set enables access to all the VBA-controllable properties of that cell.
    Set Tt2 = t[i, 2] --added during this documentation; tidies SQUID 2.50. 
     
    If (Len(Tt.Text) > 0) AND (IsNumeric(Tt) = TRUE) 
    --functionally equivalent to the tB-based condition in previous block.

      If (Val(Tt) > 0) AND (Tt.Font.Strikethrough = FALSE) --to avoid divide-by-zero
      --when Tt = 0, and also checks that Tt has not previously been excluded from
      --arithmetical consideration, either automatically or user-manually.
      --INTERESTING THAT NON-POSITIVE VALUES ARE AUTO-FILTERED HERE
  
        Nn = 1 + Nn
        rX[Nn] = Tt
        RxE[Nn] = Tt2 --modified from SQUID 2.50, which does not use RxE
    
      End If --(Val(Tt) > 0) AND (Tt.Font.Strikethrough = FALSE)
  
    End If --(Len(Tt.Text) > 0) AND (IsNumeric(Tt) = TRUE)

  Next i

  OkCt = Nn
  BadCt = ArN - OkCt

ElseIf (StdCalc = FALSE) --this ElseIf clause looks redundant; I think you can only get 
--to it if (RedoOnly = FALSE) AND (StdCalc = FALSE) in any case, so I think it should
--simply read "Else". Anyway, entering this ElseIf means we are dealing with a 
--first-pass calculation of a weighted mean date for a Grouped-sample unknown.
--FindCoherentGroup starts the heavy lifting:
 
  FindCoherentGroup N, Nn, BadCt, BadRwIndx, X, Xerr, rX, RxE, MeanVal, MeanErr, MSWD, 
    Prob, Mprob, pdMinFract, OkVal --subroutine documented separately
  --But in broad terms, X and Xrerr are the inputs to FindCoherentGroup, and rX and RxE
  --are the outputs. This means if OkVal = TRUE, the subroutine can proceed to the
  --calculation of a weighted mean using rX and RxE as input, irrespective of settings
  --for StdCalc and RedoOnly
      
  If OkVal = FALSE --as dictated by FindCoherentGroup, then write string:
  --"No coherent age group" 2 rows beneath final data-row, in the first column:

    CFs LastGrpDatRw + 2, arc, "No coherent age group"
    Exit Sub

  End If --OkVal = FALSE

Else --I think this is superfluous/inaccessible; see above
  
  If Nn > 2

    OkVal = TRUE
  
  Else
    
    OkVal = FALSE
   
  End If --Nn > 2

End If --(RedoOnly = TRUE) OR (StdCalc = TRUE)

Now establish a pointer for the result-block. The idea is to set Rw as the index-number of the row which contains the first row of output of the weighted mean calculation. If the result-block has already been established (i.e. RedoOnly = TRUE), Rw just searches for a diagnostic text-string, based on the StdCalc setting. Otherwise (RedoOnly = FALSE), Rw points to the row three rows below the last input data-row:

If (RedoOnly = TRUE) AND (StdCalc = TRUE)
  
  Rw = Cells.Find("Wtd Mean of", Cells(ArL, arc), xlFormulas, xlPart, xlByColumns).Row

ElseIf (RedoOnly = TRUE) AND (StdCalc = FALSE)
  
  Rw = Cells.Find("Mean age of", Cells(ArL, arc), xlFormulas, xlPart, xlByColumns).Row

Else --i.e. (RedoOnly = FALSE)
  
  Rw = ArL + 3

End If

There follows an enormous amount of code that appears unnecessary to Squid3. The stuff relating to formatting was easy enough to eliminate, but there was a huge amount of cell-range definition that is seemingly no longer used. I can't tell simply by readingso if you encounter some piece of code, variable -reference, etc. beyond this point that doesn't make any sense, let me know.

The fundamental operation of the code appears to resume with first-order references to the input cell-range t:

For i = 1 To t.Rows.Count --functionally equivalent to "1 To ArN"

  Set ColOne = t[i, 1] --Looks like Tt!
  Set Col2 = t[i, 2] --Looks like Tt2!
  
  --First of a series of tests: are they both numeric?
  If (IsNumeric(ColOne) = TRUE) AND (IsNumeric(Col2) = TRUE)

    tB = TRUE
  
  Else
  
    tB = FALSE
  
  End If --(IsNumeric(ColOne) = TRUE) AND (IsNumeric(Col2) = TRUE)

  --Second test: if numeric, is the Value non-zero, is the Error positive, 
  --and do both fields comprise "non-blank" numeric values?
  If tB = TRUE
  
    If (ColOne <> 0) AND (Col2 > 0) AND (t[i, 1] <> "") AND (t[i, 2] <> "")

      tB = TRUE
    
    Else
    
      tB = FALSE
    
    End If --(ColOne <> 0) AND (Col2 > 0) AND (t[i, 1] <> "") AND (t[i, 2] <> "")
  
  Else
  
    tB = FALSE
  
  End If --tB = TRUE
  
  --Third and final test: does either numeric value match the SQUID_ERROR_VALUE?
  If tB = TRUE
  
    If (t[i, 1] <> SQUID_ERROR_VALUE) AND (t[i, 2] <> SQUID_ERROR_VALUE)

      tB = TRUE
    
    Else
    
      tB = FALSE
    
    End If --(t[i, 1] <> SQUID_ERROR_VALUE) AND (t[i, 2] <> SQUID_ERROR_VALUE)
  
  Else
  
    tB = FALSE
  
  End If --tB = TRUE

  --Having done these checks, we count the good rows and wipe the bad ones!
  If tB = TRUE
  
    Nok = 1 + Nok
  
  Else

    t[i, 1] = ""
    t[i, 2] = ""
    
  End If --tB = TRUE

Next i

Now a bit of error-trapping for input arrays of Reference Material data containing insufficient (valid) rows:

If (Std = TRUE) AND (Nok < 2) --deliver a message, based on index isotope selected

  If piStdCorrType = 2 --user-selected 208-correction in Perm1

    tW = "Fewer than 2 Standard spots with useable data -- Must quit. (Try 
            selecting 204- or 207-correction.)"
  
  Else 
  
    tW = "Fewer than 2 Standard spots with useable data -- Must quit."
  
  End If --piStdCorrType = 2

  MsgBox tW, , pscSq --delivers SQUID-branded message-box with text tW, and "OK" button
  
  End --SQUID-processing stops
  
End If --(Std = TRUE) AND (Nok < 2)

If we get this far, tW is used instead for the text defining the Excel cell-range for the input:

tW = Range( t[1, 1], t[ArN, 2] ).Address( FALSE, FALSE ) --simply defines 2-column 
     --array without any absolute references (e.g. "A1:B10", not "$A$1:$B$10")
 
Set InpR = Range( tW ) --input-range
Set OutpR = frSr( Rw, arc, Rw + 6 ) 
--Output-range has 6 rows. First row = Rw as defined above, first column = arc
--aligns first column of output-range with Values column of input-range

Finally, it's time to do the actual work! Once again, the procedure depends on the Boolean StdCalc and RedoOnly, and note that the combination (StdCalc = TRUE) AND (RedoOnly = FALSE), corresponding to first-pass Reference Material calculations as already implemented in Squid3, is explicitly NOT covered by the following If loop.

(The combination (StdCalc = TRUE) AND (RedoOnly = TRUE) case, also implemented in Squid3, is included below, but most of the provisions I can see relate to the listing of index-numbers for rejected rows, rather than anything arithmetical.)

If (StdCalc = TRUE) AND (RedoOnly = TRUE) --then the sequenced WtdAv arguments are:
  --PercentErrorsOut = TRUE
  --PercentErrorsIn = TRUE
  --SigmaLevelIn = 1
  --CanReject = FALSE --remember that this relates to SQUID-managed auto-rejection!
  --ConstantExternalErr = TRUE
  --SigmaLevelOut = 1

  wW = Isoplot3.WtdAv(InpR, TRUE, TRUE, 1, FALSE, TRUE, 1)
  
  --Now write various elements of array output to variable-names, remembering that
  --Isoplot's WtdAv writes the LABELS in the second column of the output array!
  If ( wW[3, 2] = "MSWD" ) --should always be FALSE, as ConstantExternalErr = TRUE?

    k = -1
  
  Else
  
    k = 0
   
  End If --( wW[3, 2] = "MSWD" )
  
  WtdAvg = wW[1, 1]
  WtdAvgErr = wW[2, 1]
  
  If k = 0
  
    ExtPtSigma = wW[3, 1]
  
  Else
  
    ExtPtSigma = 0
  
  End If --k = 0
  
  MSWD = wW[4 + k, 1]
  Prob = wW[6 + k, 1]
  
  RejSpots = "" --this is a string
  Nrej = 0
  
  --Now count and list row-index numbers of rows rejected via Strikethrough:
  For i = 1 To t.Count

    If ( t.Cells[i, 1].Font.Strikethrough = TRUE )
    
      Nrej = 1 + Nrej
     
      If Nrej = 1
      
        RejSpots = fsS[i] --writes STRING i as index-number for FIRST rejection
      
      Else
      
        RejSpots = RejSpots & "," & fsS[i] --builds comma-separated STRING of 
        --index-numbers of rejected rows

      End If --Nrej = 1
      
    End If --( t.Cells[i, 1].Font.Strikethrough = TRUE )
    
  Next i

  If Nrej = 0 
  
    RejSpots = "none"

  End If --Nrej = 0

ElseIf (StdCalc = FALSE) --then we are dealing with a weighted mean date for an
  --Unknown. Irrespective of RedoOnly setting, recall that: 
  --(1) Unknowns have ABSOLUTE errors as both inputs and outputs, 
  --(2) ConstantExternalErr is not relevant, and 
  --(3) Output error of weighted mean is most properly specified as 2sigma. 
  --So the sequenced arguments of WtdAv are:
  --PercentErrorsOut = FALSE
  --PercentErrorsIn = FALSE
  --SigmaLevelIn = 1
  --CanReject = FALSE --remember that this relates to SQUID-managed auto-rejection!
  --ConstantExternalErr = FALSE
  --SigmaLevelOut = 2

  wW = Isoplot3.WtdAv(InpR, FALSE, FALSE, 1, FALSE, FALSE, 2)

  WtdAvg = wW[1, 1]
  WtdAvgErr = wW[2, 1]
  MSWD = wW[3, 1]
  Prob = wW[5, 1]

  OutpR[1] = WtdAvg
  OutpR[2] = WtdAvgErr
  OutpR[3] = MSWD
  OutpR[4] = Prob

End If --(StdCalc = TRUE) AND (RedoOnly = TRUE)

More big piles of unnecessary-looking code; I have tried to extract snippets that look relevant in the context of things already documented above:

If (StdCalc = FALSE) AND (IsMissing(AgeResult) = FALSE)

  AgeResult = OutpR[1]

End If --StdCalc = FALSE) AND (IsMissing(AgeResult) = FALSE)

If (StdCalc = TRUE)
--Some irrelevant stuff

  If (RedoOnly = TRUE)

    OutpR[1] = WtdAvg
    OutpR[2] = WtdAvgErr
    OutpR[3] = ExtPtSigma
    OutpR[4] = MSWD
    OutpR[5] = Prob
    OutpR[6] = RejSpots
  
  End If --(RedoOnly = TRUE)

Else --(i.e. StdCalc = FALSE)
--More irrelevant stuff

  If (Rad76age = FALSE) --then we are dealing with a "daughter/parent" Date Type
    --(i.e. 206Pb/238U or 208Pb/232Th), and these date-types need their WtdAvgErr
    --values augmented by the WtdMeanAPerr1 value determined on the Ref Material
    --(which broadly corresponds to WtdAvgErr in the StdCalc = TRUE case). This 
    --augmentation is managed via quadratic addition of Unknown-based WtdAvgErr 
    --and Reference Material-based WtdMeanAPerr1, bearing in mind that the former 
    --is ~2sigma/95% conf. ABSOLUTE and the latter is 1sigma PERCENTAGE:

    OutpR[5].Formula =  " = SQRT( " & Cells[Rw + 1, arc].Address & 
      " ^2 + ( 2 * StandardData!WtdMeanAPerr1 / 100 * " &
      Cells[Rw, arc].Address & " )^2 ) "
      
    --Cells[Rw + 1, arc] points to **UNKNOWN** OutpR[2] = WtdAvgErr = wW[2, 1]
    --StandardData!WtdMeanAPerr1 is **REF MAT** PERCENTAGE OutpR[2] = WtdAvgErr
    --which needs to be doubled to make it 2sigma, and then multiplied by... 
    --Cells[Rw, arc], which points to **UNKNOWN** OutpR[1] = WtdAvg = wW[1, 1]
    --before dividing by 100. This process yields two 2sigma ABSOLUTE errors,
    --which can sensibly be added in quadrature.
  
  End If --(Rad76age = FALSE)

To the best of my knowledge, this is the end of the relevant code in this subroutine, and we can step back up to the main line of GroupThis (Procedural Framework Part 7c).

Clone this wiki locally