Skip to content

SHRIMP: Sub ExtractGroup

sbodorkos edited this page Aug 21, 2019 · 11 revisions

SQUID 2.50 Sub: ExtractGroup

This documentation has been (temporarily) abandoned incomplete as of 2018-04-27, because it is not clear what effect (if any) these calculations can have on calibration constants. This subroutine and its sub-subroutines will be revisited at the time it becomes necessary to deal with Grouping of Sample analyses.

The following is a generalised subroutine, so the inputs and outputs have generalised names, and the identities of the specific inputs and outputs are defined when the function is called. The subroutine is used and reused by SQUID 2.50 in at least two different contexts during a 'typical' data-reduction:

  1. An iteration of outlier detection in calibration-constants (Std = StdCalc = TRUE)
  2. Extraction of a statistically coherent sub-population from a set of Grouped-sample unknowns (Std = StdCalc = FALSE)

In both contexts, it is possible to re-invoke ExtractGroup in 'Redo' (RedoOnly) mode, which essentially iterates the calculation taking into account any manual (user-defined) inclusion and exclusion of data-rows, in the wake of the original, automated calculation.

This subroutine extracts a statistically coherent age group (i.e. one with probability of fit >= Mprob), and (in Excel-speak) constructs its range addresses, as well as placing the results on the active sheet. (The SQUID 2.50 VBA code also constructs a weighted-average chart inset, but that aspect of its function is not documented here.)

This procedure was probably written long ago; once again, it makes extensive use of Ranges in contexts where newer code-modules, executing similar operations, have explicitly defined and utilised double-precision arrays. VBA code dealing with arrays tends to be much more intelligible than that dealing with spreadsheet-ranges, but the latter is what we've got.

I have documented this subroutine in detail, because it is also invoked when assessing populations of sample/unknown analyses. Its functionality is very useful to analysts trying to extract some sort of "rock age" from a high-n dataset complicated by lots of inherited zircon, Pb loss, etc.

Usage

ExtractGroup Std, Mprob, t, RedoOnly, StdCalc, AgeResult, TypeCol, DpNum

Mandatory variables

Std: A Boolean specifying whether the subroutine has been invoked on a StandardData sheet (TRUE) or not (FALSE). The former case encompasses invocation as part of WtdMeanAcalc; the latter as part of the grouped-sample "Group Me" process.

Mprob: Double-precision value (0 <= Mprob <= 1) defining the minimum (threshold) value for probability of equivalence required to satisfy the test of "statistical coherence" in the extracted group. In the 'first-pass' calculation in "Group Me", Mprob = 0.05 is a common setting. In other contexts (i.e. first-pass WtdMeanAcalc for calibration constants, and in RedoOnly mode), it is set to 0.

t: In Excel, the 2-column cell-range (in practice, a double-precision array) containing the input values (first column) and uncertainties (second column). No explicit assumptions are made about whether the uncertainties are (1sigma or 2sigma) or (percentage or absolute).

Optional variables

RedoOnly: Boolean input indicating whether this invocation of ExtractGroup is being run 'from scratch'/'first-pass' on the input cell-range (RedoOnly = FALSE), or is a 're-run' of an pre-existing weighted-mean calculation (RedoOnly = TRUE) aimed at investigating the effect of user-defined rejections and/or inclusions of data-rows since the previous iteration of ExtractGroup. DEFAULT value = FALSE.

StdCalc: Boolean input indicating whether ExtractGroup has been invoked on Reference Materials/StandardData calibration-constant value-uncertainty pairs (TRUE) or not (FALSE). In all invocations, its value is set (either explicitly or implicitly) equal to the mandatory Boolean 'Std' above. DEFAULT value = FALSE.

AgeResult: Double-precision value of the Age extracted by ExtractGroup (if applicable). Used only for 'first-pass' calculations on grouped-samples via Group Me.

TypeCol: Double-precision value of the Age extracted by ExtractGroup (if applicable). Used only for 'first-pass' calculations on grouped-samples via Group Me. Possibly unused in the final implementation of the routine?

DpNum: For StandardData calibration-constants only (Std = StdCalc = TRUE), the integer value indicating whether ExtractGroup is being invoked for the 'primary' daughter-parent ratio (DpNum = 1) or the alternate ratio (DpNum = 2). DEFAULT value = 1. Its values are based on our Perms, as follows:

  • Perm1 calibration constants: 206Pb/238U = DpNum = 1; 208Pb/232Th = NOT APPLICABLE
  • Perm2 calibration constants: 206Pb/238U = DpNum = 1; 208Pb/232Th = DpNum = 2
  • Perm3 calibration constants: 206Pb/238U = NOT APPLICABLE; 208Pb/232Th = 1
  • Perm4 calibration constants: 206Pb/238U = DpNum = 2; 208Pb/232Th = DpNum = 1

Definition of variables - NEEDS UPDATING FOR EXTRACTGROUP

Values of type Boolean
NoReject, NoUPbConstAutoReject, pbCanDriftCorr, pbU, pbTh, [Task]Switch.DirectAltPD

Values of type Integer
c,DauParNum, ErN, h, i, j, LargeErRegN, Npts, Nrej, NtotRej, NumDauPar, r, rw1, rwn

Values of type Double
ExtPerr, ExtPtSigma, Lambda, MedianEr, MSWD, Nmadd, Prob, pscLm2, pscLm8, StdThPbRatio, StdUPbRatio, WtdMean, WtdMeanErr

Values of type String
s, Ele, t1, t2, t4, t5

Vectors of type Integer
LargeErRej, Rejected

Vectors of type Double
Acol, Aecol, Arange, Aerange, ErrVals

Arrays of type Double
Adatrange

Arrays of type Variant/Mixed
wW


The function commences with some definitions, and also some cleanup, remembering that this subroutine can be executed in both 'first-pass' and 'redo' mode:

--Reinitialise AgeResult:
If IsMissing( AgeResult ) = FALSE --i.e. AgeResult exists 
  
  AgeResult = 0  

End If --IsMissing( AgeResult ) 

--Define Boolean DoAll:
If RedoOnly = TRUE

  DoAll = FALSE

Else

  DoAll = TRUE

End If --RedoOnly = TRUE

--Define string version of DpNum:
If Std = TRUE

  Dp = fsS(DpNum) --fsS converts double-precision number to string

Else

  Dp = "" --Excel-speak for "NULL string"

End If --Std = TRUE 

Recalling that the input data t is defined as a single contiguous range (on a spreadsheet), it is necessary to define where on the spreadsheet the range is located, as well as the locations of the important components of the range:

With t

  ar1 = t.Row --address of the FIRST row of the range
  ArN = t.Rows.Count --number of rows in the range
  ArL = ar1 + ArN - 1 --derived address of the LAST row of the range
  arc = t.Column --address of the LEFT (value/age) column of the range

End With

Ludwig also checks to see whether the WtdAv being calculated is a 207Pb/206Pb date or not. This matters because other types of date (i.e. 206Pb/238U and 208Pb/232Th) usually need the external error of the Reference material data added in quadrature at the end, whereas a 207Pb/206Pb date does not. Ludwig does this simply by examining the string comprising the column-header immediately above the first column of the two-column array (which has address [ar1 - 1, arc]); hopefully Squid3 will be a bit more robust!

If InStr( Cells[ar1 - 1, arc], fsS( "207Pb/206Pb" ) ) > 0 --Instr = "In-String"
--Looks in cell above leftmost column for the specified string. If TRUE then found

  Rad76age = TRUE

Else

  Rad76age = FALSE  

End If

Next, Ludwig defines vectors of length ArN for the input values and their uncertainties... and he chooses to use different vector-names depending on whether the values are Unknowns (X and Xerr), or Reference Materials (i.e. calibration-constants: rX and RxE). He also defines a fifth vector, vOK, which comprises Booleans as the result of validating each row of the input.

ReDim X[1 To ArN], Xerr[1 To ArN], rX[1 To ArN], RxE[1 To ArN], vOK[1 To ArN]  

Then he rechecks the addresses of the header-rows, the first data-row and the last data-row on each of the StandardData and SampleData sheets (because they do have different addresses in Excel), using "-Std" as a numeric index to discriminate between the two cases. (Std is a Boolean, but in VBA, TRUE = -1 and FALSE = 0; Ludwig exploits this logical-numerical relationship for array-addressing and even arithmetic in some places, even though this is not good practice!). Ludwig also finds and sets the cell-range 'Hours' corresponding to the input ExtractGroup range, by searching the header-row for the string "Hours", and selecting the vector directly beneath. Next is some addressing of the column-header row, the first row of data and the last row of data on any given worksheet, bearing in mind that the addresses of these rows can differ between the StandardData and SampleData sheets:

HdrRowGrp = flHeaderRow[-Std]  --recalling Std = TRUE has numeric value -1, FALSE = 0
FirstGrpDatRw = plaFirstDatRw[-Std]
LastGrpDatRw = plaLastDatRw[-Std]

FindStr "Hours", , piHoursCol, HdrRowGrp --"Find String"
--piHoursCol = index of column hosting string "Hours"

Set Hours = frSr(FirstGrpDatRw, piHoursCol, LastGrpDatRw)
--Defines Hours as the cell-range in beneath string "Hours", corresponding to full
--range of ExtractGroup input data-rows.

Now validate the input data, and store the validation results in vOK. As far as I can tell, this validation is only enacted on 'first-pass' when Extracting a Group for weighted mean date calculation on unknowns; presumably if it is a manual Redo of a first-pass calculation, or a calculation on Reference Materials calibration constants, then the input has been pre-validated (either by the subroutine, or by a different one).

Then populate the arrays rX and RxE (or X and Xerr), according to the Boolean inputs of ExtractGroup, as well as the presence of numeric data in the 'values' and 'errors' columns of range t:

k = 0
N = ArN

If (RedoOnly = TRUE) OR (StdCalc = TRUE)

  SetArrayVal TRUE, vOK --i.e. set all values of vector vOK to TRUE
  --Presumably these two Booleans indicate data has been pre-validated.
  
Else

  For i = 1 To ArN

    tB = TRUE
    vOK[i] = FALSE

    For j = 1 To 2

      tB = tB AND (IsNumeric(t[i, j] = TRUE) AND (IsEmpty(t[i, j]) = FALSE)
      
    Next j

    If tB = TRUE --i.e. value and error for input-row both numeric
  
      k = 1 + k
  
      If StdCalc = TRUE --dealing with Ref Material calibration-constants

        rX[k] = t[i, 1]
        RxE[k] = t[i, 2]

      Else --dealing with a weighted mean date for a Grouped-sample Unknown
      
        X[k] = t[i, 1]
        Xerr[k] = t[i, 2]
        
      End If --StdCalc = TRUE
  
      vOK[i] = TRUE

    End If --tB = TRUE

  Next i

  Nn = k
  BadCt = N - Nn
  OkCt = Nn

End If --(RedoOnly = TRUE) OR (StdCalc = TRUE)

If N < 2
  
  Exit Sub --guards against largely non-numeric input-data

End If --N < 2

Having isolated the numeric subset of the input data, how the code proceeds depends on what it is trying to do. If StdCalc = TRUE, then the business of identifying and rejecting outliers is largely covered elsewhere (i.e. in WtdMeanAcalc), and if RedoOnly = TRUE, then an attempt at extracting a group has already been made, and the code is merely assessing the effect of modifications manually applied by the user (i.e. rejections/unrejections as defined in SQUID 2.50 by the presence/absence of Strikethrough font) since the last ExtractGroup attempt.

If (RedoOnly = TRUE) OR (StdCalc = TRUE)

  Nn = 0
  j = 0
  k = 0
  ReDim rX[1 To ArN] --Values column, for Stds. Not sure this is necessary?
  --No equivalent ReDim on RxE!!

  For i = 1 To ArN
    Set Tt = t[i, 1] ----range Tt is the 'value', row-by-row, in Excel cell-range t.
    --This Set enables access to all the VBA-controllable properties of that cell.

    If (Len(Tt.Text) > 0) AND (IsNumeric(Tt) = TRUE) 
    --functionally equivalent to the tB-based condition in previous block.

      If Val(Tt) > 0 AND (Tt.Font.Strikethrough = FALSE) --to avoid divide-by-zero
      --when Tt = 0, and also checks that Tt has not previously been excluded from
      --arithmetical consideration, either automatically or user-manually
  
        Nn = 1 + Nn
        rX[Nn] = Tt
        RxE[Nn] = t[i, 2] * Tt / 100 --renders calibration-constant uncertainties
        --as ABSOLUTE, to unify with weighted mean dates on Unknowns, where the
        --uncertainties are always absolute.
        --NOTE that this code can happen when RedoOnly = TRUE, even when StdCalc =
        --FALSE, which seems very odd to me. In the previous loop, rX and RxE are
        --populated only when StdCalc = TRUE.
    
      End If --Val(Tt) > 0 AND (Tt.Font.Strikethrough = FALSE)
  
    End If --(Len(Tt.Text) > 0) AND (IsNumeric(Tt) = TRUE)

  Next i

  OkCt = Nn
  BadCt = ArN - OkCt

ElseIf (StdCalc = FALSE) --this ElseIf clause looks redundant; I think you can only get 
--to it if (RedoOnly = FALSE) AND (StdCalc = FALSE) in any case, so I think it should
--simply read "Else". Anyway, entering this ElseIf means we are dealing with a 
--first-pass calculation of a weighted mean date for a Grouped-sample unknown.
--FindCoherentGroup starts the heavy lifting:

  FindCoherentGroup N, Nn, BadCt, BadRwIndx, X, Xerr, rX, RxE, MeanVal, MeanErr, MSWD, 
    Prob, Mprob, pdMinFract, OkVal --subroutine documented separately
      
  If OkVal = FALSE --as dictated by FindCoherentGroup, then write string:
  --"No coherent age group" 2 rows beneath final data-row, in the first column:

    CFs LastGrpDatRw + 2, arc, "No coherent age group"
    Exit Sub

  End If --OkVal = FALSE

Else --I think this is superfluous/inaccessible; see above
  
  If Nn > 2

    OkVal = TRUE
  
  Else
    
    OkVal = FALSE
   
  End If --Nn > 2

End If --(RedoOnly = TRUE) OR (StdCalc = TRUE)

There follows a block of text aimed primarily at formatting (I think). Picking it up again at the point where a ±2sigma (absolute) error-column is inserted adjacent to (and to the right of) the ±1sigma (absolute) error column:

Code documentation temporarily abandoned at this point 2018-04-27

If StdCalc = TRUE
  
  Er2SigCol = 4 + t.Column 
  --4 columns to the right of the calib. const. 'values' column, because
  --the intervening columns are calib. const. %err, Age (Ma), and 
  --Age (Ma) ±1sigma, and Er2SigCol is the address for Age (Ma) ±2sigma.
  
Else --all other cases
  
  Er2SigCol = 2 + t.Column 
  --2 columns to the right of the 'values' column, because the only
  --intervening column is ±1sigma, and Er2SigCol is the address for ±2sigma.

End If

If RedoOnly = FALSE And StdCalc = FALSE And Cells[HdrRowGrp, Er2SigCol] <> ""
  
  For i = FirstGrpDatRw  To LastGrpDatRw
    Cells[i, Er2SigCol] = "=2*" & Cells[i, Er2SigCol - 1].Address(0, 0)
  Next i
  
End If

If Nn = 0
  MsgBox "Can't reduce this data."
  End
End If

OkCt = 0
BadCt0 = BadCt
BadCt = 0
Tct = t.Rows.Count --can't see how this differs from ArN

If BadCt0 > 0 
  ReDim TmpBadRwIndx[1 To BadCt0]
End If

NokGrps = 0
NbadGrps = 0
StartNewGrp = FALSE

For j = 1 To Tct
  
  OkCt0 = OkCt

  For i = 1 To UBound(rX) --i.e. length (rX)

    Set Tt = t[j, 1]
    
    If IsNumber(Tt.Text) = TRUE And Tt.Font.Strikethrough = FALSE

      If Tt <> 0 And Tt = rX[i] 
        
        Set Ra1 = Tt[1, 0]
        Set Ra4 = Tt[1, 1]
        Set Ra5 = Tt[1, 2]

        If StdCalc = TRUE
          Set Ra2 = Tt[1, 3]
          Set Ra3 = Tt[1, 5]
        Else
          Set Ra2 = Tt[1, 1]
          Set Ra3 = Tt[1, 3]
        End If

        Set Ra6 = Range(Ra4, Ra5)
        Set Ra7 = Range(Ra1, Ra4)
        Set ur1 = Union(Ra1, Ra2)

        With Ra6
          .Font.Strikethrough = False
        End With
        
        OkCt = 1 + OkCt
        OK[OkCt] = rX[i]
        OKrwIndx[OkCt] = j
    
        If StdCalc = TRUE
          OkVals[OkCt, 1] = Ra1
          OkVals[OkCt, 2] = Ra4
          OkErrVals[OkCt] = Ra5 --1sigma absolute errors
        End If
    
        OkAgeVals[OkCt, 1] = Ra1
        OkAgeVals[OkCt, 2] = Ra2
        OkAgeErrVals[OkCt] = Ra3 --2sigma absolute errors
    
        If OkCt = 1
          NokGrps = 1
          ReDim OkAgeAddr[1 To 1], OkAgeErAddr[1 To 1], OkBreak[1 To 1]
          Set OKspots = Ra7
          Set OkErrs = Ra5
          Set OkAge = ur1
          Set OkAgeErrs = Ra3
        ElseIf StartNewGrp = TRUE
          NokGrps = 1 + NokGrps
          ReDim Preserve OkAgeAddr[1 To NokGrps], OkAgeErAddr[1 To NokGrps],
            OkBreak[1 To NokGrps]
          StartNewGrp = FALSE
          Set OkAge = ur1
          Set OkAgeErrs = Ra3
        Else
          Set OkAge = Union(OkAge, ur1)
          Set OkAgeErrs = Union(OkAgeErrs, Ra3)
        End If
    
        Set OKspots = Union(OKspots, Ra7)
        Set OkErrs = Union(OkErrs, Ra5)
        OkAgeAddr[NokGrps] = OkAge.Address
        OkAgeErAddr[NokGrps] = OkAgeErrs.Address
        OkBreak[NokGrps] = j
        
        Exit For
      
      End If --Tt <> 0 And Tt = rX[i] 
  
    End If --IsNumber(Tt.Text) = TRUE And Tt.Font.Strikethrough = FALSE

  Next i

  If OkCt = OkCt0 
    BadCt = 1 + BadCt
    BadRwIndx[BadCt] = j
  End If

Next j

If BadCt > 0
  
  k = Max( BadCt, BadCt0 )

  ReDim Preserve BadRwIndx[1 To k], TmpBadRwIndx[1 To k]
  ReDim TmpBadRwIndx[1 To k]
  
  NbadGrps = 0
  StartNewGrp = FALSE
  
  For i = 1 To BadCt
    TmpBadRwIndx[i] = BadRwIndx[i]
  Next i
  
  k = 0

  For i = 1 To BadCt
    m = TmpBadRwIndx[i]
    
    With Range( t[m, 1], t[m, 2] )
      .Font.Strikethrough = True
      .Interior.Color = vbYellow
    End With
    
    Set Tt = t[m, 1]

    If IsNumber(Tt.Text) = TRUE

      If Val(Tt) <> 0 
        k = 1 + k
        
        Set Ra1 = Tt(1, 0)
        
        If StdCalc = TRUE
          Set Ra2 = Tt[1, 3]
          Set Ra3 = Tt[1, 5]
        Else
          Set Ra2 = Tt[1, 1]
          Set Ra3 = Tt[1, 3]
        End If

        Set Ra7 = Union(Ra1, Ra2)
    
        BadRwIndx[k] = m
        BadAgeVals[k, 1] = Ra1
        BadAgeVals[k, 2] = Ra2
        BadAgeErrVals[k] = Ra3
    
        If k = 1
          NbadGrps = 1
          ReDim BadAgeAddr[1 To 1], BadAgeErAddr[1 To 1], BadBreak[1 To 1]
          
          Set BadAge = Ra7
          Set BadAgeErrs = Ra3
    
        ElseIf StartNewGrp = TRUE
          NbadGrps = 1 + NbadGrps
          ReDim Preserve BadAgeAddr[1 To NbadGrps], BadAgeErAddr[1 To NbadGrps],
            BadBreak[1 To NbadGrps]
          
          StartNewGrp = TRUE
          Set BadAge = Ra7
          Set BadAgeErrs = Ra3
        
        Else
          Set BadAge = Union(BadAge, Ra7)
          Set BadAgeErrs = Union(BadAgeErrs, Ra3)
        
        End If
    
        BadAgeAddr[NbadGrps] = BadAge.Address
        BadAgeErAddr[NbadGrps] = BadAgeErrs.Address
        BadBreak[NbadGrps] = m
        
      End If --Val(Tt) <> 0
  
    End If --IsNumber(Tt.Text) = TRUE

  Next i

  BadCt = k
  
  If NbadGrps > 0
    
    If BadBreak[NbadGrps] = 0 
      BadBreak[NbadGrps] = m
    End If
    
  End If
  
  If BadCt > 0
    ReDim Preserve BadRwIndx[1 To k]
  End If

End If --BadCt > 0
Clone this wiki locally