Authors: Jivesh Ramduny & Clare Kelly
Recent fMRI research has posited that the functional connectivity profile of an individual may act as a "fingerprint", i.e. the functional connectivity profile exhibits distinct characteristics at the individual level. Our aim is to leverage this knowledge to investigate whether the identifiability of connectivity fingerprints may serve as a test bed for assessing the reproducibility of functional connectivity analysis applications. We investigate an array of factors that may optimise functional connectivity measures, with the long-term goal of discovering robust and reliable biomarkers of psychiatric disorders. We employed the functional connectome fingerprinting approach originally proposed by Finn et al. (2015).
Structural and functional data spanning childhood to adulthood were obtained from the openly available Consortium for Replicability and Reproducibility (CoRR). Five independent datasets were used: New York University (NYUadu), New York University (NYUado), University of Pittsburgh School of Medicine (UPSM), Beijing Normal University (BNU) and Southwest University (SWU). For each dataset, two resting-state scans were obtained from two imaging sessions, either on the same day or after several months or years. The demographic details and fMRI acquisition parameters are provided below.
Dataset | NYUadu | NYUado | UPSM | BNU | SWU |
---|---|---|---|---|---|
Sample Size | 31 | 25 | 67 | 60 | 82 |
Age Range (years) | 18-43 | 7-13 | 14-19 | 19-23 | 18-25 |
Gender (M:Male F:Female) | M:16 F:15 | M:15 F:10 | M:34 F:33 | M:32 F:28 | M:33 F:49 |
Scan Duration (min:s) | 06:00 | 06:00 | 05:06 | 08:06 | 08:00 |
Time Between Retest Scans | Same day | Same day | 1-4 years | 3 months | 1 year |
Scanner Manufacturer | Siemens | Siemens | Siemens | Siemens | Siemens |
Scanner Model | Magnetom Allegra | Magnetom Allegra | TrioTim | TrioTim | TrioTim |
Field Strength | 3.0T | 3.0T | 3.0T | 3.0T | 3.0T |
TR (ms) | 2000 | 2000 | 1500 | 2000 | 2000 |
We used the functionally defined Shen 268 parcellation to derive the FC profiles of each individual in each session, as described by Finn et al. (2015). For each individual, the mean timeseries of each ROI was extracted across the whole brain and the Pearson's correlation coefficient was calculated between all possible ROI pairs to construct a symmetric 268 x 268 FC matrix; the correlation values represent the connectivity strength (i.e. edges) between two ROIs (i.e. nodes). This procedure was repeated for each of the two imaging sessions, such that an individual had two FC matrices which reflect his/her connectivity profiles during each session. We eliminated some edges in the FC matrices for each individual due to the lack of coverage across the whole brain. We also considered only the upper triangular part of the FC matrices to remove duplicate edges in the subsequent analyses. Of the 35,778 unique edges in the functional connectome, 31,878 edges remained in the NYUadu dataset, 15,051 edges in the NYUado dataset, 30,381 edges in the UPSM dataset, 27,028 edges in the BNU dataset and 28,680 edges in the SWU dataset.
Identification was performed by creating a "database" which stored the FC matrices of all individuals from session 1. Iteratively, the FC matrix of a given individual from session 2 was selected and treated as the "target matrix". The target matrix was then compared with each of the FC matrices in the database to find the database matrix with which it was maximally correlated. An individual is correctly identified if, among all database matrices, the target matrix has the highest Pearson's correlation coefficient with that individual's own database matrix. The predicted identity (ID) was computed using two approaches:
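The construction of an FC matrix and its vectorised upper triangle can be sketched as follows. This is a minimal illustration on synthetic data, not the repository's script: the function names and the 180-timepoint array are assumptions made for the example, and the removal of ROIs without coverage is not shown.

```python
import numpy as np

def fc_matrix(timeseries):
    """Pearson correlation between all ROI pairs.

    timeseries: array of shape (n_timepoints, n_rois).
    Returns a symmetric (n_rois, n_rois) matrix.
    """
    return np.corrcoef(timeseries, rowvar=False)

def upper_triangle_edges(fc):
    """Vectorise the strictly upper triangular part (unique edges, no diagonal)."""
    rows, cols = np.triu_indices(fc.shape[0], k=1)
    return fc[rows, cols]

# Synthetic stand-in for one session of one individual (assumed sizes).
rng = np.random.default_rng(0)
ts = rng.standard_normal((180, 268))

fc = fc_matrix(ts)
edges = upper_triangle_edges(fc)
print(fc.shape, edges.shape)  # (268, 268) (35778,)
```

The 35,778 unique edges follow from 268 x 267 / 2 ROI pairs.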
- binary identification (BID): ID was assigned a score of 1 if the predicted identity matched the true identity of the individual, otherwise the ID was given a score of 0.
- relative rank (RR): RR is a continuous measure ranging from 0 to 1 which quantifies the degree of "confusion" for inaccurately identified individuals: the fewer individuals inaccurately ranked above the true ID, the lower the degree of confusion and the lower the RR.
The ID accuracy for each dataset was computed as the percentage of individuals who were correctly identified out of the total number of individuals in that dataset. We then exchanged the roles of database and target (session 2 as the database and session 1 as the target) and averaged the ID accuracy over the two configurations. The identification procedure was repeated until the FC matrix of every subject had served as a target matrix in each of the five datasets and both database-target configurations.
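The identification step and the two scores can be sketched as below. This is an illustration under stated assumptions, not the code in `Scripts`: the function names are hypothetical, each row of the input arrays is assumed to be one subject's vectorised FC profile with subjects in the same order in both sessions, and the RR formula (number of other individuals ranked above the true match, divided by N - 1) is one plausible reading of the description rather than a confirmed definition.

```python
import numpy as np

def similarity_matrix(database, targets):
    """Entry [i, j] is the Pearson correlation of target i with database j.

    database, targets: arrays of shape (n_subjects, n_edges).
    """
    n = targets.shape[0]
    return np.corrcoef(targets, database)[:n, n:]

def binary_identification(sim):
    """1 if the best-matching database entry is the subject's own, else 0."""
    predicted = sim.argmax(axis=1)
    return (predicted == np.arange(sim.shape[0])).astype(int)

def relative_rank(sim):
    """Assumed RR: fraction of other subjects ranked above the true match."""
    n = sim.shape[0]
    true_match = np.diag(sim)[:, None]
    n_above = (sim > true_match).sum(axis=1)
    return n_above / (n - 1)

def id_accuracy(session1, session2):
    """Percentage correctly identified, averaged over both configurations."""
    acc_12 = binary_identification(similarity_matrix(session1, session2)).mean()
    acc_21 = binary_identification(similarity_matrix(session2, session1)).mean()
    return 100.0 * (acc_12 + acc_21) / 2.0

# Hand-made 3-subject similarity matrix (rows: targets, columns: database).
sim = np.array([[0.9, 0.1, 0.2],
                [0.8, 0.5, 0.3],
                [0.1, 0.2, 0.7]])
bid = binary_identification(sim)  # subjects 0 and 2 identified, subject 1 not
rr = relative_rank(sim)           # subject 1: one of two others ranked above
```

With this convention, a correctly identified individual always has RR = 0, so the mean RR of a dataset falls as identification improves.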
Failure to control for gross motion has the potential to bias the true estimates of FC-based measures as there is an inverse relationship between FC patterns and head movements, especially in developmental cohorts (Satterthwaite et al., 2012). To prevent head motion from confounding the ID accuracy, we excluded high-motion individuals in the five datasets by using a root-mean-square framewise displacement (rmsFD) threshold as implemented by Jenkinson et al. (2002). We selected an rmsFD threshold that is neither very strict (e.g. rmsFD < 0.1mm) nor very liberal (e.g. rmsFD > 0.3-0.5mm). We retained all the low-motion individuals with rmsFD <= 0.2mm in either imaging session. We then determined whether there was a relationship between head motion and ID accuracy in both imaging sessions.
TFC is a measure which was recently proposed by Kopal et al. (2020) to estimate the quality of FC profiles derived at the individual level. It is computed as the Pearson's correlation coefficient between an individual's FC matrix and the typical FC matrix, i.e. the mean FC matrix of all the low-motion participants (rmsFD <= 0.2mm) in the cohort, averaged between the two imaging sessions. We vectorised the individual and typical FC matrices by selecting only the upper triangular part of the FC matrices in each independent dataset. As a correlation coefficient, TFC is a continuous measure which ranges from -1 to 1: TFC = -1 indicates an anticorrelated relationship between an individual's FC profile and the typical FC profile, TFC = 0 indicates no relationship, and TFC = 1 represents maximal correlation between an individual's FC profile and the typical FC profile of the cohort. We assessed the quality of the individual FC profile in each independent dataset to determine whether the FC profiles reflect typical whole-brain FC patterns which are free from motion-related artefacts.
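A minimal sketch of the TFC computation follows, assuming each subject's FC profile has already been vectorised into a row of edges. The function names and the synthetic data are assumptions made for illustration, and the arrays are assumed to contain only low-motion participants.

```python
import numpy as np

def typical_fc(edges_s1, edges_s2):
    """Mean edge vector across subjects, averaged between two sessions.

    edges_s1, edges_s2: arrays of shape (n_subjects, n_edges).
    """
    return 0.5 * (edges_s1.mean(axis=0) + edges_s2.mean(axis=0))

def tfc(edges, typical):
    """Pearson correlation of each subject's edge vector with the typical one."""
    return np.array([np.corrcoef(row, typical)[0, 1] for row in edges])

# Synthetic cohort: a shared connectivity pattern plus subject-level noise.
rng = np.random.default_rng(1)
shared = rng.uniform(-0.5, 0.8, size=300)
s1 = shared + 0.2 * rng.standard_normal((12, 300))
s2 = shared + 0.2 * rng.standard_normal((12, 300))

typical = typical_fc(s1, s2)
tfc_session1 = tfc(s1, typical)  # one TFC value per subject
```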
tSNR provides a measure of the noise characteristics in the fMRI timeseries over time which may stem from physiological (e.g. motion, respiration, cardiac processes) and scanner-related (e.g. scanner drifts, field inhomogeneity) artefacts (Welvaert et al., 2013). Gains in tSNR have offered the potential to detect small fluctuations in the fMRI signal at higher spatial resolutions in addition to localising finer brain areas (Murphy et al., 2007). We tested if there was an association between tSNR and ID accuracy to understand how the quality of the fMRI signal affects the individual functional connectome.
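For reference, tSNR is commonly computed as the temporal mean of a signal divided by its temporal standard deviation. The sketch below uses that common definition; whether the analyses here used exactly this form is an assumption, as are the function name and the synthetic values. The input must not be demeaned beforehand, otherwise the numerator vanishes.

```python
import numpy as np

def tsnr(timeseries):
    """Temporal SNR per column: temporal mean divided by temporal SD.

    timeseries: array of shape (n_timepoints, n_units), where a unit is a
    voxel or an ROI. Units with zero variance are returned as NaN.
    """
    mean = timeseries.mean(axis=0)
    sd = timeseries.std(axis=0, ddof=1)
    out = np.full(mean.shape, np.nan)
    np.divide(mean, sd, out=out, where=sd > 0)
    return out

# Synthetic example: baseline signal of 1000 with noise SD of 10.
rng = np.random.default_rng(2)
ts = 1000.0 + 10.0 * rng.standard_normal((200, 5))
values = tsnr(ts)  # each value is close to 100
```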
Recent work has shown that greater connectome distinctiveness, i.e. the degree to which an individual connectome differentiates that person from a group, is associated with increasing age during a critical period that spans puberty (Kaufmann et al., 2017). We examined age-related variability in ID accuracy from childhood to adulthood.
GSR has been applied as a denoising strategy to account for physiological noise related to head motion, respiration, cardiac processes and blood vessels (Colenbier et al., 2020; Power et al., 2017; Fox et al., 2009). Although GSR remains a controversial preprocessing step to date, it is highly efficient in removing the positive associations between motion parameters and FC-based measures (Power et al., 2017; Power et al., 2014; Yan et al., 2013), increasing the specificity of positive correlations between brain regions (Weissenbacher et al., 2009; Fox et al., 2009), and improving the presence of neuroanatomical networks (Fox et al., 2009). We preprocessed the functional data with and without regressing out the global signal in the fMRI timeseries to assess the effectiveness of GSR in boosting the ID accuracy.
Parcellation schemes provide an understanding of the brain's anatomical, functional and cytoarchitectural organisation in a homogeneous manner, and different parcellation schemes capture coarse and fine-grained properties of brain areas (Eickhoff et al., 2018; Eickhoff et al., 2015). This is because these parcellation schemes differ in terms of their coverage (e.g. cortical, whole-brain), space (e.g. volume, surface) and resolution (e.g. 10-1000 parcels). We parcellated the functional data using four publicly available parcellation schemes (Shen 268, Glasser 360, MIST 444 and Schaefer 1000-17Networks) to determine whether there was any linear or nonlinear (e.g. exponential, polynomial) relationship between these parcellation features and ID accuracy.
Prior FC-based studies have indicated that higher parcel resolutions may improve the detection of individual differences in the functional connectome (Vanderwal et al., 2019; Bellec et al., 2015). Thus, we investigated the associations between parcel resolution and ID accuracy by employing three parcellation schemes, the Schaefer 17Networks, MIST and DiFuMo atlases, which offered 8 (i.e. 100, 200, 300, 400, 500, 600, 800, 1000), 10 (i.e. 7, 12, 20, 36, 64, 122, 197, 325, 444, 1095) and 5 (i.e. 64, 128, 256, 512, 1024) levels of parcel dimensionality, respectively.
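Global signal regression can be illustrated with an ordinary least-squares fit. In this sketch the global signal is approximated by the mean across the columns of the input array; in practice it is typically the mean over all brain voxels, so this is a simplifying assumption, and the function name is hypothetical.

```python
import numpy as np

def regress_global_signal(timeseries):
    """Remove the global signal from every column by least squares.

    timeseries: array of shape (n_timepoints, n_units).
    Returns the residuals after fitting an intercept and the global signal.
    """
    global_signal = timeseries.mean(axis=1)
    design = np.column_stack([np.ones_like(global_signal), global_signal])
    beta, *_ = np.linalg.lstsq(design, timeseries, rcond=None)
    return timeseries - design @ beta

# Synthetic data: a shared fluctuation added to independent noise.
rng = np.random.default_rng(3)
common = rng.standard_normal((150, 1))
ts = common + rng.standard_normal((150, 20))

cleaned = regress_global_signal(ts)
```

Because the residuals include the intercept fit, the cleaned timeseries have zero mean and are uncorrelated with the global signal, which is why FC matrices computed with and without GSR can differ markedly.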
The spatial distribution of FC-based measures has shown significantly lower test-retest reliability estimates when subcortical connections are considered as opposed to cortical connections (Shah et al., 2016). The reliability of FC estimates may suffer from the inclusion of subcortical regions due to their small volumes, close proximity to physiological sources and low SNR due to signal dropout (Noble et al., 2019). We parcellated the functional data into cortical, subcortical and cerebellar regions using the Shen 268 parcellation to examine the influence of spatial brain coverage on ID accuracy. We identified the subcortical and cerebellar regions in the Shen 268 parcellation using the Harvard-Oxford atlas (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Atlases) and Buckner atlas (http://surfer.nmr.mgh.harvard.edu/fswiki/CerebellumParcellation_Buckner2011), respectively. We then obtained the cortical regions by subtracting the subcortical and cerebellar areas from the whole-brain Shen 268 parcellation.
Several studies have demonstrated that some functional networks yield stronger predictive power in facilitating the identification of an individual (Jalbrzikowski et al., 2019; Horien et al., 2019; Finn et al., 2015). First, we delineated the functional data as previously described using the Shen 268 parcellation to assess the contributions of 8 functional networks to ID accuracy (Finn et al., 2015). These functional networks are as follows: (1) medial frontal; (2) frontoparietal; (3) default mode; (4) subcortical and cerebellum; (5) motor; (6) visual I; (7) visual II; and (8) visual association. Second, we evaluated the contributions of the Yeo 7 and 17 functional networks using the recently published DiFuMo atlases which reflect dictionaries of "soft" functional modes.
As the reliability of FC-based measures may depend on the length of the scans acquired (Noble et al., 2017; Mueller et al., 2015; Birn et al., 2013; Anderson et al., 2011; Van Dijk et al., 2010), we assessed the association between scan duration and ID accuracy for each independent dataset, using scan lengths that increased from 0.5min to the full acquisition length.
Previous studies showed that more timepoints are required to boost the individual properties of the functional connectome (Vanderwal et al., 2019; Vanderwal et al., 2017; Finn et al., 2015). However, censoring timepoints which are contaminated by potential confounding factors such as head motion may not be ideal, as truncating the timeseries reduces ID accuracy (Finn et al., 2015). Hence, we used a bootstrapping approach to randomly select subsamples of the fMRI data with replacement over 300 iterations for each individual, and we approximated the ID accuracy for each dataset by averaging the ID accuracy computed across iterations. We then investigated the effects on ID accuracy of the bootstrapped timepoints sampled from 0.5min to the full acquisition length of each dataset, in addition to oversampled timepoints (equivalent to 10min of acquisition length).
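One way to restrict the analysis to a single network is to keep only the edges whose two nodes both carry that network's label. The sketch below assumes a label vector with one integer per ROI (for example 1 to 8 for the Shen 268 networks); the function name, the label format, and the choice of within-network edges only are assumptions for illustration.

```python
import numpy as np

def within_network_edges(fc, labels, network):
    """Unique edges between ROIs that both belong to the given network.

    fc: (n_rois, n_rois) FC matrix; labels: (n_rois,) integer network labels.
    """
    idx = np.flatnonzero(labels == network)
    sub = fc[np.ix_(idx, idx)]
    rows, cols = np.triu_indices(len(idx), k=1)
    return sub[rows, cols]

# Toy example: 6 ROIs assigned to 3 networks.
labels = np.array([1, 1, 1, 2, 2, 3])
fc = np.arange(36, dtype=float).reshape(6, 6)  # placeholder values, not real FC

edges_net1 = within_network_edges(fc, labels, 1)  # 3 ROIs, so 3 edges
edges_net2 = within_network_edges(fc, labels, 2)  # 2 ROIs, so 1 edge
```

The resulting edge vectors can then be passed to the same identification procedure as the whole-brain profiles.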
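Converting a scan duration into a number of timepoints depends on the repetition time (TR). The sketch below assumes that the first timepoints of each run are kept and uses a TR of 2s as an example; the function names are hypothetical.

```python
import numpy as np

def timepoints_for_duration(duration_min, tr_s, n_available):
    """Number of volumes covering duration_min minutes at the given TR,
    capped at the number of volumes actually acquired."""
    n = int(duration_min * 60.0 / tr_s + 1e-9)  # small epsilon guards rounding
    return max(0, min(n, n_available))

def truncate(timeseries, duration_min, tr_s):
    """Keep the first timepoints of a (n_timepoints, n_rois) array."""
    n = timepoints_for_duration(duration_min, tr_s, timeseries.shape[0])
    return timeseries[:n]

# With TR = 2s, 1min of data corresponds to 30 timepoints.
ts = np.zeros((180, 268))
short = truncate(ts, duration_min=1.0, tr_s=2.0)
print(short.shape)  # (30, 268)
```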
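The bootstrapping idea can be sketched as follows. This is an illustration on small synthetic data, not the repository's implementation: the function names are hypothetical, timepoints are assumed to be resampled independently for each subject and session, and only a few iterations are run here instead of 300 to keep the example fast. Because sampling is with replacement, the requested number of timepoints may exceed the number acquired, which is how oversampling can be emulated.

```python
import numpy as np

def fc_edges(ts):
    """Vectorised upper triangle of the Pearson FC matrix of a (T, R) array."""
    fc = np.corrcoef(ts, rowvar=False)
    rows, cols = np.triu_indices(fc.shape[0], k=1)
    return fc[rows, cols]

def id_accuracy(database, targets):
    """Percentage of targets whose best database match is their own entry."""
    n = targets.shape[0]
    sim = np.corrcoef(targets, database)[:n, n:]
    return 100.0 * np.mean(sim.argmax(axis=1) == np.arange(n))

def bootstrap_id_accuracy(session1, session2, n_timepoints, n_iter, rng):
    """Mean ID accuracy over n_iter bootstrap resamples of the timepoints.

    session1, session2: arrays of shape (n_subjects, n_total_timepoints, n_rois).
    """
    accuracies = []
    for _ in range(n_iter):
        database, targets = [], []
        for ts1, ts2 in zip(session1, session2):
            i1 = rng.integers(0, ts1.shape[0], size=n_timepoints)
            i2 = rng.integers(0, ts2.shape[0], size=n_timepoints)
            database.append(fc_edges(ts1[i1]))
            targets.append(fc_edges(ts2[i2]))
        accuracies.append(id_accuracy(np.array(database), np.array(targets)))
    return float(np.mean(accuracies))

# Synthetic cohort: each subject has a own mixing matrix shared by both sessions.
rng = np.random.default_rng(4)
n_subjects, n_total, n_rois = 8, 120, 10
mixing = np.eye(n_rois) + 0.5 * rng.standard_normal((n_subjects, n_rois, n_rois))
session1 = rng.standard_normal((n_subjects, n_total, n_rois)) @ mixing.transpose(0, 2, 1)
session2 = rng.standard_normal((n_subjects, n_total, n_rois)) @ mixing.transpose(0, 2, 1)

acc_full = bootstrap_id_accuracy(session1, session2, n_timepoints=120, n_iter=5, rng=rng)
acc_over = bootstrap_id_accuracy(session1, session2, n_timepoints=300, n_iter=5, rng=rng)
```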
- Extract fMRI timeseries: To extract the ROI timeseries, a parcellation scheme in the MNI152 standard space such as `shen_2mm_268_parcellation.nii` in `Parcellations` can be used. The `extract_fMRI_timeseries` script in `Scripts` requires the preprocessed fMRI data of every individual in the cohort and a predefined parcellation scheme as input. Run `./extract_fMRI_timeseries.sh` in a bash terminal to extract the fMRI timeseries.
- Identification procedure: The identification procedure is implemented in the `fMRI_fingerprint_connectome.py` script, which can be found in `Scripts`. As input, the fMRI timeseries of two imaging sessions are needed for each individual, and the script will generate a text file which stores the subject-wise correlation matrix (N x N) for each dataset. Of note, zero values in the timeseries which are introduced by a parcellation scheme due to limited coverage are removed; otherwise the correlation matrix will include NaN values. Run `python3.7 fMRI_fingerprint_connectome.py` in a Python environment or a Cloud-based platform (e.g. Google Colab) to execute the command.
- Compute ID accuracy using BID approach: The `ID_accuracy_connect_print.py` script in `Scripts` requires the input file which stores the subject-wise correlation matrix of a dataset. The script will produce the success rate of the dataset, which represents the proportion of individuals that have been correctly identified. For example, a success rate of 0.6 indicates that 60% of the individuals in a dataset have been successfully identified. Run `python3.7 ID_accuracy_connect_print.py` in a Python environment or a Cloud-based platform (e.g. Google Colab) to execute the command.
- Compute ID accuracy using RR approach: The `ID_accuracy_relative_rank.py` script in `Scripts` requires the input file which stores the subject-wise correlation matrix of a dataset. The script will produce the success rate of the dataset, which here represents the degree of confusion for incorrectly identified individuals. For example, a success rate of 0.05 indicates that only 5% of the individuals in a dataset have been inaccurately identified, and thus a lower degree of confusion and a higher ID accuracy. By contrast, a success rate of 0.60 indicates that 60% of the individuals in that dataset have been incorrectly identified, and thus a higher degree of confusion and a lower ID accuracy. Run `python3.7 ID_accuracy_relative_rank.py` in a Python environment or a Cloud-based platform (e.g. Google Colab) to execute the command.
- Parcellation Schemes: Four publicly available parcellation schemes were used, namely Shen 268, Glasser 360, MIST 444 and Schaefer 1000-17Networks, which are stored in `Parcellations`. All the parcellation schemes were registered to the MNI152 standard space and resampled at a resolution of 2x2x2mm.
- Parcel Resolutions: Three publicly available parcellation schemes were employed with different levels of parcel dimensionality, and they are available in `Parcel Resolutions`. The Schaefer 17Networks parcellation scheme has 8 levels of dimensionality ranging from 100 to 1000 parcels. The MIST parcellation scheme provides 10 levels of dimensionality ranging from 7 to 1095 parcels. The DiFuMo parcellation scheme offers 5 levels of dimensionality ranging from 64 to 1024 parcels.
- Parcellation Coverage: The whole-brain Shen 268 parcellation was divided into cortical, subcortical and cerebellar regions, which are available in `Parcel Coverage`. To compute the success rate of a dataset based on the spatial coverage of the functional connectome, execute `network_fingerprint_connectome.py` in `Scripts` after selecting a specific coverage. Run `python3.7 network_fingerprint_connectome.py` in a Python environment or a Cloud-based platform (e.g. Google Colab) to execute the command.
- Definitions of functional networks: The labels of the functionally defined networks, derived from the Shen 268 and DiFuMo parcellation schemes, are provided in `Network Labels`. The functional networks in the Shen 268 parcellation are numbered from 1 to 8 such that "1" corresponds to the medial frontal network, "2" refers to the frontoparietal network and so forth. The functional networks in the DiFuMo parcellation are described by the Yeo 7 and Yeo 17 networks. To compute the success rate of a dataset based on a specific functional network, execute `network_fingerprint_connectome.py` in `Scripts`. Run `python3.7 network_fingerprint_connectome.py` in a Python environment or a Cloud-based platform (e.g. Google Colab) to execute the command.
- Scan Duration: To vary the scan duration of the fMRI data, the equivalent number of timepoints is selected to compute the subject-wise correlation matrix. For example, the UPSM dataset has been acquired in 5min with 194 timepoints. In order to compute the success rate of the dataset for a scan length of 1min, only 30 timepoints are selected. The `scan_dura_fingerprint_connectome.py` script in `Scripts` can be executed to compute the success rate of a dataset with varying scan duration. Run `python3.7 scan_dura_fingerprint_connectome.py` in a Python environment or a Cloud-based platform (e.g. Google Colab) to execute the command.