Corrections for multiple tests for trans-eQTL #146

AlexandreHUBERT17 · 2024-08-19T10:39:49Z

Hello, my colleagues and I are using TensorQTL for our research, and we have a few questions about the corrections for multiple testing when identifying trans-eQTLs.

Many papers incorrectly state that they have detected many more trans-eQTLs than cis-eQTLs, probably due to an incorrect correction for multiple tests.
We would therefore like to know more about the correction for multiple tests made by TensorQTL when calculating trans-eQTLs.
To clarify our questions, let's take our study set as an example: 300K heterozygous markers and 15K genes expressed in a specific tissue from 500 hens.

We assume that TensorQTL calculates 300K x 15K = 4.5 billion p-values.
TensorQTL then returns only the marker x gene associations whose nominal p-value (pvalue.Nominal) is ≤ a threshold set by the user (usually 10^-5), after removing the cis-type associations defined by the 'marker-TSS' distance parameter specified by the user (usually 1 Mb). In the end, we have a file with around 1M lines.
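To make the scale of the multiple-testing problem concrete, here is a minimal arithmetic sketch using the numbers above. It only assumes that every marker is tested against every gene and that all tests are null; the variable names are illustrative and nothing here calls TensorQTL.

```python
# Back-of-the-envelope multiple-testing arithmetic for the example study.
n_markers = 300_000
n_genes = 15_000
nominal_threshold = 1e-5

# Total number of marker x gene tests (cis and trans together).
n_tests = n_markers * n_genes

# Under a global null, p-values are uniform, so the expected number of
# associations passing the nominal threshold by chance is n_tests * threshold.
expected_null_hits = n_tests * nominal_threshold

# Bonferroni threshold for a 5% family-wise error rate over all tests.
bonferroni_threshold = 0.05 / n_tests

print(f"tests: {n_tests:,}")
print(f"expected chance hits at {nominal_threshold:g}: {expected_null_hits:,.0f}")
print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")
```

This shows why a nominal cutoff of 10^-5 can only act as an output filter here: by itself it does not control the error rate across 4.5 billion tests.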

What is the rationale behind the selection “pvalue.Nominal ≤ threshold 10^-5”?

We then calculate permutations using the trans.map_permutations() and trans.apply_permutations() functions in order to obtain, in addition to pvalue.Nominal:

A pvalue.perm:
- Is it calculated from the N permutations set by the 'nPerm' parameter (10K by default) in the 'trans.map_permutations()' function?
- What do the permutations correspond to?
- Do we just swap the expression values for each gene?
- Do we then repeat the test N times for each permuted gene expression x marker pair?
- If so, that would mean 10K permutations x 15K genes x 300K markers, i.e. far too many tests to run. Why was 10K permutations chosen as the default?
- Finally, is pvalue.perm the 95th percentile of the distribution of p-values obtained by chance after permutation?

A pvalBeta, calculated after permutation:
- How are the permutations taken into account in this pvalBeta?
- What is the formula?

Thank you in advance

francois-a · 2024-08-24T02:39:31Z

The 10^-5 threshold for nominal p-values is somewhat arbitrary, and was chosen to include potentially interesting (but not genome-wide significant) results. Reporting the full set of trans associations would result in prohibitively large outputs for most datasets.

For permutations, it is important to note that the approach implemented in TensorQTL only works if the phenotypes all follow a standard normal distribution (e.g., after applying an inverse normal transform). Based on this assumption, empirical p-values can be obtained from genome-wide permutations of a standard normal (with the chr_s=pgr.pvar_df['chrom'] argument, this is performed as 'leave one chromosome out'). A beta distribution is fitted to the permutation p-values in the same manner as for cis-QTLs (for details, see Ongen et al., Bioinformatics, 2016). pval_perm is the value computed from the permutations; pval_beta is the corresponding beta-approximated p-value that should be used for analyses (e.g., to compute q-values).
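As an illustration of the two quantities, the following is a minimal sketch of the permutation plus beta-approximation idea from Ongen et al. on simulated toy data. It is not TensorQTL's code: the data, the sizes, and the helper `min_pvalue` are assumptions made for the example, and the leave-one-chromosome-out step is omitted.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_samples, n_variants, n_perm = 100, 200, 200  # toy sizes (assumed)

# Toy genotype dosages (0/1/2), standardized per variant.
G = rng.binomial(2, 0.3, size=(n_variants, n_samples)).astype(float)
G = (G - G.mean(axis=1, keepdims=True)) / G.std(axis=1, keepdims=True)

def min_pvalue(y, G):
    """Smallest nominal p-value of phenotype y across all variants."""
    y = (y - y.mean()) / y.std()
    r = G @ y / len(y)                  # Pearson correlation per variant
    dof = len(y) - 2
    t2 = dof * r**2 / (1 - r**2)        # squared t-statistic
    return stats.f.sf(t2, 1, dof).min() # two-sided p-value via F(1, dof)

# Observed phenotype: standard normal, as the method requires.
y_obs = rng.standard_normal(n_samples)
p_obs = min_pvalue(y_obs, G)

# Null distribution: best p-value for each permuted phenotype.
p_null = np.array([min_pvalue(rng.permutation(y_obs), G)
                   for _ in range(n_perm)])

# Empirical p-value: fraction of permutations at least as extreme.
pval_perm = (np.sum(p_null <= p_obs) + 1) / (n_perm + 1)

# Beta approximation: fit Beta(a, b) on [0, 1] to the null minima,
# then evaluate its CDF at the observed best nominal p-value.
a, b, _, _ = stats.beta.fit(p_null, floc=0, fscale=1)
pval_beta = stats.beta.cdf(p_obs, a, b)

print(f"pval_perm={pval_perm:.4f}  pval_beta={pval_beta:.4f}")
```

The empirical value cannot go below 1 / (n_perm + 1), whereas the fitted beta distribution extrapolates into the tail, which is why the beta-approximated value is the one to carry forward.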

AlexandreHUBERT17 · 2024-08-28T12:51:44Z

Thank you for your reply.

I now understand the distinction between the two types of p-values better.

However, I didn't quite understand the purpose of the chr_s parameter. Can you tell me more?
Also, in your example, what does pgr.pvar_df correspond to? Is it the dataframe containing the positions of the variants?

Thank you in advance
