-
Context: I am applying pgsc_calc (2.0.0-alpha, Linux/Singularity/AWS) to whole-genome 1000 Genomes data (which has presented its own set of challenges).

Concern: In the course of generating scores, I find that most PGS models I use produce scores in the expected range (mostly +/-2.0) over nearly all genomes. However, about 15% consistently produce quite large scores. For example, PGS000749 (just one of the affected PGS) routinely produces "SUM" scores in the 100.0-500.0 range.

Question: As I understand it, models are generally normalized during development, so that in binary models scores have a mean of ~0 and a standard deviation of ~1.0, and the scores produced can be loosely interpreted as Z-scores. Is it reasonable to assume that large-magnitude scores on binary models are simply un-normalized results, or is this more likely a software/data issue or bug?

Possible workaround: If the scores are un-normalized, is there some way to get the development/evaluation normalization coefficients from data in the PGS Catalog? As an alternative, I can renormalize scores across my sample population, but I'd rather use the parameters developed by the submitters, in case my sample is biased with respect to the predicted trait.

Thanks!
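Edit: for reference, here's the kind of quick check I used to spot the affected scores. This is a minimal sketch only; the file name and the "PGS"/"SUM" column names are assumptions about the aggregated pgsc_calc output, so adjust them to match your actual run.

```python
# Minimal sketch: flag scoring files whose raw SUMs look un-normalized.
# The file name and column names ("PGS", "SUM") are assumptions about the
# aggregated pgsc_calc output -- adjust them to match your actual run.
import pandas as pd

scores = pd.read_csv("aggregated_scores.txt.gz", sep="\t")

summary = (
    scores.groupby("PGS")["SUM"]
    .agg(["mean", "std", "min", "max"])
)

# Scores developed on a standardized scale should sit near mean ~0, SD ~1;
# raw (un-normalized) sums can land far away, e.g. in the 100-500 range.
print(summary[summary["mean"].abs() > 2.0])
```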
-
The `SUM` scores output by the calculator are the raw sum of effect_weight * dosage over all variants in the scoring file, so it is possible that these could be large and not follow a normal distribution centred around 0. Usually people just centre those results across the whole sample to make them more comparable to the others.
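For example, centering/standardizing within your own sample could look like the sketch below (file and column names are illustrative, not the exact pgsc_calc output schema):

```python
# Minimal sketch of the centering described above: standardize each score's
# raw SUM across the whole sample. File and column names ("PGS", "SUM") are
# illustrative -- match them to your aggregated pgsc_calc output.
import pandas as pd

scores = pd.read_csv("aggregated_scores.txt.gz", sep="\t")

# Z-score each PGS within the sample: subtract the sample mean and divide
# by the sample SD, so all scores land on a common, comparable scale.
scores["Z"] = scores.groupby("PGS")["SUM"].transform(
    lambda s: (s - s.mean()) / s.std()
)
```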
-
So it sounds like submitters aren't required to include normalizing coefficients or mean/SD values for scores from the score development population, and these may not generally be available in PGS Catalog models or metadata. It is a touch problematic to use measures like OR/SD from the development population without knowing what the observed SD was in that population. No worries; it is most helpful just to know not to look for them here.

There may be growing interest in converting PGS all the way to probabilities, which ideally requires the normalization used in the score development population. For now, I can work around this by deriving the mean and SD of scores generated from a relevant validation population (I think this is also what you were referring to).

Thanks, indeed, for the quick response!
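Edit: for completeness, a minimal sketch of that workaround. The reference mean/SD and the logistic coefficients below are hypothetical placeholders I'd estimate from a validation population and outcome data myself; they are not values available from the PGS Catalog.

```python
# Minimal sketch: standardize raw SUMs against reference-population
# parameters, then (optionally) map the Z-score to a probability via a
# logistic model. mu_ref, sd_ref, intercept, and beta are all hypothetical
# placeholders estimated from a validation population and outcome data.
import math

mu_ref, sd_ref = 250.0, 40.0  # hypothetical reference-population mean/SD

def standardize(raw_sum: float) -> float:
    """Z-score a raw SUM relative to the reference population."""
    return (raw_sum - mu_ref) / sd_ref

def risk_probability(z: float, intercept: float = -2.0, beta: float = 0.5) -> float:
    """Logistic mapping from Z-score to probability of the binary trait."""
    return 1.0 / (1.0 + math.exp(-(intercept + beta * z)))

print(risk_probability(standardize(300.0)))
```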