Plot multiple principal components (PCs) from classical multidimensional scaling (cmd) output and mark outlier clusters based on PC1 vs. PC2
The function is mainly designed to integrate into the GenABEL workflow.
More specifically, it supports the detection of genetic outliers using the identity-by-state (IBS) procedure (see quality control, section 5.2 of the tutorial).
Although the function is embedded in the GenABEL workflow, it can be used with any cmd or principal component matrix. It is important that the sample names or IDs are stored in the rownames of your matrix. The function draws several MDS plots so that you can verify the genetic outliers identified in the PC1 vs. PC2 plot. Optionally, it also returns a vector of the IDs that make up the main sample cluster, i.e. the samples that should be kept in the analysis.
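To illustrate the kind of input described above, here is a minimal sketch in base R. It does not call the function from this repo and does not use GenABEL: the simulated data, the object names (geno, mds, keep_ids) and the hierarchical clustering step are assumptions made only for illustration, and the function itself may mark outliers differently.

```r
# Toy data standing in for genotype-derived distances (an assumption, not real IBS data):
# 20 samples, 50 variables, with two samples shifted to act as artificial outliers.
set.seed(1)
n_samples <- 20
geno <- matrix(rnorm(n_samples * 50), nrow = n_samples)
geno[19:20, ] <- geno[19:20, ] + 3
rownames(geno) <- paste0("ID", seq_len(n_samples))

# Classical multidimensional scaling; k is the number of components to keep.
mds <- cmdscale(dist(geno), k = 4)

# cmdscale() carries the labels of the distance object over to the rownames,
# which is where the sample IDs are expected.
head(rownames(mds))

# PC1 vs. PC2 with sample labels.
plot(mds[, 1], mds[, 2], xlab = "PC1", ylab = "PC2")
text(mds[, 1], mds[, 2], labels = rownames(mds), pos = 3, cex = 0.6)

# Illustrative only: split the PC1/PC2 plane into two clusters and treat the
# larger one as the main sample cluster.
groups <- cutree(hclust(dist(mds[, 1:2]), method = "average"), k = 2)
main_group <- as.integer(names(which.max(table(groups))))
keep_ids <- rownames(mds)[groups == main_group]
```

Here keep_ids plays the role of the returned ID vector: the samples in the main cluster, which would be carried forward in the analysis.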
The input arguments of the function are described in the code. Since this function is not part of an official package, you should read the comments in the first lines of the file.
Download or clone the repo as a .zip using the respective button on the main page. After unzipping the repo, you will find the function in the folder outlier_function.
Once the file is stored on your hard drive, load it with:
source("path/to/where/the/function/is/stored")
The function should then be available in the global environment of your current R session.
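If you are unsure what the file defines, one option is to source it into a separate environment first and list its contents. The sketch below uses a temporary stand-in file and a made-up function name (demo_outlier_fun), both purely hypothetical, so that it runs on its own; replace tmp with the path to the downloaded file.

```r
# Hypothetical stand-in file so the sketch is self-contained.
tmp <- tempfile(fileext = ".R")
writeLines("demo_outlier_fun <- function(x) x", tmp)

# Source into a fresh environment instead of the global one.
env <- new.env()
source(tmp, local = env)

# Names of the objects the file defines.
ls(env)
```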
Within the function, I describe extensively what each code block does. If you use RStudio and linked the .R extension to it during installation, simply double-click the file to open it in RStudio. There you can read the source code and the comments.