BoBaFor

A bacterial genome-wide association approach that uses machine learning.

Installation

Create a Conda virtual environment with the required Python version

conda create -n BoBaFor_env python==3.11.10

Activate BoBaFor_env

conda activate BoBaFor_env

Install BoBaFor via pip

pip install BoBaFor

Use

Download example from GitHub

git clone https://github.com/PaulDanPhillips/BoBaFor.git

Change directory to the examples folder

cd BoBaFor/examples

Convert relative paths to absolute in example_config.yaml

python AddAbsolutePaths.py

Run the example data (set --cores according to how many cores you have available and want to use)

BoBaFor --config example_config.yaml --cores 4

Running your own data:

Data organization

The data needs to be split into two files: (1) a feature matrix containing all of the genetic features and (2) a vector containing the response phenotype. Ensure that the indexes of the feature matrix align with those of the response vector.
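
A quick check before a run can catch misaligned inputs. The sketch below is only an illustration and is not part of BoBaFor. It assumes both files are tab-delimited, have a header row, and carry the sample IDs in the first column, which may not match your files. `read_sample_ids` and `check_alignment` are hypothetical helper names.

```python
import csv


def read_sample_ids(path, delimiter="\t"):
    """Return first-column values (assumed to be sample IDs), skipping the header."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        next(reader, None)  # assumption: the first row is a header
        return [row[0] for row in reader if row]


def check_alignment(feature_path, response_path):
    """Return the number of samples, or raise ValueError if the IDs do not align."""
    feature_ids = read_sample_ids(feature_path)
    response_ids = read_sample_ids(response_path)
    if feature_ids == response_ids:
        return len(feature_ids)
    only_in_one = set(feature_ids) ^ set(response_ids)
    if only_in_one:
        raise ValueError(f"Samples present in only one file: {sorted(only_in_one)}")
    raise ValueError("Same sample IDs, but in a different order or count.")
```

Calling `check_alignment("/path/to/data/features.txt", "/path/to/data/response.txt")` returns the sample count when the two files line up row for row.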

Configure.yaml file:

Edit the Sim1.yaml file from the examples directory, replacing the values described below with your own.

Predictor
- The absolute or relative path to your genetic feature matrix; it replaces "balanced_features/Strongest_Sim1.txt".
- /path/to/data/features.txt
Response
- The absolute or relative path to your response phenotype vector; it replaces "balanced_responses/Sim1.txt".
- /path/to/data/response.txt
chi2Correct
- Whether the chi2 prefilter step should be corrected via false discovery rate (FDR). Only True or False are accepted.
- True/False
FDRchi2Thresh
- The FDR threshold of the chi2 test used to remove genetic features in the prefilter step.
- 0.-1. (float)
PreFilterFileName
- The name you want for the output files from just the chi2-prefilter step.
- Prefilter_Experiment_Name
sim
- Experiment Name for prefix file name
- Simulation1
GridParamDepth
- The number of options to keep for EVERY hyperparameter after optimizing with RandomSearch, leading into the much more thorough and computationally demanding GridSearch
- 1-5 (int)
ScoreMetric
- The metric used to score the model. Choose it based on the data type of your response and on how balanced the data is.
- balanced_accuracy
RandSearch1_Niter
- The number of RandomSearch iterations to perform prior to boruta selection (the number of hyperparameter combinations to test)
- (int)
GridSearch1_CV
- The number of GridSearch iterations to perform prior to boruta selection (The number of times to create kfold splits and test EVERY possible hyperparameter combination)
- (int)
boruta_perc
- The percent strength of the boruta shadow features to be compared against the real features.
- 0-100 (int)
boruta_pval
- Level at which the corrected p-values will get rejected in both correction steps.
- 0.-1. (float)
borutaSelect_perc
- The percent threshold at which to select final features (the number of times a feature was selected divided by the number of iterations boruta was run).
collect
- A boolean on whether to keep all selected features regardless of the borutaSelect_perc setting.
- True/False
boruta_reps
- The number of iterations to run boruta selection.
model
- Whether to run a random forest or extreme gradient boosted model
- RF/XGB
RandSearch2_Niter
- The number of RandomSearch iterations to perform after boruta selection (the number of hyperparameter combinations to test)
- (int)
RandSearch2_CV
- TO BE REMOVED
- The number of iterations to create kfolds within a single RandSearch2_Niter.
- (int)
GridSearch2_CV
- The number of GridSearch iterations to perform after boruta selection (The number of times to create kfold splits and test EVERY possible hyperparameter combination)
- (int)
RankingFeatureIters
- The number of iterations to perform feature ranking via Feature Importance or Permutation Importance
- (int)
PermImpInteralIter
- The number of internal repetitions for Permutation Importance
- (int)
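
The value ranges listed above can be checked before launching a long run. The following is a minimal sketch, not part of BoBaFor. It assumes the YAML file has already been parsed into a flat Python dict keyed by the parameter names above; the actual layout of the config may differ. `validate_config` is a hypothetical helper, and only parameters whose type or range is stated above are checked.

```python
def validate_config(cfg):
    """Return a list of problems found in a config dict (empty if none)."""
    problems = []

    def is_bool(value):
        return isinstance(value, bool)

    def is_int(value):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)

    def is_fraction(value):
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        return is_number and 0 <= value <= 1

    # Expected types and ranges, taken from the parameter list above.
    checks = {
        "chi2Correct": (is_bool, "True/False"),
        "FDRchi2Thresh": (is_fraction, "a float in 0-1"),
        "GridParamDepth": (lambda v: is_int(v) and 1 <= v <= 5, "an int in 1-5"),
        "boruta_perc": (lambda v: is_int(v) and 0 <= v <= 100, "an int in 0-100"),
        "boruta_pval": (is_fraction, "a float in 0-1"),
        "collect": (is_bool, "True/False"),
        "model": (lambda v: v in ("RF", "XGB"), "RF or XGB"),
    }
    for key in ("RandSearch1_Niter", "GridSearch1_CV", "RandSearch2_Niter",
                "GridSearch2_CV", "RankingFeatureIters", "PermImpInteralIter"):
        checks[key] = (is_int, "an int")

    for key, (is_valid, expected) in checks.items():
        if key not in cfg:
            problems.append(f"{key}: missing")
        elif not is_valid(cfg[key]):
            problems.append(f"{key}: expected {expected}, got {cfg[key]!r}")
    return problems
```

An empty list means every checked parameter is present and within its documented range; otherwise each entry names the offending key.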

Output files:

The most important output is BorutaOUT_1.00.05_RF/FinalFeatures_RF/FeatureImportance_Ranked_*.out, as it contains the genetic variants and their associated feature importance values.
1. BorutaOUT_1.00.05_RF/PreFilterOut/PreFilterPredictor_Sim1.txt
  - The features that pass the chi2 prefilter set in the config.yaml file
2. BorutaOUT_1.00.05_RF/TuningOUT_RF/GridParamSpace1_Sim1.pickle
3. BorutaOUT_1.00.05_RF/TuningOUT_RF/GridParamSpace2_Sim1.pickle
4. BorutaOUT_1.00.05_RF/TuningOUT_RF/gridSearch1_Sim1.txt
5. BorutaOUT_1.00.05_RF/TuningOUT_RF/gridSearch2_Sim1.txt
6. BorutaOUT_1.00.05_RF/TuningOUT_RF/GridSearchFIG1_Sim1.png
7. BorutaOUT_1.00.05_RF/TuningOUT_RF/GridSearchFIG2_Sim1.png
8. BorutaOUT_1.00.05_RF/TuningOUT_RF/Optimal_Tuning1_Sim1.pickle
9. BorutaOUT_1.00.05_RF/TuningOUT_RF/Optimal_Tuning2_Sim1.pickle
  - The optimized model used to rank the genetic features. This can be used with the explainable AI shap to best interpret the model if desired.
10. BorutaOUT_1.00.05_RF/TuningOUT_RF/randSearch1_Sim1.txt
11. BorutaOUT_1.00.05_RF/TuningOUT_RF/randSearch2_Sim1.txt
12. BorutaOUT_1.00.05_RF/BorutaOUT/RF_Boruta_Selected_Sim1.txt
13. BorutaOUT_1.00.05_RF/BorutaOUT/RF_Boruta_Tentative_Sim1.txt
14. BorutaOUT_1.00.05_RF/BorutaOUT/RF_FinalFeatureMatrix_BorutaSelected_Sim1.txt
15. BorutaOUT_1.00.05_RF/FinalFeatures_RF/FeatureImportance_Ranked_Sim1.out
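
To inspect the optimized model afterwards, the .pickle outputs can presumably be read back with Python's standard pickle module. This is a minimal sketch under that assumption: it presumes the files were written with pickle.dump and that the packages used to build the model are installed in the active environment. `load_pickled_object` is a hypothetical helper name, and the type of the stored object is not documented here.

```python
import pickle


def load_pickled_object(path):
    """Load a single object from a pickle file. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

For example, `load_pickled_object("BorutaOUT_1.00.05_RF/TuningOUT_RF/Optimal_Tuning2_Sim1.pickle")` would return whatever object was saved in that file.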
