diff --git a/README.md b/README.md index 1a140aed..35ae1e8a 100644 --- a/README.md +++ b/README.md @@ -63,8 +63,9 @@ All SAMPL7 challenges are now closed. Note the first phase of the SAMPL8 host-gu - **Release 0.6** (Oct. 13, 2020, DOI [10.5281/zenodo.4086044](https://dx.doi.org/10.5281/zenodo.3975152)): Release the finalized the physical properties challenge inputs, formats, submissions and experimental results. A later release will include the results of analysis. These changes were all available in master earlier (see detailed changelog in release notes), but this provides an official release. Analysis of physical properties results will come at a later date. ### Changes not in a release -- Added physical properties analysis (December 2020-January 2021) -- Fixed two submissions that had errors and updated the overview plots/stats and individual plots for the two affected submissions (4/9/2021) +- Added physical properties analysis for log *P* and pKa (December 2020-January 2021) +- Fixed two log *P* submissions that had errors and updated the overview plots/stats and individual plots for the two affected submissions (4/9/2021) +- After fixing the two log *P* submissions that had errors, log *D* estimates were regenerated and analysis was done (4/19/2021) ## Challenge overview diff --git a/physical_property/logD/README.md b/physical_property/logD/README.md index bb39f992..d457c4ba 100644 --- a/physical_property/logD/README.md +++ b/physical_property/logD/README.md @@ -1,10 +1,74 @@ ## SAMPL7 log *D* Predictions -Placeholder +Ranked SAMPL7 pKa and log *P* predictions were combined to estimate log *D*7.4. The Mathematica notebook used to do this analysis is available in the manifest. + +General analysis of log *D* predictions includes calculated vs predicted log *D* correlation plots and 6 performance statistics (RMSE, MAE, ME, R^2, linear regression slope(m), and error slope(ES)) for all the submissions. 
+95%-percentile bootstrap confidence intervals of all the statistics were reported. + +Molecular statistics analysis was performed to indicate which molecules were more difficult to predict accurately across submitted methods. Error statistics (MAE and RMSE) were calculated for each molecule averaging across all methods or for all methods within a method category. ## Manifest -- [`analysis/`](analysis/) - Contains analysis of log *D*7.4 predictions generated from log *P* and pKa predictions. -- [`calc_logD.nb`](calc_logD.nb) - Wolfram Mathematica `.nb` file that calculates and exports SAMPL7 distribution coefficients log *D*7.4 for participants that had submitted a ranked log *P* and a ranked pKa submission. The notebook gathers the predicted macroscopic acidity constants and the partition coefficients from [`pKa_submission_collection.csv`](../pKa/analysis/macrostate_analysis/analysis_outputs_ranked_submissions/pKa_submission_collection.csv) and [`logP_submission_collection.csv`](../logP/analysis/analysis_outputs_ranked_submissions/logP_submission_collection.csv), respectively. The log *D*7.4 is then calculated under the assumption that the ionic species can not enter the organic phase [1]. Because the acidity constants listed in [`pKa_submission_collection.csv`](../pKa/analysis/macrostate_analysis/analysis_outputs_ranked_submissions/pKa_submission_collection.csv) do not contain information about the charge states of the protonated and deprotonated species, the consensus of models that had submitted macroscopic pKa values including the charge states was used to determine that eq. 4 should be used for all compounds. +- [`calculate_logD/`](./calculate_logD/) + - `calc_logD.nb` - Wolfram Mathematica `.nb` file that calculates and exports SAMPL7 distribution coefficients log *D*7.4 for participants that had submitted a ranked log *P* and a ranked pKa submission. 
The notebook gathers the predicted macroscopic acidity constants and the partition coefficients from [`pKa_submission_collection.csv`](../pKa/analysis/macrostate_analysis/analysis_outputs_ranked_submissions/pKa_submission_collection.csv) and [`logP_submission_collection.csv`](../logP/analysis/analysis_outputs_ranked_submissions/logP_submission_collection.csv), respectively. The log *D*7.4 is then calculated under the assumption that the ionic species cannot enter the organic phase [1]. Because the acidity constants listed in [`pKa_submission_collection.csv`](../pKa/analysis/macrostate_analysis/analysis_outputs_ranked_submissions/pKa_submission_collection.csv) do not contain information about the charge states of the protonated and deprotonated species, the consensus of models that had submitted macroscopic pKa values including the charge states was used to determine that eq. 4 should be used for all compounds. Notebook created by Nicolas Tielker. + - `logD_submission_collection.csv` - Contains log *D*7.4 predictions generated from log *P* and pKa predictions. + - `logD_predictions/` - Contains SAMPL-style submission files created from the log *D* data found in `logD_submission_collection.csv`. One reference method and one null method were added to this folder to be used as a comparison to other methods in the general SAMPL analysis. These submission-style files were used as input to the general SAMPL analysis script (`logD_analysis.py`) and the output can be found in `analysis_outputs_all_submissions/` and `analysis_outputs_ranked_submissions/`. +- [`logD_analysis.py`](logD_analysis.py) - Python script that parses submissions and performs the analysis. Provides two separate treatments: ranked blind predictions alone (output directory: [`analysis_outputs_ranked_submissions/`](analysis_outputs_ranked_submissions/)) and ranked and reference calculations together (output directory: [`analysis_outputs_all_submissions/`](analysis_outputs_all_submissions/)). 
+ Reference calculations are provided as reference/comparison methods. +- [`logD_analysis2.py`](logD_analysis2.py) - Python script that performs the analysis of molecular statistics (error statistics, MAE and RMSE, calculated across methods for each molecule). +- [`logD_experimental_values.csv`](logD_experimental_values.csv) - A CSV (`.csv`) table of potentiometric and shake-flask log *D* measurements of the 22 SAMPL molecules. +- [`analysis_outputs_ranked_submissions/`](analysis_outputs_ranked_submissions/) - This directory contains analysis outputs of ranked submissions only. + - `error_for_each_logD.pdf` - Violin plots that show error distribution of predictions related to each experimental log *D*. + - `logDCorrelationPlots/` - This directory contains plots of predicted vs. experimental log *D* values with linear regression line (blue) for each method. Files are named according to the submitted method name of each submission, which can be found in `statistics_table.csv`. In correlation plots, the dashed black line has a slope of 1. Dark and light green shaded areas indicate +-0.5 and +-1.0 log *D* unit error regions, respectively. + - `logDCorrelationPlotsWithSEM/` - This directory contains similar plots to the `logDCorrelationPlots/` directory with error bars added for Standard Error of the Mean (SEM) of experimental and predicted values for submissions that reported these values. Experimental log *D* SEM values are either too small to be able to see the horizontal error bars, or some of the experimental log *D* SEM values were not collected. + - `AbsoluteErrorPlots/` - This directory contains a bar plot for each method showing the absolute error for each log *D* prediction compared to the experimental value. + - `StatisticsTables/` - This directory contains machine-readable copies of the Statistics Table, bootstrap distributions of performance statistics, and overall performance comparison plots based on RMSE and MAE values. 
+ - `statistics.csv` - A table of performance statistics (RMSE, MAE, ME, R^2, linear regression slope(m), Kendall's Tau, and error slope(ES)) for all the submissions. + - `RMSE_vs_method_plot.pdf` + - `RMSE_vs_method_plot_colored_by_method_category.pdf` + - `RMSE_vs_method_plot_for_Physical_MM_category.pdf` + - `RMSE_vs_method_plot_for_Physical_QM_category.pdf` + - `RMSE_vs_method_plot_for_Empirical_category.pdf` + - `RMSE_vs_method_plot_for_Mixed_category.pdf` + - `RMSE_vs_method_plot_physical_methoods_colored_by_method_category.pdf` + - `MAE_vs_method_plot.pdf` + - `MAE_vs_method_plot_colored_by_method_category.pdf` + - `MAE_vs_method_plot_for_Physical_MM_category.pdf` + - `MAE_vs_method_plot_for_Physical_QM_category.pdf` + - `MAE_vs_method_plot_for_Empirical_category.pdf` + - `MAE_vs_method_plot_for_Mixed_category.pdf` + - `kendalls_tau_vs_method_plot.pdf` + - `MAE_vs_method_plot_physical_methoods_colored_by_method_category.pdf` + - `kendalls_tau_vs_method_plot_colored_by_method_category.pdf` + - `kendalls_tau_vs_method_plot_for_Physical_MM_category.pdf` + - `kendalls_tau_vs_method_plot_for_Physical_QM_category.pdf` + - `kendalls_tau_vs_method_plot_for_Empirical_category.pdf` + - `kendalls_tau_vs_method_plot_for_Mixed_category.pdf` + - `kendall_tau_vs_method_plot_physical_methoods_colored_by_method_category.pdf` + - `Rsquared_vs_method_plot.pdf` + - `Rsquared_vs_method_plot_colored_by_method_category.pdf` + - `Rsquared_vs_method_plot_colored_by_type.pdf` + - `Rsquared_vs_method_plot_for_Empirical_category.pdf` + - `Rsquared_vs_method_plot_for_Mixed_category.pdf` + - `Rsquared_vs_method_plot_for_Physical_MM_category.pdf` + - `Rsquared_vs_method_plot_for_Physical_QM_category.pdf` + - `Rsquared_vs_method_plot_physical_methoods_colored_by_method_category.pdf` + - `statistics_bootstrap_distributions.pdf` - Violin plots showing bootstrap distributions of performance statistics of each method. Each method is labelled according to the method name of the submission. 
+ + - `QQPlots/` - Quantile-Quantile plots for the analysis of model uncertainty predictions. + - `MolecularStatisticsTables/` - This directory contains tables and barplots of molecular statistics analysis (error statistics, MAE and RMSE, calculated across methods for each molecule). + - `MAE_vs_molecule_ID_plot.pdf` - Barplot of MAE calculated for each molecule averaged over all prediction methods. + - `RMSE_vs_molecule_ID_plot.pdf` - Barplot of RMSE calculated for each molecule averaged over all prediction methods. + - `molecular_error_statistics.csv` - MAE and RMSE statistics calculated for each molecule averaged over all prediction methods. 95% confidence intervals were calculated via bootstrapping (10000 samples). + - `molecular_MAE_comparison_between_method_categories.pdf` - Barplot of MAE calculated for each method category for each molecule averaged over all predictions in that method category. The colors of the bars indicate method categories. + - `molecular_error_distribution_ridge_plot_all_methods.pdf`: Error distribution of each molecule, based on predictions from all ranked methods. + - `molecular_error_distribution_ridge_plot_well_performing_methods.pdf`: Error distribution of each molecule, based on predictions from only those methods that were determined to be consistently well-performing. + - `Empirical/` - This directory contains a table and barplots of molecular statistics analysis calculated only for methods in the Empirical method category. + - `Physical_MM/` - This directory contains a table and barplots of molecular statistics analysis calculated only for methods in the Physical MM method category. + - `Physical_QM/` - This directory contains a table and barplots of molecular statistics analysis calculated only for methods in the Physical QM method category. 
+ +- [`analysis_outputs_all_submissions/`](analysis_outputs_all_submissions/) - Duplicates the [`analysis_outputs_ranked_submissions/`](analysis_outputs_ranked_submissions/) directory, but also includes reference calculations. Also includes the additional plots: + - `StatisticsTables/MAE_vs_method_plot_colored_by_type.pdf`: Barplot showing overall performance by MAE, with reference calculations colored differently. + - `StatisticsTables/RMSE_vs_method_plot_colored_by_type.pdf`: Barplot showing overall performance by RMSE, with reference calculations colored differently. +- [`analysis_different_pKa_logP_combos`](analysis_different_pKa_logP_combos) - Contains similar analysis to `analysis_outputs_all_submissions/` except it includes some additional pKa and log *P* combinations (for log *D* estimation). ## References -[1] Bannan, Caitlin C., Kalistyn H. Burley, Michael Chiu, Michael R. Shirts, Michael K. Gilson, and David L. Mobley. “Blind Prediction of Cyclohexane–water Distribution Coefficients from the SAMPL5 Challenge.” Journal of Computer-Aided Molecular Design 30, no. 11 (November 2016): 927–44. \ No newline at end of file +[1] Bannan, Caitlin C., Kalistyn H. Burley, Michael Chiu, Michael R. Shirts, Michael K. Gilson, and David L. Mobley. “Blind Prediction of Cyclohexane–water Distribution Coefficients from the SAMPL5 Challenge.” Journal of Computer-Aided Molecular Design 30, no. 11 (November 2016): 927–44. diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/README.md b/physical_property/logD/analysis_different_pKa_logP_combos/README.md new file mode 100644 index 00000000..619a13ba --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/README.md @@ -0,0 +1,42 @@ +## What's here +- [`analysis/`](analysis/) - Analysis of log *D* predictions. Analysis is similar to that found in `../analysis_outputs_all_submissions/` except it includes some additional pKa and log *P* combinations (for log *D* estimation). 
Method name *logP_experimental + EC_RISM* combines the experimental log *P* with the top performing pKa method (based on RMSE); method *logP_experimental + pKa_experimental* combines the experimental log *P* and pKa values; method *TFE MLR + EC_RISM* combines the best performing (based on RMSE) log *P* and pKa methods; method *TFE MLR + pKa_experimental* combines the best performing (based on RMSE) log *P* method with the experimental pKa; method *logP_experimental + DFT_M05-2X_SMD* combines the experimental log *P* with an average performing pKa method; and method *NES-1 (GAFF2/OPC3) B + pKa_experimental* combines a log *P* method with average performance with the experimental pKa. + - `error_for_each_logD.pdf` - Violin plots that show error distribution of predictions related to each experimental log *D*. + - `logDCorrelationPlots/` - This directory contains plots of predicted vs. experimental log *D* values with linear regression line (blue) for each method. Files are named according to the submitted method name of each submission, which can be found in `statistics_table.csv`. In correlation plots, the dashed black line has a slope of 1. Dark and light green shaded areas indicate +-0.5 and +-1.0 log *D* unit error regions, respectively. + - `logDCorrelationPlotsWithSEM/` - This directory contains similar plots to the `logDCorrelationPlots/` directory with error bars added for Standard Error of the Mean (SEM) of experimental and predicted values for submissions that reported these values. Experimental log *D* SEM values are either too small to be able to see the horizontal error bars, or some of the experimental log *D* SEM values were not collected. + - `AbsoluteErrorPlots/` - This directory contains a bar plot for each method showing the absolute error for each log *D* prediction compared to the experimental value. 
+ - `StatisticsTables/` - This directory contains machine-readable copies of the Statistics Table, bootstrap distributions of performance statistics, and overall performance comparison plots based on RMSE and MAE values. + - `statistics.csv` - A table of performance statistics (RMSE, MAE, ME, R^2, linear regression slope(m), Kendall's Tau, and error slope(ES)) for all the submissions. + - `RMSE_vs_method_plot.pdf` + - `RMSE_vs_method_plot_colored_by_method_category.pdf` + - `RMSE_vs_method_plot_colored_by_type.pdf` + - `MAE_vs_method_plot.pdf` + - `MAE_vs_method_plot_colored_by_method_category.pdf` + - `MAE_vs_method_plot_colored_by_type.pdf` + - `kendalls_tau_vs_method_plot.pdf` + - `kendalls_tau_vs_method_plot_colored_by_method_category.pdf` + - `kendalls_tau_vs_method_plot_colored_by_type.pdf` + - `Rsquared_vs_method_plot.pdf` + - `Rsquared_vs_method_plot_colored_by_method_category.pdf` + - `Rsquared_vs_method_plot_colored_by_type.pdf` + - `QQPlots/` - Quantile-Quantile plots for the analysis of model uncertainty predictions. + - `MolecularStatisticsTables/` - This directory contains tables and barplots of molecular statistics analysis (error statistics, MAE and RMSE, calculated across methods for each molecule). + - `MAE_vs_molecule_ID_plot.pdf` - Barplot of MAE calculated for each molecule averaged over all prediction methods. + - `RMSE_vs_molecule_ID_plot.pdf` - Barplot of RMSE calculated for each molecule averaged over all prediction methods. + - `molecular_error_statistics.csv` - MAE and RMSE statistics calculated for each molecule averaged over all prediction methods. 95% confidence intervals were calculated via bootstrapping (10000 samples). + - `Empirical/` - This directory contains a table and barplots of molecular statistics analysis calculated only for methods in the empirical method category. 
+ - `Empirical_Experimental_pKa/` - This directory contains a table and barplots of molecular statistics analysis calculated only for methods combining an empirical method and experimental pKa. + - `Empirical_QM/` - This directory contains a table and barplots of molecular statistics analysis calculated only for methods combining an empirical and QM method. + - `Experimental_logP_QM/` - This directory contains a table and barplots of molecular statistics analysis calculated only for methods combining experimental log *P* and QM predictions. + - `Experimental_only/` + - `Physical_MM_Experimental_pKa/` - This directory contains a table and barplots of molecular statistics analysis calculated only for methods combining MM methods and experimental pKa. + - `Physical_MM_QM_LEC/` - This directory contains a table and barplots of molecular statistics analysis calculated only for methods combining MM and QM+LEC. + - `Physical_QM/` - This directory contains a table and barplots of molecular statistics analysis calculated only for methods in the physical QM category. +- [`submission_collection_files/`](logD_submission_collection.csv) - Contains analysis of log *D*7.4 predictions generated from log *P* and pKa predictions using the `calc_logD.nb` notebook found in `../calculate_logD`. + - `experimental_pKa_and_logP_combined.csv` - log *D* predictions generated from the experimental log *P* and pKa values. + - `best_pKa_and_logP_method_combined.csv` - log *D* predictions generated from the top performing log *P* and pKa predictions. + - `experimental_logP_and_participant_pKa_predictions_combined.csv` - log *D* predictions generated from experimental log *P* and participant pKa predictions. + - `experimental_pKa_and_participant_logP_predictions_combined.csv` - log *D* predictions generated from experimental pKa and participant log *P* predictions. 
+- [`make_input.ipynb`](make_input.ipynb) - Example of a notebook that takes the log *D* data in `submission_collection_files/` and converts it to SAMPL style submission format to be used as input for analysis. +- [`input_files/`](input_files/) - Contains SAMPL style submission files that were created from the log *D* data generated by [`calc_logD.nb`](calc_logD.nb). Also contains files from `../calculate_logD/logD_predictions/`. These were used as input for the general SAMPL analysis. +- [`user-map2.csv`](user-map2.csv) - manually created user map of all log *D* estimate files. Used as input for the general SAMPL analysis scripts. +- [`experimental_value_files/`](experimental_value_files/) - Contains files that have log *P* and pKa values. Used as input for [`calc_logD.nb`](calc_logD.nb). diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/EC_RISM_wet_+_EC_RISM.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/EC_RISM_wet_+_EC_RISM.pdf new file mode 100644 index 00000000..e4121186 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/EC_RISM_wet_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf new file mode 100644 index 00000000..ea738aef Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/NES-1_GAFF2_OPC3_B_+_pKa_experimental.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/NES-1_GAFF2_OPC3_B_+_pKa_experimental.pdf new file mode 
100644 index 00000000..a913017b Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/NES-1_GAFF2_OPC3_B_+_pKa_experimental.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/NULL0.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/NULL0.pdf new file mode 100644 index 00000000..40f28bd0 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/NULL0.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/REF0_ChemAxon.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/REF0_ChemAxon.pdf new file mode 100644 index 00000000..be972859 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/REF0_ChemAxon.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf new file mode 100644 index 00000000..f0169213 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf new file mode 100644 index 00000000..4784e358 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf differ diff --git 
a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf new file mode 100644 index 00000000..5cd7147e Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE_MLR_+_EC_RISM.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE_MLR_+_EC_RISM.pdf new file mode 100644 index 00000000..963ca957 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE_MLR_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE_MLR_+_pKa_experimental.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE_MLR_+_pKa_experimental.pdf new file mode 100644 index 00000000..01273018 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE_MLR_+_pKa_experimental.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf new file mode 100644 index 00000000..3001738c Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/logP_experimental_+_DFT_M05-2X_SMD.pdf 
b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/logP_experimental_+_DFT_M05-2X_SMD.pdf new file mode 100644 index 00000000..0bbdc763 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/logP_experimental_+_DFT_M05-2X_SMD.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/logP_experimental_+_EC_RISM.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/logP_experimental_+_EC_RISM.pdf new file mode 100644 index 00000000..f38df3a1 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/logP_experimental_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/logP_experimental_+_pKa_experimental.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/logP_experimental_+_pKa_experimental.pdf new file mode 100644 index 00000000..95171994 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/AbsoluteErrorPlots/logP_experimental_+_pKa_experimental.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..560f569d Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical/RMSE_vs_molecule_ID_plot.pdf 
b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..1cc2d1f5 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical/molecular_error_statistics_for_Empirical_methods.csv b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical/molecular_error_statistics_for_Empirical_methods.csv new file mode 100644 index 00000000..2197abf9 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical/molecular_error_statistics_for_Empirical_methods.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +19,SM44,0.155,0.06,0.25,0.18179658962697842,0.06,0.25 +18,SM43,0.475,0.42,0.53,0.47817360864020925,0.42,0.53 +21,SM46,0.5199999999999999,0.3499999999999999,0.69,0.5470831746635971,0.3499999999999999,0.69 +16,SM41,0.63,0.42,0.84,0.6640783086353597,0.42,0.84 +1,SM26,0.69,0.51,0.87,0.7130918594402827,0.51,0.87 +20,SM45,0.795,0.53,1.06,0.8380035799446206,0.53,1.06 +11,SM36,0.81,0.76,0.8600000000000001,0.8115417426109394,0.76,0.8600000000000001 +0,SM25,0.9800000000000001,0.09,1.87,1.3238202294873729,0.09,1.87 +14,SM39,0.9849999999999999,0.07999999999999985,1.89,1.3376284985002376,0.07999999999999985,1.89 +10,SM35,1.085,0.87,1.3,1.1060967407962108,0.87,1.3 +17,SM42,1.12,0.99,1.2500000000000002,1.1275194011634568,0.99,1.2500000000000002 +3,SM28,1.13,1.0799999999999998,1.18,1.1311056537742175,1.0799999999999998,1.18 +2,SM27,1.145,0.7300000000000001,1.56,1.2178875153313626,0.7300000000000001,1.56 +13,SM38,1.155,1.03,1.28,1.161744378079791,1.03,1.28 
+4,SM29,1.23,0.8500000000000001,1.61,1.2873616430514,0.8500000000000001,1.61 +5,SM30,1.405,0.050000000000000266,2.76,1.9519349374402826,0.050000000000000266,2.76 +6,SM31,1.515,1.0699999999999998,1.96,1.5790028499024311,1.0699999999999998,1.96 +12,SM37,1.605,1.45,1.76,1.612467053926994,1.45,1.76 +8,SM33,1.7,0.43999999999999995,2.96,2.1160340261914503,0.43999999999999995,2.96 +7,SM32,1.77,1.0999999999999999,2.44,1.8925643978475342,1.0999999999999999,2.44 +15,SM40,1.88,1.82,1.94,1.880957203128237,1.82,1.94 +9,SM34,2.095,1.36,2.83,2.220191433187688,1.36,2.83 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_Experimental_pKa/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_Experimental_pKa/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..86418374 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_Experimental_pKa/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_Experimental_pKa/RMSE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_Experimental_pKa/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..1f3ee1a0 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_Experimental_pKa/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_Experimental_pKa/molecular_error_statistics_for_Empirical_Experimental_pKa_methods.csv 
b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_Experimental_pKa/molecular_error_statistics_for_Empirical_Experimental_pKa_methods.csv new file mode 100644 index 00000000..36f4b4f7 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_Experimental_pKa/molecular_error_statistics_for_Empirical_Experimental_pKa_methods.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +12,SM37,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009 +5,SM30,0.029999999999999805,0.029999999999999805,0.029999999999999805,0.029999999999999805,0.029999999999999805,0.029999999999999805 +2,SM27,0.09000000000000008,0.09000000000000008,0.09000000000000008,0.09000000000000008,0.09000000000000008,0.09000000000000008 +13,SM38,0.09999999999999998,0.09999999999999998,0.09999999999999998,0.09999999999999998,0.09999999999999998,0.09999999999999998 +20,SM45,0.10999999999999988,0.10999999999999988,0.10999999999999988,0.10999999999999988,0.10999999999999988,0.10999999999999988 +4,SM29,0.14000000000000012,0.14000000000000012,0.14000000000000012,0.14000000000000012,0.14000000000000012,0.14000000000000012 +19,SM44,0.22999999999999998,0.22999999999999998,0.22999999999999998,0.22999999999999998,0.22999999999999998,0.22999999999999998 +21,SM46,0.24999999999999994,0.24999999999999994,0.24999999999999994,0.24999999999999994,0.24999999999999994,0.24999999999999994 +17,SM42,0.27,0.27,0.27,0.27,0.27,0.27 +8,SM33,0.28000000000000025,0.28000000000000025,0.28000000000000025,0.28000000000000025,0.28000000000000025,0.28000000000000025 +16,SM41,0.32,0.32,0.32,0.32,0.32,0.32 +14,SM39,0.32000000000000006,0.32000000000000006,0.32000000000000006,0.32000000000000006,0.32000000000000006,0.32000000000000006 
+6,SM31,0.41999999999999993,0.41999999999999993,0.41999999999999993,0.41999999999999993,0.41999999999999993,0.41999999999999993 +1,SM26,0.44000000000000006,0.44000000000000006,0.44000000000000006,0.44000000000000006,0.44000000000000006,0.44000000000000006 +7,SM32,0.47,0.47,0.47,0.47,0.47,0.47 +0,SM25,0.4700000000000001,0.4700000000000001,0.4700000000000001,0.4700000000000001,0.4700000000000001,0.4700000000000001 +10,SM35,0.4900000000000001,0.4900000000000001,0.4900000000000001,0.4900000000000001,0.4900000000000001,0.4900000000000001 +3,SM28,0.6900000000000002,0.6900000000000002,0.6900000000000002,0.6900000000000002,0.6900000000000002,0.6900000000000002 +9,SM34,0.7800000000000002,0.7800000000000002,0.7800000000000002,0.7800000000000002,0.7800000000000002,0.7800000000000002 +15,SM40,0.81,0.81,0.81,0.81,0.81,0.81 +18,SM43,1.8299999999999998,1.8299999999999998,1.8299999999999998,1.8299999999999998,1.8299999999999998,1.8299999999999998 +11,SM36,1.8699999999999999,1.8699999999999999,1.8699999999999999,1.8699999999999999,1.8699999999999999,1.8699999999999999 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_QM/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_QM/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..01c08a32 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_QM/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_QM/RMSE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_QM/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..4260548b Binary files /dev/null and 
b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_QM/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_QM/molecular_error_statistics_for_Empirical_QM_methods.csv b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_QM/molecular_error_statistics_for_Empirical_QM_methods.csv new file mode 100644 index 00000000..1aaf3d4e --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Empirical_QM/molecular_error_statistics_for_Empirical_QM_methods.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +12,SM37,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009 +5,SM30,0.029999999999999805,0.029999999999999805,0.029999999999999805,0.029999999999999805,0.029999999999999805,0.029999999999999805 +2,SM27,0.09000000000000008,0.09000000000000008,0.09000000000000008,0.09000000000000008,0.09000000000000008,0.09000000000000008 +13,SM38,0.10999999999999999,0.10999999999999999,0.10999999999999999,0.10999999999999999,0.10999999999999999,0.10999999999999999 +4,SM29,0.14000000000000012,0.14000000000000012,0.14000000000000012,0.14000000000000012,0.14000000000000012,0.14000000000000012 +21,SM46,0.15999999999999992,0.15999999999999992,0.15999999999999992,0.15999999999999992,0.15999999999999992,0.15999999999999992 +1,SM26,0.16999999999999993,0.16999999999999993,0.16999999999999993,0.16999999999999993,0.16999999999999993,0.16999999999999993 +16,SM41,0.18999999999999997,0.18999999999999997,0.18999999999999997,0.18999999999999997,0.18999999999999997,0.18999999999999997 +20,SM45,0.21999999999999997,0.21999999999999997,0.21999999999999997,0.21999999999999997,0.21999999999999997,0.21999999999999997 
+19,SM44,0.22000000000000003,0.22000000000000003,0.22000000000000003,0.22000000000000003,0.22000000000000003,0.22000000000000003 +14,SM39,0.28,0.28,0.28,0.28,0.28,0.28 +8,SM33,0.28000000000000025,0.28000000000000025,0.28000000000000025,0.28000000000000025,0.28000000000000025,0.28000000000000025 +6,SM31,0.41999999999999993,0.41999999999999993,0.41999999999999993,0.41999999999999993,0.41999999999999993,0.41999999999999993 +0,SM25,0.45999999999999996,0.45999999999999996,0.45999999999999996,0.45999999999999996,0.45999999999999996,0.45999999999999996 +7,SM32,0.47,0.47,0.47,0.47,0.47,0.47 +10,SM35,0.4900000000000001,0.4900000000000001,0.4900000000000001,0.4900000000000001,0.4900000000000001,0.4900000000000001 +3,SM28,0.6900000000000002,0.6900000000000002,0.6900000000000002,0.6900000000000002,0.6900000000000002,0.6900000000000002 +9,SM34,0.7800000000000002,0.7800000000000002,0.7800000000000002,0.7800000000000002,0.7800000000000002,0.7800000000000002 +15,SM40,0.81,0.81,0.81,0.81,0.81,0.81 +18,SM43,0.97,0.97,0.97,0.97,0.97,0.97 +17,SM42,1.24,1.24,1.24,1.24,1.24,1.24 +11,SM36,1.8699999999999999,1.8699999999999999,1.8699999999999999,1.8699999999999999,1.8699999999999999,1.8699999999999999 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_logP_QM/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_logP_QM/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..6e19abeb Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_logP_QM/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_logP_QM/RMSE_vs_molecule_ID_plot.pdf 
b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_logP_QM/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..1934a29c Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_logP_QM/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_logP_QM/molecular_error_statistics_for_Experimental_logP_QM_methods.csv b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_logP_QM/molecular_error_statistics_for_Experimental_logP_QM_methods.csv new file mode 100644 index 00000000..c2777037 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_logP_QM/molecular_error_statistics_for_Experimental_logP_QM_methods.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +2,SM27,0.0,0.0,0.0,0.0,0.0,0.0 +3,SM28,0.0,0.0,0.0,0.0,0.0,0.0 +6,SM31,0.0,0.0,0.0,0.0,0.0,0.0 +7,SM32,0.0,0.0,0.0,0.0,0.0,0.0 +8,SM33,0.0,0.0,0.0,0.0,0.0,0.0 +9,SM34,0.0,0.0,0.0,0.0,0.0,0.0 +12,SM37,0.0,0.0,0.0,0.0,0.0,0.0 +11,SM36,0.0,0.0,0.0,0.0,0.0,0.0 +5,SM30,0.004999999999999893,0.0,0.009999999999999787,0.007071067811865324,0.0,0.009999999999999787 +10,SM35,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009 +15,SM40,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009 +4,SM29,0.015000000000000013,0.0,0.030000000000000027,0.021213203435596444,0.0,0.030000000000000027 +0,SM25,0.39,0.0,0.78,0.5515432893255071,0.0,0.78 +18,SM43,0.46499999999999997,0.43,0.5,0.46631534394656154,0.43,0.5 
+19,SM44,0.5549999999999999,0.009999999999999995,1.0999999999999999,0.7778495998584816,0.009999999999999995,1.0999999999999999 +21,SM46,0.5650000000000001,0.10000000000000009,1.03,0.7317444909256235,0.10000000000000009,1.03 +13,SM38,0.66,0.010000000000000009,1.31,0.9263368717696603,0.010000000000000009,1.31 +1,SM26,0.75,0.030000000000000027,1.47,1.0396634070698074,0.030000000000000027,1.47 +14,SM39,0.8049999999999998,0.039999999999999813,1.5699999999999998,1.1105178971993201,0.039999999999999813,1.5699999999999998 +20,SM45,0.805,0.11999999999999988,1.4900000000000002,1.0570004730367912,0.11999999999999988,1.4900000000000002 +16,SM41,0.8350000000000001,0.6700000000000002,1.0,0.8511462858991984,0.6700000000000002,1.0 +17,SM42,0.905,0.76,1.05,0.9165424158215483,0.76,1.05 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_only/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_only/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..17dc89c4 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_only/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_only/RMSE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_only/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..835c0d31 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_only/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_only/molecular_error_statistics_for_Experimental_only_methods.csv 
b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_only/molecular_error_statistics_for_Experimental_only_methods.csv new file mode 100644 index 00000000..be8214c2 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Experimental_only/molecular_error_statistics_for_Experimental_only_methods.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +19,SM44,0.0,0.0,0.0,0.0,0.0,0.0 +2,SM27,0.0,0.0,0.0,0.0,0.0,0.0 +3,SM28,0.0,0.0,0.0,0.0,0.0,0.0 +4,SM29,0.0,0.0,0.0,0.0,0.0,0.0 +5,SM30,0.0,0.0,0.0,0.0,0.0,0.0 +6,SM31,0.0,0.0,0.0,0.0,0.0,0.0 +7,SM32,0.0,0.0,0.0,0.0,0.0,0.0 +8,SM33,0.0,0.0,0.0,0.0,0.0,0.0 +9,SM34,0.0,0.0,0.0,0.0,0.0,0.0 +11,SM36,0.0,0.0,0.0,0.0,0.0,0.0 +12,SM37,0.0,0.0,0.0,0.0,0.0,0.0 +13,SM38,0.0,0.0,0.0,0.0,0.0,0.0 +14,SM39,0.0,0.0,0.0,0.0,0.0,0.0 +10,SM35,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009 +20,SM45,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009 +15,SM40,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009,0.010000000000000009 +21,SM46,0.01000000000000012,0.01000000000000012,0.01000000000000012,0.01000000000000012,0.01000000000000012,0.01000000000000012 +17,SM42,0.07999999999999996,0.07999999999999996,0.07999999999999996,0.07999999999999996,0.07999999999999996,0.07999999999999996 +0,SM25,0.15,0.15,0.15,0.15,0.15,0.15 +1,SM26,0.58,0.58,0.58,0.58,0.58,0.58 +16,SM41,1.1800000000000002,1.1800000000000002,1.1800000000000002,1.1800000000000002,1.1800000000000002,1.1800000000000002 +18,SM43,1.3599999999999999,1.3599999999999999,1.3599999999999999,1.3599999999999999,1.3599999999999999,1.3599999999999999 diff --git 
a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..3360e29f Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_Experimental_pKa/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_Experimental_pKa/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..1a13868c Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_Experimental_pKa/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_Experimental_pKa/RMSE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_Experimental_pKa/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..09415548 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_Experimental_pKa/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_Experimental_pKa/molecular_error_statistics_for_Physical_MM_Experimental_pKa_methods.csv b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_Experimental_pKa/molecular_error_statistics_for_Physical_MM_Experimental_pKa_methods.csv new file mode 100644 index 00000000..9dcac6df --- 
/dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_Experimental_pKa/molecular_error_statistics_for_Physical_MM_Experimental_pKa_methods.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +2,SM27,0.1200000000000001,0.1200000000000001,0.1200000000000001,0.1200000000000001,0.1200000000000001,0.1200000000000001 +7,SM32,0.1299999999999999,0.1299999999999999,0.1299999999999999,0.1299999999999999,0.1299999999999999,0.1299999999999999 +11,SM36,0.13,0.13,0.13,0.13,0.13,0.13 +14,SM39,0.13000000000000012,0.13000000000000012,0.13000000000000012,0.13000000000000012,0.13000000000000012,0.13000000000000012 +3,SM28,0.20999999999999996,0.20999999999999996,0.20999999999999996,0.20999999999999996,0.20999999999999996,0.20999999999999996 +12,SM37,0.33999999999999986,0.33999999999999986,0.33999999999999986,0.33999999999999986,0.33999999999999986,0.33999999999999986 +4,SM29,0.3900000000000001,0.3900000000000001,0.3900000000000001,0.3900000000000001,0.3900000000000001,0.3900000000000001 +5,SM30,0.7900000000000005,0.7900000000000005,0.7900000000000005,0.7900000000000005,0.7900000000000005,0.7900000000000005 +9,SM34,0.8300000000000001,0.8300000000000001,0.8300000000000001,0.8300000000000001,0.8300000000000001,0.8300000000000001 +10,SM35,0.94,0.94,0.94,0.94,0.94,0.94 +16,SM41,0.94,0.94,0.94,0.94,0.94,0.94 +0,SM25,1.0599999999999998,1.0599999999999998,1.0599999999999998,1.0599999999999998,1.0599999999999998,1.0599999999999998 +6,SM31,1.1600000000000001,1.1600000000000001,1.1600000000000001,1.1600000000000001,1.1600000000000001,1.1600000000000001 +15,SM40,1.2000000000000002,1.2000000000000002,1.2000000000000002,1.2000000000000002,1.2000000000000002,1.2000000000000002 +19,SM44,1.3299999999999998,1.3299999999999998,1.3299999999999998,1.3299999999999998,1.3299999999999998,1.3299999999999998 
+13,SM38,1.3800000000000001,1.3800000000000001,1.3800000000000001,1.3800000000000001,1.3800000000000001,1.3800000000000001 +20,SM45,1.5899999999999999,1.5899999999999999,1.5899999999999999,1.5899999999999999,1.5899999999999999,1.5899999999999999 +8,SM33,1.67,1.67,1.67,1.67,1.67,1.67 +18,SM43,1.96,1.96,1.96,1.96,1.96,1.96 +21,SM46,2.0100000000000002,2.0100000000000002,2.0100000000000002,2.0100000000000002,2.0100000000000002,2.0100000000000002 +1,SM26,2.05,2.05,2.05,2.05,2.05,2.05 +17,SM42,2.7,2.7,2.7,2.7,2.7,2.7 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_QM_LEC/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_QM_LEC/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..e778f332 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_QM_LEC/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_QM_LEC/RMSE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_QM_LEC/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..b5eeafd9 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_QM_LEC/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_QM_LEC/molecular_error_statistics_for_Physical_MM_QM_LEC_methods.csv b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_QM_LEC/molecular_error_statistics_for_Physical_MM_QM_LEC_methods.csv new file mode 100644 index 00000000..686a5999 --- /dev/null +++ 
b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_MM_QM_LEC/molecular_error_statistics_for_Physical_MM_QM_LEC_methods.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +0,SM25,0.29000000000000004,0.29000000000000004,0.29000000000000004,0.29000000000000004,0.29000000000000004,0.29000000000000004 +3,SM28,0.6700000000000002,0.6700000000000002,0.6700000000000002,0.6700000000000002,0.6700000000000002,0.6700000000000002 +7,SM32,1.17,1.17,1.17,1.17,1.17,1.17 +20,SM45,1.48,1.48,1.48,1.48,1.48,1.48 +9,SM34,1.6,1.6,1.6,1.6,1.6,1.6 +5,SM30,1.6899999999999997,1.6899999999999997,1.6899999999999997,1.6899999999999997,1.6899999999999997,1.6899999999999997 +17,SM42,1.7,1.7,1.7,1.7,1.7,1.7 +11,SM36,1.79,1.79,1.79,1.79,1.79,1.79 +14,SM39,1.8099999999999998,1.8099999999999998,1.8099999999999998,1.8099999999999998,1.8099999999999998,1.8099999999999998 +6,SM31,1.94,1.94,1.94,1.94,1.94,1.94 +2,SM27,2.08,2.08,2.08,2.08,2.08,2.08 +18,SM43,2.19,2.19,2.19,2.19,2.19,2.19 +4,SM29,2.46,2.46,2.46,2.46,2.46,2.46 +8,SM33,2.49,2.49,2.49,2.49,2.49,2.49 +1,SM26,2.58,2.58,2.58,2.58,2.58,2.58 +16,SM41,2.81,2.81,2.81,2.81,2.81,2.81 +10,SM35,2.86,2.86,2.86,2.86,2.86,2.86 +12,SM37,2.88,2.88,2.88,2.88,2.88,2.88 +21,SM46,2.89,2.89,2.89,2.89,2.89,2.89 +13,SM38,2.9699999999999998,2.9699999999999998,2.9699999999999998,2.9699999999999998,2.9699999999999998,2.9699999999999998 +15,SM40,3.0,3.0,3.0,3.0,3.0,3.0 +19,SM44,3.52,3.52,3.52,3.52,3.52,3.52 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_QM/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_QM/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..fffe6e48 Binary files /dev/null and 
b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_QM/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_QM/RMSE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_QM/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..ab27e391 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_QM/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_QM/molecular_error_statistics_for_Physical_QM_methods.csv b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_QM/molecular_error_statistics_for_Physical_QM_methods.csv new file mode 100644 index 00000000..ef1e7f60 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/Physical_QM/molecular_error_statistics_for_Physical_QM_methods.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +8,SM33,0.9079999999999998,0.28600000000000014,1.7179999999999995,1.2139522231125899,0.30472938814627,1.8892802862465903 +2,SM27,0.95,0.502,1.35,1.064903751519357,0.6421993459977984,1.3633854920747837 +1,SM26,1.018,0.404,1.722,1.2651719250757978,0.4915689168366934,1.8569652662341316 +7,SM32,1.082,0.568,1.54,1.2168730418576952,0.7479572180278763,1.549877414507354 +16,SM41,1.1700000000000002,0.5780000000000001,1.762,1.3511402591885122,0.6340189271622733,1.813934949219514 +10,SM35,1.218,0.404,2.032,1.562005121630528,0.74677975334097,2.1282434071318064 +6,SM31,1.2659999999999998,0.7339999999999999,1.7560000000000002,1.3821070870232883,0.8930845424706442,1.813764041985616 
+5,SM30,1.2879999999999998,0.9640000000000001,1.612,1.3424902234280887,0.9802244640897307,1.6435145268600457 +3,SM28,1.302,0.774,1.782,1.4254753593100091,0.9201195574489219,1.793694511336866 +17,SM42,1.348,0.30399999999999994,2.5420000000000003,1.8534076723700053,0.38042081961953655,2.8410878198323966 +4,SM29,1.4,0.664,2.136,1.6300306745579975,0.8248151308020483,2.1525984298052436 +20,SM45,1.416,0.836,2.012,1.582302120329743,0.9162095830103504,2.1453764238473396 +9,SM34,1.4200000000000004,0.8260000000000002,2.0160000000000005,1.5846640022414846,0.8979198182465962,2.0833866659840177 +11,SM36,1.486,0.8280000000000001,2.286,1.7107133015207427,0.8569013945606577,2.455056007507771 +19,SM44,1.564,0.7780000000000001,2.3720000000000003,1.8267347919169876,0.9353181276977369,2.5689453088767773 +18,SM43,1.8059999999999998,0.788,2.824,2.1421157765162926,0.8823151364450231,2.9181603794171425 +21,SM46,1.9280000000000002,1.348,2.5479999999999996,2.054273594242013,1.3932408262751992,2.6431799030712986 +12,SM37,2.0940000000000003,1.25,2.8440000000000003,2.280539409876532,1.505277383075957,2.850915642385793 +14,SM39,2.432,1.4040000000000004,3.282,2.6723622508933924,1.765548073545436,3.4432687957811248 +15,SM40,2.7,1.44,3.7980000000000005,3.0221714048015214,1.8936367127831042,3.8027647836804213 +13,SM38,2.8259999999999996,1.338,4.206,3.2715592612697697,1.832129908057832,4.247792367807071 +0,SM25,5.132,1.5559999999999998,11.7,8.333695458798575,1.6073083089438691,14.180846237090366 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/RMSE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..1d892e76 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/RMSE_vs_molecule_ID_plot.pdf differ diff --git 
a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/molecular_error_statistics.csv b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/molecular_error_statistics.csv new file mode 100644 index 00000000..c67e4d5a --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/MolecularStatisticsTables/molecular_error_statistics.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +2,SM27,0.6728571428571429,0.32785714285714285,1.0399999999999998,0.9633868826770924,0.5704634707223343,1.2845677427501774 +3,SM28,0.7878571428571428,0.4707142857142857,1.125,1.0058365388357806,0.6712195723350999,1.3008403877054686 +7,SM32,0.7992857142857143,0.4221428571428571,1.197142857142857,1.0821638376089693,0.6524404734752216,1.4588620810167667 +5,SM30,0.8428571428571427,0.4142857142857143,1.3078571428571428,1.1986182521076985,0.7178887697368483,1.6211724152600178 +4,SM29,0.9014285714285715,0.45071428571428573,1.4021428571428574,1.2773689477090904,0.7656229956241984,1.693333567679024 +8,SM33,0.9042857142857141,0.4035714285714286,1.457142857142857,1.3487931113194702,0.6953827312042936,1.8409353212817818 +10,SM35,0.9335714285714286,0.48142857142857143,1.4407142857142858,1.3144770823411112,0.7504046527422008,1.7890041124283964 +6,SM31,0.95,0.5742857142857142,1.3342857142857143,1.195216656988538,0.8343603195605258,1.4990663761154808 +20,SM45,0.9778571428571429,0.5935714285714286,1.3814285714285717,1.2228917718728367,0.8219706286028193,1.5918228544659108 +1,SM26,0.9849999999999998,0.5892857142857143,1.424285714285714,1.2705426287333186,0.7946023264141989,1.675397607392688 +16,SM41,1.0157142857142856,0.6621428571428571,1.4264285714285716,1.2469792070668793,0.7768158450642323,1.6738194133691449 +19,SM44,1.0385714285714287,0.5078571428571429,1.6700000000000002,1.5170836684714712,0.76758433692052,2.1324683886452873 
+11,SM36,1.0507142857142855,0.6,1.5307142857142857,1.3670797865732427,0.8798376473613105,1.8077333874219395 +9,SM34,1.0914285714285714,0.6585714285714286,1.5542857142857147,1.3856406460551018,0.9098194483365526,1.8042271792337337 +17,SM42,1.1985714285714284,0.7299999999999999,1.735714285714286,1.5398330150655022,0.9355136098880199,2.0771271369025883 +12,SM37,1.2085714285714284,0.6228571428571428,1.845,1.6821457385477294,1.0778516993140965,2.176515892101739 +21,SM46,1.2235714285714288,0.7264285714285713,1.770714285714286,1.5867779572814482,1.0395431688967995,2.058215246275277 +14,SM39,1.3057142857142858,0.7021428571428572,1.9714285714285713,1.7973472516064175,1.1292475370794484,2.3863765120246336 +18,SM43,1.3728571428571428,0.9192857142857143,1.8714285714285712,1.660735637343541,1.1180020444652912,2.146520440154251 +13,SM38,1.5942857142857143,0.8414285714285715,2.4378571428571427,2.2148589119851403,1.2510738244746848,3.001832773490222 +15,SM40,1.6507142857142856,0.9428571428571428,2.386428571428571,2.1463607072703987,1.3821799552053375,2.7703119989323532 +0,SM25,2.202142857142857,0.6599999999999999,4.841428571428571,5.021585549263442,0.9220667778730254,8.53933921833032 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/EC_RISM_wet_+_EC_RISM_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/EC_RISM_wet_+_EC_RISM_QQ.pdf new file mode 100644 index 00000000..ac686522 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/EC_RISM_wet_+_EC_RISM_QQ.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected_QQ.pdf new file mode 100644 index 00000000..e8c7e46a Binary files /dev/null and 
b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected_QQ.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/NES-1_GAFF2_OPC3_B_+_pKa_experimental_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/NES-1_GAFF2_OPC3_B_+_pKa_experimental_QQ.pdf new file mode 100644 index 00000000..334a41bf Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/NES-1_GAFF2_OPC3_B_+_pKa_experimental_QQ.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/NULL0_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/NULL0_QQ.pdf new file mode 100644 index 00000000..e385e40b Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/NULL0_QQ.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/QQplot_dict.pickle b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/QQplot_dict.pickle new file mode 100644 index 00000000..d3aea90d Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/QQplot_dict.pickle differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/REF0_ChemAxon_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/REF0_ChemAxon_QQ.pdf new file mode 100644 index 00000000..07e5f3c3 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/REF0_ChemAxon_QQ.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM_QQ.pdf new file mode 100644 index 00000000..cfd56ef0 Binary 
files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM_QQ.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water_QQ.pdf new file mode 100644 index 00000000..7a7f32ed Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water_QQ.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE_IEFPCM_MST_+_IEFPCM_MST_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE_IEFPCM_MST_+_IEFPCM_MST_QQ.pdf new file mode 100644 index 00000000..da7763f7 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE_IEFPCM_MST_+_IEFPCM_MST_QQ.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE_MLR_+_EC_RISM_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE_MLR_+_EC_RISM_QQ.pdf new file mode 100644 index 00000000..01b48913 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE_MLR_+_EC_RISM_QQ.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE_MLR_+_pKa_experimental_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE_MLR_+_pKa_experimental_QQ.pdf new file mode 100644 index 00000000..156b60fe Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE_MLR_+_pKa_experimental_QQ.pdf differ diff --git 
a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD_QQ.pdf new file mode 100644 index 00000000..019c341c Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD_QQ.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/logP_experimental_+_DFT_M05-2X_SMD_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/logP_experimental_+_DFT_M05-2X_SMD_QQ.pdf new file mode 100644 index 00000000..79ce2b4f Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/logP_experimental_+_DFT_M05-2X_SMD_QQ.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/logP_experimental_+_EC_RISM_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/logP_experimental_+_EC_RISM_QQ.pdf new file mode 100644 index 00000000..e43b5acc Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/logP_experimental_+_EC_RISM_QQ.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/logP_experimental_+_pKa_experimental_QQ.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/logP_experimental_+_pKa_experimental_QQ.pdf new file mode 100644 index 00000000..9f6ac06c Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/QQPlots/logP_experimental_+_pKa_experimental_QQ.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/MAE_vs_method_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/MAE_vs_method_plot.pdf new file mode 100644 
index 00000000..51a5a87a Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/MAE_vs_method_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/MAE_vs_method_plot_colored_by_method_category.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/MAE_vs_method_plot_colored_by_method_category.pdf new file mode 100644 index 00000000..cebaeb9c Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/MAE_vs_method_plot_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/MAE_vs_method_plot_colored_by_type.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/MAE_vs_method_plot_colored_by_type.pdf new file mode 100644 index 00000000..fa125078 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/MAE_vs_method_plot_colored_by_type.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/RMSE_vs_method_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/RMSE_vs_method_plot.pdf new file mode 100644 index 00000000..c7bba8d0 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/RMSE_vs_method_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/RMSE_vs_method_plot_colored_by_method_category.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/RMSE_vs_method_plot_colored_by_method_category.pdf new file mode 100644 index 00000000..3521c17c Binary files /dev/null and 
b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/RMSE_vs_method_plot_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/RMSE_vs_method_plot_colored_by_type.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/RMSE_vs_method_plot_colored_by_type.pdf new file mode 100644 index 00000000..1273cfdd Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/RMSE_vs_method_plot_colored_by_type.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/Rsquared_vs_method_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/Rsquared_vs_method_plot.pdf new file mode 100644 index 00000000..df71d95e Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/Rsquared_vs_method_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/Rsquared_vs_method_plot_colored_by_method_category.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/Rsquared_vs_method_plot_colored_by_method_category.pdf new file mode 100644 index 00000000..f0c8026f Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/Rsquared_vs_method_plot_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/Rsquared_vs_method_plot_colored_by_type.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/Rsquared_vs_method_plot_colored_by_type.pdf new file mode 100644 index 00000000..3baa6880 Binary files /dev/null and 
b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/Rsquared_vs_method_plot_colored_by_type.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/kendalls_tau_vs_method_plot.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/kendalls_tau_vs_method_plot.pdf new file mode 100644 index 00000000..3603b270 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/kendalls_tau_vs_method_plot.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_method_category.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_method_category.pdf new file mode 100644 index 00000000..d44b4f37 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_type.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_type.pdf new file mode 100644 index 00000000..fe7ec086 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_type.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/statistics.csv b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/statistics.csv new file mode 100644 index 00000000..52956780 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/statistics.csv @@ -0,0 +1,15 @@ 
+method name,file name,category,type,RMSE,RMSE_lower_bound,RMSE_upper_bound,MAE,MAE_lower_bound,MAE_upper_bound,ME,ME_lower_bound,ME_upper_bound,R2,R2_lower_bound,R2_upper_bound,m,m_lower_bound,m_upper_bound,kendall_tau,kendall_tau_lower_bound,kendall_tau_upper_bound,ES,ES_lower_bound,ES_upper_bound +logP_experimental + EC_RISM,logD-logP_experimental-EC_RISM,Experimental logP + QM pKa,Standard,0.3328321990542273,0.1178404159716166,0.4824228624161776,0.15136363636363634,0.04181818181818182,0.2836363636363637,0.05590909090909091,-0.07636363636363637,0.19954545454545453,0.9100827486012406,0.7825184165636128,0.9850582076274104,1.0255890671828423,0.873810774505985,1.1753498760636119,0.9110650501976791,0.7671312851617348,0.990990990990991,1.4483436466076942,0.765411077425696,1.1792294807370187 +logP_experimental + pKa_experimental,experimental_pKa_and_logP_combined,Experimental logP + Experimental pKa,Standard,0.4049522979763958,0.04028308916564458,0.6182048792490016,0.15409090909090908,0.015454545454545455,0.3272727272727272,0.15045454545454545,0.01136363636363636,0.32409090909090915,0.9357117944599204,0.8691785258022435,0.9983960991484003,1.197112408686458,1.021784511599545,1.3598365116305997,0.974025974025974,0.9074074074074073,1.0,1.1223749602004516,0.5692373714301536,1.0653127898445396 +TFE MLR + EC_RISM,TFE_MLR-EC_RISM,Empirical logP + QM pKa,Standard,0.6378230019513108,0.38635828873304545,0.8827539552188624,0.45909090909090916,0.2936363636363637,0.6577272727272727,0.016363636363636365,-0.26,0.2822727272727273,0.657744061147201,0.3319500238223894,0.8818524739397252,0.8438964940577454,0.6205214630867381,1.023161292947657,0.6681143701449648,0.40092052326118255,0.8648648648648649,1.3863394105513798,1.2847571189279734,1.459362930354251 +TFE MLR + pKa_experimental,logD-TFE_MLR-pKa_experimental,Empirical logP + Experimental 
pKa,Standard,0.6795921236634916,0.3623471464568951,0.9542631617212213,0.4736363636363637,0.2913636363636364,0.6945454545454545,0.1109090909090909,-0.17681818181818187,0.3859090909090909,0.7025860144062775,0.39834822021046673,0.9172925339578911,1.0154198355613615,0.7347818812862082,1.230697903180309,0.7028216101524953,0.43636363636363634,0.8990825688073395,1.3395074546285146,1.1961460193529632,1.439539294267481 +logP_experimental + DFT_M05-2X_SMD,logD-logP_experimental-DFT_M05_2X_SMD,Experimental logP + QM pKa,Standard,0.7566853555294482,0.49639610283431,0.9699695402902656,0.4645454545454545,0.22636363636363635,0.7222727272727273,-0.1981818181818182,-0.4995454545454546,0.11772727272727272,0.5132128380267931,0.18668060248844645,0.8258619347345122,0.6399137423719727,0.4047360618176918,0.8605529468772984,0.5930735930735931,0.2962962962962963,0.8433179723502304,0.9252183783933443,0.45365948198291745,1.0067416974680565 +REF0 ChemAxon,logD-REF0-ChemAxon-Bergazin,Empirical (ref),Reference,1.055933451759842,0.8311711343312247,1.2651464162481172,0.9104545454545456,0.6950000000000001,1.1386363636363637,0.285,-0.150909090909091,0.7090909090909091,0.26674767037924135,0.011541563379341993,0.575045697474495,0.535610976813044,0.088438238908957,0.909666795501415,0.30735930735930733,-0.022624434389140267,0.6,0.12324708944169888,-0.0,0.2968838683777012 +TFE IEFPCM MST + IEFPCM/MST,logD-TFE_IEFPCM_MST-IEFPCM_MST,QM logP + QM pKa,Standard,1.2715702533053015,0.8482629309359215,1.6418254807046053,0.9818181818181817,0.6695454545454547,1.3395454545454546,0.2427272727272727,-0.29636363636363633,0.7463636363636362,0.5463858543100745,0.17807815357429327,0.8695046830326694,1.3073698729337788,0.7298428999960644,1.7088254411977122,0.567099567099567,0.2767857142857143,0.817351598173516,1.1577031161316225,0.8861940556778384,1.2494566496393817 +NES-1 (GAFF2/OPC3) B + pKa_experimental,logD-NES_1_GAFF2_OPC3_B-pKa_experimental,MM logP + Experimental 
pKa,Standard,1.2744125206255912,0.9594269122762816,1.558517652363063,1.0481818181818183,0.7468181818181819,1.349090909090909,-0.35636363636363644,-0.853181818181818,0.14818181818181816,0.510630921468572,0.1544432762034976,0.7740626612569659,1.2126016903091037,0.5904899921137231,1.6361621563236846,0.46854774010166356,0.1493212669683258,0.747714288761701,1.2062931738582723,1.086714564558329,1.3162783615044924 +NULL0,logD-NULL0-Bergazin,Empirical (ref),Reference,1.5909545448052134,1.212255598610968,1.9231886401116622,1.3509090909090908,1.0081818181818183,1.7131818181818181,1.2254545454545451,0.8013636363636365,1.6518181818181816,0.0,0.0,0.0,0.0,0.0,0.0,,,,0.6478951229979097,0.42976590943699217,0.8494677242964134 +EC_RISM_wet + EC_RISM,logD-EC_RISM_wet-EC_RISM,QM logP + QM pKa,Standard,1.6920267567194514,1.3005156320055937,2.055711555641987,1.4318181818181817,1.0654545454545454,1.8131818181818182,-1.4318181818181817,-1.8127272727272727,-1.0654545454545452,0.534429811315914,0.21255298193624175,0.7803181770858493,0.9505925394859067,0.5545384590880674,1.2988771957323753,0.500004725964924,0.20814692725366699,0.7327188940092167,0.8359704860390108,0.6339410551379489,1.0292232512424384 +TFE-NHLBI-TZVP-QM + TZVP-QM,logD-TFE_NHLBI_TZVP_QM-TZVP_QM,QM logP + QM pKa,Standard,1.7173896682836056,1.2877552703691941,2.137390082234959,1.470909090909091,1.1154545454545453,1.865,1.2581818181818183,0.7704545454545454,1.7672727272727278,0.2543523095779817,0.00769038613082499,0.6416302001350017,0.6390157606431108,0.0556891075543376,1.245343705642982,0.37662337662337664,0.009259259259259259,0.7064220183486238,0.047205725597685405,-0.0,0.17102039121225968 +TFE b3lypd3 + DFT_M05-2X_SMD,logD-TFE_b3lypd3-DFT_M05_2X_SMD,QM logP + QM 
pKa,Standard,2.1486274688740252,1.5474627326985646,2.72275428731476,1.7799999999999998,1.3072727272727276,2.319090909090909,1.7799999999999998,1.3072727272727276,2.319090909090909,0.3192339234671041,0.04025618744024854,0.662722049058934,0.8006777614443499,0.2657727098504458,1.2869901402949144,0.4112554112554112,0.04035874439461883,0.7117117117117118,0.4241316776721072,0.2763126929412905,0.6919999446267149 +MD (CGenFF/TIP3P) + Gaussian_corrected,logD-MD_CGenFF_TIP3P-Gaussian_corrected,MM logP + QM+LEC pKa,Standard,2.2724266164769165,1.96394200062481,2.546751942091231,2.130454545454546,1.7886363636363634,2.4486363636363637,1.843181818181818,1.2349999999999997,2.3431818181818183,0.6178907028584649,0.334302166012526,0.8430726338197605,1.5257966025220564,0.9057419539902958,2.194188481974192,0.6190476190476191,0.3424657534246575,0.8264840182648402,0.8785110123620862,0.7433863532538729,0.9984495480155601 +TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,logD-TFE_SMD_solvent_opt-DFT_M06_2X_SMD_explicit_water,QM logP + QM pKa,Standard,4.535362569449507,2.08594539281789,7.150251108114379,2.9159090909090906,1.8813636363636363,4.5636363636363635,2.875,1.7986363636363634,4.551363636363636,0.24858890827216865,0.10314586472062787,0.766840446602634,1.9172620428702987,0.5167280424759192,4.394174562203435,0.5466390301186075,0.20737327188940094,0.8037383177570093,0.5468665642260893,0.3754170300538505,0.7212785691543117 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/statisticsLaTex/statistics.tex b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/statisticsLaTex/statistics.tex new file mode 100644 index 00000000..3198d1a6 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/StatisticsTables/statisticsLaTex/statistics.tex @@ -0,0 +1,56 @@ +\documentclass{article} +\usepackage[a4paper,margin=0.005in,tmargin=0.5in,lmargin=0.5in,rmargin=0.5in,landscape]{geometry} 
+\usepackage{booktabs} +\usepackage{longtable} +\pagenumbering{gobble} +\begin{document} +\begin{center} +\scriptsize +\begin{longtable}{|ccccccccc|} +\toprule + method name & file name & category & type & RMSE & MAE & ME & R$^2$ & m & $\tau$ & ES \\ +\midrule +\endhead +\midrule +\multicolumn{11}{r}{{Continued on next page}} \\ +\midrule +\endfoot + +\bottomrule +\endlastfoot + logP_experimental + EC_RISM & logD-logP\_experimental-EC\_RISM & Experimental logP + QM pKa & Standard & 0.33 [0.12, 0.48] & 0.15 [0.04, 0.28] & 0.06 [-0.08, 0.20] & 0.91 [0.78, 0.99] & 1.03 [0.87, 1.18] & 0.91 [0.77, 0.99] & 1.45 [0.77, 1.18] \\ + logP_experimental + pKa_experimental & experimental\_pKa\_and\_logP\_combined & Experimental logP + Experimental pKa & Standard & 0.40 [0.04, 0.62] & 0.15 [0.02, 0.33] & 0.15 [0.01, 0.32] & 0.94 [0.87, 1.00] & 1.20 [1.02, 1.36] & 0.97 [0.91, 1.00] & 1.12 [0.57, 1.07] \\ + TFE MLR + EC_RISM & TFE\_MLR-EC\_RISM & Empirical logP + QM pKa & Standard & 0.64 [0.39, 0.88] & 0.46 [0.29, 0.66] & 0.02 [-0.26, 0.28] & 0.66 [0.33, 0.88] & 0.84 [0.62, 1.02] & 0.67 [0.40, 0.86] & 1.39 [1.28, 1.46] \\ + TFE MLR + pKa_experimental & logD-TFE\_MLR-pKa\_experimental & Empirical logP + Experimental pKa & Standard & 0.68 [0.36, 0.95] & 0.47 [0.29, 0.69] & 0.11 [-0.18, 0.39] & 0.70 [0.40, 0.92] & 1.02 [0.73, 1.23] & 0.70 [0.44, 0.90] & 1.34 [1.20, 1.44] \\ + logP_experimental + DFT_M05-2X_SMD & logD-logP\_experimental-DFT\_M05\_2X\_SMD & Experimental logP + QM pKa & Standard & 0.76 [0.50, 0.97] & 0.46 [0.23, 0.72] & -0.20 [-0.50, 0.12] & 0.51 [0.19, 0.83] & 0.64 [0.40, 0.86] & 0.59 [0.30, 0.84] & 0.93 [0.45, 1.01] \\ + REF0 ChemAxon & logD-REF0-ChemAxon-Bergazin & Empirical (ref) & Reference & 1.06 [0.83, 1.27] & 0.91 [0.70, 1.14] & 0.28 [-0.15, 0.71] & 0.27 [0.01, 0.58] & 0.54 [0.09, 0.91] & 0.31 [-0.02, 0.60] & 0.12 [-0.00, 0.30] \\ + TFE IEFPCM MST + IEFPCM/MST & logD-TFE\_IEFPCM\_MST-IEFPCM\_MST & QM logP + QM pKa & Standard & 1.27 [0.85, 1.64] & 0.98 [0.67, 
1.34] & 0.24 [-0.30, 0.75] & 0.55 [0.18, 0.87] & 1.31 [0.73, 1.71] & 0.57 [0.28, 0.82] & 1.16 [0.89, 1.25] \\ + NES-1 (GAFF2/OPC3) B + pKa_experimental & logD-NES\_1\_GAFF2\_OPC3\_B-pKa\_experimental & MM logP + Experimental pKa & Standard & 1.27 [0.96, 1.56] & 1.05 [0.75, 1.35] & -0.36 [-0.85, 0.15] & 0.51 [0.15, 0.77] & 1.21 [0.59, 1.64] & 0.47 [0.15, 0.75] & 1.21 [1.09, 1.32] \\ + NULL0 & logD-NULL0-Bergazin & Empirical (ref) & Reference & 1.59 [1.21, 1.92] & 1.35 [1.01, 1.71] & 1.23 [0.80, 1.65] & 0.00 [0.00, 0.00] & 0.00 [0.00, 0.00] & nan [nan, nan] & 0.65 [0.43, 0.85] \\ + EC_RISM_wet + EC_RISM & logD-EC\_RISM\_wet-EC\_RISM & QM logP + QM pKa & Standard & 1.69 [1.30, 2.06] & 1.43 [1.07, 1.81] & -1.43 [-1.81, -1.07] & 0.53 [0.21, 0.78] & 0.95 [0.55, 1.30] & 0.50 [0.21, 0.73] & 0.84 [0.63, 1.03] \\ + TFE-NHLBI-TZVP-QM + TZVP-QM & logD-TFE\_NHLBI\_TZVP\_QM-TZVP\_QM & QM logP + QM pKa & Standard & 1.72 [1.29, 2.14] & 1.47 [1.12, 1.86] & 1.26 [0.77, 1.77] & 0.25 [0.01, 0.64] & 0.64 [0.06, 1.25] & 0.38 [0.01, 0.71] & 0.05 [-0.00, 0.17] \\ + TFE b3lypd3 + DFT_M05-2X_SMD & logD-TFE\_b3lypd3-DFT\_M05\_2X\_SMD & QM logP + QM pKa & Standard & 2.15 [1.55, 2.72] & 1.78 [1.31, 2.32] & 1.78 [1.31, 2.32] & 0.32 [0.04, 0.66] & 0.80 [0.27, 1.29] & 0.41 [0.04, 0.71] & 0.42 [0.28, 0.69] \\ + MD (CGenFF/TIP3P) + Gaussian_corrected & logD-MD\_CGenFF\_TIP3P-Gaussian\_corrected & MM logP + QM+LEC pKa & Standard & 2.27 [1.96, 2.55] & 2.13 [1.79, 2.45] & 1.84 [1.23, 2.34] & 0.62 [0.33, 0.84] & 1.53 [0.91, 2.19] & 0.62 [0.34, 0.83] & 0.88 [0.74, 1.00] \\ + TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_... & logD-TFE\_SMD\_solvent\_opt-DFT\_M06\_2X\_SMD\... 
& QM logP + QM pKa & Standard & 4.54 [2.09, 7.15] & 2.92 [1.88, 4.56] & 2.88 [1.80, 4.55] & 0.25 [0.10, 0.77] & 1.92 [0.52, 4.39] & 0.55 [0.21, 0.80] & 0.55 [0.38, 0.72] \\ +\end{longtable} +\end{center} + +Notes + +- RMSE: Root mean square error + +- MAE: Mean absolute error + +- ME: Mean error + +- R2: R-squared, square of Pearson correlation coefficient + +- m: slope of the line fit to predicted vs experimental logD values + +- $\tau$: Kendall rank correlation coefficient + +- ES: error slope calculated from the QQ Plots of model uncertainty predictions + +- Mean and 95\% confidence intervals of RMSE, MAE, ME, R2, and m were calculated by bootstrapping with 10000 samples. + +- 95\% confidence intervals of ES were calculated by bootstrapping with 1000 samples.\end{document} diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/error_for_each_logD.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/error_for_each_logD.pdf new file mode 100644 index 00000000..fc9cb640 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/error_for_each_logD.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/EC_RISM_wet_+_EC_RISM.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/EC_RISM_wet_+_EC_RISM.pdf new file mode 100644 index 00000000..92c592a1 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/EC_RISM_wet_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf new file mode 100644 index 00000000..7319054a Binary files /dev/null and 
b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/NES-1_GAFF2_OPC3_B_+_pKa_experimental.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/NES-1_GAFF2_OPC3_B_+_pKa_experimental.pdf new file mode 100644 index 00000000..0d8a8af1 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/NES-1_GAFF2_OPC3_B_+_pKa_experimental.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/REF0_ChemAxon.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/REF0_ChemAxon.pdf new file mode 100644 index 00000000..c407bb62 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/REF0_ChemAxon.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf new file mode 100644 index 00000000..d12819b3 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf new file mode 100644 index 00000000..b1ea6dbf Binary files /dev/null and 
b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf new file mode 100644 index 00000000..92b7241b Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE_MLR_+_EC_RISM.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE_MLR_+_EC_RISM.pdf new file mode 100644 index 00000000..cedc2246 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE_MLR_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE_MLR_+_pKa_experimental.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE_MLR_+_pKa_experimental.pdf new file mode 100644 index 00000000..1577254e Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE_MLR_+_pKa_experimental.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf new file mode 100644 index 00000000..093eef25 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf differ diff --git 
a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/logP_experimental_+_DFT_M05-2X_SMD.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/logP_experimental_+_DFT_M05-2X_SMD.pdf new file mode 100644 index 00000000..759fde60 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/logP_experimental_+_DFT_M05-2X_SMD.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/logP_experimental_+_EC_RISM.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/logP_experimental_+_EC_RISM.pdf new file mode 100644 index 00000000..aba9b9c1 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/logP_experimental_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/logP_experimental_+_pKa_experimental.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/logP_experimental_+_pKa_experimental.pdf new file mode 100644 index 00000000..8cbcc419 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlots/logP_experimental_+_pKa_experimental.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/EC_RISM_wet_+_EC_RISM.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/EC_RISM_wet_+_EC_RISM.pdf new file mode 100644 index 00000000..dcfa6141 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/EC_RISM_wet_+_EC_RISM.pdf differ diff --git 
a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf new file mode 100644 index 00000000..036ff980 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/NES-1_GAFF2_OPC3_B_+_pKa_experimental.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/NES-1_GAFF2_OPC3_B_+_pKa_experimental.pdf new file mode 100644 index 00000000..6065b47b Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/NES-1_GAFF2_OPC3_B_+_pKa_experimental.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/REF0_ChemAxon.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/REF0_ChemAxon.pdf new file mode 100644 index 00000000..adff5151 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/REF0_ChemAxon.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf new file mode 100644 index 00000000..d99d94c5 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf differ diff --git 
a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf new file mode 100644 index 00000000..dc23a287 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf new file mode 100644 index 00000000..1db063ee Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE_MLR_+_EC_RISM.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE_MLR_+_EC_RISM.pdf new file mode 100644 index 00000000..63922260 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE_MLR_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE_MLR_+_pKa_experimental.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE_MLR_+_pKa_experimental.pdf new file mode 100644 index 00000000..eda1b154 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE_MLR_+_pKa_experimental.pdf differ diff --git 
a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf new file mode 100644 index 00000000..fefcb8f9 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/logP_experimental_+_DFT_M05-2X_SMD.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/logP_experimental_+_DFT_M05-2X_SMD.pdf new file mode 100644 index 00000000..5092a460 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/logP_experimental_+_DFT_M05-2X_SMD.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/logP_experimental_+_EC_RISM.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/logP_experimental_+_EC_RISM.pdf new file mode 100644 index 00000000..cd49db6c Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/logP_experimental_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/logP_experimental_+_pKa_experimental.pdf b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/logP_experimental_+_pKa_experimental.pdf new file mode 100644 index 00000000..a4180d74 Binary files /dev/null and b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logDCorrelationPlotsWithSEM/logP_experimental_+_pKa_experimental.pdf differ diff --git 
a/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logD_submission_collection.csv b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logD_submission_collection.csv new file mode 100644 index 00000000..e9f26563 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/analysis/logD_submission_collection.csv @@ -0,0 +1,309 @@ +,method_name,category,Molecule ID,logD (calc),logD SEM (calc),logD (exp),logD SEM (exp),$\Delta$logD error (calc - exp),logD model uncertainty +0,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM25,-0.38,0.18,-0.09,0.01,-0.29000000000000004,2.3999880195383225 +1,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM26,-3.45,0.11,-0.87,0.06,-2.58,2.3999886092463867 +2,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM27,-0.52,0.13,1.56,0.11,-2.08,2.3979742881675223 +3,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM28,1.85,0.13,1.18,0.08,0.6700000000000002,1.5 +4,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM29,-0.85,0.15,1.61,0.11,-2.46,2.3970721226093223 +5,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM30,1.07,0.14,2.76,0.19,-1.6899999999999997,2.3986675007101423 +6,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM31,0.02,0.13,1.96,0.14,-1.94,2.3983151936120537 +7,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM32,1.27,0.16,2.44,0.17,-1.17,2.39855103740242 +8,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM33,5.45,0.16,2.96,0.21,2.49,1.5 +9,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM34,1.23,0.14,2.83,0.2,-1.6,2.39892838869771 +10,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM35,-1.99,0.36,0.87,0.06,-2.86,2.399809083536587 +11,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM36,-1.03,0.6,0.76,0.05,-1.79,2.399648783489643 +12,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC 
pKa,SM37,-1.43,0.75,1.45,0.1,-2.88,2.3997498106923096 +13,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM38,-1.94,0.41,1.03,0.07,-2.9699999999999998,2.3995443305472417 +14,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM39,0.08,0.41,1.89,0.13,-1.8099999999999998,2.399608172679832 +15,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM40,-1.18,0.31,1.82,0.13,-3.0,2.3996953304337554 +16,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM41,-3.23,0.18,-0.42,0.03,-2.81,2.3999972682711226 +17,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM42,-0.71,0.13,0.99,0.07,-1.7,2.399997478807581 +18,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM43,-1.77,0.16,0.42,0.03,-2.19,2.399994399413065 +19,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM44,-3.46,0.22,0.06,0.0,-3.52,2.3999104647733165 +20,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM45,-0.42,0.41,1.06,0.07,-1.48,2.3998867609877292 +21,MD (CGenFF/TIP3P) + Gaussian_corrected,MM logP + QM+LEC pKa,SM46,-2.2,0.29,0.69,0.05,-2.89,2.3999411369110883 +22,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM25,-0.24,0.04990175982820839,-0.09,0.01,-0.15,0.0 +23,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM26,-1.45,0.04974237622905218,-0.87,0.06,-0.58,0.0 +24,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM27,1.56,0.11000003171656936,1.56,0.11,0.0,0.0 +25,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM28,1.18,0.08,1.18,0.08,0.0,0.0 +26,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM29,1.61,0.030000199580283982,1.61,0.11,0.0,0.0 +27,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM30,2.76,0.19000006621276927,2.76,0.19,0.0,0.0 +28,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM31,1.96,0.14000000230065585,1.96,0.14,0.0,0.0 
+29,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM32,2.44,0.17000003171656936,2.44,0.17,0.0,0.0 +30,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM33,2.96,0.21,2.96,0.21,0.0,0.0 +31,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM34,2.83,0.2000000000348365,2.83,0.2,0.0,0.0 +32,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM35,0.88,0.02000045616483536,0.87,0.06,0.010000000000000009,0.0 +33,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM36,0.76,0.05000062893960167,0.76,0.05,0.0,0.0 +34,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM37,1.45,0.10000005508585408,1.45,0.1,0.0,0.0 +35,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM38,1.03,0.07000326718909411,1.03,0.07,0.0,0.0 +36,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM39,1.89,0.13000009135794513,1.89,0.13,0.0,0.0 +37,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM40,1.83,0.05000172321773445,1.82,0.13,0.010000000000000009,0.0 +38,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM41,-1.6,0.0594766376651706,-0.42,0.03,-1.1800000000000002,0.0 +39,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM42,0.91,0.0594234547197126,0.99,0.07,-0.07999999999999996,0.0 +40,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM43,-0.94,0.04870466475253714,0.42,0.03,-1.3599999999999999,0.0 +41,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM44,0.06,0.0638472905132671,0.06,0.0,0.0,0.0 +42,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM45,1.07,0.07742105380207609,1.06,0.07,0.010000000000000009,0.0 +43,logP_experimental + pKa_experimental,Experimental logP + Experimental pKa,SM46,0.7000000000000001,0.0427763943395595,0.69,0.05,0.01000000000000012,0.0 
+44,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM25,0.6900000000000001,0.02344107623307046,-0.09,0.01,0.78,1.397871928239328 +45,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM26,-0.84,0.023361138100292885,-0.87,0.06,0.030000000000000027,1.3895583624304602 +46,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM27,1.56,0.11000003944649507,1.56,0.11,0.0,4.102435487131644e-06 +47,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM28,1.18,0.08,1.18,0.08,0.0,0.0 +48,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM29,1.61,0.03000014948954796,1.61,0.11,0.0,1.554691298840579e-05 +49,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM30,2.76,0.19000134536305704,2.76,0.19,0.0,0.00013991775793004748 +50,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM31,1.96,0.14000000043383826,1.96,0.14,0.0,4.5119173878258493e-08 +51,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM32,2.44,0.17000002730588126,2.44,0.17,0.0,2.839811647164077e-06 +52,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM33,2.96,0.21,2.96,0.21,0.0,0.0 +53,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM34,2.83,0.2000000136966416,2.83,0.2,0.0,1.4244507227482188e-06 +54,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM35,0.88,0.020000571108298257,0.87,0.06,0.010000000000000009,5.939526301887661e-05 +55,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM36,0.76,0.0500013148643359,0.76,0.05,0.0,0.00013674589093300572 +56,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM37,1.45,0.1000001107628115,1.45,0.1,0.0,1.151933239564395e-05 +57,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM38,1.02,0.07000202703601836,1.03,0.07,-0.010000000000000009,0.0002108117459088799 +58,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM39,1.85,0.1300919023711536,1.89,0.13,-0.039999999999999813,0.009557846599972148 +59,logP_experimental + EC_RISM,Experimental logP + QM 
pKa,SM40,1.83,0.05000134536305703,1.82,0.13,0.010000000000000009,0.00013991775793004748 +60,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM41,-1.09,0.03314269632174524,-0.42,0.03,-0.6700000000000002,1.366840417461505 +61,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM42,-0.06,0.04330860066492997,0.99,0.07,-1.05,1.3840944691527168 +62,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM43,-0.08,0.020713282071424373,0.42,0.03,-0.5,1.114181335428135 +63,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM44,0.05,0.04169725306161642,0.06,0.0,-0.009999999999999995,1.2165143184081078 +64,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM45,1.18,0.05257550295125325,1.06,0.07,0.11999999999999988,1.3078523069303378 +65,logP_experimental + EC_RISM,Experimental logP + QM pKa,SM46,0.79,0.020713297459104524,0.69,0.05,0.10000000000000009,1.1141829357468709 +66,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM25,1.21,0.0,-0.09,0.01,1.3,0.0 +67,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM26,0.01,0.0,-0.87,0.06,0.88,0.0 +68,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM27,0.27,0.0,1.56,0.11,-1.29,0.0 +69,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM28,-0.23,0.0,1.18,0.08,-1.41,0.0 +70,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM29,-0.03,0.0,1.61,0.11,-1.6400000000000001,0.0 +71,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM30,1.52,0.0,2.76,0.19,-1.2399999999999998,0.0 +72,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM31,0.76,0.0,1.96,0.14,-1.2,0.0 +73,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM32,0.77,0.0,2.44,0.17,-1.67,0.0 +74,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM33,3.12,0.0,2.96,0.21,0.16000000000000014,0.0 +75,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM34,1.89,0.0,2.83,0.2,-0.9400000000000002,0.0 +76,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM35,-1.7,0.0,0.87,0.06,-2.57,0.0 +77,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM36,-0.36,0.0,0.76,0.05,-1.12,0.0 +78,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM 
pKa,SM37,0.06,0.0,1.45,0.1,-1.39,0.0 +79,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM38,-2.58,0.0,1.03,0.07,-3.6100000000000003,0.0 +80,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM39,-0.48,0.0,1.89,0.13,-2.37,0.0 +81,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM40,-1.67,0.0,1.82,0.13,-3.49,0.0 +82,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM41,-0.74,0.0,-0.42,0.03,-0.32,0.0 +83,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM42,0.78,0.0,0.99,0.07,-0.20999999999999996,0.0 +84,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM43,-0.14,0.0,0.42,0.03,-0.56,0.0 +85,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM44,-1.73,0.0,0.06,0.0,-1.79,0.0 +86,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM45,-0.52,0.0,1.06,0.07,-1.58,0.0 +87,TFE-NHLBI-TZVP-QM + TZVP-QM,QM logP + QM pKa,SM46,-0.93,0.0,0.69,0.05,-1.62,0.0 +88,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM25,-0.09,0.693823663460283,-0.09,0.01,0.0,1.0941178615364526 +89,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM26,0.6,0.29112560896259204,-0.87,0.06,1.47,0.4498009743401473 +90,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM27,1.56,0.11001296190914142,1.56,0.11,0.0,2.073905462627814e-05 +91,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM28,1.18,0.08,1.18,0.08,0.0,0.0 +92,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM29,1.58,0.03252410569995358,1.61,0.11,-0.030000000000000027,0.004038569119925757 +93,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM30,2.75,0.19047139281681505,2.76,0.19,-0.009999999999999787,0.0007542285069040628 +94,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM31,1.96,0.1400000000237929,1.96,0.14,0.0,3.8068618612047945e-11 +95,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM32,2.44,0.17002823983763338,2.44,0.17,0.0,4.51837402133877e-05 +96,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM33,2.96,0.21,2.96,0.21,0.0,0.0 +97,logP_experimental + 
DFT_M05-2X_SMD,Experimental logP + QM pKa,SM34,2.83,0.20000029914280965,2.83,0.2,0.0,4.786284954241563e-07 +98,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM35,0.88,0.02000043227744815,0.87,0.06,0.010000000000000009,6.91643917039198e-07 +99,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM36,0.76,0.050080719822163916,0.76,0.05,0.0,0.00012915171546227784 +100,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM37,1.45,0.10000494444503068,1.45,0.1,0.0,7.911112049091283e-06 +101,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM38,-0.28,0.6908838646523137,1.03,0.07,-1.31,0.9934141834437016 +102,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM39,0.32,0.7799094360425363,1.89,0.13,-1.5699999999999998,1.0398550976680578 +103,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM40,1.81,0.05120044198792785,1.82,0.13,-0.010000000000000009,0.0019207071806845567 +104,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM41,0.58,0.0200001247618502,-0.42,0.03,1.0,1.9961896031780734e-07 +105,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM42,1.75,0.03049300672398178,0.99,0.07,0.76,0.0007888107583708566 +106,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM43,0.85,0.010014873015340854,0.42,0.03,0.43,2.3796824545365085e-05 +107,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM44,1.16,0.03007040592493289,0.06,0.0,1.0999999999999999,0.00011264947989263564 +108,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM45,2.5500000000000003,0.04000015704952115,1.06,0.07,1.4900000000000002,2.5127923383730587e-07 +109,logP_experimental + DFT_M05-2X_SMD,Experimental logP + QM pKa,SM46,1.72,0.01000000004329575,0.69,0.05,1.03,6.927319449141272e-11 +110,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM25,-1.15,0.1199017598282084,-0.09,0.01,-1.0599999999999998,2.0 +111,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + 
Experimental pKa,SM26,-2.92,0.1397423762290522,-0.87,0.06,-2.05,2.0 +112,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM27,1.44,0.17000003171656936,1.56,0.11,-0.1200000000000001,2.0 +113,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM28,1.39,0.13,1.18,0.08,0.20999999999999996,2.0 +114,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM29,1.22,0.100000199580284,1.61,0.11,-0.3900000000000001,2.0 +115,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM30,3.5500000000000003,0.19000006621276927,2.76,0.19,0.7900000000000005,2.0 +116,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM31,3.12,0.15000000230065585,1.96,0.14,1.1600000000000001,2.0 +117,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM32,2.31,0.09000003171656935,2.44,0.17,-0.1299999999999999,2.0 +118,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM33,4.63,0.19,2.96,0.21,1.67,2.0 +119,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM34,3.66,0.1400000000348365,2.83,0.2,0.8300000000000001,2.0 +120,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM35,-0.07,0.22000045616483527,0.87,0.06,-0.94,2.0 +121,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM36,0.89,0.2000006289396017,0.76,0.05,0.13,2.0 +122,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM37,1.11,0.10000005508585408,1.45,0.1,-0.33999999999999986,2.0 +123,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM38,-0.35000000000000003,0.1300032671890941,1.03,0.07,-1.3800000000000001,2.0 +124,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM39,2.02,0.29000009135794513,1.89,0.13,0.13000000000000012,2.0 +125,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM40,0.62,0.15000172321773445,1.82,0.13,-1.2000000000000002,2.0 +126,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental 
pKa,SM41,0.52,0.1294766376651706,-0.42,0.03,0.94,2.0 +127,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM42,3.69,0.1994234547197126,0.99,0.07,2.7,2.0 +128,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM43,2.38,0.13870466475253715,0.42,0.03,1.96,2.0 +129,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM44,1.39,0.0938472905132671,0.06,0.0,1.3299999999999998,2.0 +130,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM45,2.65,0.21742105380207608,1.06,0.07,1.5899999999999999,2.0 +131,NES-1 (GAFF2/OPC3) B + pKa_experimental,MM logP + Experimental pKa,SM46,2.7,0.1427763943395595,0.69,0.05,2.0100000000000002,2.0 +132,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM25,-0.56,0.039901759828208386,-0.09,0.01,-0.4700000000000001,1.36 +133,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM26,-1.31,0.03974237622905219,-0.87,0.06,-0.44000000000000006,1.36 +134,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM27,1.47,3.171656935122753e-08,1.56,0.11,-0.09000000000000008,1.36 +135,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM28,1.87,0.0,1.18,0.08,0.6900000000000002,1.36 +136,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM29,1.47,1.9958028397883465e-07,1.61,0.11,-0.14000000000000012,1.36 +137,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM30,2.73,6.621276927099838e-08,2.76,0.19,-0.029999999999999805,1.36 +138,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM31,1.54,2.3006558391672453e-09,1.96,0.14,-0.41999999999999993,1.36 +139,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM32,1.97,3.171656935122753e-08,2.44,0.17,-0.47,1.36 +140,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM33,3.24,0.0,2.96,0.21,0.28000000000000025,1.36 +141,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM34,2.05,3.4836487372642844e-11,2.83,0.2,-0.7800000000000002,1.36 +142,TFE MLR + 
pKa_experimental,Empirical logP + Experimental pKa,SM35,1.36,4.5616483535177537e-07,0.87,0.06,0.4900000000000001,1.36 +143,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM36,2.63,6.28939601658209e-07,0.76,0.05,1.8699999999999999,1.36 +144,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM37,1.44,5.5085854073942854e-08,1.45,0.1,-0.010000000000000009,1.36 +145,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM38,0.93,3.2671890940976683e-06,1.03,0.07,-0.09999999999999998,1.36 +146,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM39,2.21,9.135794513077607e-08,1.89,0.13,0.32000000000000006,1.36 +147,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM40,1.01,1.7232177344455449e-06,1.82,0.13,-0.81,1.36 +148,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM41,-0.74,0.039476637665170594,-0.42,0.03,-0.32,1.36 +149,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM42,0.72,0.029423454719712593,0.99,0.07,-0.27,1.36 +150,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM43,-1.41,0.038704664752537137,0.42,0.03,-1.8299999999999998,1.36 +151,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM44,0.29,0.03384729051326709,0.06,0.0,0.22999999999999998,1.36 +152,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM45,1.17,0.03742105380207609,1.06,0.07,0.10999999999999988,1.36 +153,TFE MLR + pKa_experimental,Empirical logP + Experimental pKa,SM46,0.44,0.0327763943395595,0.69,0.05,-0.24999999999999994,1.36 +154,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM25,2.25,0.02344107623307046,-0.09,0.01,2.34,2.447871928239328 +155,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM26,0.51,0.023361138100292885,-0.87,0.06,1.38,2.43955836243046 +156,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM27,2.21,0.010000039446495069,1.56,0.11,0.6499999999999999,1.0500041024354871 +157,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM28,2.18,0.01,1.18,0.08,1.0000000000000002,1.05 +158,EC_RISM_wet + EC_RISM,QM 
logP + QM pKa,SM29,2.07,0.010000149489547966,1.61,0.11,0.45999999999999974,1.0500155469129884 +159,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM30,3.78,0.01000134536305702,2.76,0.19,1.02,1.05013991775793 +160,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM31,3.27,0.010000000433838209,1.96,0.14,1.31,1.0500000451191742 +161,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM32,2.59,0.010000027305881223,2.44,0.17,0.1499999999999999,1.0500028398116472 +162,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM33,5.27,0.01,2.96,0.21,2.3099999999999996,1.05 +163,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM34,5.2700000000000005,0.010000013696641566,2.83,0.2,2.4400000000000004,1.0500014244507228 +164,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM35,0.95,0.010000571108298259,0.87,0.06,0.07999999999999996,1.0500593952630188 +165,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM36,2.59,0.010001314864335895,0.76,0.05,1.8299999999999998,1.050136745890933 +166,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM37,2.14,0.0100001107628115,1.45,0.1,0.6900000000000002,1.0500115193323958 +167,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM38,2.29,0.010002027036018357,1.03,0.07,1.26,1.050210811745909 +168,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM39,4.12,0.010091902371153578,1.89,0.13,2.2300000000000004,1.0595578465999722 +169,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM40,3.61,0.01000134536305702,1.82,0.13,1.7899999999999998,1.05013991775793 +170,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM41,1.64,0.023142696321745242,-0.42,0.03,2.06,2.416840417461505 +171,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM42,4.44,0.023308600664929967,0.99,0.07,3.45,2.4340944691527167 +172,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM43,3.34,0.020713282071424373,0.42,0.03,2.92,2.164181335428135 +173,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM44,0.51,0.02169725306161642,0.06,0.0,0.45,2.266514318408108 +174,EC_RISM_wet + EC_RISM,QM logP + QM pKa,SM45,1.8,0.022575502951253247,1.06,0.07,0.74,2.357852306930338 +175,EC_RISM_wet + EC_RISM,QM logP + QM 
pKa,SM46,1.63,0.020713297459104524,0.69,0.05,0.94,2.1641829357468705 +176,NULL0,Empirical (ref),SM25,0.0,0.0,-0.09,0.01,0.09,1.0 +177,NULL0,Empirical (ref),SM26,0.0,0.0,-0.87,0.06,0.87,1.0 +178,NULL0,Empirical (ref),SM27,0.0,0.0,1.56,0.11,-1.56,1.0 +179,NULL0,Empirical (ref),SM28,0.0,0.0,1.18,0.08,-1.18,1.0 +180,NULL0,Empirical (ref),SM29,0.0,0.0,1.61,0.11,-1.61,1.0 +181,NULL0,Empirical (ref),SM30,0.0,0.0,2.76,0.19,-2.76,1.0 +182,NULL0,Empirical (ref),SM31,0.0,0.0,1.96,0.14,-1.96,1.0 +183,NULL0,Empirical (ref),SM32,0.0,0.0,2.44,0.17,-2.44,1.0 +184,NULL0,Empirical (ref),SM33,0.0,0.0,2.96,0.21,-2.96,1.0 +185,NULL0,Empirical (ref),SM34,0.0,0.0,2.83,0.2,-2.83,1.0 +186,NULL0,Empirical (ref),SM35,0.0,0.0,0.87,0.06,-0.87,1.0 +187,NULL0,Empirical (ref),SM36,0.0,0.0,0.76,0.05,-0.76,1.0 +188,NULL0,Empirical (ref),SM37,0.0,0.0,1.45,0.1,-1.45,1.0 +189,NULL0,Empirical (ref),SM38,0.0,0.0,1.03,0.07,-1.03,1.0 +190,NULL0,Empirical (ref),SM39,0.0,0.0,1.89,0.13,-1.89,1.0 +191,NULL0,Empirical (ref),SM40,0.0,0.0,1.82,0.13,-1.82,1.0 +192,NULL0,Empirical (ref),SM41,0.0,0.0,-0.42,0.03,0.42,1.0 +193,NULL0,Empirical (ref),SM42,0.0,0.0,0.99,0.07,-0.99,1.0 +194,NULL0,Empirical (ref),SM43,0.0,0.0,0.42,0.03,-0.42,1.0 +195,NULL0,Empirical (ref),SM44,0.0,0.0,0.06,0.0,-0.06,1.0 +196,NULL0,Empirical (ref),SM45,0.0,0.0,1.06,0.07,-1.06,1.0 +197,NULL0,Empirical (ref),SM46,0.0,0.0,0.69,0.05,-0.69,1.0 +198,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM25,1.45,0.0,-0.09,0.01,1.54,1.62736237955926 +199,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM26,-3.13,0.0,-0.87,0.06,-2.26,2.506510219032152 +200,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM27,1.75,0.0,1.56,0.11,0.18999999999999995,1.0600000003220549 +201,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM28,0.83,0.0,1.18,0.08,-0.35,1.06 +202,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM29,1.23,0.0,1.61,0.11,-0.3800000000000001,1.060000014275805 +203,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM30,3.53,0.0,2.76,0.19,0.77,1.0600001445262448 +204,TFE 
IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM31,1.61,0.0,1.96,0.14,-0.34999999999999987,1.0600003022939144 +205,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM32,1.63,0.0,2.44,0.17,-0.81,1.0600000019066798 +206,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM33,4.27,0.0,2.96,0.21,1.3099999999999996,1.06 +207,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM34,2.39,0.0,2.83,0.2,-0.43999999999999995,1.0600002390356282 +208,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM35,0.77,0.0,0.87,0.06,-0.09999999999999998,1.060003863243295 +209,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM36,3.73,0.0,0.76,0.05,2.9699999999999998,1.060517639767404 +210,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM37,-1.31,0.0,1.45,0.1,-2.76,2.508112831230342 +211,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM38,0.48,0.0,1.03,0.07,-0.55,1.0600307528312118 +212,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM39,2.45,0.0,1.89,0.13,0.5600000000000003,1.062395960435093 +213,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM40,1.35,0.0,1.82,0.13,-0.47,1.0890629521889412 +214,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM41,-1.45,0.0,-0.42,0.03,-1.03,2.496029608152775 +215,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM42,1.16,0.0,0.99,0.07,0.16999999999999993,2.5024653266494132 +216,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM43,-1.16,0.0,0.42,0.03,-1.5799999999999998,2.50719309548738 +217,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM44,-0.6900000000000001,0.0,0.06,0.0,-0.75,1.7782049976993068 +218,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM45,1.68,0.0,1.06,0.07,0.6199999999999999,1.5093985461690262 +219,TFE IEFPCM MST + IEFPCM/MST,QM logP + QM pKa,SM46,-0.95,0.0,0.69,0.05,-1.64,2.473467961947144 +220,REF0 ChemAxon,Empirical (ref),SM25,1.78,0.0,-0.09,0.01,1.87,0.0 +221,REF0 ChemAxon,Empirical (ref),SM26,-0.36,0.0,-0.87,0.06,0.51,0.0 +222,REF0 ChemAxon,Empirical (ref),SM27,0.83,0.0,1.56,0.11,-0.7300000000000001,0.0 +223,REF0 ChemAxon,Empirical (ref),SM28,0.1,0.0,1.18,0.08,-1.0799999999999998,0.0 
+224,REF0 ChemAxon,Empirical (ref),SM29,0.76,0.0,1.61,0.11,-0.8500000000000001,0.0 +225,REF0 ChemAxon,Empirical (ref),SM30,2.81,0.0,2.76,0.19,0.050000000000000266,0.0 +226,REF0 ChemAxon,Empirical (ref),SM31,0.89,0.0,1.96,0.14,-1.0699999999999998,0.0 +227,REF0 ChemAxon,Empirical (ref),SM32,1.34,0.0,2.44,0.17,-1.0999999999999999,0.0 +228,REF0 ChemAxon,Empirical (ref),SM33,3.4,0.0,2.96,0.21,0.43999999999999995,0.0 +229,REF0 ChemAxon,Empirical (ref),SM34,1.47,0.0,2.83,0.2,-1.36,0.0 +230,REF0 ChemAxon,Empirical (ref),SM35,-0.43,0.0,0.87,0.06,-1.3,0.0 +231,REF0 ChemAxon,Empirical (ref),SM36,1.62,0.0,0.76,0.05,0.8600000000000001,0.0 +232,REF0 ChemAxon,Empirical (ref),SM37,-0.31,0.0,1.45,0.1,-1.76,0.0 +233,REF0 ChemAxon,Empirical (ref),SM38,-0.25,0.0,1.03,0.07,-1.28,0.0 +234,REF0 ChemAxon,Empirical (ref),SM39,1.81,0.0,1.89,0.13,-0.07999999999999985,0.0 +235,REF0 ChemAxon,Empirical (ref),SM40,-0.12,0.0,1.82,0.13,-1.94,0.0 +236,REF0 ChemAxon,Empirical (ref),SM41,0.42,0.0,-0.42,0.03,0.84,0.0 +237,REF0 ChemAxon,Empirical (ref),SM42,2.24,0.0,0.99,0.07,1.2500000000000002,0.0 +238,REF0 ChemAxon,Empirical (ref),SM43,0.95,0.0,0.42,0.03,0.53,0.0 +239,REF0 ChemAxon,Empirical (ref),SM44,-0.19,0.0,0.06,0.0,-0.25,0.0 +240,REF0 ChemAxon,Empirical (ref),SM45,1.59,0.0,1.06,0.07,0.53,0.0 +241,REF0 ChemAxon,Empirical (ref),SM46,0.34,0.0,0.69,0.05,-0.3499999999999999,0.0 +242,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM25,-2.33,1.0438236634602829,-0.09,0.01,-2.24,1.8941178615364531 +243,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM26,-0.99,0.6411256089625921,-0.87,0.06,-0.12,1.2498009743401473 +244,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM27,0.49,0.3600129619091414,1.56,0.11,-1.07,0.8000207390546263 +245,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM28,-0.6,0.36,1.18,0.08,-1.7799999999999998,0.8 +246,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM29,-0.54,0.3625241056999536,1.61,0.11,-2.1500000000000004,0.8040385691199258 +247,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM 
pKa,SM30,0.94,0.360471392816815,2.76,0.19,-1.8199999999999998,0.8007542285069041 +248,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM31,0.58,0.3600000000237929,1.96,0.14,-1.38,0.8000000000380687 +249,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM32,1.24,0.3600282398376334,2.44,0.17,-1.2,0.8000451837402134 +250,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM33,2.55,0.36,2.96,0.21,-0.41000000000000014,0.8 +251,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM34,1.46,0.3600002991428096,2.83,0.2,-1.37,0.8000004786284954 +252,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM35,-0.81,0.36000043227744816,0.87,0.06,-1.6800000000000002,0.8000006916439171 +253,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM36,0.19,0.3600807198221639,0.76,0.05,-0.5700000000000001,0.8001291517154623 +254,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM37,-1.62,0.3600049444450307,1.45,0.1,-3.0700000000000003,0.8000079111120492 +255,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM38,-3.9,0.9808838646523136,1.03,0.07,-4.93,1.7934141834437018 +256,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM39,-2.0,1.0099094360425362,1.89,0.13,-3.8899999999999997,1.8398550976680577 +257,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM40,-1.93,0.36120044198792783,1.82,0.13,-3.75,0.8019207071806846 +258,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM41,-1.03,0.36000012476185017,-0.42,0.03,-0.6100000000000001,0.8000001996189604 +259,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM42,0.23,0.36049300672398177,0.99,0.07,-0.76,0.8007888107583709 +260,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM43,-0.2,0.36001487301534085,0.42,0.03,-0.62,0.8000237968245454 +261,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM44,-1.63,0.3600704059249329,0.06,0.0,-1.69,0.8001126494798927 +262,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM45,-0.5,0.36000015704952115,1.06,0.07,-1.56,0.8000002512792339 +263,TFE b3lypd3 + DFT_M05-2X_SMD,QM logP + QM pKa,SM46,-1.8,0.3600000000432957,0.69,0.05,-2.49,0.8000000000692732 +264,TFE MLR + 
EC_RISM,Empirical logP + QM pKa,SM25,0.37,0.01344107623307046,-0.09,0.01,0.45999999999999996,2.7578719282393283 +265,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM26,-0.7000000000000001,0.013361138100292885,-0.87,0.06,0.16999999999999993,2.7495583624304603 +266,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM27,1.47,3.94464950685735e-08,1.56,0.11,-0.09000000000000008,1.3600041024354872 +267,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM28,1.87,0.0,1.18,0.08,0.6900000000000002,1.36 +268,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM29,1.47,1.494895479654403e-07,1.61,0.11,-0.14000000000000012,1.3600155469129884 +269,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM30,2.73,1.3453630570196874e-06,2.76,0.19,-0.029999999999999805,1.36013991775793 +270,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM31,1.54,4.3383821036787e-10,1.96,0.14,-0.41999999999999993,1.360000045119174 +271,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM32,1.97,2.7305881222731507e-08,2.44,0.17,-0.47,1.3600028398116473 +272,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM33,3.24,0.0,2.96,0.21,0.28000000000000025,1.36 +273,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM34,2.05,1.3696641564886716e-08,2.83,0.2,-0.7800000000000002,1.3600014244507228 +274,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM35,1.36,5.71108298258429e-07,0.87,0.06,0.4900000000000001,1.3600593952630189 +275,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM36,2.63,1.3148643358942858e-06,0.76,0.05,1.8699999999999999,1.360136745890933 +276,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM37,1.44,1.1076281149657645e-07,1.45,0.1,-0.010000000000000009,1.3600115193323958 +277,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM38,0.92,2.027036018354614e-06,1.03,0.07,-0.10999999999999999,1.360210811745909 +278,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM39,2.17,9.190237115357836e-05,1.89,0.13,0.28,1.3695578465999725 +279,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM40,1.01,1.3453630570196874e-06,1.82,0.13,-0.81,1.36013991775793 +280,TFE MLR + EC_RISM,Empirical logP + QM 
pKa,SM41,-0.23,0.01314269632174524,-0.42,0.03,0.18999999999999997,2.726840417461505 +281,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM42,-0.25,0.013308600664929969,0.99,0.07,-1.24,2.744094469152717 +282,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM43,-0.55,0.010713282071424373,0.42,0.03,-0.97,2.474181335428135 +283,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM44,0.28,0.011697253061616422,0.06,0.0,0.22000000000000003,2.576514318408108 +284,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM45,1.28,0.012575502951253249,1.06,0.07,0.21999999999999997,2.667852306930338 +285,TFE MLR + EC_RISM,Empirical logP + QM pKa,SM46,0.53,0.010713297459104524,0.69,0.05,-0.15999999999999992,2.4741829357468714 +286,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM25,-18.33,0.0,-0.09,0.01,-18.24,2.1562024272328912 +287,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM26,-0.42,0.0,-0.87,0.06,0.45,1.47 +288,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM27,0.01,0.0,1.56,0.11,-1.55,1.4700000000432958 +289,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM28,-0.79,0.0,1.18,0.08,-1.97,1.47 +290,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM29,-0.76,0.0,1.61,0.11,-2.37,1.4700000019788169 +291,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM30,1.17,0.0,2.76,0.19,-1.5899999999999999,1.470000786180085 +292,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM31,-0.13,0.0,1.96,0.14,-2.09,1.4700000037704035 +293,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM32,0.86,0.0,2.44,0.17,-1.58,1.4700000003770786 +294,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM33,2.61,0.0,2.96,0.21,-0.3500000000000001,1.47 +295,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM34,0.92,0.0,2.83,0.2,-1.9100000000000001,1.4700000002072229 +296,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM 
pKa,SM35,-0.79,0.0,0.87,0.06,-1.6600000000000001,1.4700000040648111 +297,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM36,-0.18,0.0,0.76,0.05,-0.94,1.470364134111271 +298,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM37,-1.11,0.0,1.45,0.1,-2.56,1.4855093562606103 +299,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM38,-2.75,0.0,1.03,0.07,-3.7800000000000002,1.470000000075239 +300,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM39,-1.22,0.0,1.89,0.13,-3.11,1.4720316802734248 +301,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM40,-2.18,0.0,1.82,0.13,-4.0,1.4704507142861698 +302,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM41,-2.25,0.0,-0.42,0.03,-1.83,2.073783637412573 +303,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM42,-1.16,0.0,0.99,0.07,-2.15,2.1392457493186248 +304,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM43,-2.93,0.0,0.42,0.03,-3.35,2.127134816087262 +305,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM44,-3.08,0.0,0.06,0.0,-3.14,2.00277016768521 +306,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM45,-1.52,0.0,1.06,0.07,-2.58,1.887624306191272 +307,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,QM logP + QM pKa,SM46,-2.26,0.0,0.69,0.05,-2.9499999999999997,1.7097387894188616 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/experimental_value_files/logP_experimental.csv b/physical_property/logD/analysis_different_pKa_logP_combos/experimental_value_files/logP_experimental.csv new file mode 100644 index 00000000..ff24689e --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/experimental_value_files/logP_experimental.csv @@ -0,0 +1,23 @@ +,method_name,file name,category,Molecule ID,logP (calc),logP SEM (calc),logP model uncertainty 
+0,logP_experimental,logP_experimental_values,experimental,SM25,2.67,0.01,0.0 +1,logP_experimental,logP_experimental_values,experimental,SM26,1.04,0.01,0.0 +2,logP_experimental,logP_experimental_values,experimental,SM27,1.56,0.11,0.0 +3,logP_experimental,logP_experimental_values,experimental,SM28,1.18,0.08,0.0 +4,logP_experimental,logP_experimental_values,experimental,SM29,1.61,0.03,0.0 +5,logP_experimental,logP_experimental_values,experimental,SM30,2.76,0.19,0.0 +6,logP_experimental,logP_experimental_values,experimental,SM31,1.96,0.14,0.0 +7,logP_experimental,logP_experimental_values,experimental,SM32,2.44,0.17,0.0 +8,logP_experimental,logP_experimental_values,experimental,SM33,2.96,0.21,0.0 +9,logP_experimental,logP_experimental_values,experimental,SM34,2.83,0.20,0.0 +10,logP_experimental,logP_experimental_values,experimental,SM35,0.88,0.02,0.0 +11,logP_experimental,logP_experimental_values,experimental,SM36,0.76,0.05,0.0 +12,logP_experimental,logP_experimental_values,experimental,SM37,1.45,0.10,0.0 +13,logP_experimental,logP_experimental_values,experimental,SM38,1.03,0.07,0.0 +14,logP_experimental,logP_experimental_values,experimental,SM39,1.89,0.13,0.0 +15,logP_experimental,logP_experimental_values,experimental,SM40,1.83,0.05,0.0 +16,logP_experimental,logP_experimental_values,experimental,SM41,0.58,0.02,0.0 +17,logP_experimental,logP_experimental_values,experimental,SM42,1.76,0.03,0.0 +18,logP_experimental,logP_experimental_values,experimental,SM43,0.85,0.01,0.0 +19,logP_experimental,logP_experimental_values,experimental,SM44,1.16,0.03,0.0 +20,logP_experimental,logP_experimental_values,experimental,SM45,2.55,0.04,0.0 +21,logP_experimental,logP_experimental_values,experimental,SM46,1.72,0.01,0.0 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/experimental_value_files/pKa_experimental.csv b/physical_property/logD/analysis_different_pKa_logP_combos/experimental_value_files/pKa_experimental.csv new file mode 100644 index 00000000..bfeaeb7b 
--- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/experimental_value_files/pKa_experimental.csv @@ -0,0 +1,21 @@ +,method_name,file name,category,Molecule ID,pKa (calc),pKa SEM (calc),pKa model uncertainty +0,pKa_experimental,pKa_experimental_values,experimental,SM25,4.49,0.04,0.0 +1,pKa_experimental,pKa_experimental_values,experimental,SM26,4.91,0.01,0.0 +2,pKa_experimental,pKa_experimental_values,experimental,SM27,10.45,0.01,0.0 +3,pKa_experimental,pKa_experimental_values,experimental,SM29,10.05,0.01,0.0 +4,pKa_experimental,pKa_experimental_values,experimental,SM30,10.29,0.12,0.0 +5,pKa_experimental,pKa_experimental_values,experimental,SM31,11.02,0.01,0.0 +6,pKa_experimental,pKa_experimental_values,experimental,SM32,10.45,0.02,0.0 +7,pKa_experimental,pKa_experimental_values,experimental,SM34,11.93,0.05,0.0 +8,pKa_experimental,pKa_experimental_values,experimental,SM35,9.87,0.01,0.0 +9,pKa_experimental,pKa_experimental_values,experimental,SM36,9.8,0.06,0.0 +10,pKa_experimental,pKa_experimental_values,experimental,SM37,10.33,0.02,0.0 +11,pKa_experimental,pKa_experimental_values,experimental,SM38,9.44,0.02,0.0 +12,pKa_experimental,pKa_experimental_values,experimental,SM39,10.22,0.15,0.0 +13,pKa_experimental,pKa_experimental_values,experimental,SM40,9.58,0.01,0.0 +14,pKa_experimental,pKa_experimental_values,experimental,SM41,5.22,0.01,0.0 +15,pKa_experimental,pKa_experimental_values,experimental,SM42,6.62,0.02,0.0 +16,pKa_experimental,pKa_experimental_values,experimental,SM43,5.62,0.02,0.0 +17,pKa_experimental,pKa_experimental_values,experimental,SM44,6.34,0.01,0.0 +18,pKa_experimental,pKa_experimental_values,experimental,SM45,5.93,0.05,0.0 +19,pKa_experimental,pKa_experimental_values,experimental,SM46,6.42,0.01,0.0 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/TFE_MLR-EC_RISM.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/TFE_MLR-EC_RISM.csv new file mode 100644 index 
00000000..ac24a2ad --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/TFE_MLR-EC_RISM.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,0.37,0.01344107623307046,2.7578719282393283 +SM26,SM26,-0.7000000000000001,0.013361138100292885,2.7495583624304603 +SM27,SM27,1.47,3.94464950685735e-08,1.3600041024354872 +SM28,SM28,1.87,0.0,1.36 +SM29,SM29,1.47,1.494895479654403e-07,1.3600155469129884 +SM30,SM30,2.73,1.3453630570196874e-06,1.3601399177579299 +SM31,SM31,1.54,4.3383821036787003e-10,1.360000045119174 +SM32,SM32,1.97,2.7305881222731507e-08,1.3600028398116473 +SM33,SM33,3.24,0.0,1.36 +SM34,SM34,2.05,1.3696641564886716e-08,1.3600014244507228 +SM35,SM35,1.36,5.71108298258429e-07,1.3600593952630189 +SM36,SM36,2.63,1.3148643358942858e-06,1.360136745890933 +SM37,SM37,1.44,1.1076281149657645e-07,1.3600115193323958 +SM38,SM38,0.92,2.027036018354614e-06,1.360210811745909 +SM39,SM39,2.17,9.190237115357836e-05,1.3695578465999725 +SM40,SM40,1.01,1.3453630570196874e-06,1.3601399177579299 +SM41,SM41,-0.23,0.01314269632174524,2.726840417461505 +SM42,SM42,-0.25,0.013308600664929969,2.744094469152717 +SM43,SM43,-0.55,0.010713282071424373,2.474181335428135 +SM44,SM44,0.28,0.011697253061616422,2.576514318408108 +SM45,SM45,1.28,0.012575502951253249,2.667852306930338 +SM46,SM46,0.53,0.010713297459104524,2.4741829357468714 + +Name: +TFE MLR + EC_RISM +Category: +Empirical logP + QM pKa +Ranked: +True diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/experimental_pKa_and_logP_combined.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/experimental_pKa_and_logP_combined.csv new file mode 100644 index 00000000..4d0599d5 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/experimental_pKa_and_logP_combined.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,-0.24,0.04990175982820839,0.0 +SM26,SM26,-1.45,0.049742376229052185,0.0 +SM27,SM27,1.56,0.11000003171656936,0.0 
+SM28,SM28,1.18,0.08,0.0 +SM29,SM29,1.61,0.030000199580283982,0.0 +SM30,SM30,2.76,0.19000006621276927,0.0 +SM31,SM31,1.96,0.14000000230065585,0.0 +SM32,SM32,2.44,0.17000003171656936,0.0 +SM33,SM33,2.96,0.21,0.0 +SM34,SM34,2.83,0.2000000000348365,0.0 +SM35,SM35,0.88,0.020000456164835357,0.0 +SM36,SM36,0.76,0.050000628939601666,0.0 +SM37,SM37,1.45,0.10000005508585408,0.0 +SM38,SM38,1.03,0.07000326718909411,0.0 +SM39,SM39,1.89,0.13000009135794513,0.0 +SM40,SM40,1.83,0.05000172321773445,0.0 +SM41,SM41,-1.6,0.0594766376651706,0.0 +SM42,SM42,0.91,0.0594234547197126,0.0 +SM43,SM43,-0.94,0.04870466475253714,0.0 +SM44,SM44,0.06,0.0638472905132671,0.0 +SM45,SM45,1.07,0.07742105380207609,0.0 +SM46,SM46,0.7000000000000001,0.0427763943395595,0.0 + +Name: +logP_experimental + pKa_experimental +Category: +Experimental logP + Experimental pKa +Ranked: +True diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-EC_RISM_wet-EC_RISM.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-EC_RISM_wet-EC_RISM.csv new file mode 100644 index 00000000..b46204fa --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-EC_RISM_wet-EC_RISM.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,2.25,0.02344107623307046,2.447871928239328 +SM26,SM26,0.51,0.023361138100292885,2.4395583624304598 +SM27,SM27,2.21,0.010000039446495069,1.0500041024354871 +SM28,SM28,2.18,0.01,1.05 +SM29,SM29,2.07,0.010000149489547966,1.0500155469129884 +SM30,SM30,3.78,0.01000134536305702,1.05013991775793 +SM31,SM31,3.27,0.010000000433838209,1.0500000451191742 +SM32,SM32,2.59,0.010000027305881223,1.0500028398116472 +SM33,SM33,5.27,0.01,1.05 +SM34,SM34,5.2700000000000005,0.010000013696641566,1.0500014244507228 +SM35,SM35,0.95,0.010000571108298259,1.0500593952630188 +SM36,SM36,2.59,0.010001314864335895,1.050136745890933 +SM37,SM37,2.14,0.0100001107628115,1.0500115193323958 +SM38,SM38,2.29,0.010002027036018357,1.050210811745909 
+SM39,SM39,4.12,0.010091902371153578,1.0595578465999722 +SM40,SM40,3.61,0.01000134536305702,1.05013991775793 +SM41,SM41,1.64,0.023142696321745242,2.416840417461505 +SM42,SM42,4.44,0.023308600664929967,2.4340944691527167 +SM43,SM43,3.34,0.020713282071424373,2.164181335428135 +SM44,SM44,0.51,0.02169725306161642,2.2665143184081082 +SM45,SM45,1.8,0.022575502951253247,2.357852306930338 +SM46,SM46,1.63,0.020713297459104524,2.1641829357468705 + +Name: +EC_RISM_wet + EC_RISM +Category: +QM logP + QM pKa +Ranked: +True diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-MD_CGenFF_TIP3P-Gaussian_corrected.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-MD_CGenFF_TIP3P-Gaussian_corrected.csv new file mode 100644 index 00000000..76d7cfdf --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-MD_CGenFF_TIP3P-Gaussian_corrected.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,-0.38,0.18,2.3999880195383225 +SM26,SM26,-3.45,0.11,2.3999886092463867 +SM27,SM27,-0.52,0.13,2.3979742881675223 +SM28,SM28,1.85,0.13,1.5 +SM29,SM29,-0.85,0.15,2.3970721226093223 +SM30,SM30,1.07,0.14,2.3986675007101423 +SM31,SM31,0.02,0.13,2.3983151936120537 +SM32,SM32,1.27,0.16,2.39855103740242 +SM33,SM33,5.45,0.16,1.5 +SM34,SM34,1.23,0.14,2.39892838869771 +SM35,SM35,-1.99,0.36,2.399809083536587 +SM36,SM36,-1.03,0.6,2.399648783489643 +SM37,SM37,-1.43,0.75,2.3997498106923096 +SM38,SM38,-1.94,0.41,2.3995443305472417 +SM39,SM39,0.08,0.41,2.399608172679832 +SM40,SM40,-1.18,0.31,2.3996953304337554 +SM41,SM41,-3.23,0.18,2.3999972682711226 +SM42,SM42,-0.71,0.13,2.399997478807581 +SM43,SM43,-1.77,0.16,2.399994399413065 +SM44,SM44,-3.46,0.22,2.3999104647733165 +SM45,SM45,-0.42,0.41,2.399886760987729 +SM46,SM46,-2.2,0.29,2.3999411369110883 + +Name: +MD (CGenFF/TIP3P) + Gaussian_corrected +Category: +MM logP + QM+LEC pKa +Ranked: +True diff --git 
a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-NES_1_GAFF2_OPC3_B-pKa_experimental.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-NES_1_GAFF2_OPC3_B-pKa_experimental.csv new file mode 100644 index 00000000..19d292c3 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-NES_1_GAFF2_OPC3_B-pKa_experimental.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,-1.15,0.1199017598282084,2.0 +SM26,SM26,-2.92,0.1397423762290522,2.0 +SM27,SM27,1.44,0.17000003171656936,2.0 +SM28,SM28,1.39,0.13,2.0 +SM29,SM29,1.22,0.100000199580284,2.0 +SM30,SM30,3.5500000000000003,0.19000006621276927,2.0 +SM31,SM31,3.12,0.15000000230065585,2.0 +SM32,SM32,2.31,0.09000003171656935,2.0 +SM33,SM33,4.63,0.19,2.0 +SM34,SM34,3.66,0.1400000000348365,2.0 +SM35,SM35,-0.07,0.22000045616483532,2.0 +SM36,SM36,0.89,0.2000006289396017,2.0 +SM37,SM37,1.11,0.10000005508585408,2.0 +SM38,SM38,-0.35000000000000003,0.1300032671890941,2.0 +SM39,SM39,2.02,0.29000009135794513,2.0 +SM40,SM40,0.62,0.15000172321773445,2.0 +SM41,SM41,0.52,0.1294766376651706,2.0 +SM42,SM42,3.69,0.1994234547197126,2.0 +SM43,SM43,2.38,0.13870466475253715,2.0 +SM44,SM44,1.39,0.0938472905132671,2.0 +SM45,SM45,2.65,0.21742105380207608,2.0 +SM46,SM46,2.7,0.1427763943395595,2.0 + +Name: +NES-1 (GAFF2/OPC3) B + pKa_experimental +Category: +MM logP + Experimental pKa +Ranked: +True diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-NULL0-Bergazin.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-NULL0-Bergazin.csv new file mode 100644 index 00000000..09b55a49 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-NULL0-Bergazin.csv @@ -0,0 +1,33 @@ +Predictions: +SM25,SM25,0,0,1.0 +SM26,SM26,0,0,1.0 +SM27,SM27,0,0,1.0 +SM28,SM28,0,0,1.0 +SM29,SM29,0,0,1.0 +SM30,SM30,0,0,1.0 +SM31,SM31,0,0,1.0 +SM32,SM32,0,0,1.0 +SM33,SM33,0,0,1.0 
+SM34,SM34,0,0,1.0 +SM35,SM35,0,0,1.0 +SM36,SM36,0,0,1.0 +SM37,SM37,0,0,1.0 +SM38,SM38,0,0,1.0 +SM39,SM39,0,0,1.0 +SM40,SM40,0,0,1.0 +SM41,SM41,0,0,1.0 +SM42,SM42,0,0,1.0 +SM43,SM43,0,0,1.0 +SM44,SM44,0,0,1.0 +SM45,SM45,0,0,1.0 +SM46,SM46,0,0,1.0 + + +Name: +NULL0 + +Category: +Empirical (ref) + +Ranked: +TRUE diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-REF0-ChemAxon-Bergazin.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-REF0-ChemAxon-Bergazin.csv new file mode 100644 index 00000000..9f3c6449 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-REF0-ChemAxon-Bergazin.csv @@ -0,0 +1,32 @@ +Predictions: +SM25,SM25,1.78,0,0 +SM26,SM26,-0.36,0,0 +SM27,SM27,0.83,0,0 +SM28,SM28,0.1,0,0 +SM29,SM29,0.76,0,0 +SM30,SM30,2.81,0,0 +SM31,SM31,0.89,0,0 +SM32,SM32,1.34,0,0 +SM33,SM33,3.4,0,0 +SM34,SM34,1.47,0,0 +SM35,SM35,-0.43,0,0 +SM36,SM36,1.62,0,0 +SM37,SM37,-0.31,0,0 +SM38,SM38,-0.25,0,0 +SM39,SM39,1.81,0,0 +SM40,SM40,-0.12,0,0 +SM41,SM41,0.42,0,0 +SM42,SM42,2.24,0,0 +SM43,SM43,0.95,0,0 +SM44,SM44,-0.19,0,0 +SM45,SM45,1.59,0,0 +SM46,SM46,0.34,0,0 + +Name: +REF0 ChemAxon + +Category: +Empirical (ref) + +Ranked: +TRUE diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_IEFPCM_MST-IEFPCM_MST.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_IEFPCM_MST-IEFPCM_MST.csv new file mode 100644 index 00000000..0478cbd4 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_IEFPCM_MST-IEFPCM_MST.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,1.45,0.0,1.62736237955926 +SM26,SM26,-3.13,0.0,2.506510219032152 +SM27,SM27,1.75,0.0,1.0600000003220549 +SM28,SM28,0.83,0.0,1.06 +SM29,SM29,1.23,0.0,1.060000014275805 +SM30,SM30,3.53,0.0,1.0600001445262448 +SM31,SM31,1.61,0.0,1.0600003022939144 +SM32,SM32,1.63,0.0,1.0600000019066798 +SM33,SM33,4.27,0.0,1.06 
+SM34,SM34,2.39,0.0,1.0600002390356282 +SM35,SM35,0.77,0.0,1.060003863243295 +SM36,SM36,3.73,0.0,1.060517639767404 +SM37,SM37,-1.31,0.0,2.508112831230342 +SM38,SM38,0.48,0.0,1.0600307528312118 +SM39,SM39,2.45,0.0,1.062395960435093 +SM40,SM40,1.35,0.0,1.0890629521889412 +SM41,SM41,-1.45,0.0,2.496029608152775 +SM42,SM42,1.16,0.0,2.502465326649413 +SM43,SM43,-1.16,0.0,2.5071930954873802 +SM44,SM44,-0.6900000000000001,0.0,1.7782049976993068 +SM45,SM45,1.68,0.0,1.5093985461690262 +SM46,SM46,-0.95,0.0,2.473467961947144 + +Name: +TFE IEFPCM MST + IEFPCM/MST +Category: +QM logP + QM pKa +Ranked: +True diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_MLR-pKa_experimental.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_MLR-pKa_experimental.csv new file mode 100644 index 00000000..2db9e853 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_MLR-pKa_experimental.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,-0.56,0.03990175982820839,1.36 +SM26,SM26,-1.31,0.03974237622905219,1.36 +SM27,SM27,1.47,3.171656935122753e-08,1.36 +SM28,SM28,1.87,0.0,1.36 +SM29,SM29,1.47,1.9958028397883465e-07,1.36 +SM30,SM30,2.73,6.621276927099838e-08,1.36 +SM31,SM31,1.54,2.3006558391672453e-09,1.36 +SM32,SM32,1.97,3.171656935122753e-08,1.36 +SM33,SM33,3.24,0.0,1.36 +SM34,SM34,2.05,3.4836487372642844e-11,1.36 +SM35,SM35,1.36,4.5616483535177537e-07,1.36 +SM36,SM36,2.63,6.28939601658209e-07,1.36 +SM37,SM37,1.44,5.5085854073942854e-08,1.36 +SM38,SM38,0.93,3.2671890940976683e-06,1.36 +SM39,SM39,2.21,9.135794513077607e-08,1.36 +SM40,SM40,1.01,1.7232177344455449e-06,1.36 +SM41,SM41,-0.74,0.039476637665170594,1.36 +SM42,SM42,0.72,0.029423454719712593,1.36 +SM43,SM43,-1.41,0.038704664752537137,1.36 +SM44,SM44,0.29,0.03384729051326709,1.36 +SM45,SM45,1.17,0.03742105380207609,1.36 +SM46,SM46,0.44,0.0327763943395595,1.36 + +Name: +TFE MLR + pKa_experimental +Category: +Empirical logP + 
Experimental pKa +Ranked: +True diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_NHLBI_TZVP_QM-TZVP_QM.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_NHLBI_TZVP_QM-TZVP_QM.csv new file mode 100644 index 00000000..369c92fe --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_NHLBI_TZVP_QM-TZVP_QM.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,1.21,0.0,0.0 +SM26,SM26,0.01,0.0,0.0 +SM27,SM27,0.27,0.0,0.0 +SM28,SM28,-0.23,0.0,0.0 +SM29,SM29,-0.03,0.0,0.0 +SM30,SM30,1.52,0.0,0.0 +SM31,SM31,0.76,0.0,0.0 +SM32,SM32,0.77,0.0,0.0 +SM33,SM33,3.12,0.0,0.0 +SM34,SM34,1.89,0.0,0.0 +SM35,SM35,-1.7,0.0,0.0 +SM36,SM36,-0.36,0.0,0.0 +SM37,SM37,0.06,0.0,0.0 +SM38,SM38,-2.58,0.0,0.0 +SM39,SM39,-0.48,0.0,0.0 +SM40,SM40,-1.67,0.0,0.0 +SM41,SM41,-0.74,0.0,0.0 +SM42,SM42,0.78,0.0,0.0 +SM43,SM43,-0.14,0.0,0.0 +SM44,SM44,-1.73,0.0,0.0 +SM45,SM45,-0.52,0.0,0.0 +SM46,SM46,-0.93,0.0,0.0 + +Name: +TFE-NHLBI-TZVP-QM + TZVP-QM +Category: +QM logP + QM pKa +Ranked: +True diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_SMD_solvent_opt-DFT_M06_2X_SMD_explicit_water.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_SMD_solvent_opt-DFT_M06_2X_SMD_explicit_water.csv new file mode 100644 index 00000000..d17fe2bb --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_SMD_solvent_opt-DFT_M06_2X_SMD_explicit_water.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,-18.33,0.0,2.1562024272328912 +SM26,SM26,-0.42,0.0,1.47 +SM27,SM27,0.01,0.0,1.4700000000432958 +SM28,SM28,-0.79,0.0,1.47 +SM29,SM29,-0.76,0.0,1.4700000019788169 +SM30,SM30,1.17,0.0,1.470000786180085 +SM31,SM31,-0.13,0.0,1.4700000037704035 +SM32,SM32,0.86,0.0,1.4700000003770786 +SM33,SM33,2.61,0.0,1.47 +SM34,SM34,0.92,0.0,1.4700000002072229 +SM35,SM35,-0.79,0.0,1.4700000040648111 
+SM36,SM36,-0.18,0.0,1.470364134111271 +SM37,SM37,-1.11,0.0,1.4855093562606103 +SM38,SM38,-2.75,0.0,1.470000000075239 +SM39,SM39,-1.22,0.0,1.4720316802734248 +SM40,SM40,-2.18,0.0,1.4704507142861698 +SM41,SM41,-2.25,0.0,2.073783637412573 +SM42,SM42,-1.16,0.0,2.1392457493186248 +SM43,SM43,-2.93,0.0,2.127134816087262 +SM44,SM44,-3.08,0.0,2.00277016768521 +SM45,SM45,-1.52,0.0,1.8876243061912719 +SM46,SM46,-2.26,0.0,1.7097387894188616 + +Name: +TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water +Category: +QM logP + QM pKa +Ranked: +True diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_b3lypd3-DFT_M05_2X_SMD.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_b3lypd3-DFT_M05_2X_SMD.csv new file mode 100644 index 00000000..d24ae240 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-TFE_b3lypd3-DFT_M05_2X_SMD.csv @@ -0,0 +1,32 @@ +Predictions: +SM25,SM25,-2.33,1.0438236634602829,1.894117861536453 +SM26,SM26,-0.99,0.6411256089625921,1.2498009743401473 +SM27,SM27,0.49,0.3600129619091414,0.8000207390546263 +SM28,SM28,-0.6,0.36,0.8 +SM29,SM29,-0.54,0.3625241056999536,0.8040385691199258 +SM30,SM30,0.94,0.360471392816815,0.8007542285069041 +SM31,SM31,0.58,0.3600000000237929,0.8000000000380687 +SM32,SM32,1.24,0.3600282398376334,0.8000451837402134 +SM33,SM33,2.55,0.36,0.8 +SM34,SM34,1.46,0.3600002991428096,0.8000004786284954 +SM35,SM35,-0.81,0.36000043227744816,0.8000006916439171 +SM36,SM36,0.19,0.3600807198221639,0.8001291517154623 +SM37,SM37,-1.62,0.3600049444450307,0.8000079111120492 +SM38,SM38,-3.9,0.9808838646523136,1.7934141834437018 +SM39,SM39,-2.0,1.0099094360425362,1.8398550976680577 +SM40,SM40,-1.93,0.36120044198792783,0.8019207071806846 +SM41,SM41,-1.03,0.36000012476185017,0.8000001996189604 +SM42,SM42,0.23,0.36049300672398177,0.8007888107583709 +SM43,SM43,-0.2,0.36001487301534085,0.8000237968245454 
+SM44,SM44,-1.63,0.3600704059249329,0.8001126494798927 +SM45,SM45,-0.5,0.36000015704952115,0.8000002512792339 +SM46,SM46,-1.8,0.3600000000432957,0.8000000000692732 + +Name: +TFE b3lypd3 + DFT_M05-2X_SMD + +Category: +QM logP + QM pKa + +Ranked: +True diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-logP_experimental-DFT_M05_2X_SMD.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-logP_experimental-DFT_M05_2X_SMD.csv new file mode 100644 index 00000000..ebf6e907 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-logP_experimental-DFT_M05_2X_SMD.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,-0.09,0.693823663460283,1.0941178615364526 +SM26,SM26,0.6,0.29112560896259204,0.4498009743401473 +SM27,SM27,1.56,0.11001296190914142,2.073905462627814e-05 +SM28,SM28,1.18,0.08,0.0 +SM29,SM29,1.58,0.032524105699953586,0.0040385691199257565 +SM30,SM30,2.75,0.19047139281681505,0.0007542285069040628 +SM31,SM31,1.96,0.1400000000237929,3.8068618612047945e-11 +SM32,SM32,2.44,0.17002823983763338,4.51837402133877e-05 +SM33,SM33,2.96,0.21,0.0 +SM34,SM34,2.83,0.20000029914280965,4.786284954241563e-07 +SM35,SM35,0.88,0.02000043227744815,6.91643917039198e-07 +SM36,SM36,0.76,0.050080719822163916,0.00012915171546227784 +SM37,SM37,1.45,0.10000494444503068,7.911112049091283e-06 +SM38,SM38,-0.28,0.6908838646523137,0.9934141834437016 +SM39,SM39,0.32,0.7799094360425363,1.0398550976680578 +SM40,SM40,1.81,0.05120044198792785,0.0019207071806845567 +SM41,SM41,0.58,0.0200001247618502,1.9961896031780734e-07 +SM42,SM42,1.75,0.03049300672398178,0.0007888107583708566 +SM43,SM43,0.85,0.010014873015340854,2.3796824545365085e-05 +SM44,SM44,1.16,0.030070405924932892,0.00011264947989263564 +SM45,SM45,2.5500000000000003,0.04000015704952115,2.5127923383730587e-07 +SM46,SM46,1.72,0.01000000004329575,6.927319449141272e-11 + +Name: +logP_experimental + DFT_M05-2X_SMD +Category: +Experimental logP + QM pKa 
+Ranked: +True diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-logP_experimental-EC_RISM.csv b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-logP_experimental-EC_RISM.csv new file mode 100644 index 00000000..a92962e2 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/input_files/logD-logP_experimental-EC_RISM.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,0.6900000000000001,0.02344107623307046,1.397871928239328 +SM26,SM26,-0.84,0.023361138100292885,1.3895583624304602 +SM27,SM27,1.56,0.11000003944649507,4.102435487131644e-06 +SM28,SM28,1.18,0.08,0.0 +SM29,SM29,1.61,0.03000014948954796,1.554691298840579e-05 +SM30,SM30,2.76,0.19000134536305704,0.00013991775793004748 +SM31,SM31,1.96,0.14000000043383826,4.5119173878258493e-08 +SM32,SM32,2.44,0.17000002730588126,2.839811647164077e-06 +SM33,SM33,2.96,0.21,0.0 +SM34,SM34,2.83,0.2000000136966416,1.4244507227482188e-06 +SM35,SM35,0.88,0.020000571108298257,5.939526301887661e-05 +SM36,SM36,0.76,0.0500013148643359,0.00013674589093300572 +SM37,SM37,1.45,0.1000001107628115,1.151933239564395e-05 +SM38,SM38,1.02,0.07000202703601836,0.0002108117459088799 +SM39,SM39,1.85,0.1300919023711536,0.009557846599972148 +SM40,SM40,1.83,0.05000134536305703,0.00013991775793004748 +SM41,SM41,-1.09,0.03314269632174524,1.366840417461505 +SM42,SM42,-0.06,0.04330860066492997,1.3840944691527168 +SM43,SM43,-0.08,0.020713282071424373,1.114181335428135 +SM44,SM44,0.05,0.04169725306161642,1.2165143184081078 +SM45,SM45,1.18,0.05257550295125325,1.3078523069303378 +SM46,SM46,0.79,0.020713297459104524,1.1141829357468709 + +Name: +logP_experimental + EC_RISM +Category: +Experimental logP + QM pKa +Ranked: +True diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/logD_analysis.py b/physical_property/logD/analysis_different_pKa_logP_combos/logD_analysis.py new file mode 100644 index 00000000..5c3be0b9 --- /dev/null +++ 
b/physical_property/logD/analysis_different_pKa_logP_combos/logD_analysis.py @@ -0,0 +1,1673 @@ +#!/usr/bin/env python + +# ============================================================================= +# GLOBAL IMPORTS +# ============================================================================= +import os +import glob +import io +import collections +import pickle +import pandas as pd +import numpy as np +import seaborn as sns +from matplotlib import pyplot as plt +import scipy.stats +from pylab import rcParams +import math + + + +# ============================================================================= +# CONSTANTS +# ============================================================================= + +# Paths to input data. +LOGD_SUBMISSIONS_DIR_PATH = './logD_different_pKa_logP_combo_input_files' +EXPERIMENTAL_DATA_FILE_PATH = '../../logD_experimental_values.csv' +USER_MAP_FILE_PATH = './user-map2.csv' + +# ============================================================================= +# STATS FUNCTIONS +# ============================================================================= + +def r2(data): + x, y = data.T + slope, intercept, r_value, p_value, stderr = scipy.stats.linregress(x, y) + return r_value**2 + + +def slope(data): + x, y = data.T + slope, intercept, r_value, p_value, stderr = scipy.stats.linregress(x, y) + return slope + + +def me(data): + x, y = data.T + error = np.array(x) - np.array(y) + return error.mean() + + +def mae(data): + x, y = data.T + error = np.abs(np.array(x) - np.array(y)) + return error.mean() + + +def rmse(data): + x, y = data.T + error = np.array(x) - np.array(y) + rmse = np.sqrt((error**2).mean()) + return rmse + +def kendall_tau(data): + x, y = data.T + correlation, p_value = scipy.stats.kendalltau(x, y) + return correlation + + +def compute_bootstrap_statistics(samples, stats_funcs, percentile=0.95, n_bootstrap_samples=1000): + """Compute bootstrap confidence interval for the given statistics functions.""" + # Handle 
case where only a single function is passed. + #print("SAMPLES:\n", samples) + + try: + len(stats_funcs) + except TypeError: + stats_funcs = [stats_funcs] + + # Compute the statistics on the full sample. + statistics = [stats_func(samples) for stats_func in stats_funcs] + + # Generate bootstrap statistics. + bootstrap_samples_statistics = np.zeros((len(statistics), n_bootstrap_samples)) + for bootstrap_sample_idx in range(n_bootstrap_samples): + samples_indices = np.random.randint(low=0, high=len(samples), size=len(samples)) + for stats_func_idx, stats_func in enumerate(stats_funcs): + bootstrap_samples_statistics[stats_func_idx][bootstrap_sample_idx] = stats_func(samples[samples_indices]) + + # Compute confidence intervals. + percentile_index = int(np.floor(n_bootstrap_samples * (1 - percentile) / 2)) - 1 + bootstrap_statistics = [] + for stats_func_idx, samples_statistics in enumerate(bootstrap_samples_statistics): + samples_statistics.sort() + stat_lower_percentile = samples_statistics[percentile_index] + # Mirror the lower index from the top of the sorted array so the interval is symmetric. + stat_higher_percentile = samples_statistics[-(percentile_index + 1)] + confidence_interval = (stat_lower_percentile, stat_higher_percentile) + bootstrap_statistics.append([statistics[stats_func_idx], confidence_interval, samples_statistics]) + + return bootstrap_statistics + +# ============================================================================= +# STATS FUNCTIONS FOR QQ-PLOT AND ERROR SLOPE CALCULATION +# +# Methods from uncertain_check.py David L. Mobley wrote for the SAMPL4 analysis +# =============================================================================== + +def normal(y): + """Return unit normal distribution value at specified location.""" + return 1. / np.sqrt(2 * np.pi) * np.exp(-y ** 2 / 2.) + + +def compute_range_table(stepsize=0.001, maxextent=10): + """Compute integrals of the unit normal distribution and return these tabulated. 
+ Returns: + -------- + - range: NumPy array giving integration range (x) where integration range runs -x to +x + - integral: NumPy array giving integrals over specified integration range. + + Arguments (optional): + --------------------- + - stepsize: Step size to advance integration range by each trial. Default: 0.001 + - maxextent: Maximum extent of integration range +""" + # Calculate integration range + x = np.arange(0, maxextent, stepsize) # Symmetric, so no need to do negative values. + + # Calculate distribution at specified x values + distrib = normal(x) + + integral = np.zeros(len(x), float) + for idx in range(1, len(x)): + integral[idx] = 2 * scipy.integrate.trapz(distrib[0:idx + 1], x[0:idx + 1]) # Factor of 2 handles symmetry + + return x, integral + + +def get_range(integral, rangetable, integraltable): + """Use rangetable and integral table provided (i.e. from compute_range_table) to find the smallest range of integration for which the integral is greater than the specified value (integral). Return this range as a float.""" + + idx = np.where(integraltable > integral)[0] + return rangetable[idx[0]] + + +# [DLM]Precompute integral of normal distribution so I can look up integration range which gives desired integral +# integral_range, integral = compute_range_table() + + +def fracfound_vs_error(calc, expt, dcalc, dexpt, integral_range, integral): + """ + Takes in calculated and experimental values, their uncertainties, as well as the + integration range and integral tables (i.e. from compute_range_table). Returns the + fraction of the Gaussian distribution considered (X) and the fraction of measurements + found within the corresponding range (Y). + """ + # Fraction of Gaussian distribution we want to compute + X = np.arange(0, 1.0, 0.01) + Y = np.zeros(len(X)) + + for (i, x) in enumerate(X): + # Determine integration range which gives us this much probability + rng = get_range(x, integral_range, integral) + # print x, rng + + # Loop over samples and compute fraction of measurements found + y = 0. 
+ # for n in range(0, len(DGcalc)): + # sigma_eff = sqrt( sigma_calc[n]**2 + sigma_expt[n]**2 ) + # absdiff = abs( DGcalc[n] - DGexpt[n]) + # #print absdiff, n, sigma_eff + # if absdiff < rng * sigma_eff: #If the difference falls within the specified range of sigma values, then this is within the range we're looking at; track it + # #print "Incrementing y for n=%s, x = %.2f" % (n, x) + # y += 1./len(DGcalc) + # Rewrite for speed + sigma_eff = np.sqrt(np.array(dcalc) ** 2 + np.array(dexpt) ** 2) + absdiff = np.sqrt((np.array(calc) - np.array(expt)) ** 2) + idx = np.where(absdiff < rng * sigma_eff)[0] + Y[i] = len(idx) * 1. / len(calc) + + # print Y + # raw_input() + + return X, Y + + +# Copied from David L. Mobley's scripts written for SAMPL4 analysis (added calculation uncertainty) +def bootstrap_exptnoise(calc1, expt1, exptunc1, returnunc=False): + """Take two datasets (equal length) of calculated and experimental values. Construct new datasets of equal length by picking, with replacement, a set of indices to use from both sets. Return the two new datasets. To take into account experimental uncertainties, random noise is added to the experimental set, distributed according to gaussians with variance taken from the experimental uncertainties. Approach suggested by J. Chodera. +Optionally, 'returnunc = True', which returns a third value -- experimental uncertainties corresponding to the data points actually used.""" + + # Make everything an array just in case + calc = np.array(calc1) + expt = np.array(expt1) + exptunc = np.array(exptunc1) + npoints = len(calc) + + # Pick random datapoint indices + idx = np.random.randint(0, npoints, + npoints) # Create an array consisting of npoints indices, where each index runs from 0 up to npoints. 
+ + # Construct initial new datasets + newcalc = calc[idx] + newexpt = expt[idx] + newuncExp = exptunc[idx] + + # Add noise to experimental set + noise = np.random.normal(0., + newuncExp) # For each resampled data point, draw a random number from a normal distribution centered at 0, with standard deviation given by that point's experimental uncertainty + newexpt += noise + + if not returnunc: + return newcalc, newexpt + else: + return newcalc, newexpt, newuncExp + +# Modified from David L. Mobley's scripts written for SAMPL4 analysis (added bootstrapped values to the list of returned values) +def getQQdata(calc, expt, dcalc, dexpt, boot_its): + """ + Takes calculated and experimental values and their uncertainties + + Parameters + ---------- + calc: predicted logD value + expt: experimental logD value + dcalc: predicted model uncertainty + dexpt: experimental logD SEM + boot_its: number of bootstrap iterations + + Outputs + ------- + X: array of x axis values for QQ-plot + Y: array of y axis values for QQ-plot + slope: Error Slope (ES) of line fit to QQ-plot + slope_std: standard deviation of the bootstrapped Error Slopes + slopes: Error Slopes (ES) of lines fit to QQ-plots of bootstrapped datapoints + """ + integral_range, integral = compute_range_table() + X, Y = fracfound_vs_error(calc, expt, dcalc, dexpt, integral_range, integral) + xtemp = X[:, np.newaxis] + coeff, _, _, _ = np.linalg.lstsq(xtemp, Y,rcond=-1) + slope = coeff[0] + slopes = [] + for it in range(boot_its): + n_calc, n_expt, n_dexpt = bootstrap_exptnoise(calc, expt, dexpt, returnunc=True) + nX, nY = fracfound_vs_error(n_calc, n_expt, dcalc, n_dexpt, integral_range, integral) + a, _, _, _ = np.linalg.lstsq(xtemp, nY,rcond=-1) + slopes.append(a[0]) + return X, Y, slope, np.array(slopes).std(), slopes + +# ============================================================================= +# PLOTTING FUNCTIONS +# ============================================================================= + +def plot_correlation(x, y, data, title=None, color=None, kind='joint', ax=None): + # Extract only logD values. 
+ data = data[[x, y]] + + # Find extreme values to make axes equal. + min_limit = np.ceil(min(data.min()) - 1) + max_limit = np.floor(max(data.max()) + 1) + axes_limits = np.array([min_limit, max_limit]) + + if kind == 'joint': + grid = sns.jointplot(x=x, y=y, data=data, + kind='reg', joint_kws={'ci': None}, #stat_func=None, + xlim=axes_limits, ylim=axes_limits, color=color) + ax = grid.ax_joint + grid.fig.subplots_adjust(top=0.95) + grid.fig.suptitle(title) + elif kind == 'reg': + #print("x_values",type(x_values)) + #print("y_values",type(y_values)) + #print("data",type(data)) + ax = sns.regplot(x=x, y=y, data=data, color=color, ax=ax) + ax.set_title(title) + + # Add diagonal line. + ax.plot(axes_limits, axes_limits, ls='--', c='black', alpha=0.8, lw=0.7) + + # Add shaded area for 0.5-1 logD error. + palette = sns.color_palette('BuGn_r') + ax.fill_between(axes_limits, axes_limits - 0.5, axes_limits + 0.5, alpha=0.2, color=palette[2]) + ax.fill_between(axes_limits, axes_limits - 1, axes_limits + 1, alpha=0.2, color=palette[3]) + + +def plot_correlation_with_SEM(x_lab, y_lab, x_err_lab, y_err_lab, data, title=None, color=None, ax=None): + # Extract only logD values. + x_error = data.loc[:, x_err_lab] + y_error = data.loc[:, y_err_lab] + x_values = data.loc[:, x_lab] + y_values = data.loc[:, y_lab] + data = data[[x_lab, y_lab]] + + # Find extreme values to make axes equal. + min_limit = np.ceil(min(data.min()) - 1) + max_limit = np.floor(max(data.max()) + 1) + axes_limits = np.array([min_limit, max_limit]) + + # Color + current_palette = sns.color_palette() + sns_blue = current_palette[0] + + # Plot + plt.figure(figsize=(6, 6)) + grid = sns.regplot(x=x_values, y=y_values, data=data, color=color, ci=None) + plt.errorbar(x=x_values, y=y_values, xerr=x_error, yerr=y_error, fmt="o", ecolor=sns_blue, capthick='2', + label='SEM', alpha=0.75) + plt.axis("equal") + + if len(title) > 70: + plt.title(title[:70]+"...") + else: + plt.title(title) + + # Add diagonal line. 
+ grid.plot(axes_limits, axes_limits, ls='--', c='black', alpha=0.8, lw=0.7) + + # Add shaded area for 0.5-1 logD error. + palette = sns.color_palette('BuGn_r') + grid.fill_between(axes_limits, axes_limits - 0.5, axes_limits + 0.5, alpha=0.2, color=palette[2]) + grid.fill_between(axes_limits, axes_limits - 1, axes_limits + 1, alpha=0.2, color=palette[3]) + + plt.xlim(axes_limits) + plt.ylim(axes_limits) + + +def barplot_with_CI_errorbars(df, x_label, y_label, y_lower_label, y_upper_label, figsize=False): + """Creates bar plot of a given dataframe with asymmetric error bars for y axis. + + Args: + df: Pandas Dataframe that should have columns with columnnames specified in other arguments. + x_label: str, column name of x axis categories + y_label: str, column name of y axis values + y_lower_label: str, column name of lower error values of y axis + y_upper_label: str, column name of upper error values of y axis + figsize: tuple, size in inches. Default value is False. + + """ + # Column names for new columns for delta y_err which is calculated as | y_err - y | + delta_lower_yerr_label = "$\Delta$" + y_lower_label + delta_upper_yerr_label = "$\Delta$" + y_upper_label + data = df # Pandas DataFrame + data.loc[:,delta_lower_yerr_label] = data.loc[:,y_label] - data.loc[:,y_lower_label] + data.loc[:,delta_upper_yerr_label] = data.loc[:,y_upper_label] - data.loc[:,y_label] + + # Color + current_palette = sns.color_palette() + sns_color = current_palette[2] + + # Plot style + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 20 # 18 + plt.rcParams['xtick.labelsize'] = 16 #14 + plt.rcParams['ytick.labelsize'] = 18 #16 + plt.rcParams['legend.fontsize'] = 16 + plt.rcParams['legend.handlelength'] = 2 + plt.rcParams['figure.autolayout'] = True + #plt.tight_layout() + + # If figsize is specified + if figsize != False: + plt.figure(figsize=figsize) + + # Plot + x = range(len(data[y_label])) + y = data[y_label] + plt.bar(x, y) + 
plt.xticks(x, data[x_label], rotation=45, horizontalalignment='right') + plt.errorbar(x, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=sns_color, capsize=3, capthick=True) + plt.xlabel(x_label) + plt.ylabel(y_label) + + +def barplot_with_CI_errorbars_colored_by_label(df, x_label, y_label, y_lower_label, y_upper_label, color_label, figsize=False): + """Creates bar plot of a given dataframe with asymmetric error bars for y axis. + + Args: + df: Pandas Dataframe that should have columns with columnnames specified in other arguments. + x_label: str, column name of x axis categories + y_label: str, column name of y axis values + y_lower_label: str, column name of lower error values of y axis + y_upper_label: str, column name of upper error values of y axis + color_label: str, column name of label that will determine the color of bars + figsize: tuple, size in inches. Default value is False. + + """ + # Column names for new columns for delta y_err which is calculated as | y_err - y | + delta_lower_yerr_label = "$\Delta$" + y_lower_label + delta_upper_yerr_label = "$\Delta$" + y_upper_label + data = df # Pandas DataFrame + data.loc[:, delta_lower_yerr_label] = data.loc[:, y_label] - data.loc[:, y_lower_label] + data.loc[:, delta_upper_yerr_label] = data.loc[:, y_upper_label] - data.loc[:, y_label] + + # Color + #current_palette = sns.color_palette() + #sns_color = current_palette[2] # Error bar color + + # Zesty colorblind-friendly color palette + color0 = "#0F2080" #dark blue for Physical (MM) + QM+LEC + color1 = "#F5793A" #orange for Empirical + color3 = "#85C0F9" #light blue for Physical (QM) + color2 = "#a866a1" #purple + color4 = "#009e73"#light green + color5 = "#000000"#black + color6 = "#f0e442"#yellow + color7 = "#DADADA"#grey + current_palette = [color0, color1, color3, color2, color4, color5,color6,color7] + error_color = 'gray' + # Bar colors + if color_label == "category": + category_list = ["MM logP + QM+LEC pKa", 
+ "Empirical (ref)", + "QM logP + QM pKa", + "MM logP + Experimental pKa", + "Empirical logP + Experimental pKa", + "Experimental logP + QM pKa", + "Empirical logP + QM pKa", + "Experimental logP + Experimental pKa"] + elif color_label == "type": + category_list = ["Standard", "Reference"] + else: + raise ValueError("Error: Unsupported label used for coloring") + bar_color_dict = {} + for i, cat in enumerate(category_list): + bar_color_dict[cat] = current_palette[i] + #print("bar_color_dict:\n", bar_color_dict) + + + # Plot style + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 20 # 18 + plt.rcParams['xtick.labelsize'] = 20 + plt.rcParams['ytick.labelsize'] = 20 #16 + plt.rcParams['legend.fontsize'] = 18 + plt.rcParams['legend.handlelength'] = 2 + # plt.tight_layout() + + # If figsize is specified + if figsize != False: + plt.figure(figsize=figsize) + + # Plot + x = range(len(data[y_label])) + y = data[y_label] + #barlist = plt.bar(x, y) + fig, ax = plt.subplots(figsize=figsize) + barlist = ax.bar(x, y) + + plt.xticks(x, data[x_label], rotation=45, horizontalalignment='right') + plt.errorbar(x, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=3, elinewidth=2, capthick=True) + plt.xlabel(x_label) + plt.ylabel(y_label) + + # Reset color of bars based on color label + #print("data.columns:\n",data.columns) + #print("\nData:\n", data) + for i, c_label in enumerate(data.loc[:, color_label]): + barlist[i].set_color(bar_color_dict[c_label]) + + # create legend + from matplotlib.lines import Line2D + + + if color_label == 'category': + custom_lines = [Line2D([0], [0], color=bar_color_dict["MM logP + QM+LEC pKa"], lw=5), + Line2D([0], [0], color=bar_color_dict["Empirical (ref)"], lw=5), + Line2D([0], [0], color=bar_color_dict["QM logP + QM pKa"], lw=5), + Line2D([0], [0], color=bar_color_dict["MM logP + Experimental pKa"], lw=5), + Line2D([0], [0],
color=bar_color_dict["Empirical logP + Experimental pKa"], lw=5), + Line2D([0], [0], color=bar_color_dict["Experimental logP + QM pKa"], lw=5), + Line2D([0], [0], color=bar_color_dict["Empirical logP + QM pKa"], lw=5), + Line2D([0], [0], color=bar_color_dict["Experimental logP + Experimental pKa"], lw=5)] + elif color_label == 'type': + custom_lines = [Line2D([0], [0], color=bar_color_dict["Standard"], lw=5), + Line2D([0], [0], color=bar_color_dict["Reference"], lw=5)] + ax.legend(custom_lines, category_list) + + +def barplot(df, x_label, y_label, title): + """Creates bar plot of a given dataframe. + + Args: + df: Pandas Dataframe that should have columns with columnnames specified in other arguments. + x_label: str, column name of x axis categories + y_label: str, column name of y axis values + title: str, the title of the plot + + """ + # Plot style + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 18 + plt.rcParams['xtick.labelsize'] = 14 + plt.rcParams['ytick.labelsize'] = 16 + #plt.tight_layout() + + # Plot + data = df + x = range(len(data[y_label])) + y = data[y_label] + plt.bar(x, y) + plt.xticks(x, data[x_label], rotation=45) + plt.xlabel(x_label) + plt.ylabel(y_label) + if len(title) > 70: + plt.title(title[:70]+"...") + else: + plt.title(title) + plt.tight_layout() + +# ============================================================================ +# PLOTTING FUNCTIONS FOR QQ-PLOT +# +# Methods from uncertain_check.py David L. Mobley wrote for the SAMPL4 analysis +# ============================================================================= + + +def makeQQplot(X, Y, slope, title, xLabel ="Expected fraction within range" , yLabel ="Fraction of predictions within range", fileName = "QQplot.pdf", uncLabel = 'Model Unc.', leg = [1.02, 0.98, 2, 1], ax1 = None): + """ + Provided with experimental and calculated values (and their associated uncertainties) in the form of list like objects. 
+ Plots the QQ-plot using the Gaussian integral methods David wrote for SAMPL4 that are included above. + If no axes are passed in, saves the plot to fileName; otherwise returns the axes that were drawn on. + """ + if ax1 is None: + axReturn = False + # Get plot parameters for JCAMD + # plt.rcParams.update(JCAMDdict()) + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 18 + plt.rcParams['xtick.labelsize'] = 14 + plt.rcParams['ytick.labelsize'] = 16 + plt.rcParams['figure.figsize'] = 6, 6 + + # Set up plot + #fig1 = plt.figure(1, figsize=(6,6)) + #plt.ylim = (0,1) + #plt.xlim = (0,1) + #plt.xlabel(xLabel) + #plt.ylabel(yLabel) + #plt.title(title, fontsize=20) + #ax1 = fig1.add_subplot(111) + + # New way to plot with subplots + fig1, ax1 = plt.subplots(1,1) + ax1.set_xlim(0,1) + ax1.set_ylim(0,1) + ax1.set_xlabel(xLabel) + ax1.set_ylabel(yLabel) + ax1.set_title(title, fontsize=20) + + else: + axReturn = True + # Add data to plot + p1 = ax1.plot(X,Y,'bo', label = uncLabel) + + # Add x=y line + p2 = ax1.plot(X,X,'k-', label = r'$X=Y$') + + # Add the fitted error-slope line + p3 = ax1.plot(X, slope*X, 'r-', label = 'Slope %.2f' % slope) + + # Build Legend + handles = [p1,p2,p3] + if leg is not None: + ax1.legend(bbox_to_anchor = (leg[0], leg[1]), loc = leg[2], ncol = leg[3], borderaxespad = 0.)
+ + if axReturn: + return ax1 + else: + # Adjust spacing then save and close figure + plt.savefig(fileName) + plt.close(fig1) +## other + +def name_to_filename(id): + for ch in [' ','/']: + if ch in id: + id=id.replace(ch,"_") + for ch in ['(',')']: + if ch in id: + id=id.replace(ch,"") + return id + +# ============================================================================= +# UTILITY CLASSES +# ============================================================================= + +class IgnoredSubmissionError(Exception): + """Exception used to signal a submission that must be ignored.""" + pass + + +class BadFormatError(Exception): + """Exception used to signal a submission with unexpected formatting.""" + pass + + +class SamplSubmission: + """A generic SAMPL submission. + Parameters + ---------- + file_path : str + The path to the submission file. + Raises + ------ + IgnoredSubmission + If the submission ID is among the ignored submissions. + """ + # The IDs of submissions used for reference calculations + REF_SUBMISSIONS = ['REF0', 'NULL0'] + + + # Section of the submission file. + SECTIONS = {} + + # Sections in CSV format with columns names. + CSV_SECTIONS = {} + + def __init__(self, file_path, user_map): + file_name = os.path.splitext(os.path.basename(file_path))[0] + file_data = file_name.split('-') + + # Load predictions. + sections = self._load_sections(file_path) # From parent-class. 
+ print(sections) + self.data = sections['Predictions'] # This is a list + self.data = pd.DataFrame(data=self.data) # Now a DataFrame + #self.name = sections['Name'][0] #want this to take the place of the 5 letter code + self.file_name = file_name + + self.method_name = sections['Name'][0] #want this to take the place of the 5 letter code + + # Check if this is a reference submission + self.reference_submission = False + #if self.method_name in self.REF_SUBMISSIONS: + if "REF" in self.method_name or "NULL" in self.method_name: + print("REF found: ", self.method_name) + self.reference_submission = True + + @classmethod + def _read_lines(cls, file_path): + """Generator to read the file and discard blank lines and comments.""" + with open(file_path, 'r', encoding='utf-8-sig') as f: + for line in f: + # Strip whitespaces. + line = line.strip() + # Don't return blank lines and comments. + if line != '' and line[0] != '#': + yield line + + @classmethod + def _load_sections(cls, file_path): + """Load the data in the file and separate it by sections.""" + #print("file_path",file_path) + sections = {} + current_section = None + for line in cls._read_lines(file_path): + # Check if this is a new section. + if line[:-1] in cls.SECTIONS: + current_section = line[:-1] + else: + if current_section is None: + raise BadFormatError('Found data before any section header: {}'.format(line)) + try: + sections[current_section].append(line) + except KeyError: + sections[current_section] = [line] + + # Check that all the sections have been loaded. + found_sections = set(sections.keys()) + if found_sections != set(cls.SECTIONS): + raise BadFormatError('Missing sections: {}.'.format(set(cls.SECTIONS) - found_sections)) + + # Create a Pandas dataframe from the CSV format.
+ for section_name in cls.CSV_SECTIONS: + csv_str = io.StringIO('\n'.join(sections[section_name])) + columns = cls.CSV_SECTIONS[section_name] + id_column = columns[0] + #print("trying", sections) + section = pd.read_csv(csv_str, index_col=id_column, names=columns, skipinitialspace=True) + #section = pd.read_csv(csv_str, names=columns, skipinitialspace=True) + sections[section_name] = section + return sections + + @classmethod + def _create_comparison_dataframe(cls, column_name, submission_data, experimental_data): + """Create a single dataframe with submission and experimental data.""" + # Filter only the systems IDs in this submissions. + + + experimental_data = experimental_data[experimental_data.index.isin(submission_data.index)] # match by column index + # Fix the names of the columns for labelling. + submission_series = submission_data[column_name] + submission_series.name += ' (calc)' + experimental_series = experimental_data[column_name] + experimental_series.name += ' (expt)' + + # Concatenate the two columns into a single dataframe. + return pd.concat([experimental_series, submission_series], axis=1) + +# ============================================================================= +# LOGP PREDICTION CHALLENGE +# ============================================================================= + +class logDSubmission(SamplSubmission): + """A submission for logD challenge. + + Parameters + ---------- + file_path : str + The path to the submission file + + Raises + ------ + IgnoredSubmission + If the submission ID is among the ignored submissions. + + """ + # Section of the submission file. + SECTIONS = {"Predictions", + #"Participant name", + #"Participant organization", + "Name", + #"Compute time", + #"Computing and hardware", + #"Software", + "Category", + #"Method", + "Ranked"} + + # Sections in CSV format with columns names. 
+ CSV_SECTIONS = {"Predictions": ("Molecule ID", "ID tag", "logD mean", "logD SEM", "logD model uncertainty")} + + + def __init__(self, file_path, user_map): + super().__init__(file_path, user_map) + + file_name = os.path.splitext(os.path.basename(file_path))[0] + file_data = file_name.split('-') + print(file_name) + + + # Load predictions. + sections = self._load_sections(file_path) # From parent-class. + self.data = sections['Predictions'] # This is a pandas DataFrame. + #self.participant = sections['Participant name'][0].strip() + self.method_name = sections['Name'][0] + self.category = sections['Category'][0] # New section for logD challenge. + self.ranked = sections['Ranked'][0].strip() =='True' + + # Check if this is a reference submission + self.reference_submission = False + if "REF" in self.method_name or "NULL" in self.method_name: + self.reference_submission = True + + + + + def compute_logD_statistics(self, experimental_data, stats_funcs): + data = self._create_comparison_dataframe('logD mean', self.data, experimental_data) + + # Create lists of stats functions to pass to compute_bootstrap_statistics. + stats_funcs_names, stats_funcs = zip(*stats_funcs.items()) + #bootstrap_statistics = compute_bootstrap_statistics(data.as_matrix(), stats_funcs, n_bootstrap_samples=10000) #10000 + + bootstrap_statistics = compute_bootstrap_statistics(data.to_numpy(), stats_funcs, n_bootstrap_samples=10000) + + # Return statistics as dict preserving the order. 
+ return collections.OrderedDict((stats_funcs_names[i], + bootstrap_statistics[i]) + for i in range(len(stats_funcs))) + + def compute_logD_model_uncertainty_statistics(self,experimental_data): + + # Create a dataframe for data necessary for error slope analysis + expt_logD_series = experimental_data["logD mean"] + expt_logD_SEM_series = experimental_data["logD SEM"] + pred_logD_series = self.data["logD mean"] + pred_logD_SEM_series = self.data["logD SEM"] + pred_logD_mod_unc_series = self.data["logD model uncertainty"] + + # Concatenate the columns into a single dataframe. + data_exp = pd.concat([expt_logD_series, expt_logD_SEM_series], axis=1) + data_exp = data_exp.rename(index=str, columns={"logD mean": "logD mean (expt)", + "logD SEM": "logD SEM (expt)"}) + + data_mod_unc = pd.concat([data_exp, pred_logD_series, pred_logD_SEM_series, pred_logD_mod_unc_series], axis=1) + data_mod_unc = data_mod_unc.rename(index=str, columns={"logD mean (calc)": "logD mean (calc)", + "logD SEM": "logD SEM (calc)", + "logD model uncertainty": "logD model uncertainty"}) + #print("data_mod_unc:\n", data_mod_unc) + + # Compute QQ-Plot Error Slope (ES) + calc = data_mod_unc.loc[:, "logD mean (calc)"].values + expt = data_mod_unc.loc[:, "logD mean (expt)"].values + dcalc = data_mod_unc.loc[:, "logD model uncertainty"].values + dexpt = data_mod_unc.loc[:, "logD SEM (expt)"].values + n_bootstrap_samples = 1000 #1000 + + X, Y, error_slope, error_slope_std, slopes = getQQdata(calc, expt, dcalc, dexpt, boot_its=n_bootstrap_samples) + + QQplot_data = [X, Y, error_slope] + + # Compute 95% confidence intervals of Error Slope + percentile = 0.95 + percentile_index = int(np.floor(n_bootstrap_samples * (1 - percentile) / 2)) - 1 + + #for stats_func_idx, samples_statistics in enumerate(bootstrap_samples_statistics): + samples_statistics = np.asarray(slopes) + samples_statistics.sort() + stat_lower_percentile = samples_statistics[percentile_index] + stat_higher_percentile = 
samples_statistics[-percentile_index + 1] + confidence_interval = (stat_lower_percentile, stat_higher_percentile) + + model_uncertainty_statistics = [error_slope, confidence_interval, samples_statistics] + + + return model_uncertainty_statistics, QQplot_data + + +# ============================================================================= +# UTILITY FUNCTIONS +# ============================================================================= + + +def load_submissions(directory_path, user_map): + """Load submissions from a specified directory using a specified user map. + Optional argument: + ref_ids: List specifying submission IDs (alphanumeric, typically) of + reference submissions which are to be ignored/analyzed separately. + Returns: submissions + """ + submissions = [] + for file_path in glob.glob(os.path.join(directory_path, '*.csv')): + try: + submission = logDSubmission(file_path, user_map) + + except IgnoredSubmissionError: + continue + submissions.append(submission) + print(submissions) + return submissions + + + +def load_ranked_submissions(directory_path, user_map): + """ + Load submissions from a specified directory using a specified user map. + Optional argument: + ref_ids: List specifying submission IDs (alphanumeric, typically) of + reference submissions which are to be ignored/analyzed separately. 
+ Returns: submissions + """ + submissions = [] + + for file_path in glob.glob(os.path.join(directory_path, '*.csv')): + try: + submission = logDSubmission(file_path, user_map) + except IgnoredSubmissionError: + continue + # only continue if submission is ranked + if not submission.ranked: + continue + if "REF" in submission.method_name or "NULL" in submission.method_name: + continue + + submissions.append(submission) + + method_names = [] + for submission in submissions: + method_names.append(submission.method_name) + print("Ranked submissions: \n", method_names) + + return submissions + + + +class logDSubmissionCollection: + """A collection of logD submissions.""" + + LOGP_CORRELATION_PLOT_BY_METHOD_PATH_DIR = 'logDCorrelationPlots' + LOGP_CORRELATION_PLOT_WITH_SEM_BY_METHOD_PATH_DIR = 'logDCorrelationPlotsWithSEM' + LOGP_CORRELATION_PLOT_BY_LOGP_PATH_DIR = 'error_for_each_logD.pdf' + ABSOLUTE_ERROR_VS_LOGP_PLOT_PATH_DIR = 'AbsoluteErrorPlots' + + + def __init__(self, submissions, experimental_data, output_directory_path, logD_submission_collection_file_path, + ignore_refcalcs = True, ranked_only = True, allow_multiple = True): + # Build collection dataframe from the beginning. + # Build full logD collection table. + + data = [] + + # Participant names we've found so far; tracked to ensure no one has more than one + # ranked submission + self.method_names_ranked = [] + + # Submissions for logD. 
+ for submission in submissions: + if submission.reference_submission and ignore_refcalcs: + continue + + if ranked_only and not submission.ranked: + continue + # Store names associated with ranked submission, skip if they submitted multiple (only if we need to check for duplicate authors) + if submission.ranked and not allow_multiple: + if not submission.method_name in self.method_names_ranked: + self.method_names_ranked.append(submission.method_name) + else: + print(f"Error: {submission.method_name} submitted multiple ranked submissions.") + continue + + + for mol_ID, series in submission.data.iterrows(): + logD_mean_exp = experimental_data.loc[mol_ID, 'logD mean'] + logD_SEM_exp = experimental_data.loc[mol_ID, 'logD SEM'] + + logD_mean_pred = submission.data.loc[mol_ID, "logD mean"] + logD_SEM_pred = submission.data.loc[mol_ID, "logD SEM"] + print("logD_mean_pred \n",logD_mean_pred) + + logD_model_uncertainty = submission.data.loc[mol_ID, "logD model uncertainty"] + ranked = submission.ranked + + data.append({ + 'method_name': submission.method_name, + #'participant': submission.participant, + #'file name': submission.file_name, + 'category': submission.category, + 'Molecule ID': mol_ID, + 'logD (calc)': logD_mean_pred, + 'logD SEM (calc)': logD_SEM_pred, + 'logD (exp)': logD_mean_exp, + 'logD SEM (exp)': logD_SEM_exp, + '$\Delta$logD error (calc - exp)': logD_mean_pred - logD_mean_exp, + 'logD model uncertainty': logD_model_uncertainty + }) + + # Transform into Pandas DataFrame. + self.data = pd.DataFrame(data=data) + self.output_directory_path = output_directory_path + + print("\n SubmissionCollection: \n") + print(self.data) + + # Create general output directory. + os.makedirs(self.output_directory_path, exist_ok=True) + + # Save collection.data dataframe in a CSV file. + self.data.to_csv(logD_submission_collection_file_path) + + def generate_correlation_plots(self): + # logD correlation plots. 
+ output_dir_path = os.path.join(self.output_directory_path, + self.LOGP_CORRELATION_PLOT_BY_METHOD_PATH_DIR) + os.makedirs(output_dir_path, exist_ok=True) + print("self.data \n",self.data) + print("self.data.method_name\n",self.data.method_name) + for method_name in self.data.method_name.unique(): + # Skip NULL0 submission + if "NULL" in method_name: + continue + + data = self.data[self.data.method_name == method_name] + print("data \n",data) + title = '{}'.format(method_name) + + plt.close('all') + plot_correlation(x='logD (exp)', y='logD (calc)', + data=data, title=title, kind='joint') + plt.tight_layout() + # plt.show() + method_name = name_to_filename(method_name) + output_path = os.path.join(output_dir_path, '{}.pdf'.format(method_name)) + plt.savefig(output_path) + + def generate_correlation_plots_with_SEM(self): + # logD correlation plots with SEM error bars. + output_dir_path = os.path.join(self.output_directory_path, + self.LOGP_CORRELATION_PLOT_WITH_SEM_BY_METHOD_PATH_DIR) + os.makedirs(output_dir_path, exist_ok=True) + for method_name in self.data.method_name.unique(): + + # Skip NULL0 submission + if "NULL" in method_name: + continue + + data = self.data[self.data.method_name == method_name] + title = '{}'.format(method_name) + + plt.close('all') + plot_correlation_with_SEM(x_lab='logD (exp)', y_lab='logD (calc)', + x_err_lab='logD SEM (exp)', y_err_lab='logD SEM (calc)', + data=data, title=title) + plt.tight_layout() + # plt.show() + method_name = name_to_filename(method_name) + output_path = os.path.join(output_dir_path, '{}.pdf'.format(method_name)) + plt.savefig(output_path) + + def generate_molecules_plot(self): + # Correlation plot by molecules.
+ plt.close('all') + data_ordered_by_mol_ID = self.data.sort_values(["Molecule ID"], ascending=True) + sns.set(rc={'figure.figsize': (8.27,11.7)}) + sns.violinplot(y='Molecule ID', x='$\Delta$logD error (calc - exp)', data=data_ordered_by_mol_ID, + inner='point', linewidth=1, width=1.2) + plt.tight_layout() + # plt.show() + plt.savefig(os.path.join(self.output_directory_path, self.LOGP_CORRELATION_PLOT_BY_LOGP_PATH_DIR)) + + def generate_absolute_error_vs_molecule_ID_plot(self): + """ + For each method a bar plot is generated so that absolute errors of each molecule can be compared. + """ + # Setup output directory + output_dir_path = os.path.join(self.output_directory_path, + self.ABSOLUTE_ERROR_VS_LOGP_PLOT_PATH_DIR) + os.makedirs(output_dir_path, exist_ok=True) + + # Calculate absolute errors. + self.data["absolute error"] = np.nan + self.data.loc[:, "absolute error"] = np.absolute(self.data.loc[:, "$\Delta$logD error (calc - exp)"]) + + # Create a separate plot for each submission. + for method_name in self.data.method_name.unique(): + data = self.data[self.data.method_name == method_name] + title = '{}'.format(method_name) + + plt.close('all') + barplot(df=data, x_label="Molecule ID", y_label="absolute error", title=title) + method_name = name_to_filename(method_name) + output_path = os.path.join(output_dir_path, '{}.pdf'.format(method_name)) + plt.savefig(output_path) + + +def generate_statistics_tables(submissions, stats_funcs, directory_path, file_base_name, + sort_stat=None, ordering_functions=None, + latex_header_conversions=None, ignore_refcalcs = True): + stats_names = list(stats_funcs.keys()) + ci_suffixes = ('', '_lower_bound', '_upper_bound') + + # Collect the records for the DataFrames.
+ statistics_csv = [] + statistics_latex = [] + statistics_plot = [] + + # Collect the records for QQ Plot + # Dictionary of receipt ID: [X, Y, error_slope] + QQplot_dict = {} + + for i, submission in enumerate(submissions): + method_name = submission.method_name + category = submission.category + file_name = submission.file_name + + + # Pull submission type + type = 'Standard' + if submission.reference_submission: + type = 'Reference' + + # Ignore reference calculation, if applicable + if submission.reference_submission and ignore_refcalcs: + continue + + print('\rGenerating bootstrap statistics for submission {} ({}/{})' + ''.format(method_name, i + 1, len(submissions)), end='') + + bootstrap_statistics = submission.compute_logD_statistics(experimental_data, stats_funcs) + + # Compute error slope + error_slope_bootstrap_statistics, QQplot_data = submission.compute_logD_model_uncertainty_statistics(experimental_data) + #print("error_slope_bootstrap_statistics:\n") + #print(error_slope_bootstrap_statistics) + + # Add data to to QQplot dictionary + QQplot_dict.update({method_name : QQplot_data}) + + # Add error slope and CI to bootstrap_statistics + bootstrap_statistics.update({'ES' : error_slope_bootstrap_statistics }) + #print("bootstrap_statistics:\n", bootstrap_statistics) + + # Organize data to construct CSV and PDF versions of statistics tables + record_csv = {} + record_latex = {} + for stats_name, (stats, (lower_bound, upper_bound), bootstrap_samples) in bootstrap_statistics.items(): + # For CSV and JSON we put confidence interval in separate columns. + for suffix, info in zip(ci_suffixes, [stats, lower_bound, upper_bound]): + record_csv[stats_name + suffix] = info + + # For the PDF, print bootstrap CI in the same column. 
+ stats_name_latex = latex_header_conversions.get(stats_name, stats_name) + record_latex[stats_name_latex] = '{:.2f} [{:.2f}, {:.2f}]'.format(stats, lower_bound, upper_bound) + + # For the violin plot, we need all the bootstrap statistics series. + for bootstrap_sample in bootstrap_samples: + statistics_plot.append(dict(ID=method_name, category=category, + statistics=stats_name_latex, value=bootstrap_sample)) + + '''statistics_csv.append({'ID': method_name, 'name': file_name, 'category': category, 'type': type, **record_csv}) + escaped_name = file_name.replace('_', '\_') + statistics_latex.append({'ID': method_name, 'name': escaped_name, 'category': category, 'type':type, **record_latex})''' + + statistics_csv.append({'method name': method_name, 'file name': file_name, 'category': category, 'type': type, **record_csv}) + escaped_name = file_name.replace('_', '\_') + statistics_latex.append({'method name': method_name, 'file name': escaped_name, 'category': category, 'type':type, **record_latex}) + print() + print("statistics_csv:\n",statistics_csv) + print() + + + # Write QQplot_dict to a JSON file for plotting later + #print("QQplot_dict:\n", QQplot_dict) + QQplot_directory_path = os.path.join(output_directory_path, "QQPlots") + os.makedirs(QQplot_directory_path, exist_ok=True) + QQplot_dict_filename = os.path.join(QQplot_directory_path, 'QQplot_dict.pickle') + + with open(QQplot_dict_filename, 'wb') as outfile: + pickle.dump(QQplot_dict, outfile) + + + # Convert dictionary to Dataframe to create tables/plots easily. + statistics_csv = pd.DataFrame(statistics_csv) + statistics_csv.set_index('method name', inplace=True) + statistics_latex = pd.DataFrame(statistics_latex) + statistics_plot = pd.DataFrame(statistics_plot) + + # Sort by the given statistics. 
+ if sort_stat is not None: + statistics_csv.sort_values(by=sort_stat, inplace=True) + statistics_latex.sort_values(by=latex_header_conversions.get(sort_stat, sort_stat), + inplace=True) + + # Reorder columns that were scrambled by going through dictionaries. + stats_names_csv = [name + suffix for name in stats_names for suffix in ci_suffixes] + #print("stats_names_csv:", stats_names_csv) + stats_names_latex = [latex_header_conversions.get(name, name) for name in stats_names] + #print("stats_names_latex:", stats_names_latex) + '''statistics_csv = statistics_csv[['name', "category", "type"] + stats_names_csv + ["ES", "ES_lower_bound", "ES_upper_bound"] ] + statistics_latex = statistics_latex[['ID', 'name'] + stats_names_latex + ["ES"]] ## Add error slope(ES)''' + + statistics_csv = statistics_csv[['file name', "category", "type"] + stats_names_csv + ["ES", "ES_lower_bound", "ES_upper_bound"] ] + statistics_latex = statistics_latex[['method name', 'file name', "category", "type"] + stats_names_latex + ["ES"]] ## Add error slope(ES) + + # Create CSV table (the JSON export below is disabled). + os.makedirs(directory_path, exist_ok=True) + file_base_path = os.path.join(directory_path, file_base_name) + with open(file_base_path + '.csv', 'w') as f: + statistics_csv.to_csv(f) + '''with open(file_base_path + '.json', 'w') as f: + statistics_csv.to_json(f, orient='index')''' + + + # Create LaTeX table.
+ latex_directory_path = os.path.join(directory_path, file_base_name + 'LaTex') + os.makedirs(latex_directory_path, exist_ok=True) + with open(os.path.join(latex_directory_path, file_base_name + '.tex'), 'w') as f: + f.write('\\documentclass{article}\n' + '\\usepackage[a4paper,margin=0.005in,tmargin=0.5in,lmargin=0.5in,rmargin=0.5in,landscape]{geometry}\n' + '\\usepackage{booktabs}\n' + '\\usepackage{longtable}\n' + '\\pagenumbering{gobble}\n' + '\\begin{document}\n' + '\\begin{center}\n' + '\\scriptsize\n') + statistics_latex.to_latex(f, column_format='|ccccccccc|', escape=False, index=False, longtable=True) + f.write('\\end{center}\n' + '\nNotes\n\n' + '- RMSE: Root mean square error\n\n' + '- MAE: Mean absolute error\n\n' + '- ME: Mean error\n\n' + '- R2: R-squared, square of Pearson correlation coefficient\n\n' + '- m: slope of the line fit to predicted vs experimental logD values\n\n' + '- $\\tau$: Kendall rank correlation coefficient\n\n' + '- ES: error slope calculated from the QQ Plots of model uncertainty predictions\n\n' + '- Mean and 95\\% confidence intervals of RMSE, MAE, ME, R2, and m were calculated by bootstrapping with 10000 samples.\n\n' + '- 95\\% confidence intervals of ES were calculated by bootstrapping with 1000 samples.' + #'- Some logD predictions were submitted after the submission deadline to be used as a reference method.\n\n' + '\\end{document}\n') + + # Violin plots by statistics across submissions. + plt.close('all') + fig, axes = plt.subplots(ncols=len(stats_names), figsize=(28, 10)) + for ax, stats_name in zip(axes, stats_names): + stats_name_latex = latex_header_conversions.get(stats_name, stats_name) + data = statistics_plot[statistics_plot.statistics == stats_name_latex] + # Order submissions by the statistic being plotted. 
+ ordering_function = ordering_functions.get(stats_name, lambda x: x) + order = sorted(statistics_csv[stats_name].items(), key=lambda x: ordering_function(x[1])) + order = [method_name for method_name, value in order] + #sns.violinplot(x='value', y='ID', data=data, ax=ax, + sns.violinplot(x='value', y='ID', data=data, ax=ax, + order=order, palette='PuBuGn_r', linewidth=0.5, width=1) + ax.set_xlabel(stats_name_latex) + ax.set_ylabel('') + sns.set_style("whitegrid") + plt.tight_layout() + # plt.show() + plt.savefig(file_base_path + '_bootstrap_distributions.pdf') + + + + +def generate_performance_comparison_plots(statistics_filename, directory_path, ignore_refcalcs = False): + # Read statistics table + statistics_file_path = os.path.join(directory_path, statistics_filename) + df_statistics = pd.read_csv(statistics_file_path) + + plt.rcParams['axes.labelsize'] = 20 # 18 + plt.rcParams['xtick.labelsize'] = 20 #14 + plt.rcParams['ytick.labelsize'] = 20 #16 + plt.rcParams['legend.fontsize'] = 20 + #plt.rcParams['legend.handlelength'] = 2 + #plt.rcParams['figure.autolayout'] = True + + # RMSE comparison plot + barplot_with_CI_errorbars(df=df_statistics, x_label="method name", y_label="RMSE", y_lower_label="RMSE_lower_bound", + y_upper_label="RMSE_upper_bound", figsize=(28,10)) # figsize=(22,10) + plt.savefig(directory_path + "/RMSE_vs_method_plot.pdf") + + # RMSE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics, x_label="method name", y_label="RMSE", + y_lower_label="RMSE_lower_bound", + y_upper_label="RMSE_upper_bound", color_label = "category", figsize=(28,10)) + plt.ylim(0.0, 7.0) + plt.savefig(directory_path + "/RMSE_vs_method_plot_colored_by_method_category.pdf") + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + barplot_with_CI_errorbars_colored_by_label(df=df_statistics, x_label="method name", y_label="RMSE", + y_lower_label="RMSE_lower_bound", + 
y_upper_label="RMSE_upper_bound", color_label = "type", figsize=(28,10)) + plt.ylim(0.0, 7.0) + plt.savefig(directory_path + "/RMSE_vs_method_plot_colored_by_type.pdf") + + # MAE comparison plot + # Reorder based on MAE value + df_statistics_MAE = df_statistics.sort_values(by="MAE", inplace=False) + + barplot_with_CI_errorbars(df=df_statistics_MAE, x_label="method name", y_label="MAE", y_lower_label="MAE_lower_bound", + y_upper_label="MAE_upper_bound", figsize=(28,10)) + plt.savefig(directory_path + "/MAE_vs_method_plot.pdf") + + # MAE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_MAE, x_label="method name", y_label="MAE", + y_lower_label="MAE_lower_bound", + y_upper_label="MAE_upper_bound", color_label="category", + figsize=(28, 10)) + plt.ylim(0.0, 7.0) + plt.savefig(directory_path + "/MAE_vs_method_plot_colored_by_method_category.pdf") + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + # MAE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_MAE, x_label="method name", y_label="MAE", + y_lower_label="MAE_lower_bound", + y_upper_label="MAE_upper_bound", color_label="type", + figsize=(28, 10)) + plt.ylim(0.0, 7.0) + plt.savefig(directory_path + "/MAE_vs_method_plot_colored_by_type.pdf") + + + # Kendall's Tau comparison plot + # Reorder based on Kendall's Tau value + df_statistics_tau = df_statistics.sort_values(by="kendall_tau", inplace=False, ascending=False) + + barplot_with_CI_errorbars(df=df_statistics_tau, x_label="method name", y_label="kendall_tau", + y_lower_label="kendall_tau_lower_bound", + y_upper_label="kendall_tau_upper_bound", figsize=(28, 10)) + plt.savefig(directory_path + "/kendalls_tau_vs_method_plot.pdf") + + # Kendall's Tau comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_tau, x_label="method name", 
y_label="kendall_tau", + y_lower_label="kendall_tau_lower_bound", + y_upper_label="kendall_tau_upper_bound", color_label="category", + figsize=(28, 10)) + plt.savefig(directory_path + "/kendalls_tau_vs_method_plot_colored_by_method_category.pdf") + + + # Same plot, colored by the "type" column to distinguish reference calculations + if not ignore_refcalcs: + # Kendall's Tau comparison plot colored by submission type + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_tau, x_label="method name", y_label="kendall_tau", + y_lower_label="kendall_tau_lower_bound", + y_upper_label="kendall_tau_upper_bound", color_label="type", + figsize=(28, 10)) + plt.savefig(directory_path + "/kendalls_tau_vs_method_plot_colored_by_type.pdf") + + + + # R-squared comparison plot + # Reorder based on R-squared + df_statistics_R2 = df_statistics.sort_values(by="R2", inplace=False, ascending=False) + + barplot_with_CI_errorbars(df=df_statistics_R2, x_label="method name", y_label="R2", + y_lower_label="R2_lower_bound", + y_upper_label="R2_upper_bound", figsize=(28, 10)) + plt.ylim(0, 1.0) + plt.savefig(directory_path + "/Rsquared_vs_method_plot.pdf") + + # R-squared comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_R2, x_label="method name", y_label="R2", + y_lower_label="R2_lower_bound", + y_upper_label="R2_upper_bound", color_label="category", + figsize=(28, 10)) + plt.ylim(0, 1.0) + plt.savefig(directory_path + "/Rsquared_vs_method_plot_colored_by_method_category.pdf") + + + # Same plot, colored by the "type" column to distinguish reference calculations + if not ignore_refcalcs: + # R-squared comparison plot colored by submission type + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_R2, x_label="method name", y_label="R2", + y_lower_label="R2_lower_bound", + y_upper_label="R2_upper_bound", color_label="type", + figsize=(28, 10)) + plt.ylim(0, 1.0) + plt.savefig(directory_path + 
"/Rsquared_vs_method_plot_colored_by_type.pdf") + + + + '''# 
Plot RMSE, MAE, Kendall's Tau, and R-squared comparison plots for each category separately + category_list = ["MM logP + QM+LEC pKa", + "Empirical (ref)", + "QM logP + QM pKa", + "MM logP + Experimental pKa", + "Empirical logP + Experimental pKa", + "Experimental logP + QM pKa", + "Empirical logP + QM pKa", + "Experimental logP + Experimental pKa"] + + + # New labels for file naming for reassigned categories + category_path_label_dict = {"MM logP + QM+LEC pKa": "Physical_MM_QM_LEC", + "Empirical (ref)": "Empirical", + "QM logP + QM pKa": "Physical_QM", + "MM logP + Experimental pKa":"Physical_MM_Experimental_pKa", + "Empirical logP + Experimental pKa":"Empirical_Experimental_pKa", + "Experimental logP + QM pKa":"Experimental_logP_QM", + "Empirical logP + QM pKa":"Empirical_QM", + "Experimental logP + Experimental pKa":"Experimental_only"} + + + for category in category_list: + print("category: ",category) + #print("df_statistics.columns:\n", df_statistics.columns) + + # Take subsection of dataframe for each category + df_statistics_1category = df_statistics.loc[df_statistics['category'] == category] + df_statistics_MAE_1category = df_statistics_MAE.loc[df_statistics_MAE['category'] == category] + df_statistics_tau_1category = df_statistics_tau.loc[df_statistics_tau['category'] == category] + df_statistics_R2_1category = df_statistics_R2.loc[df_statistics_R2['category'] == category] + + # RMSE comparison plot for each category + barplot_with_CI_errorbars(df=df_statistics_1category, x_label="method name", y_label="RMSE", y_lower_label="RMSE_lower_bound", + y_upper_label="RMSE_upper_bound", figsize=(12, 10)) + plt.title("Method category: {}".format(category), fontdict={'fontsize': 22}) + plt.ylim(0.0,7.0) + plt.savefig(directory_path + "/RMSE_vs_method_plot_for_{}_category.pdf".format(category_path_label_dict[category])) + + # MAE comparison plot for each category + barplot_with_CI_errorbars(df=df_statistics_MAE_1category, x_label="method name", y_label="MAE", + 
y_lower_label="MAE_lower_bound", + y_upper_label="MAE_upper_bound", figsize=(12, 10)) + plt.title("Method category: {}".format(category), fontdict={'fontsize': 22}) + plt.ylim(0.0, 7.0) + plt.savefig(directory_path + "/MAE_vs_method_plot_for_{}_category.pdf".format(category_path_label_dict[category])) + + # Kendall's Tau comparison plot for each category + barplot_with_CI_errorbars(df=df_statistics_tau_1category, x_label="method name", y_label="kendall_tau", + y_lower_label="kendall_tau_lower_bound", + y_upper_label="kendall_tau_upper_bound", figsize=(12, 10)) + plt.title("Method category: {}".format(category), fontdict={'fontsize': 22}) + plt.savefig(directory_path + "/kendalls_tau_vs_method_plot_for_{}_category.pdf".format(category_path_label_dict[category])) + + # R-squared comparison plot for each category + barplot_with_CI_errorbars(df=df_statistics_R2_1category, x_label="method name", y_label="R2", + y_lower_label="R2_lower_bound", + y_upper_label="R2_upper_bound", figsize=(12, 10)) + plt.title("Method category: {}".format(category), fontdict={'fontsize': 22}) + plt.ylim(0, 1.0) + plt.savefig(directory_path + "/Rsquared_vs_method_plot_for_{}_category.pdf".format(category_path_label_dict[category])) + + + # Create plots for Physical methods (both MM and QM methods) + + df_statistics_MM = df_statistics.loc[df_statistics['category'] == "Physical (MM) + QM+LEC"] + df_statistics_QM = df_statistics.loc[df_statistics['category'] == "Physical (QM)"] + df_statistics_physical = pd.concat([df_statistics_MM, df_statistics_QM]) + + # RMSE comparison plot + # Reorder based on RMSE value + df_statistics_physical_RMSE = df_statistics_physical.sort_values(by="RMSE", inplace=False) + + # RMSE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_RMSE, x_label="method name", y_label="RMSE", + y_lower_label="RMSE_lower_bound", + y_upper_label="RMSE_upper_bound", color_label="category", + figsize=(28, 10)) + 
plt.ylim(0.0, 5.0) + plt.savefig(directory_path + "/RMSE_vs_method_plot_physical_methods_colored_by_method_category.pdf") + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + # RMSE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_RMSE, x_label="method name", y_label="RMSE", + y_lower_label="RMSE_lower_bound", + y_upper_label="RMSE_upper_bound", color_label="type", + figsize=(28, 10)) + plt.ylim(0.0, 5.0) + plt.savefig(directory_path + "/RMSE_vs_method_plot_physical_methods_colored_by_type.pdf") + + # MAE comparison plot + # Reorder based on MAE value + df_statistics_physical_MAE = df_statistics_physical.sort_values(by="MAE", inplace=False) + + # ME comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_MAE, x_label="method name", y_label="MAE", + y_lower_label="MAE_lower_bound", + y_upper_label="MAE_upper_bound", color_label="category", + figsize=(28, 10)) + plt.ylim(0.0, 5.0) + plt.savefig(directory_path + "/MAE_vs_method_plot_physical_methods_colored_by_method_category.pdf") + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + # MAE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_MAE, x_label="method name", y_label="MAE", + y_lower_label="MAE_lower_bound", + y_upper_label="MAE_upper_bound", color_label="type", + figsize=(28, 10)) + plt.ylim(0.0, 5.0) + plt.savefig(directory_path + "/MAE_vs_method_plot_physical_methods_colored_by_type.pdf") + + # Kendall's Tau comparison plot + # Reorder based on Tau value + df_statistics_physical_tau = df_statistics_physical.sort_values(by="kendall_tau", inplace=False, ascending=False) + + # Kendall's Tau comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_tau, 
x_label="method name", y_label="kendall_tau", + y_lower_label="kendall_tau_lower_bound", + y_upper_label="kendall_tau_upper_bound", color_label="category", + figsize=(28, 10)) + plt.savefig(directory_path + "/kendall_tau_vs_method_plot_physical_methods_colored_by_method_category.pdf") + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + # Kendall's Tau comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_tau, x_label="method name", y_label="kendall_tau", + y_lower_label="kendall_tau_lower_bound", + y_upper_label="kendall_tau_upper_bound", color_label="type", + figsize=(28, 10)) + plt.savefig(directory_path + "/kendall_tau_vs_method_plot_physical_methods_colored_by_type.pdf") + + + # R-squared comparison plot + # Reorder based on R-squared value + df_statistics_physical_R2 = df_statistics_physical.sort_values(by="R2", inplace=False, ascending=False) + + # R-squared comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_R2, x_label="method name", y_label="R2", + y_lower_label="R2_lower_bound", + y_upper_label="R2_upper_bound", color_label="category", + figsize=(28, 10)) + plt.ylim(0, 1.0) + plt.savefig(directory_path + "/Rsquared_vs_method_plot_physical_methods_colored_by_method_category.pdf") + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + # R-Squared comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_R2, x_label="method name", y_label="R2", + y_lower_label="R2_lower_bound", + y_upper_label="R2_upper_bound", color_label="type", + figsize=(28, 10)) + plt.ylim(0, 1.0) + plt.savefig(directory_path + "/Rsquared_vs_method_plot_physical_methods_colored_by_type.pdf")''' + + + + +def generate_QQplots_for_model_uncertainty(input_file_name, directory_path): + + # Read QQplot data points 
from Pickle file + QQplot_dict_filename = os.path.join(directory_path, input_file_name) + with open(QQplot_dict_filename, 'rb') as handle: + QQplot_dict = pickle.load(handle) + + # Iterate through dictionary to create QQ Plot for each submission + for submission_ID, data in QQplot_dict.items(): + X, Y, slope = data + submission_ID = name_to_filename(submission_ID) + QQplot_output_filename = os.path.join(directory_path, "{}_QQ.pdf".format(submission_ID)) + makeQQplot(X, Y, slope, title=submission_ID, xLabel="Expected fraction within range", + yLabel="Fraction of predictions within range", fileName=QQplot_output_filename, + uncLabel='Model Unc.', leg=[0.05, 0.975, "upper left", 1], ax1=None) + # leg=[1.02, 0.98, 2, 1] + + # Replot first item of the dictionary to fix style + #submission_ID = list(QQplot_dict.keys())[0] # first submission ID + #print("Submission ID:", submission_ID) + #data = QQplot_dict.get(submission_ID) + #X, Y, slope = data + #makeQQplot(X, Y, slope, title=submission_ID, xLabel="Expected fraction within range", + # yLabel="Fraction of predictions within range", fileName=QQplot_output_filename, + # uncLabel='Model Unc.', leg=[0.05, 0.95, "upper left", 1], ax1=None) + + print("QQ Plots for model uncertainty generated.") + + +# ============================================================================= +# MAIN +# ============================================================================= + +if __name__ == '__main__': + + sns.set_style('whitegrid') + sns.set_context('paper') + + # Read experimental data. + with open(EXPERIMENTAL_DATA_FILE_PATH, 'r') as f: + # experimental_data = pd.read_json(f, orient='index') + names = ('Molecule ID', 'logD mean', 'logD SEM')#,'Assay Type', 'Experimental ID', 'Isomeric SMILES') + experimental_data = pd.read_csv(f, names=names, skiprows=1) + + # Convert numeric values to dtype float. 
+ for col in experimental_data.columns[1:7]: + experimental_data[col] = pd.to_numeric(experimental_data[col], errors='coerce') + + + experimental_data.set_index("Molecule ID", inplace=True) + experimental_data["Molecule ID"] = experimental_data.index + print("Experimental data: \n", experimental_data) + + # Import user map. + with open(USER_MAP_FILE_PATH, 'r') as f: + user_map = pd.read_csv(f) + + # Configuration: statistics to compute. + stats_funcs = collections.OrderedDict([ + ('RMSE', rmse), + ('MAE', mae), + ('ME', me), + ('R2', r2), + ('m', slope), + ('kendall_tau', kendall_tau) + ]) + ordering_functions = { + 'ME': lambda x: abs(x), + 'R2': lambda x: -x, + 'm': lambda x: abs(1 - x), + 'kendall_tau': lambda x: -x + } + latex_header_conversions = { + 'R2': 'R$^2$', + 'RMSE': 'RMSE', + 'MAE': 'MAE', + 'ME': 'ME', + 'kendall_tau': '$\\tau$' + } + + # ========================================================================================== + # Analysis of ranked and non-ranked blind submissions WITH reference calculations + # ========================================================================================== + + # Load submissions data. + submissions_logD = load_submissions(LOGD_SUBMISSIONS_DIR_PATH, user_map) + print("done w/ submissions_logD") + + # Perform the analysis + output_directory_path='./analysis_outputs_all_submissions' + logD_submission_collection_file_path = '{}/logD_submission_collection.csv'.format(output_directory_path) + + collection_logD = logDSubmissionCollection(submissions_logD, + experimental_data, + output_directory_path, + logD_submission_collection_file_path, + ignore_refcalcs = False, + ranked_only = False, + allow_multiple = True) + + + # Generate plots and tables. 
+ '''for collection in [collection_logD]: + collection.generate_correlation_plots() + collection.generate_correlation_plots_with_SEM() + collection.generate_molecules_plot() + collection.generate_absolute_error_vs_molecule_ID_plot() + + import shutil + + if os.path.isdir('{}/StatisticsTables'.format(output_directory_path)): + shutil.rmtree('{}/StatisticsTables'.format(output_directory_path)) + + + for submissions, type in zip([submissions_logD], ['logD']): + generate_statistics_tables(submissions, + stats_funcs, + directory_path=output_directory_path + '/StatisticsTables', + file_base_name='statistics', + sort_stat='RMSE', + ordering_functions=ordering_functions, + latex_header_conversions=latex_header_conversions, + ignore_refcalcs = False)''' + + # Generate RMSE, MAE, Kendall's Tau comparison plots. + statistics_directory_path = os.path.join(output_directory_path, "StatisticsTables") + generate_performance_comparison_plots(statistics_filename="statistics.csv", + directory_path=statistics_directory_path, + ignore_refcalcs = False) + + # Generate QQ-Plots for model uncertainty predictions + QQplot_directory_path = os.path.join(output_directory_path, "QQPlots") + generate_QQplots_for_model_uncertainty(input_file_name="QQplot_dict.pickle", + directory_path=QQplot_directory_path) + + + #========================================================================================== + #========================================================================================== + # Analysis of ranked blind submissions only (no nonranked or ref) + #========================================================================================== + #========================================================================================== + + '''# Load submissions data. 
+ ranked_submissions_logD = load_ranked_submissions(LOGD_SUBMISSIONS_DIR_PATH, user_map) + + # Perform the analysis + output_directory_path='./analysis_outputs_ranked_submissions' + logD_submission_collection_file_path = '{}/logD_submission_collection.csv'.format(output_directory_path) + + collection_logD = logDSubmissionCollection(ranked_submissions_logD, + experimental_data, + output_directory_path, + logD_submission_collection_file_path, + ignore_refcalcs = True, ranked_only = True, allow_multiple = False) + + #print("collection_logD: \n", collection_logD) + + # Generate plots and tables. + for collection in [collection_logD]: + collection.generate_correlation_plots() + collection.generate_correlation_plots_with_SEM() + collection.generate_molecules_plot() + collection.generate_absolute_error_vs_molecule_ID_plot() + + + import shutil + + if os.path.isdir('{}/StatisticsTables'.format(output_directory_path)): + shutil.rmtree('{}/StatisticsTables'.format(output_directory_path)) + + + for ranked_submissions, type in zip([ranked_submissions_logD], ['logD']): + generate_statistics_tables(ranked_submissions, + stats_funcs, + directory_path = output_directory_path + '/StatisticsTables', + file_base_name = 'statistics', + sort_stat = 'RMSE', + ordering_functions = ordering_functions, + latex_header_conversions = latex_header_conversions, + ignore_refcalcs = True) + + # Generate RMSE, MAE, Kendall's Tau comparison plots. 
+ statistics_directory_path = os.path.join(output_directory_path, "StatisticsTables") + generate_performance_comparison_plots(statistics_filename="statistics.csv", + directory_path=statistics_directory_path, + ignore_refcalcs = True) + + # Generate QQ-Plots for model uncertainty predictions + QQplot_directory_path = os.path.join(output_directory_path, "QQPlots") + generate_QQplots_for_model_uncertainty(input_file_name="QQplot_dict.pickle", + directory_path=QQplot_directory_path)''' diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/logD_analysis2.py b/physical_property/logD/analysis_different_pKa_logP_combos/logD_analysis2.py new file mode 100644 index 00000000..c4225f59 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/logD_analysis2.py @@ -0,0 +1,785 @@ +#!/usr/bin/env python + +# ============================================================================= +# GLOBAL IMPORTS +# ============================================================================= +import os +import numpy as np +import pandas as pd +from logD_analysis import mae, rmse#, barplot_with_CI_errorbars +from logD_analysis import compute_bootstrap_statistics +import shutil +import seaborn as sns +from matplotlib import pyplot as plt +from matplotlib import cm +import joypy + + +# ============================================================================= +# PLOTTING FUNCTIONS +# ============================================================================= + +def barplot_with_CI_errorbars(df, x_label, y_label, y_lower_label, y_upper_label, figsize=False): + """Creates bar plot of a given dataframe with asymmetric error bars for y axis. + + Args: + df: Pandas Dataframe that should have columns with columnnames specified in other arguments. 
+ x_label: str, column name of x axis categories + y_label: str, column name of y axis values + y_lower_label: str, column name of lower error values of y axis + y_upper_label: str, column name of upper error values of y axis + figsize: tuple, size in inches. Default value is False, which keeps the current figure size. + + """ + # Column names for new columns for delta y_err which is calculated as | y_err - y | + delta_lower_yerr_label = "$\\Delta$" + y_lower_label + delta_upper_yerr_label = "$\\Delta$" + y_upper_label + data = df # Pandas DataFrame (modified in place: the two delta columns are added to df) + data.loc[:,delta_lower_yerr_label] = data.loc[:,y_label] - data.loc[:,y_lower_label] + data.loc[:,delta_upper_yerr_label] = data.loc[:,y_upper_label] - data.loc[:,y_label] + + # Color + current_palette = sns.color_palette() + sns_color = current_palette[2] + + # Plot style + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 20 # 18 + plt.rcParams['xtick.labelsize'] = 16 #14 + plt.rcParams['ytick.labelsize'] = 18 #16 + plt.rcParams['legend.fontsize'] = 16 + plt.rcParams['legend.handlelength'] = 2 + plt.rcParams['figure.autolayout'] = True + #plt.tight_layout() + + # If figsize is specified + if figsize: + plt.figure(figsize=figsize) + + # Plot + x = range(len(data[y_label])) + y = data[y_label] + plt.bar(x, y) + plt.xticks(x, data[x_label], rotation=90)#, horizontalalignment='right') + plt.errorbar(x, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=sns_color, capsize=3, capthick=True) + plt.xlabel(x_label) + plt.ylabel(y_label) + +def barplot_with_CI_errorbars_and_4groups(df1, df2, df3, df4, df5, df6, df7, df8, x_label, y_label, y_lower_label, y_upper_label,group_labels): + """Creates a grouped bar plot from several dataframes with asymmetric error bars for y axis. 
+ Args: + df1, ..., df8: Pandas DataFrames, one per method category, each with the column names specified in the other arguments. 
+ x_label: str, column name of x axis categories + y_label: str, column name of y axis values + y_lower_label: str, column name of lower error values of y axis + y_upper_label: str, column name of upper error values of y axis + group_labels: List of 4 method category labels + """ + # Column names for new columns for delta y_err which is calculated as | y_err - y | + delta_lower_yerr_label = "$\Delta$" + y_lower_label + delta_upper_yerr_label = "$\Delta$" + y_upper_label + + # Plot style + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 18 + plt.rcParams['xtick.labelsize'] = 14 + plt.rcParams['ytick.labelsize'] = 16 + plt.tight_layout() + #plt.figure(figsize=(8, 6)) + bar_width = 0.2 + + # Color + #current_palette = sns.color_palette("deep") + + # Zesty colorblind-friendly color palette + color0 = "#0F2080" #dark blue for Physical (MM) + QM+LEC + color1 = "#F5793A" #orange for Empirical + color3 = "#85C0F9" #light blue for Physical (QM) + color2 = "#a866a1" #purple + color4 = "#009e73"#light green + color5 = "#000000"#black + color6 = "#f0e442"#yellow + color7 = "#808080"#grey + current_palette = [color0, color1, color3, color2, color4, color5,color6,color7] + error_color = 'gray' + + + fig, ax = plt.subplots(figsize=(8, 6)) + + # Plot 1st group of data + data = df1 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + + x = range(len(data[y_label])) + y = data[y_label] + ax.bar(x, y, label = "MM logP + QM+LEC pKa", width=bar_width, color=current_palette[0]) + plt.xticks(x, data[x_label], rotation=90) + plt.errorbar(x, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + # Plot 2nd group of data + data = df2 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + 
data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + index = np.arange(df2.shape[0]) + x = range(len(data[y_label])) + y = data[y_label] + ax.bar(index + bar_width, y, label = "Empirical (ref)", width=bar_width, color=current_palette[1]) + plt.xticks(index + bar_width/2, data[x_label], rotation=90) + plt.errorbar(index + bar_width, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + # Plot 3nd group of data + data = df3 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + index = np.arange(df3.shape[0]) + x = range(len(data[y_label])) + y = data[y_label] + ax.bar(index + 2*bar_width, y, label="QM logP + QM pKa", width=bar_width, color=current_palette[2]) + plt.xticks(index + bar_width + bar_width / 2, data[x_label], rotation=90) + plt.errorbar(index + 2*bar_width, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + + + ######### + # Plot 4nd group of data + data = df4 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + index = np.arange(df4.shape[0]) + x = range(len(data[y_label])) + y = data[y_label] + ax.bar(index + 2*bar_width, y, label="MM logP + Experimental pKa", width=bar_width, color=current_palette[2]) + plt.xticks(index + bar_width + bar_width / 2, data[x_label], rotation=90) + plt.errorbar(index + 2*bar_width, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + + ######### + # Plot 5nd group of data + data = df5 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + data[delta_upper_yerr_label] = 
data[y_upper_label] - data[y_label] + index = np.arange(df5.shape[0]) + x = range(len(data[y_label])) + y = data[y_label] + ax.bar(index + 2*bar_width, y, label="Empirical logP + Experimental pKa", width=bar_width, color=current_palette[2]) + plt.xticks(index + bar_width + bar_width / 2, data[x_label], rotation=90) + plt.errorbar(index + 2*bar_width, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + + # Plot 6nd group of data + data = df6 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + index = np.arange(df6.shape[0]) + x = range(len(data[y_label])) + y = data[y_label] + ax.bar(index + 2*bar_width, y, label="Experimental logP + QM pKa", width=bar_width, color=current_palette[2]) + plt.xticks(index + bar_width + bar_width / 2, data[x_label], rotation=90) + plt.errorbar(index + 2*bar_width, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + + # Plot 7nd group of data + data = df7 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + index = np.arange(df7.shape[0]) + x = range(len(data[y_label])) + y = data[y_label] + ax.bar(index + 2*bar_width, y, label="Empirical logP + QM pKa", width=bar_width, color=current_palette[2]) + plt.xticks(index + bar_width + bar_width / 2, data[x_label], rotation=90) + plt.errorbar(index + 2*bar_width, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + plt.xlabel(x_label) + plt.ylabel(y_label) + + # Plot 8th group of data + data = df8 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + 
data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + index = np.arange(df8.shape[0]) + x = range(len(data[y_label])) + y = data[y_label] + ax.bar(index + 2*bar_width, y, label="Experimental logP + Experimental pKa", width=bar_width, color=current_palette[2]) + plt.xticks(index + bar_width + bar_width / 2, data[x_label], rotation=90) + plt.errorbar(index + 2*bar_width, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + plt.xlabel(x_label) + plt.ylabel(y_label) + + # create legend + from matplotlib.lines import Line2D + custom_lines = [Line2D([0], [0], color=current_palette[0], lw=5), #dark blue for Physical (MM) + QM+LEC + Line2D([0], [0], color=current_palette[1], lw=5), #orange for Empirical + Line2D([0], [0], color=current_palette[2], lw=5), + Line2D([0], [0], color=current_palette[3], lw=5), + Line2D([0], [0], color=current_palette[4], lw=5), + Line2D([0], [0], color=current_palette[5], lw=5)] #light blue for Physical (QM) + + ax.legend(custom_lines, group_labels) + + +def barplot_with_CI_errorbars_and_4groups_ranked(df1, df3, x_label, y_label, y_lower_label, y_upper_label,group_labels): + """Creates bar plot of a given dataframe with asymmetric error bars for y axis. + Args: + df: Pandas Dataframe that should have columns with columnnames specified in other arguments. 
+ x_label: str, column name of x axis categories + y_label: str, column name of y axis values + y_lower_label: str, column name of lower error values of y axis + y_upper_label: str, column name of upper error values of y axis + group_labels: List of 4 method category labels + """ + # Column names for new columns for delta y_err which is calculated as | y_err - y | + delta_lower_yerr_label = "$\Delta$" + y_lower_label + delta_upper_yerr_label = "$\Delta$" + y_upper_label + + # Plot style + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 18 + plt.rcParams['xtick.labelsize'] = 14 + plt.rcParams['ytick.labelsize'] = 16 + plt.tight_layout() + #plt.figure(figsize=(8, 6)) + bar_width = 0.2 + + # color palette + color0 = "#0F2080" #dark blue for Physical (MM) + QM+LEC + color1 = "#F5793A" #orange for Empirical + color3 = "#85C0F9" #light blue for Physical (QM) + color2 = "#a866a1" #purple + color4 = "#009e73"#light green + color5 = "#000000"#black + color6 = "#f0e442"#yellow + color7 = "#808080"#grey + current_palette = [color0, color1, color3, color2, color4, color5,color6,color7] + error_color = 'gray' + + fig, ax = plt.subplots(figsize=(8, 6)) + + # Plot 1st group of data + data = df1 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + + x = range(len(data[y_label])) + y = data[y_label] + ax.bar(x, y, label = "Physical (MM) + QM+LEC", width=bar_width, color=current_palette[0]) + plt.xticks(x, data[x_label], rotation=90) + plt.errorbar(x, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + # Plot 3nd group of data + data = df3 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + index = np.arange(df3.shape[0]) + + 
x = range(len(data[y_label])) + y = data[y_label] + # plt.bar(x, y) + ax.bar(index + 2*bar_width, y, label="Physical (QM)", width=bar_width, color=current_palette[2]) + plt.xticks(index + bar_width + bar_width / 2, data[x_label], rotation=90) + plt.errorbar(index + 2*bar_width, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + + + + plt.xlabel(x_label) + plt.ylabel(y_label) + + # create legend + from matplotlib.lines import Line2D + custom_lines = [Line2D([0], [0], color=current_palette[0], lw=5), + #Line2D([0], [0], color=current_palette[1], lw=5), + Line2D([0], [0], color=current_palette[2], lw=5)] + #, Line2D([0], [0], color=current_palette[3], lw=5)] + ax.legend(custom_lines, group_labels) + + +def ridge_plot(df, by, column, figsize, colormap): + plt.rcParams['axes.labelsize'] = 14 + plt.rcParams['xtick.labelsize'] = 14 + plt.tight_layout() + + # Make ridge plot + fig, axes = joypy.joyplot(data=df, by=by, column=column, figsize=figsize, colormap=colormap, linewidth=1) + # Add x-axis label + axes[-1].set_xlabel(column) + +def ridge_plot_wo_overlap(df, by, column, figsize, colormap): + plt.rcParams['axes.labelsize'] = 14 + plt.rcParams['xtick.labelsize'] = 14 + plt.tight_layout() + + # Make ridge plot + fig, axes = joypy.joyplot(data=df, by=by, column=column, figsize=figsize, colormap=colormap, linewidth=1, overlap=0) + # Add x-axis label + axes[-1].set_xlabel(column) + + +# ============================================================================= +# CONSTANTS +# ============================================================================= + +# Paths to input data. 
+#LOGD_COLLECTION_PATH_RANKED_SUBMISSIONS = './analysis_outputs_ranked_submissions/logD_submission_collection.csv' +LOGD_COLLECTION_PATH_ALL_SUBMISSIONS = './analysis_outputs_all_submissions/logD_submission_collection.csv' + +# ============================================================================= +# UTILITY FUNCTIONS +# ============================================================================= + +def read_collection_file(collection_file_path): + """ + Read the SAMPL7 collection CSV file that was created by logDSubmissionCollection. + :param collection_file_path: Path to the collection CSV file + :return: Pandas DataFrame + """ + + # Check if submission collection file already exists. + if os.path.isfile(collection_file_path): + print("Analysis will be done using the existing collection file: {}".format(collection_file_path)) + + collection_df = pd.read_csv(collection_file_path, index_col=0) + print("\n SubmissionCollection: \n") + print(collection_df) + else: + raise Exception("Collection file doesn't exist: {}".format(collection_file_path)) + + return collection_df + + +def calc_MAE_for_molecules_across_all_predictions(collection_df, directory_path, file_base_name): + """ + Calculate mean absolute error and RMSE for each molecule across all methods. + :param collection_df: Pandas DataFrame of submission collection. 
+ :param directory_path: Directory for outputs + :param file_base_name: Filename for outputs + :return: + """ + # Create list of Molecule IDs + mol_IDs= list(set(collection_df["Molecule ID"].values)) # List of unique IDs + mol_IDs.sort() + print(mol_IDs) + + # List for keeping records of stats values for each molecule + molecular_statistics = [] + + # Slice the dataframe for each molecule to calculate MAE + for mol_ID in mol_IDs: + collection_df_mol_slice = collection_df.loc[collection_df["Molecule ID"] == mol_ID] + + # 2D array of matched calculated and experimental logD values + data = collection_df_mol_slice[["logD (calc)", "logD (exp)"]].values + + # Calculate mean absolute error + #MAE_value = mae(data) + + # Calculate MAE and RMSE and their 95% confidence intervals + bootstrap_statistics = compute_bootstrap_statistics(samples=data, stats_funcs=[mae, rmse], percentile=0.95, + n_bootstrap_samples=10000) + MAE = bootstrap_statistics[0][0] + MAE_lower_CI = bootstrap_statistics[0][1][0] + MAE_upper_CI = bootstrap_statistics[0][1][1] + print("{} MAE: {} [{}, {}]".format(mol_ID, MAE, MAE_lower_CI, MAE_upper_CI)) + + RMSE = bootstrap_statistics[1][0] + RMSE_lower_CI = bootstrap_statistics[1][1][0] + RMSE_upper_CI = bootstrap_statistics[1][1][1] + print("{} RMSE: {} [{}, {}]\n".format(mol_ID, RMSE, RMSE_lower_CI, RMSE_upper_CI)) + + # Record the statistics for this molecule + molecular_statistics.append({'Molecule ID': mol_ID, 'MAE': MAE, 'MAE_lower_CI': MAE_lower_CI, + 'MAE_upper_CI': MAE_upper_CI, 'RMSE': RMSE, 'RMSE_lower_CI': RMSE_lower_CI, + 'RMSE_upper_CI': RMSE_upper_CI}) + + + + # Convert list of dictionaries to DataFrame to create tables/plots easily and save as CSV. 
+ molecular_statistics_df = pd.DataFrame(molecular_statistics) + #molecular_statistics_df.set_index('Molecule ID', inplace=True) + # Sort by MAE + molecular_statistics_df.sort_values(by='MAE', inplace=True) + # Create CSV + os.makedirs(directory_path) + file_base_path = os.path.join(directory_path, file_base_name) + with open(file_base_path + '.csv', 'w') as f: + molecular_statistics_df.to_csv(f) + + # Plot MAE and RMSE of each molecule across predictions as a bar plot + barplot_with_CI_errorbars(df = molecular_statistics_df, x_label = 'Molecule ID', + y_label = 'MAE', y_lower_label = 'MAE_lower_CI', y_upper_label = 'MAE_upper_CI', + figsize=(7.5, 6)) + plt.savefig(directory_path + "/MAE_vs_molecule_ID_plot.pdf") + + barplot_with_CI_errorbars(df=molecular_statistics_df, x_label = 'Molecule ID', + y_label = 'RMSE', y_lower_label = 'RMSE_lower_CI', y_upper_label = 'RMSE_upper_CI', + figsize=(7.5, 6)) + plt.savefig(directory_path + "/RMSE_vs_molecule_ID_plot.pdf") + + +def select_subsection_of_collection(collection_df, method_group): + """ + Returns a dataframe which is the subset of rows of the collection dataframe that match the requested method category + :param collection_df: Pandas DataFrame of submission collection. + :param method_group: String that specifies which method group is requested. "Physical","Empirical","Mixed" or "Other" + :return: Pandas DataFrame of subsection of submission collection. 
+ """ + # Filter collection dataframe based on method category + #collection_df_of_selected_method_group = collection_df.loc[collection_df["reassigned category"] == method_group] + collection_df_of_selected_method_group = collection_df.loc[collection_df["category"] == method_group] + collection_df_of_selected_method_group = collection_df_of_selected_method_group.reset_index(drop=True) + + return collection_df_of_selected_method_group + + +def calc_MAE_for_molecules_across_selected_predictions(collection_df, selected_method_group, directory_path, file_base_name): + """ + Calculates mean absolute error for each molecule across prediction method category + :param collection_df: Pandas DataFrame of submission collection. + + :param selected_method_group: "Physical", "Empirical", "Mixed", or "Other" + :param directory_path: Directory path for outputs + :param file_base_name: Output file name + :return: + """ + + # Create subsection of collection dataframe for selected methods + + collection_df_subset = select_subsection_of_collection(collection_df=collection_df, method_group=selected_method_group) + + + subset_directory_path = os.path.join(directory_path, category_path_label_dict[selected_method_group]) + + # Calculate MAE using subsection of collection database + calc_MAE_for_molecules_across_all_predictions(collection_df=collection_df_subset, directory_path=subset_directory_path, file_base_name=file_base_name) + + +def create_comparison_plot_of_molecular_MAE_of_method_categories(directory_path, group1, group2, group3, group4, group5, group6, group7, group8,file_base_name): + label1 = category_path_label_dict[group1] + label2 = category_path_label_dict[group2] + label3 = category_path_label_dict[group3] + label4 = category_path_label_dict[group4] + label5 = category_path_label_dict[group5] + label6 = category_path_label_dict[group6] + label7 = category_path_label_dict[group7] + label8 = category_path_label_dict[group8] + # Read molecular_error_statistics table + df_gr1 
= pd.read_csv(directory_path + "/" + label1 + "/molecular_error_statistics_for_{}_methods.csv".format(label1)) + df_gr2 = pd.read_csv(directory_path + "/" + label2 + "/molecular_error_statistics_for_{}_methods.csv".format(label2)) + df_gr3 = pd.read_csv(directory_path + "/" + label3 + "/molecular_error_statistics_for_{}_methods.csv".format(label3)) + df_gr4 = pd.read_csv(directory_path + "/" + label4 + "/molecular_error_statistics_for_{}_methods.csv".format(label4)) + df_gr5 = pd.read_csv(directory_path + "/" + label5 + "/molecular_error_statistics_for_{}_methods.csv".format(label5)) + df_gr6 = pd.read_csv(directory_path + "/" + label6 + "/molecular_error_statistics_for_{}_methods.csv".format(label6)) + df_gr7 = pd.read_csv(directory_path + "/" + label7 + "/molecular_error_statistics_for_{}_methods.csv".format(label7)) + df_gr8 = pd.read_csv(directory_path + "/" + label8 + "/molecular_error_statistics_for_{}_methods.csv".format(label8)) + + # Reorder dataframes based on the order of molecular MAE statistic of Physical QM methods group + ordered_molecule_list = list(df_gr3["Molecule ID"]) + print("ordered_molecule_list: \n", ordered_molecule_list) + + df_gr2_reordered = df_gr2.set_index("Molecule ID") + df_gr2_reordered = df_gr2_reordered.reindex(index=df_gr3['Molecule ID']) #Reset row order based on index of df_gr3 + df_gr2_reordered = df_gr2_reordered.reset_index() + + df_gr1_reordered = df_gr1.set_index("Molecule ID") + df_gr1_reordered = df_gr1_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on index of df_gr3 + df_gr1_reordered = df_gr1_reordered.reset_index() + + df_gr4_reordered = df_gr4.set_index("Molecule ID") + df_gr4_reordered = df_gr4_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on index of df_gr1 + df_gr4_reordered = df_gr4_reordered.reset_index() + + df_gr5_reordered = df_gr5.set_index("Molecule ID") + df_gr5_reordered = df_gr5_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on 
index of df_gr1 + df_gr5_reordered = df_gr5_reordered.reset_index() + + df_gr6_reordered = df_gr6.set_index("Molecule ID") + df_gr6_reordered = df_gr6_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on index of df_gr1 + df_gr6_reordered = df_gr6_reordered.reset_index() + + df_gr7_reordered = df_gr7.set_index("Molecule ID") + df_gr7_reordered = df_gr7_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on index of df_gr1 + df_gr7_reordered = df_gr7_reordered.reset_index() + + df_gr8_reordered = df_gr8.set_index("Molecule ID") + df_gr8_reordered = df_gr8_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on index of df_gr1 + df_gr8_reordered = df_gr8_reordered.reset_index() + + + + + # Plot + # Molecular labels will be taken from 1st dataframe, so the second dataframe should have the same molecule ID order. + barplot_with_CI_errorbars_and_4groups(df1=df_gr1_reordered, + df2=df_gr2_reordered, + df3=df_gr3, + df4=df_gr4_reordered, + df5=df_gr5_reordered, + df6=df_gr6_reordered, + df7=df_gr7_reordered, + df8=df_gr8_reordered, + x_label="Molecule ID", y_label="MAE", + y_lower_label="MAE_lower_CI", y_upper_label="MAE_upper_CI", + group_labels=[group1, group2, group3, group4, group5, group6, group7, group8]) + plt.savefig(molecular_statistics_directory_path + "/" + file_base_name + ".pdf") + + # Same comparison plot with only QM results (only for presentation effects) + #barplot_with_CI_errorbars_and_1st_of_2groups(df1=df_qm, df2=df_empirical_reordered, x_label="Molecule ID", y_label="MAE", + # y_lower_label="MAE_lower_CI", y_upper_label="MAE_upper_CI") + #plt.savefig(molecular_statistics_directory_path + "/" + file_base_name + "_only_QM.pdf") + +def create_comparison_plot_of_molecular_MAE_of_method_categories_ranked(directory_path, group1, group2, group3, group4, group5, group6, group7, group8, file_base_name): + label1 = category_path_label_dict[group1] + label2 = category_path_label_dict[group2] + label3 = 
category_path_label_dict[group3] + label4 = category_path_label_dict[group4] + label5 = category_path_label_dict[group5] + label6 = category_path_label_dict[group6] + label7 = category_path_label_dict[group7] + label8 = category_path_label_dict[group8] + + # Read molecular_error_statistics table + df_gr1 = pd.read_csv(directory_path + "/" + label1 + "/molecular_error_statistics_for_{}_methods.csv".format(label1)) + df_gr2 = pd.read_csv(directory_path + "/" + label2 + "/molecular_error_statistics_for_{}_methods.csv".format(label2)) + df_gr3 = pd.read_csv(directory_path + "/" + label3 + "/molecular_error_statistics_for_{}_methods.csv".format(label3)) + df_gr4 = pd.read_csv(directory_path + "/" + label4 + "/molecular_error_statistics_for_{}_methods.csv".format(label4)) + df_gr5 = pd.read_csv(directory_path + "/" + label5 + "/molecular_error_statistics_for_{}_methods.csv".format(label5)) + df_gr6 = pd.read_csv(directory_path + "/" + label6 + "/molecular_error_statistics_for_{}_methods.csv".format(label6)) + df_gr7 = pd.read_csv(directory_path + "/" + label7 + "/molecular_error_statistics_for_{}_methods.csv".format(label7)) + df_gr8 = pd.read_csv(directory_path + "/" + label8 + "/molecular_error_statistics_for_{}_methods.csv".format(label8)) + + + # Reorder dataframes based on the order of molecular MAE statistic of first group (Physical QM methods) + ordered_molecule_list = list(df_gr3["Molecule ID"]) + print("ordered_molecule_list: \n", ordered_molecule_list) + + df_gr1_reordered = df_gr1.set_index("Molecule ID") + df_gr1_reordered = df_gr1_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on index of df_gr1 + df_gr1_reordered = df_gr1_reordered.reset_index() + + df_gr2_reordered = df_gr2.set_index("Molecule ID") + df_gr2_reordered = df_gr2_reordered.reindex(index=df_gr3['Molecule ID']) #Reset row order based on index of df_gr1 + df_gr2_reordered = df_gr2_reordered.reset_index() + + df_gr4_reordered = df_gr4.set_index("Molecule ID") + 
df_gr4_reordered = df_gr4_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on index of df_gr1 + df_gr4_reordered = df_gr4_reordered.reset_index() + + df_gr5_reordered = df_gr5.set_index("Molecule ID") + df_gr5_reordered = df_gr5_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on index of df_gr1 + df_gr5_reordered = df_gr5_reordered.reset_index() + + df_gr6_reordered = df_gr6.set_index("Molecule ID") + df_gr6_reordered = df_gr6_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on index of df_gr1 + df_gr6_reordered = df_gr6_reordered.reset_index() + + df_gr7_reordered = df_gr7.set_index("Molecule ID") + df_gr7_reordered = df_gr7_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on index of df_gr1 + df_gr7_reordered = df_gr7_reordered.reset_index() + + df_gr8_reordered = df_gr8.set_index("Molecule ID") + df_gr8_reordered = df_gr8_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on index of df_gr1 + df_gr8_reordered = df_gr8_reordered.reset_index() + + # Plot + # Molecular labels will be taken from 1st dataframe, so the second dataframe should have the same molecule ID order. 
+ barplot_with_CI_errorbars_and_4groups_ranked(df1=df_gr1_reordered, + df2=df_gr2_reordered, + df3=df_gr3, + df4=df_gr4_reordered, + df5=df_gr5_reordered, + df6=df_gr6_reordered, + df7=df_gr7_reordered, + df8=df_gr8_reordered, + x_label="Molecule ID", y_label="MAE", + y_lower_label="MAE_lower_CI", y_upper_label="MAE_upper_CI", + group_labels=[group1, + group2, + group3, + group4, + group5, + group6, + group7, + group8]) + plt.savefig(molecular_statistics_directory_path + "/" + file_base_name + ".pdf") + + # Same comparison plot with only QM results (only for presentation effects) + #barplot_with_CI_errorbars_and_1st_of_2groups(df1=df_qm, df2=df_empirical_reordered, x_label="Molecule ID", y_label="MAE", + # y_lower_label="MAE_lower_CI", y_upper_label="MAE_upper_CI") + #plt.savefig(molecular_statistics_directory_path + "/" + file_base_name + "_only_QM.pdf") + + +def create_molecular_error_distribution_plots(collection_df, directory_path, file_base_name):#, subset_of_method_ids): + + # Ridge plot using all predictions + ridge_plot(df=collection_df, by = "Molecule ID", column = "$\Delta$logD error (calc - exp)", figsize=(4, 6), colormap=cm.plasma) + plt.savefig(directory_path + "/" + file_base_name +"_all_methods.pdf") + + # Ridge plot using only consistently well-performing methods + '''collection_subset_df = collection_df[collection_df["method_name"].isin(subset_of_method_ids)].reset_index(drop=True) + ridge_plot(df=collection_subset_df, by = "Molecule ID", column = "$\Delta$logD error (calc - exp)", figsize=(4, 6), + colormap=cm.plasma) + plt.savefig(directory_path + "/" + file_base_name +"_well_performing_methods.pdf")''' + + +def create_category_error_distribution_plots(collection_df, directory_path, file_base_name): + + # Ridge plot using all predictions + '''ridge_plot_wo_overlap(df=collection_df, by = "reassigned category", column = "$\Delta$logD error (calc - exp)", figsize=(4, 4), + colormap=cm.plasma)''' + ridge_plot_wo_overlap(df=collection_df, by = 
"category", column = "$\Delta$logD error (calc - exp)", figsize=(4, 4), + colormap=cm.plasma) + plt.savefig(directory_path + "/" + file_base_name +".pdf") + + +def calculate_summary_statistics_of_top_methods_of_each_category(statistics_df, categories, top, directory_path, file_base_name): + df_stat = pd.read_csv(statistics_df) + + data = [] + + for category in categories: + #print(category) + #is_cat = (df_stat["category"] == "Physical") + #print(is_cat) + #df_cat = df_stat[df_stat["reassigned_category"] == category].reset_index(drop=False) + df_cat = df_stat[df_stat["category"] == category].reset_index(drop=False) + + # Already ordered by RMSE + df_cat_top = df_cat.head(top).reset_index(drop=False) + RMSE_mean = df_cat_top["RMSE"].mean() + RMSE_std = df_cat_top["RMSE"].values.std(ddof=1) + + # Reorder by increasing MEA + df_cat = df_cat.sort_values(by="MAE", inplace=False, ascending=True) + df_cat_top = df_cat.head(top).reset_index(drop=False) + MAE_mean = df_cat_top["MAE"].mean() + MAE_std = df_cat_top["MAE"].values.std(ddof=1) + + # Reorder by decreasing Kendall's Tau + df_cat = df_cat.sort_values(by="kendall_tau", inplace=False, ascending=False) + df_cat_top = df_cat.head(top).reset_index(drop=False) + tau_mean = df_cat_top["kendall_tau"].mean() + tau_std = df_cat_top["kendall_tau"].values.std(ddof=1) + + # Reorder by decreasing R-Squared + df_cat = df_cat.sort_values(by="R2", inplace=False, ascending=False) + df_cat_top = df_cat.head(top).reset_index(drop=False) + r2_mean = df_cat_top["R2"].mean() + r2_std = df_cat_top["R2"].values.std(ddof=1) + + # Number of predictions, in case less than 10 + num_predictions =df_cat_top.shape[0] + + data.append({ + 'category': category, + 'RMSE_mean': RMSE_mean, + 'RMSE_std': RMSE_std, + 'MAE_mean': MAE_mean, + 'MAE_std': MAE_std, + 'kendall_tau_mean': tau_mean, + 'kendall_tau_std': tau_std, + 'R2_mean': r2_mean, + 'R2_std': r2_std, + 'N': num_predictions + }) + + # Transform into Pandas DataFrame. 
+ df_stat_summary = pd.DataFrame(data=data) + file_name = os.path.join(directory_path, file_base_name) + df_stat_summary.to_csv(file_name, index=False) + + + +# ============================================================================= +# MAIN +# ============================================================================= + +if __name__ == '__main__': + + # Read collection file + collection_data = read_collection_file(collection_file_path = LOGD_COLLECTION_PATH_ALL_SUBMISSIONS) + + # Create new directory to store molecular statistics + output_directory_path = './analysis_outputs_all_submissions' + analysis_directory_name = 'MolecularStatisticsTables' + + if os.path.isdir('{}/{}'.format(output_directory_path, analysis_directory_name)): + shutil.rmtree('{}/{}'.format(output_directory_path, analysis_directory_name)) + + # Calculate MAE of each molecule across all predictions methods + molecular_statistics_directory_path = os.path.join(output_directory_path, "MolecularStatisticsTables") + calc_MAE_for_molecules_across_all_predictions(collection_df = collection_data, + directory_path = molecular_statistics_directory_path, + file_base_name = "molecular_error_statistics") + + + # Calculate MAE for each molecule across each method category + list_of_method_categories = ["MM logP + QM+LEC pKa", + "Empirical (ref)", + "QM logP + QM pKa", + "MM logP + Experimental pKa", + "Empirical logP + Experimental pKa", + "Experimental logP + QM pKa", + "Empirical logP + QM pKa", + "Experimental logP + Experimental pKa"] + + # New labels for file naming for reassigned categories + category_path_label_dict = {"MM logP + QM+LEC pKa": "Physical_MM_QM_LEC", + "Empirical (ref)": "Empirical", + "QM logP + QM pKa": "Physical_QM", + "MM logP + Experimental pKa":"Physical_MM_Experimental_pKa", + "Empirical logP + Experimental pKa":"Empirical_Experimental_pKa", + "Experimental logP + QM pKa":"Experimental_logP_QM", + "Empirical logP + QM pKa":"Empirical_QM", + "Experimental logP + Experimental 
pKa":"Experimental_only"} + + for category in list_of_method_categories: + category_file_label = category_path_label_dict[category] + calc_MAE_for_molecules_across_selected_predictions(collection_df=collection_data, + selected_method_group=category, + directory_path=molecular_statistics_directory_path, + file_base_name="molecular_error_statistics_for_{}_methods".format(category_file_label)) + + # Create comparison plot of MAE for each molecule across method categories + create_comparison_plot_of_molecular_MAE_of_method_categories(directory_path=molecular_statistics_directory_path, + group1="MM logP + QM+LEC pKa", + group2="Empirical (ref)", + group3="QM logP + QM pKa", + group4="MM logP + Experimental pKa", + group5="Empirical logP + Experimental pKa", + group6="Experimental logP + QM pKa", + group7="Empirical logP + QM pKa", + group8="Experimental logP + Experimental pKa", + file_base_name="molecular_MAE_comparison_between_method_categories") + + # Create molecular error distribution ridge plots for all methods and a subset of well performing methods (found consistently in the top 15 across 4 metrics) + #well_performing_method_ids = ["4K631", "006AC", "43M66", "5W956", "847L9", "HC032", "7RS67", "D4406"] + #well_performing_method_ids = ["Chemprop", "ClassicalGSG DB3", "COSMO-RS", + # "MD (CGenFF/TIP3P)", "TFE MLR"] + '''create_molecular_error_distribution_plots(collection_df=collection_data, + directory_path=molecular_statistics_directory_path, + #subset_of_method_ids=well_performing_method_ids, + file_base_name="molecular_error_distribution_ridge_plot")''' + + # Compare method categories + + # Calculate error distribution plots for each method category + '''category_comparison_directory_path = os.path.join(output_directory_path, "StatisticsTables/MethodCategoryComparison") + os.makedirs(category_comparison_directory_path, exist_ok=True) + create_category_error_distribution_plots(collection_df=collection_data, + directory_path=category_comparison_directory_path, + 
file_base_name="error_distribution_of_method_categories_ridge_plot")''' diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/make_input.ipynb b/physical_property/logD/analysis_different_pKa_logP_combos/make_input.ipynb new file mode 100644 index 00000000..566149fa --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/make_input.ipynb @@ -0,0 +1,131 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "import pandas as pd \n", + "data = pd.read_csv('experimental_pKa_and_logP_combined.csv')#, skiprows=4)\n", + "\n", + "\n", + "def cleanup_name(id):\n", + " for ch in [' ','/',\"-\"]:\n", + " if ch in id:\n", + " id=id.replace(ch,\"_\")\n", + " for ch in ['(',')']:\n", + " if ch in id:\n", + " id=id.replace(ch,\"\")\n", + " return id\n", + "data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ct=0\n", + "for key, group_df in data.groupby('logP method_name'):\n", + " print(group_df)\n", + " ct+=1\n", + " #f = cleanup_name(group_df['logP method_name'].unique()[0])+\"-\"+cleanup_name(group_df['pKa method_name'].unique()[0])\n", + " #myFile = open(\"analysis_different_pKa_logP_combos/logD_different_pKa_logP_combo_input_files/\"+f+'.csv', 'w')\n", + " myFile = open(\"experimental_pKa_and_logP_combined\"+'.csv', 'w')\n", + " with myFile: \n", + " myFile.write('Predictions:\\n')\n", + " for index, row in group_df.iterrows():\n", + " myFile.write('%s,%s,%s,%s,%s\\n'%(row['Molecule ID'], row['Molecule ID'], row['logD7.4 (calc)'], \n", + " row['logD SEM (calc)'], row['logD model uncertainty']))\n", + " myFile.write('\\n')\n", + " myFile.write('Name:\\n')\n", + " name = row[\"logP method_name\"] +\" + \"+row[\"pKa method_name\"]\n", + " myFile.write(name)\n", + " myFile.write('\\n')\n", + " myFile.write('Category:\\n')\n", + " \n", + " category = row[\"logP category\"]+\" + 
\"+row[\"pKa category\"]\n", + " if category == \"Physical (QM) + QM\":\n", + " category = \"Physical (QM)\"\n", + " myFile.write(category)\n", + " myFile.write('\\n')\n", + " myFile.write('Ranked:\\n')\n", + " myFile.write('True')\n", + " myFile.close()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = pd.read_csv('logD_experimental_pKa_logP_combo.csv')#, skiprows=4)\n", + "\n", + "def cleanup_name(id):\n", + " for ch in [' ','/',\"-\"]:\n", + " if ch in id:\n", + " id=id.replace(ch,\"_\")\n", + " for ch in ['(',')']:\n", + " if ch in id:\n", + " id=id.replace(ch,\"\")\n", + " return id\n", + "\n", + "ct=0\n", + "for key, group_df in data.groupby('pKa method_name'):\n", + " print(key)\n", + " ct+=1\n", + " f = cleanup_name(group_df['logP method_name'].unique()[0])+\"-\"+cleanup_name(group_df['pKa method_name'].unique()[0])\n", + " myFile = open(\"logD_using_exp_logP/logD-\"+f+'.csv', 'w+')\n", + " with myFile: \n", + " myFile.write('Predictions:\\n')\n", + " for index, row in group_df.iterrows():\n", + " myFile.write('%s,%s,%s,%s,%s\\n'%(row['Molecule ID'], \n", + " row['Molecule ID'], \n", + " row['logD7.4 (calc)'], \n", + " row['logD SEM (calc)'], \n", + " row['logD model uncertainty']))\n", + " myFile.write('\\n')\n", + " myFile.write('Name:\\n')\n", + " name = row[\"logP method_name\"] +\" + \"+row[\"pKa method_name\"]\n", + " myFile.write(name)\n", + " myFile.write('\\n')\n", + " myFile.write('Category:\\n')\n", + " \n", + " category = row[\"logP category\"]+\" + \"+row[\"pKa category\"]\n", + " if category == \"Physical (QM) + QM\":\n", + " category = \"Physical (QM)\"\n", + " myFile.write(category)\n", + " myFile.write('\\n')\n", + " myFile.write('Ranked:\\n')\n", + " myFile.write('True')\n", + " myFile.close()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python [conda env:s7] *", + "language": "python", + "name": "conda-env-s7-py" + }, + "language_info": { + 
"codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.11" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/best_pKa_and_logP_method_combined.csv b/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/best_pKa_and_logP_method_combined.csv new file mode 100644 index 00000000..577aac33 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/best_pKa_and_logP_method_combined.csv @@ -0,0 +1,23 @@ +,logP method_name,pKa method_name,logP file name,pKa file name,logP category,pKa category,Molecule ID,logD7.4 (calc),logD SEM (calc),logD7.4 (exp),logD7.4 SEM (exp),ΔlogD7.4 error (calc-exp),logD model uncertainty +0,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM25,0.37,0.01344107623307046,-0.09,0.01,0.45999999999999996,2.757871928239328 +1,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM26,-0.7000000000000001,0.013361138100292885,-0.87,0.06,0.16999999999999993,2.7495583624304603 +2,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM27,1.47,3.94464950685735e-8,1.56,0.11,-0.09000000000000008,1.3600041024354872 +3,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM28,1.87,0.,1.18,0.08,0.6900000000000002,1.36 +4,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM29,1.47,1.494895479654403e-7,1.61,0.11,-0.14000000000000012,1.3600155469129884 +5,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM30,2.73,1.3453630570196872e-6,2.76,0.19,-0.029999999999999805,1.36013991775793 +6,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM31,1.54,4.3383821036787013e-10,1.96,0.14,-0.41999999999999993,1.360000045119174 +7,TFE 
MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM32,1.97,2.730588122273151e-8,2.44,0.17,-0.47,1.3600028398116473 +8,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM33,3.24,0.,2.96,0.21,0.28000000000000025,1.36 +9,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM34,2.05,1.3696641564886717e-8,2.83,0.2,-0.7800000000000002,1.3600014244507228 +10,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM35,1.36,5.71108298258429e-7,0.87,0.06,0.4900000000000001,1.3600593952630189 +11,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM36,2.63,1.3148643358942858e-6,0.76,0.05,1.8699999999999999,1.360136745890933 +12,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM37,1.44,1.1076281149657644e-7,1.45,0.1,-0.010000000000000009,1.3600115193323958 +13,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM38,0.92,2.027036018354614e-6,1.03,0.07,-0.10999999999999999,1.360210811745909 +14,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM39,2.17,0.00009190237115357836,1.89,0.13,0.28,1.3695578465999723 +15,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM40,1.01,1.3453630570196872e-6,1.82,0.13,-0.81,1.36013991775793 +16,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM41,-0.23,0.01314269632174524,-0.42,0.03,0.18999999999999997,2.726840417461505 +17,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM42,-0.25,0.013308600664929969,0.99,0.07,-1.24,2.744094469152717 +18,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM43,-0.55,0.010713282071424374,0.42,0.03,-0.97,2.474181335428135 +19,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM44,0.28,0.011697253061616422,0.06,0.,0.22000000000000003,2.576514318408108 +20,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM45,1.28,0.012575502951253247,1.06,0.07,0.21999999999999997,2.667852306930338 +21,TFE MLR,EC_RISM,logP-MLRUCR-1,pKa-ECRISM-1,Empirical,QM,SM46,0.53,0.010713297459104525,0.69,0.05,-0.15999999999999992,2.474182935746871 diff --git 
a/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_logP_and_participant_pKa_predictions_combined.csv b/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_logP_and_participant_pKa_predictions_combined.csv new file mode 100644 index 00000000..e139080a --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_logP_and_participant_pKa_predictions_combined.csv @@ -0,0 +1,133 @@ +,logP method_name,pKa method_name,logP file name,pKa file name,logP category,pKa category,Molecule ID,logD7.4 (calc),logD SEM (calc),logD7.4 (exp),logD7.4 SEM (exp),ΔlogD7.4 error (calc-exp),logD model uncertainty +0,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM25,-2.5100000000000002,0.01,-0.09,0.01,-2.4200000000000004,0.8999880195383227 +1,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM26,-4.16,0.01,-0.87,0.06,-3.29,0.8999886092463868 +2,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM27,-1.3900000000000001,0.11,1.56,0.11,-2.95,0.8979742881675223 +3,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM28,1.18,0.08,1.18,0.08,0.,0. 
+4,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM29,-1.18,0.03,1.61,0.11,-2.79,0.8970721226093223 +5,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM30,-0.37,0.19,2.76,0.19,-3.13,0.8986675007101423 +6,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM31,-1.07,0.14,1.96,0.14,-3.0300000000000002,0.8983151936120535 +7,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM32,-0.65,0.17,2.44,0.17,-3.09,0.8985510374024199 +8,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM33,2.96,0.21,2.96,0.21,0.,0. +9,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM34,-0.4,0.2,2.83,0.2,-3.23,0.8989283886977106 +10,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM35,-3.09,0.02,0.87,0.06,-3.96,0.8998090835365871 +11,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM36,-2.95,0.05,0.76,0.05,-3.71,0.8996487834896427 +12,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM37,-2.41,0.1,1.45,0.1,-3.8600000000000003,0.8997498106923096 +13,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM38,-2.57,0.07,1.03,0.07,-3.5999999999999996,0.899544330547242 +14,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM39,-1.77,0.13,1.89,0.13,-3.66,0.8996081726798318 
+15,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM40,-1.94,0.05,1.82,0.13,-3.76,0.8996953304337554 +16,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM41,-5.24,0.02,-0.42,0.03,-4.82,0.8999972682711228 +17,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM42,-4.09,0.03,0.99,0.07,-5.08,0.8999974788075811 +18,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM43,-4.66,0.01,0.42,0.03,-5.08,0.8999943994130649 +19,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM44,-3.14,0.03,0.06,0.,-3.2,0.8999104647733166 +20,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM45,-1.6500000000000001,0.04,1.06,0.07,-2.71,0.899886760987729 +21,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM46,-2.77,0.01,0.69,0.05,-3.46,0.8999411369110885 +22,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM25,2.24,0.01,-0.09,0.01,2.33,0.5673623795592599 +23,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM26,-1.8800000000000001,0.01,-0.87,0.06,-1.0100000000000002,1.4465102190321517 +24,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM27,1.56,0.11,1.56,0.11,0.,3.2205456435880015e-10 +25,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM28,1.18,0.08,1.18,0.08,0.,0. 
+26,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM29,1.61,0.03,1.61,0.11,0.,1.4275805035488127e-8 +27,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM30,2.7600000000000002,0.19,2.76,0.19,4.440892098500626e-16,1.445262447183409e-7 +28,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM31,1.96,0.14,1.96,0.14,0.,3.0229391433145506e-7 +29,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM32,2.44,0.17,2.44,0.17,0.,1.906679711948364e-9 +30,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM33,2.96,0.21,2.96,0.21,0.,0. +31,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM34,2.83,0.2,2.83,0.2,0.,2.390356281008162e-7 +32,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM35,0.88,0.02,0.87,0.06,0.010000000000000009,3.863243295339658e-6 +33,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM36,0.75,0.05,0.76,0.05,-0.010000000000000009,0.0005176397674037647 +34,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM37,-1.74,0.1,1.45,0.1,-3.19,1.4481128312303422 +35,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM38,1.03,0.07,1.03,0.07,0.,0.000030752831211697175 +36,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM39,1.87,0.13,1.89,0.13,-0.019999999999999796,0.002395960435092921 +37,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM40,1.76,0.05,1.82,0.13,-0.06000000000000005,0.029062952188941078 +38,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM41,-1.74,0.02,-0.42,0.03,-1.32,1.4360296081527748 
+39,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM42,-0.8200000000000001,0.03,0.99,0.07,-1.81,1.4424653266494132 +40,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM43,-2.16,0.01,0.42,0.03,-2.58,1.4471930954873795 +41,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM44,0.63,0.03,0.06,0.,0.5700000000000001,0.718204997699307 +42,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM45,2.2,0.04,1.06,0.07,1.1400000000000001,0.44939854616902625 +43,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM46,-0.18,0.01,0.69,0.05,-0.8699999999999999,1.413467961947144 +44,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM25,0.6900000000000001,0.02344107623307046,-0.09,0.01,0.78,1.397871928239328 +45,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM26,-0.84,0.023361138100292885,-0.87,0.06,0.030000000000000027,1.3895583624304602 +46,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM27,1.56,0.11000003944649507,1.56,0.11,0.,4.102435487131644e-6 +47,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM28,1.18,0.08,1.18,0.08,0.,0. 
+48,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM29,1.61,0.030000149489547965,1.61,0.11,0.,0.00001554691298840579 +49,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM30,2.7600000000000002,0.19000134536305702,2.76,0.19,4.440892098500626e-16,0.00013991775793004748 +50,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM31,1.96,0.14000000043383823,1.96,0.14,0.,4.5119173878258493e-8 +51,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM32,2.44,0.17000002730588124,2.44,0.17,0.,2.839811647164077e-6 +52,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM33,2.96,0.21,2.96,0.21,0.,0. +53,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM34,2.83,0.20000001369664158,2.83,0.2,0.,1.4244507227482186e-6 +54,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM35,0.88,0.020000571108298257,0.87,0.06,0.010000000000000009,0.00005939526301887661 +55,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM36,0.76,0.0500013148643359,0.76,0.05,0.,0.00013674589093300572 +56,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM37,1.45,0.1000001107628115,1.45,0.1,0.,0.00001151933239564395 +57,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM38,1.02,0.07000202703601836,1.03,0.07,-0.010000000000000009,0.00021081174590887987 +58,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM39,1.85,0.1300919023711536,1.89,0.13,-0.039999999999999813,0.009557846599972148 +59,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM40,1.83,0.050001345363057025,1.82,0.13,0.010000000000000009,0.00013991775793004748 
+60,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM41,-1.09,0.03314269632174524,-0.42,0.03,-0.6700000000000002,1.366840417461505 +61,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM42,-0.06,0.04330860066492997,0.99,0.07,-1.05,1.3840944691527168 +62,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM43,-0.08,0.020713282071424373,0.42,0.03,-0.5,1.114181335428135 +63,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM44,0.05,0.041697253061616424,0.06,0.,-0.009999999999999995,1.2165143184081078 +64,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM45,1.18,0.052575502951253246,1.06,0.07,0.11999999999999988,1.3078523069303378 +65,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM46,0.79,0.020713297459104524,0.69,0.05,0.10000000000000009,1.1141829357468707 +66,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM25,2.34,0.01,-0.09,0.01,2.4299999999999997,0. +67,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM26,0.9400000000000001,0.01,-0.87,0.06,1.81,0. +68,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM27,1.36,0.11,1.56,0.11,-0.19999999999999996,0. +69,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM28,1.18,0.08,1.18,0.08,0.,0. +70,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM29,1.4000000000000001,0.03,1.61,0.11,-0.20999999999999996,0. +71,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM30,2.5500000000000003,0.19,2.76,0.19,-0.20999999999999952,0. +72,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM31,1.78,0.14,1.96,0.14,-0.17999999999999994,0. 
+73,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM32,1.71,0.17,2.44,0.17,-0.73,0. +74,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM33,2.96,0.21,2.96,0.21,0.,0. +75,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM34,2.54,0.2,2.83,0.2,-0.29000000000000004,0. +76,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM35,-0.13,0.02,0.87,0.06,-1.,0. +77,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM36,-0.18,0.05,0.76,0.05,-0.94,0. +78,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM37,0.73,0.1,1.45,0.1,-0.72,0. +79,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM38,0.05,0.07,1.03,0.07,-0.98,0. +80,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM39,0.9400000000000001,0.13,1.89,0.13,-0.9499999999999998,0. +81,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM40,0.87,0.05,1.82,0.13,-0.9500000000000001,0. +82,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM41,0.45,0.02,-0.42,0.03,0.87,0. +83,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM42,1.48,0.03,0.99,0.07,0.49,0. +84,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM43,0.62,0.01,0.42,0.03,0.2,0. +85,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM44,0.96,0.03,0.06,0.,0.8999999999999999,0. +86,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM45,2.29,0.04,1.06,0.07,1.23,0. +87,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM46,1.57,0.01,0.69,0.05,0.8800000000000001,0. 
+88,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM25,-16.09,0.01,-0.09,0.01,-16.,0.6862024272328914 +89,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM26,1.04,0.01,-0.87,0.06,1.9100000000000001,6.862024272327598e-27 +90,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM27,1.56,0.11,1.56,0.11,0.,4.3295745542936e-11 +91,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM28,1.18,0.08,1.18,0.08,0.,0. +92,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM29,1.61,0.03,1.61,0.11,0.,1.9788168744384654e-9 +93,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM30,2.7600000000000002,0.19,2.76,0.19,4.440892098500626e-16,7.861800849750126e-7 +94,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM31,1.96,0.14,1.96,0.14,0.,3.770403786545822e-9 +95,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM32,2.44,0.17,2.44,0.17,0.,3.7707860212996606e-10 +96,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM33,2.96,0.21,2.96,0.21,0.,0. 
+97,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM34,2.83,0.2,2.83,0.2,0.,2.0722261777124052e-10 +98,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM35,0.88,0.02,0.87,0.06,0.010000000000000009,4.064811345851252e-9 +99,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM36,0.75,0.05,0.76,0.05,-0.010000000000000009,0.0003641341112709075 +100,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM37,1.3800000000000001,0.1,1.45,0.1,-0.06999999999999984,0.01550935626061002 +101,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM38,1.03,0.07,1.03,0.07,0.,7.523902425328459e-11 +102,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM39,1.87,0.13,1.89,0.13,-0.019999999999999796,0.0020316802734248454 +103,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM40,1.82,0.05,1.82,0.13,0.,0.000450714286169811 +104,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM41,-0.63,0.02,-0.42,0.03,-0.21000000000000002,0.6037836374125737 +105,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM42,-0.15,0.03,0.99,0.07,-1.14,0.6692457493186246 +106,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM43,-0.8200000000000001,0.01,0.42,0.03,-1.24,0.657134816087262 +107,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM44,0.24,0.03,0.06,0.,0.18,0.53277016768521 
+108,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM45,1.8900000000000001,0.04,1.06,0.07,0.8300000000000001,0.41762430619127205 +109,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM46,1.33,0.01,0.69,0.05,0.6400000000000001,0.2397387894188615 +110,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM25,-0.09,0.693823663460283,-0.09,0.01,0.,1.0941178615364526 +111,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM26,0.6,0.29112560896259204,-0.87,0.06,1.47,0.44980097434014726 +112,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM27,1.56,0.11001296190914142,1.56,0.11,0.,0.00002073905462627814 +113,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM28,1.18,0.08,1.18,0.08,0.,0. +114,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM29,1.58,0.03252410569995359,1.61,0.11,-0.030000000000000027,0.004038569119925756 +115,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM30,2.75,0.19047139281681505,2.76,0.19,-0.009999999999999787,0.0007542285069040627 +116,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM31,1.96,0.1400000000237929,1.96,0.14,0.,3.8068618612047945e-11 +117,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM32,2.44,0.17002823983763338,2.44,0.17,0.,0.0000451837402133877 +118,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM33,2.96,0.21,2.96,0.21,0.,0. 
+119,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM34,2.83,0.20000029914280965,2.83,0.2,0.,4.786284954241563e-7 +120,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM35,0.88,0.02000043227744815,0.87,0.06,0.010000000000000009,6.91643917039198e-7 +121,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM36,0.76,0.05008071982216392,0.76,0.05,0.,0.00012915171546227784 +122,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM37,1.45,0.10000494444503069,1.45,0.1,0.,7.911112049091283e-6 +123,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM38,-0.28,0.6908838646523137,1.03,0.07,-1.31,0.9934141834437017 +124,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM39,0.32,0.7799094360425363,1.89,0.13,-1.5699999999999998,1.0398550976680578 +125,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM40,1.81,0.05120044198792785,1.82,0.13,-0.010000000000000009,0.0019207071806845565 +126,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM41,0.58,0.0200001247618502,-0.42,0.03,1.,1.9961896031780734e-7 +127,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM42,1.75,0.030493006723981784,0.99,0.07,0.76,0.0007888107583708565 +128,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM43,0.85,0.010014873015340854,0.42,0.03,0.43,0.000023796824545365085 +129,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM44,1.16,0.030070405924932896,0.06,0.,1.0999999999999999,0.00011264947989263565 
+130,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM45,2.5500000000000003,0.04000015704952115,1.06,0.07,1.4900000000000002,2.512792338373059e-7 +131,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM46,1.72,0.010000000043295747,0.69,0.05,1.03,6.927319449141272e-11 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_logP_and_participant_predictions_combined.csv b/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_logP_and_participant_predictions_combined.csv new file mode 100644 index 00000000..e139080a --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_logP_and_participant_predictions_combined.csv @@ -0,0 +1,133 @@ +,logP method_name,pKa method_name,logP file name,pKa file name,logP category,pKa category,Molecule ID,logD7.4 (calc),logD SEM (calc),logD7.4 (exp),logD7.4 SEM (exp),ΔlogD7.4 error (calc-exp),logD model uncertainty +0,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM25,-2.5100000000000002,0.01,-0.09,0.01,-2.4200000000000004,0.8999880195383227 +1,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM26,-4.16,0.01,-0.87,0.06,-3.29,0.8999886092463868 +2,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM27,-1.3900000000000001,0.11,1.56,0.11,-2.95,0.8979742881675223 +3,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM28,1.18,0.08,1.18,0.08,0.,0. 
+4,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM29,-1.18,0.03,1.61,0.11,-2.79,0.8970721226093223 +5,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM30,-0.37,0.19,2.76,0.19,-3.13,0.8986675007101423 +6,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM31,-1.07,0.14,1.96,0.14,-3.0300000000000002,0.8983151936120535 +7,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM32,-0.65,0.17,2.44,0.17,-3.09,0.8985510374024199 +8,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM33,2.96,0.21,2.96,0.21,0.,0. +9,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM34,-0.4,0.2,2.83,0.2,-3.23,0.8989283886977106 +10,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM35,-3.09,0.02,0.87,0.06,-3.96,0.8998090835365871 +11,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM36,-2.95,0.05,0.76,0.05,-3.71,0.8996487834896427 +12,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM37,-2.41,0.1,1.45,0.1,-3.8600000000000003,0.8997498106923096 +13,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM38,-2.57,0.07,1.03,0.07,-3.5999999999999996,0.899544330547242 +14,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM39,-1.77,0.13,1.89,0.13,-3.66,0.8996081726798318 
+15,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM40,-1.94,0.05,1.82,0.13,-3.76,0.8996953304337554 +16,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM41,-5.24,0.02,-0.42,0.03,-4.82,0.8999972682711228 +17,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM42,-4.09,0.03,0.99,0.07,-5.08,0.8999974788075811 +18,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM43,-4.66,0.01,0.42,0.03,-5.08,0.8999943994130649 +19,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM44,-3.14,0.03,0.06,0.,-3.2,0.8999104647733166 +20,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM45,-1.6500000000000001,0.04,1.06,0.07,-2.71,0.899886760987729 +21,logP_experimental,Gaussian_corrected,logP_experimental_values,pKa_prediction_Iorga_Beckstein_1,experimental,QM+LEC,SM46,-2.77,0.01,0.69,0.05,-3.46,0.8999411369110885 +22,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM25,2.24,0.01,-0.09,0.01,2.33,0.5673623795592599 +23,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM26,-1.8800000000000001,0.01,-0.87,0.06,-1.0100000000000002,1.4465102190321517 +24,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM27,1.56,0.11,1.56,0.11,0.,3.2205456435880015e-10 +25,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM28,1.18,0.08,1.18,0.08,0.,0. 
+26,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM29,1.61,0.03,1.61,0.11,0.,1.4275805035488127e-8 +27,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM30,2.7600000000000002,0.19,2.76,0.19,4.440892098500626e-16,1.445262447183409e-7 +28,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM31,1.96,0.14,1.96,0.14,0.,3.0229391433145506e-7 +29,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM32,2.44,0.17,2.44,0.17,0.,1.906679711948364e-9 +30,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM33,2.96,0.21,2.96,0.21,0.,0. +31,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM34,2.83,0.2,2.83,0.2,0.,2.390356281008162e-7 +32,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM35,0.88,0.02,0.87,0.06,0.010000000000000009,3.863243295339658e-6 +33,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM36,0.75,0.05,0.76,0.05,-0.010000000000000009,0.0005176397674037647 +34,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM37,-1.74,0.1,1.45,0.1,-3.19,1.4481128312303422 +35,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM38,1.03,0.07,1.03,0.07,0.,0.000030752831211697175 +36,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM39,1.87,0.13,1.89,0.13,-0.019999999999999796,0.002395960435092921 +37,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM40,1.76,0.05,1.82,0.13,-0.06000000000000005,0.029062952188941078 +38,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM41,-1.74,0.02,-0.42,0.03,-1.32,1.4360296081527748 
+39,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM42,-0.8200000000000001,0.03,0.99,0.07,-1.81,1.4424653266494132 +40,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM43,-2.16,0.01,0.42,0.03,-2.58,1.4471930954873795 +41,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM44,0.63,0.03,0.06,0.,0.5700000000000001,0.718204997699307 +42,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM45,2.2,0.04,1.06,0.07,1.1400000000000001,0.44939854616902625 +43,logP_experimental,IEFPCM/MST,logP_experimental_values,pKa-IEFPCMMST-1,experimental,QM,SM46,-0.18,0.01,0.69,0.05,-0.8699999999999999,1.413467961947144 +44,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM25,0.6900000000000001,0.02344107623307046,-0.09,0.01,0.78,1.397871928239328 +45,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM26,-0.84,0.023361138100292885,-0.87,0.06,0.030000000000000027,1.3895583624304602 +46,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM27,1.56,0.11000003944649507,1.56,0.11,0.,4.102435487131644e-6 +47,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM28,1.18,0.08,1.18,0.08,0.,0. 
+48,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM29,1.61,0.030000149489547965,1.61,0.11,0.,0.00001554691298840579 +49,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM30,2.7600000000000002,0.19000134536305702,2.76,0.19,4.440892098500626e-16,0.00013991775793004748 +50,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM31,1.96,0.14000000043383823,1.96,0.14,0.,4.5119173878258493e-8 +51,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM32,2.44,0.17000002730588124,2.44,0.17,0.,2.839811647164077e-6 +52,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM33,2.96,0.21,2.96,0.21,0.,0. +53,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM34,2.83,0.20000001369664158,2.83,0.2,0.,1.4244507227482186e-6 +54,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM35,0.88,0.020000571108298257,0.87,0.06,0.010000000000000009,0.00005939526301887661 +55,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM36,0.76,0.0500013148643359,0.76,0.05,0.,0.00013674589093300572 +56,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM37,1.45,0.1000001107628115,1.45,0.1,0.,0.00001151933239564395 +57,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM38,1.02,0.07000202703601836,1.03,0.07,-0.010000000000000009,0.00021081174590887987 +58,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM39,1.85,0.1300919023711536,1.89,0.13,-0.039999999999999813,0.009557846599972148 +59,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM40,1.83,0.050001345363057025,1.82,0.13,0.010000000000000009,0.00013991775793004748 
+60,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM41,-1.09,0.03314269632174524,-0.42,0.03,-0.6700000000000002,1.366840417461505 +61,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM42,-0.06,0.04330860066492997,0.99,0.07,-1.05,1.3840944691527168 +62,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM43,-0.08,0.020713282071424373,0.42,0.03,-0.5,1.114181335428135 +63,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM44,0.05,0.041697253061616424,0.06,0.,-0.009999999999999995,1.2165143184081078 +64,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM45,1.18,0.052575502951253246,1.06,0.07,0.11999999999999988,1.3078523069303378 +65,logP_experimental,EC_RISM,logP_experimental_values,pKa-ECRISM-1,experimental,QM,SM46,0.79,0.020713297459104524,0.69,0.05,0.10000000000000009,1.1141829357468707 +66,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM25,2.34,0.01,-0.09,0.01,2.4299999999999997,0. +67,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM26,0.9400000000000001,0.01,-0.87,0.06,1.81,0. +68,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM27,1.36,0.11,1.56,0.11,-0.19999999999999996,0. +69,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM28,1.18,0.08,1.18,0.08,0.,0. +70,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM29,1.4000000000000001,0.03,1.61,0.11,-0.20999999999999996,0. +71,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM30,2.5500000000000003,0.19,2.76,0.19,-0.20999999999999952,0. +72,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM31,1.78,0.14,1.96,0.14,-0.17999999999999994,0. 
+73,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM32,1.71,0.17,2.44,0.17,-0.73,0. +74,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM33,2.96,0.21,2.96,0.21,0.,0. +75,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM34,2.54,0.2,2.83,0.2,-0.29000000000000004,0. +76,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM35,-0.13,0.02,0.87,0.06,-1.,0. +77,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM36,-0.18,0.05,0.76,0.05,-0.94,0. +78,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM37,0.73,0.1,1.45,0.1,-0.72,0. +79,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM38,0.05,0.07,1.03,0.07,-0.98,0. +80,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM39,0.9400000000000001,0.13,1.89,0.13,-0.9499999999999998,0. +81,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM40,0.87,0.05,1.82,0.13,-0.9500000000000001,0. +82,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM41,0.45,0.02,-0.42,0.03,0.87,0. +83,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM42,1.48,0.03,0.99,0.07,0.49,0. +84,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM43,0.62,0.01,0.42,0.03,0.2,0. +85,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM44,0.96,0.03,0.06,0.,0.8999999999999999,0. +86,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM45,2.29,0.04,1.06,0.07,1.23,0. +87,logP_experimental,TZVP-QM,logP_experimental_values,pka-nhlbi-1c,experimental,QM,SM46,1.57,0.01,0.69,0.05,0.8800000000000001,0. 
+88,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM25,-16.09,0.01,-0.09,0.01,-16.,0.6862024272328914 +89,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM26,1.04,0.01,-0.87,0.06,1.9100000000000001,6.862024272327598e-27 +90,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM27,1.56,0.11,1.56,0.11,0.,4.3295745542936e-11 +91,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM28,1.18,0.08,1.18,0.08,0.,0. +92,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM29,1.61,0.03,1.61,0.11,0.,1.9788168744384654e-9 +93,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM30,2.7600000000000002,0.19,2.76,0.19,4.440892098500626e-16,7.861800849750126e-7 +94,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM31,1.96,0.14,1.96,0.14,0.,3.770403786545822e-9 +95,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM32,2.44,0.17,2.44,0.17,0.,3.7707860212996606e-10 +96,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM33,2.96,0.21,2.96,0.21,0.,0. 
+97,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM34,2.83,0.2,2.83,0.2,0.,2.0722261777124052e-10 +98,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM35,0.88,0.02,0.87,0.06,0.010000000000000009,4.064811345851252e-9 +99,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM36,0.75,0.05,0.76,0.05,-0.010000000000000009,0.0003641341112709075 +100,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM37,1.3800000000000001,0.1,1.45,0.1,-0.06999999999999984,0.01550935626061002 +101,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM38,1.03,0.07,1.03,0.07,0.,7.523902425328459e-11 +102,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM39,1.87,0.13,1.89,0.13,-0.019999999999999796,0.0020316802734248454 +103,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM40,1.82,0.05,1.82,0.13,0.,0.000450714286169811 +104,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM41,-0.63,0.02,-0.42,0.03,-0.21000000000000002,0.6037836374125737 +105,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM42,-0.15,0.03,0.99,0.07,-1.14,0.6692457493186246 +106,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM43,-0.8200000000000001,0.01,0.42,0.03,-1.24,0.657134816087262 +107,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM44,0.24,0.03,0.06,0.,0.18,0.53277016768521 
+108,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM45,1.8900000000000001,0.04,1.06,0.07,0.8300000000000001,0.41762430619127205 +109,logP_experimental,DFT_M06-2X_SMD_explicit_water,logP_experimental_values,pKa_RodriguezPaluch_SMD_1,experimental,QM,SM46,1.33,0.01,0.69,0.05,0.6400000000000001,0.2397387894188615 +110,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM25,-0.09,0.693823663460283,-0.09,0.01,0.,1.0941178615364526 +111,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM26,0.6,0.29112560896259204,-0.87,0.06,1.47,0.44980097434014726 +112,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM27,1.56,0.11001296190914142,1.56,0.11,0.,0.00002073905462627814 +113,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM28,1.18,0.08,1.18,0.08,0.,0. +114,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM29,1.58,0.03252410569995359,1.61,0.11,-0.030000000000000027,0.004038569119925756 +115,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM30,2.75,0.19047139281681505,2.76,0.19,-0.009999999999999787,0.0007542285069040627 +116,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM31,1.96,0.1400000000237929,1.96,0.14,0.,3.8068618612047945e-11 +117,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM32,2.44,0.17002823983763338,2.44,0.17,0.,0.0000451837402133877 +118,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM33,2.96,0.21,2.96,0.21,0.,0. 
+119,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM34,2.83,0.20000029914280965,2.83,0.2,0.,4.786284954241563e-7 +120,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM35,0.88,0.02000043227744815,0.87,0.06,0.010000000000000009,6.91643917039198e-7 +121,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM36,0.76,0.05008071982216392,0.76,0.05,0.,0.00012915171546227784 +122,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM37,1.45,0.10000494444503069,1.45,0.1,0.,7.911112049091283e-6 +123,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM38,-0.28,0.6908838646523137,1.03,0.07,-1.31,0.9934141834437017 +124,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM39,0.32,0.7799094360425363,1.89,0.13,-1.5699999999999998,1.0398550976680578 +125,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM40,1.81,0.05120044198792785,1.82,0.13,-0.010000000000000009,0.0019207071806845565 +126,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM41,0.58,0.0200001247618502,-0.42,0.03,1.,1.9961896031780734e-7 +127,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM42,1.75,0.030493006723981784,0.99,0.07,0.76,0.0007888107583708565 +128,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM43,0.85,0.010014873015340854,0.42,0.03,0.43,0.000023796824545365085 +129,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM44,1.16,0.030070405924932896,0.06,0.,1.0999999999999999,0.00011264947989263565 
+130,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM45,2.5500000000000003,0.04000015704952115,1.06,0.07,1.4900000000000002,2.512792338373059e-7 +131,logP_experimental,DFT_M05-2X_SMD,logP_experimental_values,pKa-VA-2-charge-correction,experimental,QM,SM46,1.72,0.010000000043295747,0.69,0.05,1.03,6.927319449141272e-11 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_pKa_and_logP_combined.csv b/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_pKa_and_logP_combined.csv new file mode 100644 index 00000000..4d0599d5 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_pKa_and_logP_combined.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,-0.24,0.04990175982820839,0.0 +SM26,SM26,-1.45,0.049742376229052185,0.0 +SM27,SM27,1.56,0.11000003171656936,0.0 +SM28,SM28,1.18,0.08,0.0 +SM29,SM29,1.61,0.030000199580283982,0.0 +SM30,SM30,2.76,0.19000006621276927,0.0 +SM31,SM31,1.96,0.14000000230065585,0.0 +SM32,SM32,2.44,0.17000003171656936,0.0 +SM33,SM33,2.96,0.21,0.0 +SM34,SM34,2.83,0.2000000000348365,0.0 +SM35,SM35,0.88,0.020000456164835357,0.0 +SM36,SM36,0.76,0.050000628939601666,0.0 +SM37,SM37,1.45,0.10000005508585408,0.0 +SM38,SM38,1.03,0.07000326718909411,0.0 +SM39,SM39,1.89,0.13000009135794513,0.0 +SM40,SM40,1.83,0.05000172321773445,0.0 +SM41,SM41,-1.6,0.0594766376651706,0.0 +SM42,SM42,0.91,0.0594234547197126,0.0 +SM43,SM43,-0.94,0.04870466475253714,0.0 +SM44,SM44,0.06,0.0638472905132671,0.0 +SM45,SM45,1.07,0.07742105380207609,0.0 +SM46,SM46,0.7000000000000001,0.0427763943395595,0.0 + +Name: +logP_experimental + pKa_experimental +Category: +Experimental logP + Experimental pKa +Ranked: +True diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_pKa_and_participant_logP_predictions_combined.csv 
b/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_pKa_and_participant_logP_predictions_combined.csv new file mode 100644 index 00000000..242d1e0a --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_pKa_and_participant_logP_predictions_combined.csv @@ -0,0 +1,133 @@ +,logP method_name,pKa method_name,logP file name,pKa file name,logP category,pKa category,Molecule ID,logD7.4 (calc),logD SEM (calc),logD7.4 (exp),logD7.4 SEM (exp),ΔlogD7.4 error (calc-exp),logD model uncertainty +0,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM25,1.8900000000000001,0.21990175982820837,-0.09,0.01,1.9800000000000002,1.5 +1,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM26,-0.74,0.14974237622905218,-0.87,0.06,0.13,1.5 +2,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM27,2.43,0.13000003171656935,1.56,0.11,0.8700000000000001,1.5 +3,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM28,1.85,0.13,1.18,0.08,0.6700000000000002,1.5 +4,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM29,1.94,0.15000019958028396,1.61,0.11,0.32999999999999985,1.5 +5,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM30,4.2,0.14000006621276928,2.76,0.19,1.4400000000000004,1.5 +6,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM31,3.0500000000000003,0.13000000230065584,1.96,0.14,1.0900000000000003,1.5 +7,MD 
(CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM32,4.36,0.16000003171656935,2.44,0.17,1.9200000000000004,1.5 +8,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM33,5.45,0.16,2.96,0.21,2.49,1.5 +9,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM34,4.46,0.1400000000348365,2.83,0.2,1.63,1.5 +10,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM35,1.98,0.36000045616483534,0.87,0.06,1.1099999999999999,1.5 +11,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM36,2.68,0.6000006289396016,0.76,0.05,1.9200000000000002,1.5 +12,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM37,2.43,0.7500000550858541,1.45,0.1,0.9800000000000002,1.5 +13,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM38,1.6600000000000001,0.4100032671890941,1.03,0.07,0.6300000000000001,1.5 +14,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM39,3.74,0.41000009135794513,1.89,0.13,1.8500000000000003,1.5 +15,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM40,2.59,0.31000172321773445,1.82,0.13,0.7699999999999998,1.5 +16,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM41,0.41000000000000003,0.2194766376651706,-0.42,0.03,0.8300000000000001,1.5 +17,MD 
(CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM42,4.29,0.1594234547197126,0.99,0.07,3.3,1.5 +18,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM43,1.95,0.19870466475253715,0.42,0.03,1.53,1.5 +19,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM44,-0.26,0.2538472905132671,0.06,0.,-0.32,1.5 +20,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM45,2.3000000000000003,0.4474210538020761,1.06,0.07,1.2400000000000002,1.5 +21,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM46,1.27,0.32277639433955946,0.69,0.05,0.5800000000000001,1.5 +22,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM25,-1.03,0.039901759828208386,-0.09,0.01,-0.9400000000000001,1.06 +23,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM26,-2.7,0.03974237622905219,-0.87,0.06,-1.83,1.06 +24,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM27,1.75,3.171656935122753e-8,1.56,0.11,0.18999999999999995,1.06 +25,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM28,0.83,0.,1.18,0.08,-0.35,1.06 +26,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM29,1.23,1.9958028397883462e-7,1.61,0.11,-0.3800000000000001,1.06 +27,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM30,3.5300000000000002,6.621276927099837e-8,2.76,0.19,0.7700000000000005,1.06 +28,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical 
(QM),experimental,SM31,1.61,2.300655839167245e-9,1.96,0.14,-0.34999999999999987,1.06 +29,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM32,1.6300000000000001,3.171656935122753e-8,2.44,0.17,-0.8099999999999998,1.06 +30,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM33,4.27,0.,2.96,0.21,1.3099999999999996,1.06 +31,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM34,2.39,3.4836487372642844e-11,2.83,0.2,-0.43999999999999995,1.06 +32,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM35,0.77,4.561648353517754e-7,0.87,0.06,-0.09999999999999998,1.06 +33,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM36,3.74,6.28939601658209e-7,0.76,0.05,2.9800000000000004,1.06 +34,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM37,1.8800000000000001,5.508585407394285e-8,1.45,0.1,0.43000000000000016,1.06 +35,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM38,0.48,3.267189094097668e-6,1.03,0.07,-0.55,1.06 +36,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM39,2.47,9.135794513077607e-8,1.89,0.13,0.5800000000000003,1.06 +37,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM40,1.42,1.7232177344455449e-6,1.82,0.13,-0.40000000000000013,1.06 +38,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM41,-1.31,0.039476637665170594,-0.42,0.03,-0.8900000000000001,1.06 +39,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM42,2.89,0.029423454719712593,0.99,0.07,1.9000000000000001,1.06 +40,TFE IEFPCM 
MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM43,0.06,0.038704664752537137,0.42,0.03,-0.36,1.06 +41,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM44,-1.26,0.03384729051326709,0.06,0.,-1.32,1.06 +42,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM45,0.55,0.03742105380207609,1.06,0.07,-0.51,1.06 +43,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM46,-0.07,0.0327763943395595,0.69,0.05,-0.76,1.06 +44,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM25,1.32,0.04990175982820839,-0.09,0.01,1.4100000000000001,1.05 +45,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM26,-0.1,0.04974237622905219,-0.87,0.06,0.77,1.05 +46,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM27,2.21,0.010000031716569352,1.56,0.11,0.6499999999999999,1.05 +47,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM28,2.18,0.01,1.18,0.08,1.0000000000000002,1.05 +48,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM29,2.07,0.010000199580283978,1.61,0.11,0.45999999999999974,1.05 +49,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM30,3.7800000000000002,0.010000066212769271,2.76,0.19,1.0200000000000005,1.05 +50,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM31,3.27,0.01000000230065584,1.96,0.14,1.31,1.05 +51,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM32,2.59,0.010000031716569352,2.44,0.17,0.1499999999999999,1.05 +52,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical 
(QM),experimental,SM33,5.27,0.01,2.96,0.21,2.3099999999999996,1.05 +53,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM34,5.2700000000000005,0.010000000034836488,2.83,0.2,2.4400000000000004,1.05 +54,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM35,0.9500000000000001,0.010000456164835352,0.87,0.06,0.08000000000000007,1.05 +55,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM36,2.59,0.010000628939601658,0.76,0.05,1.8299999999999998,1.05 +56,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM37,2.14,0.010000055085854075,1.45,0.1,0.6900000000000002,1.05 +57,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM38,2.3000000000000003,0.010003267189094098,1.03,0.07,1.2700000000000002,1.05 +58,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM39,4.16,0.010000091357945131,1.89,0.13,2.2700000000000005,1.05 +59,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM40,3.61,0.010001723217734446,1.82,0.13,1.7899999999999998,1.05 +60,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM41,1.1300000000000001,0.049476637665170596,-0.42,0.03,1.55,1.05 +61,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM42,5.41,0.03942345471971259,0.99,0.07,4.42,1.05 +62,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM43,2.48,0.04870466475253714,0.42,0.03,2.06,1.05 +63,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM44,0.52,0.04384729051326709,0.06,0.,0.46,1.05 +64,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical 
(QM),experimental,SM45,1.69,0.04742105380207609,1.06,0.07,0.6299999999999999,1.05 +65,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM46,1.54,0.0427763943395595,0.69,0.05,0.8500000000000001,1.05 +66,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM25,-1.37,0.039901759828208386,-0.09,0.01,-1.28,0. +67,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM26,-2.38,0.03974237622905219,-0.87,0.06,-1.5099999999999998,0. +68,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM27,0.47000000000000003,3.171656935122753e-8,1.56,0.11,-1.09,0. +69,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM28,-0.23,0.,1.18,0.08,-1.41,0. +70,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM29,0.18,1.9958028397883462e-7,1.61,0.11,-1.4300000000000002,0. +71,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM30,1.73,6.621276927099837e-8,2.76,0.19,-1.0299999999999998,0. +72,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM31,0.9400000000000001,2.300655839167245e-9,1.96,0.14,-1.02,0. +73,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM32,1.5,3.171656935122753e-8,2.44,0.17,-0.94,0. +74,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM33,3.12,0.,2.96,0.21,0.16000000000000014,0. +75,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM34,2.18,3.4836487372642844e-11,2.83,0.2,-0.6499999999999999,0. 
+76,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM35,-0.6900000000000001,4.561648353517754e-7,0.87,0.06,-1.56,0. +77,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM36,0.58,6.28939601658209e-7,0.76,0.05,-0.18000000000000005,0. +78,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM37,0.78,5.508585407394285e-8,1.45,0.1,-0.6699999999999999,0. +79,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM38,-1.6,3.267189094097668e-6,1.03,0.07,-2.63,0. +80,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM39,0.47000000000000003,9.135794513077607e-8,1.89,0.13,-1.42,0. +81,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM40,-0.71,1.7232177344455449e-6,1.82,0.13,-2.5300000000000002,0. +82,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM41,-2.79,0.039476637665170594,-0.42,0.03,-2.37,0. +83,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM42,0.21,0.029423454719712593,0.99,0.07,-0.78,0. +84,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM43,-1.7,0.038704664752537137,0.42,0.03,-2.12,0. +85,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM44,-2.63,0.03384729051326709,0.06,0.,-2.69,0. +86,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM45,-1.74,0.03742105380207609,1.06,0.07,-2.8,0. +87,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM46,-1.8,0.0327763943395595,0.69,0.05,-2.49,0. 
+88,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM25,-2.48,0.039901759828208386,-0.09,0.01,-2.39,1.47 +89,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM26,-2.91,0.03974237622905219,-0.87,0.06,-2.04,1.47 +90,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM27,0.01,3.171656935122753e-8,1.56,0.11,-1.55,1.47 +91,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM28,-0.79,0.,1.18,0.08,-1.97,1.47 +92,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM29,-0.76,1.9958028397883462e-7,1.61,0.11,-2.37,1.47 +93,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM30,1.17,6.621276927099837e-8,2.76,0.19,-1.5899999999999999,1.47 +94,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM31,-0.13,2.300655839167245e-9,1.96,0.14,-2.09,1.47 +95,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM32,0.86,3.171656935122753e-8,2.44,0.17,-1.58,1.47 +96,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM33,2.61,0.,2.96,0.21,-0.3500000000000001,1.47 +97,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM34,0.92,3.4836487372642844e-11,2.83,0.2,-1.9100000000000001,1.47 +98,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM35,-0.79,4.561648353517754e-7,0.87,0.06,-1.6600000000000001,1.47 
+99,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM36,-0.17,6.28939601658209e-7,0.76,0.05,-0.93,1.47 +100,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM37,-1.04,5.508585407394285e-8,1.45,0.1,-2.49,1.47 +101,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM38,-2.75,3.267189094097668e-6,1.03,0.07,-3.7800000000000002,1.47 +102,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM39,-1.2,9.135794513077607e-8,1.89,0.13,-3.09,1.47 +103,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM40,-2.17,1.7232177344455449e-6,1.82,0.13,-3.99,1.47 +104,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM41,-3.22,0.039476637665170594,-0.42,0.03,-2.8000000000000003,1.47 +105,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM42,-0.1,0.029423454719712593,0.99,0.07,-1.09,1.47 +106,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM43,-3.0500000000000003,0.038704664752537137,0.42,0.03,-3.47,1.47 +107,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM44,-3.2600000000000002,0.03384729051326709,0.06,0.,-3.3200000000000003,1.47 +108,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM45,-2.34,0.03742105380207609,1.06,0.07,-3.4,1.47 +109,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM46,-2.89,0.0327763943395595,0.69,0.05,-3.58,1.47 +110,TFE 
b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM25,-2.48,0.39990175982820836,-0.09,0.01,-2.39,0.8 +111,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM26,-3.04,0.3997423762290522,-0.87,0.06,-2.17,0.8 +112,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM27,0.49,0.36000003171656936,1.56,0.11,-1.07,0.8 +113,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM28,-0.6,0.36,1.18,0.08,-1.7799999999999998,0.8 +114,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM29,-0.51,0.36000019958028395,1.61,0.11,-2.12,0.8 +115,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM30,0.9500000000000001,0.36000006621276925,2.76,0.19,-1.8099999999999996,0.8 +116,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM31,0.58,0.3600000023006558,1.96,0.14,-1.38,0.8 +117,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM32,1.24,0.36000003171656936,2.44,0.17,-1.2,0.8 +118,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM33,2.55,0.36,2.96,0.21,-0.41000000000000014,0.8 +119,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM34,1.46,0.36000000003483645,2.83,0.2,-1.37,0.8 +120,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM35,-0.81,0.36000045616483534,0.87,0.06,-1.6800000000000002,0.8 +121,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM36,0.19,0.36000062893960166,0.76,0.05,-0.5700000000000001,0.8 +122,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical 
(QM),experimental,SM37,-1.62,0.3600000550858541,1.45,0.1,-3.0700000000000003,0.8 +123,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM38,-2.59,0.3600032671890941,1.03,0.07,-3.62,0.8 +124,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM39,-0.43,0.36000009135794514,1.89,0.13,-2.32,0.8 +125,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM40,-1.9100000000000001,0.36000172321773444,1.82,0.13,-3.7300000000000004,0.8 +126,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM41,-3.21,0.3994766376651706,-0.42,0.03,-2.79,0.8 +127,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM42,-0.61,0.38942345471971257,0.99,0.07,-1.6,0.8 +128,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM43,-1.99,0.39870466475253713,0.42,0.03,-2.41,0.8 +129,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM44,-2.73,0.39384729051326706,0.06,0.,-2.79,0.8 +130,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM45,-1.98,0.3974210538020761,1.06,0.07,-3.04,0.8 +131,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM46,-2.82,0.39277639433955946,0.69,0.05,-3.51,0.8 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_pKa_and_participant_predictions_combined.csv b/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_pKa_and_participant_predictions_combined.csv new file mode 100644 index 00000000..242d1e0a --- /dev/null +++ 
b/physical_property/logD/analysis_different_pKa_logP_combos/submission_collection_files/experimental_pKa_and_participant_predictions_combined.csv @@ -0,0 +1,133 @@ +,logP method_name,pKa method_name,logP file name,pKa file name,logP category,pKa category,Molecule ID,logD7.4 (calc),logD SEM (calc),logD7.4 (exp),logD7.4 SEM (exp),ΔlogD7.4 error (calc-exp),logD model uncertainty +0,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM25,1.8900000000000001,0.21990175982820837,-0.09,0.01,1.9800000000000002,1.5 +1,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM26,-0.74,0.14974237622905218,-0.87,0.06,0.13,1.5 +2,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM27,2.43,0.13000003171656935,1.56,0.11,0.8700000000000001,1.5 +3,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM28,1.85,0.13,1.18,0.08,0.6700000000000002,1.5 +4,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM29,1.94,0.15000019958028396,1.61,0.11,0.32999999999999985,1.5 +5,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM30,4.2,0.14000006621276928,2.76,0.19,1.4400000000000004,1.5 +6,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM31,3.0500000000000003,0.13000000230065584,1.96,0.14,1.0900000000000003,1.5 +7,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM32,4.36,0.16000003171656935,2.44,0.17,1.9200000000000004,1.5 +8,MD 
(CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM33,5.45,0.16,2.96,0.21,2.49,1.5 +9,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM34,4.46,0.1400000000348365,2.83,0.2,1.63,1.5 +10,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM35,1.98,0.36000045616483534,0.87,0.06,1.1099999999999999,1.5 +11,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM36,2.68,0.6000006289396016,0.76,0.05,1.9200000000000002,1.5 +12,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM37,2.43,0.7500000550858541,1.45,0.1,0.9800000000000002,1.5 +13,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM38,1.6600000000000001,0.4100032671890941,1.03,0.07,0.6300000000000001,1.5 +14,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM39,3.74,0.41000009135794513,1.89,0.13,1.8500000000000003,1.5 +15,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM40,2.59,0.31000172321773445,1.82,0.13,0.7699999999999998,1.5 +16,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM41,0.41000000000000003,0.2194766376651706,-0.42,0.03,0.8300000000000001,1.5 +17,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM42,4.29,0.1594234547197126,0.99,0.07,3.3,1.5 +18,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical 
(MM),experimental,SM43,1.95,0.19870466475253715,0.42,0.03,1.53,1.5 +19,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM44,-0.26,0.2538472905132671,0.06,0.,-0.32,1.5 +20,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM45,2.3000000000000003,0.4474210538020761,1.06,0.07,1.2400000000000002,1.5 +21,MD (CGenFF/TIP3P),pKa_experimental,logP_prediction_Iorga_Beckstein_CGenFF,pKa_experimental_values,Physical (MM),experimental,SM46,1.27,0.32277639433955946,0.69,0.05,0.5800000000000001,1.5 +22,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM25,-1.03,0.039901759828208386,-0.09,0.01,-0.9400000000000001,1.06 +23,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM26,-2.7,0.03974237622905219,-0.87,0.06,-1.83,1.06 +24,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM27,1.75,3.171656935122753e-8,1.56,0.11,0.18999999999999995,1.06 +25,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM28,0.83,0.,1.18,0.08,-0.35,1.06 +26,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM29,1.23,1.9958028397883462e-7,1.61,0.11,-0.3800000000000001,1.06 +27,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM30,3.5300000000000002,6.621276927099837e-8,2.76,0.19,0.7700000000000005,1.06 +28,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM31,1.61,2.300655839167245e-9,1.96,0.14,-0.34999999999999987,1.06 +29,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM32,1.6300000000000001,3.171656935122753e-8,2.44,0.17,-0.8099999999999998,1.06 +30,TFE 
IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM33,4.27,0.,2.96,0.21,1.3099999999999996,1.06 +31,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM34,2.39,3.4836487372642844e-11,2.83,0.2,-0.43999999999999995,1.06 +32,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM35,0.77,4.561648353517754e-7,0.87,0.06,-0.09999999999999998,1.06 +33,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM36,3.74,6.28939601658209e-7,0.76,0.05,2.9800000000000004,1.06 +34,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM37,1.8800000000000001,5.508585407394285e-8,1.45,0.1,0.43000000000000016,1.06 +35,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM38,0.48,3.267189094097668e-6,1.03,0.07,-0.55,1.06 +36,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM39,2.47,9.135794513077607e-8,1.89,0.13,0.5800000000000003,1.06 +37,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM40,1.42,1.7232177344455449e-6,1.82,0.13,-0.40000000000000013,1.06 +38,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM41,-1.31,0.039476637665170594,-0.42,0.03,-0.8900000000000001,1.06 +39,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM42,2.89,0.029423454719712593,0.99,0.07,1.9000000000000001,1.06 +40,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM43,0.06,0.038704664752537137,0.42,0.03,-0.36,1.06 +41,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM44,-1.26,0.03384729051326709,0.06,0.,-1.32,1.06 +42,TFE IEFPCM 
MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM45,0.55,0.03742105380207609,1.06,0.07,-0.51,1.06 +43,TFE IEFPCM MST,pKa_experimental,logP-IEFPCMMST-1,pKa_experimental_values,Physical (QM),experimental,SM46,-0.07,0.0327763943395595,0.69,0.05,-0.76,1.06 +44,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM25,1.32,0.04990175982820839,-0.09,0.01,1.4100000000000001,1.05 +45,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM26,-0.1,0.04974237622905219,-0.87,0.06,0.77,1.05 +46,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM27,2.21,0.010000031716569352,1.56,0.11,0.6499999999999999,1.05 +47,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM28,2.18,0.01,1.18,0.08,1.0000000000000002,1.05 +48,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM29,2.07,0.010000199580283978,1.61,0.11,0.45999999999999974,1.05 +49,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM30,3.7800000000000002,0.010000066212769271,2.76,0.19,1.0200000000000005,1.05 +50,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM31,3.27,0.01000000230065584,1.96,0.14,1.31,1.05 +51,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM32,2.59,0.010000031716569352,2.44,0.17,0.1499999999999999,1.05 +52,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM33,5.27,0.01,2.96,0.21,2.3099999999999996,1.05 +53,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM34,5.2700000000000005,0.010000000034836488,2.83,0.2,2.4400000000000004,1.05 +54,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical 
(QM),experimental,SM35,0.9500000000000001,0.010000456164835352,0.87,0.06,0.08000000000000007,1.05 +55,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM36,2.59,0.010000628939601658,0.76,0.05,1.8299999999999998,1.05 +56,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM37,2.14,0.010000055085854075,1.45,0.1,0.6900000000000002,1.05 +57,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM38,2.3000000000000003,0.010003267189094098,1.03,0.07,1.2700000000000002,1.05 +58,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM39,4.16,0.010000091357945131,1.89,0.13,2.2700000000000005,1.05 +59,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM40,3.61,0.010001723217734446,1.82,0.13,1.7899999999999998,1.05 +60,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM41,1.1300000000000001,0.049476637665170596,-0.42,0.03,1.55,1.05 +61,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM42,5.41,0.03942345471971259,0.99,0.07,4.42,1.05 +62,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM43,2.48,0.04870466475253714,0.42,0.03,2.06,1.05 +63,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM44,0.52,0.04384729051326709,0.06,0.,0.46,1.05 +64,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM45,1.69,0.04742105380207609,1.06,0.07,0.6299999999999999,1.05 +65,EC_RISM_wet,pKa_experimental,logP-ECRISM-1,pKa_experimental_values,Physical (QM),experimental,SM46,1.54,0.0427763943395595,0.69,0.05,0.8500000000000001,1.05 +66,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical 
(QM),experimental,SM25,-1.37,0.039901759828208386,-0.09,0.01,-1.28,0. +67,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM26,-2.38,0.03974237622905219,-0.87,0.06,-1.5099999999999998,0. +68,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM27,0.47000000000000003,3.171656935122753e-8,1.56,0.11,-1.09,0. +69,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM28,-0.23,0.,1.18,0.08,-1.41,0. +70,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM29,0.18,1.9958028397883462e-7,1.61,0.11,-1.4300000000000002,0. +71,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM30,1.73,6.621276927099837e-8,2.76,0.19,-1.0299999999999998,0. +72,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM31,0.9400000000000001,2.300655839167245e-9,1.96,0.14,-1.02,0. +73,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM32,1.5,3.171656935122753e-8,2.44,0.17,-0.94,0. +74,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM33,3.12,0.,2.96,0.21,0.16000000000000014,0. +75,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM34,2.18,3.4836487372642844e-11,2.83,0.2,-0.6499999999999999,0. +76,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM35,-0.6900000000000001,4.561648353517754e-7,0.87,0.06,-1.56,0. +77,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM36,0.58,6.28939601658209e-7,0.76,0.05,-0.18000000000000005,0. 
+78,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM37,0.78,5.508585407394285e-8,1.45,0.1,-0.6699999999999999,0. +79,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM38,-1.6,3.267189094097668e-6,1.03,0.07,-2.63,0. +80,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM39,0.47000000000000003,9.135794513077607e-8,1.89,0.13,-1.42,0. +81,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM40,-0.71,1.7232177344455449e-6,1.82,0.13,-2.5300000000000002,0. +82,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM41,-2.79,0.039476637665170594,-0.42,0.03,-2.37,0. +83,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM42,0.21,0.029423454719712593,0.99,0.07,-0.78,0. +84,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM43,-1.7,0.038704664752537137,0.42,0.03,-2.12,0. +85,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM44,-2.63,0.03384729051326709,0.06,0.,-2.69,0. +86,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM45,-1.74,0.03742105380207609,1.06,0.07,-2.8,0. +87,TFE-NHLBI-TZVP-QM,pKa_experimental,logp-nhlbi-1,pKa_experimental_values,Physical (QM),experimental,SM46,-1.8,0.0327763943395595,0.69,0.05,-2.49,0. 
+88,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM25,-2.48,0.039901759828208386,-0.09,0.01,-2.39,1.47 +89,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM26,-2.91,0.03974237622905219,-0.87,0.06,-2.04,1.47 +90,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM27,0.01,3.171656935122753e-8,1.56,0.11,-1.55,1.47 +91,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM28,-0.79,0.,1.18,0.08,-1.97,1.47 +92,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM29,-0.76,1.9958028397883462e-7,1.61,0.11,-2.37,1.47 +93,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM30,1.17,6.621276927099837e-8,2.76,0.19,-1.5899999999999999,1.47 +94,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM31,-0.13,2.300655839167245e-9,1.96,0.14,-2.09,1.47 +95,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM32,0.86,3.171656935122753e-8,2.44,0.17,-1.58,1.47 +96,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM33,2.61,0.,2.96,0.21,-0.3500000000000001,1.47 +97,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM34,0.92,3.4836487372642844e-11,2.83,0.2,-1.9100000000000001,1.47 +98,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM35,-0.79,4.561648353517754e-7,0.87,0.06,-1.6600000000000001,1.47 
+99,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM36,-0.17,6.28939601658209e-7,0.76,0.05,-0.93,1.47 +100,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM37,-1.04,5.508585407394285e-8,1.45,0.1,-2.49,1.47 +101,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM38,-2.75,3.267189094097668e-6,1.03,0.07,-3.7800000000000002,1.47 +102,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM39,-1.2,9.135794513077607e-8,1.89,0.13,-3.09,1.47 +103,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM40,-2.17,1.7232177344455449e-6,1.82,0.13,-3.99,1.47 +104,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM41,-3.22,0.039476637665170594,-0.42,0.03,-2.8000000000000003,1.47 +105,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM42,-0.1,0.029423454719712593,0.99,0.07,-1.09,1.47 +106,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM43,-3.0500000000000003,0.038704664752537137,0.42,0.03,-3.47,1.47 +107,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM44,-3.2600000000000002,0.03384729051326709,0.06,0.,-3.3200000000000003,1.47 +108,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM45,-2.34,0.03742105380207609,1.06,0.07,-3.4,1.47 +109,TFE-SMD-solvent-opt,pKa_experimental,logP_RodriguezPaluch_SMD_2,pKa_experimental_values,Physical (QM),experimental,SM46,-2.89,0.0327763943395595,0.69,0.05,-3.58,1.47 +110,TFE 
b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM25,-2.48,0.39990175982820836,-0.09,0.01,-2.39,0.8 +111,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM26,-3.04,0.3997423762290522,-0.87,0.06,-2.17,0.8 +112,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM27,0.49,0.36000003171656936,1.56,0.11,-1.07,0.8 +113,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM28,-0.6,0.36,1.18,0.08,-1.7799999999999998,0.8 +114,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM29,-0.51,0.36000019958028395,1.61,0.11,-2.12,0.8 +115,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM30,0.9500000000000001,0.36000006621276925,2.76,0.19,-1.8099999999999996,0.8 +116,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM31,0.58,0.3600000023006558,1.96,0.14,-1.38,0.8 +117,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM32,1.24,0.36000003171656936,2.44,0.17,-1.2,0.8 +118,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM33,2.55,0.36,2.96,0.21,-0.41000000000000014,0.8 +119,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM34,1.46,0.36000000003483645,2.83,0.2,-1.37,0.8 +120,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM35,-0.81,0.36000045616483534,0.87,0.06,-1.6800000000000002,0.8 +121,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM36,0.19,0.36000062893960166,0.76,0.05,-0.5700000000000001,0.8 +122,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical 
(QM),experimental,SM37,-1.62,0.3600000550858541,1.45,0.1,-3.0700000000000003,0.8 +123,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM38,-2.59,0.3600032671890941,1.03,0.07,-3.62,0.8 +124,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM39,-0.43,0.36000009135794514,1.89,0.13,-2.32,0.8 +125,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM40,-1.9100000000000001,0.36000172321773444,1.82,0.13,-3.7300000000000004,0.8 +126,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM41,-3.21,0.3994766376651706,-0.42,0.03,-2.79,0.8 +127,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM42,-0.61,0.38942345471971257,0.99,0.07,-1.6,0.8 +128,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM43,-1.99,0.39870466475253713,0.42,0.03,-2.41,0.8 +129,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM44,-2.73,0.39384729051326706,0.06,0.,-2.79,0.8 +130,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM45,-1.98,0.3974210538020761,1.06,0.07,-3.04,0.8 +131,TFE b3lypd3,pKa_experimental,logP-EvrimArslan-6,pKa_experimental_values,Physical (QM),experimental,SM46,-2.82,0.39277639433955946,0.69,0.05,-3.51,0.8 diff --git a/physical_property/logD/analysis_different_pKa_logP_combos/user-map2.csv b/physical_property/logD/analysis_different_pKa_logP_combos/user-map2.csv new file mode 100644 index 00000000..99c4a550 --- /dev/null +++ b/physical_property/logD/analysis_different_pKa_logP_combos/user-map2.csv @@ -0,0 +1,12 @@ +1,logD-EC_RISM_wet-EC_RISM.csv +3,logD-TFE_SMD_solvent_opt-DFT_M06_2X_SMD_explicit_water.csv +4,logD-MD_CGenFF_TIP3P-Gaussian_corrected.csv +5,logD-TFE_IEFPCM_MST-IEFPCM_MST.csv 
+6,logD-TFE_b3lypd3-DFT_M05_2X_SMD.csv +8,logD-TFE_NHLBI_TZVP_QM-TZVP_QM.csv +9,logD-NES_1_GAFF2_OPC3_B-pKa_experimental.csv +10,logD-TFE_MLR-pKa_experimental.csv +11,logD-logP_experimental-DFT_M05_2X_SMD.csv +12,logD-logP_experimental-EC_RISM.csv +13,logD_2_best_ranked_pKa_logP_combo.csv +14,logP_experimental-pKa_experimental.csv diff --git a/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/EC_RISM_wet_+_EC_RISM.pdf b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/EC_RISM_wet_+_EC_RISM.pdf new file mode 100644 index 00000000..2ff6cb46 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/EC_RISM_wet_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf new file mode 100644 index 00000000..53058392 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/NULL0.pdf b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/NULL0.pdf new file mode 100644 index 00000000..52e19f41 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/NULL0.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/REF0_ChemAxon.pdf b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/REF0_ChemAxon.pdf new file mode 100644 index 00000000..0315b0ac Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/REF0_ChemAxon.pdf differ diff --git 
a/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf new file mode 100644 index 00000000..d5b48dcd Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf new file mode 100644 index 00000000..6c01cdd4 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf new file mode 100644 index 00000000..0615fc14 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf new file mode 100644 index 00000000..ba78cd61 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/AbsoluteErrorPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Empirical/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Empirical/MAE_vs_molecule_ID_plot.pdf new file 
mode 100644 index 00000000..868d09c6 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Empirical/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Empirical/RMSE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Empirical/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..c78f91b0 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Empirical/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Empirical/molecular_error_statistics_for_Empirical_methods.csv b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Empirical/molecular_error_statistics_for_Empirical_methods.csv new file mode 100644 index 00000000..2197abf9 --- /dev/null +++ b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Empirical/molecular_error_statistics_for_Empirical_methods.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +19,SM44,0.155,0.06,0.25,0.18179658962697842,0.06,0.25 +18,SM43,0.475,0.42,0.53,0.47817360864020925,0.42,0.53 +21,SM46,0.5199999999999999,0.3499999999999999,0.69,0.5470831746635971,0.3499999999999999,0.69 +16,SM41,0.63,0.42,0.84,0.6640783086353597,0.42,0.84 +1,SM26,0.69,0.51,0.87,0.7130918594402827,0.51,0.87 +20,SM45,0.795,0.53,1.06,0.8380035799446206,0.53,1.06 +11,SM36,0.81,0.76,0.8600000000000001,0.8115417426109394,0.76,0.8600000000000001 +0,SM25,0.9800000000000001,0.09,1.87,1.3238202294873729,0.09,1.87 +14,SM39,0.9849999999999999,0.07999999999999985,1.89,1.3376284985002376,0.07999999999999985,1.89 +10,SM35,1.085,0.87,1.3,1.1060967407962108,0.87,1.3 
+17,SM42,1.12,0.99,1.2500000000000002,1.1275194011634568,0.99,1.2500000000000002 +3,SM28,1.13,1.0799999999999998,1.18,1.1311056537742175,1.0799999999999998,1.18 +2,SM27,1.145,0.7300000000000001,1.56,1.2178875153313626,0.7300000000000001,1.56 +13,SM38,1.155,1.03,1.28,1.161744378079791,1.03,1.28 +4,SM29,1.23,0.8500000000000001,1.61,1.2873616430514,0.8500000000000001,1.61 +5,SM30,1.405,0.050000000000000266,2.76,1.9519349374402826,0.050000000000000266,2.76 +6,SM31,1.515,1.0699999999999998,1.96,1.5790028499024311,1.0699999999999998,1.96 +12,SM37,1.605,1.45,1.76,1.612467053926994,1.45,1.76 +8,SM33,1.7,0.43999999999999995,2.96,2.1160340261914503,0.43999999999999995,2.96 +7,SM32,1.77,1.0999999999999999,2.44,1.8925643978475342,1.0999999999999999,2.44 +15,SM40,1.88,1.82,1.94,1.880957203128237,1.82,1.94 +9,SM34,2.095,1.36,2.83,2.220191433187688,1.36,2.83 diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..2cd2342a Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_MM_QM_LEC/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_MM_QM_LEC/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..830ed746 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_MM_QM_LEC/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_MM_QM_LEC/RMSE_vs_molecule_ID_plot.pdf 
b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_MM_QM_LEC/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..774a968e Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_MM_QM_LEC/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_MM_QM_LEC/molecular_error_statistics_for_Physical_MM_QM_LEC_methods.csv b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_MM_QM_LEC/molecular_error_statistics_for_Physical_MM_QM_LEC_methods.csv new file mode 100644 index 00000000..686a5999 --- /dev/null +++ b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_MM_QM_LEC/molecular_error_statistics_for_Physical_MM_QM_LEC_methods.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +0,SM25,0.29000000000000004,0.29000000000000004,0.29000000000000004,0.29000000000000004,0.29000000000000004,0.29000000000000004 +3,SM28,0.6700000000000002,0.6700000000000002,0.6700000000000002,0.6700000000000002,0.6700000000000002,0.6700000000000002 +7,SM32,1.17,1.17,1.17,1.17,1.17,1.17 +20,SM45,1.48,1.48,1.48,1.48,1.48,1.48 +9,SM34,1.6,1.6,1.6,1.6,1.6,1.6 +5,SM30,1.6899999999999997,1.6899999999999997,1.6899999999999997,1.6899999999999997,1.6899999999999997,1.6899999999999997 +17,SM42,1.7,1.7,1.7,1.7,1.7,1.7 +11,SM36,1.79,1.79,1.79,1.79,1.79,1.79 +14,SM39,1.8099999999999998,1.8099999999999998,1.8099999999999998,1.8099999999999998,1.8099999999999998,1.8099999999999998 +6,SM31,1.94,1.94,1.94,1.94,1.94,1.94 +2,SM27,2.08,2.08,2.08,2.08,2.08,2.08 +18,SM43,2.19,2.19,2.19,2.19,2.19,2.19 +4,SM29,2.46,2.46,2.46,2.46,2.46,2.46 +8,SM33,2.49,2.49,2.49,2.49,2.49,2.49 +1,SM26,2.58,2.58,2.58,2.58,2.58,2.58 +16,SM41,2.81,2.81,2.81,2.81,2.81,2.81 +10,SM35,2.86,2.86,2.86,2.86,2.86,2.86 
+12,SM37,2.88,2.88,2.88,2.88,2.88,2.88 +21,SM46,2.89,2.89,2.89,2.89,2.89,2.89 +13,SM38,2.9699999999999998,2.9699999999999998,2.9699999999999998,2.9699999999999998,2.9699999999999998,2.9699999999999998 +15,SM40,3.0,3.0,3.0,3.0,3.0,3.0 +19,SM44,3.52,3.52,3.52,3.52,3.52,3.52 diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_QM/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_QM/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..c91d75ec Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_QM/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_QM/RMSE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_QM/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..ec8544c7 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_QM/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_QM/molecular_error_statistics_for_Physical_QM_methods.csv b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_QM/molecular_error_statistics_for_Physical_QM_methods.csv new file mode 100644 index 00000000..48fa71a4 --- /dev/null +++ b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/Physical_QM/molecular_error_statistics_for_Physical_QM_methods.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +8,SM33,0.9079999999999998,0.28600000000000014,1.7099999999999995,1.2139522231125899,0.30472938814627,1.8841443681416767 
+2,SM27,0.95,0.502,1.35,1.064903751519357,0.6421993459977984,1.3782089827018253 +1,SM26,1.018,0.404,1.7079999999999997,1.2651719250757978,0.4915689168366934,1.8569652662341316 +7,SM32,1.082,0.57,1.54,1.2168730418576952,0.7479572180278763,1.549877414507354 +16,SM41,1.1700000000000002,0.5780000000000001,1.762,1.3511402591885122,0.6340189271622733,1.813934949219514 +10,SM35,1.218,0.404,2.032,1.562005121630528,0.7467797533409701,2.1280742468250486 +6,SM31,1.2659999999999998,0.7479999999999999,1.77,1.3821070870232883,0.8930845424706442,1.813764041985616 +5,SM30,1.2879999999999998,0.9640000000000001,1.614,1.3424902234280887,0.9802244640897307,1.6435145268600457 +3,SM28,1.302,0.7739999999999999,1.7819999999999996,1.4254753593100091,0.9201195574489219,1.793694511336866 +17,SM42,1.348,0.30399999999999994,2.5420000000000003,1.8534076723700053,0.38042081961953655,2.8416227758096255 +4,SM29,1.4,0.664,2.136,1.6300306745579975,0.8248151308020483,2.1525984298052436 +20,SM45,1.416,0.8360000000000001,2.012,1.582302120329743,0.9162095830103504,2.1453764238473396 +9,SM34,1.4200000000000004,0.8260000000000002,2.0340000000000003,1.5846640022414846,0.8979198182465962,2.116246677492962 +11,SM36,1.486,0.8280000000000001,2.286,1.7107133015207427,0.8569013945606577,2.455056007507771 +19,SM44,1.564,0.7780000000000001,2.3920000000000003,1.8267347919169876,0.9353181276977368,2.5824561951754377 +18,SM43,1.8059999999999998,0.788,2.824,2.1421157765162926,0.8823151364450231,2.9181603794171425 +21,SM46,1.9280000000000002,1.3519999999999999,2.5479999999999996,2.054273594242013,1.3932408262751994,2.588547855458732 +12,SM37,2.0940000000000003,1.2440000000000002,2.8440000000000003,2.280539409876532,1.4817557153593166,2.850915642385793 +14,SM39,2.432,1.432,3.4019999999999997,2.6723622508933924,1.7994388014044826,3.4432687957811248 +15,SM40,2.7,1.39,3.7980000000000005,3.0221714048015214,1.8936367127831042,3.8027647836804213 
+13,SM38,2.8259999999999996,1.338,4.206,3.2715592612697697,1.8321299080578322,4.247792367807071 +0,SM25,5.132,1.584,11.7,8.333695458798575,1.6211600784623337,14.180846237090366 diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/RMSE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..1eabe575 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/molecular_MAE_comparison_between_method_categories.pdf b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/molecular_MAE_comparison_between_method_categories.pdf new file mode 100644 index 00000000..018fc898 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/molecular_MAE_comparison_between_method_categories.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/molecular_error_distribution_ridge_plot_all_methods.pdf b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/molecular_error_distribution_ridge_plot_all_methods.pdf new file mode 100644 index 00000000..e4e5e3ca Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/molecular_error_distribution_ridge_plot_all_methods.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/molecular_error_statistics.csv b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/molecular_error_statistics.csv new file mode 100644 index 00000000..61dfb6af --- /dev/null +++ 
b/physical_property/logD/analysis_outputs_all_submissions/MolecularStatisticsTables/molecular_error_statistics.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +1,SM26,1.13125,0.59875,1.73625,1.399852670819326,0.7044767561814939,1.9417710987652483 +2,SM27,1.1400000000000001,0.74,1.52875,1.2729395115244087,0.8762419757121888,1.6070314247083037 +3,SM28,1.18,0.8362499999999999,1.52125,1.2829458289421263,0.9267348596011699,1.5872421050362795 +16,SM41,1.24,0.69625,1.85,1.4960782065119458,0.8096912991998864,2.0460907848871224 +7,SM32,1.2650000000000001,0.83375,1.7062499999999998,1.4114000141703273,0.9799362224144997,1.8214005600086984 +20,SM45,1.2687500000000003,0.8574999999999999,1.7325,1.419211929205783,0.9346724025026095,1.8783004285789853 +8,SM33,1.30375,0.6000000000000001,2.0612500000000002,1.6779339975100331,0.8794600616287245,2.2805618825193057 +17,SM42,1.335,0.6925,2.09375,1.6810785228537066,0.873734799581658,2.3980617173042065 +11,SM36,1.355,0.89375,1.9049999999999998,1.5473364210797855,0.9511374769190836,2.1016243717658014 +5,SM30,1.3675,0.8400000000000001,1.8962499999999998,1.5607530233832638,1.0407088930147566,2.035488270661366 +10,SM35,1.3900000000000001,0.73125,2.04375,1.6891491941211112,1.0077512093766001,2.2186059812413736 +6,SM31,1.4125,1.0174999999999998,1.7762499999999999,1.5124979338828863,1.1570220395480804,1.813573819837505 +19,SM44,1.45625,0.6487499999999999,2.36875,1.908576039878946,0.9271124527262051,2.676261384842669 +4,SM29,1.4900000000000002,0.9312499999999999,2.0162500000000003,1.682676439485619,1.1357871719648889,2.09164110210141 +18,SM43,1.52125,0.79375,2.31125,1.8773884787118515,0.9286145594378757,2.544668445986628 +9,SM34,1.61125,1.115,2.1225,1.7668580871139596,1.2435785057647146,2.226491073415746 +21,SM46,1.69625,1.0625,2.33875,1.9381337673132886,1.2712493854472457,2.4647971518970886 +14,SM39,1.9925000000000002,1.1712500000000001,2.79,2.3065721319741987,1.540929102846721,2.972318118909885 
+12,SM37,2.0700000000000003,1.50625,2.61375,2.2220148514355165,1.660858211889263,2.6724380254741176 +13,SM38,2.4262500000000005,1.405,3.47875,2.851221229578652,1.6851409436602034,3.752932186970609 +15,SM40,2.5324999999999998,1.73125,3.2824999999999998,2.778119867824281,1.9956327317419906,3.3852621759621515 +0,SM25,3.4887499999999996,0.9512499999999999,7.878749999999998,6.622324931623334,1.2480434687942563,11.25918402904935 diff --git a/physical_property/logD/analysis_outputs_all_submissions/QQPlots/EC_RISM_wet_+_EC_RISM_QQ.pdf b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/EC_RISM_wet_+_EC_RISM_QQ.pdf new file mode 100644 index 00000000..80f39dbb Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/EC_RISM_wet_+_EC_RISM_QQ.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/QQPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected_QQ.pdf b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected_QQ.pdf new file mode 100644 index 00000000..da9bbe38 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected_QQ.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/QQPlots/NULL0_QQ.pdf b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/NULL0_QQ.pdf new file mode 100644 index 00000000..e94d5216 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/NULL0_QQ.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/QQPlots/QQplot_dict.pickle b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/QQplot_dict.pickle new file mode 100644 index 00000000..a7a4f99c Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/QQplot_dict.pickle differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/QQPlots/REF0_ChemAxon_QQ.pdf 
b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/REF0_ChemAxon_QQ.pdf new file mode 100644 index 00000000..13ab503c Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/REF0_ChemAxon_QQ.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/QQPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM_QQ.pdf b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM_QQ.pdf new file mode 100644 index 00000000..aff92625 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM_QQ.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/QQPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water_QQ.pdf b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water_QQ.pdf new file mode 100644 index 00000000..f63b1686 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water_QQ.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/QQPlots/TFE_IEFPCM_MST_+_IEFPCM_MST_QQ.pdf b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/TFE_IEFPCM_MST_+_IEFPCM_MST_QQ.pdf new file mode 100644 index 00000000..c5217a80 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/TFE_IEFPCM_MST_+_IEFPCM_MST_QQ.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/QQPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD_QQ.pdf b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD_QQ.pdf new file mode 100644 index 00000000..3be5ff87 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/QQPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD_QQ.pdf differ diff --git 
a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot.pdf new file mode 100644 index 00000000..c9784cc7 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_colored_by_method_category.pdf new file mode 100644 index 00000000..a6b1fe36 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_colored_by_type.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_colored_by_type.pdf new file mode 100644 index 00000000..e7e224e8 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_colored_by_type.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_for_Empirical_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_for_Empirical_category.pdf new file mode 100644 index 00000000..6690a420 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_for_Empirical_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf 
new file mode 100644 index 00000000..63c48950 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_for_Physical_QM_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_for_Physical_QM_category.pdf new file mode 100644 index 00000000..8057846e Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_for_Physical_QM_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_physical_methods_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_physical_methods_colored_by_method_category.pdf new file mode 100644 index 00000000..f06e8143 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_physical_methods_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_physical_methods_colored_by_type.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_physical_methods_colored_by_type.pdf new file mode 100644 index 00000000..de8bc451 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/MAE_vs_method_plot_physical_methods_colored_by_type.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot.pdf new file mode 100644 index 00000000..a9d60fe9 Binary files /dev/null and 
b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_colored_by_method_category.pdf new file mode 100644 index 00000000..64920bf3 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_colored_by_type.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_colored_by_type.pdf new file mode 100644 index 00000000..52106319 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_colored_by_type.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_for_Empirical_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_for_Empirical_category.pdf new file mode 100644 index 00000000..1d50e778 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_for_Empirical_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf new file mode 100644 index 00000000..688c2196 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf differ diff --git 
a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_for_Physical_QM_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_for_Physical_QM_category.pdf new file mode 100644 index 00000000..2569a214 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_for_Physical_QM_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_physical_methods_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_physical_methods_colored_by_method_category.pdf new file mode 100644 index 00000000..a12750b0 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_physical_methods_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_physical_methods_colored_by_type.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_physical_methods_colored_by_type.pdf new file mode 100644 index 00000000..0f009979 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/RMSE_vs_method_plot_physical_methods_colored_by_type.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot.pdf new file mode 100644 index 00000000..de7512c4 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_colored_by_method_category.pdf 
b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_colored_by_method_category.pdf new file mode 100644 index 00000000..624de670 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_colored_by_type.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_colored_by_type.pdf new file mode 100644 index 00000000..7c3dd486 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_colored_by_type.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Empirical_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Empirical_category.pdf new file mode 100644 index 00000000..5b3d5804 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Empirical_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf new file mode 100644 index 00000000..fd55d9ef Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Physical_QM_category.pdf 
b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Physical_QM_category.pdf new file mode 100644 index 00000000..a7336ab0 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Physical_QM_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_physical_methods_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_physical_methods_colored_by_method_category.pdf new file mode 100644 index 00000000..6b7760fe Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_physical_methods_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_physical_methods_colored_by_type.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_physical_methods_colored_by_type.pdf new file mode 100644 index 00000000..233063ad Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/Rsquared_vs_method_plot_physical_methods_colored_by_type.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendall_tau_vs_method_plot_physical_methods_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendall_tau_vs_method_plot_physical_methods_colored_by_method_category.pdf new file mode 100644 index 00000000..14b5be57 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendall_tau_vs_method_plot_physical_methods_colored_by_method_category.pdf differ diff --git 
a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendall_tau_vs_method_plot_physical_methods_colored_by_type.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendall_tau_vs_method_plot_physical_methods_colored_by_type.pdf new file mode 100644 index 00000000..050901f9 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendall_tau_vs_method_plot_physical_methods_colored_by_type.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot.pdf new file mode 100644 index 00000000..4df82d87 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_method_category.pdf new file mode 100644 index 00000000..d6017229 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_type.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_type.pdf new file mode 100644 index 00000000..342194ec Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_type.pdf differ diff --git 
a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Empirical_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Empirical_category.pdf new file mode 100644 index 00000000..37bbca08 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Empirical_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf new file mode 100644 index 00000000..f84e5424 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Physical_QM_category.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Physical_QM_category.pdf new file mode 100644 index 00000000..4dd43bb4 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Physical_QM_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/statistics.csv b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/statistics.csv new file mode 100644 index 00000000..5af531fd --- /dev/null +++ b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/statistics.csv @@ -0,0 +1,9 @@ +method name,file 
name,category,type,RMSE,RMSE_lower_bound,RMSE_upper_bound,MAE,MAE_lower_bound,MAE_upper_bound,ME,ME_lower_bound,ME_upper_bound,R2,R2_lower_bound,R2_upper_bound,m,m_lower_bound,m_upper_bound,kendall_tau,kendall_tau_lower_bound,kendall_tau_upper_bound,ES,ES_lower_bound,ES_upper_bound +REF0 ChemAxon,logD-REF0-ChemAxon-Bergazin,Empirical,Reference,1.055933451759842,0.826699351748759,1.2641975536068144,0.9104545454545456,0.6845454545454546,1.1354545454545457,0.285,-0.14045454545454553,0.7095454545454545,0.26674767037924135,0.01230698979500488,0.5750595622026736,0.535610976813044,0.09843463165544228,0.895697397353355,0.30735930735930733,-0.01904761904761905,0.6018518518518517,0.12324708944169888,-0.0,0.2806733391475283 +TFE IEFPCM MST + IEFPCM/MST,logD-TFE_IEFPCM_MST-IEFPCM_MST,Physical (QM),Standard,1.2715702533053015,0.8348652585896721,1.6504668760962482,0.9818181818181817,0.6609090909090909,1.3368181818181817,0.2427272727272727,-0.2745454545454545,0.7568181818181817,0.5463858543100745,0.1700311921763144,0.8688030289656609,1.3073698729337788,0.7215637508083488,1.6943507384447705,0.567099567099567,0.2760180995475113,0.817351598173516,1.1577031161316225,0.8768221271647496,1.2435178648061245 +NULL0,logD-NULL0-Bergazin,Empirical,Reference,1.5909545448052134,1.223989379038887,1.9232087297487546,1.3509090909090908,1.0063636363636363,1.7036363636363638,1.2254545454545451,0.8000000000000003,1.6377272727272727,0.0,0.0,0.0,0.0,0.0,0.0,,,,0.6478951229979097,0.43859794841978494,0.8640585849356978 +EC_RISM_wet + EC_RISM,logD-EC_RISM_wet-EC_RISM,Physical (QM),Standard,1.6920267567194514,1.2945058727350196,2.056547194604676,1.4318181818181817,1.0640909090909092,1.816818181818182,-1.4318181818181817,-1.8159090909090916,-1.064090909090909,0.534429811315914,0.20772319200356454,0.7787953759376356,0.9505925394859067,0.5473015665092197,1.3003959186470309,0.500004725964924,0.20861731638973616,0.7346957664160274,0.8359704860390108,0.6377202818500216,1.0127081689438933 +TFE-NHLBI-TZVP-QM + 
TZVP-QM,logD-TFE_NHLBI_TZVP_QM-TZVP_QM,Physical (QM),Standard,1.7173896682836056,1.2959552461408534,2.1270252381286787,1.470909090909091,1.1222727272727275,1.8581818181818184,1.2581818181818183,0.7722727272727272,1.744090909090909,0.2543523095779817,0.007377551941060897,0.6372260524229922,0.6390157606431108,0.06622162978509182,1.2559996359444208,0.37662337662337664,0.018181818181818184,0.7,0.047205725597685405,-0.0,0.16940072262137135 +TFE b3lypd3 + DFT_M05-2X_SMD,logD-TFE_b3lypd3-DFT_M05_2X_SMD,Physical (QM),Standard,2.1486274688740252,1.545634615177974,2.7085806413893403,1.7799999999999998,1.3004545454545455,2.311363636363637,1.7799999999999998,1.3004545454545455,2.311363636363637,0.3192339234671041,0.04269017008091467,0.6575251485585784,0.8006777614443499,0.26906964597252425,1.2976256663054573,0.4112554112554112,0.04545454545454545,0.7104072398190044,0.4241316776721072,0.2814485651397484,0.6792087157550841 +MD (CGenFF/TIP3P) + Gaussian_corrected,logD-MD_CGenFF_TIP3P-Gaussian_corrected,Physical (MM) + QM+LEC,Standard,2.2724266164769165,1.958033568288719,2.545637087610522,2.130454545454546,1.7850000000000001,2.446818181818182,1.843181818181818,1.2359090909090906,2.334090909090909,0.6178907028584649,0.3444705889962179,0.8451946017368824,1.5257966025220564,0.9141548003929737,2.188128958887081,0.6190476190476191,0.3515981735159817,0.8280542986425339,0.8785110123620862,0.7433586666112935,1.0017027285186264 +TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,logD-TFE_SMD_solvent_opt-DFT_M06_2X_SMD_explicit_water,Physical (QM),Standard,4.535362569449507,2.083669185050082,7.158462252340359,2.9159090909090906,1.8731818181818178,4.592272727272727,2.875,1.788636363636364,4.57090909090909,0.24858890827216865,0.1039040823153686,0.7692674192911501,1.9172620428702987,0.5066527522177462,4.465073470710519,0.5466390301186075,0.20909090909090908,0.8063802242134231,0.5468665642260893,0.38244943726898967,0.7253069756495979 diff --git 
a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/statisticsLaTex/statistics.tex b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/statisticsLaTex/statistics.tex new file mode 100644 index 00000000..ff84dad7 --- /dev/null +++ b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/statisticsLaTex/statistics.tex @@ -0,0 +1,50 @@ +\documentclass{article} +\usepackage[a4paper,margin=0.005in,tmargin=0.5in,lmargin=0.5in,rmargin=0.5in,landscape]{geometry} +\usepackage{booktabs} +\usepackage{longtable} +\pagenumbering{gobble} +\begin{document} +\begin{center} +\scriptsize +\begin{longtable}{|ccccccccccc|} +\toprule + method name & file name & category & type & RMSE & MAE & ME & R$^2$ & m & $\tau$ & ES \\ +\midrule +\endhead +\midrule +\multicolumn{11}{r}{{Continued on next page}} \\ +\midrule +\endfoot + +\bottomrule +\endlastfoot + REF0 ChemAxon & logD-REF0-ChemAxon-Bergazin & Empirical & Reference & 1.06 [0.83, 1.26] & 0.91 [0.68, 1.14] & 0.28 [-0.14, 0.71] & 0.27 [0.01, 0.58] & 0.54 [0.10, 0.90] & 0.31 [-0.02, 0.60] & 0.12 [-0.00, 0.28] \\ + TFE IEFPCM MST + IEFPCM/MST & logD-TFE\_IEFPCM\_MST-IEFPCM\_MST & Physical (QM) & Standard & 1.27 [0.83, 1.65] & 0.98 [0.66, 1.34] & 0.24 [-0.27, 0.76] & 0.55 [0.17, 0.87] & 1.31 [0.72, 1.69] & 0.57 [0.28, 0.82] & 1.16 [0.88, 1.24] \\ + NULL0 & logD-NULL0-Bergazin & Empirical & Reference & 1.59 [1.22, 1.92] & 1.35 [1.01, 1.70] & 1.23 [0.80, 1.64] & 0.00 [0.00, 0.00] & 0.00 [0.00, 0.00] & nan [nan, nan] & 0.65 [0.44, 0.86] \\ + EC\_RISM\_wet + EC\_RISM & logD-EC\_RISM\_wet-EC\_RISM & Physical (QM) & Standard & 1.69 [1.29, 2.06] & 1.43 [1.06, 1.82] & -1.43 [-1.82, -1.06] & 0.53 [0.21, 0.78] & 0.95 [0.55, 1.30] & 0.50 [0.21, 0.73] & 0.84 [0.64, 1.01] \\ + TFE-NHLBI-TZVP-QM + TZVP-QM & logD-TFE\_NHLBI\_TZVP\_QM-TZVP\_QM & Physical (QM) & Standard & 1.72 [1.30, 2.13] & 1.47 [1.12, 1.86] & 1.26 [0.77, 1.74] & 0.25 [0.01, 0.64] & 0.64 [0.07, 1.26] & 0.38 [0.02, 0.70] & 0.05 
[-0.00, 0.17] \\ + TFE b3lypd3 + DFT\_M05-2X\_SMD & logD-TFE\_b3lypd3-DFT\_M05\_2X\_SMD & Physical (QM) & Standard & 2.15 [1.55, 2.71] & 1.78 [1.30, 2.31] & 1.78 [1.30, 2.31] & 0.32 [0.04, 0.66] & 0.80 [0.27, 1.30] & 0.41 [0.05, 0.71] & 0.42 [0.28, 0.68] \\ + MD (CGenFF/TIP3P) + Gaussian\_corrected & logD-MD\_CGenFF\_TIP3P-Gaussian\_corrected & Physical (MM) + QM+LEC & Standard & 2.27 [1.96, 2.55] & 2.13 [1.79, 2.45] & 1.84 [1.24, 2.33] & 0.62 [0.34, 0.85] & 1.53 [0.91, 2.19] & 0.62 [0.35, 0.83] & 0.88 [0.74, 1.00] \\ + TFE-SMD-solvent-opt + DFT\_M06-2X\_SMD\_explicit\_... & logD-TFE\_SMD\_solvent\_opt-DFT\_M06\_2X\_SMD\... & Physical (QM) & Standard & 4.54 [2.08, 7.16] & 2.92 [1.87, 4.59] & 2.88 [1.79, 4.57] & 0.25 [0.10, 0.77] & 1.92 [0.51, 4.47] & 0.55 [0.21, 0.81] & 0.55 [0.38, 0.73] \\ +\end{longtable} +\end{center} + +Notes + +- RMSE: Root mean square error + +- MAE: Mean absolute error + +- ME: Mean error + +- R2: R-squared, square of Pearson correlation coefficient + +- m: slope of the line fit to predicted vs experimental logD values + +- $\tau$: Kendall rank correlation coefficient + +- ES: error slope calculated from the QQ Plots of model uncertainty predictions + +- Mean and 95\% confidence intervals of RMSE, MAE, ME, R2, and m were calculated by bootstrapping with 10000 samples. 
+ +- 95\% confidence intervals of ES were calculated by bootstrapping with 1000 samples.\end{document} diff --git a/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/statistics_bootstrap_distributions.pdf b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/statistics_bootstrap_distributions.pdf new file mode 100644 index 00000000..f12a48bc Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/StatisticsTables/statistics_bootstrap_distributions.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/error_for_each_logD.pdf b/physical_property/logD/analysis_outputs_all_submissions/error_for_each_logD.pdf new file mode 100644 index 00000000..f04cb70b Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/error_for_each_logD.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/EC_RISM_wet_+_EC_RISM.pdf b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/EC_RISM_wet_+_EC_RISM.pdf new file mode 100644 index 00000000..a01de60d Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/EC_RISM_wet_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf new file mode 100644 index 00000000..c0d10bcc Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/REF0_ChemAxon.pdf b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/REF0_ChemAxon.pdf new file mode 100644 index 00000000..3d390f54 Binary files 
/dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/REF0_ChemAxon.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf new file mode 100644 index 00000000..1914ae53 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf new file mode 100644 index 00000000..88ce0254 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf new file mode 100644 index 00000000..ee656a46 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf new file mode 100644 index 00000000..3fc4e042 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf differ diff --git 
a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/EC_RISM_wet_+_EC_RISM.pdf b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/EC_RISM_wet_+_EC_RISM.pdf new file mode 100644 index 00000000..83502847 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/EC_RISM_wet_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf new file mode 100644 index 00000000..2a9ae34a Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/REF0_ChemAxon.pdf b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/REF0_ChemAxon.pdf new file mode 100644 index 00000000..7a825cdc Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/REF0_ChemAxon.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf new file mode 100644 index 00000000..d33bba98 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf 
b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf new file mode 100644 index 00000000..6a8c3b06 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf new file mode 100644 index 00000000..3ede9aa3 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf new file mode 100644 index 00000000..205ea661 Binary files /dev/null and b/physical_property/logD/analysis_outputs_all_submissions/logDCorrelationPlotsWithSEM/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf differ diff --git a/physical_property/logD/analysis_outputs_all_submissions/logD_submission_collection.csv b/physical_property/logD/analysis_outputs_all_submissions/logD_submission_collection.csv new file mode 100644 index 00000000..4208584a --- /dev/null +++ b/physical_property/logD/analysis_outputs_all_submissions/logD_submission_collection.csv @@ -0,0 +1,177 @@ +,method_name,category,Molecule ID,logD (calc),logD SEM (calc),logD (exp),logD SEM (exp),$\Delta$logD error (calc - exp),logD model uncertainty +0,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM25,-0.38,0.18,-0.09,0.01,-0.29000000000000004,2.3999880195383225 +1,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + 
QM+LEC,SM26,-3.45,0.11,-0.87,0.06,-2.58,2.3999886092463867 +2,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM27,-0.52,0.13,1.56,0.11,-2.08,2.3979742881675223 +3,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM28,1.85,0.13,1.18,0.08,0.6700000000000002,1.5 +4,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM29,-0.85,0.15,1.61,0.11,-2.46,2.3970721226093223 +5,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM30,1.07,0.14,2.76,0.19,-1.6899999999999997,2.3986675007101423 +6,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM31,0.02,0.13,1.96,0.14,-1.94,2.3983151936120537 +7,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM32,1.27,0.16,2.44,0.17,-1.17,2.39855103740242 +8,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM33,5.45,0.16,2.96,0.21,2.49,1.5 +9,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM34,1.23,0.14,2.83,0.2,-1.6,2.39892838869771 +10,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM35,-1.99,0.36,0.87,0.06,-2.86,2.399809083536587 +11,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM36,-1.03,0.6,0.76,0.05,-1.79,2.399648783489643 +12,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM37,-1.43,0.75,1.45,0.1,-2.88,2.3997498106923096 +13,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM38,-1.94,0.41,1.03,0.07,-2.9699999999999998,2.3995443305472417 +14,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM39,0.08,0.41,1.89,0.13,-1.8099999999999998,2.399608172679832 +15,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM40,-1.18,0.31,1.82,0.13,-3.0,2.3996953304337554 +16,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM41,-3.23,0.18,-0.42,0.03,-2.81,2.3999972682711226 +17,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM42,-0.71,0.13,0.99,0.07,-1.7,2.399997478807581 +18,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + 
QM+LEC,SM43,-1.77,0.16,0.42,0.03,-2.19,2.399994399413065 +19,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM44,-3.46,0.22,0.06,0.0,-3.52,2.3999104647733165 +20,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM45,-0.42,0.41,1.06,0.07,-1.48,2.3998867609877292 +21,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM46,-2.2,0.29,0.69,0.05,-2.89,2.3999411369110883 +22,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM25,1.21,0.0,-0.09,0.01,1.3,0.0 +23,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM26,0.01,0.0,-0.87,0.06,0.88,0.0 +24,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM27,0.27,0.0,1.56,0.11,-1.29,0.0 +25,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM28,-0.23,0.0,1.18,0.08,-1.41,0.0 +26,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM29,-0.03,0.0,1.61,0.11,-1.6400000000000001,0.0 +27,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM30,1.52,0.0,2.76,0.19,-1.2399999999999998,0.0 +28,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM31,0.76,0.0,1.96,0.14,-1.2,0.0 +29,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM32,0.77,0.0,2.44,0.17,-1.67,0.0 +30,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM33,3.12,0.0,2.96,0.21,0.16000000000000014,0.0 +31,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM34,1.89,0.0,2.83,0.2,-0.9400000000000002,0.0 +32,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM35,-1.7,0.0,0.87,0.06,-2.57,0.0 +33,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM36,-0.36,0.0,0.76,0.05,-1.12,0.0 +34,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM37,0.06,0.0,1.45,0.1,-1.39,0.0 +35,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM38,-2.58,0.0,1.03,0.07,-3.6100000000000003,0.0 +36,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM39,-0.48,0.0,1.89,0.13,-2.37,0.0 +37,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM40,-1.67,0.0,1.82,0.13,-3.49,0.0 +38,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM41,-0.74,0.0,-0.42,0.03,-0.32,0.0 +39,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM42,0.78,0.0,0.99,0.07,-0.20999999999999996,0.0 +40,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical 
(QM),SM43,-0.14,0.0,0.42,0.03,-0.56,0.0 +41,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM44,-1.73,0.0,0.06,0.0,-1.79,0.0 +42,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM45,-0.52,0.0,1.06,0.07,-1.58,0.0 +43,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM46,-0.93,0.0,0.69,0.05,-1.62,0.0 +44,EC_RISM_wet + EC_RISM,Physical (QM),SM25,2.25,0.02344107623307046,-0.09,0.01,2.34,2.447871928239328 +45,EC_RISM_wet + EC_RISM,Physical (QM),SM26,0.51,0.023361138100292885,-0.87,0.06,1.38,2.43955836243046 +46,EC_RISM_wet + EC_RISM,Physical (QM),SM27,2.21,0.010000039446495069,1.56,0.11,0.6499999999999999,1.0500041024354871 +47,EC_RISM_wet + EC_RISM,Physical (QM),SM28,2.18,0.01,1.18,0.08,1.0000000000000002,1.05 +48,EC_RISM_wet + EC_RISM,Physical (QM),SM29,2.07,0.010000149489547966,1.61,0.11,0.45999999999999974,1.0500155469129884 +49,EC_RISM_wet + EC_RISM,Physical (QM),SM30,3.78,0.01000134536305702,2.76,0.19,1.02,1.05013991775793 +50,EC_RISM_wet + EC_RISM,Physical (QM),SM31,3.27,0.010000000433838209,1.96,0.14,1.31,1.0500000451191742 +51,EC_RISM_wet + EC_RISM,Physical (QM),SM32,2.59,0.010000027305881223,2.44,0.17,0.1499999999999999,1.0500028398116472 +52,EC_RISM_wet + EC_RISM,Physical (QM),SM33,5.27,0.01,2.96,0.21,2.3099999999999996,1.05 +53,EC_RISM_wet + EC_RISM,Physical (QM),SM34,5.2700000000000005,0.010000013696641566,2.83,0.2,2.4400000000000004,1.0500014244507228 +54,EC_RISM_wet + EC_RISM,Physical (QM),SM35,0.95,0.010000571108298259,0.87,0.06,0.07999999999999996,1.0500593952630188 +55,EC_RISM_wet + EC_RISM,Physical (QM),SM36,2.59,0.010001314864335895,0.76,0.05,1.8299999999999998,1.050136745890933 +56,EC_RISM_wet + EC_RISM,Physical (QM),SM37,2.14,0.0100001107628115,1.45,0.1,0.6900000000000002,1.0500115193323958 +57,EC_RISM_wet + EC_RISM,Physical (QM),SM38,2.29,0.010002027036018357,1.03,0.07,1.26,1.050210811745909 +58,EC_RISM_wet + EC_RISM,Physical (QM),SM39,4.12,0.010091902371153578,1.89,0.13,2.2300000000000004,1.0595578465999722 +59,EC_RISM_wet + EC_RISM,Physical 
(QM),SM40,3.61,0.01000134536305702,1.82,0.13,1.7899999999999998,1.05013991775793 +60,EC_RISM_wet + EC_RISM,Physical (QM),SM41,1.64,0.023142696321745242,-0.42,0.03,2.06,2.416840417461505 +61,EC_RISM_wet + EC_RISM,Physical (QM),SM42,4.44,0.023308600664929967,0.99,0.07,3.45,2.4340944691527167 +62,EC_RISM_wet + EC_RISM,Physical (QM),SM43,3.34,0.020713282071424373,0.42,0.03,2.92,2.164181335428135 +63,EC_RISM_wet + EC_RISM,Physical (QM),SM44,0.51,0.02169725306161642,0.06,0.0,0.45,2.266514318408108 +64,EC_RISM_wet + EC_RISM,Physical (QM),SM45,1.8,0.022575502951253247,1.06,0.07,0.74,2.357852306930338 +65,EC_RISM_wet + EC_RISM,Physical (QM),SM46,1.63,0.020713297459104524,0.69,0.05,0.94,2.1641829357468705 +66,NULL0,Empirical,SM25,0.0,0.0,-0.09,0.01,0.09,1.0 +67,NULL0,Empirical,SM26,0.0,0.0,-0.87,0.06,0.87,1.0 +68,NULL0,Empirical,SM27,0.0,0.0,1.56,0.11,-1.56,1.0 +69,NULL0,Empirical,SM28,0.0,0.0,1.18,0.08,-1.18,1.0 +70,NULL0,Empirical,SM29,0.0,0.0,1.61,0.11,-1.61,1.0 +71,NULL0,Empirical,SM30,0.0,0.0,2.76,0.19,-2.76,1.0 +72,NULL0,Empirical,SM31,0.0,0.0,1.96,0.14,-1.96,1.0 +73,NULL0,Empirical,SM32,0.0,0.0,2.44,0.17,-2.44,1.0 +74,NULL0,Empirical,SM33,0.0,0.0,2.96,0.21,-2.96,1.0 +75,NULL0,Empirical,SM34,0.0,0.0,2.83,0.2,-2.83,1.0 +76,NULL0,Empirical,SM35,0.0,0.0,0.87,0.06,-0.87,1.0 +77,NULL0,Empirical,SM36,0.0,0.0,0.76,0.05,-0.76,1.0 +78,NULL0,Empirical,SM37,0.0,0.0,1.45,0.1,-1.45,1.0 +79,NULL0,Empirical,SM38,0.0,0.0,1.03,0.07,-1.03,1.0 +80,NULL0,Empirical,SM39,0.0,0.0,1.89,0.13,-1.89,1.0 +81,NULL0,Empirical,SM40,0.0,0.0,1.82,0.13,-1.82,1.0 +82,NULL0,Empirical,SM41,0.0,0.0,-0.42,0.03,0.42,1.0 +83,NULL0,Empirical,SM42,0.0,0.0,0.99,0.07,-0.99,1.0 +84,NULL0,Empirical,SM43,0.0,0.0,0.42,0.03,-0.42,1.0 +85,NULL0,Empirical,SM44,0.0,0.0,0.06,0.0,-0.06,1.0 +86,NULL0,Empirical,SM45,0.0,0.0,1.06,0.07,-1.06,1.0 +87,NULL0,Empirical,SM46,0.0,0.0,0.69,0.05,-0.69,1.0 +88,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM25,1.45,0.0,-0.09,0.01,1.54,1.62736237955926 +89,TFE IEFPCM MST + 
IEFPCM/MST,Physical (QM),SM26,-3.13,0.0,-0.87,0.06,-2.26,2.506510219032152 +90,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM27,1.75,0.0,1.56,0.11,0.18999999999999995,1.0600000003220549 +91,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM28,0.83,0.0,1.18,0.08,-0.35,1.06 +92,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM29,1.23,0.0,1.61,0.11,-0.3800000000000001,1.060000014275805 +93,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM30,3.53,0.0,2.76,0.19,0.77,1.0600001445262448 +94,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM31,1.61,0.0,1.96,0.14,-0.34999999999999987,1.0600003022939144 +95,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM32,1.63,0.0,2.44,0.17,-0.81,1.0600000019066798 +96,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM33,4.27,0.0,2.96,0.21,1.3099999999999996,1.06 +97,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM34,2.39,0.0,2.83,0.2,-0.43999999999999995,1.0600002390356282 +98,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM35,0.77,0.0,0.87,0.06,-0.09999999999999998,1.060003863243295 +99,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM36,3.73,0.0,0.76,0.05,2.9699999999999998,1.060517639767404 +100,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM37,-1.31,0.0,1.45,0.1,-2.76,2.508112831230342 +101,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM38,0.48,0.0,1.03,0.07,-0.55,1.0600307528312118 +102,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM39,2.45,0.0,1.89,0.13,0.5600000000000003,1.062395960435093 +103,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM40,1.35,0.0,1.82,0.13,-0.47,1.0890629521889412 +104,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM41,-1.45,0.0,-0.42,0.03,-1.03,2.496029608152775 +105,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM42,1.16,0.0,0.99,0.07,0.16999999999999993,2.5024653266494132 +106,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM43,-1.16,0.0,0.42,0.03,-1.5799999999999998,2.50719309548738 +107,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM44,-0.6900000000000001,0.0,0.06,0.0,-0.75,1.7782049976993068 +108,TFE IEFPCM MST + IEFPCM/MST,Physical 
(QM),SM45,1.68,0.0,1.06,0.07,0.6199999999999999,1.5093985461690262 +109,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM46,-0.95,0.0,0.69,0.05,-1.64,2.473467961947144 +110,REF0 ChemAxon,Empirical,SM25,1.78,0.0,-0.09,0.01,1.87,0.0 +111,REF0 ChemAxon,Empirical,SM26,-0.36,0.0,-0.87,0.06,0.51,0.0 +112,REF0 ChemAxon,Empirical,SM27,0.83,0.0,1.56,0.11,-0.7300000000000001,0.0 +113,REF0 ChemAxon,Empirical,SM28,0.1,0.0,1.18,0.08,-1.0799999999999998,0.0 +114,REF0 ChemAxon,Empirical,SM29,0.76,0.0,1.61,0.11,-0.8500000000000001,0.0 +115,REF0 ChemAxon,Empirical,SM30,2.81,0.0,2.76,0.19,0.050000000000000266,0.0 +116,REF0 ChemAxon,Empirical,SM31,0.89,0.0,1.96,0.14,-1.0699999999999998,0.0 +117,REF0 ChemAxon,Empirical,SM32,1.34,0.0,2.44,0.17,-1.0999999999999999,0.0 +118,REF0 ChemAxon,Empirical,SM33,3.4,0.0,2.96,0.21,0.43999999999999995,0.0 +119,REF0 ChemAxon,Empirical,SM34,1.47,0.0,2.83,0.2,-1.36,0.0 +120,REF0 ChemAxon,Empirical,SM35,-0.43,0.0,0.87,0.06,-1.3,0.0 +121,REF0 ChemAxon,Empirical,SM36,1.62,0.0,0.76,0.05,0.8600000000000001,0.0 +122,REF0 ChemAxon,Empirical,SM37,-0.31,0.0,1.45,0.1,-1.76,0.0 +123,REF0 ChemAxon,Empirical,SM38,-0.25,0.0,1.03,0.07,-1.28,0.0 +124,REF0 ChemAxon,Empirical,SM39,1.81,0.0,1.89,0.13,-0.07999999999999985,0.0 +125,REF0 ChemAxon,Empirical,SM40,-0.12,0.0,1.82,0.13,-1.94,0.0 +126,REF0 ChemAxon,Empirical,SM41,0.42,0.0,-0.42,0.03,0.84,0.0 +127,REF0 ChemAxon,Empirical,SM42,2.24,0.0,0.99,0.07,1.2500000000000002,0.0 +128,REF0 ChemAxon,Empirical,SM43,0.95,0.0,0.42,0.03,0.53,0.0 +129,REF0 ChemAxon,Empirical,SM44,-0.19,0.0,0.06,0.0,-0.25,0.0 +130,REF0 ChemAxon,Empirical,SM45,1.59,0.0,1.06,0.07,0.53,0.0 +131,REF0 ChemAxon,Empirical,SM46,0.34,0.0,0.69,0.05,-0.3499999999999999,0.0 +132,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM25,-2.33,1.0438236634602829,-0.09,0.01,-2.24,1.8941178615364531 +133,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM26,-0.99,0.6411256089625921,-0.87,0.06,-0.12,1.2498009743401473 +134,TFE b3lypd3 + DFT_M05-2X_SMD,Physical 
(QM),SM27,0.49,0.3600129619091414,1.56,0.11,-1.07,0.8000207390546263 +135,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM28,-0.6,0.36,1.18,0.08,-1.7799999999999998,0.8 +136,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM29,-0.54,0.3625241056999536,1.61,0.11,-2.1500000000000004,0.8040385691199258 +137,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM30,0.94,0.360471392816815,2.76,0.19,-1.8199999999999998,0.8007542285069041 +138,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM31,0.58,0.3600000000237929,1.96,0.14,-1.38,0.8000000000380687 +139,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM32,1.24,0.3600282398376334,2.44,0.17,-1.2,0.8000451837402134 +140,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM33,2.55,0.36,2.96,0.21,-0.41000000000000014,0.8 +141,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM34,1.46,0.3600002991428096,2.83,0.2,-1.37,0.8000004786284954 +142,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM35,-0.81,0.36000043227744816,0.87,0.06,-1.6800000000000002,0.8000006916439171 +143,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM36,0.19,0.3600807198221639,0.76,0.05,-0.5700000000000001,0.8001291517154623 +144,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM37,-1.62,0.3600049444450307,1.45,0.1,-3.0700000000000003,0.8000079111120492 +145,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM38,-3.9,0.9808838646523136,1.03,0.07,-4.93,1.7934141834437018 +146,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM39,-2.0,1.0099094360425362,1.89,0.13,-3.8899999999999997,1.8398550976680577 +147,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM40,-1.93,0.36120044198792783,1.82,0.13,-3.75,0.8019207071806846 +148,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM41,-1.03,0.36000012476185017,-0.42,0.03,-0.6100000000000001,0.8000001996189604 +149,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM42,0.23,0.36049300672398177,0.99,0.07,-0.76,0.8007888107583709 +150,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM43,-0.2,0.36001487301534085,0.42,0.03,-0.62,0.8000237968245454 +151,TFE b3lypd3 + DFT_M05-2X_SMD,Physical 
(QM),SM44,-1.63,0.3600704059249329,0.06,0.0,-1.69,0.8001126494798927 +152,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM45,-0.5,0.36000015704952115,1.06,0.07,-1.56,0.8000002512792339 +153,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM46,-1.8,0.3600000000432957,0.69,0.05,-2.49,0.8000000000692732 +154,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM25,-18.33,0.0,-0.09,0.01,-18.24,2.1562024272328912 +155,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM26,-0.42,0.0,-0.87,0.06,0.45,1.47 +156,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM27,0.01,0.0,1.56,0.11,-1.55,1.4700000000432958 +157,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM28,-0.79,0.0,1.18,0.08,-1.97,1.47 +158,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM29,-0.76,0.0,1.61,0.11,-2.37,1.4700000019788169 +159,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM30,1.17,0.0,2.76,0.19,-1.5899999999999999,1.470000786180085 +160,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM31,-0.13,0.0,1.96,0.14,-2.09,1.4700000037704035 +161,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM32,0.86,0.0,2.44,0.17,-1.58,1.4700000003770786 +162,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM33,2.61,0.0,2.96,0.21,-0.3500000000000001,1.47 +163,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM34,0.92,0.0,2.83,0.2,-1.9100000000000001,1.4700000002072229 +164,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM35,-0.79,0.0,0.87,0.06,-1.6600000000000001,1.4700000040648111 +165,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM36,-0.18,0.0,0.76,0.05,-0.94,1.470364134111271 +166,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM37,-1.11,0.0,1.45,0.1,-2.56,1.4855093562606103 +167,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical 
(QM),SM38,-2.75,0.0,1.03,0.07,-3.7800000000000002,1.470000000075239 +168,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM39,-1.22,0.0,1.89,0.13,-3.11,1.4720316802734248 +169,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM40,-2.18,0.0,1.82,0.13,-4.0,1.4704507142861698 +170,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM41,-2.25,0.0,-0.42,0.03,-1.83,2.073783637412573 +171,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM42,-1.16,0.0,0.99,0.07,-2.15,2.1392457493186248 +172,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM43,-2.93,0.0,0.42,0.03,-3.35,2.127134816087262 +173,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM44,-3.08,0.0,0.06,0.0,-3.14,2.00277016768521 +174,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM45,-1.52,0.0,1.06,0.07,-2.58,1.887624306191272 +175,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM46,-2.26,0.0,0.69,0.05,-2.9499999999999997,1.7097387894188616 diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/EC_RISM_wet_+_EC_RISM.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/EC_RISM_wet_+_EC_RISM.pdf new file mode 100644 index 00000000..8196881c Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/EC_RISM_wet_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf new file mode 100644 index 00000000..23a72210 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf differ diff --git 
a/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf new file mode 100644 index 00000000..5573ea64 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf new file mode 100644 index 00000000..d652de3a Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf new file mode 100644 index 00000000..2e7a2bf4 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf new file mode 100644 index 00000000..afd500a8 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/AbsoluteErrorPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/MAE_vs_molecule_ID_plot.pdf 
b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..0e44e715 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_MM/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_MM/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..1c52f419 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_MM/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_MM/RMSE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_MM/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..a36d7cfe Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_MM/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_MM/molecular_error_statistics_for_Physical_MM_methods.csv b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_MM/molecular_error_statistics_for_Physical_MM_methods.csv new file mode 100644 index 00000000..686a5999 --- /dev/null +++ b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_MM/molecular_error_statistics_for_Physical_MM_methods.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI 
+0,SM25,0.29000000000000004,0.29000000000000004,0.29000000000000004,0.29000000000000004,0.29000000000000004,0.29000000000000004 +3,SM28,0.6700000000000002,0.6700000000000002,0.6700000000000002,0.6700000000000002,0.6700000000000002,0.6700000000000002 +7,SM32,1.17,1.17,1.17,1.17,1.17,1.17 +20,SM45,1.48,1.48,1.48,1.48,1.48,1.48 +9,SM34,1.6,1.6,1.6,1.6,1.6,1.6 +5,SM30,1.6899999999999997,1.6899999999999997,1.6899999999999997,1.6899999999999997,1.6899999999999997,1.6899999999999997 +17,SM42,1.7,1.7,1.7,1.7,1.7,1.7 +11,SM36,1.79,1.79,1.79,1.79,1.79,1.79 +14,SM39,1.8099999999999998,1.8099999999999998,1.8099999999999998,1.8099999999999998,1.8099999999999998,1.8099999999999998 +6,SM31,1.94,1.94,1.94,1.94,1.94,1.94 +2,SM27,2.08,2.08,2.08,2.08,2.08,2.08 +18,SM43,2.19,2.19,2.19,2.19,2.19,2.19 +4,SM29,2.46,2.46,2.46,2.46,2.46,2.46 +8,SM33,2.49,2.49,2.49,2.49,2.49,2.49 +1,SM26,2.58,2.58,2.58,2.58,2.58,2.58 +16,SM41,2.81,2.81,2.81,2.81,2.81,2.81 +10,SM35,2.86,2.86,2.86,2.86,2.86,2.86 +12,SM37,2.88,2.88,2.88,2.88,2.88,2.88 +21,SM46,2.89,2.89,2.89,2.89,2.89,2.89 +13,SM38,2.9699999999999998,2.9699999999999998,2.9699999999999998,2.9699999999999998,2.9699999999999998,2.9699999999999998 +15,SM40,3.0,3.0,3.0,3.0,3.0,3.0 +19,SM44,3.52,3.52,3.52,3.52,3.52,3.52 diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_QM/MAE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_QM/MAE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..6b46d38a Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_QM/MAE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_QM/RMSE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_QM/RMSE_vs_molecule_ID_plot.pdf new 
file mode 100644 index 00000000..2fd46866 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_QM/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_QM/molecular_error_statistics_for_Physical_QM_methods.csv b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_QM/molecular_error_statistics_for_Physical_QM_methods.csv new file mode 100644 index 00000000..ab52fa14 --- /dev/null +++ b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/Physical_QM/molecular_error_statistics_for_Physical_QM_methods.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +8,SM33,0.9079999999999998,0.28600000000000014,1.7099999999999995,1.2139522231125899,0.30472938814627,1.884144368141677 +2,SM27,0.95,0.502,1.35,1.064903751519357,0.6421993459977984,1.3633854920747837 +1,SM26,1.018,0.404,1.7079999999999997,1.2651719250757978,0.4915689168366934,1.8569652662341316 +7,SM32,1.082,0.57,1.54,1.2168730418576952,0.7479572180278763,1.549877414507354 +16,SM41,1.1700000000000002,0.5780000000000001,1.762,1.3511402591885122,0.6340189271622733,1.8139349492195136 +10,SM35,1.218,0.404,2.032,1.562005121630528,0.74677975334097,2.1282434071318064 +6,SM31,1.2659999999999998,0.734,1.7560000000000002,1.3821070870232883,0.8930845424706442,1.813764041985616 +5,SM30,1.2879999999999998,0.9640000000000001,1.6139999999999997,1.3424902234280887,0.9802244640897307,1.6435145268600457 +3,SM28,1.302,0.766,1.7819999999999996,1.4254753593100091,0.9201195574489219,1.793694511336866 +17,SM42,1.348,0.30399999999999994,2.5340000000000003,1.8534076723700053,0.38042081961953655,2.8410878198323966 +4,SM29,1.4,0.6639999999999999,2.136,1.6300306745579975,0.8248151308020483,2.1525984298052436 
+20,SM45,1.416,0.8559999999999999,2.012,1.582302120329743,0.9271030147723606,2.1453764238473396 +9,SM34,1.4200000000000004,0.8260000000000002,2.034,1.5846640022414846,0.8979198182465962,2.116246677492962 +11,SM36,1.486,0.828,2.2859999999999996,1.7107133015207427,0.8569013945606577,2.455056007507771 +19,SM44,1.564,0.818,2.3920000000000003,1.8267347919169876,0.9366002348921337,2.5824561951754377 +18,SM43,1.8059999999999998,0.788,2.824,2.1421157765162926,0.8823151364450231,2.9181603794171425 +21,SM46,1.9280000000000002,1.3519999999999999,2.5479999999999996,2.054273594242013,1.3932408262751994,2.6431799030712986 +12,SM37,2.0940000000000003,1.2440000000000002,2.8440000000000003,2.280539409876532,1.4817557153593166,2.850915642385793 +14,SM39,2.432,1.432,3.4019999999999997,2.6723622508933924,1.765548073545436,3.4432687957811248 +15,SM40,2.7,1.44,3.7980000000000005,3.0221714048015214,1.8936367127831042,3.8027647836804213 +13,SM38,2.8259999999999996,1.4259999999999997,4.206,3.2715592612697697,1.833701175219125,4.247792367807071 +0,SM25,5.132,1.5559999999999998,11.719999999999999,8.333695458798575,1.6073083089438691,14.18407557791483 diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/RMSE_vs_molecule_ID_plot.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/RMSE_vs_molecule_ID_plot.pdf new file mode 100644 index 00000000..0a332cde Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/RMSE_vs_molecule_ID_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/molecular_MAE_comparison_between_method_categories.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/molecular_MAE_comparison_between_method_categories.pdf new file mode 100644 index 00000000..54d62325 Binary files /dev/null and 
b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/molecular_MAE_comparison_between_method_categories.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/molecular_error_distribution_ridge_plot_all_methods.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/molecular_error_distribution_ridge_plot_all_methods.pdf new file mode 100644 index 00000000..91caf464 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/molecular_error_distribution_ridge_plot_all_methods.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/molecular_error_statistics.csv b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/molecular_error_statistics.csv new file mode 100644 index 00000000..2096cccd --- /dev/null +++ b/physical_property/logD/analysis_outputs_ranked_submissions/MolecularStatisticsTables/molecular_error_statistics.csv @@ -0,0 +1,23 @@ +,Molecule ID,MAE,MAE_lower_CI,MAE_upper_CI,RMSE,RMSE_lower_CI,RMSE_upper_CI +7,SM32,1.0966666666666667,0.6649999999999999,1.4733333333333334,1.2091870547327792,0.8335266442452015,1.4932905499823759 +2,SM27,1.1383333333333332,0.64,1.611666666666667,1.2907685049354642,0.788056681548563,1.6873006055037536 +8,SM33,1.1716666666666666,0.4566666666666667,1.9066666666666665,1.503800740346495,0.6028266749240613,2.0726392192886185 +3,SM28,1.1966666666666665,0.7400000000000001,1.6466666666666667,1.3297117482121203,0.8297489580188296,1.6954891133042798 +1,SM26,1.2783333333333333,0.5666666666666668,2.0116666666666663,1.5631005512548874,0.7202545846203734,2.134510092113254 +5,SM30,1.3549999999999998,1.0483333333333331,1.6433333333333333,1.4063842528507868,1.0964184116172682,1.6637758262458313 
+6,SM31,1.3783333333333332,0.9283333333333331,1.803333333333333,1.4896699858246008,1.0603065594440129,1.839388485339625 +17,SM42,1.4066666666666665,0.5299999999999999,2.395,1.8287336237589844,0.7760584170451775,2.647800974393657 +20,SM45,1.4266666666666667,0.9466666666666668,1.9366666666666668,1.5657160236347671,1.033440854621105,2.059401207471078 +16,SM41,1.4433333333333334,0.7766666666666667,2.1550000000000002,1.6844385810510674,0.9450573174857351,2.2968420349108323 +9,SM34,1.45,0.9283333333333336,1.961666666666667,1.5872302920496448,1.048665819029113,2.03542378879682 +10,SM35,1.4916666666666665,0.6133333333333334,2.3516666666666666,1.842955054615639,0.9673331036066807,2.422595027375947 +11,SM36,1.5366666666666664,0.9583333333333334,2.19,1.7241809649801842,1.037753984975887,2.3572795054186226 +4,SM29,1.5766666666666669,0.8983333333333333,2.2416666666666667,1.7952065804989314,1.1515858630601543,2.2601290523625712 +18,SM43,1.87,1.0216666666666667,2.7183333333333333,2.1501705358722907,1.2033633421927614,2.8099347560634453 +19,SM44,1.8900000000000003,0.9966666666666667,2.8166666666666664,2.201332929537617,1.125625751896843,2.9614242069202668 +21,SM46,2.088333333333333,1.498333333333333,2.6449999999999996,2.215562381578697,1.6281738236441463,2.687914309150002 +12,SM37,2.225,1.4833333333333334,2.8516666666666666,2.3909098686483357,1.728371102126701,2.8577234529137585 +14,SM39,2.328333333333333,1.471666666666667,3.123333333333333,2.548976918948725,1.720910611662713,3.248522741185599 +15,SM40,2.75,1.696666666666667,3.6633333333333336,3.0184874799585746,2.118541164732625,3.680276257746602 +13,SM38,2.85,1.5983333333333334,4.0,3.2232592201062578,2.065768622087188,4.11952262606563 +0,SM25,4.325,1.125,9.983333333333333,7.608509490476217,1.3362759196114151,12.965788959154523 diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/EC_RISM_wet_+_EC_RISM_QQ.pdf 
b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/EC_RISM_wet_+_EC_RISM_QQ.pdf new file mode 100644 index 00000000..d3d7f840 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/EC_RISM_wet_+_EC_RISM_QQ.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected_QQ.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected_QQ.pdf new file mode 100644 index 00000000..cf2c6fc1 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected_QQ.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/QQplot_dict.pickle b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/QQplot_dict.pickle new file mode 100644 index 00000000..336b9b5b Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/QQplot_dict.pickle differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM_QQ.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM_QQ.pdf new file mode 100644 index 00000000..3a68d0ba Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM_QQ.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water_QQ.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water_QQ.pdf new file mode 100644 index 00000000..ffb319b7 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water_QQ.pdf differ diff --git 
a/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/TFE_IEFPCM_MST_+_IEFPCM_MST_QQ.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/TFE_IEFPCM_MST_+_IEFPCM_MST_QQ.pdf new file mode 100644 index 00000000..157a99d1 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/TFE_IEFPCM_MST_+_IEFPCM_MST_QQ.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD_QQ.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD_QQ.pdf new file mode 100644 index 00000000..cc5bdc85 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/QQPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD_QQ.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot.pdf new file mode 100644 index 00000000..80ee2664 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_colored_by_method_category.pdf new file mode 100644 index 00000000..55c57fb4 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_for_Empirical_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_for_Empirical_category.pdf new file mode 100644 index 00000000..512045da Binary files /dev/null and 
b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_for_Empirical_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf new file mode 100644 index 00000000..66a5ab70 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_for_Physical_QM_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_for_Physical_QM_category.pdf new file mode 100644 index 00000000..62be66eb Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_for_Physical_QM_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_physical_methods_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_physical_methods_colored_by_method_category.pdf new file mode 100644 index 00000000..f2e80364 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/MAE_vs_method_plot_physical_methods_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot.pdf new file mode 100644 index 00000000..e0721567 Binary files /dev/null and 
b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_colored_by_method_category.pdf new file mode 100644 index 00000000..c735798d Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_for_Empirical_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_for_Empirical_category.pdf new file mode 100644 index 00000000..f8b0c452 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_for_Empirical_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf new file mode 100644 index 00000000..3f0a2312 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_for_Physical_QM_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_for_Physical_QM_category.pdf new file mode 100644 index 00000000..329fe54e Binary files /dev/null and 
b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_for_Physical_QM_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_physical_methods_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_physical_methods_colored_by_method_category.pdf new file mode 100644 index 00000000..2cbb73d4 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/RMSE_vs_method_plot_physical_methods_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot.pdf new file mode 100644 index 00000000..71b1ac28 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_colored_by_method_category.pdf new file mode 100644 index 00000000..ffedff20 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Empirical_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Empirical_category.pdf new file mode 100644 index 00000000..6157499f Binary files /dev/null and 
b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Empirical_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf new file mode 100644 index 00000000..fc324d25 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Physical_QM_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Physical_QM_category.pdf new file mode 100644 index 00000000..44df2a6f Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_for_Physical_QM_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_physical_methods_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_physical_methods_colored_by_method_category.pdf new file mode 100644 index 00000000..8b81feba Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/Rsquared_vs_method_plot_physical_methods_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendall_tau_vs_method_plot_physical_methods_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendall_tau_vs_method_plot_physical_methods_colored_by_method_category.pdf new file mode 
100644 index 00000000..75af8a9c Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendall_tau_vs_method_plot_physical_methods_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot.pdf new file mode 100644 index 00000000..2d336b7f Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_method_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_method_category.pdf new file mode 100644 index 00000000..455666c1 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot_colored_by_method_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Empirical_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Empirical_category.pdf new file mode 100644 index 00000000..cf27dcf5 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Empirical_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf new file mode 100644 index 00000000..2ea78d3f Binary files /dev/null and 
b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Physical_MM_QM_LEC_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Physical_QM_category.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Physical_QM_category.pdf new file mode 100644 index 00000000..29fec1b4 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/kendalls_tau_vs_method_plot_for_Physical_QM_category.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/statistics.csv b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/statistics.csv new file mode 100644 index 00000000..e1509ec1 --- /dev/null +++ b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/statistics.csv @@ -0,0 +1,7 @@ +method name,file name,category,type,RMSE,RMSE_lower_bound,RMSE_upper_bound,MAE,MAE_lower_bound,MAE_upper_bound,ME,ME_lower_bound,ME_upper_bound,R2,R2_lower_bound,R2_upper_bound,m,m_lower_bound,m_upper_bound,kendall_tau,kendall_tau_lower_bound,kendall_tau_upper_bound,ES,ES_lower_bound,ES_upper_bound +TFE IEFPCM MST + IEFPCM/MST,logD-TFE_IEFPCM_MST-IEFPCM_MST,Physical (QM),Standard,1.2715702533053015,0.8434022010448352,1.6537313181793685,0.9818181818181817,0.6645454545454544,1.3454545454545452,0.2427272727272727,-0.2863636363636364,0.7372727272727272,0.5463858543100745,0.16286264126673888,0.8754787370411129,1.3073698729337788,0.7032222178017297,1.701379763950352,0.567099567099567,0.26785714285714285,0.817351598173516,1.1577031161316225,0.8721292412475603,1.2643797499896177 +EC_RISM_wet + EC_RISM,logD-EC_RISM_wet-EC_RISM,Physical 
(QM),Standard,1.6920267567194514,1.2971209377975235,2.054173136015383,1.4318181818181817,1.0631818181818178,1.8099999999999996,-1.4318181818181817,-1.8090909090909089,-1.0631818181818178,0.534429811315914,0.20619953446998862,0.7781713857310909,0.9505925394859067,0.5537738438706725,1.301232570142008,0.500004725964924,0.20861731638973616,0.7382568809658341,0.8359704860390108,0.6421086146988386,1.0257624209200273 +TFE-NHLBI-TZVP-QM + TZVP-QM,logD-TFE_NHLBI_TZVP_QM-TZVP_QM,Physical (QM),Standard,1.7173896682836056,1.2934889394052183,2.1175661415021643,1.470909090909091,1.1172727272727274,1.8568181818181815,1.2581818181818183,0.765909090909091,1.7436363636363637,0.2543523095779817,0.00881787114675428,0.6410365295529773,0.6390157606431108,0.061667558306207196,1.2386123454324558,0.37662337662337664,0.013698630136986302,0.702325581395349,0.047205725597685405,-0.0,0.18066918615114141 +TFE b3lypd3 + DFT_M05-2X_SMD,logD-TFE_b3lypd3-DFT_M05_2X_SMD,Physical (QM),Standard,2.1486274688740252,1.5496832520932082,2.7100897335026315,1.7799999999999998,1.3045454545454545,2.3027272727272727,1.7799999999999998,1.3045454545454545,2.3027272727272727,0.3192339234671041,0.045495831026378546,0.6677573604858824,0.8006777614443499,0.28484998262108907,1.30200264292266,0.4112554112554112,0.04761904761904762,0.7155963302752294,0.4241316776721072,0.2666500546811192,0.6762739316416796 +MD (CGenFF/TIP3P) + Gaussian_corrected,logD-MD_CGenFF_TIP3P-Gaussian_corrected,Physical (MM) + QM+LEC,Standard,2.2724266164769165,1.969237277360312,2.5466519910730567,2.130454545454546,1.7972727272727271,2.4450000000000003,1.843181818181818,1.2559090909090909,2.3354545454545454,0.6178907028584649,0.3420872154310394,0.84256681805185,1.5257966025220564,0.9080381486802956,2.1865254146219595,0.6190476190476191,0.3542600896860987,0.8272727272727273,0.8785110123620862,0.7408391821365783,0.9991001841161732 +TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,logD-TFE_SMD_solvent_opt-DFT_M06_2X_SMD_explicit_water,Physical 
(QM),Standard,4.535362569449507,2.1001839746252537,7.140797638155761,2.9159090909090906,1.8931818181818179,4.54090909090909,2.875,1.813636363636364,4.52,0.24858890827216865,0.10378211799719224,0.7677710411011331,1.9172620428702987,0.5231078240969124,4.3473402898318,0.5466390301186075,0.21531100478468898,0.8044444444444444,0.5468665642260893,0.3830031701205754,0.7362016695045476 diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/statisticsLaTex/statistics.tex b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/statisticsLaTex/statistics.tex new file mode 100644 index 00000000..321c4151 --- /dev/null +++ b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/statisticsLaTex/statistics.tex @@ -0,0 +1,48 @@ +\documentclass{article} +\usepackage[a4paper,margin=0.005in,tmargin=0.5in,lmargin=0.5in,rmargin=0.5in,landscape]{geometry} +\usepackage{booktabs} +\usepackage{longtable} +\pagenumbering{gobble} +\begin{document} +\begin{center} +\scriptsize +\begin{longtable}{|ccccccccc|} +\toprule + method name & file name & category & type & RMSE & MAE & ME & R$^2$ & m & $\tau$ & ES \\ +\midrule +\endhead +\midrule +\multicolumn{11}{r}{{Continued on next page}} \\ +\midrule +\endfoot + +\bottomrule +\endlastfoot + TFE IEFPCM MST + IEFPCM/MST & logD-TFE\_IEFPCM\_MST-IEFPCM\_MST & Physical (QM) & Standard & 1.27 [0.84, 1.65] & 0.98 [0.66, 1.35] & 0.24 [-0.29, 0.74] & 0.55 [0.16, 0.88] & 1.31 [0.70, 1.70] & 0.57 [0.27, 0.82] & 1.16 [0.87, 1.26] \\ + EC_RISM_wet + EC_RISM & logD-EC\_RISM\_wet-EC\_RISM & Physical (QM) & Standard & 1.69 [1.30, 2.05] & 1.43 [1.06, 1.81] & -1.43 [-1.81, -1.06] & 0.53 [0.21, 0.78] & 0.95 [0.55, 1.30] & 0.50 [0.21, 0.74] & 0.84 [0.64, 1.03] \\ + TFE-NHLBI-TZVP-QM + TZVP-QM & logD-TFE\_NHLBI\_TZVP\_QM-TZVP\_QM & Physical (QM) & Standard & 1.72 [1.29, 2.12] & 1.47 [1.12, 1.86] & 1.26 [0.77, 1.74] & 0.25 [0.01, 0.64] & 0.64 [0.06, 1.24] & 0.38 [0.01, 0.70] & 0.05 [-0.00, 
0.18] \\ + TFE b3lypd3 + DFT\_M05-2X\_SMD & logD-TFE\_b3lypd3-DFT\_M05\_2X\_SMD & Physical (QM) & Standard & 2.15 [1.55, 2.71] & 1.78 [1.30, 2.30] & 1.78 [1.30, 2.30] & 0.32 [0.05, 0.67] & 0.80 [0.28, 1.30] & 0.41 [0.05, 0.72] & 0.42 [0.27, 0.68] \\ + MD (CGenFF/TIP3P) + Gaussian\_corrected & logD-MD\_CGenFF\_TIP3P-Gaussian\_corrected & Physical (MM) + QM+LEC & Standard & 2.27 [1.97, 2.55] & 2.13 [1.80, 2.45] & 1.84 [1.26, 2.34] & 0.62 [0.34, 0.84] & 1.53 [0.91, 2.19] & 0.62 [0.35, 0.83] & 0.88 [0.74, 1.00] \\ + TFE-SMD-solvent-opt + DFT\_M06-2X\_SMD\_explicit\_... & logD-TFE\_SMD\_solvent\_opt-DFT\_M06\_2X\_SMD\... & Physical (QM) & Standard & 4.54 [2.10, 7.14] & 2.92 [1.89, 4.54] & 2.88 [1.81, 4.52] & 0.25 [0.10, 0.77] & 1.92 [0.52, 4.35] & 0.55 [0.22, 0.80] & 0.55 [0.38, 0.74] \\ +\end{longtable} +\end{center} + +Notes + +- RMSE: Root mean square error + +- MAE: Mean absolute error + +- ME: Mean error + +- R2: R-squared, square of Pearson correlation coefficient + +- m: slope of the line fit to predicted vs experimental logD values + +- $\tau$: Kendall rank correlation coefficient + +- ES: error slope calculated from the QQ Plots of model uncertainty predictions + +- Mean and 95\% confidence intervals of RMSE, MAE, ME, R2, and m were calculated by bootstrapping with 10000 samples. 
+ +- 95\% confidence intervals of ES were calculated by bootstrapping with 1000 samples.\end{document} diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/statistics_bootstrap_distributions.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/statistics_bootstrap_distributions.pdf new file mode 100644 index 00000000..1e82dd49 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/StatisticsTables/statistics_bootstrap_distributions.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/error_for_each_logD.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/error_for_each_logD.pdf new file mode 100644 index 00000000..1b0b7d89 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/error_for_each_logD.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/EC_RISM_wet_+_EC_RISM.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/EC_RISM_wet_+_EC_RISM.pdf new file mode 100644 index 00000000..99479a16 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/EC_RISM_wet_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf new file mode 100644 index 00000000..38939b20 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf 
b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf new file mode 100644 index 00000000..54fced9a Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf new file mode 100644 index 00000000..315205f9 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf new file mode 100644 index 00000000..7b4e652b Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf new file mode 100644 index 00000000..404bfc9c Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlots/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/EC_RISM_wet_+_EC_RISM.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/EC_RISM_wet_+_EC_RISM.pdf new file mode 100644 index 00000000..719b7d43 Binary files /dev/null and 
b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/EC_RISM_wet_+_EC_RISM.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf new file mode 100644 index 00000000..95989215 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/MD_CGenFF_TIP3P_+_Gaussian_corrected.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf new file mode 100644 index 00000000..c353795a Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/TFE-NHLBI-TZVP-QM_+_TZVP-QM.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf new file mode 100644 index 00000000..81c90266 Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/TFE-SMD-solvent-opt_+_DFT_M06-2X_SMD_explicit_water.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf new file mode 100644 index 00000000..9ea43f68 Binary files /dev/null and 
b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/TFE_IEFPCM_MST_+_IEFPCM_MST.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf new file mode 100644 index 00000000..55d28a7d Binary files /dev/null and b/physical_property/logD/analysis_outputs_ranked_submissions/logDCorrelationPlotsWithSEM/TFE_b3lypd3_+_DFT_M05-2X_SMD.pdf differ diff --git a/physical_property/logD/analysis_outputs_ranked_submissions/logD_submission_collection.csv b/physical_property/logD/analysis_outputs_ranked_submissions/logD_submission_collection.csv new file mode 100644 index 00000000..78eb6652 --- /dev/null +++ b/physical_property/logD/analysis_outputs_ranked_submissions/logD_submission_collection.csv @@ -0,0 +1,133 @@ +,method_name,category,Molecule ID,logD (calc),logD SEM (calc),logD (exp),logD SEM (exp),$\Delta$logD error (calc - exp),logD model uncertainty +0,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM25,-0.38,0.18,-0.09,0.01,-0.29000000000000004,2.3999880195383225 +1,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM26,-3.45,0.11,-0.87,0.06,-2.58,2.3999886092463867 +2,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM27,-0.52,0.13,1.56,0.11,-2.08,2.3979742881675223 +3,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM28,1.85,0.13,1.18,0.08,0.6700000000000002,1.5 +4,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM29,-0.85,0.15,1.61,0.11,-2.46,2.3970721226093223 +5,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM30,1.07,0.14,2.76,0.19,-1.6899999999999997,2.3986675007101423 +6,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM31,0.02,0.13,1.96,0.14,-1.94,2.3983151936120537 +7,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + 
QM+LEC,SM32,1.27,0.16,2.44,0.17,-1.17,2.39855103740242 +8,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM33,5.45,0.16,2.96,0.21,2.49,1.5 +9,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM34,1.23,0.14,2.83,0.2,-1.6,2.39892838869771 +10,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM35,-1.99,0.36,0.87,0.06,-2.86,2.399809083536587 +11,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM36,-1.03,0.6,0.76,0.05,-1.79,2.399648783489643 +12,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM37,-1.43,0.75,1.45,0.1,-2.88,2.3997498106923096 +13,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM38,-1.94,0.41,1.03,0.07,-2.9699999999999998,2.3995443305472417 +14,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM39,0.08,0.41,1.89,0.13,-1.8099999999999998,2.399608172679832 +15,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM40,-1.18,0.31,1.82,0.13,-3.0,2.3996953304337554 +16,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM41,-3.23,0.18,-0.42,0.03,-2.81,2.3999972682711226 +17,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM42,-0.71,0.13,0.99,0.07,-1.7,2.399997478807581 +18,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM43,-1.77,0.16,0.42,0.03,-2.19,2.399994399413065 +19,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM44,-3.46,0.22,0.06,0.0,-3.52,2.3999104647733165 +20,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM45,-0.42,0.41,1.06,0.07,-1.48,2.3998867609877292 +21,MD (CGenFF/TIP3P) + Gaussian_corrected,Physical (MM) + QM+LEC,SM46,-2.2,0.29,0.69,0.05,-2.89,2.3999411369110883 +22,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM25,1.21,0.0,-0.09,0.01,1.3,0.0 +23,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM26,0.01,0.0,-0.87,0.06,0.88,0.0 +24,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM27,0.27,0.0,1.56,0.11,-1.29,0.0 +25,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical 
(QM),SM28,-0.23,0.0,1.18,0.08,-1.41,0.0 +26,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM29,-0.03,0.0,1.61,0.11,-1.6400000000000001,0.0 +27,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM30,1.52,0.0,2.76,0.19,-1.2399999999999998,0.0 +28,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM31,0.76,0.0,1.96,0.14,-1.2,0.0 +29,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM32,0.77,0.0,2.44,0.17,-1.67,0.0 +30,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM33,3.12,0.0,2.96,0.21,0.16000000000000014,0.0 +31,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM34,1.89,0.0,2.83,0.2,-0.9400000000000002,0.0 +32,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM35,-1.7,0.0,0.87,0.06,-2.57,0.0 +33,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM36,-0.36,0.0,0.76,0.05,-1.12,0.0 +34,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM37,0.06,0.0,1.45,0.1,-1.39,0.0 +35,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM38,-2.58,0.0,1.03,0.07,-3.6100000000000003,0.0 +36,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM39,-0.48,0.0,1.89,0.13,-2.37,0.0 +37,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM40,-1.67,0.0,1.82,0.13,-3.49,0.0 +38,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM41,-0.74,0.0,-0.42,0.03,-0.32,0.0 +39,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM42,0.78,0.0,0.99,0.07,-0.20999999999999996,0.0 +40,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM43,-0.14,0.0,0.42,0.03,-0.56,0.0 +41,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM44,-1.73,0.0,0.06,0.0,-1.79,0.0 +42,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM45,-0.52,0.0,1.06,0.07,-1.58,0.0 +43,TFE-NHLBI-TZVP-QM + TZVP-QM,Physical (QM),SM46,-0.93,0.0,0.69,0.05,-1.62,0.0 +44,EC_RISM_wet + EC_RISM,Physical (QM),SM25,2.25,0.02344107623307046,-0.09,0.01,2.34,2.447871928239328 +45,EC_RISM_wet + EC_RISM,Physical (QM),SM26,0.51,0.023361138100292885,-0.87,0.06,1.38,2.43955836243046 +46,EC_RISM_wet + EC_RISM,Physical (QM),SM27,2.21,0.010000039446495069,1.56,0.11,0.6499999999999999,1.0500041024354871 +47,EC_RISM_wet + EC_RISM,Physical (QM),SM28,2.18,0.01,1.18,0.08,1.0000000000000002,1.05 
+48,EC_RISM_wet + EC_RISM,Physical (QM),SM29,2.07,0.010000149489547966,1.61,0.11,0.45999999999999974,1.0500155469129884 +49,EC_RISM_wet + EC_RISM,Physical (QM),SM30,3.78,0.01000134536305702,2.76,0.19,1.02,1.05013991775793 +50,EC_RISM_wet + EC_RISM,Physical (QM),SM31,3.27,0.010000000433838209,1.96,0.14,1.31,1.0500000451191742 +51,EC_RISM_wet + EC_RISM,Physical (QM),SM32,2.59,0.010000027305881223,2.44,0.17,0.1499999999999999,1.0500028398116472 +52,EC_RISM_wet + EC_RISM,Physical (QM),SM33,5.27,0.01,2.96,0.21,2.3099999999999996,1.05 +53,EC_RISM_wet + EC_RISM,Physical (QM),SM34,5.2700000000000005,0.010000013696641566,2.83,0.2,2.4400000000000004,1.0500014244507228 +54,EC_RISM_wet + EC_RISM,Physical (QM),SM35,0.95,0.010000571108298259,0.87,0.06,0.07999999999999996,1.0500593952630188 +55,EC_RISM_wet + EC_RISM,Physical (QM),SM36,2.59,0.010001314864335895,0.76,0.05,1.8299999999999998,1.050136745890933 +56,EC_RISM_wet + EC_RISM,Physical (QM),SM37,2.14,0.0100001107628115,1.45,0.1,0.6900000000000002,1.0500115193323958 +57,EC_RISM_wet + EC_RISM,Physical (QM),SM38,2.29,0.010002027036018357,1.03,0.07,1.26,1.050210811745909 +58,EC_RISM_wet + EC_RISM,Physical (QM),SM39,4.12,0.010091902371153578,1.89,0.13,2.2300000000000004,1.0595578465999722 +59,EC_RISM_wet + EC_RISM,Physical (QM),SM40,3.61,0.01000134536305702,1.82,0.13,1.7899999999999998,1.05013991775793 +60,EC_RISM_wet + EC_RISM,Physical (QM),SM41,1.64,0.023142696321745242,-0.42,0.03,2.06,2.416840417461505 +61,EC_RISM_wet + EC_RISM,Physical (QM),SM42,4.44,0.023308600664929967,0.99,0.07,3.45,2.4340944691527167 +62,EC_RISM_wet + EC_RISM,Physical (QM),SM43,3.34,0.020713282071424373,0.42,0.03,2.92,2.164181335428135 +63,EC_RISM_wet + EC_RISM,Physical (QM),SM44,0.51,0.02169725306161642,0.06,0.0,0.45,2.266514318408108 +64,EC_RISM_wet + EC_RISM,Physical (QM),SM45,1.8,0.022575502951253247,1.06,0.07,0.74,2.357852306930338 +65,EC_RISM_wet + EC_RISM,Physical (QM),SM46,1.63,0.020713297459104524,0.69,0.05,0.94,2.1641829357468705 +66,TFE IEFPCM 
MST + IEFPCM/MST,Physical (QM),SM25,1.45,0.0,-0.09,0.01,1.54,1.62736237955926 +67,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM26,-3.13,0.0,-0.87,0.06,-2.26,2.506510219032152 +68,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM27,1.75,0.0,1.56,0.11,0.18999999999999995,1.0600000003220549 +69,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM28,0.83,0.0,1.18,0.08,-0.35,1.06 +70,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM29,1.23,0.0,1.61,0.11,-0.3800000000000001,1.060000014275805 +71,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM30,3.53,0.0,2.76,0.19,0.77,1.0600001445262448 +72,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM31,1.61,0.0,1.96,0.14,-0.34999999999999987,1.0600003022939144 +73,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM32,1.63,0.0,2.44,0.17,-0.81,1.0600000019066798 +74,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM33,4.27,0.0,2.96,0.21,1.3099999999999996,1.06 +75,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM34,2.39,0.0,2.83,0.2,-0.43999999999999995,1.0600002390356282 +76,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM35,0.77,0.0,0.87,0.06,-0.09999999999999998,1.060003863243295 +77,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM36,3.73,0.0,0.76,0.05,2.9699999999999998,1.060517639767404 +78,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM37,-1.31,0.0,1.45,0.1,-2.76,2.508112831230342 +79,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM38,0.48,0.0,1.03,0.07,-0.55,1.0600307528312118 +80,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM39,2.45,0.0,1.89,0.13,0.5600000000000003,1.062395960435093 +81,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM40,1.35,0.0,1.82,0.13,-0.47,1.0890629521889412 +82,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM41,-1.45,0.0,-0.42,0.03,-1.03,2.496029608152775 +83,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM42,1.16,0.0,0.99,0.07,0.16999999999999993,2.5024653266494132 +84,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM43,-1.16,0.0,0.42,0.03,-1.5799999999999998,2.50719309548738 +85,TFE IEFPCM MST + IEFPCM/MST,Physical 
(QM),SM44,-0.6900000000000001,0.0,0.06,0.0,-0.75,1.7782049976993068 +86,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM45,1.68,0.0,1.06,0.07,0.6199999999999999,1.5093985461690262 +87,TFE IEFPCM MST + IEFPCM/MST,Physical (QM),SM46,-0.95,0.0,0.69,0.05,-1.64,2.473467961947144 +88,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM25,-2.33,1.0438236634602829,-0.09,0.01,-2.24,1.8941178615364531 +89,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM26,-0.99,0.6411256089625921,-0.87,0.06,-0.12,1.2498009743401473 +90,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM27,0.49,0.3600129619091414,1.56,0.11,-1.07,0.8000207390546263 +91,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM28,-0.6,0.36,1.18,0.08,-1.7799999999999998,0.8 +92,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM29,-0.54,0.3625241056999536,1.61,0.11,-2.1500000000000004,0.8040385691199258 +93,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM30,0.94,0.360471392816815,2.76,0.19,-1.8199999999999998,0.8007542285069041 +94,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM31,0.58,0.3600000000237929,1.96,0.14,-1.38,0.8000000000380687 +95,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM32,1.24,0.3600282398376334,2.44,0.17,-1.2,0.8000451837402134 +96,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM33,2.55,0.36,2.96,0.21,-0.41000000000000014,0.8 +97,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM34,1.46,0.3600002991428096,2.83,0.2,-1.37,0.8000004786284954 +98,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM35,-0.81,0.36000043227744816,0.87,0.06,-1.6800000000000002,0.8000006916439171 +99,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM36,0.19,0.3600807198221639,0.76,0.05,-0.5700000000000001,0.8001291517154623 +100,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM37,-1.62,0.3600049444450307,1.45,0.1,-3.0700000000000003,0.8000079111120492 +101,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM38,-3.9,0.9808838646523136,1.03,0.07,-4.93,1.7934141834437018 +102,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM39,-2.0,1.0099094360425362,1.89,0.13,-3.8899999999999997,1.8398550976680577 
+103,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM40,-1.93,0.36120044198792783,1.82,0.13,-3.75,0.8019207071806846 +104,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM41,-1.03,0.36000012476185017,-0.42,0.03,-0.6100000000000001,0.8000001996189604 +105,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM42,0.23,0.36049300672398177,0.99,0.07,-0.76,0.8007888107583709 +106,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM43,-0.2,0.36001487301534085,0.42,0.03,-0.62,0.8000237968245454 +107,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM44,-1.63,0.3600704059249329,0.06,0.0,-1.69,0.8001126494798927 +108,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM45,-0.5,0.36000015704952115,1.06,0.07,-1.56,0.8000002512792339 +109,TFE b3lypd3 + DFT_M05-2X_SMD,Physical (QM),SM46,-1.8,0.3600000000432957,0.69,0.05,-2.49,0.8000000000692732 +110,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM25,-18.33,0.0,-0.09,0.01,-18.24,2.1562024272328912 +111,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM26,-0.42,0.0,-0.87,0.06,0.45,1.47 +112,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM27,0.01,0.0,1.56,0.11,-1.55,1.4700000000432958 +113,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM28,-0.79,0.0,1.18,0.08,-1.97,1.47 +114,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM29,-0.76,0.0,1.61,0.11,-2.37,1.4700000019788169 +115,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM30,1.17,0.0,2.76,0.19,-1.5899999999999999,1.470000786180085 +116,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM31,-0.13,0.0,1.96,0.14,-2.09,1.4700000037704035 +117,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM32,0.86,0.0,2.44,0.17,-1.58,1.4700000003770786 +118,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM33,2.61,0.0,2.96,0.21,-0.3500000000000001,1.47 +119,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical 
(QM),SM34,0.92,0.0,2.83,0.2,-1.9100000000000001,1.4700000002072229 +120,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM35,-0.79,0.0,0.87,0.06,-1.6600000000000001,1.4700000040648111 +121,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM36,-0.18,0.0,0.76,0.05,-0.94,1.470364134111271 +122,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM37,-1.11,0.0,1.45,0.1,-2.56,1.4855093562606103 +123,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM38,-2.75,0.0,1.03,0.07,-3.7800000000000002,1.470000000075239 +124,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM39,-1.22,0.0,1.89,0.13,-3.11,1.4720316802734248 +125,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM40,-2.18,0.0,1.82,0.13,-4.0,1.4704507142861698 +126,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM41,-2.25,0.0,-0.42,0.03,-1.83,2.073783637412573 +127,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM42,-1.16,0.0,0.99,0.07,-2.15,2.1392457493186248 +128,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM43,-2.93,0.0,0.42,0.03,-3.35,2.127134816087262 +129,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM44,-3.08,0.0,0.06,0.0,-3.14,2.00277016768521 +130,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM45,-1.52,0.0,1.06,0.07,-2.58,1.887624306191272 +131,TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water,Physical (QM),SM46,-2.26,0.0,0.69,0.05,-2.9499999999999997,1.7097387894188616 diff --git a/physical_property/logD/calculate_logD/README.md b/physical_property/logD/calculate_logD/README.md new file mode 100644 index 00000000..a54e5f38 --- /dev/null +++ b/physical_property/logD/calculate_logD/README.md @@ -0,0 +1,6 @@ +## What's here + +- [`calc_logD.nb`](calc_logD.nb) - Wolfram Mathematica notebook used to calculate distribution coefficients (log *D*7.4) found in 
[`logD_submission_collection.csv`](logD_submission_collection.csv). The notebook takes the partition coefficients (log *P*) and acidity constants from the SAMPL7 log *P* and pKa submission collection files and calculates the distribution coefficients according to the model described in https://doi.org/10.1007/s10822-016-9954-8. Because the charge state information for the protonated and deprotonated species isn't available for the individual models, eq. 4 for acids was used for all compounds. This choice was made because of the broad consensus of different pKa models that had also submitted macrostate pKa values including the charge state information. The distribution coefficients are only calculated for combinations of ranked models submitted by participants from the same participant organization. Notebook created by Nicolas Tielker. Notebook output was used as the basis for the general SAMPL analysis. +- [`logD_submission_collection.csv`](logD_submission_collection.csv) - Contains analysis of log *D*7.4 predictions generated from log *P* and pKa predictions. +- [`logD_predictions/`](logD_predictions/) - Contains SAMPL-style submission files created from the log *D* data found in [`logD_submission_collection.csv`](logD_submission_collection.csv), and also contains two reference predictions. Files were generated using a notebook like [`../analysis_different_pKa_logP_combos/make_input.ipynb`](../analysis_different_pKa_logP_combos/make_input.ipynb). These were used as input for the general SAMPL analysis scripts found in the directory above. +- [`user-map.csv`](user-map.csv) - Manually created user map of all log *D* estimate files. Used as input for the general SAMPL analysis scripts. 
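
The acid-form calculation described in the README above can be sketched in a few lines of Python. This is only an illustration, not the notebook's code: it assumes that the acid equation referred to as eq. 4 takes the standard monoprotic-acid form log *D* = log *P* - log10(1 + 10^(pH - pKa)), which follows when only the neutral species is taken to partition into the organic phase. The function name `log_d_monoprotic_acid` and its parameters are hypothetical.

```python
import math


def log_d_monoprotic_acid(log_p, pka, ph=7.4):
    """Estimate log D at a given pH for a monoprotic acid.

    Assumption (not taken from the notebook): only the neutral species
    enters the organic phase, so
        log D = log P - log10(1 + 10**(pH - pKa)).
    """
    # Fraction-ionized correction: grows as pH rises above the pKa.
    ionization_term = math.log10(1.0 + 10.0 ** (ph - pka))
    return log_p - ionization_term


if __name__ == "__main__":
    # Illustrative inputs only, not SAMPL7 data.
    for log_p, pka in [(2.0, 7.4), (1.0, 4.4), (3.0, 12.0)]:
        print(log_p, pka, round(log_d_monoprotic_acid(log_p, pka), 3))
```

When the pKa is well above 7.4 the correction vanishes and log *D* approaches log *P*; when the pKa is well below 7.4, log *D* drops by roughly one unit per pKa unit.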
diff --git a/physical_property/logD/calc_logD.nb b/physical_property/logD/calculate_logD/calc_logD.nb similarity index 69% rename from physical_property/logD/calc_logD.nb rename to physical_property/logD/calculate_logD/calc_logD.nb index 5c39d81d..93afc67a 100644 --- a/physical_property/logD/calc_logD.nb +++ b/physical_property/logD/calculate_logD/calc_logD.nb @@ -1,19 +1,12 @@ -(* Content-type: application/vnd.wolfram.mathematica *) - -(*** Wolfram Notebook File ***) -(* http://www.wolfram.com/nb *) - -(* CreatedBy='Mathematica 12.0' *) - (*CacheID: 234*) (* Internal cache information: NotebookFileLineBreakTest NotebookFileLineBreakTest -NotebookDataPosition[ 158, 7] -NotebookDataLength[ 41761, 958] -NotebookOptionsPosition[ 40559, 932] -NotebookOutlinePosition[ 41101, 951] -CellTagsIndexPosition[ 41058, 948] +NotebookDataPosition[ 0, 0] +NotebookDataLength[ 54914, 1248] +NotebookOptionsPosition[ 49221, 1172] +NotebookOutlinePosition[ 49711, 1190] +CellTagsIndexPosition[ 49668, 1187] WindowFrame->Normal*) (* Beginning of Notebook Content *) @@ -23,37 +16,47 @@ Cell[BoxData[{ RowBox[{"SetDirectory", "[", RowBox[{"NotebookDirectory", "[", "]"}], "]"}], ";"}], "\n", RowBox[{ - RowBox[{"exp", "=", - RowBox[{ + RowBox[{ + RowBox[{"exp", "=", RowBox[{ - "Import", "[", - "\"\<../experimental_data/Experimental_Properties_of_SAMPL7_Compounds.\ -csv\>\"", "]"}], "[", - RowBox[{"[", RowBox[{ - RowBox[{"2", ";;"}], ",", - RowBox[{"{", - RowBox[{"1", ",", "7", ",", "8"}], "}"}]}], "]"}], "]"}]}], - ";"}], "\[IndentingNewLine]", - RowBox[{ + "Import", "[", + "\"\<../../experimental_data/Experimental_Properties_of_SAMPL7_\ +Compounds.csv\>\"", "]"}], "[", + RowBox[{"[", + RowBox[{ + RowBox[{"2", ";;"}], ",", + RowBox[{"{", + RowBox[{"1", ",", "7", ",", "8"}], "}"}]}], "]"}], "]"}]}], ";"}], + "\[IndentingNewLine]", RowBox[{"(*", RowBox[{ "Set", " ", "active", " ", "directory", " ", "and", " ", "import", " ", - "experimental", " ", "logD", " ", "data"}], "*)"}]}]}], "Input", + 
"experimental", " ", "logD", " ", "data"}], + "*)"}]}], "\[IndentingNewLine]", "exp"}], "Input", CellChangeTimes->{{3.8115299356460733`*^9, 3.811529955884295*^9}, { 3.822720374371791*^9, 3.822720408954632*^9}, {3.823066797094819*^9, 3.8230667997537003`*^9}, {3.8230669395343513`*^9, 3.823066946458387*^9}, { 3.8235820182085996`*^9, 3.8235820192650223`*^9}, {3.8250565297613287`*^9, - 3.82505655441084*^9}}, + 3.82505655441084*^9}, {3.826916292123234*^9, 3.826916300273614*^9}}, CellLabel->"In[1]:=",ExpressionUUID->"e07bf09a-837b-4f9f-8ae3-5348be5ddf17"], +Cell[BoxData[""], "Input", + CellChangeTimes->{3.82691631945011*^9}, + NumberMarks->False,ExpressionUUID->"aabee575-ae6b-41e4-b5f0-da1db60272a3"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.826916317330329*^9, + 3.82691631780154*^9}},ExpressionUUID->"78eb6dc2-16ad-4001-9098-\ +f6012ebe6827"], + Cell[BoxData[{ RowBox[{ RowBox[{"pka", "=", RowBox[{ RowBox[{ "Import", "[", - "\"\<../pKa/analysis/macrostate_analysis/analysis_outputs_ranked_\ + "\"\<../../pKa/analysis/macrostate_analysis/analysis_outputs_ranked_\ submissions/pKa_submission_collection.csv\>\"", "]"}], "[", RowBox[{"[", RowBox[{"All", ",", @@ -61,20 +64,12 @@ submissions/pKa_submission_collection.csv\>\"", "]"}], "[", RowBox[{ "1", ",", "2", ",", "3", ",", "4", ",", "5", ",", "6", ",", "7", ",", "11"}], "}"}]}], "]"}], "]"}]}], ";"}], "\[IndentingNewLine]", - RowBox[{ - RowBox[{"pka", "=", - RowBox[{"GatherBy", "[", - RowBox[{"pka", ",", - RowBox[{ - RowBox[{"#", "[", - RowBox[{"[", "2", "]"}], "]"}], "&"}]}], "]"}]}], - ";"}], "\[IndentingNewLine]", RowBox[{ RowBox[{"logP", "=", RowBox[{ RowBox[{ "Import", "[", - "\"\<../logP/analysis/analysis_outputs_ranked_submissions/logP_\ + "\"\<../../logP/analysis/analysis_outputs_ranked_submissions/logP_\ submission_collection.csv\>\"", "]"}], "[", RowBox[{"[", RowBox[{"All", ",", @@ -82,6 +77,14 @@ submission_collection.csv\>\"", "]"}], "[", RowBox[{ "1", ",", "2", ",", "3", ",", "4", ",", "5", ",", "6", 
",", "7", ",", "11"}], "}"}]}], "]"}], "]"}]}], ";"}], "\[IndentingNewLine]", + RowBox[{ + RowBox[{"pka", "=", + RowBox[{"GatherBy", "[", + RowBox[{"pka", ",", + RowBox[{ + RowBox[{"#", "[", + RowBox[{"[", "2", "]"}], "]"}], "&"}]}], "]"}]}], + ";"}], "\[IndentingNewLine]", RowBox[{ RowBox[{"logP", "=", RowBox[{"GatherBy", "[", @@ -143,10 +146,28 @@ submission_collection.csv\>\"", "]"}], "[", 3.823074787720675*^9}, 3.8230791607452707`*^9, {3.8236631253101797`*^9, 3.823663148058075*^9}, {3.8236937160189795`*^9, 3.8236938249806705`*^9}, { 3.8250570116931624`*^9, 3.8250570303660994`*^9}, {3.8250570615801506`*^9, - 3.8250571382860146`*^9}, {3.8250577383213468`*^9, 3.825057740022767*^9}}, - CellLabel->"In[3]:=",ExpressionUUID->"8416a644-c7b6-49ed-b9a7-00441e0da565"], + 3.8250571382860146`*^9}, {3.8250577383213468`*^9, 3.825057740022767*^9}, { + 3.826916324033813*^9, 3.826916330881905*^9}, {3.8269169221293173`*^9, + 3.826916968353787*^9}, {3.826920451822174*^9, 3.826920453142448*^9}, { + 3.826920630361574*^9, 3.826920633481082*^9}, {3.826920822624713*^9, + 3.826920853585017*^9}, {3.826920899640647*^9, 3.826920907016102*^9}, { + 3.826920950054283*^9, 3.8269209575036993`*^9}, {3.826920987786429*^9, + 3.826921020937668*^9}, {3.826921053563636*^9, 3.8269210759462643`*^9}, { + 3.8269211819890957`*^9, 3.826921190881535*^9}, {3.826921313837212*^9, + 3.8269213145074587`*^9}, {3.8269213497653437`*^9, 3.826921350515676*^9}, { + 3.8269294038804893`*^9, 3.8269294166358128`*^9}}, + CellLabel-> + "In[125]:=",ExpressionUUID->"8416a644-c7b6-49ed-b9a7-00441e0da565"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.826916351675564*^9, 3.82691637213054*^9}, + 3.826929432951046*^9},ExpressionUUID->"7e9772ff-a5eb-4fea-abe4-\ +9be5751b52ec"], -Cell[CellGroupData[{ +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.826929431090887*^9, + 3.826929431091745*^9}},ExpressionUUID->"980770e8-8c7c-4c9b-9c96-\ +921e9bac7453"], Cell[BoxData[{ RowBox[{ @@ -214,38 +235,10 @@ Cell[BoxData[{ 
3.82368752851817*^9, 3.8236875342090707`*^9}, {3.8250572677204437`*^9, 3.8250573166948385`*^9}, {3.825057354982473*^9, 3.8250573764850693`*^9}, { 3.825057420355569*^9, 3.825057485812785*^9}, {3.825061458999426*^9, - 3.8250614795426702`*^9}},ExpressionUUID->"8379e00e-31fd-45ee-947a-\ -f7bdd9f9030f"], - -Cell[BoxData[ - TagBox[ - TagBox[GridBox[{ - {"\<\"logP_prediction_Iorga_Beckstein_CGenFF/pKa_prediction_Iorga_\ -Beckstein_1\"\>"}, - {"\<\"logP-IEFPCMMST-1/pKa-IEFPCMMST-1\"\>"}, - {"\<\"logP-ECRISM-1/pKa-ECRISM-1\"\>"}, - {"\<\"logp-nhlbi-1/pka-nhlbi-1c\"\>"}, - {"\<\"logP_RodriguezPaluch_SMD_2/pKa_RodriguezPaluch_SMD_1\"\>"}, - {"\<\"logP-EvrimArslan-6/pKa-VA-2-charge-correction\"\>"} - }, - GridBoxAlignment->{"Columns" -> {{Left}}, "Rows" -> {{Baseline}}}, - GridBoxSpacings->{"Columns" -> { - Offset[0.27999999999999997`], { - Offset[0.5599999999999999]}, - Offset[0.27999999999999997`]}, "Rows" -> { - Offset[0.2], { - Offset[0.4]}, - Offset[0.2]}}], - Column], - Function[BoxForm`e$, - TableForm[BoxForm`e$]]]], "Output", - CellChangeTimes->{{3.825057364814217*^9, 3.825057376783749*^9}, { - 3.8250574223433933`*^9, 3.8250574722635593`*^9}, 3.8250589329926434`*^9, - 3.825060262803254*^9}, + 3.8250614795426702`*^9}, {3.826916395563149*^9, 3.826916400235808*^9}, { + 3.826918126179839*^9, 3.826918131457*^9}, 3.8269182552180223`*^9}, CellLabel-> - "Out[12]//TableForm=",ExpressionUUID->"1205d5b1-89f5-45e5-a9c1-\ -06eb476de438"] -}, Open ]], + "In[282]:=",ExpressionUUID->"8379e00e-31fd-45ee-947a-f7bdd9f9030f"], Cell[BoxData[{ RowBox[{ @@ -336,7 +329,8 @@ Cell[BoxData[{ "]"}], ",", RowBox[{"{", RowBox[{"ii", ",", - RowBox[{"Length", "@", "logD"}]}], "}"}]}], "]"}]}], ";"}], "\n", + RowBox[{"Length", "@", "logD"}]}], "}"}]}], "]"}]}], + ";"}], "\[IndentingNewLine]", RowBox[{ RowBox[{"(*", RowBox[{ @@ -399,8 +393,10 @@ Cell[BoxData[{ 3.8239415413394456`*^9}, {3.825057597050869*^9, 3.8250576724434853`*^9}, { 3.8250577829726515`*^9, 3.825057784093907*^9}, 
{3.8250578468296676`*^9, 3.8250578568350163`*^9}, 3.8250579676140537`*^9, {3.825058059146756*^9, - 3.8250580849243584`*^9}, {3.8250586839387293`*^9, 3.825058691180912*^9}}, - CellLabel->"In[13]:=",ExpressionUUID->"9f3db9fc-4bd0-4901-a278-800e3ef14180"], + 3.8250580849243584`*^9}, {3.8250586839387293`*^9, 3.825058691180912*^9}, { + 3.8269164096988907`*^9, 3.826916446521607*^9}}, + CellLabel-> + "In[285]:=",ExpressionUUID->"9f3db9fc-4bd0-4901-a278-800e3ef14180"], Cell[BoxData[{ RowBox[{ @@ -554,6 +550,7 @@ Cell[BoxData[{ " ", "the", " ", "other", " ", "branch", " ", "only", " ", "for", " ", RowBox[{"\"\\"", "."}]}]}], "*)"}], "\[IndentingNewLine]", + RowBox[{ RowBox[{ RowBox[{"logD", "[", @@ -604,6 +601,7 @@ Cell[BoxData[{ RowBox[{"[", RowBox[{"ii", ",", "2"}], "]"}], "]"}], ",", "jj", ",", "6"}], "]"}], "]"}], "-", "7.4"}], ")"}]}], ")"}]}], "/", + RowBox[{"(", RowBox[{"1", "+", RowBox[{"10", "^", @@ -645,6 +643,7 @@ Cell[BoxData[{ RowBox[{"[", RowBox[{"ii", ",", "2"}], "]"}], "]"}], ",", "jj", ",", "6"}], "]"}], "]"}], "-", "7.4"}], ")"}]}], ")"}]}], "/", + RowBox[{"(", RowBox[{"1", "+", RowBox[{"10", "^", @@ -771,9 +770,14 @@ Cell[BoxData[{ 3.8250588568068843`*^9}, {3.8250589120544124`*^9, 3.825059012864532*^9}, { 3.8250591321885853`*^9, 3.825059141005411*^9}, {3.825061624401218*^9, 3.8250617547800684`*^9}, {3.8250636546437287`*^9, - 3.8250636672786117`*^9}, {3.825063698811824*^9, - 3.8250637007755604`*^9}},ExpressionUUID->"8dcda01e-2c9e-4dc5-8077-\ -5ebe246709ad"], + 3.8250636672786117`*^9}, {3.825063698811824*^9, 3.8250637007755604`*^9}}, + CellLabel-> + "In[288]:=",ExpressionUUID->"8dcda01e-2c9e-4dc5-8077-5ebe246709ad"], + +Cell[BoxData["0.8000000000692732`"], "Input", + NumberMarks->False, + CellLabel-> + "In[106]:=",ExpressionUUID->"8c5188f9-101e-4c8a-96a9-05815eea9909"], Cell[BoxData[{ RowBox[{ @@ -895,14 +899,14 @@ Cell[BoxData[{ 3.8250587265442257`*^9}, {3.8250588359787335`*^9, 3.8250588568068843`*^9}, {3.8250589120544124`*^9, 3.825059012864532*^9}, { 
3.8250591321885853`*^9, 3.825059141005411*^9}, {3.8250599290648427`*^9, - 3.8250600019288597`*^9}, {3.825060669366186*^9, - 3.825060676597318*^9}},ExpressionUUID->"b8094c35-5837-436f-8f85-\ -46a68c4f4313"], + 3.8250600019288597`*^9}, {3.825060669366186*^9, 3.825060676597318*^9}}, + CellLabel-> + "In[291]:=",ExpressionUUID->"b8094c35-5837-436f-8f85-46a68c4f4313"], Cell[BoxData[ RowBox[{ RowBox[{"Export", "[", - RowBox[{"\"\<./analysis/logD_submission_collection.csv\>\"", ",", + RowBox[{"\"\<./logD_submission_collection_ranked_only.csv\>\"", ",", RowBox[{"Prepend", "[", RowBox[{"logD", ",", RowBox[{"{", @@ -927,17 +931,252 @@ Cell[BoxData[ 3.8236950204267473`*^9}, {3.823937660332825*^9, 3.8239376630256453`*^9}, { 3.8239380771485143`*^9, 3.8239380793477545`*^9}, 3.8239392401103687`*^9, { 3.8239454404906864`*^9, 3.8239454533620615`*^9}, {3.825058923544533*^9, - 3.825058923791282*^9}, {3.8250600067264957`*^9, 3.8250600068515368`*^9}}, - CellLabel->"In[24]:=",ExpressionUUID->"a980ffde-5f1c-41eb-859a-645f6f681bab"] + 3.825058923791282*^9}, {3.8250600067264957`*^9, 3.8250600068515368`*^9}, + 3.826916577906931*^9, {3.826916669386735*^9, 3.826916677209249*^9}}, + CellLabel-> + "In[296]:=",ExpressionUUID->"a980ffde-5f1c-41eb-859a-645f6f681bab"], + +Cell[BoxData[""], "Input",ExpressionUUID->"3bff58ae-3b3c-448e-abda-55839fc41921"], + +Cell[TextData[StyleBox["", + FontSize->36, + Background->RGBColor[0.87, 0.94, 1]]], "Text", + CellChangeTimes->{{3.82692946933809*^9, 3.826929510173562*^9}, + 3.827591933120921*^9},ExpressionUUID->"df58675c-2edd-40a5-b990-\ +47cdab947cbc"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.826929586208441*^9, 3.8269295919606524`*^9}, { + 3.827006998276416*^9, 3.8270069985993853`*^9}, {3.827007093184782*^9, + 3.827007103592743*^9}, + 3.827591931657317*^9},ExpressionUUID->"b74e55a7-d1b9-41c4-9a63-\ +38a9439472dd"], + +Cell[BoxData[""], "WolframAlphaShortInput", + CellChangeTimes->{ + 
3.827592059939522*^9},ExpressionUUID->"f0339fc1-fa06-4372-a6e5-\ +26969dd5c3e8"], + +Cell[BoxData[""], "Input",ExpressionUUID->"265ab251-10cf-43c9-aab4-8efde0bfdb91"], + +Cell[BoxData["\[IndentingNewLine]"], "Input", + CellChangeTimes->{ + 3.826929618228241*^9, {3.8270071913292627`*^9, 3.827007298119886*^9}, + 3.827591935610704*^9},ExpressionUUID->"46025181-a9bc-49f3-8c42-\ +8aac84ff2e6c"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{ + 3.8269296598855467`*^9, {3.827591937383362*^9, + 3.827591939544375*^9}},ExpressionUUID->"a76dfec1-ca21-4cd9-a015-\ +8f76f7008849"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{3.8269296770043583`*^9, + 3.82759194232228*^9},ExpressionUUID->"63b0d55f-6fbb-4f3b-a9af-9ce1536327eb"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{3.826929694317877*^9, + 3.827591945385524*^9},ExpressionUUID->"5d8e94f4-5668-40d1-9dc1-\ +68714914e083"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.826929708040038*^9, 3.826929711902231*^9}, { + 3.827007367019712*^9, 3.8270073714260073`*^9}, + 3.827591947774633*^9},ExpressionUUID->"61e9f7d8-e1c9-4123-b3ee-\ +977760a163c9"], + +Cell[TextData[StyleBox["", + FontSize->36, + Background->RGBColor[0.87, 0.94, 1]]], "Text", + CellChangeTimes->{{3.826929518730135*^9, 3.826929556051631*^9}, + 3.8275919498748503`*^9},ExpressionUUID->"7c9679ce-82d8-4b8f-9256-\ +ea3dfae70fd8"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{ + 3.8269297969527893`*^9, {3.826929858514142*^9, 3.82692993363743*^9}, + 3.827591952664847*^9},ExpressionUUID->"1f1448ea-1f7b-49fb-9dd0-\ +f0ed771a638c"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{3.826929981860368*^9, + 3.827591955290135*^9},ExpressionUUID->"12f8e219-168f-46d6-a588-\ +2743451d1ab3"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{ + 3.8269298290777283`*^9, {3.8269299664422903`*^9, 3.826930004416153*^9}, + 3.827591957455488*^9},ExpressionUUID->"36ec0f60-2eec-4b05-834c-\ +d58a6c424efd"], + +Cell[BoxData[""], "Input", + 
CellChangeTimes->{3.8269300283551903`*^9, + 3.827591961371389*^9},ExpressionUUID->"1225d561-d4ed-4bda-ac1c-\ +24e5da6752e8"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{3.826930049749216*^9, + 3.827591964659107*^9},ExpressionUUID->"699b63f0-f4a6-4e42-874f-\ +e8086b422b95"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{3.8269300644437943`*^9, + 3.827591967590932*^9},ExpressionUUID->"3af13b5b-adb4-4d97-a1a4-\ +aaf6f5722c40"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.826930080242112*^9, 3.826930089844138*^9}, + 3.827591970347205*^9},ExpressionUUID->"18265f42-a7f8-465b-aeb8-\ +81ca2ed8e36f"], + +Cell[BoxData[""], "Input",ExpressionUUID->"8cbe8342-e134-4e40-a186-d1641776e002"], + +Cell[TextData[StyleBox["", + FontSize->36, + Background->RGBColor[0.87, 0.94, 1]]], "Text", + CellChangeTimes->{{3.8269301792476*^9, 3.8269302645365353`*^9}, { + 3.8269303226785107`*^9, 3.826930325878237*^9}, + 3.8275919720139837`*^9},ExpressionUUID->"2e9913c3-b57a-4d79-a001-\ +6d1b80c2245d"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{ + 3.826930173670493*^9},ExpressionUUID->"bd87e015-ebc9-4825-abca-\ +929462561f38"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{ + 3.8269303496872997`*^9, {3.8269306141232*^9, 3.826930641389192*^9}, + 3.826930774504507*^9, + 3.8275919750871353`*^9},ExpressionUUID->"f520b937-a4ad-4e95-bcdd-\ +3cf8940f10d8"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.826930383782683*^9, 3.8269304129501677`*^9}, + 3.8269304950810623`*^9, 3.826930569798012*^9, 3.826930702634601*^9, { + 3.826930744221424*^9, 3.8269307489997787`*^9}, 3.826930819371628*^9, + 3.826930861992489*^9, + 3.827591978615923*^9},ExpressionUUID->"bc35d8b7-1fb3-4d17-92d7-\ +152d4199118e"], + +Cell[BoxData[ + TagBox[ + TagBox["\[Placeholder]", + Column], + Function[BoxForm`e$, + TableForm[BoxForm`e$]]]], "Input", + CellChangeTimes->{ + 3.826930616339159*^9, {3.827591977071006*^9, + 3.827591984437624*^9}},ExpressionUUID->"d3b17180-80aa-441f-81fe-\ +e449de576492"], + 
+Cell[BoxData[""], "Input", + CellChangeTimes->{ + 3.826930667719997*^9, {3.827591980876238*^9, + 3.827591988529654*^9}},ExpressionUUID->"88753f31-4a3c-4b17-a7b2-\ +777b8f5d4975"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.826930394157631*^9, 3.826930414157803*^9}, { + 3.826930811224065*^9, 3.8269308115884323`*^9}, + 3.827591982325759*^9},ExpressionUUID->"ea5e54d0-cef1-4d86-8dd7-\ +aa13f0a050ed"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{ + 3.826930362368541*^9, {3.82693087895774*^9, 3.826930907486905*^9}, { + 3.826980256555278*^9, 3.8269803641956987`*^9}, + 3.827591991165473*^9},ExpressionUUID->"2011a0b8-be94-408d-8d4b-\ +2d8fdf6123f3"], + +Cell["", "Text", + CellChangeTimes->{{3.827015668180274*^9, 3.8270156697732897`*^9}, + 3.8275919953422937`*^9},ExpressionUUID->"55df889b-8a33-4617-b027-\ +4e0ca9ba3288"], + +Cell[BoxData[""], "Input",ExpressionUUID->"409b74f1-b510-426e-aa9e-f07ee9ffeac6"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{ + 3.8270156553239403`*^9, {3.827015762617956*^9, 3.82701588995434*^9}, { + 3.827233801028504*^9, 3.827233809515671*^9}, + 3.8275919979394903`*^9},ExpressionUUID->"4c12c169-d766-4d50-99b4-\ +865df32d95b5"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.827592021739484*^9, + 3.827592021740005*^9}},ExpressionUUID->"bfdee08f-6ea5-4c6d-bf72-\ +da5c7e3496b6"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{3.827233769910811*^9, + 3.827592019688282*^9},ExpressionUUID->"100f10b2-8625-4bca-9fde-\ +9a519fc0ec9f"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.827592001039926*^9, + 3.827592001041284*^9}},ExpressionUUID->"fb841cc2-16ed-4923-9a30-\ +bc5a75a58d13"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{3.827233781101843*^9, + 3.827592018065812*^9},ExpressionUUID->"1562e734-6db1-47f9-b448-\ +8417ab4bc8a3"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.827592003598214*^9, + 3.827592003599695*^9}},ExpressionUUID->"2983adc4-906c-4806-a44c-\ +40e24b622344"], + +Cell[BoxData[""], "Input", 
+ CellChangeTimes->{ + 3.827015689025117*^9, {3.827015877497109*^9, 3.827015879373479*^9}, { + 3.827233888426614*^9, 3.827233902644916*^9}, {3.827592006032984*^9, + 3.827592016455473*^9}},ExpressionUUID->"fabcd13b-f64e-462e-9ce0-\ +93beab49a689"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.827592011806964*^9, + 3.827592011807536*^9}},ExpressionUUID->"a8e89c68-0ed4-412d-ac40-\ +c90a39217adb"], + +Cell[BoxData["\[IndentingNewLine]"], "Input", + CellChangeTimes->{ + 3.827015703640147*^9, {3.8275920091173162`*^9, + 3.8275920149984818`*^9}},ExpressionUUID->"e7f3d7c2-d786-4f2d-972c-\ +e72a1fd9d536"], + +Cell[BoxData["\[IndentingNewLine]"], "Input", + CellChangeTimes->{ + 3.8270157133472443`*^9, {3.827015930075873*^9, 3.827015944724019*^9}, + 3.8270160078509827`*^9, {3.827234096753356*^9, 3.827234128749672*^9}, + 3.827591923503611*^9},ExpressionUUID->"75acb2d9-7d9a-435e-9168-\ +e6974de3ac37"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.8275919178728313`*^9, + 3.827591921486858*^9}},ExpressionUUID->"a131fed6-bd82-436b-8f34-\ +726cb9feaf42"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{ + 3.8272341250601263`*^9, {3.827234267481167*^9, 3.827234270620459*^9}, + 3.827591915881618*^9},ExpressionUUID->"181b32a8-9567-4406-8b3b-\ +bda3f9db48d4"], + +Cell[BoxData[""], "Input", + CellChangeTimes->{{3.827592025333098*^9, + 3.827592025333497*^9}},ExpressionUUID->"fa5a0042-6a54-4b50-aec3-\ +2dca27b53c77"] }, -WindowSize->{1141.2, 580.1999999999999}, -WindowMargins->{{0, Automatic}, {Automatic, 0}}, +WindowSize->{1680, 997}, +WindowMargins->{{0, Automatic}, {Automatic, 1}}, TaggingRules->{ "WelcomeScreenSettings" -> {"FEStarting" -> False}, "TryRealOnly" -> False}, -Magnification:>0.75 Inherited, -FrontEndVersion->"12.2 for Microsoft Windows (64-bit) (December 12, 2020)", +FrontEndVersion->"12.2 for Mac OS X x86 (64-bit) (December 12, 2020)", StyleDefinitions->"Default.nb", -ExpressionUUID->"0ba00511-db7c-44ab-b44d-b3cc093d815c" 
+ExpressionUUID->"b7bab8b5-8d2c-420d-90ae-81d863db40de" ] (* End of Notebook Content *) @@ -950,16 +1189,60 @@ CellTagsIndex->{} *) (*NotebookFileOutline Notebook[{ -Cell[558, 20, 1066, 27, 51, "Input",ExpressionUUID->"e07bf09a-837b-4f9f-8ae3-5348be5ddf17"], -Cell[1627, 49, 3745, 97, 139, "Input",ExpressionUUID->"8416a644-c7b6-49ed-b9a7-00441e0da565"], -Cell[CellGroupData[{ -Cell[5397, 150, 2828, 67, 79, "Input",ExpressionUUID->"8379e00e-31fd-45ee-947a-f7bdd9f9030f"], -Cell[8228, 219, 1063, 27, 88, "Output",ExpressionUUID->"1205d5b1-89f5-45e5-a9c1-06eb476de438"] -}, Open ]], -Cell[9306, 249, 7140, 153, 93, "Input",ExpressionUUID->"9f3db9fc-4bd0-4901-a278-800e3ef14180"], -Cell[16449, 404, 16316, 371, 264, "Input",ExpressionUUID->"8dcda01e-2c9e-4dc5-8077-5ebe246709ad"], -Cell[32768, 777, 6049, 122, 108, "Input",ExpressionUUID->"b8094c35-5837-436f-8f85-46a68c4f4313"], -Cell[38820, 901, 1735, 29, 52, "Input",ExpressionUUID->"a980ffde-5f1c-41eb-859a-645f6f681bab"] +Cell[400, 13, 1160, 28, 94, "Input",ExpressionUUID->"e07bf09a-837b-4f9f-8ae3-5348be5ddf17"], +Cell[1563, 43, 143, 2, 30, "Input",ExpressionUUID->"aabee575-ae6b-41e4-b5f0-da1db60272a3"], +Cell[1709, 47, 151, 3, 30, "Input",ExpressionUUID->"78eb6dc2-16ad-4001-9098-f6012ebe6827"], +Cell[1863, 52, 4406, 107, 199, "Input",ExpressionUUID->"8416a644-c7b6-49ed-b9a7-00441e0da565"], +Cell[6272, 161, 174, 3, 30, "Input",ExpressionUUID->"7e9772ff-a5eb-4fea-abe4-9be5751b52ec"], +Cell[6449, 166, 152, 3, 30, "Input",ExpressionUUID->"980770e8-8c7c-4c9b-9c96-921e9bac7453"], +Cell[6604, 171, 2971, 69, 115, "Input",ExpressionUUID->"8379e00e-31fd-45ee-947a-f7bdd9f9030f"], +Cell[9578, 242, 7216, 156, 136, "Input",ExpressionUUID->"9f3db9fc-4bd0-4901-a278-800e3ef14180"], +Cell[16797, 400, 16379, 374, 388, "Input",ExpressionUUID->"8dcda01e-2c9e-4dc5-8077-5ebe246709ad"], +Cell[33179, 776, 149, 3, 30, "Input",ExpressionUUID->"8c5188f9-101e-4c8a-96a9-05815eea9909"], +Cell[33331, 781, 6071, 122, 157, 
"Input",ExpressionUUID->"b8094c35-5837-436f-8f85-46a68c4f4313"], +Cell[39405, 905, 1814, 31, 73, "Input",ExpressionUUID->"a980ffde-5f1c-41eb-859a-645f6f681bab"], +Cell[41222, 938, 81, 0, 30, "Input",ExpressionUUID->"3bff58ae-3b3c-448e-abda-55839fc41921"], +Cell[41306, 940, 237, 5, 61, "Text",ExpressionUUID->"df58675c-2edd-40a5-b990-47cdab947cbc"], +Cell[41546, 947, 279, 5, 30, "Input",ExpressionUUID->"b74e55a7-d1b9-41c4-9a63-38a9439472dd"], +Cell[41828, 954, 145, 3, 31, "WolframAlphaShortInput",ExpressionUUID->"f0339fc1-fa06-4372-a6e5-26969dd5c3e8"], +Cell[41976, 959, 81, 0, 30, "Input",ExpressionUUID->"265ab251-10cf-43c9-aab4-8efde0bfdb91"], +Cell[42060, 961, 221, 4, 52, "Input",ExpressionUUID->"46025181-a9bc-49f3-8c42-8aac84ff2e6c"], +Cell[42284, 967, 180, 4, 30, "Input",ExpressionUUID->"a76dfec1-ca21-4cd9-a015-8f76f7008849"], +Cell[42467, 973, 149, 2, 30, "Input",ExpressionUUID->"63b0d55f-6fbb-4f3b-a9af-9ce1536327eb"], +Cell[42619, 977, 150, 3, 30, "Input",ExpressionUUID->"5d8e94f4-5668-40d1-9dc1-68714914e083"], +Cell[42772, 982, 227, 4, 30, "Input",ExpressionUUID->"61e9f7d8-e1c9-4123-b3ee-977760a163c9"], +Cell[43002, 988, 240, 5, 61, "Text",ExpressionUUID->"7c9679ce-82d8-4b8f-9256-ea3dfae70fd8"], +Cell[43245, 995, 201, 4, 30, "Input",ExpressionUUID->"1f1448ea-1f7b-49fb-9dd0-f0ed771a638c"], +Cell[43449, 1001, 150, 3, 30, "Input",ExpressionUUID->"12f8e219-168f-46d6-a588-2743451d1ab3"], +Cell[43602, 1006, 204, 4, 30, "Input",ExpressionUUID->"36ec0f60-2eec-4b05-834c-d58a6c424efd"], +Cell[43809, 1012, 152, 3, 30, "Input",ExpressionUUID->"1225d561-d4ed-4bda-ac1c-24e5da6752e8"], +Cell[43964, 1017, 150, 3, 30, "Input",ExpressionUUID->"699b63f0-f4a6-4e42-874f-e8086b422b95"], +Cell[44117, 1022, 152, 3, 30, "Input",ExpressionUUID->"3af13b5b-adb4-4d97-a1a4-aaf6f5722c40"], +Cell[44272, 1027, 175, 3, 30, "Input",ExpressionUUID->"18265f42-a7f8-465b-aeb8-81ca2ed8e36f"], +Cell[44450, 1032, 81, 0, 30, "Input",ExpressionUUID->"8cbe8342-e134-4e40-a186-d1641776e002"], +Cell[44534, 
1034, 292, 6, 61, "Text",ExpressionUUID->"2e9913c3-b57a-4d79-a001-6d1b80c2245d"], +Cell[44829, 1042, 128, 3, 30, "Input",ExpressionUUID->"bd87e015-ebc9-4825-abca-929462561f38"], +Cell[44960, 1047, 228, 5, 30, "Input",ExpressionUUID->"f520b937-a4ad-4e95-bcdd-3cf8940f10d8"], +Cell[45191, 1054, 349, 6, 30, "Input",ExpressionUUID->"bc35d8b7-1fb3-4d17-92d7-152d4199118e"], +Cell[45543, 1062, 275, 9, 30, "Input",ExpressionUUID->"d3b17180-80aa-441f-81fe-e449de576492"], +Cell[45821, 1073, 178, 4, 30, "Input",ExpressionUUID->"88753f31-4a3c-4b17-a7b2-777b8f5d4975"], +Cell[46002, 1079, 227, 4, 30, "Input",ExpressionUUID->"ea5e54d0-cef1-4d86-8dd7-aa13f0a050ed"], +Cell[46232, 1085, 251, 5, 30, "Input",ExpressionUUID->"2011a0b8-be94-408d-8d4b-2d8fdf6123f3"], +Cell[46486, 1092, 169, 3, 35, "Text",ExpressionUUID->"55df889b-8a33-4617-b027-4e0ca9ba3288"], +Cell[46658, 1097, 81, 0, 30, "Input",ExpressionUUID->"409b74f1-b510-426e-aa9e-f07ee9ffeac6"], +Cell[46742, 1099, 253, 5, 30, "Input",ExpressionUUID->"4c12c169-d766-4d50-99b4-865df32d95b5"], +Cell[46998, 1106, 152, 3, 30, InheritFromParent,ExpressionUUID->"bfdee08f-6ea5-4c6d-bf72-da5c7e3496b6"], +Cell[47153, 1111, 150, 3, 30, "Input",ExpressionUUID->"100f10b2-8625-4bca-9fde-9a519fc0ec9f"], +Cell[47306, 1116, 152, 3, 30, InheritFromParent,ExpressionUUID->"fb841cc2-16ed-4923-9a30-bc5a75a58d13"], +Cell[47461, 1121, 150, 3, 30, "Input",ExpressionUUID->"1562e734-6db1-47f9-b448-8417ab4bc8a3"], +Cell[47614, 1126, 152, 3, 30, InheritFromParent,ExpressionUUID->"2983adc4-906c-4806-a44c-40e24b622344"], +Cell[47769, 1131, 274, 5, 30, "Input",ExpressionUUID->"fabcd13b-f64e-462e-9ce0-93beab49a689"], +Cell[48046, 1138, 152, 3, 30, InheritFromParent,ExpressionUUID->"a8e89c68-0ed4-412d-ac40-c90a39217adb"], +Cell[48201, 1143, 201, 4, 52, "Input",ExpressionUUID->"e7f3d7c2-d786-4f2d-972c-e72a1fd9d536"], +Cell[48405, 1149, 295, 5, 52, "Input",ExpressionUUID->"75acb2d9-7d9a-435e-9168-e6974de3ac37"], +Cell[48703, 1156, 154, 3, 30, 
InheritFromParent,ExpressionUUID->"a131fed6-bd82-436b-8f34-726cb9feaf42"], +Cell[48860, 1161, 202, 4, 30, "Input",ExpressionUUID->"181b32a8-9567-4406-8b3b-bda3f9db48d4"], +Cell[49065, 1167, 152, 3, 30, InheritFromParent,ExpressionUUID->"fa5a0042-6a54-4b50-aec3-2dca27b53c77"] } ] *) diff --git a/physical_property/logD/calculate_logD/logD_predictions/logD-EC_RISM_wet-EC_RISM.csv b/physical_property/logD/calculate_logD/logD_predictions/logD-EC_RISM_wet-EC_RISM.csv new file mode 100644 index 00000000..d8771671 --- /dev/null +++ b/physical_property/logD/calculate_logD/logD_predictions/logD-EC_RISM_wet-EC_RISM.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,2.25,0.02344107623307046,2.447871928239328 +SM26,SM26,0.51,0.023361138100292885,2.4395583624304598 +SM27,SM27,2.21,0.010000039446495069,1.0500041024354871 +SM28,SM28,2.18,0.01,1.05 +SM29,SM29,2.07,0.010000149489547966,1.0500155469129884 +SM30,SM30,3.78,0.01000134536305702,1.05013991775793 +SM31,SM31,3.27,0.010000000433838209,1.0500000451191742 +SM32,SM32,2.59,0.010000027305881223,1.0500028398116472 +SM33,SM33,5.27,0.01,1.05 +SM34,SM34,5.2700000000000005,0.010000013696641566,1.0500014244507228 +SM35,SM35,0.95,0.010000571108298259,1.0500593952630188 +SM36,SM36,2.59,0.010001314864335895,1.050136745890933 +SM37,SM37,2.14,0.0100001107628115,1.0500115193323958 +SM38,SM38,2.29,0.010002027036018357,1.050210811745909 +SM39,SM39,4.12,0.010091902371153578,1.0595578465999722 +SM40,SM40,3.61,0.01000134536305702,1.05013991775793 +SM41,SM41,1.64,0.023142696321745242,2.416840417461505 +SM42,SM42,4.44,0.023308600664929967,2.4340944691527167 +SM43,SM43,3.34,0.020713282071424373,2.164181335428135 +SM44,SM44,0.51,0.02169725306161642,2.2665143184081082 +SM45,SM45,1.8,0.022575502951253247,2.357852306930338 +SM46,SM46,1.63,0.020713297459104524,2.1641829357468705 + +Name: +EC_RISM_wet + EC_RISM +Category: +Physical (QM) +Ranked: +True \ No newline at end of file diff --git 
a/physical_property/logD/calculate_logD/logD_predictions/logD-MD_CGenFF_TIP3P-Gaussian_corrected.csv b/physical_property/logD/calculate_logD/logD_predictions/logD-MD_CGenFF_TIP3P-Gaussian_corrected.csv new file mode 100644 index 00000000..76fec8ad --- /dev/null +++ b/physical_property/logD/calculate_logD/logD_predictions/logD-MD_CGenFF_TIP3P-Gaussian_corrected.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,-0.38,0.18,2.3999880195383225 +SM26,SM26,-3.45,0.11,2.3999886092463867 +SM27,SM27,-0.52,0.13,2.3979742881675223 +SM28,SM28,1.85,0.13,1.5 +SM29,SM29,-0.85,0.15,2.3970721226093223 +SM30,SM30,1.07,0.14,2.3986675007101423 +SM31,SM31,0.02,0.13,2.3983151936120537 +SM32,SM32,1.27,0.16,2.39855103740242 +SM33,SM33,5.45,0.16,1.5 +SM34,SM34,1.23,0.14,2.39892838869771 +SM35,SM35,-1.99,0.36,2.399809083536587 +SM36,SM36,-1.03,0.6,2.399648783489643 +SM37,SM37,-1.43,0.75,2.3997498106923096 +SM38,SM38,-1.94,0.41,2.3995443305472417 +SM39,SM39,0.08,0.41,2.399608172679832 +SM40,SM40,-1.18,0.31,2.3996953304337554 +SM41,SM41,-3.23,0.18,2.3999972682711226 +SM42,SM42,-0.71,0.13,2.399997478807581 +SM43,SM43,-1.77,0.16,2.399994399413065 +SM44,SM44,-3.46,0.22,2.3999104647733165 +SM45,SM45,-0.42,0.41,2.399886760987729 +SM46,SM46,-2.2,0.29,2.3999411369110883 + +Name: +MD (CGenFF/TIP3P) + Gaussian_corrected +Category: +Physical (MM) + QM+LEC +Ranked: +True \ No newline at end of file diff --git a/physical_property/logD/calculate_logD/logD_predictions/logD-NULL0-Bergazin.csv b/physical_property/logD/calculate_logD/logD_predictions/logD-NULL0-Bergazin.csv new file mode 100644 index 00000000..8f59a552 --- /dev/null +++ b/physical_property/logD/calculate_logD/logD_predictions/logD-NULL0-Bergazin.csv @@ -0,0 +1,33 @@ +Predictions: +SM25,SM25,0,0,1.0 +SM26,SM26,0,0,1.0 +SM27,SM27,0,0,1.0 +SM28,SM28,0,0,1.0 +SM29,SM29,0,0,1.0 +SM30,SM30,0,0,1.0 +SM31,SM31,0,0,1.0 +SM32,SM32,0,0,1.0 +SM33,SM33,0,0,1.0 +SM34,SM34,0,0,1.0 +SM35,SM35,0,0,1.0 +SM36,SM36,0,0,1.0 +SM37,SM37,0,0,1.0 +SM38,SM38,0,0,1.0 
+SM39,SM39,0,0,1.0 +SM40,SM40,0,0,1.0 +SM41,SM41,0,0,1.0 +SM42,SM42,0,0,1.0 +SM43,SM43,0,0,1.0 +SM44,SM44,0,0,1.0 +SM45,SM45,0,0,1.0 +SM46,SM46,0,0,1.0 + + +Name: +NULL0 + +Category: +Empirical + +Ranked: +TRUE diff --git a/physical_property/logD/calculate_logD/logD_predictions/logD-REF0-ChemAxon-Bergazin.csv b/physical_property/logD/calculate_logD/logD_predictions/logD-REF0-ChemAxon-Bergazin.csv new file mode 100644 index 00000000..c1e48a77 --- /dev/null +++ b/physical_property/logD/calculate_logD/logD_predictions/logD-REF0-ChemAxon-Bergazin.csv @@ -0,0 +1,32 @@ +Predictions: +SM25,SM25,1.78,0,0 +SM26,SM26,-0.36,0,0 +SM27,SM27,0.83,0,0 +SM28,SM28,0.1,0,0 +SM29,SM29,0.76,0,0 +SM30,SM30,2.81,0,0 +SM31,SM31,0.89,0,0 +SM32,SM32,1.34,0,0 +SM33,SM33,3.4,0,0 +SM34,SM34,1.47,0,0 +SM35,SM35,-0.43,0,0 +SM36,SM36,1.62,0,0 +SM37,SM37,-0.31,0,0 +SM38,SM38,-0.25,0,0 +SM39,SM39,1.81,0,0 +SM40,SM40,-0.12,0,0 +SM41,SM41,0.42,0,0 +SM42,SM42,2.24,0,0 +SM43,SM43,0.95,0,0 +SM44,SM44,-0.19,0,0 +SM45,SM45,1.59,0,0 +SM46,SM46,0.34,0,0 + +Name: +REF0 ChemAxon + +Category: +Empirical + +Ranked: +TRUE diff --git a/physical_property/logD/calculate_logD/logD_predictions/logD-TFE_IEFPCM_MST-IEFPCM_MST.csv b/physical_property/logD/calculate_logD/logD_predictions/logD-TFE_IEFPCM_MST-IEFPCM_MST.csv new file mode 100644 index 00000000..21280594 --- /dev/null +++ b/physical_property/logD/calculate_logD/logD_predictions/logD-TFE_IEFPCM_MST-IEFPCM_MST.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,1.45,0.0,1.62736237955926 +SM26,SM26,-3.13,0.0,2.506510219032152 +SM27,SM27,1.75,0.0,1.0600000003220549 +SM28,SM28,0.83,0.0,1.06 +SM29,SM29,1.23,0.0,1.060000014275805 +SM30,SM30,3.53,0.0,1.0600001445262448 +SM31,SM31,1.61,0.0,1.0600003022939144 +SM32,SM32,1.63,0.0,1.0600000019066798 +SM33,SM33,4.27,0.0,1.06 +SM34,SM34,2.39,0.0,1.0600002390356282 +SM35,SM35,0.77,0.0,1.060003863243295 +SM36,SM36,3.73,0.0,1.060517639767404 +SM37,SM37,-1.31,0.0,2.508112831230342 +SM38,SM38,0.48,0.0,1.0600307528312118 
+SM39,SM39,2.45,0.0,1.062395960435093 +SM40,SM40,1.35,0.0,1.0890629521889412 +SM41,SM41,-1.45,0.0,2.496029608152775 +SM42,SM42,1.16,0.0,2.502465326649413 +SM43,SM43,-1.16,0.0,2.5071930954873802 +SM44,SM44,-0.6900000000000001,0.0,1.7782049976993068 +SM45,SM45,1.68,0.0,1.5093985461690262 +SM46,SM46,-0.95,0.0,2.473467961947144 + +Name: +TFE IEFPCM MST + IEFPCM/MST +Category: +Physical (QM) +Ranked: +True \ No newline at end of file diff --git a/physical_property/logD/calculate_logD/logD_predictions/logD-TFE_NHLBI_TZVP_QM-TZVP_QM.csv b/physical_property/logD/calculate_logD/logD_predictions/logD-TFE_NHLBI_TZVP_QM-TZVP_QM.csv new file mode 100644 index 00000000..445ad0dd --- /dev/null +++ b/physical_property/logD/calculate_logD/logD_predictions/logD-TFE_NHLBI_TZVP_QM-TZVP_QM.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,1.21,0.0,0.0 +SM26,SM26,0.01,0.0,0.0 +SM27,SM27,0.27,0.0,0.0 +SM28,SM28,-0.23,0.0,0.0 +SM29,SM29,-0.03,0.0,0.0 +SM30,SM30,1.52,0.0,0.0 +SM31,SM31,0.76,0.0,0.0 +SM32,SM32,0.77,0.0,0.0 +SM33,SM33,3.12,0.0,0.0 +SM34,SM34,1.89,0.0,0.0 +SM35,SM35,-1.7,0.0,0.0 +SM36,SM36,-0.36,0.0,0.0 +SM37,SM37,0.06,0.0,0.0 +SM38,SM38,-2.58,0.0,0.0 +SM39,SM39,-0.48,0.0,0.0 +SM40,SM40,-1.67,0.0,0.0 +SM41,SM41,-0.74,0.0,0.0 +SM42,SM42,0.78,0.0,0.0 +SM43,SM43,-0.14,0.0,0.0 +SM44,SM44,-1.73,0.0,0.0 +SM45,SM45,-0.52,0.0,0.0 +SM46,SM46,-0.93,0.0,0.0 + +Name: +TFE-NHLBI-TZVP-QM + TZVP-QM +Category: +Physical (QM) +Ranked: +True \ No newline at end of file diff --git a/physical_property/logD/calculate_logD/logD_predictions/logD-TFE_SMD_solvent_opt-DFT_M06_2X_SMD_explicit_water.csv b/physical_property/logD/calculate_logD/logD_predictions/logD-TFE_SMD_solvent_opt-DFT_M06_2X_SMD_explicit_water.csv new file mode 100644 index 00000000..0e15fb6e --- /dev/null +++ b/physical_property/logD/calculate_logD/logD_predictions/logD-TFE_SMD_solvent_opt-DFT_M06_2X_SMD_explicit_water.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,-18.33,0.0,2.1562024272328912 +SM26,SM26,-0.42,0.0,1.47 
+SM27,SM27,0.01,0.0,1.4700000000432958 +SM28,SM28,-0.79,0.0,1.47 +SM29,SM29,-0.76,0.0,1.4700000019788169 +SM30,SM30,1.17,0.0,1.470000786180085 +SM31,SM31,-0.13,0.0,1.4700000037704035 +SM32,SM32,0.86,0.0,1.4700000003770786 +SM33,SM33,2.61,0.0,1.47 +SM34,SM34,0.92,0.0,1.4700000002072229 +SM35,SM35,-0.79,0.0,1.4700000040648111 +SM36,SM36,-0.18,0.0,1.470364134111271 +SM37,SM37,-1.11,0.0,1.4855093562606103 +SM38,SM38,-2.75,0.0,1.470000000075239 +SM39,SM39,-1.22,0.0,1.4720316802734248 +SM40,SM40,-2.18,0.0,1.4704507142861698 +SM41,SM41,-2.25,0.0,2.073783637412573 +SM42,SM42,-1.16,0.0,2.1392457493186248 +SM43,SM43,-2.93,0.0,2.127134816087262 +SM44,SM44,-3.08,0.0,2.00277016768521 +SM45,SM45,-1.52,0.0,1.8876243061912719 +SM46,SM46,-2.26,0.0,1.7097387894188616 + +Name: +TFE-SMD-solvent-opt + DFT_M06-2X_SMD_explicit_water +Category: +Physical (QM) +Ranked: +True \ No newline at end of file diff --git a/physical_property/logD/calculate_logD/logD_predictions/logD-TFE_b3lypd3-DFT_M05_2X_SMD.csv b/physical_property/logD/calculate_logD/logD_predictions/logD-TFE_b3lypd3-DFT_M05_2X_SMD.csv new file mode 100644 index 00000000..5c54c045 --- /dev/null +++ b/physical_property/logD/calculate_logD/logD_predictions/logD-TFE_b3lypd3-DFT_M05_2X_SMD.csv @@ -0,0 +1,30 @@ +Predictions: +SM25,SM25,-2.33,1.0438236634602829,1.894117861536453 +SM26,SM26,-0.99,0.6411256089625921,1.2498009743401473 +SM27,SM27,0.49,0.3600129619091414,0.8000207390546263 +SM28,SM28,-0.6,0.36,0.8 +SM29,SM29,-0.54,0.3625241056999536,0.8040385691199258 +SM30,SM30,0.94,0.360471392816815,0.8007542285069041 +SM31,SM31,0.58,0.3600000000237929,0.8000000000380687 +SM32,SM32,1.24,0.3600282398376334,0.8000451837402134 +SM33,SM33,2.55,0.36,0.8 +SM34,SM34,1.46,0.3600002991428096,0.8000004786284954 +SM35,SM35,-0.81,0.36000043227744816,0.8000006916439171 +SM36,SM36,0.19,0.3600807198221639,0.8001291517154623 +SM37,SM37,-1.62,0.3600049444450307,0.8000079111120492 +SM38,SM38,-3.9,0.9808838646523136,1.7934141834437018 
+SM39,SM39,-2.0,1.0099094360425362,1.8398550976680577 +SM40,SM40,-1.93,0.36120044198792783,0.8019207071806846 +SM41,SM41,-1.03,0.36000012476185017,0.8000001996189604 +SM42,SM42,0.23,0.36049300672398177,0.8007888107583709 +SM43,SM43,-0.2,0.36001487301534085,0.8000237968245454 +SM44,SM44,-1.63,0.3600704059249329,0.8001126494798927 +SM45,SM45,-0.5,0.36000015704952115,0.8000002512792339 +SM46,SM46,-1.8,0.3600000000432957,0.8000000000692732 + +Name: +TFE b3lypd3 + DFT_M05-2X_SMD +Category: +Physical (QM) +Ranked: +True \ No newline at end of file diff --git a/physical_property/logD/analysis/logD_submission_collection.csv b/physical_property/logD/calculate_logD/logD_submission_collection.csv similarity index 95% rename from physical_property/logD/analysis/logD_submission_collection.csv rename to physical_property/logD/calculate_logD/logD_submission_collection.csv index dd9520a1..34774b5c 100644 --- a/physical_property/logD/analysis/logD_submission_collection.csv +++ b/physical_property/logD/calculate_logD/logD_submission_collection.csv @@ -65,28 +65,28 @@ 63,EC_RISM_wet,EC_RISM,logP-ECRISM-1,pKa-ECRISM-1,Physical (QM),QM,SM44,0.51,0.02169725306161642,0.06,0.,0.45,2.266514318408108 64,EC_RISM_wet,EC_RISM,logP-ECRISM-1,pKa-ECRISM-1,Physical (QM),QM,SM45,1.8,0.022575502951253247,1.06,0.07,0.74,2.357852306930338 65,EC_RISM_wet,EC_RISM,logP-ECRISM-1,pKa-ECRISM-1,Physical (QM),QM,SM46,1.6300000000000001,0.020713297459104524,0.69,0.05,0.9400000000000002,2.1641829357468705 -66,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM25,-1.87,0.,-0.09,0.01,-1.78,0. -67,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM26,-0.21,0.,-0.87,0.06,0.66,0. -68,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM27,-0.67,0.,1.56,0.11,-2.23,0. -69,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM28,0.23,0.,1.18,0.08,-0.95,0. 
-70,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM29,-0.39,0.,1.61,0.11,-2.,0. -71,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM30,-1.94,0.,2.76,0.19,-4.699999999999999,0. -72,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM31,-1.12,0.,1.96,0.14,-3.08,0. -73,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM32,-2.23,0.,2.44,0.17,-4.67,0. -74,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM33,-3.12,0.,2.96,0.21,-6.08,0. -75,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM34,-2.47,0.,2.83,0.2,-5.300000000000001,0. -76,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM35,-0.32,0.,0.87,0.06,-1.19,0. -77,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM36,-1.52,0.,0.76,0.05,-2.2800000000000002,0. -78,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM37,-1.5,0.,1.45,0.1,-2.95,0. -79,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM38,0.62,0.,1.03,0.07,-0.41000000000000003,0. -80,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM39,-1.42,0.,1.89,0.13,-3.3099999999999996,0. -81,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM40,-0.25,0.,1.82,0.13,-2.0700000000000003,0. -82,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM41,0.48,0.,-0.42,0.03,0.8999999999999999,0. -83,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM42,-1.34,0.,0.99,0.07,-2.33,0. -84,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM43,-0.32,0.,0.42,0.03,-0.74,0. -85,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM44,1.33,0.,0.06,0.,1.27,0. -86,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM45,0.,0.,1.06,0.07,-1.06,0. 
-87,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM46,0.63,0.,0.69,0.05,-0.05999999999999994,0. +66,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM25,1.21,0.,-0.09,0.01,1.3,0. +67,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM26,0.01,0.,-0.87,0.06,0.88,0. +68,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM27,0.27,0.,1.56,0.11,-1.29,0. +69,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM28,-0.23,0.,1.18,0.08,-1.41,0. +70,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM29,-0.03,0.,1.61,0.11,-1.6400000000000001,0. +71,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM30,1.52,0.,2.76,0.19,-1.2399999999999998,0. +72,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM31,0.76,0.,1.96,0.14,-1.2,0. +73,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM32,0.77,0.,2.44,0.17,-1.67,0. +74,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM33,3.12,0.,2.96,0.21,0.16000000000000014,0. +75,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM34,1.8900000000000001,0.,2.83,0.2,-0.94,0. +76,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM35,-1.7,0.,0.87,0.06,-2.57,0. +77,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM36,-0.36,0.,0.76,0.05,-1.12,0. +78,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM37,0.06,0.,1.45,0.1,-1.39,0. +79,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM38,-2.58,0.,1.03,0.07,-3.6100000000000003,0. +80,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM39,-0.48,0.,1.89,0.13,-2.37,0. +81,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM40,-1.67,0.,1.82,0.13,-3.49,0. +82,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM41,-0.74,0.,-0.42,0.03,-0.32,0. 
+83,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM42,0.78,0.,0.99,0.07,-0.20999999999999996,0. +84,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM43,-0.14,0.,0.42,0.03,-0.56,0. +85,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM44,-1.73,0.,0.06,0.,-1.79,0. +86,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM45,-0.52,0.,1.06,0.07,-1.58,0. +87,TFE-NHLBI-TZVP-QM,TZVP-QM,logp-nhlbi-1,pka-nhlbi-1c,Physical (QM),QM,SM46,-0.93,0.,0.69,0.05,-1.62,0. 88,TFE-SMD-solvent-opt,DFT_M06-2X_SMD_explicit_water,logP_RodriguezPaluch_SMD_2,pKa_RodriguezPaluch_SMD_1,Physical (QM),QM,SM25,-18.330000000000002,0.,-0.09,0.01,-18.240000000000002,2.1562024272328912 89,TFE-SMD-solvent-opt,DFT_M06-2X_SMD_explicit_water,logP_RodriguezPaluch_SMD_2,pKa_RodriguezPaluch_SMD_1,Physical (QM),QM,SM26,-0.42,0.,-0.87,0.06,0.45,1.47 90,TFE-SMD-solvent-opt,DFT_M06-2X_SMD_explicit_water,logP_RodriguezPaluch_SMD_2,pKa_RodriguezPaluch_SMD_1,Physical (QM),QM,SM27,0.01,0.,1.56,0.11,-1.55,1.4700000000432958 diff --git a/physical_property/logD/calculate_logD/user-map.csv b/physical_property/logD/calculate_logD/user-map.csv new file mode 100644 index 00000000..85f52adf --- /dev/null +++ b/physical_property/logD/calculate_logD/user-map.csv @@ -0,0 +1,6 @@ +1,logD-EC_RISM_wet-EC_RISM.csv +3,logD-TFE_SMD_solvent_opt-DFT_M06_2X_SMD_explicit_water.csv +4,logD-MD_CGenFF_TIP3P-Gaussian_corrected.csv +5,logD-TFE_IEFPCM_MST-IEFPCM_MST.csv +6,logD-TFE_b3lypd3-DFT_M05_2X_SMD.csv +8,logD-TFE_NHLBI_TZVP_QM-TZVP_QM.csv diff --git a/physical_property/logD/logD_analysis.py b/physical_property/logD/logD_analysis.py new file mode 100644 index 00000000..a6fc7c53 --- /dev/null +++ b/physical_property/logD/logD_analysis.py @@ -0,0 +1,1650 @@ +#!/usr/bin/env python + +# ============================================================================= +# GLOBAL IMPORTS +# 
============================================================================= +import os +import glob +import io +import collections +import pickle +import pandas as pd +import numpy as np +import seaborn as sns +from matplotlib import pyplot as plt +import scipy.stats +from pylab import rcParams +import math + + + +# ============================================================================= +# CONSTANTS +# ============================================================================= + +# Paths to input data. +LOGD_SUBMISSIONS_DIR_PATH = './calculate_logD/logD_predictions' +EXPERIMENTAL_DATA_FILE_PATH = './logD_experimental_values.csv' +USER_MAP_FILE_PATH = './calculate_logD/user-map.csv' + +# ============================================================================= +# STATS FUNCTIONS +# ============================================================================= + +def r2(data): + x, y = data.T + slope, intercept, r_value, p_value, stderr = scipy.stats.linregress(x, y) + return r_value**2 + + +def slope(data): + x, y = data.T + slope, intercept, r_value, p_value, stderr = scipy.stats.linregress(x, y) + return slope + + +def me(data): + x, y = data.T + error = np.array(x) - np.array(y) + return error.mean() + + +def mae(data): + x, y = data.T + error = np.abs(np.array(x) - np.array(y)) + return error.mean() + + +def rmse(data): + x, y = data.T + error = np.array(x) - np.array(y) + rmse = np.sqrt((error**2).mean()) + return rmse + +def kendall_tau(data): + x, y = data.T + correlation, p_value = scipy.stats.kendalltau(x, y) + return correlation + + +def compute_bootstrap_statistics(samples, stats_funcs, percentile=0.95, n_bootstrap_samples=1000): + """Compute bootstrap confidence interval for the given statistics functions.""" + # Handle case where only a single function is passed. + #print("SAMPLES:\n", samples) + + try: + len(stats_funcs) + except TypeError: + stats_funcs = [stats_funcs] + + # Compute mean statistics. 
+ statistics = [stats_func(samples) for stats_func in stats_funcs] + + # Generate bootstrap statistics. + bootstrap_samples_statistics = np.zeros((len(statistics), n_bootstrap_samples)) + for bootstrap_sample_idx in range(n_bootstrap_samples): + samples_indices = np.random.randint(low=0, high=len(samples), size=len(samples)) + for stats_func_idx, stats_func in enumerate(stats_funcs): + bootstrap_samples_statistics[stats_func_idx][bootstrap_sample_idx] = stats_func(samples[samples_indices]) + + # Compute confidence intervals. + percentile_index = int(np.floor(n_bootstrap_samples * (1 - percentile) / 2)) - 1 + bootstrap_statistics = [] + for stats_func_idx, samples_statistics in enumerate(bootstrap_samples_statistics): + samples_statistics.sort() + stat_lower_percentile = samples_statistics[percentile_index] + stat_higher_percentile = samples_statistics[-percentile_index+1] + confidence_interval = (stat_lower_percentile, stat_higher_percentile) + bootstrap_statistics.append([statistics[stats_func_idx], confidence_interval, samples_statistics]) + + return bootstrap_statistics + +# ============================================================================= +# STATS FUNCTIONS FOR QQ-PLOT AND ERROR SLOPE CALCULATION +# +# Methods from uncertain_check.py David L. Mobley wrote for the SAMPL4 analysis +# =============================================================================== + +def normal(y): + """Return unit normal distribution value at specified location.""" + return 1. / np.sqrt(2 * np.pi) * np.exp(-y ** 2 / 2.) + + +def compute_range_table(stepsize=0.001, maxextent=10): + """Compute integrals of the unit normal distribution and return these tabulated. + Returns: + -------- + - range: NumPy array giving integration range (x) where integration range runs -x to +x + - integral: NumPy array giving integrals over specified integration range. + + Arguments (optional): + --------------------- + - stepsize: Step size to advance integration range by each trial.
Default: 0.001 + - maxextent: Maximum extent of integration range +""" + # Calculate integration range + x = np.arange(0, maxextent, stepsize) # Symmetric, so no need to do negative values. + + # Calculate distribution at specified x values + distrib = normal(x) + + integral = np.zeros(len(x), float) + for idx in range(1, len(x)): + integral[idx] = 2 * scipy.integrate.trapz(distrib[0:idx + 1], x[0:idx + 1]) # Factor of 2 handles symmetry + + return x, integral + + +def get_range(integral, rangetable, integraltable): + """Use rangetable and integral table provided (i.e. from compute_range_table) to find the smallest range of integration for which the integral is greater than the specified value (integral). Return this range as a float.""" + + idx = np.where(integraltable > integral)[0] + return rangetable[idx[0]] + + +# [DLM]Precompute integral of normal distribution so I can look up integration range which gives desired integral +# integral_range, integral = compute_range_table() + + +def fracfound_vs_error(calc, expt, dcalc, dexpt, integral_range, integral): + """ + Takes in calculated and experimental values, their uncertainties, as well as the tabulated integration ranges and integrals (i.e. from compute_range_table). Returns the expected fraction (X) and the observed fraction of predictions within range (Y). + """ + # Fraction of Gaussian distribution we want to compute + X = np.arange(0, 1.0, 0.01) + Y = np.zeros(len(X)) + + for (i, x) in enumerate(X): + # Determine integration range which gives us this much probability + rng = get_range(x, integral_range, integral) + # print x, rng + + # Loop over samples and compute fraction of measurements found + y = 0.
+ # for n in range(0, len(DGcalc)): + # sigma_eff = sqrt( sigma_calc[n]**2 + sigma_expt[n]**2 ) + # absdiff = abs( DGcalc[n] - DGexpt[n]) + # #print absdiff, n, sigma_eff + # if absdiff < rng * sigma_eff: #If the difference falls within the specified range of sigma values, then this is within the range we're looking at; track it + # #print "Incrementing y for n=%s, x = %.2f" % (n, x) + # y += 1./len(DGcalc) + # Rewrite for speed + sigma_eff = np.sqrt(np.array(dcalc) ** 2 + np.array(dexpt) ** 2) + absdiff = np.sqrt((np.array(calc) - np.array(expt)) ** 2) + idx = np.where(absdiff < rng * sigma_eff)[0] + Y[i] = len(idx) * 1. / len(calc) + + # print Y + # raw_input() + + return X, Y + + +# Copied from David L. Mobley's scripts written for SAMPL4 analysis (added calculation uncertainty) +def bootstrap_exptnoise(calc1, expt1, exptunc1, returnunc=False): + """Take two datasets (equal length) of calculated and experimental values. Construct new datasets of equal length by picking, with replacement, a set of indices to use from both sets. Return the two new datasets. To take into account experimental uncertainties, random noise is added to the experimental set, distributed according to gaussians with variance taken from the experimental uncertainties. Approach suggested by J. Chodera. +Optionally, 'returnunc = True', which returns a third value -- experimental uncertainties corresponding to the data points actually used.""" + + # Make everything an array just in case + calc = np.array(calc1) + expt = np.array(expt1) + exptunc = np.array(exptunc1) + npoints = len(calc) + + # Pick random datapoint indices + idx = np.random.randint(0, npoints, + npoints) # Create an array consisting of npoints indices, where each index runs from 0 up to npoints. 
+ + # Construct initial new datasets + newcalc = calc[idx] + newexpt = expt[idx] + newuncExp = exptunc[idx] + + # Add noise to experimental set + noise = np.random.normal(0., + exptunc) # For each data point, draw a random number from a normal distribution centered at 0, with standard deviations given by exptunc + newexpt += noise + + if not returnunc: + return newcalc, newexpt + else: + return newcalc, newexpt, newuncExp + +# Modified from David L. Mobley's scripts written for SAMPL4 analysis (added bootstrapped values to the list of returned values) +def getQQdata(calc, expt, dcalc, dexpt, boot_its): + """ + Takes calculated and experimental values and their uncertainties + + Parameters + ---------- + calc: predicted logD value + expt: experimental logD value + dcalc: predicted model uncertainty + dexpt: experimental logD SEM + + Outputs + ------- + X: array of x axis values for QQ-plot + Y: array of y axis values for QQ-plot + slope: Error Slope (ES) of line fit to QQ-plot + slopes: Error Slopes (ES) of lines fit to QQ-plots of bootstrapped datapoints + """ + integral_range, integral = compute_range_table() + X, Y = fracfound_vs_error(calc, expt, dcalc, dexpt, integral_range, integral) + xtemp = X[:, np.newaxis] + coeff, _, _, _ = np.linalg.lstsq(xtemp, Y,rcond=-1) + slope = coeff[0] + slopes = [] + for it in range(boot_its): + n_calc, n_expt, n_dexpt = bootstrap_exptnoise(calc, expt, dexpt, returnunc=True) + nX, nY = fracfound_vs_error(n_calc, n_expt, dcalc, n_dexpt, integral_range, integral) + a, _, _, _ = np.linalg.lstsq(xtemp, nY,rcond=-1) + slopes.append(a[0]) + return X, Y, slope, np.array(slopes).std(), slopes + +# ============================================================================= +# PLOTTING FUNCTIONS +# ============================================================================= + +def plot_correlation(x, y, data, title=None, color=None, kind='joint', ax=None): + # Extract only logD values.
+ data = data[[x, y]] + + # Find extreme values to make axes equal. + min_limit = np.ceil(min(data.min()) - 1) + max_limit = np.floor(max(data.max()) + 1) + axes_limits = np.array([min_limit, max_limit]) + + if kind == 'joint': + grid = sns.jointplot(x=x, y=y, data=data, + kind='reg', joint_kws={'ci': None}, #stat_func=None, + xlim=axes_limits, ylim=axes_limits, color=color) + ax = grid.ax_joint + grid.fig.subplots_adjust(top=0.95) + grid.fig.suptitle(title) + elif kind == 'reg': + #print("x_values",type(x_values)) + #print("y_values",type(y_values)) + #print("data",type(data)) + ax = sns.regplot(x=x, y=y, data=data, color=color, ax=ax) + ax.set_title(title) + + # Add diagonal line. + ax.plot(axes_limits, axes_limits, ls='--', c='black', alpha=0.8, lw=0.7) + + # Add shaded area for 0.5-1 logD error. + palette = sns.color_palette('BuGn_r') + ax.fill_between(axes_limits, axes_limits - 0.5, axes_limits + 0.5, alpha=0.2, color=palette[2]) + ax.fill_between(axes_limits, axes_limits - 1, axes_limits + 1, alpha=0.2, color=palette[3]) + + +def plot_correlation_with_SEM(x_lab, y_lab, x_err_lab, y_err_lab, data, title=None, color=None, ax=None): + # Extract only logD values. + x_error = data.loc[:, x_err_lab] + y_error = data.loc[:, y_err_lab] + x_values = data.loc[:, x_lab] + y_values = data.loc[:, y_lab] + data = data[[x_lab, y_lab]] + + # Find extreme values to make axes equal. + min_limit = np.ceil(min(data.min()) - 1) + max_limit = np.floor(max(data.max()) + 1) + axes_limits = np.array([min_limit, max_limit]) + + # Color + current_palette = sns.color_palette() + sns_blue = current_palette[0] + + # Plot + plt.figure(figsize=(6, 6)) + grid = sns.regplot(x=x_values, y=y_values, data=data, color=color, ci=None) + plt.errorbar(x=x_values, y=y_values, xerr=x_error, yerr=y_error, fmt="o", ecolor=sns_blue, capthick='2', + label='SEM', alpha=0.75) + plt.axis("equal") + + #if len(title) > 70: + # plt.title(title[:70]+"...") + #else: + plt.title(title) + + # Add diagonal line. 
+ grid.plot(axes_limits, axes_limits, ls='--', c='black', alpha=0.8, lw=0.7) + + # Add shaded area for 0.5-1 logD error. + palette = sns.color_palette('BuGn_r') + grid.fill_between(axes_limits, axes_limits - 0.5, axes_limits + 0.5, alpha=0.2, color=palette[2]) + grid.fill_between(axes_limits, axes_limits - 1, axes_limits + 1, alpha=0.2, color=palette[3]) + + plt.xlim(axes_limits) + plt.ylim(axes_limits) + +def barplot_with_CI_errorbars(df, x_label, y_label, y_lower_label, y_upper_label, figsize=False): + """Creates bar plot of a given dataframe with asymmetric error bars for y axis. + + Args: + df: Pandas Dataframe that should have columns with columnnames specified in other arguments. + x_label: str, column name of x axis categories + y_label: str, column name of y axis values + y_lower_label: str, column name of lower error values of y axis + y_upper_label: str, column name of upper error values of y axis + figsize: tuple, size in inches. Default value is False. + + """ + # Column names for new columns for delta y_err which is calculated as | y_err - y | + delta_lower_yerr_label = "$\Delta$" + y_lower_label + delta_upper_yerr_label = "$\Delta$" + y_upper_label + data = df # Pandas DataFrame + data.loc[:,delta_lower_yerr_label] = data.loc[:,y_label] - data.loc[:,y_lower_label] + data.loc[:,delta_upper_yerr_label] = data.loc[:,y_upper_label] - data.loc[:,y_label] + + # Color + current_palette = sns.color_palette() + sns_color = current_palette[2] + + # Plot style + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 20 # 18 + plt.rcParams['xtick.labelsize'] = 16 #14 + plt.rcParams['ytick.labelsize'] = 18 #16 + plt.rcParams['legend.fontsize'] = 16 + plt.rcParams['legend.handlelength'] = 2 + plt.rcParams['figure.autolayout'] = True + #plt.tight_layout() + + # If figsize is specified + if figsize != False: + plt.figure(figsize=figsize) + + # Plot + x = range(len(data[y_label])) + y = data[y_label] + plt.bar(x, y) + 
plt.xticks(x, data[x_label], rotation=45, horizontalalignment='right') + plt.errorbar(x, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=sns_color, capsize=3, capthick=True) + plt.xlabel(x_label) + plt.ylabel(y_label) + + +def barplot_with_CI_errorbars_colored_by_label(df, x_label, y_label, y_lower_label, y_upper_label, color_label, figsize=False): + """Creates bar plot of a given dataframe with asymmetric error bars for y axis. + + Args: + df: Pandas Dataframe that should have columns with columnnames specified in other arguments. + x_label: str, column name of x axis categories + y_label: str, column name of y axis values + y_lower_label: str, column name of lower error values of y axis + y_upper_label: str, column name of upper error values of y axis + color_label: str, column name of label that will determine the color of bars + figsize: tuple, size in inches. Default value is False. + + """ + # Column names for new columns for delta y_err which is calculated as | y_err - y | + delta_lower_yerr_label = "$\Delta$" + y_lower_label + delta_upper_yerr_label = "$\Delta$" + y_upper_label + data = df # Pandas DataFrame + data.loc[:, delta_lower_yerr_label] = data.loc[:, y_label] - data.loc[:, y_lower_label] + data.loc[:, delta_upper_yerr_label] = data.loc[:, y_upper_label] - data.loc[:, y_label] + + # Color + #current_palette = sns.color_palette() + #sns_color = current_palette[2] # Error bar color + + # Zesty colorblind-friendly color palette + color0 = "#0F2080" + color1 = "#F5793A" + #color2 = "#A95AA1" + color3 = "#85C0F9" + current_palette = [color0, color1, color3]#color2, color3] + error_color = 'gray' + + # Bar colors + if color_label == "category": + #category_list = ["Physical", "Empirical", "Mixed", "Other"] + category_list = ["Physical (MM) + QM+LEC", "Empirical", "Physical (QM)"]#["Physical", "Empirical"] + #elif color_label == "reassigned_category": + #category_list = ["Physical (MM) + QM+LEC", "Empirical", 
"Mixed", "Physical (QM)"] + #category_list = ["Physical (MM) + QM+LEC", "Empirical", "Physical (QM)"] + elif color_label == "type": + category_list = ["Standard", "Reference"] + else: + Exception("Error: Unsupported label used for coloring") + bar_color_dict = {} + for i, cat in enumerate(category_list): + bar_color_dict[cat] = current_palette[i] + #print("bar_color_dict:\n", bar_color_dict) + + + # Plot style + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 20 # 18 + plt.rcParams['xtick.labelsize'] = 14 + plt.rcParams['ytick.labelsize'] = 18 #16 + plt.rcParams['legend.fontsize'] = 16 + plt.rcParams['legend.handlelength'] = 2 + # plt.tight_layout() + + # If figsize is specified + if figsize != False: + plt.figure(figsize=figsize) + + # Plot + x = range(len(data[y_label])) + y = data[y_label] + #barlist = plt.bar(x, y) + fig, ax = plt.subplots(figsize=figsize) + barlist = ax.bar(x, y) + + plt.xticks(x, data[x_label], rotation=45, horizontalalignment='right') + plt.errorbar(x, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=3, elinewidth=2, capthick=True) + plt.xlabel(x_label) + plt.ylabel(y_label) + + # Reset color of bars based on color label + #print("data.columns:\n",data.columns) + #print("\nData:\n", data) + for i, c_label in enumerate(data.loc[:, color_label]): + barlist[i].set_color(bar_color_dict[c_label]) + + # create legend + from matplotlib.lines import Line2D + if color_label == 'category': + custom_lines = [Line2D([0], [0], color=bar_color_dict["Physical (MM) + QM+LEC"], lw=5), + Line2D([0], [0], color=bar_color_dict["Empirical"], lw=5), + Line2D([0], [0], color=bar_color_dict["Physical (QM)"], lw=5)] + #Line2D([0], [0], color=bar_color_dict["Mixed"], lw=5), + #Line2D([0], [0], color=bar_color_dict["Other"], lw=5)] + #elif color_label == 'reassigned_category': + # custom_lines = [Line2D([0], [0], color=bar_color_dict["Physical (MM) + 
QM+LEC"], lw=5), + # Line2D([0], [0], color=bar_color_dict["Empirical"], lw=5), + #Line2D([0], [0], color=bar_color_dict["Mixed"], lw=5), + # Line2D([0], [0], color=bar_color_dict["Physical (QM)"], lw=5)] + elif color_label == 'type': + custom_lines = [Line2D([0], [0], color=bar_color_dict["Standard"], lw=5), + Line2D([0], [0], color=bar_color_dict["Reference"], lw=5)] + ax.legend(custom_lines, category_list) + + +def barplot(df, x_label, y_label, title): + """Creates bar plot of a given dataframe. + + Args: + df: Pandas Dataframe that should have columns with columnnames specified in other arguments. + x_label: str, column name of x axis categories + y_label: str, column name of y axis values + title: str, the title of the plot + + """ + # Plot style + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 18 + plt.rcParams['xtick.labelsize'] = 14 + plt.rcParams['ytick.labelsize'] = 16 + #plt.tight_layout() + + # Plot + data = df + x = range(len(data[y_label])) + y = data[y_label] + plt.bar(x, y) + plt.xticks(x, data[x_label], rotation=45) + plt.xlabel(x_label) + plt.ylabel(y_label) + if len(title) > 70: + plt.title(title[:70]+"...") + else: + plt.title(title) + plt.tight_layout() + +# ============================================================================ +# PLOTTING FUNCTIONS FOR QQ-PLOT +# +# Methods from uncertain_check.py David L. Mobley wrote for the SAMPL4 analysis +# ============================================================================= + + +def makeQQplot(X, Y, slope, title, xLabel ="Expected fraction within range" , yLabel ="Fraction of predictions within range", fileName = "QQplot.pdf", uncLabel = 'Model Unc.', leg = [1.02, 0.98, 2, 1], ax1 = None): + """ + Provided with experimental and calculated values (and their associated uncertainties) in the form of list like objects. 
+ Provides the analysis to make a QQ-plot using the Gaussian integral methods David wrote for SAMPL4 that are included above. + Saves the plot to a file when no axes object is passed in; otherwise returns the axes with the plot added. + """ + if ax1 is None: + axReturn = False + # Get plot parameters for JCAMD + # plt.rcParams.update(JCAMDdict()) + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 18 + plt.rcParams['xtick.labelsize'] = 14 + plt.rcParams['ytick.labelsize'] = 16 + plt.rcParams['figure.figsize'] = 6, 6 + + # Set up plot + #fig1 = plt.figure(1, figsize=(6,6)) + #plt.ylim = (0,1) + #plt.xlim = (0,1) + #plt.xlabel(xLabel) + #plt.ylabel(yLabel) + #plt.title(title, fontsize=20) + #ax1 = fig1.add_subplot(111) + + # New way to plot with subplots + fig1, ax1 = plt.subplots(1,1) + ax1.set_xlim(0,1) + ax1.set_ylim(0,1) + ax1.set_xlabel(xLabel) + ax1.set_ylabel(yLabel) + ax1.set_title(title, fontsize=20) + + else: + axReturn = True + # Add data to plot + p1 = ax1.plot(X,Y,'bo', label = uncLabel) + + # Add x=y line + p2 = ax1.plot(X,X,'k-', label = r'$X=Y$') + + # X data needs to be a column vector to use linalg.lstsq + p3 = ax1.plot(X, slope*X, 'r-', label = 'Slope %.2f' % slope) + + # Build Legend + handles = [p1,p2,p3] + if leg is not None: + ax1.legend(bbox_to_anchor = (leg[0], leg[1]), loc = leg[2], ncol = leg[3], borderaxespad = 0.)
+ + if axReturn: + return ax1 + else: + # Adjust spacing then save and close figure + plt.savefig(fileName) + plt.close(fig1) +## other + +def name_to_filename(id): + for ch in [' ','/']: + if ch in id: + id=id.replace(ch,"_") + for ch in ['(',')']: + if ch in id: + id=id.replace(ch,"") + return id + +# ============================================================================= +# UTILITY CLASSES +# ============================================================================= + +class IgnoredSubmissionError(Exception): + """Exception used to signal a submission that must be ignored.""" + pass + + +class BadFormatError(Exception): + """Exception used to signal a submission with unexpected formatting.""" + pass + + +class SamplSubmission: + """A generic SAMPL submission. + Parameters + ---------- + file_path : str + The path to the submission file. + Raises + ------ + IgnoredSubmission + If the submission ID is among the ignored submissions. + """ + # The IDs of submissions used for reference calculations + REF_SUBMISSIONS = ['REF0', 'NULL0'] + + + # Section of the submission file. + SECTIONS = {} + + # Sections in CSV format with columns names. + CSV_SECTIONS = {} + + def __init__(self, file_path, user_map): + file_name = os.path.splitext(os.path.basename(file_path))[0] + file_data = file_name.split('-') + + # Load predictions. + sections = self._load_sections(file_path) # From parent-class. 
+ print(sections) + self.data = sections['Predictions'] # This is a list + self.data = pd.DataFrame(data=self.data) # Now a DataFrame + #self.name = sections['Name'][0] #want this to take the place of the 5 letter code + self.file_name = file_name + + self.method_name = sections['Name'][0] #want this to take the place of the 5 letter code + + # Check if this is a reference submission + self.reference_submission = False + #if self.method_name in self.REF_SUBMISSIONS: + if "REF" in self.method_name or "NULL" in self.method_name: + print("REF found: ", self.method_name) + self.reference_submission = True + + @classmethod + def _read_lines(cls, file_path): + """Generator to read the file and discard blank lines and comments.""" + with open(file_path, 'r', encoding='utf-8-sig') as f: + for line in f: + # Strip whitespaces. + line = line.strip() + # Don't return blank lines and comments. + if line != '' and line[0] != '#': + yield line + + @classmethod + def _load_sections(cls, file_path): + """Load the data in the file and separate it by sections.""" + #print("file_path",file_path) + sections = {} + current_section = None + for line in cls._read_lines(file_path): + # Check if this is a new section. + if line[:-1] in cls.SECTIONS: + current_section = line[:-1] + else: + if current_section is None: + import pdb + pdb.set_trace() + try: + sections[current_section].append(line) + except KeyError: + sections[current_section] = [line] + + # Check that all the sections have been loaded. + found_sections = set(sections.keys()) + if found_sections != cls.SECTIONS: + raise BadFormatError('Missing sections: {}.'.format(cls.SECTIONS - found_sections)) + + # Create a Pandas dataframe from the CSV format.
+ for section_name in cls.CSV_SECTIONS: + csv_str = io.StringIO('\n'.join(sections[section_name])) + columns = cls.CSV_SECTIONS[section_name] + id_column = columns[0] + #print("trying", sections) + section = pd.read_csv(csv_str, index_col=id_column, names=columns, skipinitialspace=True) + #section = pd.read_csv(csv_str, names=columns, skipinitialspace=True) + sections[section_name] = section + return sections + + @classmethod + def _create_comparison_dataframe(cls, column_name, submission_data, experimental_data): + """Create a single dataframe with submission and experimental data.""" + # Keep only the system IDs present in this submission. + + + experimental_data = experimental_data[experimental_data.index.isin(submission_data.index)] # match on the DataFrame index (molecule ID) + # Fix the names of the columns for labelling. + submission_series = submission_data[column_name] + submission_series.name += ' (calc)' + experimental_series = experimental_data[column_name] + experimental_series.name += ' (expt)' + + # Concatenate the two columns into a single dataframe. + return pd.concat([experimental_series, submission_series], axis=1) + +# ============================================================================= +# LOGD PREDICTION CHALLENGE +# ============================================================================= + +class logDSubmission(SamplSubmission): + """A submission for the logD challenge. + + Parameters + ---------- + file_path : str + The path to the submission file + + Raises + ------ + IgnoredSubmissionError + If the submission ID is among the ignored submissions. + + """ + # Section of the submission file. + SECTIONS = {"Predictions", + #"Participant name", + #"Participant organization", + "Name", + #"Compute time", + #"Computing and hardware", + #"Software", + "Category", + #"Method", + "Ranked"} + + # Sections in CSV format with columns names.
+ CSV_SECTIONS = {"Predictions": ("Molecule ID", "ID tag", "logD mean", "logD SEM", "logD model uncertainty")} + + + def __init__(self, file_path, user_map): + super().__init__(file_path, user_map) + + file_name = os.path.splitext(os.path.basename(file_path))[0] + file_data = file_name.split('-') + print(file_name) + + + # Load predictions. + sections = self._load_sections(file_path) # From parent-class. + self.data = sections['Predictions'] # This is a pandas DataFrame. + #self.participant = sections['Participant name'][0].strip() + self.method_name = sections['Name'][0] + self.category = sections['Category'][0] # New section for logD challenge. + self.ranked = sections['Ranked'][0].strip() =='True' + + # Check if this is a reference submission + self.reference_submission = False + if "REF" in self.method_name or "NULL" in self.method_name: + self.reference_submission = True + + + + + def compute_logD_statistics(self, experimental_data, stats_funcs): + data = self._create_comparison_dataframe('logD mean', self.data, experimental_data) + + # Create lists of stats functions to pass to compute_bootstrap_statistics. + stats_funcs_names, stats_funcs = zip(*stats_funcs.items()) + #bootstrap_statistics = compute_bootstrap_statistics(data.as_matrix(), stats_funcs, n_bootstrap_samples=10000) #10000 + + bootstrap_statistics = compute_bootstrap_statistics(data.to_numpy(), stats_funcs, n_bootstrap_samples=10000) + + # Return statistics as dict preserving the order. 
+ return collections.OrderedDict((stats_funcs_names[i], + bootstrap_statistics[i]) + for i in range(len(stats_funcs))) + + def compute_logD_model_uncertainty_statistics(self,experimental_data): + + # Create a dataframe for data necessary for error slope analysis + expt_logD_series = experimental_data["logD mean"] + expt_logD_SEM_series = experimental_data["logD SEM"] + pred_logD_series = self.data["logD mean"] + pred_logD_SEM_series = self.data["logD SEM"] + pred_logD_mod_unc_series = self.data["logD model uncertainty"] + + # Concatenate the columns into a single dataframe. + data_exp = pd.concat([expt_logD_series, expt_logD_SEM_series], axis=1) + data_exp = data_exp.rename(index=str, columns={"logD mean": "logD mean (expt)", + "logD SEM": "logD SEM (expt)"}) + + data_mod_unc = pd.concat([data_exp, pred_logD_series, pred_logD_SEM_series, pred_logD_mod_unc_series], axis=1) + data_mod_unc = data_mod_unc.rename(index=str, columns={"logD mean (calc)": "logD mean (calc)", + "logD SEM": "logD SEM (calc)", + "logD model uncertainty": "logD model uncertainty"}) + #print("data_mod_unc:\n", data_mod_unc) + + # Compute QQ-Plot Error Slope (ES) + calc = data_mod_unc.loc[:, "logD mean (calc)"].values + expt = data_mod_unc.loc[:, "logD mean (expt)"].values + dcalc = data_mod_unc.loc[:, "logD model uncertainty"].values + dexpt = data_mod_unc.loc[:, "logD SEM (expt)"].values + n_bootstrap_samples = 1000 #1000 + + X, Y, error_slope, error_slope_std, slopes = getQQdata(calc, expt, dcalc, dexpt, boot_its=n_bootstrap_samples) + + QQplot_data = [X, Y, error_slope] + + # Compute 95% confidence intervals of Error Slope + percentile = 0.95 + percentile_index = int(np.floor(n_bootstrap_samples * (1 - percentile) / 2)) - 1 + + #for stats_func_idx, samples_statistics in enumerate(bootstrap_samples_statistics): + samples_statistics = np.asarray(slopes) + samples_statistics.sort() + stat_lower_percentile = samples_statistics[percentile_index] + stat_higher_percentile = 
samples_statistics[-percentile_index + 1] + confidence_interval = (stat_lower_percentile, stat_higher_percentile) + + model_uncertainty_statistics = [error_slope, confidence_interval, samples_statistics] + + + return model_uncertainty_statistics, QQplot_data + + +# ============================================================================= +# UTILITY FUNCTIONS +# ============================================================================= + + +def load_submissions(directory_path, user_map): + """Load submissions from a specified directory using a specified user map. + Optional argument: + ref_ids: List specifying submission IDs (alphanumeric, typically) of + reference submissions which are to be ignored/analyzed separately. + Returns: submissions + """ + submissions = [] + for file_path in glob.glob(os.path.join(directory_path, '*.csv')): + try: + submission = logDSubmission(file_path, user_map) + + except IgnoredSubmissionError: + continue + submissions.append(submission) + print(submissions) + return submissions + + + +def load_ranked_submissions(directory_path, user_map): + """ + Load submissions from a specified directory using a specified user map. + Optional argument: + ref_ids: List specifying submission IDs (alphanumeric, typically) of + reference submissions which are to be ignored/analyzed separately. 
+ Returns: submissions + """ + submissions = [] + + for file_path in glob.glob(os.path.join(directory_path, '*.csv')): + try: + submission = logDSubmission(file_path, user_map) + except IgnoredSubmissionError: + continue + # only continue if submission is ranked + if not submission.ranked: + continue + if "REF" in submission.method_name or "NULL" in submission.method_name: + continue + + submissions.append(submission) + + method_names = [] + for submission in submissions: + method_names.append(submission.method_name) + print("Ranked submissions: \n", method_names) + + return submissions + + + +class logDSubmissionCollection: + """A collection of logD submissions.""" + + LOGP_CORRELATION_PLOT_BY_METHOD_PATH_DIR = 'logDCorrelationPlots' + LOGP_CORRELATION_PLOT_WITH_SEM_BY_METHOD_PATH_DIR = 'logDCorrelationPlotsWithSEM' + LOGP_CORRELATION_PLOT_BY_LOGP_PATH_DIR = 'error_for_each_logD.pdf' + ABSOLUTE_ERROR_VS_LOGP_PLOT_PATH_DIR = 'AbsoluteErrorPlots' + + + def __init__(self, submissions, experimental_data, output_directory_path, logD_submission_collection_file_path, + ignore_refcalcs = True, ranked_only = True, allow_multiple = True): + # Build collection dataframe from the beginning. + # Build full logD collection table. + + data = [] + + # Participant names we've found so far; tracked to ensure no one has more than one + # ranked submission + self.method_names_ranked = [] + + # Submissions for logD. 
+ for submission in submissions: + if submission.reference_submission and ignore_refcalcs: + continue + + if ranked_only and not submission.ranked: + continue + # Store names associated with ranked submission, skip if they submitted multiple (only if we need to check for duplicate authors) + if submission.ranked and not allow_multiple: + if not submission.method_name in self.method_names_ranked: + self.method_names_ranked.append(submission.method_name) + else: + print(f"Error: {submission.method_name} submitted multiple ranked submissions.") + continue + + + for mol_ID, series in submission.data.iterrows(): + logD_mean_exp = experimental_data.loc[mol_ID, 'logD mean'] + logD_SEM_exp = experimental_data.loc[mol_ID, 'logD SEM'] + + logD_mean_pred = submission.data.loc[mol_ID, "logD mean"] + logD_SEM_pred = submission.data.loc[mol_ID, "logD SEM"] + print("logD_mean_pred \n",logD_mean_pred) + + logD_model_uncertainty = submission.data.loc[mol_ID, "logD model uncertainty"] + ranked = submission.ranked + + data.append({ + 'method_name': submission.method_name, + #'participant': submission.participant, + #'file name': submission.file_name, + 'category': submission.category, + 'Molecule ID': mol_ID, + 'logD (calc)': logD_mean_pred, + 'logD SEM (calc)': logD_SEM_pred, + 'logD (exp)': logD_mean_exp, + 'logD SEM (exp)': logD_SEM_exp, + '$\Delta$logD error (calc - exp)': logD_mean_pred - logD_mean_exp, + 'logD model uncertainty': logD_model_uncertainty + }) + + # Transform into Pandas DataFrame. + self.data = pd.DataFrame(data=data) + self.output_directory_path = output_directory_path + + print("\n SubmissionCollection: \n") + print(self.data) + + # Create general output directory. + os.makedirs(self.output_directory_path, exist_ok=True) + + # Save collection.data dataframe in a CSV file. + self.data.to_csv(logD_submission_collection_file_path) + + def generate_correlation_plots(self): + # logD correlation plots. 
+ output_dir_path = os.path.join(self.output_directory_path, + self.LOGP_CORRELATION_PLOT_BY_METHOD_PATH_DIR) + os.makedirs(output_dir_path, exist_ok=True) + print("self.data \n",self.data) + print("self.data.method_name\n",self.data.method_name) + for method_name in self.data.method_name.unique(): + # Skip NULL0 submission + if "NULL" in method_name: + continue + + data = self.data[self.data.method_name == method_name] + print("data \n",data) + title = '{}'.format(method_name) + + plt.close('all') + plot_correlation(x='logD (exp)', y='logD (calc)', + data=data, title=title, kind='joint') + plt.tight_layout() + # plt.show() + method_name = name_to_filename(method_name) + output_path = os.path.join(output_dir_path, '{}.pdf'.format(method_name)) + plt.savefig(output_path) + + def generate_correlation_plots_with_SEM(self): + # logD correlation plots. + output_dir_path = os.path.join(self.output_directory_path, + self.LOGP_CORRELATION_PLOT_WITH_SEM_BY_METHOD_PATH_DIR) + os.makedirs(output_dir_path, exist_ok=True) + for method_name in self.data.method_name.unique(): + + # Skip NULL0 submission + if "NULL" in method_name: + continue + + data = self.data[self.data.method_name == method_name] + title = '{}'.format(method_name) + + plt.close('all') + plot_correlation_with_SEM(x_lab='logD (exp)', y_lab='logD (calc)', + x_err_lab='logD SEM (exp)', y_err_lab='logD SEM (calc)', + data=data, title=title) + plt.tight_layout() + # plt.show() + method_name = name_to_filename(method_name) + output_path = os.path.join(output_dir_path, '{}.pdf'.format(method_name)) + plt.savefig(output_path) + + def generate_molecules_plot(self): + # Correlation plot by molecules. 
+ plt.close('all') + data_ordered_by_mol_ID = self.data.sort_values(["Molecule ID"], ascending=True) + sns.set(rc={'figure.figsize': (8.27,11.7)}) + sns.violinplot(y='Molecule ID', x='$\Delta$logD error (calc - exp)', data=data_ordered_by_mol_ID, + inner='point', linewidth=1, width=1.2) + plt.tight_layout() + # plt.show() + plt.savefig(os.path.join(self.output_directory_path, self.LOGP_CORRELATION_PLOT_BY_LOGP_PATH_DIR)) + + def generate_absolute_error_vs_molecule_ID_plot(self): + """ + For each method a bar plot is generated so that absolute errors of each molecule can be compared. + """ + # Setup output directory + output_dir_path = os.path.join(self.output_directory_path, + self.ABSOLUTE_ERROR_VS_LOGP_PLOT_PATH_DIR) + os.makedirs(output_dir_path, exist_ok=True) + + # Calculate absolute errors. + self.data["absolute error"] = np.nan + self.data.loc[:, "absolute error"] = np.absolute(self.data.loc[:, "$\Delta$logD error (calc - exp)"]) + + # Create a separate plot for each submission. + for method_name in self.data.method_name.unique(): + data = self.data[self.data.method_name == method_name] + title = '{}'.format(method_name) + + plt.close('all') + barplot(df=data, x_label="Molecule ID", y_label="absolute error", title=title) + method_name = name_to_filename(method_name) + output_path = os.path.join(output_dir_path, '{}.pdf'.format(method_name)) + plt.savefig(output_path) + + +def generate_statistics_tables(submissions, stats_funcs, directory_path, file_base_name, + sort_stat=None, ordering_functions=None, + latex_header_conversions=None, ignore_refcalcs = True): + stats_names = list(stats_funcs.keys()) + ci_suffixes = ('', '_lower_bound', '_upper_bound') + + # Collect the records for the DataFrames. 
+ statistics_csv = [] + statistics_latex = [] + statistics_plot = [] + + # Collect the records for QQ Plot + # Dictionary of receipt ID: [X, Y, error_slope] + QQplot_dict = {} + + for i, submission in enumerate(submissions): + method_name = submission.method_name + category = submission.category + file_name = submission.file_name + + + # Pull submission type + type = 'Standard' + if submission.reference_submission: + type = 'Reference' + + # Ignore reference calculation, if applicable + if submission.reference_submission and ignore_refcalcs: + continue + + print('\rGenerating bootstrap statistics for submission {} ({}/{})' + ''.format(method_name, i + 1, len(submissions)), end='') + + bootstrap_statistics = submission.compute_logD_statistics(experimental_data, stats_funcs) + + # Compute error slope + error_slope_bootstrap_statistics, QQplot_data = submission.compute_logD_model_uncertainty_statistics(experimental_data) + #print("error_slope_bootstrap_statistics:\n") + #print(error_slope_bootstrap_statistics) + + # Add data to QQplot dictionary + QQplot_dict.update({method_name : QQplot_data}) + + # Add error slope and CI to bootstrap_statistics + bootstrap_statistics.update({'ES' : error_slope_bootstrap_statistics }) + #print("bootstrap_statistics:\n", bootstrap_statistics) + + # Organize data to construct CSV and PDF versions of statistics tables + record_csv = {} + record_latex = {} + for stats_name, (stats, (lower_bound, upper_bound), bootstrap_samples) in bootstrap_statistics.items(): + # For CSV and JSON we put confidence interval in separate columns. + for suffix, info in zip(ci_suffixes, [stats, lower_bound, upper_bound]): + record_csv[stats_name + suffix] = info + + # For the PDF, print bootstrap CI in the same column. 
+ stats_name_latex = latex_header_conversions.get(stats_name, stats_name) + record_latex[stats_name_latex] = '{:.2f} [{:.2f}, {:.2f}]'.format(stats, lower_bound, upper_bound) + + # For the violin plot, we need all the bootstrap statistics series. + for bootstrap_sample in bootstrap_samples: + statistics_plot.append(dict(ID=method_name, category=category, + statistics=stats_name_latex, value=bootstrap_sample)) + + '''statistics_csv.append({'ID': method_name, 'name': file_name, 'category': category, 'type': type, **record_csv}) + escaped_name = file_name.replace('_', '\_') + statistics_latex.append({'ID': method_name, 'name': escaped_name, 'category': category, 'type':type, **record_latex})''' + + statistics_csv.append({'method name': method_name, 'file name': file_name, 'category': category, 'type': type, **record_csv}) + escaped_name = file_name.replace('_', '\_') + statistics_latex.append({'method name': method_name, 'file name': escaped_name, 'category': category, 'type':type, **record_latex}) + print() + print("statistics_csv:\n",statistics_csv) + print() + + + # Write QQplot_dict to a JSON file for plotting later + #print("QQplot_dict:\n", QQplot_dict) + QQplot_directory_path = os.path.join(output_directory_path, "QQPlots") + os.makedirs(QQplot_directory_path, exist_ok=True) + QQplot_dict_filename = os.path.join(QQplot_directory_path, 'QQplot_dict.pickle') + + with open(QQplot_dict_filename, 'wb') as outfile: + pickle.dump(QQplot_dict, outfile) + + + # Convert dictionary to Dataframe to create tables/plots easily. + statistics_csv = pd.DataFrame(statistics_csv) + statistics_csv.set_index('method name', inplace=True) + statistics_latex = pd.DataFrame(statistics_latex) + statistics_plot = pd.DataFrame(statistics_plot) + + # Sort by the given statistics. 
+ if sort_stat is not None: + statistics_csv.sort_values(by=sort_stat, inplace=True) + statistics_latex.sort_values(by=latex_header_conversions.get(sort_stat, sort_stat), + inplace=True) + + # Reorder columns that were scrambled by going through a dictionaries. + stats_names_csv = [name + suffix for name in stats_names for suffix in ci_suffixes] + #print("stats_names_csv:", stats_names_csv) + stats_names_latex = [latex_header_conversions.get(name, name) for name in stats_names] + #print("stats_names_latex:", stats_names_latex) + '''statistics_csv = statistics_csv[['name', "category", "type"] + stats_names_csv + ["ES", "ES_lower_bound", "ES_upper_bound"] ] + statistics_latex = statistics_latex[['ID', 'name'] + stats_names_latex + ["ES"]] ## Add error slope(ES)''' + + statistics_csv = statistics_csv[['file name', "category", "type"] + stats_names_csv + ["ES", "ES_lower_bound", "ES_upper_bound"] ] + statistics_latex = statistics_latex[['method name', 'file name', "category", "type"] + stats_names_latex + ["ES"]] ## Add error slope(ES) + + # Create CSV and JSON tables (correct LaTex syntax in column names). + os.makedirs(directory_path) + file_base_path = os.path.join(directory_path, file_base_name) + with open(file_base_path + '.csv', 'w') as f: + statistics_csv.to_csv(f) + '''with open(file_base_path + '.json', 'w') as f: + statistics_csv.to_json(f, orient='index')''' + + + # Create LaTex table. 
+ latex_directory_path = os.path.join(directory_path, file_base_name + 'LaTex') + os.makedirs(latex_directory_path, exist_ok=True) + with open(os.path.join(latex_directory_path, file_base_name + '.tex'), 'w') as f: + f.write('\\documentclass{article}\n' + '\\usepackage[a4paper,margin=0.005in,tmargin=0.5in,lmargin=0.5in,rmargin=0.5in,landscape]{geometry}\n' + '\\usepackage{booktabs}\n' + '\\usepackage{longtable}\n' + '\\pagenumbering{gobble}\n' + '\\begin{document}\n' + '\\begin{center}\n' + '\\scriptsize\n') + statistics_latex.to_latex(f, column_format='|ccccccccc|', escape=False, index=False, longtable=True) + f.write('\\end{center}\n' + '\nNotes\n\n' + '- RMSE: Root mean square error\n\n' + '- MAE: Mean absolute error\n\n' + '- ME: Mean error\n\n' + '- R2: R-squared, square of Pearson correlation coefficient\n\n' + '- m: slope of the line fit to predicted vs experimental logD values\n\n' + '- $\\tau$: Kendall rank correlation coefficient\n\n' + '- ES: error slope calculated from the QQ Plots of model uncertainty predictions\n\n' + '- Mean and 95\\% confidence intervals of RMSE, MAE, ME, R2, and m were calculated by bootstrapping with 10000 samples.\n\n' + '- 95\\% confidence intervals of ES were calculated by bootstrapping with 1000 samples.\n\n' + #'- Some logD predictions were submitted after the submission deadline to be used as a reference method.\n\n' + '\\end{document}\n') + + # Violin plots by statistics across submissions. + plt.close('all') + fig, axes = plt.subplots(ncols=len(stats_names), figsize=(28, 10)) + for ax, stats_name in zip(axes, stats_names): + stats_name_latex = latex_header_conversions.get(stats_name, stats_name) + data = statistics_plot[statistics_plot.statistics == stats_name_latex] + # Plot ordering submission by statistics. 
+ ordering_function = ordering_functions.get(stats_name, lambda x: x) + order = sorted(statistics_csv[stats_name].items(), key=lambda x: ordering_function(x[1])) + order = [method_name for method_name, value in order] + #sns.violinplot(x='value', y='ID', data=data, ax=ax, + sns.violinplot(x='value', y='ID', data=data, ax=ax, + order=order, palette='PuBuGn_r', linewidth=0.5, width=1) + ax.set_xlabel(stats_name_latex) + ax.set_ylabel('') + sns.set_style("whitegrid") + plt.tight_layout() + # plt.show() + plt.savefig(file_base_path + '_bootstrap_distributions.pdf') + + + + +def generate_performance_comparison_plots(statistics_filename, directory_path, ignore_refcalcs = False): + # Read statistics table + statistics_file_path = os.path.join(directory_path, statistics_filename) + df_statistics = pd.read_csv(statistics_file_path) + print("\n df_statistics \n", df_statistics) + + # RMSE comparison plot + barplot_with_CI_errorbars(df=df_statistics, x_label="method name", y_label="RMSE", y_lower_label="RMSE_lower_bound", + y_upper_label="RMSE_upper_bound", figsize=(28,10)) # figsize=(22,10) + plt.savefig(directory_path + "/RMSE_vs_method_plot.pdf") + + # RMSE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics, x_label="method name", y_label="RMSE", + y_lower_label="RMSE_lower_bound", + y_upper_label="RMSE_upper_bound", color_label = "category", figsize=(28,10)) + plt.ylim(0.0, 7.0) + plt.savefig(directory_path + "/RMSE_vs_method_plot_colored_by_method_category.pdf") + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + barplot_with_CI_errorbars_colored_by_label(df=df_statistics, x_label="method name", y_label="RMSE", + y_lower_label="RMSE_lower_bound", + y_upper_label="RMSE_upper_bound", color_label = "type", figsize=(28,10)) + plt.ylim(0.0, 7.0) + plt.savefig(directory_path + "/RMSE_vs_method_plot_colored_by_type.pdf") + + # MAE comparison plot + # Reorder based on MAE value 
+ df_statistics_MAE = df_statistics.sort_values(by="MAE", inplace=False) + + barplot_with_CI_errorbars(df=df_statistics_MAE, x_label="method name", y_label="MAE", y_lower_label="MAE_lower_bound", + y_upper_label="MAE_upper_bound", figsize=(28,10)) + plt.savefig(directory_path + "/MAE_vs_method_plot.pdf") + + # MAE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_MAE, x_label="method name", y_label="MAE", + y_lower_label="MAE_lower_bound", + y_upper_label="MAE_upper_bound", color_label="category", + figsize=(28, 10)) + plt.ylim(0.0, 7.0) + plt.savefig(directory_path + "/MAE_vs_method_plot_colored_by_method_category.pdf") + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + # MAE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_MAE, x_label="method name", y_label="MAE", + y_lower_label="MAE_lower_bound", + y_upper_label="MAE_upper_bound", color_label="type", + figsize=(28, 10)) + plt.ylim(0.0, 7.0) + plt.savefig(directory_path + "/MAE_vs_method_plot_colored_by_type.pdf") + + + # Kendall's Tau comparison plot + # Reorder based on Kendall's Tau value + df_statistics_tau = df_statistics.sort_values(by="kendall_tau", inplace=False, ascending=False) + + barplot_with_CI_errorbars(df=df_statistics_tau, x_label="method name", y_label="kendall_tau", + y_lower_label="kendall_tau_lower_bound", + y_upper_label="kendall_tau_upper_bound", figsize=(28, 10)) + plt.savefig(directory_path + "/kendalls_tau_vs_method_plot.pdf") + + # Kendall's Tau comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_tau, x_label="method name", y_label="kendall_tau", + y_lower_label="kendall_tau_lower_bound", + y_upper_label="kendall_tau_upper_bound", color_label="category", + figsize=(28, 10)) + plt.savefig(directory_path + 
"/kendalls_tau_vs_method_plot_colored_by_method_category.pdf") + + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + # MAE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_tau, x_label="method name", y_label="kendall_tau", + y_lower_label="kendall_tau_lower_bound", + y_upper_label="kendall_tau_upper_bound", color_label="type", + figsize=(28, 10)) + plt.savefig(directory_path + "/kendalls_tau_vs_method_plot_colored_by_type.pdf") + + + + # R-squared comparison plot + # Reorder based on R-squared + df_statistics_R2 = df_statistics.sort_values(by="R2", inplace=False, ascending=False) + + barplot_with_CI_errorbars(df=df_statistics_R2, x_label="method name", y_label="R2", + y_lower_label="R2_lower_bound", + y_upper_label="R2_upper_bound", figsize=(28, 10)) + plt.ylim(0, 1.0) + plt.savefig(directory_path + "/Rsquared_vs_method_plot.pdf") + + # R-squared comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_R2, x_label="method name", y_label="R2", + y_lower_label="R2_lower_bound", + y_upper_label="R2_upper_bound", color_label="category", + figsize=(28, 10)) + plt.ylim(0, 1.0) + plt.savefig(directory_path + "/Rsquared_vs_method_plot_colored_by_method_category.pdf") + + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + # MAE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_R2, x_label="method name", y_label="R2", + y_lower_label="R2_lower_bound", + y_upper_label="R2_upper_bound", color_label="type", + figsize=(28, 10)) + plt.ylim(0, 1.0) + plt.savefig(directory_path + "/Rsquared_vs_method_plot_colored_by_type.pdf") + + + + # Plot RMSE, MAE, Kendall's Tau, and R-squared comparison plots for each category separately + #category_list = ["Physical","Empirical", "Mixed", "Other"] + #category_list = ["Physical (MM) 
+ QM+LEC", "Empirical", "Mixed", "Physical (QM)"] # Reassigned categories + category_list = ["Physical (MM) + QM+LEC", "Empirical", "Physical (QM)"] # Reassigned categories + + # New labels for file naming for reassigned categories + category_path_label_dict = {"Physical (MM) + QM+LEC": "Physical_MM_QM_LEC", + "Empirical": "Empirical", + #"Mixed": "Mixed", + "Physical (QM)": "Physical_QM"} + + + for category in category_list: + print("category: ",category) + #print("df_statistics.columns:\n", df_statistics.columns) + + # Take subsection of dataframe for each category + df_statistics_1category = df_statistics.loc[df_statistics['category'] == category] + df_statistics_MAE_1category = df_statistics_MAE.loc[df_statistics_MAE['category'] == category] + df_statistics_tau_1category = df_statistics_tau.loc[df_statistics_tau['category'] == category] + df_statistics_R2_1category = df_statistics_R2.loc[df_statistics_R2['category'] == category] + + # RMSE comparison plot for each category + barplot_with_CI_errorbars(df=df_statistics_1category, x_label="method name", y_label="RMSE", y_lower_label="RMSE_lower_bound", + y_upper_label="RMSE_upper_bound", figsize=(12, 10)) + plt.title("Method category: {}".format(category), fontdict={'fontsize': 22}) + plt.ylim(0.0,7.0) + plt.savefig(directory_path + "/RMSE_vs_method_plot_for_{}_category.pdf".format(category_path_label_dict[category])) + + # MAE comparison plot for each category + barplot_with_CI_errorbars(df=df_statistics_MAE_1category, x_label="method name", y_label="MAE", + y_lower_label="MAE_lower_bound", + y_upper_label="MAE_upper_bound", figsize=(12, 10)) + plt.title("Method category: {}".format(category), fontdict={'fontsize': 22}) + plt.ylim(0.0, 7.0) + plt.savefig(directory_path + "/MAE_vs_method_plot_for_{}_category.pdf".format(category_path_label_dict[category])) + + # Kendall's Tau comparison plot for each category + barplot_with_CI_errorbars(df=df_statistics_tau_1category, x_label="method name", y_label="kendall_tau", 
+ y_lower_label="kendall_tau_lower_bound", + y_upper_label="kendall_tau_upper_bound", figsize=(12, 10)) + plt.title("Method category: {}".format(category), fontdict={'fontsize': 22}) + plt.savefig(directory_path + "/kendalls_tau_vs_method_plot_for_{}_category.pdf".format(category_path_label_dict[category])) + + # R-squared comparison plot for each category + barplot_with_CI_errorbars(df=df_statistics_R2_1category, x_label="method name", y_label="R2", + y_lower_label="R2_lower_bound", + y_upper_label="R2_upper_bound", figsize=(12, 10)) + plt.title("Method category: {}".format(category), fontdict={'fontsize': 22}) + plt.ylim(0, 1.0) + plt.savefig(directory_path + "/Rsquared_vs_method_plot_for_{}_category.pdf".format(category_path_label_dict[category])) + + + # Create plots for Physical methods (both MM and QM methods) + + df_statistics_MM = df_statistics.loc[df_statistics['category'] == "Physical (MM) + QM+LEC"] + df_statistics_QM = df_statistics.loc[df_statistics['category'] == "Physical (QM)"] + df_statistics_physical = pd.concat([df_statistics_MM, df_statistics_QM]) + + # RMSE comparison plot + # Reorder based on RMSE value + df_statistics_physical_RMSE = df_statistics_physical.sort_values(by="RMSE", inplace=False) + + # RMSE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_RMSE, x_label="method name", y_label="RMSE", + y_lower_label="RMSE_lower_bound", + y_upper_label="RMSE_upper_bound", color_label="category", + figsize=(28, 10)) + plt.ylim(0.0, 5.0) + plt.savefig(directory_path + "/RMSE_vs_method_plot_physical_methoods_colored_by_method_category.pdf") + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + # RMSE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_RMSE, x_label="method name", y_label="RMSE", + y_lower_label="RMSE_lower_bound", + y_upper_label="RMSE_upper_bound", 
color_label="type", + figsize=(28, 10)) + plt.ylim(0.0, 5.0) + plt.savefig(directory_path + "/RMSE_vs_method_plot_physical_methods_colored_by_type.pdf") + + # MAE comparison plot + # Reorder based on MAE value + df_statistics_physical_MAE = df_statistics_physical.sort_values(by="MAE", inplace=False) + + # MAE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_MAE, x_label="method name", y_label="MAE", + y_lower_label="MAE_lower_bound", + y_upper_label="MAE_upper_bound", color_label="category", + figsize=(28, 10)) + plt.ylim(0.0, 5.0) + plt.savefig(directory_path + "/MAE_vs_method_plot_physical_methoods_colored_by_method_category.pdf") + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + # MAE comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_MAE, x_label="method name", y_label="MAE", + y_lower_label="MAE_lower_bound", + y_upper_label="MAE_upper_bound", color_label="type", + figsize=(28, 10)) + plt.ylim(0.0, 5.0) + plt.savefig(directory_path + "/MAE_vs_method_plot_physical_methods_colored_by_type.pdf") + + # Kendall's Tau comparison plot + # Reorder based on Tau value + df_statistics_physical_tau = df_statistics_physical.sort_values(by="kendall_tau", inplace=False, ascending=False) + + # Kendall's Tau comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_tau, x_label="method name", y_label="kendall_tau", + y_lower_label="kendall_tau_lower_bound", + y_upper_label="kendall_tau_upper_bound", color_label="category", + figsize=(28, 10)) + plt.savefig(directory_path + "/kendall_tau_vs_method_plot_physical_methoods_colored_by_method_category.pdf") + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + # Kendall's Tau comparison plot with each category colored separately + 
barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_tau, x_label="method name", y_label="kendall_tau", + y_lower_label="kendall_tau_lower_bound", + y_upper_label="kendall_tau_upper_bound", color_label="type", + figsize=(28, 10)) + plt.savefig(directory_path + "/kendall_tau_vs_method_plot_physical_methods_colored_by_type.pdf") + + + # R-squared comparison plot + # Reorder based on R-squared value + df_statistics_physical_R2 = df_statistics_physical.sort_values(by="R2", inplace=False, ascending=False) + + # R-squared comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_R2, x_label="method name", y_label="R2", + y_lower_label="R2_lower_bound", + y_upper_label="R2_upper_bound", color_label="category", + figsize=(28, 10)) + plt.ylim(0, 1.0) + plt.savefig(directory_path + "/Rsquared_vs_method_plot_physical_methoods_colored_by_method_category.pdf") + + # Do same graph with colorizing by reference calculation + if not ignore_refcalcs: + # R-Squared comparison plot with each category colored separately + barplot_with_CI_errorbars_colored_by_label(df=df_statistics_physical_R2, x_label="method name", y_label="R2", + y_lower_label="R2_lower_bound", + y_upper_label="R2_upper_bound", color_label="type", + figsize=(28, 10)) + plt.ylim(0, 1.0) + plt.savefig(directory_path + "/Rsquared_vs_method_plot_physical_methods_colored_by_type.pdf") + + + + +def generate_QQplots_for_model_uncertainty(input_file_name, directory_path): + + # Read QQplot data points from Pickle file + QQplot_dict_filename = os.path.join(directory_path, input_file_name) + with open(QQplot_dict_filename, 'rb') as handle: + QQplot_dict = pickle.load(handle) + + # Iterate through dictionary to create QQ Plot for each submission + for submission_ID, data in QQplot_dict.items(): + X, Y, slope = data + submission_ID = name_to_filename(submission_ID) + QQplot_output_filename = os.path.join(directory_path, 
"{}_QQ.pdf".format(submission_ID)) + makeQQplot(X, Y, slope, title=submission_ID, xLabel="Expected fraction within range", + yLabel="Fraction of predictions within range", fileName=QQplot_output_filename, + uncLabel='Model Unc.', leg=[0.05, 0.975, "upper left", 1], ax1=None) + # leg=[1.02, 0.98, 2, 1] + + # Replot first item of the dictionary to fix style + #submission_ID = list(QQplot_dict.keys())[0] # first submission ID + #print("Submission ID:", submission_ID) + #data = QQplot_dict.get(submission_ID) + #X, Y, slope = data + #makeQQplot(X, Y, slope, title=submission_ID, xLabel="Expected fraction within range", + # yLabel="Fraction of predictions within range", fileName=QQplot_output_filename, + # uncLabel='Model Unc.', leg=[0.05, 0.95, "upper left", 1], ax1=None) + + print("QQ Plots for model uncertainty generated.") + + +# ============================================================================= +# MAIN +# ============================================================================= + +if __name__ == '__main__': + + sns.set_style('whitegrid') + sns.set_context('paper') + + # Read experimental data. + with open(EXPERIMENTAL_DATA_FILE_PATH, 'r') as f: + # experimental_data = pd.read_json(f, orient='index') + names = ('Molecule ID', 'logD mean', 'logD SEM')#,'Assay Type', 'Experimental ID', 'Isomeric SMILES') + experimental_data = pd.read_csv(f, names=names, skiprows=1) + + # Convert numeric values to dtype float. + for col in experimental_data.columns[1:7]: + experimental_data[col] = pd.to_numeric(experimental_data[col], errors='coerce') + + + experimental_data.set_index("Molecule ID", inplace=True) + experimental_data["Molecule ID"] = experimental_data.index + print("Experimental data: \n", experimental_data) + + # Import user map. + with open(USER_MAP_FILE_PATH, 'r') as f: + user_map = pd.read_csv(f) + + # Configuration: statistics to compute. 
+ stats_funcs = collections.OrderedDict([ + ('RMSE', rmse), + ('MAE', mae), + ('ME', me), + ('R2', r2), + ('m', slope), + ('kendall_tau', kendall_tau) + ]) + ordering_functions = { + 'ME': lambda x: abs(x), + 'R2': lambda x: -x, + 'm': lambda x: abs(1 - x), + 'kendall_tau': lambda x: -x + } + latex_header_conversions = { + 'R2': 'R$^2$', + 'RMSE': 'RMSE', + 'MAE': 'MAE', + 'ME': 'ME', + 'kendall_tau': '$\\tau$' + } + + # ========================================================================================== + # Analysis of ranked and non-ranked blind submissions WITH reference calculations + # ========================================================================================== + + # Load submissions data. + submissions_logD = load_submissions(LOGD_SUBMISSIONS_DIR_PATH, user_map) + print("done w/ submissions_logD") + + # Perform the analysis + output_directory_path='./analysis_outputs_all_submissions' + logD_submission_collection_file_path = '{}/logD_submission_collection.csv'.format(output_directory_path) + + collection_logD = logDSubmissionCollection(submissions_logD, + experimental_data, + output_directory_path, + logD_submission_collection_file_path, + ignore_refcalcs = False, + ranked_only = False, + allow_multiple = True) + + + # Generate plots and tables. 
+ for collection in [collection_logD]: + #collection.generate_correlation_plots() + collection.generate_correlation_plots_with_SEM() + #collection.generate_molecules_plot() + #collection.generate_absolute_error_vs_molecule_ID_plot() + + """import shutil + + if os.path.isdir('{}/StatisticsTables'.format(output_directory_path)): + shutil.rmtree('{}/StatisticsTables'.format(output_directory_path)) + + + for submissions, type in zip([submissions_logD], ['logD']): + generate_statistics_tables(submissions, + stats_funcs, + directory_path=output_directory_path + '/StatisticsTables', + file_base_name='statistics', + sort_stat='RMSE', + ordering_functions=ordering_functions, + latex_header_conversions=latex_header_conversions, + ignore_refcalcs = False) + + # Generate RMSE, MAE, Kendall's Tau comparison plots. + statistics_directory_path = os.path.join(output_directory_path, "StatisticsTables") + generate_performance_comparison_plots(statistics_filename="statistics.csv", + directory_path=statistics_directory_path, + ignore_refcalcs = False) + + # Generate QQ-Plots for model uncertainty predictions + QQplot_directory_path = os.path.join(output_directory_path, "QQPlots") + generate_QQplots_for_model_uncertainty(input_file_name="QQplot_dict.pickle", + directory_path=QQplot_directory_path)""" + + + #========================================================================================== + #========================================================================================== + # Analysis of ranked blind submissions only (no nonranked or ref) + #========================================================================================== + #========================================================================================== + + # Load submissions data. 
+ ranked_submissions_logD = load_ranked_submissions(LOGD_SUBMISSIONS_DIR_PATH, user_map) + + # Perform the analysis + output_directory_path='./analysis_outputs_ranked_submissions' + logD_submission_collection_file_path = '{}/logD_submission_collection.csv'.format(output_directory_path) + + collection_logD = logDSubmissionCollection(ranked_submissions_logD, + experimental_data, + output_directory_path, + logD_submission_collection_file_path, + ignore_refcalcs = True, ranked_only = True, allow_multiple = False) + + #print("collection_logD: \n", collection_logD) + + # Generate plots and tables. + for collection in [collection_logD]: + #collection.generate_correlation_plots() + collection.generate_correlation_plots_with_SEM() + #collection.generate_molecules_plot() + #collection.generate_absolute_error_vs_molecule_ID_plot() + + + '''import shutil + + if os.path.isdir('{}/StatisticsTables'.format(output_directory_path)): + shutil.rmtree('{}/StatisticsTables'.format(output_directory_path)) + + + for ranked_submissions, type in zip([ranked_submissions_logD], ['logD']): + generate_statistics_tables(ranked_submissions, + stats_funcs, + directory_path = output_directory_path + '/StatisticsTables', + file_base_name = 'statistics', + sort_stat = 'RMSE', + ordering_functions = ordering_functions, + latex_header_conversions = latex_header_conversions, + ignore_refcalcs = True) + + # Generate RMSE, MAE, Kendall's Tau comparison plots. 
+ statistics_directory_path = os.path.join(output_directory_path, "StatisticsTables") + generate_performance_comparison_plots(statistics_filename="statistics.csv", + directory_path=statistics_directory_path, + ignore_refcalcs = True) + + # Generate QQ-Plots for model uncertainty predictions + QQplot_directory_path = os.path.join(output_directory_path, "QQPlots") + generate_QQplots_for_model_uncertainty(input_file_name="QQplot_dict.pickle", + directory_path=QQplot_directory_path)''' diff --git a/physical_property/logD/logD_analysis2.py b/physical_property/logD/logD_analysis2.py new file mode 100644 index 00000000..0b99c5f7 --- /dev/null +++ b/physical_property/logD/logD_analysis2.py @@ -0,0 +1,733 @@ +#!/usr/bin/env python + +# ============================================================================= +# GLOBAL IMPORTS +# ============================================================================= +import os +import numpy as np +import pandas as pd +from logD_analysis import mae, rmse#, barplot_with_CI_errorbars +from logD_analysis import compute_bootstrap_statistics +import shutil +import seaborn as sns +from matplotlib import pyplot as plt +from matplotlib import cm +import joypy + + +# ============================================================================= +# PLOTTING FUNCTIONS +# ============================================================================= + +def barplot_with_CI_errorbars(df, x_label, y_label, y_lower_label, y_upper_label, figsize=False): + """Creates bar plot of a given dataframe with asymmetric error bars for y axis. + + Args: + df: Pandas Dataframe that should have columns with columnnames specified in other arguments. + x_label: str, column name of x axis categories + y_label: str, column name of y axis values + y_lower_label: str, column name of lower error values of y axis + y_upper_label: str, column name of upper error values of y axis + figsize: tuple, size in inches. Default value is False. 
+ + """ + # Column names for new columns for delta y_err which is calculated as | y_err - y | + delta_lower_yerr_label = "$\Delta$" + y_lower_label + delta_upper_yerr_label = "$\Delta$" + y_upper_label + data = df # Pandas DataFrame + data.loc[:,delta_lower_yerr_label] = data.loc[:,y_label] - data.loc[:,y_lower_label] + data.loc[:,delta_upper_yerr_label] = data.loc[:,y_upper_label] - data.loc[:,y_label] + + # Color + current_palette = sns.color_palette() + sns_color = current_palette[2] + + # Plot style + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 20 # 18 + plt.rcParams['xtick.labelsize'] = 16 #14 + plt.rcParams['ytick.labelsize'] = 18 #16 + plt.rcParams['legend.fontsize'] = 16 + plt.rcParams['legend.handlelength'] = 2 + plt.rcParams['figure.autolayout'] = True + #plt.tight_layout() + + # If figsize is specified + if figsize != False: + plt.figure(figsize=figsize) + + # Plot + x = range(len(data[y_label])) + y = data[y_label] + plt.bar(x, y) + plt.xticks(x, data[x_label], rotation=90)#, horizontalalignment='right') + plt.errorbar(x, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=sns_color, capsize=3, capthick=True) + plt.xlabel(x_label) + plt.ylabel(y_label) + +def barplot_with_CI_errorbars_and_4groups(df1, df2, df3, x_label, y_label, y_lower_label, y_upper_label,group_labels): + """Creates bar plot of a given dataframe with asymmetric error bars for y axis. + Args: + df: Pandas Dataframe that should have columns with columnnames specified in other arguments. 
+ x_label: str, column name of x axis categories + y_label: str, column name of y axis values + y_lower_label: str, column name of lower error values of y axis + y_upper_label: str, column name of upper error values of y axis + group_labels: List of 4 method category labels + """ + # Column names for new columns for delta y_err which is calculated as | y_err - y | + delta_lower_yerr_label = "$\Delta$" + y_lower_label + delta_upper_yerr_label = "$\Delta$" + y_upper_label + + # Plot style + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 18 + plt.rcParams['xtick.labelsize'] = 14 + plt.rcParams['ytick.labelsize'] = 16 + plt.tight_layout() + #plt.figure(figsize=(8, 6)) + bar_width = 0.2 + + # Color + #current_palette = sns.color_palette("deep") + + # Zesty colorblind-friendly color palette + color0 = "#0F2080" #dark blue for Physical (MM) + QM+LEC + color1 = "#F5793A" #orange for Empirical + #color2 = "#A95AA1" #purple + color3 = "#85C0F9" #light blue for Physical (QM) + current_palette = [color0, color1, color3]#, color2, color3] + error_color = 'gray' + + + fig, ax = plt.subplots(figsize=(8, 6)) + + # Plot 1st group of data + data = df1 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + + x = range(len(data[y_label])) + y = data[y_label] + ax.bar(x, y, label = "Physical (MM) + QM+LEC", width=bar_width, color=current_palette[0]) + plt.xticks(x, data[x_label], rotation=90) + plt.errorbar(x, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + # Plot 2nd group of data + data = df2 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + index = np.arange(df2.shape[0]) + + x = range(len(data[y_label])) + y = 
data[y_label] + + ax.bar(index + bar_width, y, label = "Empirical", width=bar_width, color=current_palette[1]) + plt.xticks(index + bar_width/2, data[x_label], rotation=90) + plt.errorbar(index + bar_width, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + # Plot 3rd group of data + data = df3 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + index = np.arange(df3.shape[0]) + + x = range(len(data[y_label])) + y = data[y_label] + + ax.bar(index + 2*bar_width, y, label="Physical (QM)", width=bar_width, color=current_palette[2]) + plt.xticks(index + bar_width + bar_width / 2, data[x_label], rotation=90) + plt.errorbar(index + 2*bar_width, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + plt.xlabel(x_label) + plt.ylabel(y_label) + + # create legend + from matplotlib.lines import Line2D + custom_lines = [Line2D([0], [0], color=current_palette[0], lw=5), #dark blue for Physical (MM) + QM+LEC + Line2D([0], [0], color=current_palette[1], lw=5), #orange for Empirical + Line2D([0], [0], color=current_palette[2], lw=5)] #light blue for Physical (QM) + + ax.legend(custom_lines, group_labels) + + +def barplot_with_CI_errorbars_and_4groups_ranked(df1, df3, x_label, y_label, y_lower_label, y_upper_label,group_labels): + """Creates grouped bar plot of two dataframes (one per method category) with asymmetric error bars for y axis. + Args: + df1, df3: Pandas Dataframes that should have columns with column names specified in other arguments.
+ x_label: str, column name of x axis categories + y_label: str, column name of y axis values + y_lower_label: str, column name of lower error values of y axis + y_upper_label: str, column name of upper error values of y axis + group_labels: List of 4 method category labels + """ + # Column names for new columns for delta y_err which is calculated as | y_err - y | + delta_lower_yerr_label = "$\Delta$" + y_lower_label + delta_upper_yerr_label = "$\Delta$" + y_upper_label + + # Plot style + plt.close() + plt.style.use(["seaborn-talk", "seaborn-whitegrid"]) + plt.rcParams['axes.labelsize'] = 18 + plt.rcParams['xtick.labelsize'] = 14 + plt.rcParams['ytick.labelsize'] = 16 + plt.tight_layout() + #plt.figure(figsize=(8, 6)) + bar_width = 0.2 + + # Zesty colorblind-friendly color palette + color0 = "#0F2080" #dark blue for Physical (MM) + QM+LEC + color1 = "#F5793A" #orange for Empirical + #color2 = "#A95AA1" #purple + color3 = "#85C0F9" #light blue for Physical (QM) + current_palette = [color0, color1, color3]#, color2, color3] + error_color = 'gray' + + fig, ax = plt.subplots(figsize=(8, 6)) + + # Plot 1st group of data + data = df1 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + + x = range(len(data[y_label])) + y = data[y_label] + ax.bar(x, y, label = "Physical (MM) + QM+LEC", width=bar_width, color=current_palette[0]) + plt.xticks(x, data[x_label], rotation=90) + plt.errorbar(x, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + # Plot 3nd group of data + data = df3 # Pandas DataFrame + data[delta_lower_yerr_label] = data[y_label] - data[y_lower_label] + data[delta_upper_yerr_label] = data[y_upper_label] - data[y_label] + index = np.arange(df3.shape[0]) + + x = range(len(data[y_label])) + y = data[y_label] + # plt.bar(x, y) + ax.bar(index + 2*bar_width, y, 
label="Physical (QM)", width=bar_width, color=current_palette[2]) + plt.xticks(index + bar_width + bar_width / 2, data[x_label], rotation=90) + plt.errorbar(index + 2*bar_width, y, yerr=(data[delta_lower_yerr_label], data[delta_upper_yerr_label]), + fmt="none", ecolor=error_color, capsize=2, capthick=True, elinewidth=1) + + + + + plt.xlabel(x_label) + plt.ylabel(y_label) + + # create legend + from matplotlib.lines import Line2D + custom_lines = [Line2D([0], [0], color=current_palette[0], lw=5), + #Line2D([0], [0], color=current_palette[1], lw=5), + Line2D([0], [0], color=current_palette[2], lw=5)] + #, Line2D([0], [0], color=current_palette[3], lw=5)] + ax.legend(custom_lines, group_labels) + + +def ridge_plot(df, by, column, figsize, colormap): + plt.rcParams['axes.labelsize'] = 14 + plt.rcParams['xtick.labelsize'] = 14 + plt.tight_layout() + + # Make ridge plot + fig, axes = joypy.joyplot(data=df, by=by, column=column, figsize=figsize, colormap=colormap, linewidth=1) + # Add x-axis label + axes[-1].set_xlabel(column) + +def ridge_plot_wo_overlap(df, by, column, figsize, colormap): + plt.rcParams['axes.labelsize'] = 14 + plt.rcParams['xtick.labelsize'] = 14 + plt.tight_layout() + + # Make ridge plot + fig, axes = joypy.joyplot(data=df, by=by, column=column, figsize=figsize, colormap=colormap, linewidth=1, overlap=0) + # Add x-axis label + axes[-1].set_xlabel(column) + + +# ============================================================================= +# CONSTANTS +# ============================================================================= + +# Paths to input data. 
+LOGD_COLLECTION_PATH_RANKED_SUBMISSIONS = './analysis_outputs_ranked_submissions/logD_submission_collection.csv' +LOGD_COLLECTION_PATH_ALL_SUBMISSIONS = './analysis_outputs_all_submissions/logD_submission_collection.csv' + +# ============================================================================= +# UTILITY FUNCTIONS +# ============================================================================= + +def read_collection_file(collection_file_path): + """ + Function to read SAMPL7 collection CSV file that was created by logDSubmissionCollection. + :param collection_file_path: Path to the collection CSV file + :return: Pandas DataFrame + """ + + # Check if submission collection file already exists. + if os.path.isfile(collection_file_path): + print("Analysis will be done using the existing collection file: {}".format(collection_file_path)) + + collection_df = pd.read_csv(collection_file_path, index_col=0) + print("\n SubmissionCollection: \n") + print(collection_df) + else: + raise Exception("Collection file doesn't exist: {}".format(collection_file_path)) + + return collection_df + + +def calc_MAE_for_molecules_across_all_predictions(collection_df, directory_path, file_base_name): + """ + Calculate mean absolute error for each molecule for all methods. + :param collection_df: Pandas DataFrame of submission collection.
+ :param directory_path: Directory for outputs + :param file_base_name: Filename for outputs + :return: + """ + # Create list of Molecule IDs + mol_IDs= list(set(collection_df["Molecule ID"].values)) # List of unique IDs + mol_IDs.sort() + print(mol_IDs) + + # List for keeping records of stats values for each molecule + molecular_statistics = [] + + # Slice the dataframe for each molecule to calculate MAE + for mol_ID in mol_IDs: + collection_df_mol_slice = collection_df.loc[collection_df["Molecule ID"] == mol_ID] + + # 2D array of matched calculated and experimental logD values + data = collection_df_mol_slice[["logD (calc)", "logD (exp)"]].values + + # Calculate mean absolute error + #MAE_value = mae(data) + + # Calculate MAE and RMSE and their 95% confidence intervals + bootstrap_statistics = compute_bootstrap_statistics(samples=data, stats_funcs=[mae, rmse], percentile=0.95, + n_bootstrap_samples=10000) + MAE = bootstrap_statistics[0][0] + MAE_lower_CI = bootstrap_statistics[0][1][0] + MAE_upper_CI = bootstrap_statistics[0][1][1] + print("{} MAE: {} [{}, {}]".format(mol_ID, MAE, MAE_lower_CI, MAE_upper_CI)) + + RMSE = bootstrap_statistics[1][0] + RMSE_lower_CI = bootstrap_statistics[1][1][0] + RMSE_upper_CI = bootstrap_statistics[1][1][1] + print("{} RMSE: {} [{}, {}]\n".format(mol_ID, RMSE, RMSE_lower_CI, RMSE_upper_CI)) + + # Record statistics of this molecule (written to CSV file below) + molecular_statistics.append({'Molecule ID': mol_ID, 'MAE': MAE, 'MAE_lower_CI': MAE_lower_CI, + 'MAE_upper_CI': MAE_upper_CI, 'RMSE': RMSE, 'RMSE_lower_CI': RMSE_lower_CI, + 'RMSE_upper_CI': RMSE_upper_CI}) + + + + # Convert list of dictionaries to Dataframe to create tables/plots easily and save as CSV.
+ molecular_statistics_df = pd.DataFrame(molecular_statistics) + #molecular_statistics_df.set_index('Molecule ID', inplace=True) + # Sort values by MAE values + molecular_statistics_df.sort_values(by='MAE', inplace=True) + # Create CSV + os.makedirs(directory_path, exist_ok=True) + file_base_path = os.path.join(directory_path, file_base_name) + with open(file_base_path + '.csv', 'w') as f: + molecular_statistics_df.to_csv(f) + + # Plot MAE and RMSE of each molecule across predictions as a bar plot + barplot_with_CI_errorbars(df = molecular_statistics_df, x_label = 'Molecule ID', + y_label = 'MAE', y_lower_label = 'MAE_lower_CI', y_upper_label = 'MAE_upper_CI', + figsize=(7.5, 6)) + plt.savefig(directory_path + "/MAE_vs_molecule_ID_plot.pdf") + + barplot_with_CI_errorbars(df=molecular_statistics_df, x_label = 'Molecule ID', + y_label = 'RMSE', y_lower_label = 'RMSE_lower_CI', y_upper_label = 'RMSE_upper_CI', + figsize=(7.5, 6)) + plt.savefig(directory_path + "/RMSE_vs_molecule_ID_plot.pdf") + + +def select_subsection_of_collection(collection_df, method_group): + """ + Returns a dataframe which is the subset of rows of collection dataframe that match the requested method category + :param collection_df: Pandas DataFrame of submission collection. + :param method_group: String that specifies which method group is requested, e.g. "Physical (MM) + QM+LEC", "Empirical" or "Physical (QM)" + :return: Pandas DataFrame of subsection of submission collection.
+ """ + + print("Looking for submissions of selected method group...") + print("Method group: {}".format(method_group)) + + print("Collection_df:\n",collection_df) + + # Filter collection dataframe based on method category + #collection_df_of_selected_method_group = collection_df.loc[collection_df["reassigned category"] == method_group] + collection_df_of_selected_method_group = collection_df.loc[collection_df["category"] == method_group] + collection_df_of_selected_method_group = collection_df_of_selected_method_group.reset_index(drop=True) + print("collection_df_of_selected_method_group: \n {}".format(collection_df_of_selected_method_group)) + + return collection_df_of_selected_method_group + + +def calc_MAE_for_molecules_across_selected_predictions(collection_df, selected_method_group, directory_path, file_base_name): + """ + Calculates mean absolute error for each molecule across prediction method category + :param collection_df: Pandas DataFrame of submission collection. + + :param selected_method_group: "Physical", "Empirical", "Mixed", or "Other" + :param directory_path: Directory path for outputs + :param file_base_name: Output file name + :return: + """ + + # Create subsection of collection dataframe for selected methods + print("selected_method_group...", selected_method_group) + print("collection_df...", collection_df) + collection_df_subset = select_subsection_of_collection(collection_df=collection_df, method_group=selected_method_group) + + # category_path_label_dict ={ "Physical (MM) + QM+LEC": "Physical_MM", + # "Empirical": "Empirical", + # "Mixed": "Mixed", + # "Physical (QM)": "Physical_QM"} + + subset_directory_path = os.path.join(directory_path, category_path_label_dict[selected_method_group]) + print("calc_MAE_for_molecules_across_all_predictions STARTING") + # Calculate MAE using subsection of collection database + calc_MAE_for_molecules_across_all_predictions(collection_df=collection_df_subset, directory_path=subset_directory_path, 
file_base_name=file_base_name) + + +#def create_comparison_plot_of_molecular_MAE_of_method_categories(directory_path, group1, group2, group3, group4, file_base_name): +def create_comparison_plot_of_molecular_MAE_of_method_categories(directory_path, group1, group2, group3, file_base_name): + label1 = category_path_label_dict[group1] + label2 = category_path_label_dict[group2] + label3 = category_path_label_dict[group3] + #label4 = category_path_label_dict[group4] + + # Read molecular_error_statistics table + df_gr1 = pd.read_csv(directory_path + "/" + label1 + "/molecular_error_statistics_for_{}_methods.csv".format(label1)) + df_gr2 = pd.read_csv(directory_path + "/" + label2 + "/molecular_error_statistics_for_{}_methods.csv".format(label2)) + df_gr3 = pd.read_csv(directory_path + "/" + label3 + "/molecular_error_statistics_for_{}_methods.csv".format(label3)) + #df_gr4 = pd.read_csv(directory_path + "/" + label4 + "/molecular_error_statistics_for_{}_methods.csv".format(label4)) + + + # Reorder dataframes based on the order of molecular MAE statistic of Physical QM methods group + ordered_molecule_list = list(df_gr3["Molecule ID"]) + print("ordered_molecule_list: \n", ordered_molecule_list) + + df_gr2_reordered = df_gr2.set_index("Molecule ID") + df_gr2_reordered = df_gr2_reordered.reindex(index=df_gr3['Molecule ID']) #Reset row order based on index of df_gr3 + df_gr2_reordered = df_gr2_reordered.reset_index() + + df_gr1_reordered = df_gr1.set_index("Molecule ID") + df_gr1_reordered = df_gr1_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on index of df_gr3 + df_gr1_reordered = df_gr1_reordered.reset_index() + + + # Plot + # Molecular labels will be taken from 1st dataframe, so the second dataframe should have the same molecule ID order. 
+ barplot_with_CI_errorbars_and_4groups(df1=df_gr1_reordered, df2=df_gr2_reordered, df3=df_gr3, #df4=df_gr4_reordered, + x_label="Molecule ID", y_label="MAE", + y_lower_label="MAE_lower_CI", y_upper_label="MAE_upper_CI", + group_labels=[group1, group2, group3])#, group4]) + plt.savefig(molecular_statistics_directory_path + "/" + file_base_name + ".pdf") + print("completed barplot_with_CI_errorbars_and_4groups") + + # Same comparison plot with only QM results (only for presentation effects) + #barplot_with_CI_errorbars_and_1st_of_2groups(df1=df_qm, df2=df_empirical_reordered, x_label="Molecule ID", y_label="MAE", + # y_lower_label="MAE_lower_CI", y_upper_label="MAE_upper_CI") + #plt.savefig(molecular_statistics_directory_path + "/" + file_base_name + "_only_QM.pdf") + +def create_comparison_plot_of_molecular_MAE_of_method_categories_ranked(directory_path, group1, group3, file_base_name): + label1 = category_path_label_dict[group1] + #label2 = category_path_label_dict[group2] + label3 = category_path_label_dict[group3] + #label4 = category_path_label_dict[group4] + + # Read molecular_error_statistics table + df_gr1 = pd.read_csv(directory_path + "/" + label1 + "/molecular_error_statistics_for_{}_methods.csv".format(label1)) + #df_gr2 = pd.read_csv(directory_path + "/" + label2 + "/molecular_error_statistics_for_{}_methods.csv".format(label2)) + df_gr3 = pd.read_csv(directory_path + "/" + label3 + "/molecular_error_statistics_for_{}_methods.csv".format(label3)) + #df_gr4 = pd.read_csv(directory_path + "/" + label4 + "/molecular_error_statistics_for_{}_methods.csv".format(label4)) + + + # Reorder dataframes based on the order of molecular MAE statistic of first group (Physical QM methods) + ordered_molecule_list = list(df_gr3["Molecule ID"]) + print("ordered_molecule_list: \n", ordered_molecule_list) + + #df_gr2_reordered = df_gr2.set_index("Molecule ID") + #df_gr2_reordered = df_gr2_reordered.reindex(index=df_gr1['Molecule ID']) #Reset row order based on index of 
df_gr1 + #df_gr2_reordered = df_gr2_reordered.reset_index() + + df_gr1_reordered = df_gr1.set_index("Molecule ID") + df_gr1_reordered = df_gr1_reordered.reindex(index=df_gr3['Molecule ID']) # Reset row order based on index of df_gr3 + df_gr1_reordered = df_gr1_reordered.reset_index() + print("df_gr1_reordered",df_gr1_reordered) + + # Plot + # Molecular labels will be taken from 1st dataframe, so the second dataframe should have the same molecule ID order. + barplot_with_CI_errorbars_and_4groups_ranked(df1=df_gr1_reordered, + #df2=df_gr2_reordered, + df3=df_gr3, #df4=df_gr4_reordered, + x_label="Molecule ID", y_label="MAE", + y_lower_label="MAE_lower_CI", y_upper_label="MAE_upper_CI", + group_labels=[group1, + #group2, + group3])#, group4]) + plt.savefig(directory_path + "/" + file_base_name + ".pdf") + print("completed barplot_with_CI_errorbars_and_4groups") + + # Same comparison plot with only QM results (only for presentation effects) + #barplot_with_CI_errorbars_and_1st_of_2groups(df1=df_qm, df2=df_empirical_reordered, x_label="Molecule ID", y_label="MAE", + # y_lower_label="MAE_lower_CI", y_upper_label="MAE_upper_CI") + #plt.savefig(molecular_statistics_directory_path + "/" + file_base_name + "_only_QM.pdf") + + +def create_molecular_error_distribution_plots(collection_df, directory_path, file_base_name):#, subset_of_method_ids): + + # Ridge plot using all predictions + ridge_plot(df=collection_df, by = "Molecule ID", column = "$\Delta$logD error (calc - exp)", figsize=(4, 6), colormap=cm.plasma) + plt.savefig(directory_path + "/" + file_base_name +"_all_methods.pdf") + + # Ridge plot using only consistently well-performing methods + '''collection_subset_df = collection_df[collection_df["method_name"].isin(subset_of_method_ids)].reset_index(drop=True) + ridge_plot(df=collection_subset_df, by = "Molecule ID", column = "$\Delta$logD error (calc - exp)", figsize=(4, 6), + colormap=cm.plasma) + plt.savefig(directory_path + "/" + file_base_name
+"_well_performing_methods.pdf")''' + + +def create_category_error_distribution_plots(collection_df, directory_path, file_base_name): + + # Ridge plot using all predictions + '''ridge_plot_wo_overlap(df=collection_df, by = "reassigned category", column = "$\Delta$logD error (calc - exp)", figsize=(4, 4), + colormap=cm.plasma)''' + ridge_plot_wo_overlap(df=collection_df, by = "category", column = "$\Delta$logD error (calc - exp)", figsize=(4, 4), + colormap=cm.plasma) + plt.savefig(directory_path + "/" + file_base_name +".pdf") + + +def calculate_summary_statistics_of_top_methods_of_each_category(statistics_df, categories, top, directory_path, file_base_name): + df_stat = pd.read_csv(statistics_df) + + data = [] + + for category in categories: + #print(category) + #is_cat = (df_stat["category"] == "Physical") + #print(is_cat) + #df_cat = df_stat[df_stat["reassigned_category"] == category].reset_index(drop=False) + df_cat = df_stat[df_stat["category"] == category].reset_index(drop=False) + + # Already ordered by RMSE + df_cat_top = df_cat.head(top).reset_index(drop=False) + RMSE_mean = df_cat_top["RMSE"].mean() + RMSE_std = df_cat_top["RMSE"].values.std(ddof=1) + + # Reorder by increasing MAE + df_cat = df_cat.sort_values(by="MAE", inplace=False, ascending=True) + df_cat_top = df_cat.head(top).reset_index(drop=False) + MAE_mean = df_cat_top["MAE"].mean() + MAE_std = df_cat_top["MAE"].values.std(ddof=1) + + # Reorder by decreasing Kendall's Tau + df_cat = df_cat.sort_values(by="kendall_tau", inplace=False, ascending=False) + df_cat_top = df_cat.head(top).reset_index(drop=False) + tau_mean = df_cat_top["kendall_tau"].mean() + tau_std = df_cat_top["kendall_tau"].values.std(ddof=1) + + # Reorder by decreasing R-Squared + df_cat = df_cat.sort_values(by="R2", inplace=False, ascending=False) + df_cat_top = df_cat.head(top).reset_index(drop=False) + r2_mean = df_cat_top["R2"].mean() + r2_std = df_cat_top["R2"].values.std(ddof=1) + + # Number of predictions, in case less than
10 + num_predictions =df_cat_top.shape[0] + + data.append({ + 'category': category, + 'RMSE_mean': RMSE_mean, + 'RMSE_std': RMSE_std, + 'MAE_mean': MAE_mean, + 'MAE_std': MAE_std, + 'kendall_tau_mean': tau_mean, + 'kendall_tau_std': tau_std, + 'R2_mean': r2_mean, + 'R2_std': r2_std, + 'N': num_predictions + }) + + # Transform into Pandas DataFrame. + df_stat_summary = pd.DataFrame(data=data) + file_name = os.path.join(directory_path, file_base_name) + df_stat_summary.to_csv(file_name, index=False) + + + +# ============================================================================= +# MAIN +# ============================================================================= + +if __name__ == '__main__': + + # ========================================================================================== + # Analysis of ALL submissions (ranked and nonranked), including reference calculations + # ========================================================================================== + + # Read collection file + collection_data = read_collection_file(collection_file_path = LOGD_COLLECTION_PATH_ALL_SUBMISSIONS) + + # Create new directory to store molecular statistics + output_directory_path = './analysis_outputs_all_submissions' + analysis_directory_name = 'MolecularStatisticsTables' + + if os.path.isdir('{}/{}'.format(output_directory_path, analysis_directory_name)): + shutil.rmtree('{}/{}'.format(output_directory_path, analysis_directory_name)) + + # Calculate MAE of each molecule across all predictions methods + molecular_statistics_directory_path = os.path.join(output_directory_path, "MolecularStatisticsTables") + calc_MAE_for_molecules_across_all_predictions(collection_df = collection_data, + directory_path = molecular_statistics_directory_path, + file_base_name = "molecular_error_statistics") + + + # Calculate MAE for each molecule across each method category + #list_of_method_categories = ["Physical (MM) + QM+LEC", "Empirical", "Mixed", "Physical (QM)"] + 
+    list_of_method_categories = ["Physical (MM) + QM+LEC", "Empirical", "Physical (QM)"]
+    # New labels for file naming for reassigned categories
+    category_path_label_dict = {"Physical (MM) + QM+LEC": "Physical_MM_QM_LEC",
+                                "Empirical": "Empirical",
+                                #"Mixed": "Mixed",
+                                "Physical (QM)": "Physical_QM"}
+
+    for category in list_of_method_categories:
+        category_file_label = category_path_label_dict[category]
+        calc_MAE_for_molecules_across_selected_predictions(collection_df=collection_data,
+                                                           selected_method_group=category,
+                                                           directory_path=molecular_statistics_directory_path,
+                                                           file_base_name="molecular_error_statistics_for_{}_methods".format(category_file_label))
+
+    # Create comparison plot of MAE for each molecule across method categories
+    create_comparison_plot_of_molecular_MAE_of_method_categories(directory_path=molecular_statistics_directory_path,
+                                                                 group1='Physical (MM) + QM+LEC',
+                                                                 group2='Empirical',
+                                                                 #group3="Mixed",
+                                                                 group3='Physical (QM)',
+                                                                 file_base_name="molecular_MAE_comparison_between_method_categories")
+
+    # Create molecular error distribution ridge plots for all methods and a subset of well performing methods (found consistently in the top 15 across 4 metrics)
+    #well_performing_method_ids = ["4K631", "006AC", "43M66", "5W956", "847L9", "HC032", "7RS67", "D4406"]
+    #well_performing_method_ids = ["Chemprop", "ClassicalGSG DB3", "COSMO-RS",
+    #                              "MD (CGenFF/TIP3P)", "TFE MLR"]
+    create_molecular_error_distribution_plots(collection_df=collection_data,
+                                              directory_path=molecular_statistics_directory_path,
+                                              #subset_of_method_ids=well_performing_method_ids,
+                                              file_base_name="molecular_error_distribution_ridge_plot")
+
+    # Compare method categories
+
+    # Calculate error distribution plots for each method category
+    category_comparison_directory_path = os.path.join(output_directory_path, "StatisticsTables/MethodCategoryComparison")
+    os.makedirs(category_comparison_directory_path, exist_ok=True)
+    create_category_error_distribution_plots(collection_df=collection_data,
+                                             directory_path=category_comparison_directory_path,
+                                             file_base_name="error_distribution_of_method_categories_ridge_plot")
+
+    '''# Calculate mean and standard deviation of performance statistics of top 10 methods of each category.
+    statistics_table_path = os.path.join(output_directory_path, "StatisticsTables/statistics.csv")
+    calculate_summary_statistics_of_top_methods_of_each_category(
+        statistics_df= statistics_table_path, categories=list_of_method_categories, top=10,
+        directory_path=category_comparison_directory_path, file_base_name="summary_statistics_of_method_categories_top10.csv"
+    )
+
+    # Calculate mean and standard deviation of performance statistics of top 5 methods of each category.
+    statistics_table_path = os.path.join(output_directory_path, "StatisticsTables/statistics.csv")
+    calculate_summary_statistics_of_top_methods_of_each_category(
+        statistics_df= statistics_table_path, categories=list_of_method_categories, top=5,
+        directory_path=category_comparison_directory_path, file_base_name="summary_statistics_of_method_categories_top5.csv"
+    )'''
+
+    # ==========================================================================================
+    # Repeat analysis for just ranked submissions
+    # ==========================================================================================
+
+    # Read collection file
+    collection_data = read_collection_file(collection_file_path = LOGD_COLLECTION_PATH_RANKED_SUBMISSIONS)
+
+    # Create new directory to store molecular statistics
+    output_directory_path = './analysis_outputs_ranked_submissions'
+    analysis_directory_name = 'MolecularStatisticsTables'
+
+    if os.path.isdir('{}/{}'.format(output_directory_path, analysis_directory_name)):
+        shutil.rmtree('{}/{}'.format(output_directory_path, analysis_directory_name))
+
+    # Calculate MAE of each molecule across all predictions methods
+    molecular_statistics_directory_path = os.path.join(output_directory_path, "MolecularStatisticsTables")
+    calc_MAE_for_molecules_across_all_predictions(collection_df = collection_data,
+                                                  directory_path = molecular_statistics_directory_path,
+                                                  file_base_name = "molecular_error_statistics")
+
+
+    # Calculate MAE for each molecule across each method category
+    #list_of_method_categories = ["Physical (MM) + QM+LEC", "Empirical", "Mixed", "Physical (QM)"]
+    #list_of_method_categories = ["Physical (MM) + QM+LEC", "Empirical", "Physical (QM)"]
+    list_of_method_categories = ["Physical (MM) + QM+LEC", "Physical (QM)"]
+    # New labels for file naming for reassigned categories
+    category_path_label_dict = {"Physical (MM) + QM+LEC": "Physical_MM",
+                                #"Empirical": "Empirical",
+                                #"Mixed": "Mixed",
+                                "Physical (QM)": "Physical_QM"}
+
+    for category in list_of_method_categories:
+        category_file_label = category_path_label_dict[category]
+        calc_MAE_for_molecules_across_selected_predictions(collection_df=collection_data,
+                                                           selected_method_group=category,
+                                                           directory_path=molecular_statistics_directory_path,
+                                                           file_base_name="molecular_error_statistics_for_{}_methods".format(category_file_label))
+
+    # Create comparison plot of MAE for each molecule across all method categories
+    create_comparison_plot_of_molecular_MAE_of_method_categories_ranked(directory_path=molecular_statistics_directory_path,
+                                                                        group1='Physical (MM) + QM+LEC',
+                                                                        #group2='Empirical',
+                                                                        #group3="Mixed",
+                                                                        group3='Physical (QM)',
+                                                                        file_base_name="molecular_MAE_comparison_between_method_categories")
+
+    # Create molecular error distribution ridge plots for all methods and a subset of well performing methods
+    # (found consistently in the top X across 4 metrics)
+    #well_performing_method_ids = ["Chemprop", "ClassicalGSG DB3", "COSMO-RS",
+    #                              "MD (CGenFF/TIP3P)", "TFE MLR"]
+    create_molecular_error_distribution_plots(collection_df=collection_data,
+                                              directory_path=molecular_statistics_directory_path,
+                                              #subset_of_method_ids=well_performing_method_ids,
+                                              file_base_name="molecular_error_distribution_ridge_plot")
+
+    # Compare method categories
+
+    # Calculate error distribution plots for each method category
+
+    category_comparison_directory_path = os.path.join(output_directory_path, "StatisticsTables/MethodCategoryComparison")
+    os.makedirs(category_comparison_directory_path, exist_ok=True)
+    create_category_error_distribution_plots(collection_df=collection_data,
+                                             directory_path=category_comparison_directory_path,
+                                             file_base_name="error_distribution_of_method_categories_ridge_plot")
+
+    '''# Calculate mean and standard deviation of performance statistics of top 10 methods of each category.
+    statistics_table_path = os.path.join(output_directory_path, "StatisticsTables/statistics.csv")
+    calculate_summary_statistics_of_top_methods_of_each_category(
+        statistics_df= statistics_table_path, categories=list_of_method_categories, top=10,
+        directory_path=category_comparison_directory_path, file_base_name="summary_statistics_of_method_categories_top10.csv"
+    )
+
+    # Calculate mean and standard deviation of performance statistics of top 5 methods of each category.
+    statistics_table_path = os.path.join(output_directory_path, "StatisticsTables/statistics.csv")
+    calculate_summary_statistics_of_top_methods_of_each_category(
+        statistics_df= statistics_table_path, categories=list_of_method_categories, top=5,
+        directory_path=category_comparison_directory_path, file_base_name="summary_statistics_of_method_categories_top5.csv"
+    )'''
diff --git a/physical_property/logD/logD_experimental_values.csv b/physical_property/logD/logD_experimental_values.csv
new file mode 100644
index 00000000..ede83f54
--- /dev/null
+++ b/physical_property/logD/logD_experimental_values.csv
@@ -0,0 +1,23 @@
+Molecule ID,logD mean,logD SEM
+SM25,-0.09,0.01
+SM26,-0.87,0.06
+SM27,1.56,0.11
+SM28,1.18,0.08
+SM29,1.61,0.11
+SM30,2.76,0.19
+SM31,1.96,0.14
+SM32,2.44,0.17
+SM33,2.96,0.21
+SM34,2.83,0.2
+SM35,0.87,0.06
+SM36,0.76,0.05
+SM37,1.45,0.1
+SM38,1.03,0.07
+SM39,1.89,0.13
+SM40,1.82,0.13
+SM41,-0.42,0.03
+SM42,0.99,0.07
+SM43,0.42,0.03
+SM44,0.06,0
+SM45,1.06,0.07
+SM46,0.69,0.05