Simulation code for Longitudinal TMLE (LTMLE) with multi-valued treatments.
Please cite the following paper if you use this repo:
```bibtex
@article{doi:10.1002/sim.10003,
  title={Targeted learning in observational studies with multi-valued treatments: An evaluation of antipsychotic drug treatment safety},
  author={Poulos, Jason and Horvitz-Lennon, Marcela and Zelevinsky, Katya and Cristea-Platon, Tudor and Huijskens, Thomas and Tyagi, Pooja and Yan, Jiaju and Diaz, Jordi and Normand, Sharon-Lise},
  journal={Statistics in Medicine},
  year={2024},
  publisher={Wiley Online Library}
}
```
- R (tested on 4.0.1, compiled with GCC 6.2.0)
- Required R packages are listed in `package_list.R`
- The output of `sessionInfo()` is in `session_info.txt`
- To use `'tmle-lstm'` as the estimator: R (4.3.3), Python 3 (3.10.12), and TensorFlow (2.18.0)
```bash
# Create virtual environment within directory
cd multi-ltmle

# Manually set up a virtual environment
python3.10 -m ensurepip --upgrade    # Ensure pip is installed/upgraded
python3.10 -m pip install virtualenv # Install virtualenv if not already available
virtualenv myenv                     # Create a virtual environment named 'myenv'
source myenv/bin/activate            # Activate the new environment

# Install TensorFlow
python3 -m pip install --upgrade pip # Upgrade pip to the latest version

# For GPU users:
pip install tensorflow[and-cuda]

# For CPU users:
pip install tensorflow

# Verify the GPU installation:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

# Verify the CPU installation:
python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
```
- The following Python packages are required: numpy (tested on 1.19.5), pandas (1.1.5), and wandb (0.15.12). Install the latter two with:

  ```bash
  python3 -m pip install pandas
  python3 -m pip install wandb
  ```
- Additional R packages listed in `package_list.R` are required. Make sure to install these packages in the virtual environment where Python 3 and TensorFlow are installed.
- Ensure that the path in `use_python()` in `simulation.R` corresponds to the virtual environment's Python path, which you can find with `which python`.
- Test with R and reticulate: load the `reticulate` library and bind to the shared Python:

  ```r
  library(reticulate)
  use_python("./myenv/bin/python", required = TRUE)
  py_config()
  ```
- Important Notes:
  - If the shared library is still not found, adjust `LD_LIBRARY_PATH` as required: `export LD_LIBRARY_PATH=/path/to/libpython3.10.so:$LD_LIBRARY_PATH`
  - For consistent performance, ensure all dependencies align with the Python version used in your virtual environment.
- Installs required R packages (a sketch of how the flags gate the installs follows).
  - `doMPI`: Logical flag. When `TRUE`, installs packages required for MPI parallel processing (defaults to `FALSE`).
  - `keras`: Logical flag. When `TRUE`, installs packages required for Keras/TensorFlow (defaults to `TRUE`).
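A minimal sketch of how such flag-gated installs typically work; the specific package names below are illustrative placeholders, not the repo's full list:

```r
## Illustrative only: flag-gated package installation in the spirit of package_list.R
doMPI <- FALSE   # set TRUE to also install MPI packages
keras <- TRUE    # set TRUE to also install Keras/TensorFlow packages

pkgs <- c("SuperLearner", "simcausal", "data.table")          # core packages (placeholders)
if (doMPI) pkgs <- c(pkgs, "Rmpi", "doMPI")
if (keras) pkgs <- c(pkgs, "reticulate", "keras", "tensorflow")

new_pkgs <- setdiff(pkgs, rownames(installed.packages()))     # skip packages already present
if (length(new_pkgs) > 0) install.packages(new_pkgs)
```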
- Includes miscellaneous functions:
  - Bounding predicted probabilities (see the sketch below).
  - Generating various distributions.
  - A function for forest plot visualization.
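For example, bounding predicted probabilities usually truncates them to an interval such as the one supplied via `gbound`/`ybound`. A minimal sketch, with a hypothetical helper name and illustrative default bounds:

```r
## Hypothetical helper, not necessarily the function defined in the repo
bound_probs <- function(p, bounds = c(0.025, 0.975)) {
  pmin(pmax(p, min(bounds)), max(bounds))   # clamp probabilities to [min, max] of bounds
}

bound_probs(c(0.001, 0.5, 0.999))           # returns 0.025 0.500 0.975
```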
- Defines distribution functions for the `simcausal` package.
- Defines the data-generating process for the `simcausal` package.
- Defines treatment rule functions used for Targeted Maximum Likelihood Estimation (TMLE); a sketch of example static, dynamic, and stochastic rules follows this entry.
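For orientation, a minimal sketch of what static, dynamic, and stochastic rules can look like for a multi-valued treatment with `J = 6` levels; the function names and the covariate `L` below are illustrative assumptions, not the definitions used in the repo:

```r
## Illustrative rule functions for a multi-valued treatment (J = 6 levels)
J <- 6

static_rule <- function(data) {
  rep(1L, nrow(data))                                       # always assign treatment level 1
}

dynamic_rule <- function(data) {
  ifelse(data$L > 0, 2L, 1L)                                # switch level based on a covariate L
}

stochastic_rule <- function(data, probs = rep(1 / J, J)) {
  sample.int(J, nrow(data), replace = TRUE, prob = probs)   # draw a level at random
}

toy <- data.frame(L = rnorm(5))
cbind(static = static_rule(toy), dynamic = dynamic_rule(toy), stochastic = stochastic_rule(toy))
```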
- Implements influence curve (IC)-based methods for variance estimation in TMLE (see the sketch below).
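IC-based inference typically turns the estimated influence curve into a standard error and a Wald-type confidence interval. A minimal illustrative sketch, not the repo's implementation:

```r
## Standard error from the empirical influence curve: sd(IC) / sqrt(n)
ic_inference <- function(ic, psi_hat, alpha = 0.05) {
  se <- sqrt(var(ic) / length(ic))
  z  <- qnorm(1 - alpha / 2)
  c(estimate = psi_hat, se = se, lower = psi_hat - z * se, upper = psi_hat + z * se)
}
```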
- Provides custom learner definitions for use with the SuperLearner framework.
- TMLE implementation for incorporating Long Short-Term Memory (LSTM) neural networks into the TMLE workflow.
- Facilitates integration between R and Python for LSTM-based estimation. Used when `estimator='tmle-lstm'`.
- Python script for training LSTMs and predicting on the same dataset.
- Python script for inference using a trained LSTM model on new datasets.
- Simulates longitudinal data for comparing the performance of multinomial TMLE with LSTM-based approaches.
- Key parameters:
  - `estimator`: Choose between `'tmle'` or `'tmle-lstm'`.
  - `treatment.rule`: Specify `"static"`, `"dynamic"`, `"stochastic"`, or `"all"`.
  - `gbound`/`ybound`: Bounds for propensity scores and initial predictions, respectively.
  - `J`: Number of treatments (`J=6`).
  - `n`: Sample size (default is `12500`).
  - `t.end`: Number of time periods (must be between `4` and `36`).
  - `R`: Number of simulation runs (default is `325`).
  - `target.gwt`: Logical flag to adjust weights in the clever covariate (default is `TRUE`; a toy sketch of this option follows this list).
  - `use.SL`: Logical flag to enable Super Learner (default is `TRUE`).
  - `scale.continuous`: Logical flag for scaling continuous variables.
  - `n.folds`: Number of cross-validation folds for Super Learner (default is `5`).
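To make the `target.gwt` option concrete, here is a toy sketch of the usual distinction in the TMLE targeting step: when `target.gwt = TRUE` the inverse-probability term enters as a regression weight, and when `FALSE` it stays inside the clever covariate. All data and variable names below are simulated placeholders, not the repo's code:

```r
set.seed(1)
n <- 500
q_init  <- runif(n, 0.05, 0.95)                  # initial outcome predictions (toy values)
g_bound <- pmin(pmax(runif(n), 0.025), 0.975)    # bounded probability of following the rule
h <- rbinom(n, 1, 0.5)                           # indicator that observed treatment follows the rule
y <- rbinom(n, 1, q_init)                        # outcome (toy)
target.gwt <- TRUE

if (target.gwt) {                                # 1/g enters as a regression weight
  fit <- glm(y ~ -1 + h + offset(qlogis(q_init)),
             weights = h / g_bound, family = quasibinomial())
  q_star <- plogis(qlogis(q_init) + coef(fit)["h"] * h)
} else {                                         # 1/g stays inside the clever covariate
  clever <- h / g_bound
  fit <- glm(y ~ -1 + clever + offset(qlogis(q_init)), family = quasibinomial())
  q_star <- plogis(qlogis(q_init) + coef(fit)["clever"] * clever)
}
```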
- Aggregates and visualizes the output of `simulation.R`. Includes:
  - Counterfactual risk estimates.
  - Bias, coverage, and confidence interval widths (see the sketch below).
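The bias, coverage, and interval-width summaries are standard Monte Carlo quantities; a hedged sketch of how they are typically computed across simulation runs (the argument names are assumptions about what the saved results contain):

```r
## Illustrative summaries over R simulation runs; not the repo's actual plotting code
summarize_runs <- function(est, se, truth, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)
  lo <- est - z * se
  hi <- est + z * se
  c(bias     = mean(est - truth),                  # average estimation error
    coverage = mean(lo <= truth & truth <= hi),    # proportion of CIs covering the truth
    ci_width = mean(hi - lo))                      # average confidence interval width
}

summarize_runs(est = rnorm(325, 0.20, 0.02), se = rep(0.02, 325), truth = 0.20)
```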
Below is a list of files that require modification to match your environment or particular settings. Please make these changes before running the code.
- Update the Python path used by `use_python()`. Modify it to point to the Python interpreter you wish to use. For example: `use_python("./myenv/bin/python", required = TRUE)`
- If using a GPU, configure the GPU settings by updating the relevant CUDA paths to match your environment. The default settings are:

  ```python
  cuda_base = "/n/app/cuda/12.1-gcc-9.2.0"
  os.environ.update({
      'CUDA_HOME': cuda_base,
      'CUDA_ROOT': cuda_base,
      'CUDA_PATH': cuda_base,
      'CUDNN_PATH': f"{cuda_base}/lib64/libcudnn.so",
      'LD_LIBRARY_PATH': f"{cuda_base}/lib64:{cuda_base}/extras/CUPTI/lib64:{os.environ.get('LD_LIBRARY_PATH', '')}",
      'PATH': f"{cuda_base}/bin:{os.environ.get('PATH', '')}",
      'CUDA_DEVICE_ORDER': 'PCI_BUS_ID',
      'CUDA_VISIBLE_DEVICES': '0,1',
      'TF_FORCE_GPU_ALLOW_GROWTH': 'true',
      'TF_XLA_FLAGS': '--tf_xla_enable_xla_devices',
      'XLA_FLAGS': f'--xla_gpu_cuda_data_dir={cuda_base}',
      'TF_GPU_THREAD_MODE': 'gpu_private',
      'TF_GPU_THREAD_COUNT': '2',
      'TF_CPP_MIN_LOG_LEVEL': '3'
  })
  ```
- If no GPU is used, the script will automatically fall back to CPU-based execution. No additional changes are required in this case.
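A quick way to confirm from R which device TensorFlow will use is through `reticulate`; a minimal sketch, assuming the example virtualenv path used in this README:

```r
## Optional device check; './myenv/bin/python' is the example path from the setup steps above
library(reticulate)
use_python("./myenv/bin/python", required = TRUE)
tf <- import("tensorflow")
gpus <- tf$config$list_physical_devices("GPU")
if (length(gpus) == 0) message("No GPU visible; TensorFlow will run on CPU.") else print(gpus)
```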
- Run the following command to install the required R packages: `Rscript package_list.R`
- Follow the Python installation instructions provided in the Prerequisites section.
- Ensure you start R within the virtual environment where Python 3 and TensorFlow are installed.
To execute simulations, use the following command:

```bash
Rscript simulation.R [arg1] [arg2] [arg3] [arg4]
```

- `[arg1]`: Specifies the estimator. Options:
  - `"tmle"`: Targeted Maximum Likelihood Estimation.
  - `"tmle-lstm"`: Targeted Maximum Likelihood Estimation with Long Short-Term Memory.
- `[arg2]`: A numeric value indicating the treatment rule:
  - `1`: Use all treatment rules.
- `[arg3]`: Logical flag to indicate whether super learner estimation should be used: `"TRUE"` or `"FALSE"`.
- `[arg4]`: Logical flag for enabling MPI parallel programming: `"TRUE"` or `"FALSE"`.
- Using the `"tmle"` estimator with super learner enabled and no MPI: `Rscript simulation.R 'tmle' 1 'TRUE' 'FALSE'`
- Using the `"tmle-lstm"` estimator without super learner or MPI: `Rscript simulation.R 'tmle-lstm' 1 'FALSE' 'FALSE'`
To visualize simulation results, use the following command:

```bash
Rscript long_sim_plots.R [arg1]
```

- `[arg1]`: Path to the output directory containing the simulation results.

For example, to plot results from simulations saved in the `outputs/20240215` directory:

```bash
Rscript long_sim_plots.R 'outputs/20240215'
```
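Before plotting, a saved results file can also be inspected directly in R; a minimal sketch using the TMLE example file listed under Simulation Results below (no assumptions are made about the object's structure):

```r
## Inspect the top-level structure of a saved results object
res <- readRDS(file.path("ex_outputs",
  "longitudinal_simulation_results_estimator_tmle_treatment_rule_all_R_325_n_12500_J_6_n_folds_5_scale_continuous_FALSE_use_SL_TRUE.rds"))
str(res, max.level = 1)
```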
This section outlines the model weights, intermediate results, and visualizations available for analysis and evaluation. The results pertain to a single simulated longitudinal dataset (`r=1`) for 12,500 patients, estimating counterfactual diabetes risk under three regimes: static, dynamic, and stochastic. Data and results are saved in the `ex_outputs/` directory.
- Simulated Dataset
  - Long format dataset: `tmle_dat_long_R_1_n_12500_J_6.rds`
- Simulation Results
  - TMLE simulation: `longitudinal_simulation_results_estimator_tmle_treatment_rule_all_R_325_n_12500_J_6_n_folds_5_scale_continuous_FALSE_use_SL_TRUE.rds`
  - RNN-based model simulation: `longitudinal_simulation_results_estimator_tmle_treatment_rule_all_R_325_n_12500_J_6_n_folds_5_scale_continuous_FALSE_use_SL_TRUE.rds`
- Validation Predictions
  - Multiple binary and categorical treatments:
    - Binary `C` predictions: `lstm_bin_C_preds.npy`, `lstm_bin_C_preds_info.npz`
    - Categorical `A` predictions: `lstm_cat_A_preds.npy`, `lstm_cat_A_preds_info.npz`
    - Binary `A` predictions: `lstm_bin_A_preds.npy`, `lstm_bin_A_preds_info.npz`
- Test Predictions
  - Binary and categorical treatments:
    - Binary `C` predictions: `test_bin_C_preds.npy`, `test_bin_C_preds_info.npz`
    - Categorical `A` predictions: `test_cat_A_preds.npy`, `test_cat_A_preds_info.npz`
    - Binary `A` predictions: `test_bin_A_preds.npy`, `test_bin_A_preds_info.npz`
- Model Weights
  - Weights for RNN models with multiple binary and categorical treatments:
    - `lstm_bin_A_model.h5`
    - `lstm_cat_A_model.h5`
    - `lstm_bin_C_model.h5`
- Descriptive Plots