Commit

added continuous test cases. Added warnings for untested functionality. updated readme
joshuaspear committed Jun 3, 2024
1 parent d7ecc1a commit 5404dc9
Showing 18 changed files with 69 additions and 20 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -615,4 +615,5 @@ MigrationBackup/

/tmp
/d3rlpy_data
/d3rlpy_logs
/d3rlpy_logs
/propensity_output
15 changes: 6 additions & 9 deletions README.md
@@ -1,13 +1,9 @@
# offline_rl_ope (BETA RELEASE)
# offline_rl_ope

**WARNING**
- All IS methods implemented incorrectly in versions < 6.x
- Per-decision weighted importance sampling was incorrectly implemented in versions < 5.X
- Weighted importance sampling was incorrectly implemented in versions 1.X.X and 2.1.X, 2.2.X
- Unit testing is currently only run under Python 3.11; 3.10 will be supported in the future
- Only 1 dimensional discrete action spaces are currently supported!

**IMPORTANT: THIS IS A BETA RELEASE. FUNCTIONALITY IS STILL BEING TESTED** Feedback/contributions are welcome :)
- Not all functionality has been tested i.e., d3rlpy api and LowerBounds are still in beta

### Testing progress
- [x] components/
@@ -21,11 +17,12 @@
- [x] Metrics
- [x] EffectiveSampleSize.py
- [x] ValidWeightsProp.py
- [ ] PropensityModels
- [x] PropensityModels
- [ ] LowerBounds
- [ ] api/d3rlpy

* Insufficient functionality to test i.e., currently only wrapper classes are implemented for the OPEEstimation/DirectMethod.py
Insufficient functionality to test OPEEstimation/DirectMethod.py, i.e., currently only wrapper classes are implemented


#### Overview
Basic unit testing has been implemented for all the core functionality of the package. The d3rlpy api for importance sampling adds minimal additional functionality and is therefore likely to function as expected; however, no specific unit testing has been implemented!
@@ -34,7 +31,7 @@
* More documentation needs to be added; in the meantime, please refer to examples/ for an illustration of the functionality
* examples/static.py provides an illustration of the package being used for evaluation post training. Whilst the d3rlpy package is used for model training, the script is agnostic to the evaluation model used
* examples/d3rlpy_training_api.py provides an illustration of how the package can be used to obtain incremental performance statistics during the training of d3rlpy models. It provides greater functionality than the native scorer metrics included in d3rlpy
* The current focus has been on discrete action spaces. Continuous action spaces are intended to be addressed at a later date
* For continuous action spaces, only deterministic policies are fully supported. Support for stochastic policies is in development

### Description
* offline_rl_ope aims to provide flexible and efficient implementations of OPE algorithms for use when training offline RL models. The main audience is researchers developing smaller, non-distributed models i.e., those who do not want to use packages such as ray (https://github.com/ray-project/ray).
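The warning block in the README diff above refers to the standard importance-sampling (IS) estimators. As a point of reference only, here is a minimal standalone sketch of ordinary IS, weighted IS, and per-decision weighted IS; it is not offline_rl_ope's implementation, and the function name, argument layout and conventions are illustrative assumptions.

```python
# Illustrative sketch of the IS estimators referenced in the warnings above.
# This is NOT offline_rl_ope's implementation; names and shapes are assumptions.
import numpy as np

def is_estimates(ratios, rewards, gamma=0.99):
    """ratios[i][t] = pi_e(a_t|s_t) / pi_b(a_t|s_t) for trajectory i, timestep t;
    rewards[i][t] is the corresponding reward."""
    n = len(ratios)
    traj_weights = np.array([np.prod(r) for r in ratios])   # w_i = prod_t rho_{i,t}
    returns = np.array(
        [np.sum(gamma ** np.arange(len(rw)) * rw) for rw in rewards]
    )

    vanilla_is = np.mean(traj_weights * returns)                          # ordinary IS
    weighted_is = np.sum(traj_weights * returns) / np.sum(traj_weights)   # WIS

    # Per-decision weighted IS: normalise the cumulative weight at each timestep
    # across the trajectories that are still active at that timestep.
    max_t = max(len(r) for r in ratios)
    pd_wis = 0.0
    for t in range(max_t):
        active = [i for i in range(n) if len(ratios[i]) > t]
        w_t = np.array([np.prod(ratios[i][: t + 1]) for i in active])
        r_t = np.array([rewards[i][t] for i in active])
        pd_wis += (gamma ** t) * np.sum(w_t * r_t) / np.sum(w_t)
    return vanilla_is, weighted_is, pd_wis

# Toy usage with two short trajectories
ratios = [[1.2, 0.8, 1.0], [0.5, 1.5]]
rewards = [[0.0, 0.0, 1.0], [0.0, 1.0]]
print(is_estimates(ratios, rewards))
```

The per-decision form normalises the cumulative weight at each timestep rather than once per trajectory, which is the distinction behind the version-specific warnings above.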
Binary file removed propensity_output/epoch_1_train_preds.pkl
Binary file removed propensity_output/epoch_1_val_preds.pkl
Binary file removed propensity_output/epoch_2_train_preds.pkl
Binary file removed propensity_output/epoch_2_val_preds.pkl
Binary file removed propensity_output/epoch_3_train_preds.pkl
Binary file removed propensity_output/epoch_3_val_preds.pkl
Binary file removed propensity_output/epoch_4_train_preds.pkl
Binary file removed propensity_output/epoch_4_val_preds.pkl
Binary file removed propensity_output/mdl_chkpnt_epoch_1.pt
Binary file removed propensity_output/mdl_chkpnt_epoch_2.pt
Binary file removed propensity_output/mdl_chkpnt_epoch_3.pt
Binary file removed propensity_output/mdl_chkpnt_epoch_4.pt
9 changes: 0 additions & 9 deletions propensity_output/training_metric_df.csv

This file was deleted.

3 changes: 3 additions & 0 deletions src/offline_rl_ope/LowerBounds/__init__.py
@@ -0,0 +1,3 @@
from .. import logger

logger.warn("LowerBound functionality still in beta")
5 changes: 4 additions & 1 deletion src/offline_rl_ope/api/d3rlpy/__init__.py
@@ -1 +1,4 @@
from . import Scorers, Callbacks, Misc
from . import Scorers, Callbacks, Misc
from ... import logger

logger.warn("api/d3rlpy functionality still in beta")
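The two `__init__.py` additions above emit a warning through the package logger whenever the beta modules (LowerBounds and api/d3rlpy) are imported. If those messages are unwanted, they can in principle be filtered with Python's standard logging machinery; the logger name used below is an assumption and should be checked against the package source.

```python
# Hypothetical: raise the threshold of the (assumed) package logger so the
# "still in beta" warnings are suppressed. Verify the actual logger name first.
import logging

logging.getLogger("offline_rl_ope").setLevel(logging.ERROR)
```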
54 changes: 54 additions & 0 deletions tests/base.py
@@ -213,6 +213,60 @@ def __post_init__(self):
}
)


test_action_vals = [
[[0.9], [4], [0.001], [0]],
[[1], [0], [0.9]]
]

test_eval_action_vals = [
[[0.9], [0.9], [0.001], [0]],
[[1], [1], [0.9]]
]


test_configs.update(
{
"continuous_action": TestConfig(
test_state_vals=test_state_vals,
test_action_vals=test_action_vals,
test_action_probs=test_action_probs,
test_eval_action_vals=test_eval_action_vals,
test_eval_action_probs=test_eval_action_probs,
test_reward_values=test_reward_values,
test_dm_s_values=test_dm_s_values,
test_dm_sa_values=test_dm_sa_values
)
}
)


test_action_vals = [
[[0.9,1], [4,0.9], [0.001, 1], [0,-1.2]],
[[1,-0.8], [0,-1], [0.9,1]]
]

test_eval_action_vals = [
[[0.9,1], [1,0.9], [0.001, 1], [0,-1.2]],
[[1,-0.8], [0,-1], [1,1]]
]

test_configs.update(
{
"multi_continuous_action": TestConfig(
test_state_vals=test_state_vals,
test_action_vals=test_action_vals,
test_action_probs=test_action_probs,
test_eval_action_vals=test_eval_action_vals,
test_eval_action_probs=test_eval_action_probs,
test_reward_values=test_reward_values,
test_dm_s_values=test_dm_s_values,
test_dm_sa_values=test_dm_sa_values
)
}
)


test_configs_fmt = [[key,test_configs[key]] for key in test_configs.keys()]
test_configs_fmt_class = [
{"test_conf":test_configs[key]} for key in test_configs.keys()
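The trailing context lines above suggest that the configs, including the new continuous_action and multi_continuous_action entries, are flattened into test_configs_fmt for parametrised tests. A hedged sketch of how that list might be consumed, with a hypothetical import path, test name and assertion that do not appear in the repository:

```python
# Hypothetical usage only: the import path, test name and assertion are
# illustrative and are not taken from the repository's actual test suite.
import pytest

from tests.base import test_configs_fmt  # [[config_name, TestConfig], ...]

@pytest.mark.parametrize("name,conf", test_configs_fmt)
def test_every_trajectory_has_actions(name, conf):
    # Each trajectory in every config should carry at least one action entry.
    assert all(len(traj) > 0 for traj in conf.test_action_vals)
```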
