From 48a8ef6d08d7e199b142afa4cc19988e680aba0e Mon Sep 17 00:00:00 2001
From: Daohang Shi
Date: Mon, 10 Jul 2023 15:04:57 -0700
Subject: [PATCH] fix broken torchbench mimo_cmf_30x

Summary:
This test has been broken since 05/22: https://pxl.cl/2S3Fg

## fix 1. refresh eval data or skip model export

Here's an example of a failed job complaining about data expiration:

> MlDataComponentValidatorWrapper (data_config.ml_data_config): No available partitions found for given query filter. Namespace = ad_delivery. Table = ctr_mbl_feed_model_af_cd_async_ai_cd_30_neg_ds_md. filterClause = pipeline = 'ctr_mbl_feed_model_af_cd_async_ai_cd_30_neg_ds_md' AND ds = '2023-03-09'.

We already call `analyzer.refresh_dataset()`, but we still hit this error because we never call `analyzer.refresh_eval_dataset()`. The 'no partition found' error was raised from `self._serialize_inference_model`:
https://www.internalfb.com/code/fbsource/[c953aa6a0a497851da8ec4f4361d0202dd5c33f7]/fbcode/dper3/dper3_models/ads_ranking/base_models/mimo_nn/mimo_pytorch_model_builder_base.py?lines=1512-1524

~~Furthermore, we realized that even `export_mode` can be None in torchbench. A similar trick has been adopted before to speed up model instantiation for testing purposes: https://www.internalfb.com/diff/D46180124?dst_version_fbid=100400963077454&transaction_fbid=6056114577843800~~

~~## fix 2. use_synthetic_data = True~~

~~It doesn't work here. TODO: address it later.~~

## fix 3. simplify the branching for getting the model

Now we ALWAYS generate the model/data at run time, so that changes in the model-instantiation path are caught as well.

Reviewed By: bertmaher, xuzhao9

Differential Revision: D47157906

fbshipit-source-id: d7423e7329185d368361e7b40ef2ba5fa9f926c6
---
 torchbenchmark/util/experiment/instantiator.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/torchbenchmark/util/experiment/instantiator.py b/torchbenchmark/util/experiment/instantiator.py
index fbfce877ad..20b7bd5465 100644
--- a/torchbenchmark/util/experiment/instantiator.py
+++ b/torchbenchmark/util/experiment/instantiator.py
@@ -10,7 +10,7 @@
 from torchbenchmark.util.model import BenchmarkModel
 from torchbenchmark import _list_model_paths, load_model_by_name, ModelTask
 
-WORKER_TIMEOUT = 1800 # seconds
+WORKER_TIMEOUT = 3600 # seconds
 BS_FIELD_NAME = "batch_size"
 
 @dataclasses.dataclass
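
For illustration, fix 1 boils down to refreshing the eval partitions in addition to the training partitions before the inference model is serialized. Below is a minimal, self-contained sketch of that call order; the `Analyzer` class and the `export_model` helper are hypothetical stand-ins, and only `refresh_dataset()`, `refresh_eval_dataset()`, and `_serialize_inference_model()` are names taken from the code referenced above.

```python
# Minimal sketch of fix 1, assuming a hypothetical Analyzer stand-in.
# Only refresh_dataset(), refresh_eval_dataset(), and
# _serialize_inference_model() are names from the real dper3 code;
# everything else here is illustrative.


class Analyzer:
    """Hypothetical stand-in for the dper3 analyzer."""

    def refresh_dataset(self) -> None:
        # Re-resolves the *training* table partitions to the latest ds.
        print("training partitions refreshed")

    def refresh_eval_dataset(self) -> None:
        # Re-resolves the *eval* table partitions. Skipping this call is
        # what produced "No available partitions found" once the pinned
        # ds = '2023-03-09' partition expired.
        print("eval partitions refreshed")

    def _serialize_inference_model(self) -> None:
        # In the real builder this is where the expired-partition error
        # surfaced; by this point both datasets must point at live
        # partitions.
        print("inference model serialized")


def export_model(analyzer: Analyzer) -> None:
    analyzer.refresh_dataset()
    analyzer.refresh_eval_dataset()  # the previously missing call
    analyzer._serialize_inference_model()


if __name__ == "__main__":
    export_model(Analyzer())
```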