test.txt problem #388

janghobaek2125 · 2024-07-12T11:25:47Z

I am training a model on the Kaggle dataset, which consists of two files: train.txt and test.txt.

train.txt is preprocessed correctly, and training completes successfully.

However, test.txt, which I use for inference, does not seem to be preprocessed properly.

What seems to be the problem?

python data_utils.py --raw-data-file=/data/janghobaek/test.txt

janghobaek2125 · 2024-07-12T11:28:37Z

Traceback (most recent call last):
  File "/data/janghobaek/jangho/dlrm/dlrm_s_pytorch.py", line 1912, in <module>
    run()
  File "/data/janghobaek/jangho/dlrm/dlrm_s_pytorch.py", line 1108, in run
    train_data, train_ld, test_data, test_ld = dp.make_criteo_data_and_loaders(args)
  File "/data/janghobaek/jangho/dlrm/dlrm_data_pytorch.py", line 520, in make_criteo_data_and_loaders
    train_data = CriteoDataset(
  File "/data/janghobaek/jangho/dlrm/dlrm_data_pytorch.py", line 109, in __init__
    file = data_utils.getCriteoAdData(
  File "/data/janghobaek/jangho/dlrm/data_utils.py", line 1138, in getCriteoAdData
    total_per_file[i] = process_one_file(
  File "/data/janghobaek/jangho/dlrm/data_utils.py", line 1015, in process_one_file
    X_int[i] = np.array(line[1:14], dtype=np.int32)
ValueError: invalid literal for int() with base 10: '5a9ed9b0'
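
One possible reading of this error, offered as a guess rather than a confirmed diagnosis: the failing line takes columns 1 to 13 (`line[1:14]`) as integers, which fits a layout of one label column, 13 integer columns, and then the categorical columns. If test.txt has no label column (which I believe is the case for the Kaggle Criteo release, but please check your file), every column shifts left by one and the slice picks up the first categorical value, a hex string such as '5a9ed9b0'. The sketch below reproduces that shift on synthetic rows in plain Python. The helper names `split_criteo_line` and `add_dummy_label` are hypothetical and are not part of data_utils.py, and the 13/26 column counts are assumptions about the file layout.

```python
N_INT = 13  # assumed number of integer feature columns
N_CAT = 26  # assumed number of categorical (hex string) feature columns


def split_criteo_line(line, has_label):
    """Split one tab-separated line into (label, ints, cats).

    has_label=False is the assumed layout of test.txt (no label column).
    """
    fields = line.rstrip("\n").split("\t")
    offset = 1 if has_label else 0
    expected = offset + N_INT + N_CAT
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    label = int(fields[0]) if has_label else None
    # Treat empty fields as 0 in this sketch.
    ints = [int(f) if f else 0 for f in fields[offset:offset + N_INT]]
    cats = fields[offset + N_INT:]
    return label, ints, cats


def add_dummy_label(line, label="0"):
    """Prepend a placeholder label so a label-less line matches the train.txt layout."""
    return label + "\t" + line


# Synthetic rows for illustration only, not real dataset values.
int_part = [str(i) for i in range(N_INT)]
cat_part = ["5a9ed9b0"] * N_CAT
train_row = "\t".join(["1"] + int_part + cat_part)
test_row = "\t".join(int_part + cat_part)

# Reading a label-less row with the train.txt slice pulls in the first
# categorical value, which raises the same ValueError as in the traceback.
try:
    [int(f) for f in test_row.split("\t")[1:14]]
    shift_error = None
except ValueError as exc:
    shift_error = str(exc)

print(shift_error)
print(split_criteo_line(test_row, has_label=False)[1])
print(split_criteo_line(add_dummy_label(test_row), has_label=True)[0])
```

If the column count of test.txt does turn out to be one less than that of train.txt, prepending a placeholder label to each line may let the existing preprocessing run unchanged. Any accuracy or loss computed against those placeholder labels would be meaningless, so only the predictions would be usable.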
