test.txt problem #388

Open
janghobaek2125 opened this issue Jul 12, 2024 · 1 comment

@janghobaek2125

I am training a model on the Kaggle dataset, which consists of two files: train.txt and test.txt.

train.txt is preprocessed correctly, and training completes successfully.

However, test.txt, which I use for inference, does not seem to be preprocessed correctly.

What could be causing this?

python data_utils.py --raw-data-file=/data/janghobaek/test.txt
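A quick way to check whether the two files share the same column layout is to count the tab-separated fields per line. The sketch below is only a diagnostic aid: `field_count_histogram` is a hypothetical helper, not part of data_utils.py, and it assumes the files are tab-separated text with one sample per line.

```python
from collections import Counter
from itertools import islice


def field_count_histogram(path, max_lines=1000, sep="\t"):
    """Count how many fields each of the first max_lines lines contains.

    Hypothetical helper for inspecting the raw files. Empty fields
    (missing values) still count as fields because split() keeps them.
    """
    counts = Counter()
    with open(path, "r") as f:
        for line in islice(f, max_lines):
            # Strip only the newline so trailing empty fields are preserved.
            counts[len(line.rstrip("\n").split(sep))] += 1
    return counts
```

Running it on both train.txt and test.txt and comparing the results shows whether test.txt has fewer columns. In the Criteo Kaggle layout a training line is expected to have 40 fields (1 label, 13 integer features, 26 categorical features), so a count of 39 for test.txt would suggest that the label column is absent.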

@janghobaek2125

Traceback (most recent call last):
  File "/data/janghobaek/jangho/dlrm/dlrm_s_pytorch.py", line 1912, in <module>
    run()
  File "/data/janghobaek/jangho/dlrm/dlrm_s_pytorch.py", line 1108, in run
    train_data, train_ld, test_data, test_ld = dp.make_criteo_data_and_loaders(args)
  File "/data/janghobaek/jangho/dlrm/dlrm_data_pytorch.py", line 520, in make_criteo_data_and_loaders
    train_data = CriteoDataset(
  File "/data/janghobaek/jangho/dlrm/dlrm_data_pytorch.py", line 109, in __init__
    file = data_utils.getCriteoAdData(
  File "/data/janghobaek/jangho/dlrm/data_utils.py", line 1138, in getCriteoAdData
    total_per_file[i] = process_one_file(
  File "/data/janghobaek/jangho/dlrm/data_utils.py", line 1015, in process_one_file
    X_int[i] = np.array(line[1:14], dtype=np.int32)
ValueError: invalid literal for int() with base 10: '5a9ed9b0'
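The failing slice `line[1:14]` assumes that column 0 holds the label. One possible explanation, assuming the Kaggle test.txt has no label column (13 integer fields followed by 26 categorical fields), is that the same slice then runs one column too far and ends on the first categorical value. The sketch below reproduces this with synthetic rows. The values are made up, `add_dummy_label` is a hypothetical helper that is not part of data_utils.py, and plain `int()` stands in for the `np.array(..., dtype=np.int32)` call from the traceback.

```python
# Synthetic rows in the assumed Criteo Kaggle layout (values are made up).
ints = [str(i) for i in range(1, 14)]        # 13 integer features
cats = ["5a9ed9b0"] + ["a1b2c3d4"] * 25      # 26 hex-encoded categorical features

train_line = ["0"] + ints + cats             # label + 13 + 26 = 40 fields
test_line = ints + cats                      # assumed: no label, 39 fields


def parse_int_features(line):
    """Mimic the failing statement: convert fields 1..13 to integers."""
    return [int(value) for value in line[1:14]]


# With a label in column 0, the slice covers exactly the integer features.
train_ints = parse_int_features(train_line)

# Without a label, the slice ends on the first categorical value.
try:
    parse_int_features(test_line)
    error = None
except ValueError as exc:
    error = str(exc)


def add_dummy_label(line, label="0"):
    """Hypothetical workaround: prepend a placeholder label to align columns."""
    return [label] + line


test_ints = parse_int_features(add_dummy_label(test_line))
```

If the field counts confirm the missing column, prepending a placeholder label to every line of test.txt before preprocessing would realign the slices. The placeholder carries no information, so any accuracy or loss computed from it would be meaningless, and this workaround has not been confirmed by the maintainers.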
