num_train_samples in dataset shuffling
sichu2023 committed Aug 29, 2024
1 parent 8c0daad commit aaabf34
Showing 1 changed file with 1 addition and 1 deletion.
@@ -144,7 +144,7 @@ def setup(self, stage: str = "") -> None:
             tokenizer=self._tokenizer,
         )
         self._train_ds = self._sample_and_shuffle_dataset(
-            _train_ds, num_train_samples, "train"
+            _train_ds, None, "train"
         )  # shuffle manually without cyclic MegatronPretrainingRandomSampler

         # Create validation dataset
