Releases: NVIDIA/Megatron-Energon
4.0.0
What's Changed
- Enable adding additional data by joining another dataset by @voegtlel and @philipp-fischer in #20
- Replace the dataset type in dataset.yaml with the sample type directly by @voegtlel and @philipp-fischer in #29
Breaking Changes
- Dataset checkpoints created with versions before 4.0.0 are not compatible with 4.0.0 because of the structural simplification. Everything else (e.g. randomness and interface compatibility) should remain the same.
Full Changelog: 3.0.1...4.0.0
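Because of the checkpoint incompatibility noted under Breaking Changes, a training script may want to refuse restoring an older dataset checkpoint. The following is a minimal sketch, assuming you store the Energon version string next to your checkpoint yourself. The helper names `parse_version` and `needs_fresh_checkpoint` are hypothetical and not part of Energon.

```python
import re

# First release whose dataset checkpoints use the new structure.
BREAKING_VERSION = (4, 0, 0)


def parse_version(text):
    """Turn a version string such as '3.0.1' into a tuple like (3, 0, 1).

    Only the first three numeric components are used; missing ones become 0.
    """
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


def needs_fresh_checkpoint(saved_with):
    """Return True if a checkpoint written by `saved_with` predates 4.0.0.

    `saved_with` is assumed to be a version string that your own training
    script recorded when it saved the checkpoint (hypothetical convention).
    """
    return parse_version(saved_with) < BREAKING_VERSION


if __name__ == "__main__":
    for version in ("2.3.0", "3.0.1", "4.0.0"):
        print(version, needs_fresh_checkpoint(version))
```

When writing the checkpoint, the installed version could be obtained with `importlib.metadata.version("megatron-energon")`, using the package name from the PyPI listing.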
3.0.1
What's Changed
- This fixes "AttributeError: module 'fsspec' has no attribute 'asyn'", see #26 by @philipp-fischer
Full Changelog: 3.0.0...3.0.1
3.0.0
What's Changed
- Allow for reproducible scaling with different micro batch size in #11 by @philipp-fischer
- Introduce sequence packing and sample restore in #12 by @voegtlel and @philipp-fischer
- Add the energon info command in #21 by @voegtlel
Full Changelog: 2.3.0...3.0.0
2.3.0
What's Changed
- Support loading datasets via dict() & update SimilarityInterleaved by @paul-gibbons in #4
- Fix gc_init_worker for ShardedTensor by @philipp-fischer in #7
- Fix dataloader being stuck iterating samples in a certain condition by @voegtlel in #6
- Fix epath with relative usage by @voegtlel in #3
New Contributors
- @paul-gibbons made their first contribution in #4
Full Changelog: 2.2.0...2.3.0
2.2.0
Full Changelog: 2.1.1...2.2.0
Version 2.2.0 introduces support for video files in your dataset and adds new samples and dataset classes for video question answering (VidQA).
The new classes are VidQASample and VidQAWebdataset.
If your dataset shards contain videos, they should now be decoded automatically and yield a VideoData object, which contains the frames and the audio data as well as some metadata.
2.1.1 First public release
You can find the installable package here: https://pypi.org/project/megatron-energon/2.1.1/