Releases: activeloopai/deeplake
1.3.1
🚀 New
- Auto infer-schema & auto-directory ingestion! (#696) @McCrearyD
- Added a hello objectron notebook (#694) @haiyangdeperci
- Added ability to specify region in S3 (#715) @kevinlu1211
- CSV parsing added to hub.auto (#711) @dhiganthrao
- Added genomelake hub backend benchmarks (#680) @DebadityaPal
- Added unit test for utils.py (#668) @hakanbakacak
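The auto infer-schema and CSV parsing entries above (#696, #711) can be pictured with a minimal sketch of column-type inference. This is not hub.auto's actual implementation. The function name `infer_csv_schema` and the promotion rule (int, then float, then str) are assumptions made for illustration only.

```python
import csv
import io


def _cell_type(value):
    """Return the narrowest type (int, float or str) that can parse value."""
    for candidate in (int, float):
        try:
            candidate(value)
        except (TypeError, ValueError):
            continue
        return candidate
    return str


def infer_csv_schema(text):
    """Hypothetical helper: map each CSV column to an inferred Python type.

    A column is int only if every cell parses as int, float if every cell
    parses as float, and str otherwise.
    """
    rank = {int: 0, float: 1, str: 2}
    reader = csv.DictReader(io.StringIO(text))
    schema = {name: int for name in reader.fieldnames or []}
    for row in reader:
        for name, value in row.items():
            if name not in schema:
                continue  # skip overflow cells that have no header
            # Promote the column type if this cell needs a wider type.
            cell = _cell_type(value)
            if rank[cell] > rank[schema[name]]:
                schema[name] = cell
    return schema
```

For example, `infer_csv_schema("id,score,name\n1,0.5,a\n2,3,b\n")` returns `{'id': int, 'score': float, 'name': str}`: the `score` column is promoted to float because one of its cells does not parse as an integer.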
🧭 What's Changed
- to_tensorflow now accepts a new argument (key_list) that passes only the listed tensors, speeding up iteration when a dataset contains many extra tensors. (#689) @AbhinavTuli
- Caching within to_tensorflow has been improved for tensors with dynamic shapes (previously only the current sample was kept in the cache) (#689) @AbhinavTuli
- Added the option to specify None as the compressor when defining the schema (#689) @AbhinavTuli
- Added the ability to slice dynamically shaped tensors and obtain a list instead of iterating over them one by one (#689) @AbhinavTuli
- Transform logic has been modified to work properly with multiple workers (#689) @AbhinavTuli
- Added tags to usage and crash reports (#697) @zomglings
- Added ipynb file with benchmark tests for dnafrag package (#676) @DebadityaPal
- Relaxed hub requirements (#659) @haiyangdeperci
- Updated Objectron dataset tensors from generic types to hub schema representations (#705) @haiyangdeperci
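The key_list argument of to_tensorflow (#689) restricts each sample to the tensors you ask for, so downstream code only receives what it needs. The framework-free sketch below illustrates that idea on plain dictionaries. The generator `iter_selected` and the dict-per-sample layout are assumptions; Hub's real implementation works on its own storage and builds a TensorFlow dataset.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def iter_selected(
    samples: Iterable[Dict[str, object]], key_list: Optional[List[str]] = None
) -> Iterator[Dict[str, object]]:
    """Hypothetical sketch: yield each sample restricted to key_list.

    With key_list=None every key is kept, mirroring the default behaviour
    of passing all tensors through.
    """
    for sample in samples:
        if key_list is None:
            yield dict(sample)
        else:
            # Only the requested keys are copied into the output sample.
            yield {key: sample[key] for key in key_list}
```

Passing `["image", "label"]` drops a `meta` entry from every yielded sample, while omitting key_list returns copies of the full samples.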
🐛 Bug Fixes
- Removed mutable default args in client/base.py (#699) @TakshPanchal
- Fixed Windows environment encoding (#671) @haiyangdeperci
- Fixed Windows setup (#650) @haiyangdeperci
- Fixed README links (#682) @DebadityaPal
- Any dataset copy test that got interrupted midway through the test affected all subsequent test runs. This has now been fixed. (#689) @AbhinavTuli
- Fixed issue with resize in mode='a' (#718) @kristinagrig06
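The mutable-default-argument fix in client/base.py (#699) addresses a general Python pitfall: a default such as `headers={}` is evaluated once, when the function is defined, so every call that relies on the default shares and mutates the same object. The two functions below are generic illustrations with hypothetical names, not the code from client/base.py.

```python
def add_header_buggy(key, value, headers={}):
    # The same dict object is reused by every call that omits `headers`.
    headers[key] = value
    return headers


def add_header_fixed(key, value, headers=None):
    # A fresh dict is created on every call that omits `headers`.
    if headers is None:
        headers = {}
    headers[key] = value
    return headers
```

Calling `add_header_buggy("a", 1)` and then `add_header_buggy("b", 2)` returns `{'a': 1, 'b': 2}` the second time, because state leaked from the first call. The `None` sentinel pattern in `add_header_fixed` avoids this.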
🗂 Documentation
- Russian translation for README (#656) @george-zakharov
- Update schema docs (#654) @thisiseshan
- Add Tutorial for Working with Text on Hub (#672) @dhiganthrao
- Include consent language in README (#666) @mynameisvinn
🔗 Dependency Updates
- Bumped humbug dependency version to ">=0.1.6" (#673) @zomglings
- Update zarr requirement from <2.7,>=2.4 to >=2.4,<2.8 (#717) @dependabot-preview
- Bump boto3 from 1.17.33 to 1.17.36 (#716) @dependabot-preview
- Bump boto3 from 1.17.30 to 1.17.33 (#701) @dependabot-preview
- Bump tensorflow from 2.4.0 to 2.4.1 (#706) @dependabot-preview
- Bump sphinx from 3.5.2 to 3.5.3 (#707) @dependabot-preview
- Bump tiledb from 0.7.6 to 0.8.5 (#703) @dependabot-preview
- Bump flake8 from 3.8.4 to 3.9.0 (#686) @dependabot-preview
- [Security] Bump tensorflow from 2.3.1 to 2.4.0 (#332) @dependabot-preview
- Bump pytest-cov from 2.10.1 to 2.11.1 (#474) @dependabot-preview
- Bump boto3 from 1.17.22 to 1.17.30 (#693) @dependabot-preview
⚙️ Who Contributed
@AbhinavTuli, @DebadityaPal, @McCrearyD, @TakshPanchal, @dependabot-preview, @dependabot-preview[bot], @dhiganthrao, @george-zakharov, @haiyangdeperci, @hakanbakacak, @kevinlu1211, @kristinagrig06, @madhucharan, @mynameisvinn, @thisiseshan, @zomglings
1.3.0
🧭 What's Changed
- Version Control has been added to Hub Datasets! (#610) @AbhinavTuli
- to_tensorflow now properly supports Text datasets (#658) @AbhinavTuli
- Hub crash and system information reports using Bugout (#624) @zomglings
- Added support for multiple BBox and Classlabel, instead of Sequences. (#658) @AbhinavTuli
- CLI name has been changed from hub to activeloop (#631) @haiyangdeperci
- Added a notebook example for creating a dataset for object detection and instance segmentation (#629) @haritsahm
- Added a tutorial for working with audio (#592) @mynameisvinn
🚀 New
- Added a version command to the Hub CLI (#628) @sparkingdark
- Automatic Release Drafter added to repository (#598) @Anselmoo
- Improve Directory Structure of Examples (#630) @SauravMaheshkar
- Put zarr, tileDB, and hub benchmarks in one file (#534) @DebadityaPal
- Refactored Dataset Class (#576) @DebadityaPal
- Add GitHub Actions CI pipeline (#372) @ADI10HERO
🐛 Bug Fixes
- Removed Assertions from shape_detector.py and added exceptions (#616) @DebadityaPal
- Adds support for dataset views in sharded dataset (#557) @AbhinavTuli
- Advanced slicing added for Sharded Dataset (#558) @AbhinavTuli
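Supporting views and advanced slicing over a sharded dataset (#557, #558) comes down to mapping a global index onto a (shard, local index) pair. Below is a minimal sketch of that mapping, assuming shards are concatenated in order and their lengths are known. `locate`, `locate_slice` and `shard_lengths` are hypothetical names, not Hub's API.

```python
import bisect
from itertools import accumulate
from typing import List, Tuple


def locate(shard_lengths: List[int], index: int) -> Tuple[int, int]:
    """Hypothetical helper: map a global index to (shard number, local index).

    Negative indices count from the end, as in Python sequences.
    """
    total = sum(shard_lengths)
    if index < 0:
        index += total
    if not 0 <= index < total:
        raise IndexError("index out of range")
    # ends[i] is the global index one past the last sample of shard i.
    ends = list(accumulate(shard_lengths))
    shard = bisect.bisect_right(ends, index)
    start = ends[shard] - shard_lengths[shard]
    return shard, index - start


def locate_slice(shard_lengths: List[int], s: slice) -> List[Tuple[int, int]]:
    """Resolve a slice (including steps) into per-sample (shard, local) pairs."""
    total = sum(shard_lengths)
    return [locate(shard_lengths, i) for i in range(*s.indices(total))]
```

With shard lengths `[3, 2, 4]`, global index 3 is the first sample of shard 1, so `locate([3, 2, 4], 3)` gives `(1, 0)`. Using `bisect_right` on the cumulative ends also skips over empty shards correctly.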
🗂 Documentation
- README added in Korean (#621) @HyeongminLEE
- README added in Bahasa Indonesia (#645) @haritsahm
- README added in French (#640) @MargauxMasson
- README added in Turkish (#608) @hakanbakacak
- Chinese Readme Proofread and Update (#613) @Cynthia7979
- Change ds.commit() to ds.flush() throughout README.md (#619) @galbwe
- Added explanation for local file system to docs (#634) @McCrearyD
- Replaced commit() with flush() in documentation. (#604) @dhiganthrao
- Add MinIO to Data Storage docs (#605) @gabriel-milan
- Updated example notebooks with pip (#585) @MojammelHossain
- Typos fixed (#591) @dPacc
🔗 Dependency Updates
- Bump pytest from 6.2.1 to 6.2.2 (#496) @dependabot-preview
- Bump ray from 1.0.0 to 1.2.0 (#554) @dependabot-preview
- Bump boto3 from 1.16.39 to 1.17.20 (#646) @dependabot-preview
⚙️ Who Contributed
@ADI10HERO, @AbhinavTuli, @Anselmoo, @Cynthia7979, @DebadityaPal, @HyeongminLEE, @MargauxMasson, @McCrearyD, @MojammelHossain, @SauravMaheshkar, @dPacc, @davidbuniat, @dhiganthrao, @gabriel-milan, @galbwe, @haiyangdeperci, @hakanbakacak, @haritsahm, @imshashank, @mikayelh, @mynameisvinn, @sparkingdark and @zomglings
1.2.3
Release Notes
- Reverting shape checks for Mask schema to maintain backward compatibility.
1.2.2
Release Notes
- Hotfix for a bug that resulted in incorrect slicing of TensorView.
1.2.1
Release Notes
- Dataset copying has been added, allowing you to easily copy your own and other users' datasets. Datasets can be copied across GCS, S3, AWS, local storage and Hub storage. #454 (@AbhinavTuli)
- Many improvements to the benchmarks #508 #512 #531 #545 #550 (@haiyangdeperci @DebadityaPal)
- Development Roadmap added #511 (@mynameisvinn)
- Improved message for Hub transforms by displaying shard size #523 (@DebadityaPal)
- All Windows issues have now been fixed. #528 (@AbhinavTuli)
- Hub dataset filtering has been overhauled, and a section on filtering has been added to the documentation #539 (@AbhinavTuli)
- to_tensorflow issues with Datasets containing Sequences (such as coco) have been fixed #540 (@AbhinavTuli)
- Adds a get_label parameter to .compute() and .numpy() to directly retrieve the string label from a ClassLabel #489 (@DebadityaPal)
- Tutorial added for using Hub with Hugging Face transformers #536 (@DebadityaPal)
- Some unit tests have now been parameterized to cover multiple datatypes #527 (@drewpotter)
- A from-directory function has been implemented to directly ingest categorical image data #459 (@sparkingdark)
- Example use case added for creating a Hub dataset for Deep Learning prediction of crop yield #559 (@MargauxMasson)
- MPL Headers have been added to source files #494 (@KrishnaChaitanya1)
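The get_label parameter (#489) returns the string name stored for a ClassLabel instead of its integer id. The lookup it performs can be sketched as below. The `ClassLabelSketch` class and its `to_str`/`to_int` methods are hypothetical stand-ins; Hub's actual ClassLabel schema class and the exact .compute()/.numpy() call are not reproduced here.

```python
from typing import List, Sequence, Union


class ClassLabelSketch:
    """Hypothetical stand-in for a class-label schema: ids map to names."""

    def __init__(self, names: Sequence[str]):
        self.names = list(names)
        self._ids = {name: i for i, name in enumerate(self.names)}

    def to_str(self, label: Union[int, Sequence[int]]) -> Union[str, List[str]]:
        """Return the name for an int id, or a list of names for a sequence."""
        if isinstance(label, int):
            return self.names[label]
        return [self.names[i] for i in label]

    def to_int(self, name: str) -> int:
        """Inverse lookup: return the integer id for a label name."""
        return self._ids[name]
```

With names `["cat", "dog", "bird"]`, id 1 maps to `"dog"`, and a batch such as `[2, 0]` maps to `["bird", "cat"]`.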
1.2.0
Release Notes
- Adds support for dataset filtering (#460) (@AbhinavTuli)
- Greatly improves to_tensorflow performance (#481) (@AbhinavTuli)
- Benchmarks added for Hub 1.x (#486) (@benchislett)
- Fixes a bug that caused issues on Windows machines (#472) (@FayazRahman)
- Fixes a bug that caused issues with TF 2.4.0 (#478) (@DebadityaPal)
- Fixes docker build issue (#463) (@Darkborderman)
- Added Chinese README (#458) (@EYH0602)
- Better automatic determination of Dataset mode depending on permissions (#466) (@edogrigqv2)
- CoLA dataset uploaded to Hub, upload script added to examples (#487) (@mynameisvinn)
- Fixes a bug with dataset slicing (#480) (@AbhinavTuli)
- Adds support for custom s3 endpoints (including MinIO) (#482) (@AbhinavTuli)
- Adds the ability to set a name to a dataset so it appears better on the visualizer (#468) (@AbhinavTuli)
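Dataset filtering (#460) selects the samples for which a user-supplied predicate returns True. One simple way to model the result is as a view that stores the matching indices rather than copying data. The sketch below takes that approach, assuming a plain list stands in for the dataset. `filter_indices` and `IndexView` are hypothetical names, and this is not necessarily how Hub's filtering is implemented internally.

```python
from typing import Callable, Generic, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


class IndexView(Generic[T]):
    """Hypothetical read-only view over selected positions of a sequence."""

    def __init__(self, data: Sequence[T], indices: List[int]):
        self.data = data
        self.indices = indices

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, i: int) -> T:
        # Translate the view position into a position in the underlying data.
        return self.data[self.indices[i]]

    def __iter__(self) -> Iterator[T]:
        return (self.data[i] for i in self.indices)


def filter_indices(data: Sequence[T], predicate: Callable[[T], bool]) -> IndexView[T]:
    """Return a view of the samples for which predicate(sample) is True."""
    return IndexView(data, [i for i, sample in enumerate(data) if predicate(sample)])
```

Because the view holds only indices, filtering a large dataset costs one pass over the samples plus a list of integers, and reads through the view go straight to the original data.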
1.1.3
Fixes an issue in to_pytorch when using a dataset that the user doesn't own.
1.1.0
Release Notes
- Custom s3storage that is 5-10x faster than S3FS
- Faster PyTorch dataset with current chunk logic
- Fixed caching to work in memory per process, without LMDB
- Better exception handling for loading a dataset, shape and type checks, casting
- Added examples, tutorials, and better GitHub issue handling
- Added the option to fill in additional information about the dataset, such as description, license and citation
- Native support with .compute() in the middle for nested tensors
Contributors include: @edogrigqv2 @AbhinavTuli @mynameisvinn @Anselmoo @sparkingdark @sanchitvj @Atom-101
Release v1.0.7
- Private dataset support
- Improved error handling and exceptions
- Test coverage increased from 73% to 80%
- Various bug fixes
- Transforms are ~2x faster, so the from_x converters also run faster
version 1.0.6
- Fixes issues with segmentation and RAM usage in transform