Data Splitter

A handy notebook for splitting data from a samples-and-labels layout into train/test folders.

Depending on how input data is stored, loading it into a Python interpreter for further processing can be tricky. For example, the famous Cats vs Dogs image dataset comes in an images-and-labels format, which takes some additional work to load.

Or, if you are short on RAM and need to load part of a dataset rather than the whole dataset, follow the notebook in this repository to do the same (it may require slight tinkering to suit your needs) :)

Initial dir structure

  \DATA
  ├───images
  └───labels

The images folder contains all samples.
The labels folder contains the mapping of images to labels (in our case it's in XML, so we will parse it).

Converted to (one folder per label)

   \SAMPLES
   ├───1
   ├───2
   ├───3
   ├───4
   ├───5
   └───6
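
The conversion from the DATA layout to the SAMPLES layout can be sketched roughly as follows. This is only an illustration, not the notebook's code: the XML schema is not described here, so the sketch assumes a Pascal VOC-style file with a `<filename>` tag and an `<object><name>` tag holding the label. The function name `sort_images_by_label` is hypothetical.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def sort_images_by_label(data_dir, samples_dir):
    """Copy each image into samples_dir/<label>/ based on its XML annotation."""
    data_dir, samples_dir = Path(data_dir), Path(samples_dir)
    copied = 0
    for xml_path in sorted((data_dir / "labels").glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        # ASSUMPTION: <filename> holds the image name, <object><name> the label.
        filename = root.findtext("filename")
        label = root.findtext("object/name")
        if not filename or not label:
            continue  # annotation does not match the assumed layout
        src = data_dir / "images" / filename
        if not src.is_file():
            continue  # annotation points at a missing image
        dest_dir = samples_dir / label
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest_dir / src.name)  # copy, so DATA stays intact
        copied += 1
    return copied
```

Calling `sort_images_by_label("DATA", "SAMPLES")` would then create one subfolder per label. If your XML uses different tag names, adjust the two `findtext` paths.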

Now we will split this into train, test, and validation folders using the split-folders library (the result shown here has only train and val).

End Result

    \SPLIT
    ├───train
    │   ├───1
    │   ├───2
    │   ├───3
    │   ├───4
    │   ├───5
    │   └───6
    └───val
        ├───1
        ├───2
        ├───3
        ├───4
        ├───5
        └───6
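
To show what the split step does, here is a minimal standard-library sketch of a per-class train/val split. It is not the split-folders implementation. That package does this in one call through its documented `splitfolders.ratio` function. The function name `split_into_train_val`, the 80/20 ratio, and the seed are assumptions chosen for illustration.

```python
import random
import shutil
from pathlib import Path


def split_into_train_val(samples_dir, split_dir, val_fraction=0.2, seed=1337):
    """Copy files from samples_dir/<class>/ into split_dir/{train,val}/<class>/."""
    samples_dir, split_dir = Path(samples_dir), Path(split_dir)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    counts = {"train": 0, "val": 0}
    for class_dir in sorted(p for p in samples_dir.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_val = int(len(files) * val_fraction)  # first n_val shuffled files go to val
        for i, f in enumerate(files):
            subset = "val" if i < n_val else "train"
            dest = split_dir / subset / class_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, dest / f.name)
            counts[subset] += 1
    return counts
```

Splitting within each class folder keeps every label represented in both train and val, provided the class has enough files.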

Now we can load this for ML stuff :)
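
If RAM is tight, one way to use this layout is to walk the folders lazily and hand out file paths in small batches, so that only one batch of images has to be read at a time. The sketch below is a hedged illustration: `iter_batches` is a hypothetical helper, and it yields paths and labels only, leaving the image decoding to whatever library you use.

```python
from pathlib import Path


def iter_batches(split_subset_dir, batch_size=32):
    """Yield lists of (file_path, label) pairs, one batch at a time."""
    batch = []
    class_dirs = sorted(p for p in Path(split_subset_dir).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        for f in sorted(class_dir.iterdir()):
            if f.is_file():
                batch.append((f, class_dir.name))  # folder name is the label
                if len(batch) == batch_size:
                    yield batch
                    batch = []
    if batch:
        yield batch  # final, possibly smaller batch
```

For example, `for batch in iter_batches("SPLIT/train", batch_size=16): ...` processes 16 files per step. Batches come out ordered by class, so shuffle the paths first if your training loop needs mixed batches.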

sonyD4d/DataSplitter
