A handy notebook for splitting data from samples and labels into test-train folders

sonyD4d/DataSplitter

Data Splitter

A handy notebook for splitting data from samples and labels into test-train folders


The way input data is stored can make it tricky to load into Python for further processing. For example, the famous Cats vs Dogs dataset for image processing comes in an images-and-labels format, which takes some additional work to load.

Or, if you are short on RAM and need to load only part of a dataset rather than the complete dataset, the notebook in this repository shows how to do that (it may require slight tinkering to suit your needs) :)


Initial dir structure

  \DATA
  ├───images
  └───labels
  • images folder contains all samples
  • labels folder contains the mapping of images to labels (in our case it's in XML, so we will parse it)

Converted to (one folder per label)

   \SAMPLES
   ├───1
   ├───2
   ├───3
   ├───4
   ├───5
   └───6
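
The conversion from images plus labels to per-label folders could look roughly like the sketch below. It is not the notebook's actual code. It assumes each XML file has a `<filename>` element naming the image and an `<object><name>` element holding the label (a Pascal VOC-like layout), which may not match your annotations. The function name `sort_images_by_label` is made up for illustration.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def sort_images_by_label(images_dir, labels_dir, samples_dir):
    """Copy each image into samples_dir/<label>/ based on its XML annotation.

    Returns the number of images copied.
    """
    images_dir, labels_dir, samples_dir = (
        Path(images_dir), Path(labels_dir), Path(samples_dir)
    )
    copied = 0
    for xml_path in sorted(labels_dir.glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        # ASSUMED schema: <filename> names the image, <object><name> is the label.
        filename = root.findtext("filename")
        label = root.findtext("object/name")
        if not filename or not label:
            continue  # annotation does not match the assumed schema
        source = images_dir / filename
        if not source.is_file():
            continue  # annotation points at an image that is not there
        target_dir = samples_dir / label.strip()
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target_dir / source.name)
        copied += 1
    return copied
```

Calling `sort_images_by_label("DATA/images", "DATA/labels", "SAMPLES")` would then fill `SAMPLES` with one folder per label. Files are copied rather than moved, so the original `DATA` folder stays untouched.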

Now we will convert this into train and validation folders (and optionally a test folder) using the split-folders library

End Result

    \SPLIT
    ├───train
    │   ├───1
    │   ├───2
    │   ├───3
    │   ├───4
    │   ├───5
    │   └───6
    └───val
        ├───1
        ├───2
        ├───3
        ├───4
        ├───5
        └───6
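
To make the split step concrete: with split-folders the call is, to our knowledge, along the lines of `splitfolders.ratio("SAMPLES", output="SPLIT", seed=1337, ratio=(.8, .2))`. The sketch below is a standard-library stand-in that illustrates the same idea, not the library's implementation. The function name `split_train_val` and the 80/20 default are assumptions chosen for illustration.

```python
import random
import shutil
from pathlib import Path


def split_train_val(samples_dir, split_dir, val_fraction=0.2, seed=1337):
    """Copy every label folder into split_dir/train and split_dir/val.

    Returns a dict with the number of files placed in each subset.
    """
    samples_dir, split_dir = Path(samples_dir), Path(split_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {"train": 0, "val": 0}
    for class_dir in sorted(p for p in samples_dir.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_val = int(len(files) * val_fraction)
        # Create both subset folders so the layout is the same for every label.
        for subset in ("train", "val"):
            (split_dir / subset / class_dir.name).mkdir(parents=True, exist_ok=True)
        for index, file_path in enumerate(files):
            subset = "val" if index < n_val else "train"
            shutil.copy2(file_path, split_dir / subset / class_dir.name / file_path.name)
            counts[subset] += 1
    return counts
```

The split is done per label folder, so each label keeps roughly the same train/validation proportion.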

Now we can load this for ML stuff :)
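
If RAM is tight, one way to consume the split folders is to walk them lazily and hand out file paths in small batches, decoding the images only when a batch is needed. This is a minimal sketch under that assumption. `iter_batches` is a hypothetical helper, not part of the notebook or of any library.

```python
from pathlib import Path


def iter_batches(subset_dir, batch_size=32):
    """Yield lists of (image_path, label) pairs, at most batch_size per list.

    The label is taken from the name of the folder that holds the image.
    """
    batch = []
    for class_dir in sorted(p for p in Path(subset_dir).iterdir() if p.is_dir()):
        for image_path in sorted(f for f in class_dir.iterdir() if f.is_file()):
            batch.append((image_path, class_dir.name))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final, possibly smaller batch
```

For example, `for batch in iter_batches("SPLIT/train", batch_size=64): ...` keeps only 64 paths in hand at a time. Batches come out ordered by label, so shuffle before training if your model needs mixed batches.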
