A handy notebook to split data from samples and labels into train/test folders
The way input data is stored can sometimes make it tricky to load into Python for further processing. For example, the famous Cats vs Dogs image dataset comes in an images-and-labels format, which takes some extra work to load.
Or suppose you have limited RAM and need to load only part of a dataset rather than the whole thing. In either case, follow this notebook to do the same (it may need slight tinkering to suit your needs) :)
Initial dir structure
\DATA
├───images
└───labels
- images folder contains all the samples
- labels contains the mapping of images to labels (in our case it's XML, so we will parse it)
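The exact XML schema of the label files isn't shown here, so the sketch below assumes one Pascal VOC-style file per image, e.g. `<annotation><filename>cat_001.jpg</filename><object><name>1</name></object></annotation>`. The helper names `parse_label_file` and `build_label_map` are hypothetical; adapt the XPath-like lookups to your own files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_label_file(xml_path):
    """Return (image filename, class label) from one annotation file.

    ASSUMPTION: Pascal VOC-like layout, i.e.
    <annotation><filename>..</filename><object><name>..</name></object></annotation>.
    Only the first <object> is used as the image-level label.
    """
    root = ET.parse(xml_path).getroot()
    filename = root.findtext("filename")
    label = root.findtext("object/name")
    if not filename or not label:
        raise ValueError(f"{xml_path}: missing <filename> or <object>/<name>")
    return filename.strip(), label.strip()


def build_label_map(labels_dir):
    """Map every image filename to its label by reading all *.xml files in labels_dir."""
    mapping = {}
    for xml_path in sorted(Path(labels_dir).glob("*.xml")):
        filename, label = parse_label_file(xml_path)
        mapping[filename] = label
    return mapping
```

Usage: `label_map = build_label_map("DATA/labels")` gives a plain dict such as `{"cat_001.jpg": "1", ...}` that the next step can consume.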
Rearranged into one folder per class
\SAMPLES
├───1
├───2
├───3
├───4
├───5
└───6
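Going from `\DATA` to `\SAMPLES` amounts to copying each image into a sub-folder named after its label. A minimal sketch, assuming a `{filename: label}` dict like the one built from the XML files; `organize_by_class` is a hypothetical helper name, and copying (rather than moving) is a choice made here so the original data stays untouched.

```python
import shutil
from pathlib import Path


def organize_by_class(images_dir, label_map, samples_dir, move=False):
    """Copy (or move) each image into samples_dir/<label>/.

    label_map is a dict {image filename: class label}. Images with no entry
    in label_map are skipped, and their names are returned for inspection.
    """
    images_dir, samples_dir = Path(images_dir), Path(samples_dir)
    transfer = shutil.move if move else shutil.copy2
    skipped = []
    # sorted() materialises the listing first, so moving files is safe here.
    for image_path in sorted(images_dir.iterdir()):
        if not image_path.is_file():
            continue
        label = label_map.get(image_path.name)
        if label is None:
            skipped.append(image_path.name)
            continue
        class_dir = samples_dir / label
        class_dir.mkdir(parents=True, exist_ok=True)
        transfer(str(image_path), str(class_dir / image_path.name))
    return skipped
```

Checking the returned `skipped` list is a cheap way to catch images whose annotation is missing or misnamed.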
Now we will convert this into train and validation (and optionally test) folders using the split-folders library.
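With split-folders installed (`pip install split-folders`), this step is typically a single call such as `splitfolders.ratio("SAMPLES", output="SPLIT", seed=1337, ratio=(.8, .2))`; a two-value ratio gives only train/val, and three values such as `(.8, .1, .1)` add a test folder. In case the library isn't available, here is a rough standard-library sketch of the same idea. It is not split-folders' own implementation; `ratio_split` is a hypothetical name, and it shuffles each class with a fixed seed and copies files into `SPLIT/<split>/<class>/`.

```python
import random
import shutil
from pathlib import Path


def ratio_split(samples_dir, output_dir, ratio=(0.8, 0.2), seed=1337):
    """Split samples_dir/<class>/* into output_dir/{train,val[,test]}/<class>/.

    Files are shuffled per class with a fixed seed so the split is
    reproducible. Files are copied; the input folder is left untouched.
    """
    if len(ratio) not in (2, 3) or abs(sum(ratio) - 1.0) > 1e-6:
        raise ValueError("ratio must have 2 or 3 parts summing to 1")
    split_names = ["train", "val", "test"][: len(ratio)]
    rng = random.Random(seed)
    for class_dir in sorted(p for p in Path(samples_dir).iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        # Cumulative cut points; the last split takes whatever remains.
        cuts, total = [], 0.0
        for r in ratio[:-1]:
            total += r
            cuts.append(int(total * len(files)))
        bounds = [0] + cuts + [len(files)]
        for name, start, end in zip(split_names, bounds, bounds[1:]):
            target = Path(output_dir) / name / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in files[start:end]:
                shutil.copy2(f, target / f.name)
```

Splitting per class keeps each class's train/val proportions close to the requested ratio, which matters when some classes have far fewer images than others.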
End Result
\SPLIT
├───train
│ ├───1
│ ├───2
│ ├───3
│ ├───4
│ ├───5
│ └───6
└───val
├───1
├───2
├───3
├───4
├───5
└───6
Now this folder-per-class layout can be loaded directly for ML training :)
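Since the label is now just the sub-folder name, loading reduces to walking a split folder. To address the limited-RAM case mentioned at the start, the sketch below yields file paths in batches so only one batch of images has to be decoded at a time. The function names are hypothetical, and decoding is left to you (e.g. Pillow's `Image.open`); frameworks offer similar folder-based loaders, such as Keras's `tf.keras.utils.image_dataset_from_directory`, which infers labels from sub-folder names.

```python
import random
from pathlib import Path


def list_samples(split_dir):
    """Return (file path, class label) pairs for one split, e.g. SPLIT/train.

    The class label is simply the name of the sub-folder holding the file.
    """
    split_dir = Path(split_dir)
    return [
        (f, class_dir.name)
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir())
        for f in sorted(class_dir.iterdir())
        if f.is_file()
    ]


def iter_batches(split_dir, batch_size=32, shuffle=True, seed=0):
    """Yield (paths, labels) batches so only one batch needs decoding at a time."""
    samples = list_samples(split_dir)
    if shuffle:
        random.Random(seed).shuffle(samples)
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        paths = [p for p, _ in batch]
        labels = [label for _, label in batch]
        yield paths, labels
```

Usage: `for paths, labels in iter_batches("SPLIT/train", batch_size=64): ...`, decoding only `paths` inside the loop body.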