The project uses four JSON
files throughout the flow to ensure smooth communication among different components.
- `status.json`: Stores information about the state of the training server and whether the project is currently in dev or live mode.
- `cleanup.json`: Contains a flag that determines whether the `clean` lambda function in the inference module performs S3 cleanup.
- `training.json`: Stores the training configuration, which includes the model parameters and the dataset.
- `inference.json`: Keeps information about the models that are currently available for inference.
In case you are interested, a sample preview of each JSON file has been provided. Check the contents of this directory to view the format of those files.
This section explains how the config files are initialized and set up for use in the project.
- Create an S3 bucket
- Inside the bucket, create the following folders
    - training
    - imageclassification
    - textclassification
- Read through the sections below to set up each of the config files
    - Location refers to the path of the file relative to the root of the S3 bucket
    - Initial Content refers to the content that should be stored inside the file before uploading it to S3
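The steps above can be sketched as a small script that prepares the initial content of each config file locally, ready to be uploaded to the bucket (the upload itself, e.g. via the AWS CLI or boto3, is left out; `training.json` is skipped because it is created automatically, as noted below):

```python
# A minimal sketch, assuming the initial contents listed in this guide.
import json

INITIAL_CONTENT = {
    "status.json": {
        "status": "sleeping",
        "token": "",
        "task_type": "",
        "dev_mode": False,
    },
    "cleanup.json": {"status": "active"},
    "inference.json": {},  # starts empty: no models deployed yet
}


def write_initial_configs(directory="."):
    """Write each config file with its initial content before uploading to S3."""
    paths = []
    for name, content in INITIAL_CONTENT.items():
        path = f"{directory}/{name}"
        with open(path, "w") as f:
            json.dump(content, f, indent=4)
        paths.append(path)
    return paths


write_initial_configs()
```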
The config file contains the following keys:

- `status`: Can be either `active` or `sleeping`. If the value is set to `active`, it means that the training server is currently busy and won't accept any more training requests for the time being.
- `token`: If the status is `active`, this key holds the token of the user whose model is currently being trained on the server.
- `task_type`: Can be either `imageclassification` or `textclassification`. If the status is `active`, it stores the type of the task that is currently being trained on the server.
- `dev_mode`: A boolean value. If it is set to `true`, the platform will halt all operations. This key is set to `true` only when maintenance work has to be done.
Click here to check a sample preview of status.json
status.json
{
"status": "sleeping",
"token": "",
"task_type": "",
"dev_mode": false
}
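A component that wants to submit a training request could gate on `status.json` as follows. This is a sketch; the helper name is hypothetical and not part of the project's code:

```python
import json


def server_accepts_requests(status_cfg: dict) -> bool:
    # Hypothetical helper: new training jobs are accepted only when the
    # server is idle ("sleeping") and the platform is not in maintenance mode.
    return status_cfg["status"] == "sleeping" and not status_cfg["dev_mode"]


# The initial content shown above allows new requests.
status = json.loads('{"status": "sleeping", "token": "", "task_type": "", "dev_mode": false}')
print(server_accepts_requests(status))  # prints True
```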
The config file contains the following keys:

- `status`: Can be either `active` or `sleeping`. If the value is set to `sleeping`, the cleanup lambda function won't delete old models from S3.
Click here to check a sample preview of cleanup.json
cleanup.json
{
"status": "active"
}
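The flag check itself is trivial; a hypothetical predicate mirroring the cleanup lambda's behaviour might look like this:

```python
def should_cleanup(cleanup_cfg: dict) -> bool:
    # Hypothetical predicate: old models are deleted from S3
    # only while the flag is set to "active".
    return cleanup_cfg.get("status") == "active"
```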
At the top level, this config file contains the tokens of the models available for inference as keys. For each token, the file has the following set of keys:

- `task_type`: Can be either `imageclassification` or `textclassification`. It determines the model type.
- `model_filename`: Path of the .pt (trained model) file in the S3 bucket.
- `model_filename_onnx`: Path of the .onnx (trained model) file in the S3 bucket.
- `classes` or `metadata_filename`: The name of this key depends upon the model type. If the model is an image classification model, the name will be `classes`; if the model belongs to text classification, the name will be `metadata_filename`.
- `downloadable`: Shows whether the model weights can be downloaded.
- `accuracy`: Represents the best accuracy of the model on the validation dataset.
- `accuracy_plot`: Stores the base64 string of the accuracy chart plot created during the training process.
- `created`: Stores the date and time when the model was created and made available for inference.
Click here to check a sample preview of inference.json
inference.json
{}
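The file starts empty; each deployment adds one entry under the model's token. A sketch of building such an entry from the keys above (the builder function, token, paths, and values are illustrative placeholders, not part of the project's code):

```python
from datetime import datetime, timezone


def make_inference_entry(task_type, model_filename, model_filename_onnx,
                         class_info, downloadable, accuracy, accuracy_plot_b64):
    # Hypothetical builder for one token's entry in inference.json.
    # The key that holds class information depends on the model type.
    classes_key = ("classes" if task_type == "imageclassification"
                   else "metadata_filename")
    return {
        "task_type": task_type,
        "model_filename": model_filename,
        "model_filename_onnx": model_filename_onnx,
        classes_key: class_info,
        "downloadable": downloadable,
        "accuracy": accuracy,
        "accuracy_plot": accuracy_plot_b64,
        "created": datetime.now(timezone.utc).isoformat(),
    }


entry = make_inference_entry(
    "imageclassification",
    "imageclassification/abc123/model.pt",      # placeholder path
    "imageclassification/abc123/model.onnx",    # placeholder path
    ["cat", "dog"],
    True,
    0.94,
    "<base64 string>",
)
```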
The config file contains the following keys:

- `token`: Token assigned to the user so that they can access the model after it has been deployed.
- `task_type`: Can be either `imageclassification` or `textclassification`. It determines the model type.
- `num_classes`: Total number of classes present in the dataset.
- `optimizer`: Optimizer for training.
- `learning_rate`: Learning rate for the optimizer.
- `batch_size`: Batch size to use for training.
- `epochs`: Number of epochs for which the model will be trained.
- `data_split`: Ratio at which the dataset will be split into training and validation sets.
- `dataset`: Dataset to be used for training, in the form of base64 strings.
- `model`: Model type.
- `reduce_lr_on_plateau`: Whether to add a reduce-LR-on-plateau scheduler to the training process.
    - `factor`: Factor by which the learning rate will be reduced.
    - `patience`: How many epochs to wait before reducing the learning rate.
    - `min_lr`: Minimum value to which the learning rate can be reduced.
- `step_lr`: Whether to add a step LR scheduler to the training process.
    - `step_size`: Frequency in epochs for changing the learning rate.
    - `gamma`: Factor by which the learning rate will be reduced.
Click here to check a sample preview of training.json
You don't need to create this file. It will be automatically created by the training server module.
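Based on the keys above, the generated file might look something like the fragment below. All values are illustrative placeholders, and the exact nesting of the scheduler keys is an assumption (the sub-keys may instead sit at the top level):

```json
{
    "token": "abc123",
    "task_type": "imageclassification",
    "num_classes": 2,
    "optimizer": "Adam",
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 10,
    "data_split": 0.8,
    "dataset": ["<base64 string>", "<base64 string>"],
    "model": "resnet18",
    "reduce_lr_on_plateau": {
        "factor": 0.1,
        "patience": 3,
        "min_lr": 1e-06
    },
    "step_lr": false
}
```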