Network flow generation is important in many network analysis scenarios. Different generation tasks use different datasets and different network attributes, so configure the settings for your specific dataset. The dataset parameters are recorded in the file /conf/params.json:
- file_path: the path to the training set
- dataset_features: the features in the dataset
- dataset_dtypes: the data types of those features
- generated_features: the features to generate for new flows
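As a rough illustration of how these four keys fit together, the sketch below loads a parameter file and runs a few sanity checks. The value shapes are assumptions, not taken from the repository: it assumes `dataset_features` and `dataset_dtypes` are parallel lists and that `generated_features` is a subset of `dataset_features`. The example values and the helper names `check_params` and `load_params` are hypothetical; adjust them to match the actual layout of /conf/params.json.

```python
import json

# Hypothetical example values; the real file may use different shapes and names.
EXAMPLE_PARAMS = {
    "file_path": "data/train.csv",
    "dataset_features": ["dur", "proto", "sbytes", "dbytes"],
    "dataset_dtypes": ["float", "str", "int", "int"],
    "generated_features": ["dur", "sbytes"],
}

REQUIRED_KEYS = (
    "file_path",
    "dataset_features",
    "dataset_dtypes",
    "generated_features",
)


def check_params(params):
    """Return a list of problems found in a params dictionary (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in params]
    if problems:
        return problems
    # Assumption: one dtype entry per feature, in the same order.
    if len(params["dataset_features"]) != len(params["dataset_dtypes"]):
        problems.append("dataset_features and dataset_dtypes differ in length")
    # Assumption: only features present in the dataset can be generated.
    unknown = [
        name
        for name in params["generated_features"]
        if name not in params["dataset_features"]
    ]
    if unknown:
        problems.append(f"generated_features not in dataset_features: {unknown}")
    return problems


def load_params(path):
    """Read a JSON parameter file from disk."""
    with open(path) as handle:
        return json.load(handle)


if __name__ == "__main__":
    print(check_params(EXAMPLE_PARAMS))
```

Running a check like this before training can catch a mismatched feature list early, before the model starts reading the training set.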
To run the code:
python seq_gan.py
After training, the parameters of the generator and discriminator are stored under the sub-directory /conf. The record of generated network flows is stored under the sub-directory /target. The analysis of the statistical results is stored under the sub-directory /stats.
Based on the UNSW-NB15 dataset, we used the training set (/data/train.csv) to train the NF-GAN model and generated 20000 network flows, which are recorded in the target file (/target/traffic-1.csv). We then compared the distribution of the flows in the test set (/data/test.csv) with the distribution of the generated flows. The result is as follows. The red line is the distribution of the generated flows, whereas the blue line is the distribution of the real flows.
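For readers who want to reproduce a comparison of this kind for a single numeric feature, the sketch below bins the real and generated values over the same edges and returns relative frequencies. It is a minimal illustration under stated assumptions, not the analysis code of this repository: the function name `shared_histograms` is hypothetical, and the inputs are assumed to be plain lists of numbers already read from the test file and the generated file.

```python
def shared_histograms(real, generated, bins=10):
    """Bin two samples over the same edges and return relative frequencies."""
    lo = min(min(real), min(generated))
    hi = max(max(real), max(generated))
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / bins or 1.0
    edges = [lo + i * width for i in range(bins + 1)]

    def freqs(values):
        counts = [0] * bins
        for value in values:
            # The overall maximum lands in the last bin instead of overflowing.
            index = min(int((value - lo) / width), bins - 1)
            counts[index] += 1
        return [count / len(values) for count in counts]

    return edges, freqs(real), freqs(generated)


if __name__ == "__main__":
    real_values = [0, 1, 2, 3]        # stand-in for a feature from the test set
    generated_values = [0, 0, 3, 3]   # stand-in for the same feature in generated flows
    edges, real_freq, gen_freq = shared_histograms(real_values, generated_values, bins=3)
    print(edges, real_freq, gen_freq)
```

Using shared bin edges keeps the two curves directly comparable; the two frequency lists can then be drawn with any plotting library, for example red for the generated flows and blue for the real flows.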