A dataset of security and non-security issues from GitHub open-source repositories, and a classification model that predicts whether a GitHub issue concerns security.
We are going to train an NLP model to detect security requirements, learning from GitHub issues that carry security labels.
We need to collect a new dataset of GitHub issues, making sure that some of them carry the "security" label.
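A minimal sketch of the labeling step, assuming the issues have already been downloaded as JSON from the GitHub REST API (for example `GET /repos/{owner}/{repo}/issues?state=all`). The helper name `label_issues` and the rule "any label whose name contains `security`" are assumptions for illustration, since projects name their security labels differently (`security`, `type: security`, and so on). That endpoint also returns pull requests, which carry a `pull_request` key, so the sketch skips them.

```python
from __future__ import annotations

from typing import Iterable

# Assumption: a label counts as "security" if its name contains this string.
SECURITY_LABEL = "security"


def label_issues(issues: Iterable[dict]) -> list[tuple[str, int]]:
    """Hypothetical helper: turn GitHub issue JSON into (text, label) pairs.

    label is 1 if any of the issue's labels mentions "security", else 0.
    """
    examples = []
    for issue in issues:
        # The issues endpoint also lists pull requests; keep only real issues.
        if "pull_request" in issue:
            continue
        names = [lbl["name"].lower() for lbl in issue.get("labels", [])]
        is_security = any(SECURITY_LABEL in name for name in names)
        # "body" may be null in the API response, so fall back to "".
        text = f"{issue.get('title', '')}\n{issue.get('body') or ''}".strip()
        examples.append((text, int(is_security)))
    return examples


issues = [
    {"title": "XSS in comment form", "body": "Input is not escaped.",
     "labels": [{"name": "security"}]},
    {"title": "Typo in README", "body": None, "labels": [{"name": "docs"}]},
    {"title": "Bump deps", "labels": [], "pull_request": {}},
]
print(label_issues(issues))
# → [('XSS in comment form\nInput is not escaped.', 1), ('Typo in README', 0)]
```

Checking the resulting label counts before training is worthwhile, because security-labeled issues are usually a small minority of all issues.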
We expect the model to predict whether requirements (sentences in natural English that express a software requirement) are concerned with security.
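As a rough baseline only, sentence-level classification can be sketched as a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python. This is a swapped-in technique for illustration, not the NLP model this project trains; the class name `NaiveBayes`, the tokenizer, and the six training sentences are hypothetical and not taken from the dataset.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric word tokens; deliberately simple.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[int]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # used for class priors
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text: str) -> int:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) + sum of log P(word | c), smoothed with +1 per word.
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical toy requirements: 1 = security-related, 0 = not.
train_texts = [
    "The system shall encrypt all stored passwords.",
    "User sessions must expire after 15 minutes of inactivity.",
    "Only administrators may access the audit log.",
    "The report page shall load in under two seconds.",
    "The app shall support dark mode.",
    "Users can export data to CSV.",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Passwords must be encrypted"))  # → 1
print(model.predict("The app shall load the report page faster"))  # → 0
```

A baseline like this mainly gives a reference score on held-out data that a stronger NLP model can be compared against.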
Applying the model to requirements rests on one assumption: security-related issues carry information about the software's requirements.