A web-based Agile story point estimator
View Demo »
All of the datasets for the 16 projects used in our experiments are available in the marked_data folder. Each dataset has the following five columns:
- issuekey: Issue ID
- title: Issue Title
- description: Issue Description
- storypoint: Assigned Story Point of the Issue
- split_mark: Indicates whether the row was used for training, validation, or testing
issuekey | title | description | storypoint | split_mark |
---|---|---|---|---|
... | ... | ... | ... | ... |
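For example, a dataset can be loaded and partitioned by its split_mark column as sketched below. This is a minimal sketch: the file name `appceleratorstudio.csv` and the exact split label strings (`train`, `valid`, `test`) are assumptions, so check the actual contents of the marked_data folder first.

```python
import pandas as pd

# Hypothetical file name: use any dataset present in the marked_data folder.
df = pd.read_csv("marked_data/appceleratorstudio.csv")

# Inspect which split labels the split_mark column actually uses
# (the label strings below are an assumption).
print(df["split_mark"].unique())

# Partition the issues the same way they were used in the experiments.
train_df = df[df["split_mark"] == "train"]
valid_df = df[df["split_mark"] == "valid"]
test_df = df[df["split_mark"] == "test"]
print(len(train_df), len(valid_df), len(test_df))
```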
All of the models on the HuggingFace Model Hub and Google Drive follow the naming convention described in the following table:
Model ID | Model Specification | Experiment Scenario |
---|---|---|
#0 | BPE GPT2 Tokenizer + Custom Pre-trained GPT-2 (GPT2SP) | Within-Project |
#00 | BPE GPT2 Tokenizer + Custom Pre-trained GPT-2 (GPT2SP) | Within-Repository |
#000 | BPE GPT2 Tokenizer + Custom Pre-trained GPT-2 (GPT2SP) | Cross-Repository |
#2 | Word-levelSP Tokenizer + Custom Pre-trained GPT-2 | Within-Project |
#22 | Word-levelSP Tokenizer + Custom Pre-trained GPT-2 | Within-Repository |
#222 | Word-levelSP Tokenizer + Custom Pre-trained GPT-2 | Cross-Repository |
#6 | WordPieceSP Tokenizer + Custom Pre-trained GPT-2 | Within-Project |
#66 | WordPieceSP Tokenizer + Custom Pre-trained GPT-2 | Within-Repository |
#666 | WordPieceSP Tokenizer + Custom Pre-trained GPT-2 | Cross-Repository |
#7 | SentencePieceSP Tokenizer + Custom Pre-trained GPT-2 | Within-Project |
#77 | SentencePieceSP Tokenizer + Custom Pre-trained GPT-2 | Within-Repository |
#777 | SentencePieceSP Tokenizer + Custom Pre-trained GPT-2 | Cross-Repository |
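As an illustration, a checkpoint matching one of the Model IDs above could be pulled from the HuggingFace Model Hub as sketched below. The Hub repository name is hypothetical (substitute the actual ID from the Model Hub), and the sketch assumes the checkpoint was saved with a standard transformers sequence-classification head.

```python
from transformers import AutoTokenizer, GPT2ForSequenceClassification

# Hypothetical Hub repository ID; substitute the actual Model Hub name
# that matches a Model ID from the table above (e.g., #0 for GPT2SP).
model_name = "GPT2SP/gpt2sp-within-project"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Assumes the checkpoint uses a sequence-classification (regression) head;
# adjust the model class if the saved checkpoint differs.
model = GPT2ForSequenceClassification.from_pretrained(model_name)

# Feed an issue's title + description and read the estimated story point.
inputs = tokenizer("Add OAuth login support to the web client", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)
```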
- Three different pre-trained tokenizers can be found in the all_tokenizers folder: Word-levelSP Tokenizer, WordPieceSP Tokenizer, and SentencePieceSP Tokenizer (see the loading sketch after this list)
- All of the models included in our experiments can be found on the Model Hub provided by HuggingFace
- Alternatively, the models can also be downloaded from this Google Drive
- All of the training scripts for different pre-trained tokenizers included in the experiments (RQ3) can be found in tokenizer_training_notebook.ipynb
- The model training scripts can be found in model_training_notebook.ipynb, which contains the full model training process for our experiments (RQ1 + RQ2 + RQ3)
- Access the GPT2SP web app here to interact with our GPT2SP model and explore the datasets
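As referenced in the tokenizer bullet above, a pre-trained tokenizer from the all_tokenizers folder could be loaded as sketched below. This is a minimal sketch assuming each tokenizer was saved as a tokenizers-library JSON file; the folder layout and file name shown are assumptions.

```python
from transformers import PreTrainedTokenizerFast

# Hypothetical path; check the actual layout of the all_tokenizers folder.
tokenizer = PreTrainedTokenizerFast(
    tokenizer_file="all_tokenizers/wordpiece_sp/tokenizer.json"
)

# Tokenize an issue title to inspect how the tokenizer segments text.
tokens = tokenizer.tokenize("Fix null pointer exception in login flow")
print(tokens)
```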
- Special thanks to DeepSE's developers for providing the datasets and the replication package.
- Special thanks to the developers of PyTorch, HuggingFace, Streamlit, and Transformers Interpret for providing amazing frameworks for the community.