Group 14 -- H2_B21_14 (IIT-Indore)
Awarded Silver Medal in the competition.
For technical details about the project, refer to Technical Report.pdf and Presentation.pdf.
For Headline Generation (Task 3 and Evaluation), first run the scripts Pegasus_FineTune.ipynb and T5_FineTune.ipynb to generate the saved models used for inference.
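A minimal sketch of the fine-tune-then-save pattern followed by these notebooks is shown below. The base checkpoint name, learning rate, sequence lengths, and sample data are assumptions for illustration, not the exact settings used in the notebooks.

```python
# Sketch only: fine-tune Pegasus and save the weights for later inference.
# Checkpoint, hyperparameters, and the sample pair are assumed, not taken from the notebooks.
import torch
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

checkpoint = "google/pegasus-xsum"  # assumed base checkpoint
tokenizer = PegasusTokenizer.from_pretrained(checkpoint)
model = PegasusForConditionalGeneration.from_pretrained(checkpoint)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One illustrative training step on a single (text, headline) pair.
text = "Sample preprocessed review text about a mobile phone ..."
headline = "Sample reference headline"
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
labels = tokenizer(headline, truncation=True, max_length=64, return_tensors="pt").input_ids

model.train()
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Save the fine-tuned weights so Task3.ipynb / Evaluation.ipynb can load them.
model.save_pretrained("Pegasus")
tokenizer.save_pretrained("Pegasus")
```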
The two output files, Output1 and Output2, are stored in the Outputs folder within the base directory.
Of the four models described in our report for headline generation, we chose the standalone Pegasus model, as it performed better on the majority of the metrics.
Approximate runtimes:
- Task 1 (Theme Classification): 4.18 minutes
- Task 2 (Aspect-Based Sentiment Classification): 4.25 minutes
- Task 3 (Headline Generation): 33 minutes
Preprocessing may take a variable amount of time, depending on the number of requests made to the Google Translation API from a given IP address, and may be affected by unavailability of the service under a high volume of API requests (for example, when the code is run multiple times).
For stable runtimes of the translation process, the paid version of the API can be used. We, however, have used the free, open-source version, in accordance with the rules of the competition.
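A rough sketch of the translation step with a simple retry/backoff is shown below. The deep_translator package is named here only as an example of a free wrapper around the Google Translation API and may not be the exact library used in the notebooks.

```python
# Illustrative translation helper with a simple retry/backoff to cope with
# rate limiting of the free service. The deep_translator package is an
# assumption; the notebooks may use a different open-source wrapper.
import time
from deep_translator import GoogleTranslator

def translate_to_english(text, retries=3, wait=5):
    for attempt in range(retries):
        try:
            return GoogleTranslator(source="auto", target="en").translate(text)
        except Exception:
            # Back off and retry when the free endpoint rejects the request.
            time.sleep(wait * (attempt + 1))
    return text  # fall back to the original text if translation keeps failing

print(translate_to_english("यह फोन बहुत अच्छा है"))
```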
The entire folder is arranged in the following manner:
There are five notebook scripts located in the base directory. A short description of each is as follows:
- Task1&2.ipynb: Used to preprocess the dataset, train the DistilBERT theme classifier, and implement the ABSA code for extracting mobile brands and their corresponding sentiments (see the sketch after this list).
- Task3.ipynb: Used to obtain metrics for the different Headline Generation models, using the fine-tuned saved models.
- Pegasus_FineTune.ipynb: Used to fine-tune Pegasus for the summarization task on the given dataset.
- T5_FineTune.ipynb: Used to fine-tune the T5 model for the summarization task on the given dataset.
- Evaluation.ipynb: Used to evaluate the test data; contains code for all three tasks.
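The rough inference flow of Task1&2.ipynb can be sketched as follows. The brand lexicon, the model folder path, and the use of a generic sentiment pipeline for the ABSA step are assumptions for illustration and stand in for the notebook's actual code.

```python
# Sketch of the Task 1 & 2 inference flow: theme classification with the
# fine-tuned DistilBERT model, then a simple brand-level sentiment pass.
# The brand list and the generic sentiment pipeline are assumptions.
from transformers import pipeline

theme_classifier = pipeline("text-classification", model="DistilBERT")  # local fine-tuned folder
sentiment = pipeline("sentiment-analysis")                               # generic stand-in for the ABSA step

BRANDS = ["samsung", "xiaomi", "oneplus", "apple"]                       # assumed brand lexicon

def classify_and_extract(text):
    theme = theme_classifier(text, truncation=True)[0]["label"]
    brand_sentiments = {}
    for sentence in text.split("."):
        for brand in BRANDS:
            if brand in sentence.lower():
                brand_sentiments[brand] = sentiment(sentence)[0]["label"]
    return theme, brand_sentiments

print(classify_and_extract("The Samsung display is great. The OnePlus battery drains fast."))
```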
The Outputs folder consists of two output files, Output1 and Output2, described as follows:
- Output1.csv: Contains TextID, Predicted Labels, and generated headlines.
- Output2.csv: Contains TextID, Predicted Labels, and extracted brands with their sentiments.
- Preprocessed_Text.csv: Contains the text data after preprocessing and translation; located in the base directory.
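The output files can be sanity-checked with a quick pandas read; the exact column names may differ slightly from those listed above.

```python
# Quick sanity check of the generated output files.
import pandas as pd

out1 = pd.read_csv("Outputs/Output1.csv")   # TextID, predicted labels, generated headlines
out2 = pd.read_csv("Outputs/Output2.csv")   # TextID, predicted labels, brands and their sentiments
print(out1.head())
print(out2.columns.tolist())
```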
There are three folders containing saved models fine-tuned on the training dataset (see the loading sketch after this list). They are:
- T5: Contains the fine-tuned T5 model for Headline Generation.
- DistilBERT: Contains the fine-tuned DistilBERT model for Theme Classification.
- Pegasus: Contains the fine-tuned Pegasus model for Headline Generation.
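A minimal sketch of loading one of these saved model folders for inference is given below; the generation parameters and the sample text are assumptions.

```python
# Load the saved Pegasus model from its folder and generate a headline.
# Generation parameters (beam size, max length) are illustrative assumptions.
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

tokenizer = PegasusTokenizer.from_pretrained("Pegasus")
model = PegasusForConditionalGeneration.from_pretrained("Pegasus")

text = "Preprocessed review text for which a headline is needed ..."
batch = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
ids = model.generate(**batch, max_length=64, num_beams=4)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```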
The requirements folder comprises five .txt files, each listing the library versions required by one of the five notebooks mentioned above.
One folder contains the training data as provided under the problem statement.
Another folder comprises the evaluation dataset as provided.
A third folder contains the position standings for the entire event.
Team Leader - Aryan Verma
Aryan Rastogi and Vardhan Paliwal coded the project, created the Presentation, and contributed to the Technical Report.
Tarun Gupta served as the principal advisor for the entire project.
Aryan Verma and Smit Patel wrote the Technical Report.
Manav Trivedi, Gadge Prafull, Yash Kothekar, Siddhesh Shelke and Shivprasad Kadam contributed to the Technical Report.