Summasphere bangkit-machine-learning

Source code for the machine learning model API behind the Summasphere smart guide, built for the Bangkit Capstone Project.

Model Building's Notebook

Bart Large Notebook

Dataset Resources

API Endpoint

| Endpoint | Method | Body Sent (JSON) | Description |
|---|---|---|---|
| /api/summarize | POST | mode (text, pdf, link); model (bart, gemini); text [optional]; file [optional]; url [optional] | HTTP POST request for the summarization task |
| /api/analyzer | POST | media (frontend, android); mode (pdf, link); text [optional]; file [optional]; url [optional] | HTTP POST request for the topic analyzer task |
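
The request bodies listed above can be sketched as a small client helper. This is a minimal illustration, not the project's own client: the base URL `http://localhost:8000` and the helper names (`summarize_payload`, `analyzer_payload`, `build_request`) are assumptions, and the README does not say how `file` should be encoded, so the helper passes it through unchanged. Only the field names and allowed values come from the table.

```python
import json
import urllib.request

# Assumption: the API is served locally on FastAPI's default port.
BASE_URL = "http://localhost:8000"

# Allowed values as listed in the endpoint table.
SUMMARIZE_MODES = {"text", "pdf", "link"}
SUMMARIZE_MODELS = {"bart", "gemini"}
ANALYZER_MEDIA = {"frontend", "android"}
ANALYZER_MODES = {"pdf", "link"}


def _with_optional(payload, text=None, file=None, url=None):
    """Add only the optional fields that were actually supplied."""
    optional = {"text": text, "file": file, "url": url}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def summarize_payload(mode, model, **optional):
    """Build the JSON body for /api/summarize (hypothetical helper)."""
    if mode not in SUMMARIZE_MODES:
        raise ValueError(f"mode must be one of {sorted(SUMMARIZE_MODES)}")
    if model not in SUMMARIZE_MODELS:
        raise ValueError(f"model must be one of {sorted(SUMMARIZE_MODELS)}")
    return _with_optional({"mode": mode, "model": model}, **optional)


def analyzer_payload(media, mode, **optional):
    """Build the JSON body for /api/analyzer (hypothetical helper)."""
    if media not in ANALYZER_MEDIA:
        raise ValueError(f"media must be one of {sorted(ANALYZER_MEDIA)}")
    if mode not in ANALYZER_MODES:
        raise ValueError(f"mode must be one of {sorted(ANALYZER_MODES)}")
    return _with_optional({"media": media, "mode": mode}, **optional)


def build_request(path, payload):
    """Build (but do not send) a JSON POST request for the given path."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    body = summarize_payload("link", "bart", url="https://example.com/article")
    req = build_request("/api/summarize", body)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To send it against a running server: urllib.request.urlopen(req)
```

Validating `mode`, `model`, and `media` on the client side surfaces typos before a request leaves the machine; the server's actual validation rules and response format are not described in this README.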

The flow of Machine Learning Service

How to run this FastAPI app

  • Clone this repo
  • Open a terminal and go to the fastapi-summasphere directory:
    cd fastapi-summasphere
  • Create a Python virtual environment with py -m venv env (on Linux or macOS, use python3 -m venv env)
  • Activate the virtual environment
    <!-- Windows -->
    env\Scripts\activate.bat
    
    <!-- Linux -->
    source env/bin/activate
  • Type pip install -r requirements.txt to install the necessary libraries
  • Run the app

    These commands start the server program (Uvicorn) as a single process. By default, fastapi dev serves on 127.0.0.1:8000 with auto-reload, while fastapi run listens on all interfaces (0.0.0.0) on port 8000.

    fastapi dev app.py
    
    <!-- For Production -->
    fastapi run 
    
    OR 
    
    uvicorn app.main:app --reload
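
Once one of the commands above is running, a quick reachability check can confirm the server started. The sketch below assumes the app keeps FastAPI's default interactive docs route (`/docs`) and the default port 8000; `docs_url` and `server_is_up` are hypothetical helper names, not part of this repository.

```python
import urllib.error
import urllib.request


def docs_url(base_url):
    """Return the URL of the interactive docs page (FastAPI default: /docs)."""
    return base_url.rstrip("/") + "/docs"


def server_is_up(base_url, timeout=3.0):
    """Return True if the docs page answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(docs_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    # Assumption: the server was started locally on the default port.
    base = "http://127.0.0.1:8000"
    print(docs_url(base), "reachable:", server_is_up(base))
```

If the app was configured with a different docs path or port, adjust `base` accordingly.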

FastAPI Docs

Endpoint Testing

  • /api/summarize for URL

  • /api/summarize for PDF

  • /api/analyzer for URL

  • /api/analyzer for PDF

Architecture of Bart Large Seq2Seq NLP Model

References

  • Liu, Y., & Lapata, M. (2019). Text Summarization with Pretrained Encoders. arXiv preprint arXiv:1908.08345. Retrieved from https://arxiv.org/abs/1908.08345
  • Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., ... & Zettlemoyer, L. (2019). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv preprint arXiv:1910.13461. Retrieved from https://arxiv.org/abs/1910.13461
  • Fabbri, A., Li, I., She, T., Li, S., & Radev, D. (2019). Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model. arXiv preprint arXiv:1906.01749. Retrieved from https://arxiv.org/abs/1906.01749
