A service that condenses long, complex news articles into short summaries and explains their key keywords in a question-and-answer format.
It can rewrite stiff, formal sentences in a different speaking style, and you can adjust how much the keywords are questioned.
The models are fine-tuned specifically for IT/science and finance news, fields that are full of difficult terminology.
Team Notion
Demo
$ pip install -r requirements.txt
$ streamlit run streamlit/malang_news.py
- Enter the URL of the news article you want (optimized for Naver News)
- Wait for inference to finish and view the results
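Because the app is optimized for Naver News, it can help to check the URL before running inference. The following is a minimal sketch, assuming Naver News articles live on the hostnames listed below; `is_naver_news_url` and `NAVER_NEWS_HOSTS` are hypothetical names, not code from this repository.

```python
from urllib.parse import urlparse

# Assumed set of Naver News hostnames (not taken from malang_news.py).
NAVER_NEWS_HOSTS = {"n.news.naver.com", "news.naver.com", "m.news.naver.com"}


def is_naver_news_url(url: str) -> bool:
    """Return True if url is an http(s) URL on one of the assumed Naver News hosts."""
    parsed = urlparse(url.strip())
    # parsed.hostname is lowercased and excludes any port number.
    return parsed.scheme in ("http", "https") and parsed.hostname in NAVER_NEWS_HOSTS
```

A check like this could run before any model is called, so an obviously wrong URL fails fast instead of surfacing later as the "could not be found" message.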
In malang_news.py, you must enter your own Hugging Face API key and OpenAI API key:
API_TOKEN = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # Huggingface
API_KEY = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # OpenAI
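Hard-coding keys in malang_news.py makes it easy to commit them by accident. One alternative is to read them from environment variables. This is a minimal sketch; the variable names `HF_API_TOKEN` and `OPENAI_API_KEY` and the helper `load_api_keys` are assumptions for illustration, not something the app reads on its own.

```python
import os


def load_api_keys() -> tuple[str, str]:
    """Read the Hugging Face and OpenAI keys from the environment.

    HF_API_TOKEN and OPENAI_API_KEY are assumed variable names (hypothetical).
    Raises RuntimeError listing every variable that is missing or empty.
    """
    hf_token = os.environ.get("HF_API_TOKEN", "")
    openai_key = os.environ.get("OPENAI_API_KEY", "")
    missing = [
        name
        for name, value in (("HF_API_TOKEN", hf_token), ("OPENAI_API_KEY", openai_key))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return hf_token, openai_key
```

With this helper, the two assignments above could become `API_TOKEN, API_KEY = load_api_keys()`, and the keys stay out of version control.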
- If the message "The model is still loading." appears, try again a little later.
- If the message "The news article could not be found." appears, check that the URL is correct.
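The first message usually means the hosted model is still being loaded. As we understand it, the Hugging Face Inference API answers with HTTP 503 in that state and may include an `estimated_time` field in the JSON body; treat both as assumptions to verify against the Hugging Face documentation. Instead of retrying by hand, the app could poll automatically. A minimal sketch: `call_with_retry` is hypothetical, and `send` stands for any function you write that performs the request and returns `(status_code, payload)`.

```python
import time
from typing import Any, Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 5,
    wait_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it stops returning 503 ("model still loading").

    Any non-503 response, including other errors, is returned unchanged so
    the caller can handle it. Raises TimeoutError if every attempt gets 503.
    """
    for attempt in range(1, max_attempts + 1):
        status, payload = send()
        if status != 503:
            return payload
        if attempt < max_attempts:
            # Use the server's wait estimate when present (assumed key name).
            delay = wait_seconds
            if isinstance(payload, dict) and "estimated_time" in payload:
                delay = float(payload["estimated_time"])
            sleep(delay)
    raise TimeoutError(f"Model still loading after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.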
Malang_news/
│
├── crawler/
│   ├── headline_crawler_final.py
│   ├── headline_crawler_onlybs.py
│   ├── newneek_crawler.ipynb
│   ├── news_crawler_final.py
│   └── 네이버뉴스_크롤링.ipynb
│
├── model/
│   ├── BART/
│   │   ├── KoBART_navernews.ipynb
│   │   ├── 생성요약_KoBART.ipynb
│   │   └── 추출요약_KoBART.ipynb
│   │
│   ├── KeyBERT/
│   │   └── keyword_extract.ipynb
│   │
│   └── causalLM/
│       ├── GPTtrain.py
│       └── koalpaca_fine-tuning.ipynb
│
├── preprocessing/
│   ├── json2csv.ipynb
│   ├── newneek_preprocessing.ipynb
│   └── news_preprocessing_labeling.ipynb
│
└── streamlit/
    ├── malang_news.py
    └── utils.py
- Naver News: Finance
- Naver News: IT/Science headline news
- Korean SmileStyle Dataset
- KeyBERT
- KoBERT