Pujangga

Indonesian Natural Language Processing REST API

A REST API interface to InaNLP and Deeplearning4j's Word2Vec for Indonesian (Bahasa Indonesia).

Below is a screenshot of a Pujangga request and response in the Paw REST client.

[Screenshot: Pujangga request and response in Paw]

Credits:

Local Setup

  1. Install Scala 2.12.2 and Lightbend Activator.

  2. Clone the project:

$ git clone git@github.com:panggi/pujangga.git

  3. Download the dependencies:

$ cd pujangga
$ activator

  4. Download the pretrained word2vec model from https://drive.google.com/uc?id=0B5YTktu2dOKKNUY1OWJORlZTcUU&export=download

  5. Run the application, pointing WORD2VEC_FILE at the downloaded model:

$ export WORD2VEC_FILE=/path/to/word2vec_wiki_id
$ activator run

  6. Access the API at http://localhost:9000

API Endpoints

Stemmer

Request:

POST /stemmer

{
  "string": "Prof. Habibie akan melakukan kunjungan resmi ke PT. Pindad di Bandung"
}

Response:

{
  "status": "success",
  "data": "prof Habibie akan laku kunjung resmi ke pt Pindad di bandung"
}
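Every endpoint takes a JSON body and wraps its result in a `status`/`data` envelope. The sketch below is a minimal Python client using only the standard library. The base URL assumes the local setup above, and the helper names (`build_request`, `unwrap`, `call`) are hypothetical, not part of Pujangga. It also assumes that any `status` other than `"success"` signals an error, since only successful responses are documented here.

```python
import json
import urllib.request

# Assumption: the server runs locally on the port used in the setup steps.
BASE_URL = "http://localhost:9000"


def build_request(path, payload, base_url=BASE_URL):
    """Build a JSON POST request for an endpoint path such as "/stemmer"."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def unwrap(raw_body):
    """Parse a response body and return its "data" field.

    Assumption: a status other than "success" means the call failed.
    """
    envelope = json.loads(raw_body)
    if envelope.get("status") != "success":
        raise RuntimeError(f"Pujangga returned an error: {envelope!r}")
    return envelope["data"]


def call(path, payload, base_url=BASE_URL, timeout=30):
    """Send the request to a running server and return the unwrapped "data"."""
    request = build_request(path, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return unwrap(response.read().decode("utf-8"))
```

With the server running, `call("/stemmer", {"string": "Prof. Habibie akan melakukan kunjungan resmi ke PT. Pindad di Bandung"})` should return the stemmed string shown in the response above; the other endpoints only need a different path and payload.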

Phrase Chunker

Request:

POST /phrasechunker

{
  "string": "Prof. Habibie akan melakukan kunjungan resmi ke PT. Pindad di Bandung"
}

Response:

{
  "status": "success",
  "data": {
    "map": {
      "Pindad ": "NP",
      "Prof. Habibie ": "NP",
      ".": ".",
      "di Bandung ": "PP",
      "akan melakukan kunjungan resmi ke PT ": "VP"
    },
    "list": [
      "NP",
      "VP",
      "NP",
      "PP"
    ]
  }
}
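The `map` field pairs each chunk with its phrase tag, while `list` keeps the tag sequence. A small helper for pulling out every chunk of one type (a sketch; `phrases_with_tag` is a hypothetical name) has to strip the trailing space that the sample keys carry. Because `map` is a JSON object, a chunk that occurs twice in a sentence can be stored only once, so rely on `list` when order or repetition matters.

```python
def phrases_with_tag(chunk_map, tag):
    """Return the chunks in a /phrasechunker "map" whose tag equals `tag`."""
    # Keys in the sample response end with a space (e.g. "Pindad "), so strip them.
    return [phrase.strip() for phrase, label in chunk_map.items() if label == tag]


# The "map" field from the sample response above.
sample_map = {
    "Pindad ": "NP",
    "Prof. Habibie ": "NP",
    ".": ".",
    "di Bandung ": "PP",
    "akan melakukan kunjungan resmi ke PT ": "VP",
}

print(phrases_with_tag(sample_map, "NP"))  # ['Pindad', 'Prof. Habibie']
```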

Part-of-Speech Tagger

Request:

POST /postagger

{
  "string": "Prof. Habibie akan melakukan kunjungan resmi ke PT. Pindad di Bandung"
}

Response:

{
  "status": "success",
  "data": {
    "map": {
      "resmi": "JJ",
      ".": ".",
      "akan": "MD",
      "ke": "IN",
      "di": "IN",
      "Bandung": "NNP",
      "Pindad": "NNP",
      "PT": "NN",
      "Prof.": "NNP",
      "kunjungan": "NN",
      "Habibie": "NNP",
      "melakukan": "VBT"
    },
    "list": [
      "NNP",
      "NNP",
      "MD",
      "VBT",
      "NN",
      "JJ",
      "IN",
      "NN",
      "NNP",
      "IN",
      "NNP"
    ]
  }
}
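To find, for example, every proper noun (`NNP`), the `map` field can be inverted into tag-to-token groups; to count tag frequencies, use `list`, since a repeated token can only occupy one `map` key. A sketch with a hypothetical helper `tokens_by_tag`:

```python
from collections import Counter, defaultdict


def tokens_by_tag(tag_map):
    """Invert a /postagger "map" (token -> tag) into tag -> [tokens]."""
    grouped = defaultdict(list)
    for token, tag in tag_map.items():
        grouped[tag].append(token)
    return dict(grouped)


# The "map" and "list" fields from the sample response above.
sample_map = {
    "resmi": "JJ", ".": ".", "akan": "MD", "ke": "IN", "di": "IN",
    "Bandung": "NNP", "Pindad": "NNP", "PT": "NN", "Prof.": "NNP",
    "kunjungan": "NN", "Habibie": "NNP", "melakukan": "VBT",
}
sample_list = ["NNP", "NNP", "MD", "VBT", "NN", "JJ", "IN", "NN", "NNP", "IN", "NNP"]

print(tokens_by_tag(sample_map)["NNP"])  # ['Bandung', 'Pindad', 'Prof.', 'Habibie']
print(Counter(sample_list).most_common(1))  # [('NNP', 4)]
```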

Named-Entity Tagger

Request:

POST /netagger

{
  "string": "Prof. Habibie akan melakukan kunjungan resmi ke PT. Pindad di Bandung"
}

Response:

{
  "status": "success",
  "data": [
    "OTHER",
    "PERSON-B",
    "OTHER",
    "OTHER",
    "OTHER",
    "OTHER",
    "OTHER",
    "LOCATION-B",
    "OTHER",
    "PERSON-B",
    "OTHER",
    "LOCATION-B"
  ]
}
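The response is one label per token, with no tokens attached. The sketch below (hypothetical helper `entity_positions`) lists the positions of the non-`OTHER` labels and splits each into an entity type and a suffix. It assumes the labels follow the tagger's own tokenization of the input (12 labels here, presumably because `PT.` is split into `PT` and `.`), and it assumes the `-B` suffix marks the first token of an entity; no other suffixes appear in this sample.

```python
def entity_positions(labels):
    """Return (index, entity_type, suffix) for each label other than "OTHER"."""
    found = []
    for index, label in enumerate(labels):
        if label == "OTHER":
            continue
        # "PERSON-B" -> ("PERSON", "B"); a label without "-" yields ("", label).
        entity_type, _, suffix = label.rpartition("-")
        found.append((index, entity_type, suffix))
    return found


# The "data" field from the sample response above.
sample = ["OTHER", "PERSON-B", "OTHER", "OTHER", "OTHER", "OTHER",
          "OTHER", "LOCATION-B", "OTHER", "PERSON-B", "OTHER", "LOCATION-B"]
print(entity_positions(sample))
# [(1, 'PERSON', 'B'), (7, 'LOCATION', 'B'), (9, 'PERSON', 'B'), (11, 'LOCATION', 'B')]
```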

Formalizer

Request:

POST /formalizer

{
  "string": "Sis, lu bisa nggak pesenin gw sepatu newbalance tipe 960? gpl ya. hati2 sama penipuan anak 4l4y"
}

Response:

{
  "status": "success",
  "data": "Sis , kamu bisa tidak pesankan saya sepatu newbalance tipe 960 ? tidak pakai lama iya . hati-hati sama penipuan anak norak "
}
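The formalizer returns space-separated tokens, so punctuation is detached (`Sis ,`, `960 ?`) and a trailing space remains. For display, a simple regular-expression heuristic can reattach common punctuation. This is a sketch (`detokenize` is a hypothetical helper) and it will not handle every case, such as quotes or brackets.

```python
import re


def detokenize(text):
    """Remove the space before , . ? ! ; : and trim surrounding whitespace."""
    return re.sub(r"\s+([,.?!;:])", r"\1", text).strip()


# The "data" field from the sample response above.
sample = ("Sis , kamu bisa tidak pesankan saya sepatu newbalance tipe 960 ? "
          "tidak pakai lama iya . hati-hati sama penipuan anak norak ")
print(detokenize(sample))
# Sis, kamu bisa tidak pesankan saya sepatu newbalance tipe 960? tidak pakai lama iya. hati-hati sama penipuan anak norak
```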

Stopword Removal

Request:

POST /stopwords

{
  "string": "Prof. Habibie akan melakukan kunjungan resmi ke PT. Pindad di Bandung"
}

Response:

{
  "status": "success",
  "data": "Prof. Habibie kunjungan resmi PT . Pindad Bandung "
}
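To see which words the service treated as stopwords, compare the input with the output. The output re-tokenizes some words (`PT.` becomes `PT .`), so the sketch below (hypothetical helper `removed_words`) strips leading and trailing punctuation from every token before comparing. It is only approximate: it checks against the set of kept words, so a word that is dropped in one place but kept in another will not be reported.

```python
import string


def removed_words(original, filtered):
    """Approximate the words of `original` that do not survive in `filtered`."""
    def norm(token):
        # Drop leading/trailing punctuation so "PT." and "PT" compare equal.
        return token.strip(string.punctuation)

    kept = {norm(t) for t in filtered.split()} - {""}
    return [norm(t) for t in original.split() if norm(t) and norm(t) not in kept]


original = "Prof. Habibie akan melakukan kunjungan resmi ke PT. Pindad di Bandung"
filtered = "Prof. Habibie kunjungan resmi PT . Pindad Bandung "
print(removed_words(original, filtered))  # ['akan', 'melakukan', 'ke', 'di']
```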

Sentence Tokenizer

Request:

POST /sentence/tokenizer

{
  "string": "Saya pergi ke (bagian kanan) rumah sakit Prof. Dr. Soerojo."
}

Response:

{
  "status": "success",
  "data": [
    "Saya",
    "pergi",
    "ke",
    "(",
    "bagian",
    "kanan",
    ")",
    "rumah",
    "sakit",
    "Prof.",
    "Dr.",
    "Soerojo",
    "."
  ]
}
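The tokenizer returns strings only, without character offsets. When offsets are needed (for example, to highlight tokens in the original text), a sketch like the one below can recover them by searching forward from the end of the previous match. `token_spans` is a hypothetical helper, and it assumes every token appears verbatim in the input, which holds for this sample but not for endpoints that rewrite words, such as the stemmer or formalizer.

```python
def token_spans(text, tokens):
    """Return (start, end) character offsets of each token within `text`."""
    spans = []
    cursor = 0
    for token in tokens:
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"token {token!r} not found after offset {cursor}")
        spans.append((start, start + len(token)))
        cursor = start + len(token)
    return spans


text = "Saya pergi ke (bagian kanan) rumah sakit Prof. Dr. Soerojo."
tokens = ["Saya", "pergi", "ke", "(", "bagian", "kanan", ")",
          "rumah", "sakit", "Prof.", "Dr.", "Soerojo", "."]
print(token_spans(text, tokens)[-3:])  # [(47, 50), (51, 58), (58, 59)]
```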

Sentence Tokenizer with Composite Words

Request:

POST /sentence/tokenizer/composite

{
  "string": "Saya pergi ke (bagian kanan) rumah sakit Prof. Dr. Soerojo."
}

Response:

{
  "status": "success",
  "data": [
    "Saya",
    "pergi",
    "ke",
    "(",
    "bagian kanan",
    ")",
    "rumah sakit",
    "Prof.",
    "Dr.",
    "Soerojo",
    "."
  ]
}
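In these samples, the composite tokenizer differs from the plain one only by keeping multiword expressions such as `rumah sakit` together as single tokens. A sketch (hypothetical helpers `multiword_tokens` and `expand`) that lists those expressions and splits them back into single words; on the sample responses, expanding the composite output reproduces the plain tokenizer output exactly.

```python
def multiword_tokens(tokens):
    """Return the tokens that contain a space, i.e. the composite words."""
    return [t for t in tokens if " " in t]


def expand(tokens):
    """Split composite tokens back into single words, preserving order."""
    return [part for t in tokens for part in t.split(" ")]


# The "data" field from the sample response above.
composite = ["Saya", "pergi", "ke", "(", "bagian kanan", ")", "rumah sakit",
             "Prof.", "Dr.", "Soerojo", "."]
print(multiword_tokens(composite))  # ['bagian kanan', 'rumah sakit']
print(expand(composite))
```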

Sentence Splitter

Request:

POST /sentence/splitter

{
  "string": "Michael Jeffrey Jordan dilahirkan di Brooklyn, New York, Amerika Serikat, pada 17 Februari 1963 adalah pemain bola basket profesional asal Amerika. Michael Jordan merupakan pemain terkenal di dunia dalam cabang olahraga itu. Setidaknya ia enam kali merebut kejuaraan NBA bersama kelompok Chicago Bulls (1991-1993, 1996-1998). Ia memiliki tinggi badan 198 cm dan merebut gelar pemain terbaik."
}

Response:

{
  "status": "success",
  "data": [
    "Michael Jeffrey Jordan dilahirkan di Brooklyn, New York, Amerika Serikat, pada 17 Februari 1963 adalah pemain bola basket profesional asal Amerika .",
    "Michael Jordan merupakan pemain terkenal di dunia dalam cabang olahraga itu .",
    "Setidaknya ia enam kali merebut kejuaraan NBA bersama kelompok Chicago Bulls (1991-1993, 1996-1998) .",
    "Ia memiliki tinggi badan 198 cm dan merebut gelar pemain terbaik ."
  ]
}
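Each returned sentence has a space before its final punctuation mark (`Amerika .`). A quick sanity check that the splitter kept all of the text is to reattach that mark and join the sentences with single spaces; on the sample, this reproduces the input paragraph exactly. The helper name `rejoin` is hypothetical, and the check assumes the input separates sentences with single spaces.

```python
import re


def rejoin(sentences):
    """Reattach each sentence's final . ? or ! and join with single spaces."""
    return " ".join(re.sub(r"\s+([.?!])$", r"\1", s) for s in sentences)


# The "data" field from the sample response above.
sentences = [
    "Michael Jeffrey Jordan dilahirkan di Brooklyn, New York, Amerika Serikat, pada 17 Februari 1963 adalah pemain bola basket profesional asal Amerika .",
    "Michael Jordan merupakan pemain terkenal di dunia dalam cabang olahraga itu .",
    "Setidaknya ia enam kali merebut kejuaraan NBA bersama kelompok Chicago Bulls (1991-1993, 1996-1998) .",
    "Ia memiliki tinggi badan 198 cm dan merebut gelar pemain terbaik .",
]
print(rejoin(sentences))
```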

Word2Vec Nearest Words

Request:

POST /word2vec/nearestwords

{
  "string": "mobil",
  "n": 10
}

Response:

{
  "status": "success",
  "data": [
    "motor",
    "dikendarai",
    "sepeda",
    "truk",
    "motornya",
    "mengemudikan",
    "mobil-mobil",
    "mobilnya",
    "mengendarai",
    "pengemudi"
  ]
}
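Nearest-word queries in Word2Vec rank vocabulary words by the cosine similarity of their vectors to the query word's vector. The sketch below shows that ranking as a brute-force search over a tiny hand-made vocabulary. The two-dimensional vectors are invented for illustration and do not come from the pretrained model, and `nearest_words` is a hypothetical helper, not Pujangga's code. Like the sample response, it leaves the query word out of the results.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_words(word, vectors, n):
    """Return the n words whose vectors are most cosine-similar to `word`'s vector."""
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:n]]


# Invented toy vectors; the real model has far more words and dimensions.
vectors = {
    "mobil": [1.0, 0.0],
    "motor": [0.9, 0.1],
    "truk": [0.8, 0.3],
    "sore": [0.0, 1.0],
}
print(nearest_words("mobil", vectors, 2))  # ['motor', 'truk']
```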

Word2Vec Arithmetic

Request:

POST /word2vec/arithmetic

{
  "first_string": "serang",
  "second_string": "malang",
  "third_string": "surabaya",
  "n": 10
}

Response:

{
  "status": "success",
  "data": [
    "serang",
    "lebak",
    "puloampel",
    "keserangan",
    "bogor",
    "waringinkurung",
    "jawilan",
    "cianjur",
    "garut",
    "padarincang"
  ]
}
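This section does not state how the three words are combined. A common formulation of word-vector arithmetic is the analogy `first - second + third`, followed by a nearest-word search on the result. The sketch below assumes that formulation, uses invented toy vectors, and its helper `analogy` is hypothetical. It does not exclude the input words from the ranking, which would be consistent with `serang` appearing first in the sample response, though that is also an inference.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(first, second, third, vectors, n):
    """Rank words by cosine similarity to vec(first) - vec(second) + vec(third)."""
    # Assumption: this combination; the endpoint's actual formula is not documented.
    target = [a - b + c for a, b, c in zip(vectors[first], vectors[second], vectors[third])]
    ranked = sorted(vectors, key=lambda w: cosine(target, vectors[w]), reverse=True)
    return ranked[:n]


# Invented 2-D toy vectors, not taken from the pretrained model.
vectors = {
    "serang": [1.0, 0.0],
    "lebak": [0.9, 0.1],
    "malang": [0.0, 1.0],
    "surabaya": [0.1, 1.0],
}
print(analogy("serang", "malang", "surabaya", vectors, 2))  # ['serang', 'lebak']
```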

Word2Vec Similarity

Request:

POST /word2vec/similarity

{
  "first_string": "sore",
  "second_string": "petang"
}

Response:

{
  "status": "success",
  "data": 0.7748607993125916
}
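The returned score is presumably the cosine similarity of the two word vectors, which is what Deeplearning4j's `similarity` method computes: 1 means the vectors point in the same direction, 0 means they are orthogonal, and negative values mean they point in opposite directions. For words $a$ and $b$ with vectors $\mathbf{v}_a$ and $\mathbf{v}_b$:

```latex
\operatorname{sim}(a, b) = \cos\theta
  = \frac{\mathbf{v}_a \cdot \mathbf{v}_b}{\lVert \mathbf{v}_a \rVert \, \lVert \mathbf{v}_b \rVert}
  \in [-1, 1]
```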

License

All files in the libs and resource directories are the property of Dr. Eng. Ayu Purwarianti, ST., MT., et al. and are not covered by the license below (Apache License, Version 2.0).

All other custom code written by Panggi Libersa Jasri Akadol is licensed under the Apache License, Version 2.0 (the "License"); you may not use this project except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.