
Script for parsing kanji data from the KANJIDIC2 project


nramkissoon/Kanjidicparser


Kanjidicparser

Kanjidicparser.py is a script for parsing data from the KANJIDIC project: http://www.edrdg.org/wiki/index.php/KANJIDIC_Project

The script exports a JSON file ("kanji_dict.json") containing a nested dictionary with data for every kanji entry in the KANJIDIC database.

Usage

Download kanjidic2.xml from the KANJIDIC project site into the same directory as the script, then run the script.
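As a rough illustration of the kind of conversion involved, the sketch below parses a small hand-written fragment in the KANJIDIC2 XML layout using only the Python standard library. It is not the script's actual code: the function name parse_kanjidic, the inline sample, and the choice to store readings as lists are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written fragment in the KANJIDIC2 layout, for illustration only.
SAMPLE_XML = """<kanjidic2>
  <character>
    <literal>増</literal>
    <misc><freq>231</freq><jlpt>2</jlpt></misc>
    <reading_meaning>
      <rmgroup>
        <reading r_type="ja_on">ゾウ</reading>
        <reading r_type="ja_kun">ま.す</reading>
        <meaning>increase</meaning>
        <meaning m_lang="fr">augmenter</meaning>
      </rmgroup>
      <nanori>まし</nanori>
    </reading_meaning>
  </character>
</kanjidic2>"""


def parse_kanjidic(root):
    """Build {kanji: {field: value}} from a parsed KANJIDIC2 root element.

    Hypothetical helper; the real script may structure its output differently.
    """
    result = {}
    for char in root.iter("character"):
        literal = char.findtext("literal")
        result[literal] = {
            # English meanings carry no m_lang attribute in KANJIDIC2.
            "meanings": [m.text for m in char.iter("meaning")
                         if "m_lang" not in m.attrib],
            "onyomi": [r.text for r in char.iter("reading")
                       if r.get("r_type") == "ja_on"],
            "kunyomi": [r.text for r in char.iter("reading")
                        if r.get("r_type") == "ja_kun"],
            "nanori": [n.text for n in char.iter("nanori")],
            # findtext returns None when the element is absent.
            "freq": char.findtext("misc/freq"),
            "jlpt": char.findtext("misc/jlpt"),
        }
    return result


# For the full file, ET.parse("kanjidic2.xml").getroot() would replace this line.
kanji_dict = parse_kanjidic(ET.fromstring(SAMPLE_XML))
print(json.dumps(kanji_dict, ensure_ascii=False, indent=2))
```

Passing ensure_ascii=False keeps the kanji and kana readable in the output instead of writing them as \u escapes.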

How to use kanji_dict.json

kanji_dict.json is a nested dictionary in which each key is an individual kanji. Each kanji key maps to another dictionary that holds the kanji-specific information.

Example entry: 増

meanings: 'increase', 'add', 'augment', 'gain', 'promote'
onyomi: 'ゾウ'
kunyomi: 'ま.す', 'ま.し', 'ふ.える', 'ふ.やす'
nanori: 'まし', 'ます'
freq: '231'
jlpt: '2'

Accessing information (with kanji_dict.json loaded into a variable named dict):

dict["増"]["meanings"] returns the list ['increase', 'add', 'augment', 'gain', 'promote']
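A minimal sketch of loading the file and reading an entry might look like the following. The helper load_kanji_dict is hypothetical, and the demo writes a tiny stand-in file to a temporary directory so that it runs without the real kanji_dict.json; the stand-in assumes meanings are stored as a list, as in the example above. The variable is named kanji_dict rather than dict to avoid shadowing Python's built-in dict type.

```python
import json
import os
import tempfile


def load_kanji_dict(path="kanji_dict.json"):
    """Load the exported JSON file (hypothetical helper, not part of the script)."""
    # Read explicitly as UTF-8 so the kanji keys decode the same on every platform.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Stand-in for the real export, containing only the entry shown above.
sample = {
    "増": {
        "meanings": ["increase", "add", "augment", "gain", "promote"],
        "freq": "231",
        "jlpt": "2",
    }
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kanji_dict.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)
    kanji_dict = load_kanji_dict(path)

entry = kanji_dict["増"]
print(entry["meanings"])

# .get returns None instead of raising KeyError for a kanji that is not present.
missing = kanji_dict.get("々")
print(missing)
```

With the real file in the working directory, load_kanji_dict() with no argument would be enough.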

Notes regarding data fields

meanings - definitions in English

onyomi - readings closer to the original Chinese readings, usually used for nouns and compounds

kunyomi - native Japanese readings

nanori - readings used in names

freq - frequency-of-use rank (a lower number means a more common kanji; KANJIDIC2 ranks only the 2,500 most-used kanji)

jlpt - JLPT level (KANJIDIC2 records the pre-2010 scale, 1 to 4, where 4 is the most elementary)
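The freq and jlpt fields lend themselves to filtering and sorting. The sketch below assumes, based on the example entry, that both values are stored as strings, and it assumes that kanji without a rank either omit freq or store an empty value; neither has been checked against the script's actual output. The function by_frequency and the sample entries other than 増 are illustrative placeholders.

```python
def by_frequency(kanji_dict, jlpt=None):
    """Return kanji ordered from most to least common (hypothetical helper).

    Entries without a freq value are skipped. If jlpt is given, only
    kanji at that level are kept.
    """
    ranked = []
    for kanji, info in kanji_dict.items():
        freq = info.get("freq")
        if not freq:
            continue  # unranked kanji: missing, None, or empty freq
        if jlpt is not None and info.get("jlpt") != str(jlpt):
            continue
        # freq is assumed to be a string such as '231', so convert before sorting.
        ranked.append((int(freq), kanji))
    return [kanji for _, kanji in sorted(ranked)]


# Placeholder data; only the 増 values come from the example entry above.
sample = {
    "増": {"freq": "231", "jlpt": "2"},
    "日": {"freq": "1", "jlpt": "4"},
    "鬱": {"freq": None, "jlpt": None},
}

print(by_frequency(sample))          # all ranked kanji, most common first
print(by_frequency(sample, jlpt=2))  # only kanji at JLPT level 2
```

Converting freq to int matters here: sorting the raw strings would place '231' before '3'.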

License information

kanjidicparser.py is free to use and modify. Data from the KANJIDIC project is subject to the conditions found at http://www.edrdg.org/edrdg/licence.html.
