sinaahmadi/awesome-kurdish

Awesome Kurdish

(last updated on 04/03/2024)

A curated list of awesome resources, tools and scientific papers for Kurdish language technology

I do my best to keep this page comprehensive, but it may still miss some of the many projects, small and large, on Kurdish language processing. If you know of one that is missing, please let me know by email or through our community on Gitter.

Are you interested in contributing to Kurdish language processing? Check out this post to see how you can do so.

News 🎉

March 2023

  • A few datasets were added for automatic speech recognition and for Central Kurdish dialect identification and translation.

April 2023

  • A few datasets were added for emotion analysis, summarization and news headline classification.
  • Two projects were released for language identification of Zaza-Gorani and Kurdish languages.
  • A benchmark was released for sentiment analysis of Central Kurdish.

Development

Resources

Language Models

Corpora

Parallel corpora

Dictionaries, terminologies and ontologies

Check out a comprehensive list of Kurdish dictionaries and beware of copyright issues in the following projects:

Datasets

Automatic speech recognition

Benchmarks

Other resources

Word Embeddings:

Tools

Fundamental processing

Machine translation

Named-entity recognition

Optical character recognition

Libraries

Language identification

Other

In addition to these, you can find further information in other repositories and pages as follows:

Research

These references are provided based on the data collected in the paper entitled KLPT – Kurdish Language Processing Toolkit. Note that references are provided in the bibliography file.

| Reference | Year | Field | Dialects |
| --- | --- | --- | --- |
| esmaili2013sorani | 2013 | Dialectology | Sorani, Kurmanji |
| hassani2016automatic | 2016 | Dialectology | Sorani, Kurmanji |
| malmasi2016subdialectal | 2016 | Dialectology | Sorani |
| al2017kurdish | 2017 | Dialectology | Sorani, Kurmanji, Gorani |
| amani:hal-03262435 | 2021 | Dialectology | Kurdish, Zazaki & Gorani |
| ahmadi2024cordi | 2024 | Dialectology | Sorani varieties |
| mohammed2012automatic | 2012 | Information retrieval and Text mining | Sorani |
| esmaili2012challenges | 2012 | Information retrieval and Text mining | Sorani |
| littell2016named | 2016 | Information retrieval and Text mining | Sorani |
| hassani2017method | 2017 | Information retrieval and Text mining | Sorani, Kurmanji |
| esmaAl-Talabaniili2014towards | 2014 | Information retrieval and Text mining | Sorani, Kurmanji |
| jaf2016simple | 2016 | Information retrieval and Text mining | Sorani |
| rashid2017robust | 2017 | Information retrieval and Text mining | Sorani |
| rashid2017automatic | 2017 | Information retrieval and Text mining | Sorani |
| saeed2018improving | 2018 | Information retrieval and Text mining | Sorani |
| mustafa2018kurdish | 2018 | Information retrieval and Text mining | Sorani |
| saeed2018evaluation | 2018 | Information retrieval and Text mining | Sorani |
| ahmadi2019wergor | 2019 | Information retrieval and Text mining | Sorani |
| mahmudi2021automated | 2021 | Information retrieval and Text mining | Sorani |
| abdulrahman2022lmspell | 2022 | Information retrieval and Text mining | Sorani |
| esmaili2013building | 2013 | Lexical resources | Sorani |
| aliabadi2014towards | 2014 | Lexical resources | Sorani |
| aliabadi2014semi | 2014 | Lexical resources | Sorani |
| ataman2018bianet | 2018 | Lexical resources | Kurmanji |
| ahmadi2019towards | 2019 | Lexical resources | Sorani, Kurmanji, Gorani |
| abdulrahman2019developing | 2019 | Lexical resources | Sorani |
| abdulrahman2020using | 2020 | Lexical resources | Sorani |
| veisi2020toward | 2020 | Lexical resources | Sorani |
| ahmadi2020corpus | 2020 | Lexical resources | Sorani |
| ahmadi-2020-building | 2020 | Lexical resources | Zaza, Gorani |
| veisi2021jira | 2021 | Lexical resources | Sorani |
| azin2021sk | 2021 | Lexical resources | Southern Kurdish |
| hassani2017kurdish | 2017 | Machine Translation | Sorani, Kurmanji |
| kaka2018english | 2018 | Machine Translation | Sorani |
| ahmadi2020machine | 2020 | Machine Translation | Sorani |
| goyal2021flores | 2021 | Machine Translation | 101 languages incl. Sorani |
| amini2021central | 2021 | Machine Translation | Sorani |
| ahmadi2022leveraging | 2022 | Machine Translation | Sorani |
| ahmadi2024cordi | 2024 | Machine Translation | Sorani |
| baban1995programmable | 1995 | Morphological and syntactic analysis | Sorani |
| walther2010developing | 2010 | Morphological and syntactic analysis | Sorani |
| walther2010fast | 2010 | Morphological and syntactic analysis | Kurmanji |
| salavati2013stemming | 2013 | Morphological and syntactic analysis | Sorani |
| jaf2014stemmer | 2014 | Morphological and syntactic analysis | Sorani |
| jaf2016chapter | 2016 | Morphological and syntactic analysis | Sorani |
| gokirmak2017dependency | 2017 | Morphological and syntactic analysis | Kurmanji |
| salavati2018building | 2018 | Morphological and syntactic analysis | Sorani |
| mustafa2018kurdish | 2018 | Morphological and syntactic analysis | Sorani |
| ahmadi2020towards | 2020 | Morphological and syntactic analysis | Sorani |
| ahmadi-2020-tokenization | 2020 | Morphological and syntactic analysis | Sorani, Kurmanji |
| ahmadi2021modelling | 2021 | Morphological and syntactic analysis | Sorani |
| ahmadi2020Hunspell | 2021 | Morphological and syntactic analysis | Sorani |
| naserzade2021ckmorph | 2021 | Morphological and syntactic analysis | Sorani |
| ahmadi2023revisiting | 2023 | Morphological and syntactic analysis | Sorani |
| mohammed2012uniqueness | 2012 | Optical character recognition | Sorani |
| mohammed2013handwritten | 2013 | Optical character recognition | Sorani |
| shaltookisentiment | 2016 | Optical character recognition | Sorani |
| zarro2017recognition | 2017 | Optical character recognition | Sorani |
| yaseen2018kurdish | 2018 | Optical character recognition | Sorani |
| dinler2018kurdish | 2018 | Optical character recognition | Sorani |
| app11209752 | 2021 | Optical character recognition | Sorani |
| kaka2017building | 2017 | Other | Sorani |
| mahmudi2021automatic | 2021 | Other | Sorani |
| ahmadi2021ickl | 2021 | Other | Sorani |
| ahmadi2023script | 2023 | Other | Sorani, Kurmanji, Gorani |
| hashim2018kurdish | 2018 | Sign language recognition | Sorani |
| kamal-hassani-2020-towards | 2020 | Sign language recognition | Sorani |
| daneshfar2009implementation | 2009 | Speech recognition | Sorani |
| barkhoda2009comparison | 2009 | Speech recognition | Sorani |
| bahrampour2009implementation | 2009 | Speech recognition | Sorani |
| hassani2011kurdish | 2011 | Speech recognition | Sorani |
| dinler2017formant | 2017 | Speech recognition | Kurmanji |
| dinler2018extraction | 2018 | Speech recognition | Sorani, Kurmanji |
| qader2019kurdish | 2019 | Speech recognition | Sorani |
| delgado2024kaset | 2024 | Speech recognition | Sorani, Kurmanji |
| ahmadi2024cordi | 2024 | Speech recognition | Sorani varieties |
| ahmadi-2020-klpt | 2020 | Toolkits | Sorani, Kurmanji |
| de2021multilingual | 2021 | Named-entity recognition | Kurmanji |
| abdullah2022 | 2022 | Sentiment analysis | Sorani |
| awlla2022 | 2022 | Sentiment analysis | Sorani |
| amin2022kurdish | 2022 | Sentiment analysis | Sorani |
| hameed2023sentiment | 2023 | Sentiment analysis | Sorani |
| zuhair2021 | 2021 | Other | Sorani |
| kamala2022kurdish | 2022 | Other | Sorani |
| ahmadi2023fieldmatters | 2023 | Language identification | Sorani, Kurmanji, Southern Kurdish, Zazaki, Gorani |
| ahmadi2023pali | 2023 | Language identification | Sorani, Kurmanji, Southern Kurdish, Gorani |
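
Each entry in the Reference column is a BibTeX citation key, so the full reference can be pulled out of the bibliography file by key. The sketch below is one possible way to do that with only the Python standard library. The function name `extract_entry` and the inline `SAMPLE` string are illustrative assumptions rather than part of this repository; in practice you would read the text from the repository's bibliography file, whose name is not given here.

```python
import re
from typing import Optional


def extract_entry(bib_text: str, key: str) -> Optional[str]:
    """Return the raw BibTeX entry whose citation key is `key`, or None.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Find the entry header, e.g. "@inproceedings{ahmadi-2020-klpt,"
    header = re.search(r"@\w+\s*\{\s*" + re.escape(key) + r"\s*,", bib_text)
    if header is None:
        return None

    # Walk forward and count braces until the entry's opening brace closes.
    depth = 0
    for i in range(header.start(), len(bib_text)):
        ch = bib_text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return bib_text[header.start(): i + 1]
    return None  # braces never balanced, so the entry is malformed


# Inline stand-in for the bibliography file (assumption: a real run would
# load the file's contents instead).
SAMPLE = """@inproceedings{ahmadi-2020-klpt,
    title = "{KLPT} {--} {K}urdish Language Processing Toolkit",
    author = "Ahmadi, Sina",
    year = "2020"
}
"""

entry = extract_entry(SAMPLE, "ahmadi-2020-klpt")
missing = extract_entry(SAMPLE, "not-a-real-key")
print(entry)
```

Brace counting is used instead of a single regular expression because BibTeX field values may themselves contain nested braces, as in the `title` field above.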

Cite this repository

If you find the provided data useful for your project, feel free to use it, and please cite the following paper:

@inproceedings{ahmadi-2020-klpt,
    title = "{KLPT} {--} {K}urdish Language Processing Toolkit",
    author = "Ahmadi, Sina",
    booktitle = "Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.nlposs-1.11",
    doi = "10.18653/v1/2020.nlposs-1.11",
    pages = "72--84"
}