templeKB

This package contains the temple corpus and the corpus creation and curation platform described in the paper 'A Seed Corpus of Hindu Temples in India' (LREC 2020). Please cite the paper if you use this software.

Folder structure

. : Platform
corpus : temple corpus
data : Wikipedia pages and scraped web pages
models : CQ and QA pretrained model files
output : preprocessing and other intermediate outputs

Requirements:

Python 3.7
Transformer model 'bert-large-uncased-whole-word-masking-finetuned-squad-pytorch_model.bin' from https://huggingface.co/transformers/pretrained_models.html
BERT pretrained model 'wwm_uncased_L-24_H-1024_A-16'
SQuAD dataset

Paths in KGconfig.py :

wiki_corpus_path
web_scraped_temple_text_path
bert_path
bert_for_qa
squad_path
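
As a rough illustration of how these settings might look, here is a minimal sketch of a KGconfig.py. The five variable names come from the list above; every value is a placeholder assumption based on the folder structure and the file names under Requirements, not the repository's actual layout. `BASE_DIR` and `missing_paths` are hypothetical helpers added for illustration.

```python
from pathlib import Path

# Hypothetical base directory: the repository root (an assumption).
BASE_DIR = Path(".")

# Variable names are taken from the README; all values are placeholders
# that you would replace with your own locations.
wiki_corpus_path = str(BASE_DIR / "data" / "wiki")
web_scraped_temple_text_path = str(BASE_DIR / "data" / "web")
bert_path = str(BASE_DIR / "models" / "wwm_uncased_L-24_H-1024_A-16")
bert_for_qa = str(
    BASE_DIR
    / "models"
    / "bert-large-uncased-whole-word-masking-finetuned-squad-pytorch_model.bin"
)
squad_path = str(BASE_DIR / "data" / "squad")


def missing_paths():
    """Return the names of configured paths that do not exist on disk."""
    configured = {
        "wiki_corpus_path": wiki_corpus_path,
        "web_scraped_temple_text_path": web_scraped_temple_text_path,
        "bert_path": bert_path,
        "bert_for_qa": bert_for_qa,
        "squad_path": squad_path,
    }
    return [name for name, value in configured.items() if not Path(value).exists()]
```

Calling `missing_paths()` before running the platform gives a quick list of settings that still point at locations that do not exist.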

Create Corpus

Web Scrape

`python Scrapper.py --url`

Create corpus

`python templeQA_1.py`

Cite

@inproceedings{radhakrishnan-2020-seed,
    title = "A Seed Corpus of {H}indu Temples in {I}ndia",
    author = "Radhakrishnan, Priya",
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.32",
    pages = "254--258",
    abstract = "Temples are an integral part of culture and heritage of India and are centers of religious practice for practicing Hindus. A scientific study of temples can reveal valuable insights into Indian culture and heritage. However to the best of our knowledge, learning resources that aid such a study are either not publicly available or non-existent. In this endeavour we present our initial efforts to create a corpus of Hindu temples in India. In this paper, we present a simple, re-usable platform that creates temple corpus from web text on temples. Curation is improved using classifiers trained on textual data in Wikipedia articles on Hindu temples. The training data is verified by human volunteers. The temple corpus consists of 4933 high accuracy facts about 573 temples. We make the corpus and the platform freely available. We also test the re-usability of the platform by creating a corpus of museums in India. We believe the temple corpus will aid scientific study of temples and the platform will aid in construction of similar corpuses. We believe both these will significantly contribute in promoting research on culture and heritage of a region.",
    language = "English",
    ISBN = "979-10-95546-34-4",
}
