LE-CAT (py)

Version 0.1.0

LE-CAT is a Lexicon-based Categorization and Analysis Tool developed by the Centre for Interdisciplinary Methodologies in collaboration with the Media of Cooperation Group at the University of Siegen.

The tool allows you to apply a set of word queries associated with a category (a lexicon) to a data set of textual sources (the corpus). LE-CAT determines the frequency of occurrence for each query and category in the corpus, as well as the relations between categories (co-occurrence) by source.
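To make the idea of co-occurrence by source concrete, here is a minimal sketch in plain pandas. It does not use LE-CAT itself and is not LE-CAT's implementation; the `category_counts` table and its values are hypothetical and only illustrate the calculation.

```python
import pandas as pd

# Hypothetical per-source category counts (rows: sources, columns: categories).
category_counts = pd.DataFrame(
    {"Apple": [2, 0, 1], "CIM": [1, 0, 0]},
    index=["source_1", "source_2", "source_3"],
)

# A category is "present" in a source if it matched at least once.
present = (category_counts > 0).astype(int)

# Entry (a, b) is the number of sources in which both a and b occur;
# the diagonal holds the number of sources each category occurs in.
cooccurrence = present.T @ present
print(cooccurrence)
```

In this made-up example, Apple and CIM co-occur in exactly one source.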

This repository contains a work-in-progress implementation of LE-CAT written in Python. A more extensive implementation in R can be found here.

Usage

Load in pandas and numpy.

import pandas as pd
import numpy as np

Import the lecat functions.

from lecat import parse_lexicon, run_search, run_lecat_analysis, create_unique_total_diagnostics

Read in corpus and lexicon.

# Forward slashes work on Windows, macOS and Linux.
corpus = pd.read_excel("sample_data/Corpus.xlsx")
lexicon = pd.read_excel("sample_data/Lexicon.xlsx")

Parse the lexicon file. The parse_lexicon function converts the wide lexicon file format to a table with one row per query.

parsed_lexicon = parse_lexicon(lexicon)

	Type	Category	Query
0	technology	Apple	iphone
1	technology	Apple	iPad
2	technology	Apple	imac
3	influencers	CIM	Noortje Marres
4	influencers	CIM	James Tripp
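For readers unfamiliar with wide-to-long reshaping, the sketch below shows the general idea in plain pandas. It is not the code of parse_lexicon: the wide layout used here (columns `Type`, `Category`, `Query1`, `Query2`, `Query3`) is an assumption made for illustration, and the real lexicon file may be laid out differently.

```python
import pandas as pd

# Hypothetical wide lexicon: one row per category, one column per query.
wide_lexicon = pd.DataFrame({
    "Type": ["technology", "influencers"],
    "Category": ["Apple", "CIM"],
    "Query1": ["iphone", "Noortje Marres"],
    "Query2": ["iPad", "James Tripp"],
    "Query3": ["imac", None],
})

long_lexicon = (
    wide_lexicon
    # Keep the original row index so each category's queries can be regrouped.
    .melt(id_vars=["Type", "Category"], value_name="Query", ignore_index=False)
    .dropna(subset=["Query"])      # categories with fewer queries leave gaps
    .drop(columns="variable")      # the Query1/Query2/... labels are not needed
    .sort_index(kind="mergesort")  # stable sort: keeps query order per category
    .reset_index(drop=True)
)
print(long_lexicon)
```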

The parsed lexicon can then be passed to run_lecat_analysis, together with the corpus, the regular expression to search with, and the name of the corpus column to search. The regular expression must contain the word query.

result = run_lecat_analysis(parsed_lexicon, corpus, 'query', 'description')

The run_lecat_analysis function counts the query matches in each corpus item and returns a DataFrame.

Query	iphone	iPad	imac	Noortje Marres	James Tripp
description
In this iphone and ipad delivered lecture James...	1	0	0	0	0
An interesting interview	0	0	0	0	0
Apple has launched a series of iphones, ipads a...	1	0	1	0	0

The Query column shown above is the index for each row.
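One plausible reading of the requirement that the regular expression contain the word query is that the word acts as a placeholder, replaced by each lexicon query in turn. The sketch below illustrates that reading only; it is an assumption, not the code in lecat.py. The helper `count_matches`, its `pattern_template` parameter and the sample texts are all hypothetical.

```python
import re

import pandas as pd


def count_matches(texts, queries, pattern_template="query"):
    """Count matches of each query in each text (hypothetical helper).

    The literal word 'query' in pattern_template is replaced by the
    escaped query text, so the same template serves every query.
    """
    counts = {}
    for query in queries:
        pattern = pattern_template.replace("query", re.escape(query))
        counts[query] = texts.str.count(pattern)  # case-sensitive by default
    return pd.DataFrame(counts)


texts = pd.Series(["my iphone and my iPad", "nothing relevant", "two iphones"])
queries = ["iphone", "iPad"]

# Bare template: plain substring matching, so "iphones" counts for "iphone".
loose = count_matches(texts, queries)

# Word-boundary template: only whole-word matches are counted.
strict = count_matches(texts, queries, pattern_template=r"\bquery\b")
```

Under this reading, a template such as `\bquery\b` would restrict matches to whole words, while the bare `query` also matches inside longer words.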

The create_unique_total_diagnostics function summarises the total number of occurrences of each query and the number of corpus items each query occurs in.

create_unique_total_diagnostics(parsed_lexicon, result)

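The function returns a table with one row per query, where unique is the number of corpus items containing the query and total is the total number of matches.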

	Query	Type	Category	unique	total
0	iphone	technology	Apple	2	2
1	iPad	technology	Apple	0	0
2	imac	technology	Apple	1	1
3	Noortje Marres	influencers	CIM	0	0
4	James Tripp	influencers	CIM	0	0
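The difference between unique and total can be seen in a short sketch that derives both from a count matrix in plain pandas. This is an illustration under assumed data, not the implementation of create_unique_total_diagnostics; the `counts` table below is hypothetical.

```python
import pandas as pd

# Hypothetical match counts (rows: corpus items, columns: queries).
counts = pd.DataFrame(
    {"iphone": [1, 0, 2], "imac": [0, 0, 1]},
    index=["text_a", "text_b", "text_c"],
)

diagnostics = (
    pd.DataFrame({
        "unique": (counts > 0).sum(),  # corpus items with at least one match
        "total": counts.sum(),         # matches summed over all corpus items
    })
    .rename_axis("Query")
    .reset_index()
)
print(diagnostics)
```

Here iphone occurs in two corpus items but three times overall, so its unique and total values differ.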

Contributing

Please feel free to either open a pull request or contact James Tripp with contributions. All contributions, questions and suggestions are warmly welcomed.
