This workshop aims to provide humanities scholars with advanced techniques to efficiently handle, model, transform, and visually present text and tabular data. The focus will be on user-friendly yet powerful tools and libraries to make modern data analysis and visualization methods accessible.
Participants should have basic knowledge of Python (loops, conditions, functions, lists, and dictionaries) and should already have written a simple program such as a word counter.
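As a rough self-check for the prerequisite level, a word counter of the kind meant here might look like the sketch below. The function name `count_words` and the sample sentence are illustrative assumptions, not workshop material.

```python
def count_words(text):
    """Count how often each word occurs, ignoring case and edge punctuation."""
    counts = {}
    for word in text.lower().split():
        # Remove punctuation attached to the start or end of the word.
        word = word.strip(".,;:!?\"'()")
        if word:
            counts[word] = counts.get(word, 0) + 1
    return counts


# Made-up sample sentence, used only for illustration.
sample = "The cat saw the dog. The dog ran."
word_counts = count_words(sample)

for word, count in sorted(word_counts.items()):
    print(word, count)
```

Anyone who can read and adapt this loop-and-dictionary pattern should be comfortable with the exercises.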
- Duration: 8 hours (including breaks)
- Format: Theoretical introductions interspersed with practical exercise sessions.
- Tools: Jupyter Notebooks, Python (Pandas, Matplotlib, Seaborn), with optional use of Tableau for additional visualizations, and an introduction to NLP tools like SpaCy and API requests to GPT.
- Goals: Familiarize participants with the Jupyter Notebook environment, which will be used throughout the workshop.
- Content:
- Overview of Jupyter Notebooks: installation, launching, and basic navigation.
- Understanding and creating cells (Markdown vs Code).
- Running code and documenting work within notebooks.
- Practical exercise: Creating a simple notebook with basic Python code and Markdown documentation.
- Goals: Understanding data structures and data modeling concepts.
- Content:
- Brief review of Python data structures.
- Introduction to advanced data structures (Sets, Tuples, Lists) and their use cases.
- Fundamental data modeling concepts: entities, relationships.
- Hands-on exercise: Model a small dataset relevant to the humanities.
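The modeling ideas in this unit can be sketched with plain Python structures. The example below assumes a hypothetical correspondence dataset: persons are entities stored in a dictionary, and each letter is a relationship stored as a tuple. All names, IDs, and years are placeholders invented for illustration.

```python
# Entities: persons, keyed by a hypothetical ID.
persons = {
    "p1": {"name": "Person A", "city": "City X"},
    "p2": {"name": "Person B", "city": "City Y"},
    "p3": {"name": "Person C", "city": "City X"},
}

# Relationship "wrote a letter to": (sender_id, recipient_id, year).
# Tuples suit fixed records that should not change after creation.
letters = [
    ("p1", "p2", 1801),
    ("p2", "p1", 1801),
    ("p1", "p3", 1802),
    ("p1", "p2", 1803),
]

# A set keeps each recipient only once, however many letters were sent.
correspondents_of_p1 = {
    recipient for sender, recipient, _ in letters if sender == "p1"
}

# A dictionary aggregates the number of letters per sender.
letters_per_sender = {}
for sender, _, _ in letters:
    letters_per_sender[sender] = letters_per_sender.get(sender, 0) + 1

for person_id in sorted(correspondents_of_p1):
    print(persons["p1"]["name"], "wrote to", persons[person_id]["name"])
```

Keeping entities and relationships in separate structures, linked only by IDs, is one possible design; it avoids repeating a person's details in every letter record.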
- Goals: Learn techniques for data preparation and transformation.
- Content:
- Introduction to Pandas: DataFrames and Series.
- Loading, inspecting, and cleaning data (e.g., text files, CSV).
- Data transformation: filtering, sorting, grouping.
- Practical exercise: Apply concepts to a real dataset (e.g., literary texts, historical records).
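The cleaning and transformation steps listed above might look roughly like this in Pandas. The small table is built inline so the sketch runs without a file; in the workshop the data would presumably be loaded from a prepared file instead, for example with `pd.read_csv`. The column names are assumptions.

```python
import pandas as pd

# A small inline table of novels; the stray spaces are deliberate.
records = {
    "title": [
        "Emma",
        " Pride and Prejudice ",
        "Frankenstein",
        "Jane Eyre",
        "Sense and Sensibility",
        "Wuthering Heights",
    ],
    "author": [
        "Jane Austen",
        "Jane Austen",
        "Mary Shelley",
        "Charlotte Brontë",
        "Jane Austen",
        "Emily Brontë",
    ],
    "year": [1815, 1813, 1818, 1847, 1811, 1847],
}
df = pd.DataFrame(records)

# Cleaning: remove leading and trailing whitespace from the titles.
df["title"] = df["title"].str.strip()

# Filtering: keep only rows with a publication year before 1820.
before_1820 = df[df["year"] < 1820]

# Sorting: order all rows by year, earliest first.
by_year = df.sort_values("year")

# Grouping: count the titles per author.
titles_per_author = df.groupby("author")["title"].count()

print(by_year)
print(titles_per_author)
```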
- Goals: Basic understanding of NLP and practical use of the SpaCy library.
- Content:
- Overview of NLP and its applications in the humanities.
- Introduction to SpaCy: installation and basic usage.
- Practical exercise: Structure a text using SpaCy's data format.
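A minimal sketch of how SpaCy structures a text is shown below. It assumes a blank English pipeline (`spacy.blank("en")`) with the rule-based `sentencizer` added, so no trained model has to be downloaded; the workshop itself may use a trained pipeline instead. The sample text and the dictionary keys are illustrative.

```python
import spacy

# A blank pipeline only tokenizes; it has no tagger, parser, or NER.
nlp = spacy.blank("en")
# Rule-based sentence splitting on sentence-final punctuation.
nlp.add_pipe("sentencizer")

text = "Jane Austen was born in Steventon in 1775. Her novel Emma appeared in 1815."
doc = nlp(text)

# A Doc is a sequence of Token objects; sentences are Span objects.
sentences = [sent.text for sent in doc.sents]

# Each token keeps its text, its character offset, and lexical flags.
tokens = [
    {
        "text": token.text,
        "start_char": token.idx,
        "is_alpha": token.is_alpha,
        "is_punct": token.is_punct,
    }
    for token in doc
]

for sentence in sentences:
    print(sentence)
for entry in tokens[:5]:
    print(entry)
```

Because every token records its character offset, annotations can always be traced back to the original text.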
- Goals: Learn to apply common NLP tasks with the SpaCy library.
- Content:
- Common NLP tasks: tokenization, part-of-speech tagging, and named entity recognition.
- Practical exercise: Extract named entities from a sample text and perform basic text analysis.
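The entity extraction exercise could follow the pattern sketched below. To stay self-contained, the sketch does not use a trained model: it adds SpaCy's rule-based `entity_ruler` to a blank pipeline and gives it a few hand-written patterns, which are assumptions made for this example. With a trained pipeline such as `en_core_web_sm`, `doc.ents` would be filled by the statistical model and the reading code would stay the same.

```python
from collections import Counter

import spacy

nlp = spacy.blank("en")

# Rule-based stand-in for a trained NER component (illustrative patterns).
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(
    [
        {"label": "PERSON", "pattern": "Jane Austen"},
        {"label": "GPE", "pattern": "Steventon"},
        {"label": "GPE", "pattern": "Bath"},
    ]
)

text = "Jane Austen was born in Steventon. In 1801 Jane Austen moved to Bath."
doc = nlp(text)

# Each entity is a Span with its text and a label.
entities = [(ent.text, ent.label_) for ent in doc.ents]

# Basic analysis: how often does each entity label occur?
label_counts = Counter(label for _, label in entities)

for entity_text, label in entities:
    print(entity_text, label)
print(label_counts)
```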
- Jupyter Notebooks for all practical exercises.
- Datasets from the humanities, prepared for use in the workshop.
- Access to online resources for further information and learning materials.
- Participants are advised to bring a personal laptop with Python and the listed libraries (Pandas, Matplotlib, Seaborn, SpaCy) already installed. Installation guides for these tools will be provided prior to the workshop.
- No prior experience with Pandas, visualization tools, or NLP libraries is needed.
After this workshop, participants will be able to carry out data projects independently, from conception to visual presentation, and to analyze and interpret complex datasets using advanced Python tools.