We have set the linked official demo page to 'private' by default to control costs. If you wish to try it, please send us an email to book a time slot; during that slot, the demo page will be set to 'public'.
Historical maps provide valuable information and knowledge about the past. However, as they often feature non-standard projections, hand-drawn styles, and artistic elements, it is challenging for non-experts to identify and interpret them. While existing image captioning methods have achieved remarkable success on natural images, their performance on maps is suboptimal, as maps are underrepresented in their pre-training data. Despite recent advances of GPT-4 in text recognition and map captioning, it still has a limited understanding of maps, as its performance wanes when texts (e.g., titles and legends) in maps are missing or inaccurate. Moreover, it is inefficient or even impractical to fine-tune the model on users' own datasets.
To address these problems, we propose a novel and lightweight map-captioning counterpart. Specifically, we fine-tune the state-of-the-art vision-language model CLIP (Contrastive Language-Image Pre-Training) to generate captions relevant to historical maps and enrich the captions with GPT-3.5 to tell a brief story covering the where, what, when, and why of a given map. We propose a novel decision-tree architecture to generate only the captions relevant to the specified map type. Our system is invariant to text alterations in maps, and it can be easily adapted and extended to other map types and scaled to a larger map-captioning system.
We first automatically process maps and their metadata from the online map repository David Rumsey Map Collection to generate a training dataset with keyword captions regarding where, what, and when, and use this dataset to fine-tune different CLIP models. In the inference phase, we use a decision-tree architecture to structure the keyword captions with respect to the map type and use GPT to extend the context (the why) and summarize the story. Furthermore, a web interface is developed for interactive storytelling, with the decision-tree architecture and the fine-tuned models loaded at the backend.
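For orientation only, the decision-tree inference could be sketched roughly as follows; the category keys, candidate captions, and model handles are placeholder assumptions, not the actual logic of our scripts (see Inference.py and CaptionInferenceGUI.py below).

```python
# Illustrative sketch only; the real logic lives in Inference.py and
# CaptionInferenceGUI.py. Assumes the openai/CLIP package; the category keys,
# candidate captions, and model handles are placeholder assumptions.
# models:     {key: (clip_model, preprocess)}
# candidates: {key: [candidate keyword captions]}
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

def best_caption(model, preprocess, map_path, candidates):
    """Return the candidate keyword caption the fine-tuned CLIP model scores highest."""
    image = preprocess(Image.open(map_path).convert("RGB")).unsqueeze(0).to(device)
    text = clip.tokenize(candidates).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text)
    return candidates[int(logits_per_image.argmax())]

def keyword_captions(map_path, models, candidates):
    """Walk the decision tree: first the map type, then only the caption
    categories (where / what / when) that apply to that type."""
    map_type = best_caption(*models["type"], map_path, candidates["type"])
    branch = "pictorial" if "pictorial" in map_type else "topographic"
    captions = {"type": map_type}
    for category in ("where", "what", "when"):
        key = f"{branch}_{category}"
        captions[category] = best_caption(*models[key], map_path, candidates[key])
    return captions  # afterwards enriched with a "why" part and summarized by GPT-3.5
```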
Step-by-step instructions to reproduce our results with our proposed approach.
git clone https://github.com/claudaff/automatic-map-storytelling && cd automatic-map-storytelling
conda env create -f environment.yml
conda activate map_storytelling
Download and unzip the following fifteen .zip files containing our collected maps with associated metadata (1.6 GB overall).
M1, M2, M3, M4, M5, M6, M7, M8, M9, M10, M11, M12, M13, M14, M15
Run the two scripts CaptionGenerationClassical.py (for topographic maps) and CaptionGenerationPictorial.py (for pictorial maps). The output will be two NumPy arrays for each of the six caption categories: one containing the map image paths and one containing the corresponding ground-truth captions.
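For example, one caption category's output could be sanity-checked along these lines; the .npy file names below are assumptions, the two scripts define the actual output paths.

```python
# Quick sanity check of one caption category's output arrays.
# The file names are assumptions; see the two generation scripts for the real paths.
import numpy as np

image_paths = np.load("topographic_where_imagePaths.npy", allow_pickle=True)
captions = np.load("topographic_where_captions.npy", allow_pickle=True)

assert len(image_paths) == len(captions)
for path, caption in zip(image_paths[:3], captions[:3]):
    print(path, "->", caption)
```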
Run the six fine-tuning scripts fineTuneCLIP{Caption Category}. The output will be six fine-tuned CLIP models, one for each caption category.
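For orientation, fine-tuning one caption category amounts to the standard CLIP contrastive objective on (map image, keyword caption) pairs. The sketch below assumes the openai/CLIP package and uses illustrative hyperparameters, not the exact settings of the fineTuneCLIP scripts.

```python
# Minimal fine-tuning sketch for one caption category (illustrative only).
import clip
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
model = model.float()  # train in fp32 for numerical stability

class MapCaptionDataset(Dataset):
    """Pairs of (preprocessed map image, tokenized keyword caption)."""
    def __init__(self, image_paths, captions):
        self.image_paths, self.captions = image_paths, captions
    def __len__(self):
        return len(self.image_paths)
    def __getitem__(self, idx):
        image = preprocess(Image.open(self.image_paths[idx]).convert("RGB"))
        text = clip.tokenize(self.captions[idx], truncate=True).squeeze(0)
        return image, text

def fine_tune(image_paths, captions, epochs=5, lr=1e-6, out_path="FT_example.pt"):
    loader = DataLoader(MapCaptionDataset(image_paths, captions),
                        batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    ce = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, texts in loader:
            images, texts = images.to(device), texts.to(device)
            logits_per_image, logits_per_text = model(images, texts)
            labels = torch.arange(len(images), device=device)
            # Symmetric contrastive loss over the image-text similarity matrix.
            loss = (ce(logits_per_image, labels) + ce(logits_per_text, labels)) / 2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    torch.save(model.state_dict(), out_path)
```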
Alternatively, download the six fine-tuned models here (3.4 GB overall).
Download our test maps here (less than 50 MB) and unzip: Pictorial Test Maps, Topographic Test Maps
Run the script Inference.py after reading the instructions in the comments. This script allows testing the six fine-tuned models separately on our test maps.
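As a minimal illustration of such a test, one fine-tuned model could be loaded and applied to a single test map as follows; the file names and candidate captions are placeholders.

```python
# Score a handful of candidate captions for one test map with one fine-tuned model.
# File names and candidate captions are placeholders; assumes the checkpoint is a state dict.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
model.load_state_dict(torch.load("FT1.pt", map_location=device))  # one of FT1-FT6
model.eval()

image = preprocess(Image.open("test_maps/example.jpg").convert("RGB")).unsqueeze(0).to(device)
candidates = ["a map of Europe", "a map of North America", "a map of Asia"]
text = clip.tokenize(candidates).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).squeeze(0)

for caption, p in zip(candidates, probs.tolist()):
    print(f"{p:.3f}  {caption}")
```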
To run our map storytelling web app, open the script CaptionInferenceGUI.py, add your own OpenAI API key, and run it. Make sure that the six fine-tuned models (FT1 to FT6) have been downloaded.
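Conceptually, the GPT step is a single chat completion over the keyword captions. Below is a hedged sketch with the current openai Python client; the prompt wording and model name are illustrative, not the exact prompt used in CaptionInferenceGUI.py.

```python
# Turn the keyword captions into a short story with GPT-3.5 (illustrative prompt).
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def summarize(captions: dict) -> str:
    prompt = (
        "Write a brief story about a historical map using these keywords, "
        "covering where, what, and when, and adding a plausible why:\n"
        + "\n".join(f"{k}: {v}" for k, v in captions.items())
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(summarize({"where": "Switzerland", "what": "a topographic map",
                 "when": "19th century"}))
```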
Alternatively, if no API key is available, a 'light' version of our approach can be tested without GPT. To do so, open CaptionInferenceLight.py and assign input_map the path to the desired historical map. Running this script will generate the corresponding keyword captions without the 'why' part.
@misc{liu2024efficientautomaticmapstorytelling,
title={An Efficient System for Automatic Map Storytelling -- A Case Study on Historical Maps},
author={Ziyi Liu and Claudio Affolter and Sidi Wu and Yizi Chen and Lorenz Hurni},
year={2024},
eprint={2410.15780},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.15780},
}