change to the way PIDs are specified
Also v1.0.0-beta in changelog
kdutia committed May 17, 2021
1 parent 1ee3889 commit e60bcc6
Showing 4 changed files with 16 additions and 4 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,9 @@
 
 All notable changes documented below.
 
+## 1.0.0-beta
+- **enhancement (breaking change):** properties now passed as whitespace-separated list rather than comma-separated. They can also be passed through a config file by giving the `--properties` option a filename to a file that exists.
+
 ## 0.3.7
 - **fix:** reading from JSON dump forces utf-8
 ## 0.3.6
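Why this is a breaking change: the same command-line value is split differently before and after this commit. The sketch below reproduces only the two `split` calls from the cli.py hunk further down; the function names `parse_comma_separated` and `parse_whitespace_separated` are hypothetical and do not exist in elastic-wikidata.

```python
from typing import List


def parse_comma_separated(value: str) -> List[str]:
    # Behaviour before this commit: kwargs["properties"] = properties.split(",")
    return value.split(",")


def parse_whitespace_separated(value: str) -> List[str]:
    # Behaviour after this commit when `value` is not an existing file path:
    # kwargs["properties"] = properties.split()
    return value.split()


old_style = "p31,p21"
print(parse_comma_separated(old_style))       # ['p31', 'p21']
print(parse_comma_separated("p31, p21"))      # ['p31', ' p21'] (space kept)
print(parse_whitespace_separated(old_style))  # ['p31,p21'] (one malformed ID)
print(parse_whitespace_separated("p31 p21"))  # ['p31', 'p21']
```

An old-style value therefore does not fail at parse time; it silently becomes a single property string such as `'p31,p21'`.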
4 changes: 2 additions & 2 deletions README.md
@@ -9,7 +9,7 @@ Simple CLI tools to load a subset of Wikidata into Elasticsearch. Part of the [H
 - [Loading from Wikidata dump (.ndjson)](#loading-from-wikidata-dump-ndjson)
 - [Loading from SPARQL query](#loading-from-sparql-query)
 - [Temporary side effects](#temporary-side-effects)
 
 </br>
 
 ![PyPI - Downloads](https://img.shields.io/pypi/dm/elastic-wikidata)
@@ -62,7 +62,7 @@ A full list of options can be found with `ew --help`, but the following are like
 
 - `--index/-i`: the index name to push to. If not specified at runtime, elastic-wikidata will prompt for it
 - `--limit/-l`: limit the number of records pushed into ES. You might want to use this for a small trial run before importing the whole thing.
-- `--properties/-prop`: pass a comma-separated list of properties to include in the ES index. E.g. *p31,p21*.
+- `--properties/-prop`: a whitespace-separated list of properties to include in the ES index e.g. *'p31 p21'*, or the path to a text file containing newline-separated properties e.g. [this one](./pids.sample.cfg).
 - `--language/-lang`: [Wikimedia language code](https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all). Only one supported at this time.
 
 ### Loading from Wikidata dump (.ndjson)
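The quotes in the README example *'p31 p21'* matter because the shell splits arguments on whitespace before the CLI sees them. Python's standard-library `shlex.split` applies POSIX-shell-like tokenisation, so it offers a quick way to check what the option would receive. The `ew -prop ...` command lines below are illustrative only; other options are omitted.

```python
import shlex

# Tokenise two hypothetical invocations the way a POSIX shell would.
quoted = shlex.split("ew -prop 'p31 p21'")
unquoted = shlex.split("ew -prop p31 p21")

print(quoted)    # ['ew', '-prop', 'p31 p21']
print(unquoted)  # ['ew', '-prop', 'p31', 'p21']

# With quotes, the option value is the single string 'p31 p21', which the
# new cli.py code then splits on whitespace into two properties.
print(quoted[2].split())  # ['p31', 'p21']
```

Without the quotes, `p21` arrives as a separate argument rather than as part of the `--properties` value.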
10 changes: 8 additions & 2 deletions cli.py
@@ -1,5 +1,6 @@
 from elastic_wikidata import dump_to_es, sparql_to_es
 from elastic_wikidata.config import runtime_config
+import os
 import click
 from configparser import ConfigParser
 
@@ -44,7 +45,7 @@
     "--properties",
     "-prop",
     type=str,
-    help="One or more Wikidata property e.g. p31 or p31,p21. Not case-sensitive",
+    help="One or more Wikidata property e.g. 'p31' or 'p31 p21'. A path to a file containing newline-separated properties can also be passed. Not case-sensitive",
 )
 @click.option(
     "--timeout",
@@ -117,7 +118,12 @@ def main(
     if language:
         kwargs["lang"] = language
     if properties:
-        kwargs["properties"] = properties.split(",")
+        if os.path.exists(properties):
+            with open(properties, "r") as f:
+                kwargs["properties"] = f.read().splitlines()
+        else:
+            kwargs["properties"] = properties.split()
+
     if disable_refresh:
         kwargs["disable_refresh_on_index"] = disable_refresh
 
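The branching added to `main()` can be exercised on its own, outside click. The sketch below lifts it into a standalone function (`parse_properties` is a hypothetical name; in cli.py the logic is inline) and contrasts it with a hypothetical, more tolerant variant. The contrast is worth drawing because `str.splitlines()` keeps blank lines and surrounding whitespace verbatim.

```python
import os
import tempfile
from typing import List


def parse_properties(value: str) -> List[str]:
    """Same branching as the new code in main(): an existing file path is
    read one property per line, anything else is split on whitespace."""
    if os.path.exists(value):
        with open(value, "r") as f:
            return f.read().splitlines()
    return value.split()


def parse_properties_tolerant(value: str) -> List[str]:
    """Hypothetical variant: also strips stray whitespace and skips blank
    lines, so an empty line in the file does not yield ''."""
    if os.path.exists(value):
        with open(value, "r", encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    return value.split()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pids.cfg")
    with open(path, "w", encoding="utf-8") as f:
        # A hand-edited file with a blank line and a trailing space
        f.write("P31\nP279\n\nP18 \n")

    as_committed = parse_properties(path)
    tolerant = parse_properties_tolerant(path)

print(as_committed)  # ['P31', 'P279', '', 'P18 ']
print(tolerant)      # ['P31', 'P279', 'P18']
print(parse_properties("p31 p21"))  # ['p31', 'p21']
```

One design consequence: the file check runs before splitting, so a value such as `p31` would be read as a file if a file with that name happened to exist in the working directory.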
3 changes: 3 additions & 0 deletions pids.sample.cfg
@@ -0,0 +1,3 @@
+P31
+P279
+P18
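Because 1.0.0-beta drops comma-separated input, an existing comma-separated property list has to be reformatted. A minimal migration sketch, assuming you want a file in the same one-ID-per-line layout as `pids.sample.cfg`. `comma_list_to_pids_file` is a hypothetical helper and not part of elastic-wikidata; the read-back step copies the file branch from cli.py.

```python
import os
import tempfile
from typing import List


def comma_list_to_pids_file(comma_separated: str, path: str) -> List[str]:
    """Hypothetical helper: turn an old-style value such as 'p31,p21'
    into a newline-separated file in the format of pids.sample.cfg."""
    pids = [p.strip() for p in comma_separated.split(",") if p.strip()]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(pids) + "\n")
    return pids


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pids.cfg")
    written = comma_list_to_pids_file("P31, P279,P18", path)
    # Read it back the same way cli.py does after this commit
    with open(path, "r") as f:
        read_back = f.read().splitlines()

print(written)    # ['P31', 'P279', 'P18']
print(read_back)  # ['P31', 'P279', 'P18']
```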
