Merge branch 'master' into macos
tdoan2010 committed Aug 15, 2022
2 parents dd58903 + 4148a88 commit aa4b776
Showing 57 changed files with 1,095 additions and 491 deletions.
52 changes: 50 additions & 2 deletions CHANGELOG.md
@@ -5,6 +5,50 @@ Versioned according to [Semantic Versioning](http://semver.org/).

## Unreleased

## [2.38.0] - 2022-08-14

Fixed:

* `ocrd zip`: Properly respect `Ocrd-Mets`, #899
* `ocrd workspace merge`: missing arguments, #896
* `ocrd resmgr download`: Support dynamic discovery, #901

Added:

* Processors support profiling with `--profile` and `--profile-file`, #878, bertsky/core#4

Removed:

* `ocrd zip`: remove support for obsolete `Ocrd-Manifestation-Depth`, #902, OCR-D/spec#182

## [2.37.0] - 2022-08-03

Added:

* `ocrd resmgr`: Resources of processors can be described in the `ocrd-tool.json`, #800

## [2.36.0] - 2022-07-18

Fixed:

* `ocrd_utils.generate_range`: `maxsplits` should be 1, not 2, #880
* Typos in CHANGELOG, README and code comments, #890

Changed:

* Consistently use snake_case but continue to support CamelCase for kwargs and CLI options, #874, #862
* Update spec to 3.19.0, introducing greater flexibility in describing parameters, #872, #848, OCR-D/spec#206
* `ocrd workspace merge`: support mapping `file_id` and `page_id` in addition to `file_grp`, #886, #888
* `ocrd workspace merge`: rebase `OcrdFile.url` to target workspace, #887, #888
* Replace `resource_filename` et al from pkg_resources with faster alternatives, #881, #882

## [2.35.0] - 2022-06-02

Changed:

* OCRD-ZIP: Drop `Ocrd-Manifestation-Depth` and disallow `fetch.txt`, OCR-D/spec#182
* Parameters can now be described with most JSON-Schema constructs, OCR-D/spec#206, #848

## [2.34.0] - 2022-05-20

Added:
@@ -322,7 +366,7 @@ Changed:

 Fixed:

-* As a workaround for tensorflow compatiblity, require `numpy < 1.19.0`, #620
+* As a workaround for tensorflow compatibility, require `numpy < 1.19.0`, #620

 ## [2.17.1] - 2020-10-05

@@ -553,7 +597,7 @@ Changed:
 * `Workspace.remove_file`: Optional `page_same_gropup` parameter to remove
 only those images linked in PAGE that are in the same group as the PAGE-XML
 * `Workspace.remove_file_gropup`: The same `page_recursive` and `page_same_gropup` parameters as `Workspace.remove_file`
-* `WorkspaceValidator.check_file_grp` now accepts a `page_id` parameter and will no raise an error if an exisitng
+* `WorkspaceValidator.check_file_grp` now accepts a `page_id` parameter and will not raise an error if an existing
 output file group is targeted but for pages that aren't in that group, #471
 * `ocrd_cli_wrap_processor`: Take `page_id` into account when doing `WorkspaceValidator.check_file_grp`
 * `run_cli` accepts an `overwrite` parameter to pass on to processor calls, #471
@@ -1482,6 +1526,10 @@ Fixed
 Initial Release

 <!-- link-labels -->
+[2.38.0]: ../../compare/v2.38.0..v2.37.0
+[2.37.0]: ../../compare/v2.37.0..v2.36.0
+[2.36.0]: ../../compare/v2.36.0..v2.35.0
+[2.35.0]: ../../compare/v2.35.0..v2.34.0
 [2.34.0]: ../../compare/v2.34.0..v2.33.0
 [2.33.0]: ../../compare/v2.33.0..v2.32.0
 [2.32.0]: ../../compare/v2.32.0..v2.31.0
4 changes: 2 additions & 2 deletions Makefile
@@ -68,7 +68,7 @@ deps-test:

 # (Re)install the tool
 install:
-	$(PIP) install -U pip wheel
+	$(PIP) install -U pip wheel setuptools fastentrypoints
 	for mod in $(BUILD_ORDER);do (cd $$mod ; $(PIP_INSTALL) .);done

 # Install with pip install -e
@@ -147,7 +147,7 @@ assets: repo/assets
 test: assets
 	HOME=$(CURDIR)/ocrd_utils $(PYTHON) -m pytest --continue-on-collection-errors -k TestLogging $(TESTDIR)
 	HOME=$(CURDIR) $(PYTHON) -m pytest --continue-on-collection-errors -k TestLogging $(TESTDIR)
-	$(PYTHON) -m pytest --continue-on-collection-errors --ignore=$(TESTDIR)/test_logging.py $(TESTDIR)
+	$(PYTHON) -m pytest --continue-on-collection-errors --durations=10 --ignore=$(TESTDIR)/test_logging.py $(TESTDIR)

 test-profile:
 	$(PYTHON) -m cProfile -o profile $$(which pytest)
2 changes: 1 addition & 1 deletion README.md
@@ -63,7 +63,7 @@ pip install ocrd_modelfactory

 All python software released by [OCR-D](https://github.com/OCR-D) requires Python 3.6 or higher.

-**NOTE** Some OCR-D-Tools (or even test cases) _might_ reveal an unintended behavior if you have specific enviroment modifications, like:
+**NOTE** Some OCR-D-Tools (or even test cases) _might_ reveal an unintended behavior if you have specific environment modifications, like:
 * using a custom build of [ImageMagick](https://github.com/ImageMagick/ImageMagick), whose format delegates are different from what OCR-D supposes
 * custom Python logging configurations in your personal account

18 changes: 13 additions & 5 deletions ocrd/ocrd/cli/ocrd_tool.py
@@ -9,7 +9,7 @@
 from json import dumps
 import codecs
 import sys
-
+import os
 import click

 from ocrd.decorators import parameter_option, parameter_override_option
@@ -96,15 +96,23 @@ def ocrd_tool_tool_description(ctx):
 @ocrd_tool_tool.command('list-resources', help="List tool's file resources")
 @pass_ocrd_tool
 def ocrd_tool_tool_list_resources(ctx):
-    Processor(None, ocrd_tool=ctx.json['tools'][ctx.tool_name],
-              list_resources=True)
+    class BashProcessor(Processor):
+        @property
+        def moduledir(self):
+            return os.path.dirname(ctx.filename)
+    BashProcessor(None, ocrd_tool=ctx.json['tools'][ctx.tool_name],
+                  list_resources=True)

 @ocrd_tool_tool.command('show-resource', help="Dump a tool's file resource")
 @click.argument('res_name')
 @pass_ocrd_tool
 def ocrd_tool_tool_show_resource(ctx, res_name):
-    Processor(None, ocrd_tool=ctx.json['tools'][ctx.tool_name],
-              show_resource=res_name)
+    class BashProcessor(Processor):
+        @property
+        def moduledir(self):
+            return os.path.dirname(ctx.filename)
+    BashProcessor(None, ocrd_tool=ctx.json['tools'][ctx.tool_name],
+                  show_resource=res_name)

 @ocrd_tool_tool.command('help', help="Generate help for processors")
 @pass_ocrd_tool
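Note on the `ocrd_tool.py` change: both commands now define a local `BashProcessor` subclass whose `moduledir` property closes over `ctx.filename`, so resources are looked up next to the tool's JSON file rather than next to the Python module. The following is a minimal sketch of that pattern only. The `Processor` class here is a hypothetical stand-in with a single property, not the real `ocrd.Processor`, and `make_tool_processor` is an illustrative helper that does not exist in the commit.

```python
import os


class Processor:
    """Hypothetical stand-in for the real base class (assumption, not the OCR-D API)."""

    @property
    def moduledir(self):
        # Placeholder default; the real class presumably derives this from its module.
        return "/default/module/dir"


def make_tool_processor(tool_json_path):
    """Build a processor whose moduledir is the directory of the given JSON file."""

    class BashProcessor(Processor):
        @property
        def moduledir(self):
            # The closure captures tool_json_path, like ctx.filename in the diff.
            return os.path.dirname(tool_json_path)

    return BashProcessor()


processor = make_tool_processor("/opt/tool/ocrd-tool.json")
print(processor.moduledir)
```

Defining the subclass inside the function keeps the override scoped to one CLI call, at the cost of repeating the class body in each command, as the diff does.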
185 changes: 116 additions & 69 deletions ocrd/ocrd/cli/resmgr.py
@@ -6,17 +6,21 @@
 :nested: full
 """
 import sys
+from os import environ
 from pathlib import Path
 from distutils.spawn import find_executable as which
+from yaml import safe_load, safe_dump

 import requests
 import click

 from ocrd_utils import (
     initLogging,
+    directory_size,
     getLogger,
-    RESOURCE_LOCATIONS
+    RESOURCE_LOCATIONS,
 )
+from ocrd.constants import RESOURCE_USER_LIST_COMMENT

 from ..resource_manager import OcrdResourceManager

@@ -39,13 +43,14 @@ def resmgr_cli():
     initLogging()

 @resmgr_cli.command('list-available')
-@click.option('-e', '--executable', help='Show only resources for executable EXEC', metavar='EXEC')
-def list_available(executable=None):
+@click.option('-D', '--no-dynamic', is_flag=True, default=False, help="Whether to skip looking into each processor's --dump-json for module-level resources")
+@click.option('-e', '--executable', help='Show only resources for executable beginning with EXEC', metavar='EXEC', default='ocrd-*')
+def list_available(executable, no_dynamic):
     """
     List available resources
     """
     resmgr = OcrdResourceManager()
-    for executable, reslist in resmgr.list_available(executable):
+    for executable, reslist in resmgr.list_available(executable=executable, dynamic=not no_dynamic):
         print_resources(executable, reslist, resmgr)

 @resmgr_cli.command('list-installed')
@@ -55,94 +60,136 @@ def list_installed(executable=None):
List installed resources
"""
resmgr = OcrdResourceManager()
ret = []
for executable, reslist in resmgr.list_installed(executable):
print_resources(executable, reslist, resmgr)

@resmgr_cli.command('download')
@click.option('-n', '--any-url', help='Allow downloading/copying unregistered resources', is_flag=True)
@click.option('-n', '--any-url', help='URL of unregistered resource to download/copy from', default='')
@click.option('-D', '--no-dynamic', is_flag=True, default=False, help="Whether to skip looking into each processor's --dump-json for module-level resources")
@click.option('-t', '--resource-type', help='Type of resource', type=click.Choice(['file', 'directory', 'archive']), default='file')
@click.option('-P', '--path-in-archive', help='Path to extract in case of archive type', default='.')
@click.option('-a', '--allow-uninstalled', help="Allow installing resources for uninstalled processors", is_flag=True)
@click.option('-o', '--overwrite', help='Overwrite existing resources', is_flag=True)
@click.option('-l', '--location', help='Where to store resources', type=click.Choice(RESOURCE_LOCATIONS), default='data', show_default=True)
@click.argument('executable', required=True)
@click.argument('url_or_name', required=False)
def download(any_url, allow_uninstalled, overwrite, location, executable, url_or_name):
@click.argument('name', required=False)
def download(any_url, no_dynamic, resource_type, path_in_archive, allow_uninstalled, overwrite, location, executable, name):
"""
Download resource URL_OR_NAME for processor EXECUTABLE.
Download resource NAME for processor EXECUTABLE.
NAME is the name of the resource made available by downloading or copying.
URL_OR_NAME can either be the ``name`` or ``url`` of a registered resource.
If NAME is '*' (asterisk), then download all known registered resources for this processor.
If URL_OR_NAME is '*' (asterisk), download all known resources for this processor
If ``--any-url=URL`` or ``-n URL`` is given, then URL is accepted regardless of registered resources for ``NAME``.
(This can be used for unknown resources or for replacing registered resources.)
If ``--any-url`` is given, also accepts URL or filenames of non-registered resources for ``URL_OR_NAME``.
If ``--resource-type`` is set to `archive`, then that archive gets unpacked after download,
and its ``--path-in-archive`` will subsequently be renamed to NAME.
"""
log = getLogger('ocrd.cli.resmgr')
resmgr = OcrdResourceManager()
basedir = resmgr.location_to_resource_dir(location)
if executable != '*' and not url_or_name:
log.error("Unless EXECUTABLE ('%s') is the '*' wildcard, URL_OR_NAME is required" % executable)
if executable != '*' and not name:
log.error("Unless EXECUTABLE ('%s') is the '*' wildcard, NAME is required" % executable)
sys.exit(1)
elif executable == '*':
executable = None
is_url = (url_or_name.startswith('https://') or url_or_name.startswith('http://')) if url_or_name else False
is_filename = Path(url_or_name).exists() if url_or_name else False
if name == '*':
name = None
is_url = (any_url.startswith('https://') or any_url.startswith('http://')) if any_url else False
is_filename = Path(any_url).exists() if any_url else False
if executable and not which(executable):
if not allow_uninstalled:
log.error("Executable %s is not installed. Is there a typo in the executable? " \
"To install resources for uninstalled processor, use the -a/--allow-uninstalled flag" % executable)
log.error("Executable '%s' is not installed. " \
"To download resources anyway, use the -a/--allow-uninstalled flag", executable)
sys.exit(1)
else:
log.warning("Executable %s is not installed but -a/--allow-uninstalled was given, so proceeding" % executable)
find_kwargs = {'executable': executable}
if url_or_name and url_or_name != '*':
find_kwargs['url' if is_url else 'name'] = url_or_name
reslist = resmgr.find_resources(**find_kwargs)
log.info("Executable %s is not installed, but " \
"downloading resources anyway", executable)
reslist = resmgr.list_available(executable=executable, dynamic=not no_dynamic)
if name:
reslist = [(executable, r) for _, rs in reslist for r in rs if r['name'] == name]
if not reslist:
log.info("No resources found in registry")
if any_url and (is_url or is_filename):
log.info("%s unregistered resource %s" % ("Downloading" if is_url else "Copying", url_or_name))
if is_url:
with requests.get(url_or_name, stream=True) as r:
content_length = int(r.headers.get('content-length'))
else:
url_or_name = str(Path(url_or_name).resolve())
content_length = Path(url_or_name).stat().st_size
with click.progressbar(length=content_length, label="Downloading" if is_url else "Copying") as bar:
fpath = resmgr.download(
executable,
url_or_name,
overwrite=overwrite,
basedir=basedir,
no_subdir=location == 'cwd',
progress_cb=lambda delta: bar.update(delta))
log.info("%s resource '%s' (%s) not a known resource, creating stub in %s'" % (executable, fpath.name, url_or_name, resmgr.user_list))
resmgr.add_to_user_database(executable, fpath, url_or_name)
log.info("%s %s to %s" % ("Downloaded" if is_url else "Copied", url_or_name, fpath))
log.info("Use in parameters as '%s'" % fpath.name)
log.info(f"No resources {name} found in registry for executable {executable}")
if executable and name:
reslist = [(executable, {'url': any_url or '???', 'name': name,
'type': resource_type,
'path_in_archive': path_in_archive})]
for executable, resdict in reslist:
if 'size' in resdict:
registered = "registered"
else:
sys.exit(1)
else:
for executable, resdict in reslist:
if not allow_uninstalled and not which(executable):
log.info("Skipping installing resources for %s as it is not installed. (Use -a/--allow-uninstalled to force)")
continue
if resdict['url'] == '???':
log.info("Cannot download user resource %s" % (resdict['name'])),
continue
log.info("Downloading resource %s" % resdict)
with click.progressbar(length=resdict['size']) as bar:
fpath = resmgr.download(
executable,
resdict['url'],
name=resdict['name'],
resource_type=resdict['type'],
path_in_archive=resdict.get('path_in_archive', '.'),
overwrite=overwrite,
size=resdict['size'],
no_subdir=location == 'cwd',
basedir=basedir,
progress_cb=lambda delta: bar.update(delta)
)
log.info("Downloaded %s to %s" % (resdict['url'], fpath))
log.info("Use in parameters as '%s'" % resmgr.parameter_usage(resdict['name'], usage=resdict['parameter_usage']))
registered = "unregistered"
if any_url:
resdict['url'] = any_url
if resdict['url'] == '???':
log.warning("Cannot download user resource %s", resdict['name'])
continue
if resdict['url'].startswith('https://') or resdict['url'].startswith('http://'):
log.info("Downloading %s resource '%s' (%s)", registered, resdict['name'], resdict['url'])
with requests.get(resdict['url'], stream=True) as r:
resdict['size'] = int(r.headers.get('content-length'))
else:
log.info("Copying %s resource '%s' (%s)", registered, resdict['name'], resdict['url'])
urlpath = Path(resdict['url'])
resdict['url'] = str(urlpath.resolve())
if Path(urlpath).is_dir():
resdict['size'] = directory_size(urlpath)
else:
resdict['size'] = urlpath.stat().st_size
with click.progressbar(length=resdict['size']) as bar:
fpath = resmgr.download(
executable,
resdict['url'],
name=resdict['name'],
resource_type=resdict.get('type', resource_type),
path_in_archive=resdict.get('path_in_archive', path_in_archive),
overwrite=overwrite,
size=resdict['size'],
no_subdir=location == 'cwd',
basedir=basedir,
progress_cb=lambda delta: bar.update(delta)
)
if registered == 'unregistered':
log.info("%s resource '%s' (%s) not a known resource, creating stub in %s'", executable, name, any_url, resmgr.user_list)
resmgr.add_to_user_database(executable, fpath, url=any_url)
resmgr.save_user_list()
log.info("Installed resource %s under %s", resdict['url'], fpath)
log.info("Use in parameters as '%s'", resmgr.parameter_usage(resdict['name'], usage=resdict.get('parameter_usage', 'as-is')))
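Note on the size handling in the new `download` body: the progress bar length comes from the `content-length` header for HTTP(S) URLs and from the filesystem for local paths, with directories going through `directory_size`. The sketch below shows one way these three cases could be computed with the standard library only. All three helper names are hypothetical, `local_size` is an assumed equivalent of `ocrd_utils.directory_size` rather than its actual implementation, and the headers argument is a plain dict, not a `requests` response. Note that `int(r.headers.get('content-length'))` as written in the diff raises `TypeError` when the server sends no such header; the fallback to 0 here is an assumption about what a caller might prefer.

```python
from pathlib import Path


def is_remote(url: str) -> bool:
    """True for HTTP(S) URLs, mirroring the startswith checks in the diff."""
    return url.startswith(("https://", "http://"))


def content_length(headers: dict) -> int:
    """Parse a content-length header, falling back to 0 when it is absent."""
    value = headers.get("content-length")
    return int(value) if value is not None else 0


def local_size(path: Path) -> int:
    """Size in bytes of a file, or the summed size of all files below a directory."""
    if path.is_dir():
        return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())
    return path.stat().st_size
```

A caller would pick `content_length` when `is_remote(url)` holds and `local_size(Path(url))` otherwise.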

@resmgr_cli.command('migrate')
@click.argument('migration', type=click.Choice(['2.37.0']))
def migrate(migration):
    """
    Update the configuration after updating core to MIGRATION
    """
    resmgr = OcrdResourceManager(skip_init=True)
    log = getLogger('ocrd.resmgr.migrate')
    if not resmgr.user_list.exists():
        log.info(f'No configuration file found at {resmgr.user_list}, nothing to do')
        # Return early: reading a missing file below would raise FileNotFoundError
        return
    if migration == '2.37.0':
        # Keep a copy of the unmigrated user list next to the original
        backup_file = resmgr.user_list.with_suffix(f'.yml.before-{migration}')
        yaml_in_str = resmgr.user_list.read_text()
        log.info(f'Backing {resmgr.user_list} to {backup_file}')
        backup_file.write_text(yaml_in_str)
        log.info(f'Applying migration {migration} to {resmgr.user_list}')
        yaml_in = safe_load(yaml_in_str)
        yaml_out = {}
        for executable, reslist_in in yaml_in.items():
            yaml_out[executable] = []
            for resdict_in in reslist_in:
                resdict_out = {}
                for k_in, v_in in resdict_in.items():
                    k_out, v_out = k_in, v_in
                    # Map the obsolete resource types onto 'directory'
                    if k_in == 'type' and v_in in ['github-dir', 'tarball']:
                        if v_in == 'github-dir':
                            v_out = 'directory'
                        elif v_in == 'tarball':
                            v_out = 'directory'
                    resdict_out[k_out] = v_out
                yaml_out[executable].append(resdict_out)
        # f-string so that {migration} is interpolated into the comment line
        resmgr.user_list.write_text(RESOURCE_USER_LIST_COMMENT +
                                    f'\n# migrated with ocrd resmgr migrate {migration}\n' +
                                    safe_dump(yaml_out))
        log.info(f'Applied migration {migration} to {resmgr.user_list}')
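Note on the `migrate` command: its core is a pure transformation of the parsed user list, which rewrites the `type` value of each resource entry and copies every other key unchanged. A minimal sketch of that step on plain dictionaries follows, without the YAML and file handling. The function name `migrate_types`, the `TYPE_MAP` constant and the sample data are hypothetical, and the mapping simply mirrors the two branches visible in the diff.

```python
# Obsolete resource types and their replacement, as in the two branches of the diff.
TYPE_MAP = {"github-dir": "directory", "tarball": "directory"}


def migrate_types(user_list: dict) -> dict:
    """Return a migrated copy of the user list, leaving the input untouched."""
    return {
        executable: [
            {key: (TYPE_MAP.get(value, value) if key == "type" else value)
             for key, value in resource.items()}
            for resource in resources
        ]
        for executable, resources in user_list.items()
    }


old = {"ocrd-foo": [{"name": "model", "type": "tarball", "url": "???"},
                    {"name": "weights", "type": "file"}]}
print(migrate_types(old))
```

Keeping the transformation free of I/O makes it easy to test separately from the backup and write steps.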