Releases: ilius/pyglossary

5.0.4

04 Jan 01:25
ad1a2e7

What's Changed

  • Fix regression in glossary_v2.py affecting deprecated Glossary usage
  • Fix docstring for glossary_v2.Glossary.write method
  • Fix broken script scripts/view-glossary-plaintext
  • Feature: include write_options in .info file with --info flag
  • Testing: fix scripts/test.sh not testing deprecated stuff
  • Testing: fix deprecated tests
  • Testing: add SKIP_MISSING env var to skip testing plugins with missing dependencies
  • Fix / update automation scripts
  • Add recent releases' doc
  • Improve and refactor type annotations
  • Break up all plugins into directories (with reader.py and/or writer.py)
  • Fix ruff 0.8.5 errors
  • Some refactoring (as usual)

Full Changelog: 5.0.3...5.0.4

5.0.3

28 Dec 07:06
8c5c1a6

What's Changed

  • Fix PyGlossary icon / logo so that it is visible on a light background
  • Web UI: update favicon.ico
  • Mobipocket: refactor, run kindlegen with relative file path, #613
  • Add back pkg/pyglossary.desktop for flathub build, #614
  • Rename plugin ABCMedicalNotes to MakindoMedical (#267)
  • Make plugins' documentation tidier
  • Update project.urls in pyproject.toml according to packaging.python.org
  • Add or update __all__ in imported modules
  • Fewer uses of sys.exit
  • Refactor pyglossary/ui/main.py and add mainNoExit function (c19ba56)

Full Changelog: 5.0.2...5.0.3

5.0.2

24 Dec 10:17
f21ee10

What's Changed

  • New PyGlossary icon logo
  • Zimfile: fix possible NameError
  • Web UI: add glossary preview buttons by @glowinthedark in #610
  • Remove plugin: IUPAC goldbook (.xml)
  • Replace all usages of OrderedDict with dict, which preserves insertion order since Python 3.6 (guaranteed by the language since 3.7)
  • FreeDict: improve test coverage
  • Refactor Yomichan and Web UI, and cleanup in BGL
  • Remove pyglossary.pyw, pkg directory and res/resize-16.png
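
As a small illustration of why the OrderedDict replacement listed above is safe, the sketch below shows a plain dict keeping insertion order. The keys and values are made up for the example and are not taken from the PyGlossary code base.

```python
# A plain dict preserves insertion order, so it can stand in for OrderedDict
# wherever only the ordering is needed (not move_to_end or order-sensitive ==).
info = {}
info["name"] = "Example Glossary"  # hypothetical keys, for illustration only
info["author"] = "Someone"
info["date"] = "2024-12-24"

# Iterating a dict yields keys in the order they were inserted.
keys_in_order = list(info)
```

Code that relied on OrderedDict-specific behavior such as move_to_end would still need OrderedDict; the release note does not say whether any such usage existed.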

Full Changelog: 5.0.1...5.0.2

5.0.1

19 Dec 06:08
26203fa

What's Changed

  • Glossary info: map "creationTime" metadata to/from "date" metadata (used by StarDict)
  • Gettext .po: fix broken syntax due to missing quotations, unescape |, and fix duplicate msgids
  • Wiktextract: improvements and better testing
    • Disable categories by default, with an option to enable them
  • FreeDict: refactoring
  • Web UI: add setup.py metadata by @glowinthedark in #609
  • Allow disabling in-memory SQLite with an environment variable
  • Better testing, fix/add type annotations and (as usual) some refactoring

Full Changelog: 5.0.0...5.0.1

5.0.0

14 Dec 08:49
bea1df8

Breaking changes for library users

  • 38f8f91 glossary_v2.Glossary class raises an Error exception if the operation fails, instead of calling log.critical and returning None

    • Applies to these methods: convert, read, write
    • glossary.Glossary (and pyglossary.Glossary) still behaves the same way (returns None on failure)
  • a5204bb Breaking changes in Glossary.detectInputFormat and Glossary.detectOutputFormat methods:

    • format argument is renamed to formatName
    • quiet argument is removed (must handle Error exception instead)
  • 9cc2887 Glossary.wordTitleStr: rename _class argument to class_

  • Remove toBytes and replaceStringTable functions from text_utils.py and plugins/formats_common.py
    Breaking change for plugins outside this repo
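
For library users, the practical effect of the first breaking change is that callers switch from checking for None to catching an exception. The sketch below only illustrates that pattern: it does not import pyglossary, and Error, convert_old and convert_new are hypothetical stand-ins whose signatures are not the library's.

```python
class Error(Exception):
    """Stand-in for the Error exception named in the release notes."""


def convert_old(ok: bool):
    """Hypothetical old-style call: signals failure by returning None."""
    return "output.txt" if ok else None


def convert_new(ok: bool) -> str:
    """Hypothetical new-style call: signals failure by raising Error."""
    if not ok:
        raise Error("conversion failed")
    return "output.txt"


# Old-style callers had to inspect the return value.
old_failed = convert_old(ok=False) is None

# New-style callers wrap the call in try/except instead.
try:
    convert_new(ok=False)
    new_failed = False
except Error:
    new_failed = True
```

With the exception-based style, a forgotten check can no longer pass a None result silently to later code.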

Deprecated API for library users

  • glossary.Glossary is deprecated, use glossary_v2.Glossary
  • format variable in plugins is deprecated, rename it to name
  • info argument to Glossary() is deprecated. Use glos.setInfo(key, value)
  • Glossary: format arguments to read, directRead and write methods are deprecated, rename them to formatName

What's changed since last version?

We have a web-based user interface by @glowinthedark, a new plugin StardictMergeSyns, new options in various plugins/formats, lots of improvements, refactoring and cleanup.

Full Changelog: 4.7.1...5.0.0

New Contributors

PyPI package is released

4.7.1

16 Sep 17:28

Changes since 4.7.0

Breaking changes:

4c78aa4 replace CC-CEDICT plugin with EDICT2 plugin

Bug fixes and improvements:

f5a420c Bugfix: Glossary: removeHtmlTagsAll was ineffective with --sort; same for preventDuplicateWords
01b5606 Yomichan: merge entries with same headword, #574
5fe93f4 Yomichan: add beautifulsoup4 to dependencies, #577
2a23966 use python3 in scripts/view-glossary and scripts/diff-glossary to bypass pyenv
c878cbd zimfile: replace OSError on Windows with a warning, #580
1573d5c Wiktextract: rewrite writeSenseExample and fix #572
  • Fix TypeError: got invalid input value of type <class 'list'>
  • Create a list of examples
  • Add the example type as prefix in bold
7f64af5 Wiktextract: keep warnings in a Counter, remove duplicate messages and show at end

New Features

aa6765b add new plugin xdxf_css (XdxfCss) based on PR #570 by @soshial
0e9d221 add read_options to .info file
fea2223 StarDict Textual writer: save resource files in res/ folder, #558
3800fac add Dyula language, #575
08c41da add glos.readOptions property

Refactoring, linting and testing

6786880 fix ruff preview error in appledict_bin/__init__.py
fd09e16 github actions: switch to ruff 0.5.2
019740e fix ruff error
69bcbf9 fix ruff preview error: B909 Mutation to loop iterable during iteration
5596b7f switch to ruff 0.6.4
03a509b fix ruff preview errors, use str.removesuffix
6ca9902 fix some mypy errors
eac286b github test: use lxml==5.2 to fix jmdict test
f2eb39d move info writer out of plugins
578c854 fix tests: test_save_info_json
0f4d885 update pyproject.toml
1e20a1a format pyglossary/glossary_v2.py
e231b64 update scripts/format-code
4aa4f09 github action test: remove test cache
acdbede github test: upload failed test files
1f095ad fix test action
9df1ed6 update jmdict test and switch to lxml==5.3

Full Changelog: 4.7.0...4.7.1

4.7.0

16 Jun 17:32
b41161d

Changes since 4.6.1

New Contributors

Full Changelog: 4.6.1...4.7.0

4.6.1

10 Mar 12:48
5f6ebd6

Changes since 4.6.0

Bug fixes

  • Fix a bug causing broken installation if ~/.local/lib is a symbolic link

    • or if site-packages or any of its parent directories is a symbolic link
  • Fix incompatibility with Python 3.9 (despite documentation)

  • Fix scripts/entry-filters-doc.py, scripts/plugin-doc.py and doc/entry-filters.md

  • AppleDict: Fix typos in Chinese language module

Features:

  • Use environment variable VERBOSITY as default (a number from 0 to 5)
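
A minimal sketch of how a default verbosity could be read from the environment is shown below. The helper name default_verbosity and the fallback value of 3 are assumptions made for the example; only the variable name VERBOSITY and the 0 to 5 range come from the note above.

```python
import os


def default_verbosity(environ=os.environ, fallback=3):
    """Hypothetical helper: read VERBOSITY (0 to 5), else return fallback."""
    raw = environ.get("VERBOSITY", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset, empty or non-numeric values fall back to the default.
        return fallback
    if not 0 <= value <= 5:
        # Out-of-range numbers are ignored rather than clamped (an assumption).
        return fallback
    return value
```

Passing the environment as an argument keeps the helper easy to test without modifying the real process environment.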

Improvements

  • AppleDict Binary: set html_full=True by default

  • Update wcwidth to 0.2.6

Refactoring

  • Add glos.stripFullHtml(errorHandler) and use it in 3 plugins

    • Add entry filter StripFullHtml and change entry.stripFullHtml() to return error
  • Refactor entryFiltersRules

  • Remove empty plugin gettext_mo.py

  • Remove glos.titleElement from glossary_v2.Glossary

    • Add to glossary.Glossary for compatibility
    • glossary.Glossary is a wrapper (child class) on top of glossary_v2.Glossary

Documentation

  • Update doc/entry-filters.md to list some entry filters that were enabled conditionally (besides config)

  • Remove sdict.md and sdict_source.md (removed plugins)

Type checking

  • Add missing method in GlossaryType class
  • Fix mypy errors on most of the code base and some of the plugins
  • Use builtin types list, dict, tuple, set for type annotations
  • Replace Optional[X] with X or None
    • This does not affect runtime, but type checking now only works with Python 3.10+
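
The annotation changes above can be illustrated with a small before/after pair. This is a sketch under assumptions: the function names are invented, and the quoted annotation is used here only as one way to keep the newer union syntax unevaluated at runtime (from __future__ import annotations has the same effect for a whole module); the release notes do not say which mechanism the project uses.

```python
from typing import Optional


def title_old(info: dict, key: str = "title") -> Optional[str]:
    """Older annotation style using typing.Optional."""
    return info.get(key)


def title_new(info: dict, key: str = "title") -> "str | None":
    """Newer style; the quoted annotation is never evaluated at runtime."""
    return info.get(key)


# Both functions behave identically; only the annotation spelling differs.
same_result = title_old({"title": "x"}) == title_new({"title": "x"})
```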

4.6.0

07 Mar 11:27
4b7ae78

Changes since 4.5.0

Dependency change

We now require Python 3.9 or a later version.

Bug fixes

  • Fix exception in scripts/plugin-index.py: 8a94b8c

  • StarDict: Fix writing to .zip file producing an empty zip, and fix bad test

  • dictunformat: fix #367: add option headword_separator, default to ;

  • Fixes in ui_gtk, #380 #382 #403

  • AppleDict source: fix #407 missing quotes for title, and refactor duplicate code

  • DictionaryForMIDs: remove | from word when normalizing, fix punctuation regex, use Unix newlines

  • StarDict: use Unix newline when reading and writing .ifo file on Windows

  • Fix bug of glos.addEntryObj(dataEntry) adding empty file because tmpDataDir is not set until glos.read()

    • Set and create tmpDataDir on glos.tmpDataDir access, and add test, #424
  • Fix scripts/wiki-formats.py, #428

  • Dictd / Dict.org: fix exception on Windows
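
The tmpDataDir fix in the list above (creating the directory on first access rather than waiting for glos.read()) follows a common lazy-property pattern. The sketch below illustrates that pattern only; GlossarySketch is a hypothetical stand-in and not the library's class.

```python
import os
import tempfile


class GlossarySketch:
    """Hypothetical stand-in; only the lazy-directory idea is illustrated."""

    def __init__(self):
        self._tmpDataDir = ""

    @property
    def tmpDataDir(self) -> str:
        # Create the directory on first access, so code that adds data
        # entries before any read() call still has a place to write to.
        if not self._tmpDataDir:
            self._tmpDataDir = tempfile.mkdtemp(prefix="glos-data-")
        return self._tmpDataDir


glos = GlossarySketch()
first = glos.tmpDataDir   # the directory is created here
second = glos.tmpDataDir  # the same path is reused afterwards
```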

Features

  • Support sorting by an ICU locale, see Sorting section of README

  • Add Gtk4 interface --ui=gtk4 / --gtk4

    • still buggy and not as functional as Gtk3 or Tkinter interfaces
  • Add flag --optimize-memory, config key optimize_memory

    • To enable entry compression on --indirect
    • Not enabled by default (it was previously always compressed)
  • Allow plugin's reader.open() to return an Iterator for progress bar

    • Implement for Tabfile (reading info/metadata)
    • Implement for AppleDict Binary (reading KeyText.data)
  • Add read and write support for StarDict Textual File (.xml), #348

  • Add support for writing Yomichan dictionary files, #395 by @tomtung

  • StarDict reader: support .syn.dz file, #410

  • StarDict writer: add write option large_file, #392 #422

  • StarDict reader: support dxoffsetbits=64 on read, #392 #422

  • JMDict: support examples, #383

  • Add read support for JMnedict, #386

  • Add flag --skip-duplicate-headword, config skip_duplicate_headword, #365

    • Zim reader: remove option skip_duplicate_words, #365
  • Add flag --trim-arabic-diacritics, config trim_arabic_diacritics, #366

  • Add read support for IUPAC goldbook (.xml), #355

  • Add write support for DIKT JSON

  • StarDict writer: limit memory usage by using SQLite for idx and syn data, #409

  • CSV: add newline option, defaulting to Unix-style

  • Aard2 Slob writer: add option file_size_approx_check_num_entries

  • Add scripts/diff-glossary and scripts/view-glossary

Improvements

  • When removing HTML tags, also replace <div> with \n, #394 by @tomtung

    • Treat <div> the same way <p> is treated.
  • Mobi: add mobi7-forcing switch to kindlegen command, #374 by @holyspiritomb

  • Octopus MDict: ignore directories with same_dir_data_files, #362

  • StarDict reader: handle definitions with mixed types/formats

  • Dictfile: strip whitespaces from word and defi before going through entry filters

  • BGL: strip whitespaces from word and defi before going through entry filters

  • Improvement in glos.write: avoid printing exception for invalid encoding

  • Remove empty logs in glos.convert

  • StarDict reader: fix validating sametypesequence, and add test

  • glos.convert: Allow an existing empty directory as output path

  • TextGlossaryReader: replace nextPair method with nextBlock which returns resource files as third item

  • ui_cmd_interactive: allow converting several times before exiting

  • Change title tag for Greek from <big> to <b>

  • Update language data set (langs.json)

  • ui/main.py: print 1-line error instead of full exception on ImportError

  • ui/main.py: Windows: try Tkinter before Gtk

  • ebook_base.py: avoid shutil.move on Windows, #368

  • TextGlossaryReader: fix loading info and some refactoring, #370 36b9cd8

  • Entry: Allow word to be tuple in Entry(word=...)

  • glos.iterInfo() returns an Iterator rather than an Iterable

  • Zim: change dependency to libzim>=1.0, and some comments

  • Mobi: work with kindlegen executable in PATH directories, #401

  • ui: limit the length of option comments in Format Options dialog

  • ui_gtk: improvement: show (last) critical error on status bar

  • ui_gtk: set initial focus

  • ui_gtk: improvements in About tab

  • ui_tk: revert most ttk widgets to tk because the theme doesn't match

  • Add SVG icon, #414 by @proletarius101

  • Prevent exception/traceback on Ctrl+C

  • Optimize progress bar

  • Aard2 slob: show info log before and after slobWriter.finalize(), #437
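
The <div> handling mentioned at the top of this list (treating <div> like <p> when HTML tags are removed) can be sketched with two regular expressions. The patterns and the strip_tags name are assumptions for illustration; the actual implementation may differ.

```python
import re

# Hypothetical patterns, not copied from the PyGlossary source.
_block_tag = re.compile(r"</?(?:p|div)\b[^>]*>", re.IGNORECASE)
_any_tag = re.compile(r"<[^>]+>")


def strip_tags(html: str) -> str:
    """Remove tags, turning <p> and <div> boundaries into newlines."""
    text = _block_tag.sub("\n", html)  # block-level tags become line breaks
    text = _any_tag.sub("", text)      # every other tag is dropped
    # Collapse runs of newlines left by adjacent block tags.
    return re.sub(r"\n{2,}", "\n", text).strip()


plain = strip_tags("<div>one</div><div>two</div>")
```

Without the <div> rule, the two blocks in the example would be joined into a single run of text with no separator.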

Removed features

  • Remove read support for Wiktionary Dump, #48

  • Remove support for Sdictionary Binary and Source

Octopus MDict MDX: features and improvements

  • Support MDict V3 format by updating readmdict, #385 by @xiaoqiangwang

  • Fix files created without UUID in header, #387 by @xiaoqiangwang

    • MdxBuilder 4.0 RC2 and before creates files without UUID header
  • Decode mdict title & description if they're bytes, #393 by @tomtung

  • readmdict: Skip zlib decompress exceptions, #384

  • readmdict: Use __name__ as logger name, and add 2 debug logs, #384

  • readmdict: improve exception msg for xxhash, #385

XDXF: fixes / improvements, issue #376

  • Support <categ>
  • Support embedded tags in <iref>
  • Fix ignoring <mrkd>
  • Fix extra newlines
  • Get rid of warning for <etm>
  • Fix/improve newline and space issues
  • Fix and improve tests
  • Update url for format description
  • Support any tag/string in <ex>, #396
  • Support reading compressed files directly (.xdxf.gz, .xdxf.bz2, .xdxf.lzma)
  • Allow using XSL using --write-options=xsl=True
  • Update XSL
  • Other improvements in XDXF to HTML transformation
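
Reading compressed files directly, as listed above for .xdxf.gz, .xdxf.bz2 and .xdxf.lzma, can be done with the standard library alone. The following is a minimal sketch; the helper name open_maybe_compressed and the extension table are assumptions, not the plugin's code.

```python
import bz2
import gzip
import lzma

# Map of file extension to the stdlib opener that can read it.
_openers = {".gz": gzip.open, ".bz2": bz2.open, ".lzma": lzma.open}


def open_maybe_compressed(path: str):
    """Open a text file, decompressing transparently based on its extension."""
    for ext, opener in _openers.items():
        if path.endswith(ext):
            return opener(path, mode="rt", encoding="utf-8")
    # No known compression suffix: open as a regular text file.
    return open(path, encoding="utf-8")
```

All three openers return a text-mode file object in "rt" mode, so the caller can parse the stream the same way whether or not the file was compressed.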

AppleDict Binary: features, bug fixes, improvements, refactoring

  • Fix css name on html_full=True

  • Fix using self._encoding where utf-8 should be used

  • Fix internal links, #343

    • Remove x-dictionary:d: prefix from href
    • First fix for x-dictionary:r:: use title if present
    • Add bword:// prefix to href (unless it points to http/https)
    • Read entry IDs on open and fix links with x-dictionary:r:
  • Add plistlib to dependencies

  • Add tests

  • Replace <entry ...> with <div>

  • Fix bad exception formatting

  • Fixes from PR #436

  • Support morphology (alternates): #434 by @soshial

  • Support different AppleDict offsets, #417 by @soshial

  • Extract AppleDict meta-info (langs, title, author), #418 by @soshial

  • Progress Bar on open() / loading KeyText.data

  • Improve memory usage of loading KeyText.data

  • Replace appledict_bin.py with appledict_bin directory and more refactoring
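
Two of the internal-link fixes listed above (dropping the x-dictionary:d: prefix and adding bword:// unless the link points to http/https) can be sketched as a small function. This is a simplified illustration with an invented name, fix_href; it does not cover the x-dictionary:r: case, which needs the entry IDs read on open.

```python
def fix_href(href: str) -> str:
    """Hypothetical sketch of rewriting an AppleDict link target."""
    # External links are left untouched.
    if href.startswith(("http://", "https://")):
        return href
    # Drop the AppleDict-specific prefix if present (Python 3.9+).
    href = href.removeprefix("x-dictionary:d:")
    # Mark the target as an internal dictionary link.
    return "bword://" + href


internal = fix_href("x-dictionary:d:apple")
external = fix_href("https://example.com/")
```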

Glossary class (glossary.py)

  • Lots of refactoring in glossary.py

    • Improve the design and readability
    • Reduce complexity of methods
    • Move some code into new classes that Glossary inherits from
    • Improve error messages
  • Introduce glossary_v2.py, and maintain API backward-compatibility for glossary.py (as far as documented)

Refactoring

  • Fix style errors using ruff based on pyproject.toml configuration

  • Remove all usages of pyglossary.plugins.formats_common

  • Use str.startswith(tuple) and str.endswith(tuple)

  • Reduce complexity of Glossary methods

  • Rename entry filter strip to trim_whitespaces

  • Some refactoring in StarDict reader

  • Use f-string equal syntax added in Python 3.8

  • Use str.removeprefix and str.removesuffix added in Python 3.9

  • langs/writing_system.py:

    • Change iso field to list
    • Add new scripts
    • Add getAllWritingSystemsFromText
    • More refactoring
  • Split up TextGlossaryReader.loadInfo method

  • plugin_manager.py: make some methods private
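
Three of the language features named in this list can be shown in a few lines. The filename is made up for the example; the string methods and the f-string syntax are standard Python.

```python
filename = "example.xdxf.gz"  # hypothetical value, for illustration only

# str.endswith (and str.startswith) accept a tuple of candidates.
is_compressed = filename.endswith((".gz", ".bz2", ".lzma"))

# str.removesuffix (Python 3.9+) drops the suffix only if it is present.
stem = filename.removesuffix(".gz")

# The f-string "=" specifier (Python 3.8+) emits the expression and its repr.
debug_line = f"{stem=}"
```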

Documentation

  • Update plugins' documentation

  • Glossary: add comments about entryFilters

  • Update config.rst

  • Update doc/entry-filters.md

  • Update README.md

  • Update doc/sort-key.md

  • Update doc/pyicu.md

  • Update plugins/testformat.py

  • Add types for arguments and result of all functions/methods

  • Add types for r/w options in reader/writer classes

  • Fix a few incorrect type annotations

  • README.md: Add document for adding data entries, #412

  • README.md: Fix -> nixos command, #400 by @srghma

  • Update bgl_info.md and move it from pyglossary/plugins/babylon_bgl/ to doc/babylon/

Testing

  • Add test for DSL -> Tabfile conversion

  • dsl_test.py: fix method names not starting with test_

  • StarDict reader: better testing for handling definitions with mixed types

  • StarDict writer: much better testing, coverage of stardict.py: from 62% to 83%

  • Refactoring and improvements in tests of Glossary, along with new tests

  • Add test for dictunformat -> Tabfile

  • AppleDict (source) tests: validate plist file contents

  • Allow forking and branching pyglossary-test repo

  • Fix some failing tests on Windows

  • Slob: test file_size_approx

  • Test Tabfile -> SQL conversion

  • Test StarDict error/warning for sortKeyName with and without locale

  • Print useful messages for unhandled warnings

  • Improve logs

  • Add showDiff=False arg to compareTextFiles and convert

Packaging

  • Update and refactor Dockerfile and run-with-docker.sh

    • Dockerfile: chan...

4.5.0

04 Feb 23:19
2433ff5

Changes since 4.4.1

Bug fixes

  • Fix 2 log messages in glos._resolveConvertSortParams

  • Fixes and improvements in Dictfile (.df) reader

    • Fix exception: disable loading info (Dictfile does not support info)
    • TextGlossaryReader: prevent producing duplicate data entries
      • This fixes: error in DataEntry.save: [Errno 2] No such file or directory: ... because entry.save() moves the temp file to output path
      • This bug only existed for Dictfile (.df) format.
    • Remove extra colon, #358
    • Remove some extra newlines
    • And add test for Dictfile to/from Tabfile
  • Fix not cleaning up temp directory on return with error from glos.convert

Features

  • ui_gtk: add a "General Options" button that opens a dialog for:

    • Settings for sort and sortKey
    • Checkbox for SQLite mode
    • Check boxes for config params: save_info_json, lower, skip_resources, rtl, enable_alts, cleanup, remove_html_all
  • Add support for --sort-key random to shuffle entries

Performance improvements

  • Performance improvement: remove gc.collect() calls in Glossary and *EntryList

    • Not needed since Python 3.8
    • Change minimum python requirement to 3.8 in README.md
  • Do not import all plugin modules (only import two plugins that are used)

    • Load json file plugins-meta/index.json instead
    • In debug mode, all plugin modules are still imported and validated
    • User plugins are still imported

Other improvements

  • Improve detection of languages from glossary name, and add tests
  • Update langs.json: add new 3-letter codes for 25 languages
  • glos.preventDuplicateWords and glos.removeHtmlTagsAll: prevent adding filter twice
  • glos.cleanup: reset path list to avoid (non-critical) error if called again
  • Minor improvements in Glossary.init()
  • DataEntry.save: on FileNotFoundError show a 1-line error instead of log.exception
  • ui_gtk: create a new Glossary object every time Convert button is clicked
  • Add docstring for Glossary.init

Unit testing

  • Update tests/glossary_errors_test.py
  • Add missing cleanup for some temp files
  • Add test for LDF to/from Tabfile

Refactoring

  • Plugins: replace import of formats_common from current directory with pyglossary.plugins.formats_common

  • Fix use of the deprecated logging.warn method (use warning instead), PR #360 by @BoboTiG

  • Fix DeprecationWarning: invalid escape sequence, PR #361 by @BoboTiG

  • Move some functions from glossary_utils.py to compression.py

  • Move some methods from Glossary to new parent classes PluginManager and GlossaryInfo

  • Some refactoring in plugin_prop.py and plugin_manager.py

    • Rename plugin.pluginModule to plugin.module
    • Minimize direct access to plugin.module, plugin.readerClass or plugin.writerClass
    • Add some new properties to PluginProp
    • Remove a log from glossary.py
    • Disable validation of plugins unless in debug mode
    • plugin_prop.py: fix checking debug level
  • sq_entry_list.py: rename sortColumns to sqliteSortKey

  • Some refactoring around setSortKey between Glossary, EntryList and SqEntryList

  • Remove Entry.sqliteSortKeyFrom and related classmethods

  • Some more simplification in glossary.py

  • Remove Entry.defaultSortKey

  • Some style fixes

  • iter_utils.py: remove unused key= argument from unique_everseen

  • Refactor ui_gtk and update config comments

  • extractInlineHtmlImages: avoid writing file within sub func
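
Two of the fixes in this list, the logging.warn deprecation and the invalid escape sequence warning, correspond to the patterns sketched below. The logger name and the sample pattern are made up for the example.

```python
import logging
import re

log = logging.getLogger("sketch")  # hypothetical logger name

# logging.warn is a deprecated alias; warning is the supported spelling.
log.warning("something looks off")

# A raw string keeps regex backslashes literal, which avoids the
# "invalid escape sequence" DeprecationWarning raised for "\d" in a
# normal string literal.
digits = re.compile(r"\d+")
found = digits.findall("a1b22")
```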