04 Feb 23:19

ilius

2433ff5

4.5.0

Changes since 4.4.1

Bug fixes

Fix 2 log messages in glos._resolveConvertSortParams
Fixes and improvements in Dictfile (.df) reader
- Fix exception: disable loading info (Dictfile does not support info)
- TextGlossaryReader: prevent producing duplicate data entries
  - This fixes: error in DataEntry.save: [Errno 2] No such file or directory: ... because entry.save() moves the temp file to output path
  - This bug only existed for Dictfile (.df) format.
- Remove extra colon, #358
- Remove some extra newlines
- And add test for Dictfile to/from Tabfile
Fix not cleaning up temp directory on return with error from glos.convert

Features

ui_gtk: add a "General Options" button that opens a dialog for:
- Settings for sort and sortKey
- Checkbox for SQLite mode
- Check boxes for config params: save_info_json, lower, skip_resources, rtl, enable_alts, cleanup, remove_html_all
Add support for --sort-key random to shuffle entries
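One way to picture a random sort key: every entry gets a random number as its key, so sorting by that key shuffles the entries. The sketch below is only an illustration; `random_sort_key` and the sample entries are hypothetical and are not taken from PyGlossary's code.

```python
import random


def random_sort_key(_words):
    """Hypothetical sort key: ignore the entry's words, return a random float."""
    return random.random()


entries = [["apple"], ["banana"], ["cherry"], ["date"]]

# Sorting by a random key yields the same entries in a shuffled order.
shuffled = sorted(entries, key=random_sort_key)
print(shuffled)
```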

Performance improvements

Performance improvement: remove gc.collect() calls in Glossary and *EntryList
- Not needed since Python 3.8
- Change minimum python requirement to 3.8 in README.md
Do not import all plugin modules (only import two plugins that are used)
- Load json file plugins-meta/index.json instead
- In debug mode, all plugin modules are still imported and validated
- User plugins are still imported
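The general idea of loading plugin metadata from a JSON index and importing a module only when it is needed can be sketched as follows. This is an assumption-laden illustration: the index schema, the `LazyPlugin` class and its attributes are hypothetical, and standard-library modules stand in for plugin modules.

```python
import importlib
import json

# Hypothetical index content; the real plugins-meta/index.json schema may differ.
INDEX_JSON = """
[
  {"name": "Csv", "module": "csv", "extensions": [".csv"]},
  {"name": "Json", "module": "json", "extensions": [".json"]}
]
"""


class LazyPlugin:
    """Holds metadata only; imports the module on first access."""

    def __init__(self, meta):
        self.name = meta["name"]
        self.extensions = meta["extensions"]
        self._module_name = meta["module"]
        self._module = None

    @property
    def module(self):
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return self._module

    @property
    def loaded(self):
        return self._module is not None


plugins = {meta["name"]: LazyPlugin(meta) for meta in json.loads(INDEX_JSON)}

# Only the plugin that is actually used gets imported through the registry.
json_module = plugins["Json"].module
print(plugins["Json"].loaded, plugins["Csv"].loaded)
```

Metadata such as names and extensions is available for every plugin without importing it, which is what makes format detection possible before any plugin module is loaded.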

Other improvements

Improve detection of languages from glossary name, and add tests
Update langs.json: add new 3-letter codes for 25 languages
glos.preventDuplicateWords and glos.removeHtmlTagsAll: prevent adding filter twice
glos.cleanup: reset path list to avoid (non-critical) error if called again
Minor improvements in Glossary.init()
DataEntry.save: on FileNotFoundError show a 1-line error instead of log.exception
ui_gtk: create a new Glossary object every time Convert button is clicked
Add docstring for Glossary.init

Unit testing

Update tests/glossary_errors_test.py
Add missing cleanup for some temp files
Add test for LDF to/from Tabfile

Refactoring

Plugins: replace import of formats_common from current directory with pyglossary.plugins.formats_common
Fix logging.warn method is deprecated, use warning instead, PR #360 by @BoboTiG
Fix DeprecationWarning: invalid escape sequence, PR #361 by @BoboTiG
Move some functions from glossary_utils.py to compression.py
Move some methods from Glossary to new parent classes PluginManager and GlossaryInfo
Some refactoring in plugin_prop.py and plugin_manager.py
- Rename plugin.pluginModule to plugin.module
- Minimize direct access to plugin.module, plugin.readerClass or plugin.writerClass
- Add some new properties to PluginProp
- Remove a log from glossary.py
- Disable validation of plugins unless in debug mode
- plugin_prop.py: fix checking debug level
sq_entry_list.py: rename sortColumns to sqliteSortKey
Some refactoring around setSortKey between Glossary, EntryList and SqEntryList
Remove Entry.sqliteSortKeyFrom and related classmethods
Some more simplification in glossary.py
Remove Entry.defaultSortKey
Some style fixes
iter_utils.py: remove unused key= argument from unique_everseen
Refactor ui_gtk and update config comments
extractInlineHtmlImages: avoid writing file within sub func
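Two of the refactoring items above (the deprecated logging.warn method and the invalid escape sequence warning) concern general Python behavior, which a short generic example can show. The logger name, message and pattern are made up for illustration and do not come from the PyGlossary code base.

```python
import logging
import re

log = logging.getLogger("example")

# Logger.warn is a deprecated alias; Logger.warning is the supported method.
log.warning("input file has no info header")

# "\d" in a normal string literal is an invalid escape sequence and triggers
# a DeprecationWarning; a raw string passes the backslash to re unchanged.
digits = re.compile(r"\d+")
print(digits.findall("entries 12 and 345"))
```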

Contributors

BoboTiG

25 Jan 10:22

ilius

4.4.1

663748c

4.4.1

Changes since 4.4.0

Bug fixes

Automatically create cacheDir on Glossary.init()
- Fixes exception in SQLite mode

Features

ui_cmd_interactive: support setting sortKey

Improvements and documentation

Wiktionary Dump: remove detect-by-extension
glossary.py: update docstrings for sortKeyName
sort_keys.py: add desc to NamedSortKey
Update doc/sort-key.md

24 Jan 17:39

ilius

4.4.0

cfd61e8

4.4.0

Changes since 4.3.0

Breaking changes

Remove partial sorting support (obsolete feature)
- Remove --sort-cache-size flag in command line
- (For library users) Remove sortCacheSize argument to glos.write and glos.convert
Re-design sorting and sortKey parameters
- Breaking change for library users, and user plugins that need sorting (sortOnWrite = ALWAYS)
- Change glos.convert
  - Replace argument sortKey (Callable) with sortKeyName (str)
  - Add argument sortEncoding (str) defaulting to utf-8
- Change glos.write
  - Replace argument sortKey (Callable) with namedSortKey (sort_keys.NamedSortKey)
  - Add argument sortEncoding (str) defaulting to utf-8
- Change glos.sortWords
  - Replace argument key (Callable) with sortKeyName (str)
  - Add argument sortEncoding (str) defaulting to utf-8
- Change API of plugins that use sortOnWrite = ALWAYS
  - Replace writer.sortKey and Writer.sqliteSortKey with sortKeyName in plugin module.
  - See the stardict.py for example.
Note 1: All sortKey and sortEncoding arguments are optional.

Note 2: Values of sortKeyName are documented in doc/sort-key.md
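Selecting a sort key by name, together with an encoding, can be pictured as a registry that maps names to key factories. Everything in this sketch is hypothetical (the names `headword_bytes` and `headword_lower`, the `SORT_KEYS` dict and `get_sort_key`); the actual accepted names are the ones documented in doc/sort-key.md.

```python
from typing import Callable, Dict, List

SortKey = Callable[[List[str]], object]


def _headword_bytes(encoding: str) -> SortKey:
    # Compare entries by the encoded bytes of their first word.
    return lambda words: words[0].encode(encoding, errors="replace")


def _headword_lower(encoding: str) -> SortKey:
    # Case-insensitive comparison; the encoding is unused here.
    return lambda words: words[0].lower()


# Hypothetical registry: name -> factory taking the sort encoding.
SORT_KEYS: Dict[str, Callable[[str], SortKey]] = {
    "headword_bytes": _headword_bytes,
    "headword_lower": _headword_lower,
}


def get_sort_key(name: str, encoding: str = "utf-8") -> SortKey:
    try:
        factory = SORT_KEYS[name]
    except KeyError:
        raise ValueError(f"unknown sort key name: {name!r}") from None
    return factory(encoding)


entries = [["banana"], ["Cherry"], ["apple"]]
print(sorted(entries, key=get_sort_key("headword_lower")))
print(sorted(entries, key=get_sort_key("headword_bytes")))
```

Passing a name instead of a callable lets the same choice be expressed on the command line and in config files, which a Python function object cannot be.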
Rename 2 files in doc/:
- Rename doc/entry_filters.md to doc/entry-filters.md
- Rename doc/term_colors.md to doc/term-colors.md

Features

--sort-key and --sort-encoding command line flags (as part of above re-design)
- See README.md and doc/sort-key.md.
Now SQLite mode works for all output formats.
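The principle behind sorting through SQLite rather than in RAM can be sketched with the standard sqlite3 module. This is not PyGlossary's implementation: the table layout, `read_entries` and `sorted_entries_via_sqlite` are assumptions for illustration, and the demo uses an in-memory database where a real run would use a file on disk.

```python
import sqlite3


def read_entries():
    """Hypothetical stand-in for entries streamed from a reader."""
    yield "cherry", "a small red fruit"
    yield "apple", "a round fruit"
    yield "banana", "a long yellow fruit"


def sorted_entries_via_sqlite(entries, db_path=":memory:"):
    """Store entries in SQLite, then yield them back in sorted order."""
    con = sqlite3.connect(db_path)
    try:
        con.execute("CREATE TABLE data (sortkey BLOB, word TEXT, defi TEXT)")
        con.executemany(
            "INSERT INTO data VALUES (?, ?, ?)",
            ((word.encode("utf-8"), word, defi) for word, defi in entries),
        )
        con.execute("CREATE INDEX sortkey_idx ON data (sortkey)")
        con.commit()
        # Rows are fetched lazily while iterating over the cursor.
        query = "SELECT word, defi FROM data ORDER BY sortkey"
        for word, defi in con.execute(query):
            yield word, defi
    finally:
        con.close()


for word, defi in sorted_entries_via_sqlite(read_entries()):
    print(word, "->", defi)
```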

Bug fixes

Fix lack of Progress Bar while writing in indirect or SQLite mode
Fix misleading message log about SQLite mode
Fix unclosed files in XDXF and FreeDict plugins

Improvements

Show a 1-line log instead of FileNotFoundError traceback in glos.read and glos.write
Close readers in glos.convert if write failed
Fix some type annotations and comments
(For library users) Change Glossary.__str__
(For library users) glos.setInfo: convert non-str value to str, and add tests

Unit testing

Add new tests and improve existing tests.

Coverage of glossary.py: 89%
Overall coverage of codebase + plugins: 58%

Refactoring and design improvements

Simplify by passing glos object to EntryList()
Replace SqList with SqEntryList
Change __iter__ of SqEntryList and EntryList to give entry objects
Simplify Glossary by moving gc.collect to EntryList and SqEntryList
Remove unused function xml_unescape
Remove unused import from FreeDict and JMDict plugins
Use operator.itemgetter in stardict.py, dict_cc.py, ebook_kobo.py, reverse.py
glossary.py: cleanup, simplify and optimize generators logic
- Also remove index argument from entryFilter.run method and add some comments
Remove redundant check in glos.progress
Remove redundant check in _getLangByStr
Remove redundant check in Glossary.detectOutputFormat

15 Jan 12:18

ilius

4.3.0

cf4db2b

4.3.0

Changes since 4.2.1

Bug fixes

Tabfile writer: fix replacing \ with \\
--remove-html flag: fix bad regex
ui_cmd_interactive: fix a few bugs
Lowercase word/entry links (<a href="bword://...) when --lower flag is passed
TextGlossaryWriter: do not skip words that start with #
Fix StdLogHandler: was not applying --no-color
Fix checking for sys.frozen

New features

Add auto_sqlite config parameter
- to use SQLite mode for StarDict and EPUB-2 (which require sorting) by default
- also allow overriding it with --no-sqlite flag
Add 3 config parameters that allow changing log colors in terminal:
- color.cmd.critical
- color.cmd.error
- color.cmd.warning
Add 2 keys to config to enable/disable colors in Unix and Windows separately
- color.enable.cmd.unix: default true
- color.enable.cmd.windows: default false

New features for library users

Allow glos.setInfo(key, None) to delete the info / metadata key
Add glos.alts property as shortcut, and use it internally

Design improvements

Change rawEntry[0] from bytes to List[str] and avoid split/join when converting rawEntry <-> entry.
This fixes some very edge cases involving | in words, but uses more RAM in indirect mode (converting to StarDict), which can be solved with --sqlite.
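The kind of edge case involved can be shown with a simplified model: if alternate words are joined with | into one string, a word that itself contains | cannot be recovered by splitting. The helper names below are hypothetical, and the real raw entry holds more than the words.

```python
def to_raw_joined(words):
    # Older style: one string, alternates separated by "|".
    return "|".join(words)


def from_raw_joined(raw):
    return raw.split("|")


def to_raw_list(words):
    # Newer style: keep the words as a list; nothing to split later.
    return list(words)


words = ["a|b", "alt"]

# The "|" inside "a|b" is mistaken for a separator after a round trip.
print(from_raw_joined(to_raw_joined(words)))
# Keeping a list preserves the words exactly.
print(to_raw_list(words))
```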

Documentation

Replace doc/config.md with doc/config.rst, update comments and other improvements
Generate doc/entry_filters.md
Update plugins doc
Update README.md

Unit testing

Coverage of glossary.py: 75%

There are 2501 lines of test code in tests directory.

Tests for Glossary class include:

Basic functionality
Error handling
Sorting and direct / indirect / SQLite modes
Entry filter config/flags (lower, rtl, remove_html, remove_html_all)
Resources / data entries
Convert: Tabfile <-> Aard2 slob
Convert: Tabfile <-> CSV
Convert: Tabfile -> EPUB-2
Convert: Tabfile -> JSON
Convert: Tabfile <-> StarDict

Other improvements:

glossary_test.py: check CRC32 of downloaded test files
glossary_test.py: use a new temp dir for each test method for isolation.
ebook_kobo_test.py: split into several test methods

Improvements

Zim: make improvements, #352
Aard2 slob: add 2 mime types, #352
ui/main.py: do not allow --remove-html and --remove-html-all together
Glossary: do not allow glos.config to be set twice
Glossary: change some error logs to critical, and more improvements
Prevent conflicting config flags together, like --lower --no-lower
Disable utf8_check config parameter by default (not needed since 3.0.0)

Refactoring and cleanup

Glossary: some refactoring in convert method
Rename 3 scripts in scripts/ directory
Remove DataEntry.fromFile and improve behavior of DataEntry.__init__
Refactoring in ui/
rename option.cmdFlag to option.customFlag
Glossary: add glos.rawEntryCompress property, and use in entry.py
Glossary: minor improvement in loadPlugins
XDXF: remove useless argument in Reader.open
Remove some unused functions from text_utils.py
plugin_prop.py: refactor getExtraOptions
Avoid assigning protected attrs in text_writer.py and plugins/tabfile.py
Fewer protected attr access in entry_filters.py
Move sortKey and get_prefix implementations from ebook_base.py to epub and mobi plugins
Change name of 2 entry filters to match the config param

26 Dec 20:01

ilius

4.2.1

c0d0eef

4.2.1

Changes since version 4.2.0

Minor bug fixes and improvements:

text_utils.py
- Minor bug: fix legacy function urlToPath using urllib.parse.unquote
- Minor bug: replacePostSpaceChar: remove trailing space from the output str
- Cleanup:
  - Remove unused function isControlChar
  - Remove unused function formatByteStr
  - Remove argument exclude from function isASCII
- Add unit tests
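For context on the urlToPath item, this is roughly what decoding a file URL with urllib.parse.unquote looks like. The function below is a hypothetical sketch for POSIX-style paths only, not the library's actual implementation.

```python
from urllib.parse import unquote, urlparse


def url_to_path(url):
    """Hypothetical sketch: turn a file:// URL into a local path (POSIX style)."""
    if not url.startswith("file://"):
        return url
    # unquote() decodes percent-escapes such as %20 (space).
    return unquote(urlparse(url).path)


print(url_to_path("file:///home/user/my%20dict.txt"))
```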
ui_cmd_interactive.py: fix a minor bug and some small refactoring
Command line: Override input glossary info with --source-lang and --target-lang flags
Add unit tests for CSV -> Tabfile conversion
CSV plugin: some refactoring, and rename the module to csv_plugin.py
Update setup.py: add python_requires=">=3.7.0", update extras_require
Update README.md

Features:

Command line: Add --name flag for changing glossary name
Glossary: convert: add infoOverride optional argument

20 Dec 08:30

ilius

4.2.0

1b1450c

4.2.0

Changes since 4.1.0

Breaking changes:
- Replace glos.getAuthor() with glos.author
  - This looks for "author" and then "publisher" keys in info/metadata
- Rename option apply_css to css for mobi and epub2
- glos.getInfo and glos.setInfo only accept str as key (or a subclass of str)
Bug fixes:
- Indirect mode: Fix handling '|' character in words.
  - Escape/unescape | in words when converting entry <-> rawEntry
- Escape/unescape | in words when writing/reading text-based file formats
- JSON: Prevent duplicate keys in json output, #344
  - Add new method glos.preventDuplicateWords()
Features and improvements
- Add SQLite mode with --sqlite flag for converting to StarDict.
  - Eliminates the need to load all entries into RAM, limiting RAM usage.
  - You can add --sqlite to your command, even for running GUI.
    - For example: python3 main.py --tk --sqlite
  - See README.md for more details.
- Add --source-lang and --target-lang flags
- XDXF: support more tags and improvements
- Add unit tests for Glossary class, and some functions in text_utils.py
- Windows: change cache directory to %LOCALAPPDATA%
- Some refactoring and optimization
- Update, improve and re-format documentation

01 Dec 15:44

ilius

4.1.0

11b1710

4.1.0

There are a lot of changes since last release, but here is what I could gather and organize!
Please see the commit list for more!

Improvements in ui_gtk
Improvements in ui_tk
Improvements in ui_cmd_interactive
Refactoring and improvements in ui-related codebase
Fix not loading config with --ui=none
Code style fixes and cleanup
Documentation
- Update most of the documentation.
- Add comments for read/write options.
- Generate documentation for all formats
  - Placed in doc/p, linked to in README.md
  - Generating with scripts/plugin-doc-gen.py script
  - Read list of dictionary tools/applications from TOML files in plugins-meta/tools
Add Dockerfile and run-with-docker.sh script
New command-line flags:
- --json-read-options and --json-write-options
  - To allow using ; in option values
  - Example: '--json-write-options={"delimiter": ";"}'
- --gtk, --tk and --cmd as shortcut for --ui=gtk etc
- --rtl to change direction of definitions, #268, also added to config.json
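Passing options as JSON avoids any clash with a separator character inside a value. A minimal sketch of how such a flag could be parsed with argparse is shown below; the flag name comes from the notes above, while the parsing code itself is an assumption and not PyGlossary's own.

```python
import argparse
import json

parser = argparse.ArgumentParser()
# json.loads is used as the argparse type, so the value arrives as a dict.
parser.add_argument("--json-write-options", type=json.loads, default={})

args = parser.parse_args(['--json-write-options={"delimiter": ";"}'])
print(args.json_write_options)
```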
Fix non-working --remove-html flag
Changes in Glossary class
- Rename glos.getPref to glos.getConfig
- Change formatsReadOptions and formatsWriteOptions to Dict[str, OrderedDict[str, Any]]
  - to include default values
- remove glos.writeTabfile, replace with a func in pyglossary/text_writer.py
- Glossary.init: avoid showing error if user plugin directory does not exist
Fixes and improvements in code base
- Prevent dataEntry.save() from raising exception because of invalid filename or permission
- Avoid exception if removing temp file/folder failed
- Avoid mktemp and more improvements
  - use ~/.cache/pyglossary/ directory instead of /tmp/
- Fixes and improvements in runDictzip
- Raise RuntimeError instead of StopIteration when iterating over a non-open reader
- Avoid exception if no zip command was found, fix #294
- Remove directory after creating .zip, and some refactoring, #294
- DataEntry: replace inTmp argument with tmpPath argument
- Entry: fix html pattern for hyperlinks, #330
- Fix incorrect virtual env directory detection
- Refactor dataDir detection, #307 #316
- Show warning if failed to create user plugins directory
- fix possible exception in log.emit
- Add support for Conda in dataDir detection, #321
- Fix f-string in StdLogHandler.emit
Fixes and improvements in Windows
- Fix bad dataDir on Windows, #307
- Fix shutil.rmtree exception on Windows
- Support creating .zip on Windows 10, #294
- Check zip command before tar on Windows, #294
- Show graphical error on exceptions on Windows
- Fix dataDir detection on Windows, #323 #324
Changes in Config:
- Rename config key skipResources to skip_resources
  - Add it to config.json and configDefDict
- Rename config key utf8Check to utf8_check
  - User should edit ~/.pyglossary/config.json manually
Implement direct compression and uncompression, and some refactoring
- change glos.detectInputFormat to return (filename, format, compression) or None
- remove Glossary.formatsReadFileObj and Glossary.formatsWriteFileObj
- remove fileObj= argument from glos.writeTxt
- use optional 'compressions' list/tuple from Writer or Reader classes for direct compression/uncompression
- refactoring in glossary_utils.py
Update setup.py
Show version from 'git describe --always' on --version
FileSize option (used in many formats):
- Switch to metric (powers of 1000) for K, M, G units
- Add KiB, MiB, GiB for powers of 1024
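The unit convention can be made concrete with a small parser. This is a hypothetical sketch, not the library's FileSize option code: the function name, the regular expression and the case-insensitive handling of units are all assumptions.

```python
import re
from fractions import Fraction

# Metric units use powers of 1000; KiB/MiB/GiB use powers of 1024.
UNITS = {
    "": 1,
    "k": 1000,
    "m": 1000**2,
    "g": 1000**3,
    "kib": 1024,
    "mib": 1024**2,
    "gib": 1024**3,
}

SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$")


def parse_file_size(text):
    """Hypothetical parser: '5500k' -> 5500000, '1KiB' -> 1024."""
    match = SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"invalid file size: {text!r}")
    number, unit = match.groups()
    try:
        factor = UNITS[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown unit in file size: {text!r}") from None
    # Fraction keeps decimal inputs such as "1.2" exact before truncating.
    return int(Fraction(number) * factor)


for value in ("5500k", "100m", "1.2g", "1KiB", "2MiB"):
    print(value, "->", parse_file_size(value))
```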
Add extensionCreate variable (str) to plugins and plugin API
- Use it to improve ui_tk
Text-based glossary code-base (affecting Tabfile, Kobo Dictfile, LDF)
- Optimize TextGlossaryReader
- Change multi-file text glossary file names from .N.txt to .txt.N (where N>=1)
- Enable reading pyglossary-written multi-file text glossary by adding file_count=-1 to metadata
  - because the number of files is not known when creating the first txt file
Tabfile
- Rename option writeInfo to enable_info
- Reader: read resource files from *.txt_res directory if exists
- Add *.txt_res directory to *.zip file
Zim Reader:
- Migrate to libzim 1.0
- Add mimetype image/webp, fix #329
Slob and Tabfile Writer: add file_size_approx option to allow writing multi-part output
- support values like: 5500k, 100m, 1.2g
Add word_title=False option to some writers
- Slob Writer: add word_title=False option
- Tabfile Writer: add word_title=False option
- CSV Writer: add word_title=False option
- JSON Writer: add word_title=False option
- Dict.cc Reader: do not add word title
- FreeDict Reader: rename keywords_header option to word_title
- Add glos.wordTitleStr, used in plugins with word_title option
- Add definition_has_headwords=True info key to avoid adding the title next time we read the glossary
Aard2 (slob)
- Writer: add option separate_alternates=False, #270
- Writer: fix handling content_type option
- Writer: use ~/.cache/pyglossary/ instead of /tmp
- Writer: add mp3 to mime types, #289
- Writer: add support for .ini data file, #289
- Writer: support .webp files, #329
- Writer: support .tiff and .tif files
- Reader: read glossary name/title and creation time from tags
- Reader: extract all metadata / tags
- slob.py library: Refactoring and cleanup
StarDict:
- Reader: add option unicode_errors for invalid UTF-8 data, #309
- Writer: add bool write-option audio_goldendict, #327
- Writer: add option audio_icon=True, and add option comment, #327
FreeDict Reader
- Fix two slashes before and after pron
- Avoid running unescape_unicode by encoding="utf-8" arg to ET.htmlfile
- Fix exception if edition is missing in header, and few other fixes
- Support <cit type="example"> with <cit type="trans"> inside it
- Support <cit type="trans"> inside second-level (nested) <sense>
- Add "lang" attribute to html elements
- Add option "example_padding"
- Fix rendering <def>, refactoring and improvement
- Handle <note> inside <sense>
- Support <note> in <gramGrp>
- Mark external refs with <a ... class="external">
- Support comment in <cit>
- Support <xr> inside <sense>
- Implement many tags under <sense>
- Improvements and refactoring
XDXF
- Fix not finding xdxf.xsl in installed mode
  - Affecting XDXF and StarDict formats
- xdxf.xsl: generate  instead of 
- StarDict Reader: Add xdxf_to_html=True option, #258
- StarDict Reader: Import xdxf_transform lazily
 - Remove forced dependency to lxml, #261
- XDXF plugin: fix glos.setDefaultDefiFormat call
- xdxf_transform.py: remove warnings for , #322
- Merge PR #317
  - Parse sr, gr, ex_orig, ex_transl tags and audio
  - Remove None attribute from audio tag
  - Use Unicode symbols for audio and external link
  - Use another speaker symbol for audio
  - Add audio controls
  - Use plain link without an audio tag
Mobi
- Update ebook_mobi.py and README.md, #299
- Add PR #335 with some modifications
Changes in ebook_base.py (Mobi and EPUB)
- Avoid exception if removing tmpDir failed
- Use style.css dataEntry, #299
DSL Reader:
- Strip whitespaces around language names, #264
- Add progressbar support, #264
- Run html.escape on text before adding html tags, #265
- Strip and unquote glossary name
- Generate  and  instead of 
- Avoid adding html comment
- Remove \ufeff from header lines, #306
AppleDict Source
- Change path of Dictionary Development Kit, #300
- Open all text files with encoding="utf-8"
- Some refactoring
- Rename 4 options:
  - cleanHTML -> clean_html
  - defaultPrefs -> default_prefs
  - prefsHTML -> prefs_html
  - frontBackMatter -> front_back_matter
AppleDict Binary
- Improvements, #299
- Read DefaultStyle.css file, add as style.css, #299
- Change default value of option: html=True
Octopus MDict (MDX)
- Fix image links
- Do not set empty title
- Minor improvement in readmdict.py
- Handle exception when reading from a corrupt MDD file
- Add bool flag same_dir_data_files, #289
- Add read-option: audio=True (default: False), #327
- audio: remove extra attrs and add comments
DICT.org plugin:
- installToDictd: skip if target directory does not exist
- Make rendering of dictd files a bit clearer in plain text
- Fix indentation issue and add bword prefix as url
Fixes and improvements in Dict.cc (SQLite3) plugin:
- Fix typo, and avoid iterating over cur, use fetchall(), #296
- Remove gender from headword, add it to definition, #296
- Avoid running unescape_unicode
JMDict
- Support reading compressed file directly
- Show pos before gloss (translations)
- Avoid running unescape_unicode
DigitalNK: work around Python's sqlite bug, #282
Changes in dict_org.py plugin, By Justin Yang
- Use
  to replace newline
- Replace words with {} around to true web link
CC-CEDICT Reader:
- Fix import error in conv.py
- Switch from jinja2 to lxml
 - Fix not escaping <, > and &
 - Note: lxml inserts   instead of  
- Use  instead of 
- add option to use Traditional Chinese for entry name
- Avoid colorizing if tones count does not match len(syllables), #328
- Add  for each syllable in case of mismatch tones, #328
Rename read/write options:
- DSL: rename option onlyFixMarkUp to only_fix_markup
- SQL: ren...

24 Oct 11:45

ilius

4.0.0

047b747

4.0.0

Changes since 3.3.0

Require Python 3.7 or 3.8, drop support for Python 3.4, 3.5 and 3.6
Fix / rewrite setup.py
- Fix python3 setup.py sdist bdist_wheel, and PyPI package
  - Had to move ui/ directory into pyglossary/
- Switch from distutils to setuptools
- Remove py2exe
Add interactive command line user interface
- Automatically selected if input & output file arguments are not passed and one of these holds:
  - On Linux and $DISPLAY is not set
  - On Mac and no tkinter module is found
  - --ui=cmd flag is passed
New format support:
- Add read support for FreeDict, #206
- Add read support for Zim (Kiwix)
- Add read and write support for Kobo E-Reader Dictfile (.df)
- Add write support for DICT.org dictfmt source file
- Add read support for dictunformat output file
- Add write support for JSON
- Add read support for Dict.cc (SQLite3)
- Add read support for JMDict, #239
- Add basic read support for Wiktionary Dump (.xml)
- Add read support for cc-kedict
- Add read support for DigitalNK (SQLite3)
- Add read support for Wordset.org JSON directory
Remove Omnidic write support (Unmaintained J2ME dictionary)
Remove Octopus MDict Source plugin
Remove Babylon Source plugin
BGL Reader: improvements
DictionaryForMIDs Writer: fix non-working code
Gettext Source (po) Writer: fix info header
MOBI E-Book Writer: fix sort order, fix and test kindlegen codes, add kindlegen_path option, #112
EPUB-2 E-Book Writer: fix sort order
XDXF Reader: rewrite with etree.iterparse to avoid using too much RAM
Lingoes Source (LDF) Reader: fix ignoring info/metadata header
dict_org.py: rewrite broken plugin (Reader and Writer)
DSL Reader: fix losing metadata/info
Aard 2 (slob) Reader:
- Fix adding css/js files as normal entries
- Add bword:// prefix to entry links
- Fix duplicate entries issue by keeping a set of blob IDs, #224
- Detect and pass defiFormat
Aard 2 (slob) Writer:
- Fix content_type detection
- Remove bword:// prefix from entry links
- Add resource files / data entries, #243
- Fix replacing image paths
- Show log events from slob.py in debug mode
- Change default compression to zlib
- Allow passing empty compression
Octopus MDict Reader:
- Read MDX file twice to load links
- Count data entries as part of len(reader) for progressbar
StarDict Writer:
- Copy "copyright" and "publisher" values to "description"
- Add source and target language codes to the end of bookname
- Add write-option stardict_client: bool
  Set True to make glossary more compatible with StarDict 3.x
- Fix broken result when sametypesequence option is given and a definition contains |
- Allow sametypesequence=x for xdxf
- Add merge_syns option
- Allow sametypesequence=None option
XDXF Reader:
- Fix/improve xdxf to html transformation
Kobo Writer:
- Fix get_prefix algorithm and sorting order, with tests, #219
- Replace <img src=... tags with [Image: name.bmp], #219
 - and show a warning about data entries
- Additional keywords as alternatives, #232
- Fix support for alternates: duplicate entries based on word prefix, #238
- Show headword in title of alternate entries, #238, #245
- Strip full html definition, #246
CSV:
- Add delimiter option to Reader and Writer
- Read and write info
- Writer: accept bool option add_defi_format=True (default False)
AppleDict Writer:
- AppleDict Writer: replace fix_sound_link() code with a single line
- AppleDict Writer should not call glos.setDefaultDefiFormat
MDX Reader:
- Replace entry:// with bword:// in MDX Reader instead of AppleDict Writer
- Fix internal href="x:" and href="d:" links
- Fix file:// in images path, fix #243
User Interface improvements and fixes:
- ui_gtk: add About tab and more improvements
- ui_tk: replace About dialog with About tab and more improvements
- ui_cmd: improvements in progressbar
- ui_cmd: allow "=" in value of read/write options
Add a list of 208 languages and ~40 writing systems
- Detect sourceLang and targetLang from glossary name/title
- Auto-select between  and <big> tags depending on writing system
 - Using glos.titleElement method, used in FreeDict, JMDict and Dict.cc writers
- glos.sourceLang and glos.targetLang properties (with setters) as Lang objects
- glos.sourceLangName and glos.targetLangName properties (with setters) as str
 - Used in several plugins
Break compatibility of plugins
- Drop support for read and write functions (outside a class)
- Now we only support Reader class and Writer class
- Reader class must have these methods
  - __init__(self, glos)
  - open(self, filename)
    - Here glossary info must be read from file and set with glos.setInfo
  - __len__(self) -> int
    - Should return the number of entries, or zero if it's too costly
  - __iter__(self) -> "Iterator[BaseEntry]"
    - Can be a generator
  - close(self)
- Writer class must have these methods
  - __init__(self, glos)
  - open(self, filename)
    - Here glossary info must be read from glos.getInfo or glos.iterInfo and written to file
  - write(self) -> "Generator[None, BaseEntry, None]"
    - Entries must be fetched with entry = yield in a while True loop:
      while True:
          entry = yield
          if entry is None:
              break
          # process and write entry into file(s)
  - finish(self)
- Read options and write options must be set to their default values as class attributes
  - See pyglossary/plugins/csv_pyg.py plugin for example
- sortKey must be an instance method of Writer, instead of a function outside any class
  - Only for plugins that need sorting before write
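The Writer protocol described above (open, a generator-based write that receives entries through yield, then finish) can be sketched end to end. The method names follow the list above; `FakeGlossary`, `drive_writer`, the tuple-shaped entries and the output format are hypothetical and only illustrate how a caller could feed the generator.

```python
import os
import tempfile


class FakeGlossary:
    """Hypothetical stand-in for the glos object passed to plugins."""

    def __init__(self):
        self.info = {}

    def setInfo(self, key, value):
        self.info[key] = value

    def getInfo(self, key):
        return self.info.get(key, "")


class Writer:
    def __init__(self, glos):
        self._glos = glos
        self._file = None

    def open(self, filename):
        self._file = open(filename, "w", encoding="utf-8")
        # Write glossary info before any entries.
        self._file.write(f"# name: {self._glos.getInfo('name')}\n")

    def write(self):
        # Entries are pushed in with generator.send(); None means "no more".
        while True:
            entry = yield
            if entry is None:
                break
            word, defi = entry  # simplified: a real entry is an object
            self._file.write(f"{word}\t{defi}\n")

    def finish(self):
        if self._file is not None:
            self._file.close()
            self._file = None


def drive_writer(writer, filename, entries):
    """Hypothetical driver: prime the generator, send entries, then None."""
    writer.open(filename)
    gen = writer.write()
    next(gen)  # advance to the first `yield`
    try:
        for entry in entries:
            gen.send(entry)
        try:
            gen.send(None)  # signal the end; the generator then returns
        except StopIteration:
            pass
    finally:
        writer.finish()


glos = FakeGlossary()
glos.setInfo("name", "Demo")
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
drive_writer(Writer(glos), path, [("apple", "a fruit"), ("cat", "an animal")])
with open(path, encoding="utf-8") as f:
    print(f.read())
```

Because the writer receives entries one at a time, it never needs the whole glossary in memory unless it chooses to collect entries itself.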
Refactor and cleanup Glossary class
- Removed or replaced most of class/static attributes of Glossary
  - To see the diff, run git diff 3.3.0..master -- pyglossary/glossary.py
- Removed glos.addEntry method
  - If you use it in your program, replace with glos.addEntryObj(glos.newEntry(word, defi, defiFormat))
- Removed instance methods:
  - getMostUsedDefiFormats
  - iterEntryBuckets
  - zipOutDir and archiveOutDir
    - Moved to pyglossary/glossary_utils.py
    - archiveOutDir renamed to compressOutDir
  - writeDict
  - iterSqlLines -> moved to pyglossary/plugins/sql.py
  - reverse, takeOutputWords, searchWordInDef -> moved to pyglossary/reverse.py
- Values of Glossary.plugins is changed to plugin_prop.PluginProp instances
- Change glos.writeTxt arguments
  - Replace sep1 and sep2 with entryFmt
  - Replace rplList with defiEscapeFunc, wordEscapeFunc and tail
  - Remove iterEntries, entryFilterFunc
  - Method returns Generator[None, BaseEntry, None] instead of bool
  - See for usage example:
    - pyglossary/glossary.py -> def writeTabfile
    - pyglossary/plugins/dict_org_source.py
    - pyglossary/plugins/json_plugin.py
    - pyglossary/plugins/lingoes_ldf.py
    - pyglossary/plugins/sdict_source.py
Refactor, cleanup and fixes in Entry and DataEntry classes
- Replace entry.getWord() with entry.word
- Replace entry.getWords() with entry.l_word
- Replace entry.getDefi() with entry.defi
- Remove entry.getDefis()
  - Drop handling alternate definitions in Entry objects
- Replace entry.getDefiFormat() with entry.defiFormat
- Add entry.b_word and entry.b_defi shortcuts that give bytes (UTF-8)
- Replace dataEntry.getData() with dataEntry.data
- Add __slots__ to Entry and DataEntry classes
- Fix DataEntry in indirect mode
  - Mistaken for Entry with defi=DATA, and file content discarded
  - Save resource files in user's cache directory when loading input glossary into memory
    - Move file to output glossary on dataEntry.save(...)
- Fix Entry.getRawEntrySortKey not being alternates-aware, broke StarDict Writer
- DataEntry: save: use shutil.copy if has _tmpPath, and set _tmpPath
New features of Entry
- entry.stripFullHtml(), remove <html... <head>...</head>...<body>
 - Used in Kobo and Kobo Dictfile writers
 - Add tests
Fix glos.writeTabfile:
- Remove \r from definitions and info values
- Fix not escaping word
Fix/improve html detection in definitions
Switch to lazy imports of non-standard modules in plugins
Optimize RAM usage of indirect conversion
- To write StarDict, EPUB and DictionaryForMIDs glossaries, we need to load all entries into RAM to sort them
Other new features of Glossary class
- glos.getAuthor() to get "author", or "publisher" (as fallback)
- glos.removeHtmlTagsAll() method, can be called by plugins' writer
- glos.collectDefiFormat(maxCount) extract defiFormat counts
  - by reading first maxCount entries. (then iterator will be reset)
  - Used in StarDict Writer
- Show memory usage in trace mode
Bug fixes and improvements in code base
- Apply entry filter when iterating over reader, fix #251
  - Fixes wrong sort order for some glossaries (converting to StarDict or other formats that need sort)
- Fixes and improvements in TextGlossaryReader class
  - Fix ignoring glossary defaultDefiFormat
- Fix evaluating None value in read/write options
Support reading multi-file Tabfile or other text formats
- Example: file.txt, file.txt.1, file.txt.2
- Need to add file_count info key, for example: ##file_count 3
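The naming scheme in the example above can be expressed as a small helper that lists the parts for a given file_count. The function is a hypothetical illustration and not part of the library.

```python
def multi_file_names(first_path, file_count):
    """List part names: file.txt, file.txt.1, ..., file.txt.(file_count - 1)."""
    return [first_path] + [f"{first_path}.{n}" for n in range(1, file_count)]


print(multi_file_names("file.txt", 3))
```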
Fixes in Tabfile Writer
- Fix not escaping ""
Add/update docume...

21 May 08:19

ilius

3.3.0

e7126e1

3.3.0

Changes since 3.2.1

Require Python 3.6 or higher (mainly because of f-strings)
New format support
- Add support to write Kobo dictionary, #205
- Add support to write EPUB-2
- Add support to read AppleDict Binary (.dictionary)
- Add support to read and write Aard 2 (slob), #116
Glossary: detect and load Writer class from plugins
- Remove write function from plugin if it has Writer class
Glossary: call gc.collect() on indirect mode after reading/writing each 128 entries
- To free up memory and avoid running out of RAM for large glossaries
Glossary: remove empty and duplicate alternate words when converting, using Entry Filter, #188
Add command line options to remove html tags:
- --remove-html=tag1,tag2,tag3
- --remove-html-all
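As a rough picture of what removing a chosen set of tags involves, the sketch below strips the opening and closing tags for the given names and keeps the text between them. The helper name, the regular expression and the keep-inner-text behavior are assumptions, not the library's actual entry filter.

```python
import re


def make_tag_remover(tags):
    """Hypothetical helper: strip opening/closing tags, keep inner text."""
    names = "|".join(re.escape(tag) for tag in tags)
    # Match <tag>, <tag attr="...">, <tag/> and </tag>, but not e.g. <br> for "b".
    pattern = re.compile(rf"</?(?:{names})(?:\s[^>]*)?/?>", re.IGNORECASE)
    return lambda text: pattern.sub("", text)


remove_tags = make_tag_remover(["b", "font"])
print(remove_tags('<b>word</b> <font color="red">defi</font> <br/>'))
```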
Re-design format-specific options
- Allow specifying format-specific read/write options in ui_gtk and ui_tk
- Add much better and cleaner codebase for handling options in option.py
- Implement validation of options in command line, GTK and Tkinter interfaces
- Add tests for option.py in option_test.py
- Avoid using None as default value of option argument
- Check default value of plugin options and show warning if invalid
- Add IntOption class, use it in Omnidic plugin
- Add DictOption, use it for appledict defaultPrefs
- Add optionsProp to all plugins
  - Containing value type, allowed values and optional comment
- Remove readOptions and writeOptions from all plugins
  - Detect options from functions' signature and optionsProp variables
  - Avoid using **kwargs in plugin read, Reader.open or write functions
Add depends variable to plugins
- To let GUI install plugin dependencies
- Type: dict, keys are module names, values are pip's package name
- Add Glossary.formatsDepends
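A dict of this shape (module name to pip package name) makes it straightforward to work out which packages are missing. The check below is a hypothetical sketch using importlib.util.find_spec; the dict entries are made-up examples and the function is not part of the library.

```python
import importlib.util

# Shape from the notes: keys are module names, values are pip package names.
# The entries themselves are made-up examples.
depends = {
    "json": "placeholder-for-an-installed-module",
    "module_that_does_not_exist_xyz": "example-package",
}


def missing_packages(depends):
    """Return pip package names whose module cannot be found."""
    return [
        pip_name
        for module_name, pip_name in depends.items()
        if importlib.util.find_spec(module_name) is None
    ]


print(missing_packages(depends))
```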
Minor fixes and improvements in Glossary class:
- Return with error if output file path is an existing directory
- Fix empty zip when creating DIRECTORY.zip as output glossary
- Do not uncompress gz/bz2/zip input files automatically
- Ignore "read" function of plugin if "Reader" class is present
- Cleaning: Add Glossary.init() classmethod to initialize the class, can be called multiple times
- Some refactoring and cleaning, and add some logs
- Small optimization: index % 100 -> index & 0x7f
- Allow having progressbar by position in file and size of file
  - use for appledict_bin.py
- Do not write resource file names as entries to text file in Glossary.writeTxt
StarDict plugin
- Always open .ifo file as UTF-8
- Fix output filenames without .ifo extension creating hidden files, #187
Babylon BGL plugin
- Fix bytes metadata values b'...' and some refactoring in readType3
- Skip empty info values
- Fix non-string info values written as empty
- Prefix 3 info keys with bgl_
- Fix NameError in debug mode in stripHtmlTags
- Some refactoring
Octopus MDict plugin
- Fix Python 3 bug in readmdict.py: https://bitbucket.org/xwang/mdict-analysis/commits/8f66c30
- Support multiple mdd files (#203)
Change yes/no options in AppleDict and ABBYY Lingvo DSL plugins to boolean
- To keep compatibility of command line flags, fix yes/no manually in ui_cmd.py
AppleDict plugin:
- Fix echo problem in Makefile (#177)
- Add dark mode support for AppleDict output (#177)
- Add comments for optionsProp
- Use keyword argument features= and fix a warning about from_encoding=
Fix misspelled "extension" (as "extention") in plugins
Detect entries with span tag as html, #193
Refactoring in ui_gtk and ui_tk
Fix some deprecated API in ui_gtk
Minor bug fixes and improvements in ui_tk and ui_gtk
Update setup.py to adapt packaging with wheel, #189
Add type hints to codebase and plugins
Refactoring and style changes:
- rename pyglossary.pyw to main.py, add a small pyglossary.pyw for compatibility
- Switch to f-strings in glossary.py and freedict.py
- main.py: replace single quotes with double quotes
- PEP-8 style fixes

21 Jun 21:11

ilius

bb2a66a

3.2.1

Changes since 3.2.0

Changes in StarDict plugin:
- Add sametypesequence write option (PR #162)
- Fix some bugs
- Cleaning
Disable gzip CRC check for BGL files with Python 3.7
Fix a bug in octopus_mdict.py
Fix Gtk warnings in ui_gtk
Allow seeing/customizing warnings by setting environment variable WARNINGS
Fix not being able to run the program when installed inside virtualenv (#168)
Show a tip about -h when no UI was found, #169
octopus_mdict_source.py: fix #68, add support for non-consecutive links with --read-options=links=True
Auto-detect UTF-16 encoding of DSL files
Update README.md (fix Archlinux pkg name, add AUR, add instructions for installing python-lzo on Windows, etc)
Some clean up
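The notes do not say how the UTF-16 auto-detection for DSL files works. One common technique, shown here purely as an assumption, is to inspect the byte-order mark at the start of the file. The function name is hypothetical.

```python
import codecs


def detect_encoding(data, default="utf-8"):
    """Guess a text encoding from a leading byte-order mark.

    Hypothetical helper for illustration. It does not distinguish
    UTF-32, whose little-endian BOM also begins with the UTF-16 one.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order
        return "utf-16"
    return default


sample = "#NAME \"Example\"".encode("utf-16")
text = sample.decode(detect_encoding(sample))
```

Files without a byte-order mark fall back to the default, so a caller could still override the encoding explicitly.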

Releases: ilius/pyglossary
