
Provide more "how to" documentation for python library use #87

Open
kontur opened this issue Jun 27, 2022 · 8 comments
Labels: documentation (Improvements or additions to documentation)

kontur commented Jun 27, 2022

As per #28 and #86: the library powers the CLI, but without documentation it is of little use as a standalone Python package.

@kontur kontur added the documentation Improvements or additions to documentation label Jun 27, 2022
@kontur kontur self-assigned this Jun 27, 2022

kontur commented Jul 1, 2022

And (for later inclusion in an aggregated list of examples) "How to get languages/language counts by validity":

from hyperglot import VALIDITYLEVELS
from hyperglot.languages import Languages

# one bucket of ISO codes per validity level
counts = {level: [] for level in VALIDITYLEVELS}

# VALIDITYLEVELS[0] is the lowest level, so this loads all languages
for iso, language in Languages(validity=VALIDITYLEVELS[0]).items():
    counts[language["validity"]].append(iso)

print({level: len(isos) for level, isos in counts.items()})


kontur commented Jul 1, 2022

And "How many scripts are in the Hyperglot data" (all validity levels, all orthographies):

from hyperglot import VALIDITYLEVELS
from hyperglot.languages import Languages
from hyperglot.language import Language

scripts = []

# collect the script of every orthography, across all validity levels
for iso, language in Languages(validity=VALIDITYLEVELS[0]).items():
    l = Language(language, iso)
    if "orthographies" in l:
        scripts.extend([o["script"] for o in l["orthographies"]])

print(len(set(scripts)), sorted(set(scripts)))


kontur commented Nov 25, 2022

To document: 0.4.2 introduces a distinction between accessing the raw YAML data of a language and getting a ready-to-use Language object, e.g.:

from hyperglot.languages import Languages
hg = Languages()

# the raw yaml for 'eng'
hg["eng"]

# a ready to use hyperglot.language.Language object
hg.eng

This is a lot more convenient than having to initialize Language objects with Language(Languages()["xxx"], "xxx").
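For reference, this kind of dual access (raw data via item access, a wrapped object via attribute access) can be implemented with a dict subclass that overrides __getattr__. A hypothetical sketch, not hyperglot's actual code; the class names only mimic the library:

```python
# Hypothetical sketch of dual access (raw via hg["eng"], wrapped via
# hg.eng); illustrative only, not hyperglot's implementation.

class Language:
    def __init__(self, data, iso):
        self.data = data
        self.iso = iso

class Languages(dict):
    def __getattr__(self, iso):
        # __getattr__ is only called when normal attribute lookup fails,
        # so dict methods keep working; unknown ISO codes raise cleanly
        try:
            return Language(self[iso], iso)
        except KeyError:
            raise AttributeError(iso)

hg = Languages({"eng": {"name": "English"}})
print(hg["eng"])   # raw dict: {'name': 'English'}
print(hg.eng.iso)  # wrapped Language object: 'eng'
```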


ivangrozny commented May 27, 2024

Hello @kontur, this is exactly what I need, but I can't manage to use the font checker from Python:

    from hyperglot import checker
    check = checker.FontChecker("fontFile.otf")
    l = check.get_supported_languages(report_missing=10)  # always contains every language ...

    print("fr ", check.supports_language('fra'))
    print("jp ", check.supports_language('jpn'))  # always returns True ...


kontur commented May 29, 2024

Hey @ivangrozny!

What is data?

FontChecker expects a path to a font as parameter. It can perform checks on font shaping, e.g. for Arabic. If you are interested in checking only against a set of characters, use CharsetChecker instead.
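At its core, the character-level check is a set-containment question. A minimal pure-Python sketch of that idea, illustrative only and not hyperglot's implementation (the real class is hyperglot.checker.CharsetChecker):

```python
# Illustrative sketch of a character-level support check (NOT
# hyperglot's implementation): a language counts as "supported" when
# all of its required characters are present in the tested set.

def supports_chars(available, required_base, required_marks=""):
    # hyperglot data lists characters space-separated, so strip spaces
    required = set(required_base.replace(" ", ""))
    required |= set(required_marks.replace(" ", ""))
    return required <= set(available)

# illustrative subset of characters, not the real language data
base = "a b c à é è"
print(supports_chars("abcàéè", base))  # True
print(supports_chars("abc", base))     # False
```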


ivangrozny commented May 30, 2024

Oh right, it works with a font file path; I was passing a TTFont object. By the way, is it possible to build a FontChecker from a TTFont? I also get an error with some font files:

File "gui.py", line 256, in load_new_font
    typo = check.get_supported_languages()
  File "hyperglot\checker.py", line 406, in get_supported_languages
    return super().get_supported_languages(**kwargs)
  File "hyperglot\checker.py", line 124, in get_supported_languages
    lang_sup = self.supports_language(
  File "hyperglot\checker.py", line 412, in supports_language
    return super().supports_language(iso, **kwargs)
  File "hyperglot\checker.py", line 277, in supports_language
    joining_errors, mark_errors = self._check_shaping(
  File "hyperglot\checker.py", line 384, in _check_shaping
    mark_errors = orthography.check_mark_attachment(check_attachment, self.shaper)
  File "hyperglot\orthography.py", line 223, in check_mark_attachment
    if shaper.check_mark_attachment(c) is False:
  File "hyperglot\shaper.py", line 221, in check_mark_attachment
    names = ", ".join(self.names_for_codepoints(missing_from_font))
TypeError: sequence item 0: expected str instance, NoneType found

for instance with Roboto Black from Google Fonts.
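The TypeError at the bottom of the traceback is a plain Python issue: str.join raises when the iterable yields None, which happens if a codepoint has no Unicode name. A hyperglot-independent reproduction (the helper name mirrors the traceback, but its body here is an assumption about the lookup):

```python
import unicodedata

def names_for_codepoints(codepoints):
    # assumed lookup: unicodedata.name() returns the default (None)
    # for unassigned codepoints instead of raising
    return [unicodedata.name(chr(cp), None) for cp in codepoints]

codepoints = [0x0041, 0x0378]  # "A" plus an unassigned codepoint
names = names_for_codepoints(codepoints)

try:
    ", ".join(names)
except TypeError as exc:
    print("reproduced:", exc)  # sequence item 1: expected str instance, NoneType found

# one defensive fix: fall back to a U+XXXX label for unnamed codepoints
print(", ".join(n or f"U+{cp:04X}" for n, cp in zip(names, codepoints)))
```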

kontur added a commit that referenced this issue May 31, 2024

kontur commented May 31, 2024

@ivangrozny thanks for reporting that bug; it should totally be possible. If you pull in the latest dev branch, it should no longer crash on this issue.

@kontur kontur added this to the 0.7.0 milestone Jun 20, 2024
frankrolf commented

Just in case it’s useful – I needed to figure out which characters Hyperglot lists for a given language, and have come up with this snippet:

from hyperglot.languages import Languages
from hyperglot.language import Language

# I had only language names to start with, so this allows me to map a
# language name to an ISO code:
name_to_iso = {}
for iso, info in Languages().items():
    lang_name = info['name']
    name_to_iso[lang_name] = iso

# I had to fix some of these language names (most notably, Modern Greek)
# to match HG’s expectations
lang_names = [
    'Austrian', 'Bulgarian', 'Czech', 'Danish', 'Dutch',
    'Standard Estonian', 'Finnish', 'French', 'Scottish Gaelic',
    'Modern Greek (1453-)', 'Ukrainian', 'Northern Sami', 'Slovak']

# report chars required for each language, as well as design requirements
req_chars = set()
for lang_name in sorted(lang_names):
    iso = name_to_iso.get(lang_name, None)
    if iso:
        lang = Language(iso)
        ortho = lang.get_orthography()
        design_req = ortho.get('design_requirements', [])

        chars_all = ortho.get('base')
        chars_aux = ortho.get('auxiliary', '')
        chars_marks = ortho.get('marks', '')
        if chars_aux:
            chars_all += ' ' + chars_aux
        if chars_marks:
            chars_all += ' ' + chars_marks

        req_chars.update(set(chars_all))
        print(lang)
        print(chars_all)
        for req in design_req:
            print('* ' + req)
        print()


# report all chars required to support above languages
total_chars = sorted(req_chars)
print(f'{" ".join(total_chars)} ({len(total_chars)} chars)')
