Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better display of estimated line numbers and add number of columns for tabular #17492

Merged
merged 8 commits into from
Mar 5, 2024

Conversation

bernt-matthias
Copy link
Contributor

  • Show estimated line number as a power of 10: showing all these zeros for an estimated number seems like a waste of space
  • Show number of columns for tabular

Cherry picked from #13918

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

fixes galaxyproject#6506

for large files the number of lines is estimated and shown
as a rounded number (using two significant digits), e.g
`~8,700,000 lines`.

with this change it will be: `~87 10^5 lines`

this commit also makes roundify really round numbers (as the name
suggests) and not simply cut at two digits, but this could be
reverted if there are concerns wrt speed due to using more math
@bernt-matthias bernt-matthias changed the title Topic/tabular blurp Better display of estimated line numbers and add number of columns for tabular Feb 19, 2024
Copy link
Member

@hexylena hexylena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a great improvement!

@@ -1135,14 +1139,49 @@ def commaify(amount):
return commaify(new)


def trailing_zeros_to_powerof10(amount):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a shorter version operating on numbers directly:

def trailing_zeros_to_powerof10(amount, threshold=5):
    if abs(amount) < 10**threshold:
        return amount
    exp = threshold - 1
    while not amount % 10**(exp+1):
        exp += 1
    if exp < threshold:
        return amount
    return f"{amount // 10**exp}\u22C510^{exp}"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or even shorter:

def trailing_zeros_to_powerof10(amount, threshold=5):
    if amount == 0 or amount % 10**threshold:
        return amount
    exp = threshold
    while not amount % 10**(exp+1):
        exp += 1
    return f"{amount // 10**exp}\u22C510^{exp}"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while not amount % 10**(exp+1): would run infinitely if amount is not a number of 10.

Note that if we would like to have short code for getting scientific notation we can just f"{number:.0e}". My goal here was to only remove trailing 0s.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know and I've tested my suggestion at least with your docstring numbers :)

lib/galaxy/datatypes/data.py Outdated Show resolved Hide resolved
Co-authored-by: Wolfgang Maier <maierw@posteo.de>
@nsoranzo
Copy link
Member

roundify takes int

Can you add type annotation to the functions please?

@wm75
Copy link
Contributor

wm75 commented Feb 19, 2024

I'm not sure, this new version is always more readable than the old one. 2*10^6 may be better than 2,000,000, but is 23*10^5 better than 2,300,000? If anything, I'd prefer 2.3*10^6 I think.

@bernt-matthias
Copy link
Contributor Author

I'm not sure, this new version is always more readable than the old one. 210^6 may be better than 2,000,000, but is 2310^5 better than 2,300,000? If anything, I'd prefer 2.3*10^6 I think.

Because 1<=m<10 or because n is a multiple of 3 (for m × 10n)?

lib/galaxy/util/__init__.py Outdated Show resolved Hide resolved
lib/galaxy/util/__init__.py Outdated Show resolved Hide resolved
@wm75
Copy link
Contributor

wm75 commented Feb 19, 2024

I'm not sure, this new version is always more readable than the old one. 2_10^6 may be better than 2,000,000, but is 23_10^5 better than 2,300,000? If anything, I'd prefer 2.3*10^6 I think.

Because 1<=m<10 or because n is a multiple of 3 (for m × 10n)?

Because I'm used to think in exponents that are multiples of 3 (thousands, millions, etc lines).

@wm75
Copy link
Contributor

wm75 commented Feb 19, 2024

Expressed as code I think I'm looking for sth like this:

def trailing_zeros_to_powerof10(amount, threshold=5):
    """
    >>> trailing_zeros_to_powerof10(23000)
    '23000'
    >>> trailing_zeros_to_powerof10(300000)
    '0.3\u22C510^6'
    >>> trailing_zeros_to_powerof10(2300000)
    '2.3\u22C510^6'
    >>> trailing_zeros_to_powerof10(23000000)
    '23\u22C510^6'
    >>> trailing_zeros_to_powerof10(1)
    '1'
    >>> trailing_zeros_to_powerof10(0)
    '0'
    >>> trailing_zeros_to_powerof10(100)
    '100'
    >>> trailing_zeros_to_powerof10(-100)
    '-100'
    """
    if abs(amount) < 10**threshold:
        return str(amount)
    exp = 3
    while abs(amount) / 10**(exp+2) >= 1:
        exp += 3
    if amount % 10**exp:
        return f"{amount / 10**exp}\u22C510^{exp}"
    else:
        return f"{amount // 10**exp}\u22C510^{exp}"

Note: this assumes that the amount has been roundified before.

@nsoranzo
Copy link
Member

@wm75 It looks like to me you are proposing https://en.wikipedia.org/wiki/Engineering_notation instead of https://en.wikipedia.org/wiki/Scientific_notation ?

@bernt-matthias
Copy link
Contributor Author

Would be fine for me, then we could even use prefixes like k, M, instead of using exponents. Could reuse nice_size() for this.

@wm75
Copy link
Contributor

wm75 commented Feb 20, 2024

Would be fine for me, then we could even use prefixes like k, M, instead of using exponents. Could reuse nice_size() for this.

The idea is similar at least, yes.

@bernt-matthias
Copy link
Contributor Author

bernt-matthias commented Feb 20, 2024

How about this one: 9fa8345?

It's super compact (maybe even more than the E notation), it's maybe also the least technical (no exponential .. only hidden).

By using the text=False in the metric prefix function we could also use nearly the same code for having 10^x. But we should remove it if we do not need it.

@bernt-matthias bernt-matthias force-pushed the topic/tabular-blurp branch 2 times, most recently from 279c372 to 5797a3b Compare February 21, 2024 11:20
@dannon dannon merged commit 95eabbf into galaxyproject:dev Mar 5, 2024
51 of 52 checks passed
@bernt-matthias bernt-matthias deleted the topic/tabular-blurp branch March 5, 2024 16:18
@jdavcs
Copy link
Member

jdavcs commented Mar 13, 2024

@bernt-matthias can you, please, backport this to 24.0? It was accidentally merged into the dev branch.

@dannon
Copy link
Member

dannon commented Mar 13, 2024

This was not an accident -- we (wg-backend) discussed it on the call last week and decided targeting 24.1/dev was fine. We can just correct the milestone?

@jdavcs jdavcs modified the milestones: 24.0, 24.1 Mar 13, 2024
@jdavcs
Copy link
Member

jdavcs commented Mar 13, 2024

@bernt-matthias never mind!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants