Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add bytes_with_sample for creating bytes reading instructions #357

Merged

Conversation

douglasdavis
Copy link
Collaborator

@douglasdavis douglasdavis commented Sep 5, 2023

Pulling this out of the JSON PR (#94) because we'll also use it for from_text (prototype here: douglasdavis#7)

@douglasdavis douglasdavis changed the title feat: add bytes_with_sample to io.io module creating bytes reading instructions feat: add bytes_with_sample for creating bytes reading instructions Sep 5, 2023
@codecov-commenter
Copy link

Codecov Report

Merging #357 (651a28a) into main (a2ee65c) will decrease coverage by 2.14%.
The diff coverage is 15.94%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the GitHub App Integration for your organization. Read more.

@@            Coverage Diff             @@
##             main     #357      +/-   ##
==========================================
- Coverage   94.88%   92.74%   -2.14%     
==========================================
  Files          20       20              
  Lines        2482     2550      +68     
==========================================
+ Hits         2355     2365      +10     
- Misses        127      185      +58     
Files Changed Coverage Δ
src/dask_awkward/lib/io/io.py 75.73% <15.94%> (-24.27%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

]
out.append(values)

sample_size = parse_bytes(sample) if isinstance(sample, str) else sample
Copy link
Collaborator

@martindurant martindurant Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should allow sample=False here, which returns maybe None or b"" and avoids the extra IO here.

@douglasdavis
Copy link
Collaborator Author

douglasdavis commented Sep 5, 2023 via email

length.append(off[-1] - off[-2])
length.append(size - off[-1])

if not_zero:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martindurant this not_zero argument is the one thing from this copied upstream-dask-code that I don't totally understand; do you know when this is relevant?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the upstream docstring has something, so I guess more specifically do you know when this discarding of the header is relevant?

    not_zero : bool
        Force seek of start-of-file delimiter, discarding header.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's whether to regard the start of a file as the start of the first block, or the first occurrence of the delimiter.

@douglasdavis douglasdavis marked this pull request as ready for review September 5, 2023 17:16
@douglasdavis douglasdavis merged commit d8a916b into dask-contrib:main Sep 5, 2023
19 checks passed
@douglasdavis douglasdavis deleted the bytes-ingredients-and-sample branch September 5, 2023 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants