-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add bytes_with_sample
for creating bytes reading instructions
#357
feat: add bytes_with_sample
for creating bytes reading instructions
#357
Conversation
bytes_with_sample
to io.io
module creating bytes reading instructionsbytes_with_sample
for creating bytes reading instructions
Codecov Report
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the GitHub App Integration for your organization. Read more. @@ Coverage Diff @@
## main #357 +/- ##
==========================================
- Coverage 94.88% 92.74% -2.14%
==========================================
Files 20 20
Lines 2482 2550 +68
==========================================
+ Hits 2355 2365 +10
- Misses 127 185 +58
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
src/dask_awkward/lib/io/io.py
Outdated
] | ||
out.append(values) | ||
|
||
sample_size = parse_bytes(sample) if isinstance(sample, str) else sample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should allow sample=False
here, which returns maybe None
or b""
and avoids the extra IO here.
Martin Durant ***@***.***> writes:
@martindurant commented on this pull request.
Should allow `sample=False` here, which returns maybe `None` or `b""` and avoids the extra IO here.
makes sense; added a patch for it.
expanding in this a bit: I plan on this function being purely internal
(not public API facing) to be used by `from_json` and `from_text`. On
our end we'll need the sample to make sure we can create a propert
`_meta` on the eventual Array collection instance.
edit: and after adding this comment it clicked to me that `from_json` should eventually have the ability to have `_meta` defined through a schema and therefore not reading a sample should definitely be possible!
|
length.append(off[-1] - off[-2]) | ||
length.append(size - off[-1]) | ||
|
||
if not_zero: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martindurant this not_zero
argument is the one thing from this copied upstream-dask-code that I don't totally understand; do you know when this is relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the upstream docstring has something, so I guess more specifically do you know when this discarding of the header is relevant?
not_zero : bool
Force seek of start-of-file delimiter, discarding header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's whether to regard the start of a file as the start of the first block, or the first occurrence of the delimiter.
Pulling this out of the JSON PR (#94) because we'll also use it for
from_text
(prototype here: douglasdavis#7)