Chunkinfo #234
Conversation
LGTM
hsds/datanode_lib.py
Outdated
from .util.hdf5dtype import createDataType, getItemSize

from . import config
from . import hsds_logger as log

# supported initializer commands (just one at the moment)
More than one initializer command is now supported
updated
chunk_index = getChunkIndex(chunk_id)
chunk_index = chunk_index[0]
log.debug(f"chunk_index: {chunk_index}")
for j in range(table_factor):
I think this is redundant, since table_factor is always a scalar.
I don't follow - we want to iterate through each hyperchunk for each hsds chunk...
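As an aside, the iteration pattern under discussion might be sketched as follows. This is a hypothetical illustration, not the actual HSDS code: the mapping from an HSDS chunk to its hyperchunks, and the function name, are assumptions made only to show why iterating over table_factor per chunk is not redundant.

```python
# Hypothetical sketch: each HSDS chunk covers table_factor consecutive
# rows of the chunk table, one per HDF5 hyperchunk. (The index mapping
# here is an assumption for illustration, not HSDS's actual layout.)
def hyperchunk_indices(chunk_index: int, table_factor: int):
    """Yield the chunk-table rows covered by one HSDS chunk."""
    for j in range(table_factor):
        yield chunk_index * table_factor + j

# HSDS chunk 2 with 4 hyperchunks per chunk covers table rows 8..11:
print(list(hyperchunk_indices(2, 4)))  # [8, 9, 10, 11]
```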
hsds/datanode_lib.py
Outdated
@@ -829,6 +956,7 @@ async def get_chunk(

    # validate arguments
    if s3path:
        """
Why remove this validation?
Looks like this is not relevant after latest code changes.
chunk_extent = chunk_dims[dim]

if dset_extent > 0 and chunk_extent > 0:
    table_extent = -(dset_extent // -chunk_extent)
Why the double negative here - wouldn't the result be the same if both negatives were removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is a little trick to compute the integer ceiling.
E.g.: -(8 // -3) -> 3, while 8 // 3 -> 2.
Ah, I see. Is there a reason not to use int(math.ceil(dset_extent // chunk_extent))? Other parts of HSDS already import math.
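A note on this exchange: the negated floor-division trick gives an exact integer ceiling without going through a float, whereas wrapping math.ceil around // has no effect, because // has already floored the result. A standalone sketch (not HSDS code) of the difference:

```python
import math

def ceil_div(dset_extent: int, chunk_extent: int) -> int:
    """Integer ceiling of dset_extent / chunk_extent via negated floor division."""
    return -(dset_extent // -chunk_extent)

print(ceil_div(8, 3))       # 3  -- the trick gives the true ceiling
print(ceil_div(9, 3))       # 3  -- exact multiples are unaffected

print(math.ceil(8 / 3))     # 3  -- ceil of *true* division also works, but uses a float
print(math.ceil(8 // 3))    # 2  -- ceil of *floor* division does nothing: // already floored
```

So the suggested int(math.ceil(... // ...)) would silently round down; math.ceil(dset_extent / chunk_extent) would be correct, but the floor-division trick stays in pure integer arithmetic.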
hsds/util/chunkUtil.py
Outdated
def _find_min_pair(h5chunks, max_gap=None):
    """ given a dict of chunk_map entries, return the two
    chunks nearest to each other in the file.
return the two chunks -> return the indices of the two chunks
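For context, a plausible sketch of such a helper, incorporating the reviewer's suggestion to return indices. The (offset, size) entry format and the gap definition are assumptions for illustration; the actual chunk_map layout in HSDS may differ.

```python
def find_min_pair(h5chunks, max_gap=None):
    """Given a dict mapping chunk index -> (file_offset, size), return the
    indices of the two chunks nearest to each other in the file, or None
    if no pair is within max_gap bytes."""
    # Sort chunks by their file offset so only adjacent pairs need checking.
    items = sorted(h5chunks.items(), key=lambda kv: kv[1][0])
    best = None
    best_gap = None
    for (idx_a, (off_a, size_a)), (idx_b, (off_b, _)) in zip(items, items[1:]):
        gap = off_b - (off_a + size_a)  # bytes between end of a and start of b
        if max_gap is not None and gap > max_gap:
            continue
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best = (idx_a, idx_b)
    return best

# Chunks at offsets 0, 105, and 500: the first two are only 5 bytes apart.
h5chunks = {"0_0": (0, 100), "0_1": (500, 50), "0_2": (105, 10)}
print(find_min_pair(h5chunks))             # ('0_0', '0_2')
print(find_min_pair(h5chunks, max_gap=2))  # None
```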
@@ -366,8 +366,8 @@ async def GET_Chunk(request):
    num_bytes = s3size
else:
    # list
    num_bytes = np.prod(s3size)
    log.debug(f"reading {num_bytes} from {s3path}")
    num_bytes = np.sum(s3size)
Why the sum instead of product? If s3size is a list, isn't each entry in the list a dimension describing the size of the data in s3?
No, it should be sum. In this case s3size is a colon-separated list of range lengths, so the sum is the total number of bytes to read.
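A small sketch of the distinction (the sample values are hypothetical; this only illustrates why sum, not product, is the right reduction when s3size holds byte-range lengths):

```python
import numpy as np

# When one chunk is read as several discontiguous byte ranges,
# s3size holds one range length per range:
s3size = [4096, 1024, 512]

# The total number of bytes to fetch is the sum of the range lengths...
total_bytes = int(np.sum(s3size))   # 5632

# ...whereas np.prod would wrongly treat the entries as dimensions
# of a multi-dimensional extent:
wrong_bytes = int(np.prod(s3size))  # 2147483648
```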
Added arange chunk initializer.