
Chunkinfo #234

Merged
merged 32 commits into master on Jul 12, 2023

Conversation

@jreadey (Member) commented Jun 6, 2023

Added arange chunk initializer.
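
For context, a minimal sketch of what an arange-style chunk initializer can look like (the function name and the 1-D layout are illustrative assumptions, not the PR's actual code):

import numpy as np

def arange_chunk_init(chunk_index, chunk_size, start=0, step=1):
    # each chunk continues the global arange sequence from where
    # the previous chunk left off
    offset = chunk_index * chunk_size
    return np.arange(start + offset * step,
                     start + (offset + chunk_size) * step,
                     step)

# e.g. chunk 2 with chunk_size 4 -> array([8, 9, 10, 11])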

@jreadey requested a review from mattjala June 6, 2023 19:19
mattjala previously approved these changes Jun 7, 2023

@mattjala (Contributor) left a comment:

LGTM

from .util.hdf5dtype import createDataType, getItemSize

from . import config
from . import hsds_logger as log

# supported initializer commands (just one at the moment)
mattjala (Contributor):

More than one initializer command is now supported

jreadey (Member Author):

updated

chunk_index = getChunkIndex(chunk_id)
chunk_index = chunk_index[0]
log.debug(f"chunk_index: {chunk_index}")
for j in range(table_factor):
mattjala (Contributor):

I think this is redundant, since table_factor is always a scalar.

jreadey (Member Author):

I don't follow - we want to iterate through each hyperchunk for each hsds chunk...
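
For illustration, the chunk-to-hyperchunk mapping being described might look like this (the helper name and the contiguous layout are assumptions, not the PR's code):

def hyperchunk_indices(chunk_index, table_factor):
    # one HSDS chunk aggregates table_factor hyperchunks; assuming a
    # contiguous layout, their indices follow from the HSDS chunk index
    return [chunk_index * table_factor + j for j in range(table_factor)]

# e.g. HSDS chunk 2 with table_factor 4 -> hyperchunks [8, 9, 10, 11]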

@@ -829,6 +956,7 @@ async def get_chunk(

# validate arguments
if s3path:
"""
mattjala (Contributor):

Why remove this validation?

jreadey (Member Author):

Looks like this is not relevant after latest code changes.

chunk_extent = chunk_dims[dim]

if dset_extent > 0 and chunk_extent > 0:
    table_extent = -(dset_extent // -chunk_extent)
mattjala (Contributor):

Why the double negative here - wouldn't the result be the same if both negatives were removed?

jreadey (Member Author):

No, this is a little trick to compute the integer ceiling.
E.g.: -(8 // -3) -> 3, whereas 8 // 3 -> 2

mattjala (Contributor):

Ah, I see. Is there a reason not to use int(math.ceil(dset_extent // chunk_extent))? Other parts of HSDS already import math.
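
A quick check of both forms (note the math.ceil variant needs true division /; with // the quotient is already floored before ceil ever sees it):

import math

a, b = 8, 3

assert a // b == 2             # floor division rounds down...
assert -(a // -b) == 3         # ...negating twice gives the ceiling
assert math.ceil(a / b) == 3   # equivalent, but goes through a float
assert math.ceil(a // b) == 2  # // inside ceil floors first - a bug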


def _find_min_pair(h5chunks, max_gap=None):
""" given a dict of chunk_map entries, return the two
chunks nearest to each other in the file.
mattjala (Contributor):

return the two chunks -> return the indices of the two chunks
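
For illustration, a sketch of the behavior under discussion (not the PR's implementation; the s3offset/s3size fields on each entry are assumptions based on how chunk locations are described elsewhere in this conversation):

def find_min_pair_sketch(h5chunks, max_gap=None):
    # sort entries by their offset in the file
    items = sorted(h5chunks.items(), key=lambda kv: kv[1]["s3offset"])
    min_pair = None
    min_gap = None
    # only entries adjacent in file order can be the nearest pair
    for (i1, e1), (i2, e2) in zip(items, items[1:]):
        gap = e2["s3offset"] - (e1["s3offset"] + e1["s3size"])
        if max_gap is not None and gap > max_gap:
            continue  # ignore pairs separated by more than max_gap
        if min_gap is None or gap < min_gap:
            min_gap, min_pair = gap, (i1, i2)
    return min_pair  # indices of the two nearest chunks, or None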

@@ -366,8 +366,8 @@ async def GET_Chunk(request):
         num_bytes = s3size
     else:
         # list
-        num_bytes = np.prod(s3size)
+        num_bytes = np.sum(s3size)
     log.debug(f"reading {num_bytes} from {s3path}")
mattjala (Contributor):

Why the sum instead of product? If s3size is a list, isn't each entry in the list a dimension describing the size of the data in s3?

jreadey (Member Author):

No, it should be sum. In this case s3size is a colon-separated list of range lengths, so the sum is the total number of bytes to read.
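
With illustrative values, the difference between the two reductions:

import numpy as np

s3size = [512, 256, 1024]   # range lengths in bytes, one per range

num_bytes = np.sum(s3size)  # 1792: total bytes to read
# np.prod(s3size) would treat the entries as array dimensions
# (512 * 256 * 1024 = 134217728), hence the change to sum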

mattjala previously approved these changes Jul 10, 2023
jreadey merged commit 6197314 into master Jul 12, 2023
10 checks passed