Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(lossles-round-trip): Account for padding byte in numeric strings with multiplicity #401

Conversation

craigberry1
Copy link
Contributor

Fixes issue during lossless read/write with numeric string value representations (IS/DS). This was caused by a string like:
0.99924236548978\-0.0322633220972\-0.0217663285287\0.02949870928067\0.99267261121054\-0.1171789789306 where the space following the final number is correctly added for evenness, but produced an error during re-write as the unformatted value exceeded the max length of the VR.

Solution was to remove this padding byte from _rawValue property during read if it exceeds the max length of the vr. This padding will be re-added by the writer if needed for evenness.

Copy link

netlify bot commented Aug 26, 2024

Deploy Preview for dcmjs2 ready!

Name Link
🔨 Latest commit 2b57f10
🔍 Latest deploy log https://app.netlify.com/sites/dcmjs2/deploys/66d889e42998e4000820aa6b
😎 Deploy Preview https://deploy-preview-401--dcmjs2.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@craigberry1 craigberry1 marked this pull request as draft August 26, 2024 17:36
@craigberry1
Copy link
Contributor Author

This can be an issue for other VRs as well. Converting to draft while coming up with a more generalized solution.

@craigberry1 craigberry1 marked this pull request as ready for review August 27, 2024 18:16
@craigberry1
Copy link
Contributor Author

Fixed generically by add dropPadByte check when strings are split for multiplicity. Aims to track specific case when the last element of an multiple values data element has is exactly 1 byte over max length and that byte is the VR's pad byte. In this case we drop that character as it is not considered part of the original value but was just added for evenness. The writer will add the padding back if needed.

Copy link
Collaborator

@pieper pieper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks for the work and the PR 👍

I'd like if we could get a second set of eyes on it, just to be sure Iso 'll request another reviewer to have a look.

/**
* Removes padding byte, if it exists, from the last value in a multiple-value data element.
*
* This function ensures that data elements with multiple values maintain their integrity for lossless
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the pad byte is an artifact of the binary encoding, and should always be stripped - that makes it consistent as to what value one gets. Depending on the maximum character length means one ends up getting different results depending on the exact boundary/parse conditions which is a recipe for errors and bugs.

@wayfarer3130
Copy link
Contributor

This looks good to me, thanks for the work and the PR 👍

I'd like if we could get a second set of eyes on it, just to be sure Iso 'll request another reviewer to have a look.

I think always removing the pad byte from numeric strings is a more consistent solution than trying to get the exact correct length. It just seems like that leaves the possibility of errors when the length is that exact length as compared to shorter lengths or vice-versa.

@craigberry1
Copy link
Contributor Author

This looks good to me, thanks for the work and the PR 👍
I'd like if we could get a second set of eyes on it, just to be sure Iso 'll request another reviewer to have a look.

I think always removing the pad byte from numeric strings is a more consistent solution than trying to get the exact correct length. It just seems like that leaves the possibility of errors when the length is that exact length as compared to shorter lengths or vice-versa.

The issue I had was trying to determine whether the pad byte/space was significant or added for evenness. Many VR's (like the numeric strings) allow for whitespace so unless the parsed string exceeds the max length there is nothing saying the whitespace shouldn't be there.

This originally come out of work for lossless round trip processing where the goal has been to preserve exactly what is in the source DICOM file.

With that approach in mind my thinking was to only manipulate the original string in a very specific circumstance, in this case being the final element in the string containing a single byte longer than allowed and it being equal to the padding byte. In any other case I leave as-is and let the writer determine if the value is valid during write.

@craigberry1
Copy link
Contributor Author

craigberry1 commented Aug 29, 2024

Hey @wayfarer3130 I wanted to check back in. I agree with your sentiment re: I think the pad byte is an artifact of the binary encoding, and should always be stripped. The tricky part is identifying when a padding byte was actually added.

For example the spec for DS states: Decimal Strings may be padded with leading or trailing spaces. So we don't want to strip off a whitespace character that was intended to be there:

Example

  • 0.9129
  • DS tag
  • Length 10
  • Leading and trailing spaces significant

In this case I don't think we can determine whether a padding SPACE was added for evenness or was part of the value, and don't think we should manipulate it.


This error I'm seeing is a very specific use case where a data element with multiplicity > 1 has a total encoded length that is odd, and the final value is already at max length.

The added padding byte achieves even length, but increases the final element above its individual max length:

Example
0.99924236548978\-0.0322633220972

  • DS tag
  • Multiplicity = 2
  • Both values are at max allowed length of 16
  • Delimiter \ creates odd length of 33
  • Padding byte added to final element to create length 34
  • Increases last element length to 17

Stripping the padding byte makes sense here IMO as the whitespace is not part of the original value, it was clearly added for padding to create evenness.


So I understand hesitation around parsing special logic though am thinking this is a way we can definitively detect a padding byte and not make other assumptions about the values.

But let me know what you think and I'll follow your direction!

@wayfarer3130
Copy link
Contributor

@craigberry1 - as long as you only remove the pad byte on the last item when it is at an odd position, you should be fine as it will then get re-added. I agree the standard is ambiguous there, but I don't quite see how to fix it at this point.

@craigberry1
Copy link
Contributor Author

@craigberry1 - as long as you only remove the pad byte on the last item when it is at an odd position, you should be fine as it will then get re-added. I agree the standard is ambiguous there, but I don't quite see how to fix it at this point.

Thanks for the response @wayfarer3130, I was away for a few days. I wrote up another comment but deleted it after some more thought because I think the odd length check you suggested is sufficient in this case. I just want to be careful not to regress on the lossless round trip work.

I'll update this PR shortly. Thanks for the help.

@craigberry1
Copy link
Contributor Author

@pieper @wayfarer3130 Ready for another look when you get a chance. Thanks!

@pieper
Copy link
Collaborator

pieper commented Sep 4, 2024

I think that only stripping a space at the end of the string is the right thing. I don't think any reasonable implementation would add a byte anywhere else to make the buffer and even length. Sounds like we are all agreed then.

@pieper pieper merged commit 7b44953 into dcmjs-org:master Sep 4, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants