-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(lossles-round-trip): Account for padding byte in numeric strings with multiplicity #401
fix(lossles-round-trip): Account for padding byte in numeric strings with multiplicity #401
Conversation
✅ Deploy Preview for dcmjs2 ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
This can be an issue for other VRs as well. Converting to draft while coming up with a more generalized solution. |
Fixed generically by add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, thanks for the work and the PR 👍
I'd like if we could get a second set of eyes on it, just to be sure Iso 'll request another reviewer to have a look.
/** | ||
* Removes padding byte, if it exists, from the last value in a multiple-value data element. | ||
* | ||
* This function ensures that data elements with multiple values maintain their integrity for lossless |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the pad byte is an artifact of the binary encoding, and should always be stripped - that makes it consistent as to what value one gets. Depending on the maximum character length means one ends up getting different results depending on the exact boundary/parse conditions which is a recipe for errors and bugs.
I think always removing the pad byte from numeric strings is a more consistent solution than trying to get the exact correct length. It just seems like that leaves the possibility of errors when the length is that exact length as compared to shorter lengths or vice-versa. |
The issue I had was trying to determine whether the pad byte/space was significant or added for evenness. Many VR's (like the numeric strings) allow for whitespace so unless the parsed string exceeds the max length there is nothing saying the whitespace shouldn't be there. This originally come out of work for lossless round trip processing where the goal has been to preserve exactly what is in the source DICOM file. With that approach in mind my thinking was to only manipulate the original string in a very specific circumstance, in this case being the final element in the string containing a single byte longer than allowed and it being equal to the padding byte. In any other case I leave as-is and let the writer determine if the value is valid during write. |
Hey @wayfarer3130 I wanted to check back in. I agree with your sentiment re: For example the spec for Example
In this case I don't think we can determine whether a padding This error I'm seeing is a very specific use case where a data element with multiplicity > 1 has a total encoded length that is odd, and the final value is already at max length. The added padding byte achieves even length, but increases the final element above its individual max length: Example
Stripping the padding byte makes sense here IMO as the whitespace is not part of the original value, it was clearly added for padding to create evenness. So I understand hesitation around parsing special logic though am thinking this is a way we can definitively detect a padding byte and not make other assumptions about the values. But let me know what you think and I'll follow your direction! |
@craigberry1 - as long as you only remove the pad byte on the last item when it is at an odd position, you should be fine as it will then get re-added. I agree the standard is ambiguous there, but I don't quite see how to fix it at this point. |
Thanks for the response @wayfarer3130, I was away for a few days. I wrote up another comment but deleted it after some more thought because I think the odd length check you suggested is sufficient in this case. I just want to be careful not to regress on the lossless round trip work. I'll update this PR shortly. Thanks for the help. |
@pieper @wayfarer3130 Ready for another look when you get a chance. Thanks! |
I think that only stripping a space at the end of the string is the right thing. I don't think any reasonable implementation would add a byte anywhere else to make the buffer and even length. Sounds like we are all agreed then. |
Fixes issue during lossless read/write with numeric string value representations (
IS
/DS
). This was caused by a string like:0.99924236548978\-0.0322633220972\-0.0217663285287\0.02949870928067\0.99267261121054\-0.1171789789306
where the space following the final number is correctly added for evenness, but produced an error during re-write as the unformatted value exceeded the max length of the VR.Solution was to remove this padding byte from
_rawValue
property during read if it exceeds the max length of the vr. This padding will be re-added by the writer if needed for evenness.