fix(lossles-round-trip): Account for padding byte in numeric strings with multiplicity #401

craigberry1 · 2024-08-26T16:46:13Z

Fixes issue during lossless read/write with numeric string value representations (IS/DS). This was caused by a string like:
0.99924236548978\-0.0322633220972\-0.0217663285287\0.02949870928067\0.99267261121054\-0.1171789789306 where the space following the final number is correctly added for evenness, but produced an error during re-write as the unformatted value exceeded the max length of the VR.

Solution was to remove this padding byte from _rawValue property during read if it exceeds the max length of the vr. This padding will be re-added by the writer if needed for evenness.

netlify · 2024-08-26T16:46:35Z

✅ Deploy Preview for dcmjs2 ready!

Name	Link
🔨 Latest commit	`2b57f10`
🔍 Latest deploy log	https://app.netlify.com/sites/dcmjs2/deploys/66d889e42998e4000820aa6b
😎 Deploy Preview	https://deploy-preview-401--dcmjs2.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

craigberry1 · 2024-08-26T17:37:05Z

This can be an issue for other VRs as well. Converting to draft while coming up with a more generalized solution.

…on.js class

craigberry1 · 2024-08-27T18:18:23Z

Fixed generically by add dropPadByte check when strings are split for multiplicity. Aims to track specific case when the last element of an multiple values data element has is exactly 1 byte over max length and that byte is the VR's pad byte. In this case we drop that character as it is not considered part of the original value but was just added for evenness. The writer will add the padding back if needed.

pieper

This looks good to me, thanks for the work and the PR 👍

I'd like if we could get a second set of eyes on it, just to be sure Iso 'll request another reviewer to have a look.

wayfarer3130 · 2024-08-28T11:52:49Z

src/ValueRepresentation.js

+    /**
+     * Removes padding byte, if it exists, from the last value in a multiple-value data element.
+     *
+     * This function ensures that data elements with multiple values maintain their integrity for lossless


I think the pad byte is an artifact of the binary encoding, and should always be stripped - that makes it consistent as to what value one gets. Depending on the maximum character length means one ends up getting different results depending on the exact boundary/parse conditions which is a recipe for errors and bugs.

wayfarer3130 · 2024-08-28T11:54:48Z

This looks good to me, thanks for the work and the PR 👍

I'd like if we could get a second set of eyes on it, just to be sure Iso 'll request another reviewer to have a look.

I think always removing the pad byte from numeric strings is a more consistent solution than trying to get the exact correct length. It just seems like that leaves the possibility of errors when the length is that exact length as compared to shorter lengths or vice-versa.

craigberry1 · 2024-08-28T12:50:52Z

This looks good to me, thanks for the work and the PR 👍
I'd like if we could get a second set of eyes on it, just to be sure Iso 'll request another reviewer to have a look.

I think always removing the pad byte from numeric strings is a more consistent solution than trying to get the exact correct length. It just seems like that leaves the possibility of errors when the length is that exact length as compared to shorter lengths or vice-versa.

The issue I had was trying to determine whether the pad byte/space was significant or added for evenness. Many VR's (like the numeric strings) allow for whitespace so unless the parsed string exceeds the max length there is nothing saying the whitespace shouldn't be there.

This originally come out of work for lossless round trip processing where the goal has been to preserve exactly what is in the source DICOM file.

With that approach in mind my thinking was to only manipulate the original string in a very specific circumstance, in this case being the final element in the string containing a single byte longer than allowed and it being equal to the padding byte. In any other case I leave as-is and let the writer determine if the value is valid during write.

craigberry1 · 2024-08-29T14:31:04Z

Hey @wayfarer3130 I wanted to check back in. I agree with your sentiment re: I think the pad byte is an artifact of the binary encoding, and should always be stripped. The tricky part is identifying when a padding byte was actually added.

For example the spec for DS states: Decimal Strings may be padded with leading or trailing spaces. So we don't want to strip off a whitespace character that was intended to be there:

Example

0.9129
DS tag
Length 10
Leading and trailing spaces significant

In this case I don't think we can determine whether a padding SPACE was added for evenness or was part of the value, and don't think we should manipulate it.

This error I'm seeing is a very specific use case where a data element with multiplicity > 1 has a total encoded length that is odd, and the final value is already at max length.

The added padding byte achieves even length, but increases the final element above its individual max length:

Example
0.99924236548978\-0.0322633220972

DS tag
Multiplicity = 2
Both values are at max allowed length of 16
Delimiter \ creates odd length of 33
Padding byte added to final element to create length 34
Increases last element length to 17

Stripping the padding byte makes sense here IMO as the whitespace is not part of the original value, it was clearly added for padding to create evenness.

So I understand hesitation around parsing special logic though am thinking this is a way we can definitively detect a padding byte and not make other assumptions about the values.

But let me know what you think and I'll follow your direction!

wayfarer3130 · 2024-08-29T19:51:15Z

@craigberry1 - as long as you only remove the pad byte on the last item when it is at an odd position, you should be fine as it will then get re-added. I agree the standard is ambiguous there, but I don't quite see how to fix it at this point.

craigberry1 · 2024-09-04T16:14:21Z

@craigberry1 - as long as you only remove the pad byte on the last item when it is at an odd position, you should be fine as it will then get re-added. I agree the standard is ambiguous there, but I don't quite see how to fix it at this point.

Thanks for the response @wayfarer3130, I was away for a few days. I wrote up another comment but deleted it after some more thought because I think the odd length check you suggested is sufficient in this case. I just want to be careful not to regress on the lossless round trip work.

I'll update this PR shortly. Thanks for the help.

… odd length

craigberry1 · 2024-09-04T18:07:23Z

@pieper @wayfarer3130 Ready for another look when you get a chance. Thanks!

pieper · 2024-09-04T18:27:54Z

I think that only stripping a space at the end of the string is the right thing. I don't think any reasonable implementation would add a byte anywhere else to make the buffer and even length. Sounds like we are all agreed then.

Craig Berry added 4 commits August 6, 2024 16:07

Merge remote-tracking branch 'dcmjs/master'

badc2f7

Merge remote-tracking branch 'dcmjs/master'

0621e56

Trim padding byte from numeric strings if they exceed VR max length

890e08a

Tweak comment

123152f

craigberry1 marked this pull request as draft August 26, 2024 17:36

Craig Berry added 3 commits August 27, 2024 10:52

Generalize pad byte trimming and add unit test

5f7e4a0

Refactor with better jsdoc and move function within ValueRepresentati…

f760f78

…on.js class

Additional test cases, account for maxCharLength

b803667

craigberry1 marked this pull request as ready for review August 27, 2024 18:16

pieper reviewed Aug 27, 2024

View reviewed changes

pieper requested a review from wayfarer3130 August 27, 2024 21:30

wayfarer3130 reviewed Aug 28, 2024

View reviewed changes

Trip padding byte from last element in multiple value arrays if it is…

2b57f10

… odd length

pieper merged commit 7b44953 into dcmjs-org:master Sep 4, 2024
13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(lossles-round-trip): Account for padding byte in numeric strings with multiplicity #401

fix(lossles-round-trip): Account for padding byte in numeric strings with multiplicity #401

craigberry1 commented Aug 26, 2024

netlify bot commented Aug 26, 2024 •

edited

Loading

craigberry1 commented Aug 26, 2024

craigberry1 commented Aug 27, 2024

pieper left a comment

wayfarer3130 Aug 28, 2024

wayfarer3130 commented Aug 28, 2024

craigberry1 commented Aug 28, 2024

craigberry1 commented Aug 29, 2024 •

edited

Loading

wayfarer3130 commented Aug 29, 2024

craigberry1 commented Sep 4, 2024

craigberry1 commented Sep 4, 2024

pieper commented Sep 4, 2024

fix(lossles-round-trip): Account for padding byte in numeric strings with multiplicity #401

fix(lossles-round-trip): Account for padding byte in numeric strings with multiplicity #401

Conversation

craigberry1 commented Aug 26, 2024

netlify bot commented Aug 26, 2024 • edited Loading

✅ Deploy Preview for dcmjs2 ready!

craigberry1 commented Aug 26, 2024

craigberry1 commented Aug 27, 2024

pieper left a comment

Choose a reason for hiding this comment

wayfarer3130 Aug 28, 2024

Choose a reason for hiding this comment

wayfarer3130 commented Aug 28, 2024

craigberry1 commented Aug 28, 2024

craigberry1 commented Aug 29, 2024 • edited Loading

wayfarer3130 commented Aug 29, 2024

craigberry1 commented Sep 4, 2024

craigberry1 commented Sep 4, 2024

pieper commented Sep 4, 2024

netlify bot commented Aug 26, 2024 •

edited

Loading

craigberry1 commented Aug 29, 2024 •

edited

Loading