Encoding and then decoding with GSM7 (packed) yields an extra '@' character #108

JemmyH · 2022-11-28T12:36:21Z

Question

With GSM7 (packed), encoding a string and then decoding the result does not reproduce the original input: the decoded string has an extra '@' character appended.

For example

My source input was

"1234567890abcdefghijklm"

First I encoded it to a byte slice m:

m := []byte{49,217,140,86,179,221,112,57,88,88,60,38,151,205,103,116,90,189,102,183,1}

Then I decoded m, but got:

"1234567890abcdefghijklm@"

The decoded string has one more character, '@', than the original input.

Here is my test code:

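// This test lives in the same package as GSM7.
// The assert import assumes testify, which matches the assert.Nil calls below.
import (
	"strconv"
	"testing"

	"github.com/stretchr/testify/assert"
	"golang.org/x/text/transform"
)

func TestEncode(t *testing.T) {
	content := "1234567890abcdefghijklm"
	t.Logf("original content: %s, length: %d", strconv.Quote(content), len(content))

	// Encode the content with the packed option.
	encoder := GSM7(true).NewEncoder()
	es, _, err := transform.Bytes(encoder, []byte(content))
	assert.Nil(t, err)
	t.Logf("after encode. bytes: %v, length: %d", es, len(es))

	// Decode `es`; the result should equal `content` but has a trailing '@'.
	decoder := GSM7(true).NewDecoder()
	res, _, err := transform.Bytes(decoder, es)
	assert.Nil(t, err)
	t.Logf("after decode. content: %s, length: %d", strconv.Quote(string(res)), len(res))
}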

JemmyH · 2022-11-28T12:51:13Z

There is another problem. According to GSM 03.38, when packing leaves 7 spare bits in the last octet, those bits should be filled with the 7-bit code of CR (0x0d) instead of zeros, so that they are not mistaken for an '@':

When there are 7 spare bits in the last octet of a message, these bits are set to the 7-bit code of the CR control (also used as a padding filler) instead of being set to zero (where they would be confused with the 7-bit code of an '@' character).

For example, the source input is "1234567890abcdefghijklm". After encoding and packing, we will get

m := []byte{49,217,140,86,179,221,112,57,88,88,60,38,151,205,103,116,90,189,102,183,1}

The last byte is 1 (0000 0001): only its lowest bit carries data and the upper 7 bits are spare, which matches the scenario described in the passage above. So CR (0x0d) should be placed in those bits: 1 | (0x0d << 1) = 27 (0001 1011). The encoded result should then be:

m1 := []byte{49,217,140,86,179,221,112,57,88,88,60,38,151,205,103,116,90,189,102,183,27}
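The arithmetic above can be reproduced with a small stand-alone sketch. This is not the library's encoder: `packSeptets` and its `padCR` flag are hypothetical names, and the input is assumed to already be GSM 7-bit septets (digits and lowercase ASCII letters have the same codes in the default alphabet, so the string bytes are used directly).

```go
package main

import "fmt"

// cr is the 7-bit code of the CR control character in the GSM default alphabet.
const cr = 0x0D

// packSeptets packs 7-bit values into octets, least significant bits first.
// Hypothetical helper for illustration; it is not part of the library.
// When padCR is set and the last octet would have 7 spare bits, those bits
// are filled with CR instead of zeros.
func packSeptets(septets []byte, padCR bool) []byte {
	out := make([]byte, (len(septets)*7+7)/8)
	for i, s := range septets {
		bit := i * 7
		idx, shift := bit/8, uint(bit%8)
		v := uint16(s&0x7F) << shift
		out[idx] |= byte(v)
		if shift > 1 { // the septet spills into the next octet
			out[idx+1] |= byte(v >> 8)
		}
	}
	// 7 spare bits remain exactly when the septet count is 8n-1.
	if padCR && len(septets)%8 == 7 {
		out[len(out)-1] |= cr << 1
	}
	return out
}

func main() {
	in := []byte("1234567890abcdefghijklm")
	fmt.Println(packSeptets(in, false)) // last octet is 1
	fmt.Println(packSeptets(in, true))  // last octet is 27
}
```

With `padCR` off the output matches m; with it on, only the last octet changes, from 1 to 27.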

But when I tried to decode m1, I got

"1234567890abcdefghijklm\r"

So the extra character is still there; it is now '\r' instead of '@'.
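For the decoding side, one possible approach is sketched below. It is only an illustration under stated assumptions, not the library's decoder and not necessarily what the eventual fix does: `unpackSeptets` is a hypothetical helper that returns raw septets and treats a trailing CR as padding whenever the octet count is a multiple of 7, which is the only case where a whole extra septet fits in the spare bits.

```go
package main

import "fmt"

// cr is the 7-bit code of the CR control character in the GSM default alphabet.
const cr = 0x0D

// unpackSeptets expands packed octets into 7-bit values.
// Hypothetical helper for illustration; it is not part of the library.
// When len(octets) is a multiple of 7 the final septet may be padding,
// so a trailing CR is dropped. A trailing 0x00 is kept, because it cannot
// be told apart from a real '@'.
func unpackSeptets(octets []byte) []byte {
	n := len(octets) * 8 / 7
	out := make([]byte, 0, n)
	for i := 0; i < n; i++ {
		bit := i * 7
		idx, shift := bit/8, uint(bit%8)
		v := uint16(octets[idx]) >> shift
		if shift > 1 { // the septet continues in the next octet
			v |= uint16(octets[idx+1]) << (8 - shift)
		}
		out = append(out, byte(v&0x7F))
	}
	if len(octets)%7 == 0 && n > 0 && out[n-1] == cr {
		out = out[:n-1]
	}
	return out
}

func main() {
	m := []byte{49, 217, 140, 86, 179, 221, 112, 57, 88, 88, 60, 38, 151, 205, 103, 116, 90, 189, 102, 183, 1}
	m1 := []byte{49, 217, 140, 86, 179, 221, 112, 57, 88, 88, 60, 38, 151, 205, 103, 116, 90, 189, 102, 183, 27}
	fmt.Printf("%q\n", unpackSeptets(m1)) // "1234567890abcdefghijklm"
	fmt.Printf("%q\n", unpackSeptets(m))  // 24 septets; the last is 0x00, the code of '@'
}
```

The trade-off is that a message that really ends in CR at exactly this length would lose its last character, so a real decoder may need a stricter rule.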

JemmyH · 2022-11-28T13:01:32Z

By my deduction, this happens whenever the encoded length in septets is a term of the arithmetic sequence $a_n = 8n - 1$ (7, 15, 23, ...), because packing then leaves exactly 7 spare bits in the last octet.

For example:

s1 = 1234567890abcdefghijklm: the original length is 23, there is no escape character, and the encoded length is also 23 septets.
s2 = 12345678[abcdefghijklm: the original length is 22, and the character '[' needs an escape, so it occupies two septets. The encoded length is also 23 septets.
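The condition can be checked with a short sketch. `septetLen` and `hasSevenSpareBits` are hypothetical helpers, not library functions, and the sketch assumes every rune is representable in the GSM 7-bit default alphabet or its extension table.

```go
package main

import "fmt"

// septetLen counts the septets a string needs in the GSM 7-bit default
// alphabet. Characters from the extension table cost two septets
// (the escape code followed by the character code).
// Hypothetical helper; it does not validate that runes are encodable.
func septetLen(s string) int {
	n := 0
	for _, r := range s {
		switch r {
		case '\f', '^', '{', '}', '\\', '[', '~', ']', '|', '€':
			n += 2
		default:
			n++
		}
	}
	return n
}

// hasSevenSpareBits reports whether packing the given number of septets
// leaves 7 unused bits in the last octet, i.e. the count has the form 8n-1.
func hasSevenSpareBits(septets int) bool {
	return septets%8 == 7
}

func main() {
	for _, s := range []string{"1234567890abcdefghijklm", "12345678[abcdefghijklm"} {
		n := septetLen(s)
		fmt.Println(len(s), n, hasSevenSpareBits(n))
	}
	// Prints:
	// 23 23 true
	// 22 23 true
}
```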

JemmyH · 2022-11-28T17:28:54Z

Here is the fix: #109

JemmyH · 2022-11-29T02:34:22Z

@fiorix
