Use GSM(packed) to encode and then decode, there is an extra '@' character #108

Open
JemmyH opened this issue Nov 28, 2022 · 4 comments

JemmyH commented Nov 28, 2022

Question

With GSM7 (packed), encoding a string and then decoding the result does not give back the original input: the decoded output has an extra '@' character at the end.

For example

My source input was

"1234567890abcdefghijklm"

First I encoded it to a byte slice m:

m := []byte{49,217,140,86,179,221,112,57,88,88,60,38,151,205,103,116,90,189,102,183,1}

Then I decoded m, but got

"1234567890abcdefghijklm@"

The decoded output has one more character, '@', than the original input.

Here is my test code:

func TestEncode(t *testing.T) {
	content := "1234567890abcdefghijklm"
	t.Logf("original content: %s, length: %d", strconv.Quote(content), len(content))

	// encode the content with packed option
	encoder := GSM7(true).NewEncoder()
	es, _, err := transform.Bytes(encoder, []byte(content))
	assert.Nil(t, err)
	t.Logf("after encoded. bytes: %v, length: %d", es, len(es))

	// decode `es`
	decoder := GSM7(true).NewDecoder()
	res, _, err := transform.Bytes(decoder, es)
	assert.Nil(t, err)
	t.Logf("after decode. content: %s, length: %d", strconv.Quote(string(res)), len(res))
}
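
To make the cause easier to see, here is a minimal standalone sketch of septet packing and a naive unpack. It does not use the library; the helper names `packSeptets`, `naiveUnpack` and `toText` are hypothetical, and it assumes the input contains only digits and lowercase letters, whose GSM 7-bit codes match ASCII. 23 septets fill 161 bits, so the 21st octet carries 7 zero bits, and a decoder that reads every complete septet sees a 24th value of 0x00, which is '@'.

```go
package main

import "fmt"

// packSeptets packs 7-bit values into octets, low bits first.
// Spare bits in the final octet are left as zero.
func packSeptets(septets []byte) []byte {
	out := make([]byte, (len(septets)*7+7)/8)
	for i, s := range septets {
		bit := i * 7
		idx, shift := bit/8, uint(bit%8)
		out[idx] |= (s & 0x7f) << shift
		if shift > 1 { // the septet spills into the next octet
			out[idx+1] |= (s & 0x7f) >> (8 - shift)
		}
	}
	return out
}

// naiveUnpack reads every complete septet contained in the octets,
// including one made up entirely of spare bits.
func naiveUnpack(octets []byte) []byte {
	out := make([]byte, len(octets)*8/7)
	for i := range out {
		bit := i * 7
		idx, shift := bit/8, uint(bit%8)
		v := octets[idx] >> shift
		if shift > 1 {
			v |= octets[idx+1] << (8 - shift)
		}
		out[i] = v & 0x7f
	}
	return out
}

// toText maps septets back to text. Assumption: only digits, lowercase
// letters (same codes as ASCII) and 0x00 ('@' in the GSM alphabet) occur.
func toText(septets []byte) string {
	out := make([]byte, len(septets))
	for i, s := range septets {
		if s == 0x00 {
			out[i] = '@'
		} else {
			out[i] = s
		}
	}
	return string(out)
}

func main() {
	packed := packSeptets([]byte("1234567890abcdefghijklm"))
	fmt.Println(len(packed), packed[len(packed)-1]) // 21 1
	fmt.Println(toText(naiveUnpack(packed)))        // 1234567890abcdefghijklm@
}
```

The decoder cannot tell the zero spare bits from a real '@' unless it is given the septet count separately or a padding convention is applied.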

JemmyH commented Nov 28, 2022

There is a related question. According to GSM 03.38, if packing leaves 7 spare bits in the last byte, those bits should be filled with CR (0x0d) instead of zeros, to avoid confusion with '@' (whose 7-bit code is 0x00):

When there are 7 spare bits in the last octet of a message, these bits are set to the 7-bit code of the CR control (also used as a padding filler) instead of being set to zero (where they would be confused with the 7-bit code of an '@' character).

For example, the source input is "1234567890abcdefghijklm". After encoding and packing, we will get

m := []byte{49,217,140,86,179,221,112,57,88,88,60,38,151,205,103,116,90,189,102,183,1}

The last byte is 1 (0000 0001): only its lowest bit carries data and the upper 7 bits are spare, which matches the case described in the quote above. So CR (0x0d) should be written into those bits: 1 | (0x0d << 1) = 27 (0001 1011). The new encoded result should then be:

m1 := []byte{49,217,140,86,179,221,112,57,88,88,60,38,151,205,103,116,90,189,102,183,27}

But when I tried to decode m1, I got

"1234567890abcdefghijklm\r"

So the extra character is still there; it just becomes '\r' instead of '@'.
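
The CR rule needs handling on both sides. The following is only a sketch under stated assumptions, not the library's code: `padWithCR` and `stripPaddingCR` are hypothetical helpers, the encoder is assumed to know the septet count, and the decoder is assumed to have already unpacked the octets into septets.

```go
package main

import "fmt"

const cr = 0x0d // 7-bit code of CR in the GSM default alphabet

// padWithCR writes CR into the 7 spare bits of the last octet when the
// septet count has the form 8n-1. Otherwise it returns packed unchanged.
func padWithCR(packed []byte, septets int) []byte {
	if len(packed) == 0 || septets%8 != 7 {
		return packed
	}
	out := append([]byte(nil), packed...) // copy, leave the input untouched
	out[len(out)-1] |= cr << 1
	return out
}

// stripPaddingCR drops a trailing CR when the unpacked septets exactly
// fill the octets (count is a multiple of 8), treating it as padding.
func stripPaddingCR(septets []byte) []byte {
	n := len(septets)
	if n > 0 && n%8 == 0 && septets[n-1] == cr {
		return septets[:n-1]
	}
	return septets
}

func main() {
	m := []byte{49, 217, 140, 86, 179, 221, 112, 57, 88, 88, 60, 38, 151, 205, 103, 116, 90, 189, 102, 183, 1}
	m1 := padWithCR(m, 23)
	fmt.Println(m1[len(m1)-1]) // 27

	decoded := append([]byte("1234567890abcdefghijklm"), cr) // 24 septets
	fmt.Println(len(stripPaddingCR(decoded)))                // 23
}
```

This heuristic cannot distinguish padding from a message that really ends in CR on an octet boundary, so a decoder that strips it loses that character in such a case.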

JemmyH commented Nov 28, 2022

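From my analysis, this happens whenever the encoded length (in septets, counting each escaped character as two) is of the form $a_n = 8n - 1$, i.e. 7, 15, 23, and so on. The septets then occupy $7(8n - 1) = 56n - 7$ bits, which leaves exactly 7 spare bits in the last byte.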

For example:

  • s1 = 1234567890abcdefghijklm: the original length is 23, there are no escaped characters, so the encoded length is also 23 septets;
  • s2 = 12345678[abcdefghijklm: the original length is 22, but '[' is an escaped character that occupies two septets, so the encoded length is also 23 septets;
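
The rule can be checked with a few lines of Go. This is a rough sketch: `septetCount` and `spareBits` are hypothetical helpers, and `septetCount` assumes every character that is not in the basic extension table belongs to the default alphabet and takes one septet.

```go
package main

import (
	"fmt"
	"strings"
)

// extended lists printable characters from the GSM 03.38 basic extension
// table, each encoded as ESC (0x1b) followed by one more septet.
const extended = "^{}\\[~]|€"

// septetCount returns the number of septets needed for s.
// Assumption: all other characters are in the default alphabet.
func septetCount(s string) int {
	n := 0
	for _, r := range s {
		n++
		if strings.ContainsRune(extended, r) {
			n++ // escaped characters take two septets
		}
	}
	return n
}

// spareBits returns the number of unused bits in the last octet
// after packing the given number of septets.
func spareBits(septets int) int {
	return (8 - septets*7%8) % 8
}

func main() {
	for _, s := range []string{"1234567890abcdefghijklm", "12345678[abcdefghijklm"} {
		n := septetCount(s)
		fmt.Println(n, spareBits(n)) // 23 7 for both strings
	}
}
```

Only septet counts of the form 8n-1 give 7 spare bits; every other count leaves fewer than 7, so no full phantom septet can appear.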

JemmyH commented Nov 28, 2022

Here is the fix: #109

JemmyH commented Nov 29, 2022

@fiorix
