Merge pull request #41 from basemachina/add-ignore-handle-fallback-option

Add an option to ignore characters that cannot be converted to SJIS, EUC-JP, or JIS
polygonplanet authored Jun 2, 2024
2 parents 6b18295 + 5055f40 commit a9a4172
Showing 5 changed files with 100 additions and 0 deletions.
25 changes: 25 additions & 0 deletions README.md
@@ -28,6 +28,7 @@ Convert and detect character encoding in JavaScript.
+ [Specify conversion options to the argument `to` as an object](#specify-conversion-options-to-the-argument-to-as-an-object)
+ [Specify the return type by the `type` option](#specify-the-return-type-by-the-type-option)
+ [Replacing characters with HTML entities when they cannot be represented](#replacing-characters-with-html-entities-when-they-cannot-be-represented)
+ [Ignoring characters when they cannot be represented](#ignoring-characters-when-they-cannot-be-represented)
+ [Specify BOM in UTF-16](#specify-bom-in-utf-16)
* [urlEncode : Encodes to percent-encoded string](#encodingurlencode-data)
* [urlDecode : Decodes from percent-encoded string](#encodingurldecode-string)
@@ -405,6 +406,30 @@ const sjisArray = Encoding.convert(unicodeArray, {
console.log(sjisArray); // Converted to a code array of 'ホッケの漢字は𩸽'
```

#### Ignoring characters when they cannot be represented

Set the `fallback` option to `ignore` to drop characters that cannot be represented in the target encoding. Such characters are omitted from the output instead of being replaced with `?`.

Example of specifying the `{ fallback: 'ignore' }` option:

```javascript
const unicodeArray = Encoding.stringToCode("寿司🍣ビール🍺");
// No fallback specified
let sjisArray = Encoding.convert(unicodeArray, {
  to: "SJIS",
  from: "UNICODE",
});
console.log(sjisArray); // Converted to a code array of '寿司?ビール?'

// Specify `fallback: ignore`
sjisArray = Encoding.convert(unicodeArray, {
  to: "SJIS",
  from: "UNICODE",
  fallback: "ignore",
});
console.log(sjisArray); // Converted to a code array of '寿司ビール'
```

#### Specify BOM in UTF-16

You can add a BOM (byte order mark) by specifying the `bom` option when converting to `UTF16`.
25 changes: 25 additions & 0 deletions README_ja.md
@@ -28,6 +28,7 @@ Convert and detect character encoding in JavaScript.
+ [Specify conversion options to the argument `to` as an object](#引数-to-にオブジェクトで変換オプションを指定する)
+ [Specify the return type by the `type` option](#type-オプションで戻り値の型を指定する)
+ [Replace characters that cannot be converted with HTML entities (HTML numeric character references)](#変換できない文字を-html-エンティティ-html-数値文字参照-に置き換える)
+ [Ignore characters that cannot be converted](#変換できない文字を無視する)
+ [Add a BOM to UTF-16](#utf-16-に-bom-をつける)
* [urlEncode : URL-encodes an array of character codes](#encodingurlencode-data)
* [urlDecode : URL-decodes to an array of character codes](#encodingurldecode-string)
@@ -395,6 +396,30 @@ const sjisArray = Encoding.convert(unicodeArray, {
console.log(sjisArray); // Converted to a numeric array of 'ホッケの漢字は𩸽'
```

#### Ignore characters that cannot be converted

To ignore characters that cannot be represented in the target encoding, specify `ignore` for the `fallback` option.

Example of specifying the `{ fallback: 'ignore' }` option:

```javascript
const unicodeArray = Encoding.stringToCode('寿司🍣ビール🍺');
// No fallback specified
let sjisArray = Encoding.convert(unicodeArray, {
  to: 'SJIS',
  from: 'UNICODE'
});
console.log(sjisArray); // Converted to a numeric array of '寿司?ビール?'

// Specify `fallback: ignore`
sjisArray = Encoding.convert(unicodeArray, {
  to: 'SJIS',
  from: 'UNICODE',
  fallback: 'ignore'
});
console.log(sjisArray); // Converted to a numeric array of '寿司ビール'
```

#### Add a BOM to UTF-16

When converting to `UTF16`, you can add a BOM (byte order mark) by specifying the `bom` option.
3 changes: 3 additions & 0 deletions encoding.js
@@ -1824,6 +1824,9 @@ function handleFallback(results, bytes, fallbackOption) {
        }
        results[results.length] = 0x3B; // ;
      }
      break;
    case 'ignore':
      break;
  }
}
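
The new `ignore` case appends nothing to `results`, which is why unrepresentable characters vanish from the output. The sketch below illustrates that dispatch in isolation. It is not the library's implementation: `appendFallback` and `encodeAsciiOnly` are hypothetical names, the sketch works on code points rather than the `bytes` argument the real `handleFallback` receives, it treats plain ASCII as the "target encoding" to stay self-contained, and the `default` branch that emits `?` is an assumption made here so the no-fallback behavior can be shown in one place.

```javascript
// Hypothetical, simplified stand-in for a fallback handler (not the library's code).
function appendFallback(results, codePoint, fallbackOption) {
  switch (fallbackOption) {
    case 'html-entity':
    case 'html-entity-hex': {
      // Build a numeric character reference such as "&#127843;" or "&#x1F363;".
      const isHex = fallbackOption === 'html-entity-hex';
      const digits = isHex
        ? 'x' + codePoint.toString(16).toUpperCase()
        : codePoint.toString(10);
      const entity = '&#' + digits + ';';
      for (const ch of entity) {
        results.push(ch.charCodeAt(0));
      }
      break;
    }
    case 'ignore':
      // Append nothing: the character is dropped from the output.
      break;
    default:
      // Assumption for this sketch: with no fallback, emit a question mark.
      results.push(0x3F); // ?
      break;
  }
  return results;
}

// Toy encoder: keeps ASCII and routes everything else through the fallback.
function encodeAsciiOnly(text, fallbackOption) {
  const results = [];
  // for...of iterates by code point, so surrogate pairs (emoji) stay whole.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (codePoint < 0x80) {
      results.push(codePoint);
    } else {
      appendFallback(results, codePoint, fallbackOption);
    }
  }
  return String.fromCharCode(...results);
}

console.log(encodeAsciiOnly('a🍣b'));                // 'a?b'
console.log(encodeAsciiOnly('a🍣b', 'html-entity')); // 'a&#127843;b'
console.log(encodeAsciiOnly('a🍣b', 'ignore'));      // 'ab'
```

Keeping `ignore` as an explicit empty case, rather than leaving it out of the `switch`, documents that dropping the character is deliberate.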

3 changes: 3 additions & 0 deletions src/encoding-convert.js
@@ -1672,5 +1672,8 @@ function handleFallback(results, bytes, fallbackOption) {
        }
        results[results.length] = 0x3B; // ;
      }
      break;
    case 'ignore':
      break;
  }
}
44 changes: 44 additions & 0 deletions tests/test.js
@@ -636,6 +636,50 @@ describe('encoding', function() {
        assert.deepEqual(decoded, '🍣寿司ビール🍺');
      });
    });

    describe('Ignore untranslatable unknown characters', function() {
      it('SJIS', function() {
        // Characters that cannot be converted to Shift_JIS ('🍣', '🍺') will be ignored.
        var sjis = encoding.convert(utf8, {
          to: 'sjis',
          from: 'utf-8',
          fallback: 'ignore'
        });
        var decoded = encoding.convert(sjis, {
          to: 'unicode',
          from: 'sjis'
        });
        assert.deepEqual(decoded, '寿司ビール');
      });

      it('EUC-JP', function() {
        // Characters that cannot be converted to EUC-JP ('🍣', '🍺') will be ignored.
        var eucjp = encoding.convert(utf8, {
          to: 'euc-jp',
          from: 'utf-8',
          fallback: 'ignore'
        });
        var decoded = encoding.convert(eucjp, {
          to: 'unicode',
          from: 'euc-jp'
        });
        assert.deepEqual(decoded, '寿司ビール');
      });

      it('JIS', function() {
        // Characters that cannot be converted to JIS ('🍣', '🍺') will be ignored.
        var jis = encoding.convert(utf8, {
          to: 'jis',
          from: 'utf-8',
          fallback: 'ignore'
        });
        var decoded = encoding.convert(jis, {
          to: 'unicode',
          from: 'jis'
        });
        assert.deepEqual(decoded, '寿司ビール');
      });
    });
  });
});
