Unicode has a few dozen characters that, on purpose, do not render anything.
This is cool for cultural idiosyncrasies in historical languages. More often, though, their use is unintentional (or nefarious!), and these characters end up causing problems when parsing text formats.
These are sometimes called 'zero-width', 'ignorable', or 'tag' characters.
This library helps spot and remove these funboys, before they cause some trouble.
Please remember that some text is meant to contain Khmer vowels or Kaithi-alphabet characters, so check what was detected before removing anything.
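To make the idea concrete, here is a minimal sketch of how such characters can be spotted by code point in plain JavaScript. It is not this library's implementation: the `INVISIBLE` table is a small hand-picked sample and `findInvisible` is a hypothetical helper, both assumed here only for illustration. The real list of invisible characters is considerably longer.

```javascript
// A small, hand-picked sample of invisible code points (illustration only;
// an assumption, not the list that out-of-character actually uses).
const INVISIBLE = new Map([
  [0x00ad, 'SOFT HYPHEN'],
  [0x034f, 'COMBINING GRAPHEME JOINER'],
  [0x17b5, 'KHMER VOWEL INHERENT AA'],
  [0x180e, 'MONGOLIAN VOWEL SEPARATOR'],
  [0x200b, 'ZERO WIDTH SPACE'],
  [0x200c, 'ZERO WIDTH NON-JOINER'],
  [0x200d, 'ZERO WIDTH JOINER'],
  [0x2060, 'WORD JOINER'],
  [0xfeff, 'ZERO WIDTH NO-BREAK SPACE'],
])

// Hypothetical helper: report each listed character with its UTF-16 offset.
function findInvisible(str) {
  const found = []
  let offset = 0
  // for...of walks the string by code point, not by UTF-16 unit
  for (const ch of str) {
    const cp = ch.codePointAt(0)
    if (INVISIBLE.has(cp)) {
      found.push({
        name: INVISIBLE.get(cp),
        code: 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'),
        offset,
      })
    }
    offset += ch.length
  }
  return found
}

// Escape sequences keep the hidden characters visible in the source.
const sample = 'sneak\u200By he\u17B5re'
const hits = findInvisible(sample)
console.log(hits)
```

Writing the sample with `\u` escapes, rather than pasting the raw characters, is a deliberate choice: it keeps the test string readable in an editor that would otherwise hide them.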
npm install -g out-of-character
detect invisible characters in all files in a directory
out-of-character ./path/to/dir
remove them from all files in a directory
out-of-character ./path/to/dir --replace
detect invisible characters in a file
out-of-character ./path/to/file.txt
remove invisible characters from a file
out-of-character ./path/to/file.txt --replace
import {detect, replace} from 'out-of-character'
let str='nothing s͏neak឵y here' //actually, there is.
console.log(detect(str))
/* 😮 😮 😮
[
{
name: 'KHMER VOWEL INHERENT AA',
code: 'U+17B5',
offset: 15,
replacement: ''
},
{
name: 'MONGOLIAN VOWEL SEPARATOR',
code: 'U+180E',
offset: 19,
replacement: ''
}
]*/
// get rid of them!
let after = replace(str)
console.log(str !== after)
// true
Detecting and fixing invisible characters in files can be done like this:
const fs = require('fs')
const {detect, replace} = require('out-of-character')
let text = fs.readFileSync('./some-file.txt').toString()
console.log(detect(text))
// yikes.
// ok, fix it
fs.writeFileSync('./some-file.txt', replace(text))
// ok, double-check it.
let goodNow = fs.readFileSync('./some-file.txt').toString()
console.log(detect(goodNow))
// phew.
Thank you to character.construction/blanks by Jan Lelis
and a tale of characters in Unicode by Stefan Judis
- printable-characters - by Vit Gordon
- unzalgo - by kdex
MIT