Skip to content

Conversion between different codepages and/or utf8; source encoding can be guessed

License

Notifications You must be signed in to change notification settings

michael105/codepage_converter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cpconv

Codepage conversion tool.

Filter stdin to stdout, without any options the charset will be guessed and translated to cp1252.

Currently, the guessing of the charmap might work only with German, the "algorithm" looks for German Umlauts.( äö .. ) It's explained in the source, howto add other languages.

Only extended ascii is implemented (ASCII 0 - 255 ), and UTF8.

Conversion from and to the following charmaps is possible:

cpe4002a
cp850
cp437
cp1252
cp1250
cp1251
cp1253
cp1255
cp1256
cp1257
cp1258
atarist
macintosh
mac_centraleurope
iso8859_15
utf8

cpe4002a is a special codepage, I'm using with slterm. (https://github.com/michael105/slterm)

The option -c converts the input to the notation, used in c strings. ( "\x84\xef .. " ) No previous conversion of the input, extended ascii and chars < ascii 32 are converted to \xnn notation, linebreaks ("\n","\x0a") aren't modified.

It's a small hack, but useful, if you do work at the terminal. Most programs will work with e.g. cp1252, or cp437 (the set with border symbols), but I did always have a hard time with German Umlauts and utf-8.

BOM characters aren't translated, but I also didn't come along them yet. (Might be also obvious, they aren't contained in the extended ascii sets).

I wouldn't recommend using this in a security relevant context, it's not written for security, BOM characters could get a problem. Filtering texts should be save, just don't put it in back of a server.

# cpconv -h

convert stdin to stdout
Usage: cpconv [-hslxud] [tocp [fromcp]]

Examples: cat text.txt | cpconv cp437 cp850
         (convert from cp850 to cp437)

          cat text.txt | cpconv
         (convert to cp1252, guessing source charset)

Without any options, try to guess the charset and convert to cp1252
(change the default in the source, if needed)

options: -s : silence, no messages to stderr
         -v : verbose
         -l : list codepages
         -U : dump umlaute, converted
         -c : convert input to cstring notation
         -C : convert input to cstring notation, including linebreaks
         -x : display non convertible chars in hexadecimal
         -u : display non convertible chars as utf8
         -d : print debug information
         -h : show this help

(I'm using this as input filter for vi, and to copy text between terminal and x clipboard
 the conversion is done automatically, if needed)

miSc, Michael Myer, 2023, GPL

github.com/michael105/codepage_converter

About

Conversion between different codepages and/or utf8; source encoding can be guessed

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published