Conversion library between string, u16string, u32string and u8string

mbits-libs/utfconv

UTF converter

An open-source library for converting between string, u16string, u32string and u8string. It is platform-independent and is based on the Unicode transformation formats (UTF-8, UTF-16 and UTF-32).

Under C++20 the library distinguishes between std::string and std::u8string, but it still assumes that std::string objects hold UTF-8 data.

Synopsis

#include <utf/utf.hpp>

Apart from as_u8(string_view) and as_str8(u8string_view), all the conversion functions decode each Unicode code point of the input (using uint32_t as the intermediate representation) and encode it into the output. If decoding fails, an empty string is returned.
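The decode-then-encode scheme described above can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: the `sketch` namespace and the names `decode_utf8`, `encode_utf16` and `utf8_to_utf16` are hypothetical, and the exact set of inputs rejected here (truncated, overlong, surrogate and out-of-range sequences) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

namespace sketch {  // hypothetical namespace, not part of utfconv

// Decodes one code point starting at src[pos] and advances pos past it.
// Returns false for truncated, overlong, surrogate or out-of-range input.
inline bool decode_utf8(std::string_view src, std::size_t& pos, std::uint32_t& cp) {
    const auto lead = static_cast<unsigned char>(src[pos++]);
    std::size_t extra = 0;     // number of continuation bytes expected
    std::uint32_t lowest = 0;  // smallest value allowed for this length
    if (lead < 0x80) { cp = lead; return true; }
    if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; lowest = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; lowest = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; lowest = 0x10000; }
    else return false;  // stray continuation byte or invalid lead byte
    if (src.size() - pos < extra) return false;  // truncated sequence
    for (; extra > 0; --extra) {
        const auto next = static_cast<unsigned char>(src[pos++]);
        if ((next & 0xC0) != 0x80) return false;  // not a continuation byte
        cp = (cp << 6) | (next & 0x3F);
    }
    if (cp < lowest) return false;                   // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate code point
    return cp <= 0x10FFFF;
}

// Appends one code point to out as one or two UTF-16 code units.
inline void encode_utf16(std::uint32_t cp, std::u16string& out) {
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
        return;
    }
    cp -= 0x10000;
    out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
    out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
}

// Decodes every code point into a uint32_t, then re-encodes it.
// Returns an empty string as soon as decoding fails.
inline std::u16string utf8_to_utf16(std::string_view src) {
    std::u16string out;
    std::size_t pos = 0;
    while (pos < src.size()) {
        std::uint32_t cp = 0;
        if (!decode_utf8(src, pos, cp)) return {};
        encode_utf16(cp, out);
    }
    return out;
}

}  // namespace sketch
```

Note that an empty result is ambiguous in this scheme: it is produced both by empty input and by invalid input, so callers that need to tell the two apart would check validity separately.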

Conversion between string[_view] and u8string[_view] is done by simple re-interpretation of the contents.

Overloads marked with a "C++20" comment are only available if the standard library defines __cpp_lib_char8_t.
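As a rough illustration of both points, the sketch below guards char8_t code behind __cpp_lib_char8_t and converts between the two 8-bit string types by copying the bytes unchanged, with no decoding. The names `to_u8`, `to_str8` and `round_trip` are hypothetical stand-ins, and copying element by element is only one way to express the re-interpretation; the library may do it differently.

```cpp
#include <string>       // also provides __cpp_lib_char8_t when available
#include <string_view>

namespace sketch {  // hypothetical namespace, not part of utfconv

#ifdef __cpp_lib_char8_t
// C++20 only: copy the bytes as they are, changing only the character type.
inline std::u8string to_u8(std::string_view src) {
    return std::u8string(src.begin(), src.end());
}

inline std::string to_str8(std::u8string_view src) {
    return std::string(src.begin(), src.end());
}
#endif

// Compiles with or without char8_t support: takes the char8_t route when
// the feature-test macro is defined and falls back to a plain copy otherwise.
inline std::string round_trip(std::string_view src) {
#ifdef __cpp_lib_char8_t
    return to_str8(to_u8(src));
#else
    return std::string(src);
#endif
}

}  // namespace sketch
```

Because no decoding takes place, the bytes survive the round trip unchanged even if they are not valid UTF-8.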

utf::is_valid

bool utf::is_valid(std::u8string_view src);         // C++20
bool utf::is_valid(std::string_view src);
bool utf::is_valid(std::u16string_view src);

Tries to decode the string one character at a time and returns false as soon as decoding fails; otherwise, returns true. If utf::is_valid returns false for an argument, then every as_xxx function returns an empty string for that same argument.
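For UTF-16 input, "decoding one character at a time" comes down to checking that surrogates appear only in well-formed pairs. The following is a minimal sketch of that idea, assuming unpaired surrogates are the only failure mode; `is_valid_utf16` is a hypothetical name and the library's actual checks may differ.

```cpp
#include <cstddef>
#include <string_view>

namespace sketch {  // hypothetical namespace, not part of utfconv

// Walks the input one character at a time and stops at the first error.
inline bool is_valid_utf16(std::u16string_view src) {
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char16_t unit = src[i];
        if (unit >= 0xDC00 && unit <= 0xDFFF) return false;  // lone low surrogate
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            if (i + 1 == src.size()) return false;  // high surrogate at the end
            const char16_t next = src[++i];         // consume the second unit
            if (next < 0xDC00 || next > 0xDFFF) return false;  // not a low surrogate
        }
    }
    return true;
}

}  // namespace sketch
```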

bool utf::is_valid(std::u32string_view src);

Always returns true.

utf::as_u8

std::u8string utf::as_u8(std::u16string_view src);  // C++20
std::u8string utf::as_u8(std::u32string_view src);  // C++20
std::u8string utf::as_u8(std::string_view src);     // C++20

(C++20) Converts other UTF strings to std::u8string. The behavior is that of utf::as_str8, except for the type of the character used.

utf::as_str8

std::string utf::as_str8(std::u8string_view src);   // C++20
std::string utf::as_str8(std::u16string_view src);
std::string utf::as_str8(std::u32string_view src);

Converts other UTF strings to std::string encoded as UTF-8. If compiled as C++20, the behavior is that of utf::as_u8, except for the type of the character used.
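The encoding half of such a conversion can be sketched as follows for UTF-32 input. This is an illustration only: `utf32_to_utf8` is a hypothetical name, and rejecting surrogates and values above U+10FFFF with an empty result is an assumption of this sketch, not documented behavior of utf::as_str8.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

namespace sketch {  // hypothetical namespace, not part of utfconv

// Encodes each code point as one to four UTF-8 bytes.
inline std::string utf32_to_utf8(std::u32string_view src) {
    std::string out;
    for (const char32_t ch : src) {
        const auto cp = static_cast<std::uint32_t>(ch);
        // Assumption: values that are not Unicode scalar values are errors.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return {};
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

}  // namespace sketch
```

With the library itself, the corresponding call for a std::u32string_view named src would be utf::as_str8(src), or utf::as_u8(src) when a std::u8string is wanted.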

utf::as_u16

std::u16string utf::as_u16(std::u8string_view src); // C++20
std::u16string utf::as_u16(std::string_view src);
std::u16string utf::as_u16(std::u32string_view src);

Converts other UTF strings to std::u16string.

utf::as_u32

std::u32string utf::as_u32(std::u8string_view src); // C++20
std::u32string utf::as_u32(std::string_view src);
std::u32string utf::as_u32(std::u16string_view src);

Converts other UTF strings to std::u32string.
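For UTF-16 input, the conversion mostly consists of combining surrogate pairs into single code points. Here is a small sketch of that arithmetic, with `utf16_to_utf32` as a hypothetical name; returning an empty string for an unpaired surrogate follows the failure convention described in the synopsis, but the precise checks are an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

namespace sketch {  // hypothetical namespace, not part of utfconv

inline std::u32string utf16_to_utf32(std::u16string_view src) {
    std::u32string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        std::uint32_t cp = src[i];
        if (cp >= 0xDC00 && cp <= 0xDFFF) return {};  // lone low surrogate
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            if (i + 1 == src.size()) return {};       // truncated pair
            const std::uint32_t low = src[++i];
            if (low < 0xDC00 || low > 0xDFFF) return {};
            // Each surrogate carries 10 bits; the pair is offset by 0x10000.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        }
        out.push_back(static_cast<char32_t>(cp));
    }
    return out;
}

}  // namespace sketch
```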

#include <utf/version.hpp>

utf::version

constexpr semver::project_version utf::version;

Current version of the library to link against.

utf::get_version

semver::project_version utf::get_version();

Version of the library actually loaded at run time (when linked dynamically), or the same value as utf::version (when linked statically).

UTFCONV_NAME macro

#define UTFCONV_NAME "utfconv"

Name of the library.

UTFCONV_VERSION macros

#define UTFCONV_VERSION_MAJOR
#define UTFCONV_VERSION_MINOR
#define UTFCONV_VERSION_PATCH
#define UTFCONV_VERSION_STABILITY

C macros carrying the same information as the utf::version variable: UTFCONV_VERSION_MAJOR / MINOR / PATCH have the same values as utf::version.get_major() / get_minor() / get_patch(). UTFCONV_VERSION_STABILITY contains the same string that utf::version.get_prerelease().to_string() would return, that is, either an empty string or a string starting with a hyphen, so that version strings are easy to concatenate.
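The concatenation mentioned above could look roughly like the sketch below. To keep it self-contained, it uses hypothetical SKETCH_VERSION_* stand-ins with made-up values instead of the real UTFCONV_VERSION_* macros, and it assumes the numeric macros expand to plain integer literals and the stability macro to a string literal.

```cpp
#include <string>

// Hypothetical stand-ins with made-up values; real code would use the
// UTFCONV_VERSION_* macros supplied by <utf/version.hpp>.
#define SKETCH_VERSION_MAJOR 1
#define SKETCH_VERSION_MINOR 2
#define SKETCH_VERSION_PATCH 3
#define SKETCH_VERSION_STABILITY "-beta"  // empty, or starts with a hyphen

// Two-level stringification, so the argument is expanded before '#' applies.
#define SKETCH_STR_IMPL(x) #x
#define SKETCH_STR(x) SKETCH_STR_IMPL(x)

// Adjacent string literals are merged by the compiler; thanks to the leading
// hyphen, the stability suffix needs no separator of its own.
#define SKETCH_VERSION_STRING            \
    SKETCH_STR(SKETCH_VERSION_MAJOR) "." \
    SKETCH_STR(SKETCH_VERSION_MINOR) "." \
    SKETCH_STR(SKETCH_VERSION_PATCH) SKETCH_VERSION_STABILITY

namespace sketch {  // hypothetical namespace, not part of utfconv

inline std::string version_string() { return SKETCH_VERSION_STRING; }

}  // namespace sketch
```

With an empty stability string, the same macro would yield a plain "major.minor.patch" string without any further changes.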