Pure-Haskell proper Unicode strings
λ> graphemes "བོད་ཀྱི་སྐད་ཡིག།"
["བོ","ད","་","ཀྱི","་","སྐ","ད","་","ཡི","ག","།"]
See prose-lens for a lens interface.
segmentation:
✓⃞ grapheme
✓⃞ words ⃞ tailored
⃞ sentences ⃞ tailored
⃞ line-breaking ⃞ tailored
normalization:
✓⃞ NFD ✓⃞ NFKD ✓⃞ NFC ⃞ NFKC
collation: ⃞ …
transformation: ⃞ …
character properties: ⃞ …
other CLDR: ⃞ …
HCAR entry:
Many programming languages offer non-existent or very poor support for Unicode. While many think that Haskell is not one of them, this is not completely true. The go-to library for Haskell’s string type, Text, provides only codepoint-level operations. As a small and very elementary example, take two “Haskell café” strings: the first written with the ‘é’ character, the second with the ‘e’ character followed by a combining acute accent character. For many real-world purposes they obviously represent the same text, yet to Text and its operations they are entirely different and unconnected things.
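A minimal sketch of the problem using only Data.Text (not prose itself): the two spellings compare unequal and even report different lengths, because Text counts codepoints.

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- “Haskell café” with the precomposed character U+00E9
composed :: T.Text
composed = "Haskell caf\x00e9"

-- “Haskell café” with ‘e’ followed by U+0301 COMBINING ACUTE ACCENT
decomposed :: T.Text
decomposed = "Haskell cafe\x0301"

main :: IO ()
main = do
  print (composed == decomposed)                  -- False: different codepoint sequences
  print (T.length composed, T.length decomposed)  -- (12,13): codepoints, not user-perceived characters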
And even though the text-icu library offers proper Unicode functions, it comes as FFI bindings to a C library (which is painful, especially for Windows users). Moreover, its API is very low-level and incomplete.
Prose is a work-in-progress pure-Haskell implementation of Unicode strings. Right now it is completely unoptimized. The parts implemented so far are the normalization algorithms and segmentation by graphemes and words.
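For instance, the “café” example above is resolved at the grapheme level. The session below is illustrative, showing the behaviour the Unicode grapheme-cluster rules prescribe rather than a captured run:

λ> graphemes "caf\x00e9"   -- precomposed é
["c","a","f","é"]
λ> graphemes "cafe\x0301"  -- e + combining acute accent
["c","a","f","é"]

Both spellings segment into the same four user-perceived characters; in the second result the last grapheme is the two-codepoint sequence that renders as é.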
Numerals is a pure-Haskell implementation of numeral formatting based on CLDR (the Common Locale Data Repository, Unicode’s locale data).
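As a standalone illustration of the kind of per-locale data CLDR covers (this sketch does not use the numerals package’s API, which is not shown here): many locales write numbers with their own digit characters, e.g. the Tibetan digits ༠–༩ at U+0F20–U+0F29.

import Data.Char (digitToInt, isDigit)

-- Standalone sketch, not the numerals API: render an Int
-- with the Tibetan digit characters.
toTibetanDigits :: Int -> String
toTibetanDigits = map toTib . show
  where
    toTib c
      | isDigit c = toEnum (0x0F20 + digitToInt c)
      | otherwise = c

λ> putStrLn (toTibetanDigits 2024)
༢༠༢༤

Real CLDR numeral formatting also covers grouping separators, decimal symbols, and rule-based (spelled-out) numbers per locale.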
Further reading
http://lelf.lu/prose
https://github.com/llelf/prose
https://github.com/llelf/numerals
optimizations: none
| | Prose/𝘚 | ICU |
|---|---|---|
| segmentation/graphemes, one-lang text | 1.60 ms | 0.47 ms |
| segmentation/graphemes, chars sample | 15.84 ms | 16.30 ms |