How to load data blob in C++ and then use segmenter? #3298
Replies: 1 comment 8 replies
-
Hi @sffc, I've written the code below to load the data file, but it fails at runtime:
UnicodeString str("ພາສາລາວ");
printUnicodeString(str);
cout<<endl;
std::ifstream file("checking.txt");
if (!file.is_open()) {
// handle error
cout<<"here1"<<endl;
}
if (!file.good()) {
cout<<"here2"<<endl;
// handle error
}
file.seekg(0, std::ios::end);
std::streamsize blob_size = file.tellg();
file.seekg(0, std::ios::beg);
cout<<"size buffer "<<blob_size<<endl;
std::vector<uint8_t> buffer(blob_size);
const diplomat::span<const uint8_t> blob(buffer.data(), buffer.size());
const uint16_t* u16Ptr = reinterpret_cast<const uint16_t*>(str.getBuffer());
diplomat::result<ICU4XDataProvider, ICU4XError> provider_result = ICU4XDataProvider::create_from_byte_slice(blob);
const auto provider_result = ICU4XDataProvider::create_from_byte_slice(blob);
const auto segmenter_auto = ICU4XWordSegmenter::create_auto(provider_result.).ok().value();
const ICU4XWordSegmenter* segmenters[] = {&segmenter_auto};
size_t sizeInBytes = str.length() * sizeof(UChar);
const size_t size = sizeInBytes / sizeof(uint16_t); // Compute the number of elements in the array
diplomat::span<const uint16_t> span(u16Ptr,size);
auto iterator = segmenters[0]->segment_utf16(span);
int32_t breakpoint = iterator.next();
int32_t breakpoint2 = iterator.next();
cout<<breakpoint2;
blob_size comes out as -1, which is why the following error occurs:
libc++abi: terminating with uncaught exception of type std::length_error: vector
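A note on the -1: `tellg()` returns -1 when the stream is in a failed state, which typically means the file could not be opened (for example, because the path is resolved relative to a different working directory). The posted code logs "here1" in that case but keeps going, so the vector is constructed with a size of -1. It also never reads the file contents into `buffer`. Below is a minimal sketch of a more defensive loader that uses only the standard library. `read_blob` is a hypothetical helper name, not part of ICU4X.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not an ICU4X API): reads a whole file into `out`.
// Returns false if the file cannot be opened or fully read.
bool read_blob(const std::string& path, std::vector<uint8_t>& out) {
    // Open in binary mode and start at the end so tellg() reports the size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file.is_open()) {
        return false;  // Stop here: tellg() on a failed stream returns -1.
    }

    const std::streamoff size = file.tellg();
    if (size < 0) {
        return false;
    }

    // Size the buffer, rewind, and actually copy the bytes into it.
    out.assign(static_cast<std::size_t>(size), 0);
    file.seekg(0, std::ios::beg);
    if (size > 0 &&
        !file.read(reinterpret_cast<char*>(out.data()),
                   static_cast<std::streamsize>(size))) {
        return false;
    }
    return true;
}
```

If `read_blob` succeeds, the filled vector can be wrapped in a `diplomat::span<const uint8_t>` and passed to `ICU4XDataProvider::create_from_byte_slice` as in the code above. The buffer probably needs to stay alive for as long as the provider uses it; that is an assumption worth checking against the ICU4X documentation for the version in use.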
-
I am using the ICU4X library in C++. I initially tried the segmenter code with the test data provider, but now I want to use a data blob containing all locales and the segmenter data. How do I load this data blob in C++?