
Problem resolving some symbols when using the library in an Android C++ project (compiling with the NDK) #31

Open
cs-jlopezr opened this issue May 2, 2024 · 3 comments


@cs-jlopezr

I was able to compile the library successfully, but when I use it as shown in the example folder, linking fails with the following errors:

ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::SentencePieceProcessor()

referenced by sentencepiece_tokenizer.cc:18 (src/sentencepiece_tokenizer.cc:18)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::SentencePieceTokenizer(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a

ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::LoadFromSerializedProto(std::__ndk1::basic_string_view<char, std::__ndk1::char_traits<char> >)

referenced by sentencepiece_tokenizer.cc:19 (src/sentencepiece_tokenizer.cc:19)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::SentencePieceTokenizer(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a

ld: error: undefined symbol: sentencepiece::util::Status::~Status()

referenced by sentencepiece_tokenizer.cc:19 (src/sentencepiece_tokenizer.cc:19)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::SentencePieceTokenizer(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a
referenced by sentencepiece_tokenizer.cc:24 (src/sentencepiece_tokenizer.cc:24)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::Encode(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a
referenced by sentencepiece_tokenizer.cc:24 (src/sentencepiece_tokenizer.cc:24)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::Encode(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a
referenced 2 more times

ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::~SentencePieceProcessor()

referenced by sentencepiece_tokenizer.cc:20 (src/sentencepiece_tokenizer.cc:20)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::SentencePieceTokenizer(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a
referenced by sentencepiece_tokenizer.cc:16 (src/sentencepiece_tokenizer.cc:16)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::~SentencePieceTokenizer()) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a

ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::Encode(std::__ndk1::basic_string_view<char, std::__ndk1::char_traits<char> >, std::__ndk1::vector<int, std::__ndk1::allocator<int> >*) const

referenced by sentencepiece_tokenizer.cc:24 (src/sentencepiece_tokenizer.cc:24)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::Encode(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a

ld: error: undefined symbol: sentencepiece::util::Status::IgnoreError()

referenced by sentencepiece_tokenizer.cc:24 (src/sentencepiece_tokenizer.cc:24)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::Encode(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a
referenced by sentencepiece_tokenizer.cc:30 (src/sentencepiece_tokenizer.cc:30)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::Decode(std::__ndk1::vector<int, std::__ndk1::allocator<int> > const&)) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a

ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::Decode(std::__ndk1::vector<int, std::__ndk1::allocator<int> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >*) const

referenced by sentencepiece_tokenizer.cc:30 (src/sentencepiece_tokenizer.cc:30)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::Decode(std::__ndk1::vector<int, std::__ndk1::allocator<int> > const&)) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a

ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::GetPieceSize() const

referenced by sentencepiece_tokenizer.cc:35 (src/sentencepiece_tokenizer.cc:35)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::GetVocabSize()) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a

ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::IdToPiece(int) const

referenced by sentencepiece_tokenizer.cc:40 (src/sentencepiece_tokenizer.cc:40)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::IdToToken(int)) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a

ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::PieceToId(std::__ndk1::basic_string_view<char, std::__ndk1::char_traits<char> >) const

referenced by sentencepiece_tokenizer.cc:42 (src/sentencepiece_tokenizer.cc:42)
sentencepiece_tokenizer.cc.o:(tokenizers::SentencePieceTokenizer::TokenToId(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)) in archive ./src/tokenizers-cpp/libtokenizers_cpp.a
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

When I inspect the library, the symbols appear to be properly defined.

My code does the same as the example folder, so I am not calling the unresolved symbols directly. The ones I do use (FromBlobSentencePiece, for example) are resolved correctly. What could be causing this?

One thing I find curious: why does the linker for my program complain about src/sentencepiece_tokenizer.cc when I am only using the static library (the .a file) through the tokenizers_cpp.h header provided by the library?

@cs-jlopezr
Author

I solved the issue by compiling the sentencepiece tokenizer library separately and adding it as an explicit link dependency. This is not clear from the usage instructions.

@cs-jlopezr
Author

Now I am getting a segmentation fault when using the library, and I am not sure why. I am doing the same as in the example. The tokenizer appears to initialize correctly, but as soon as I try to encode text, it crashes with a segmentation fault.
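The thread does not identify the cause of the crash. One thing worth ruling out before sharing a reproduction is the model blob itself: as its name suggests, FromBlobSentencePiece takes the model as an in-memory blob, and on Android a path that is not a real filesystem path (for instance, a file that only exists inside the APK's assets) cannot be opened with std::ifstream. The sketch below assumes that scenario; `ReadModelBlob` is a hypothetical helper, not part of tokenizers-cpp, that reports an unreadable or empty file instead of silently producing an empty blob.

```cpp
#include <fstream>
#include <iostream>
#include <iterator>
#include <optional>
#include <string>

// Hypothetical helper (not part of tokenizers-cpp): read a whole model file as
// raw bytes. Returns std::nullopt when the file cannot be opened or is empty,
// so an empty blob is never handed to the tokenizer factory.
std::optional<std::string> ReadModelBlob(const std::string& path) {
  std::ifstream in(path, std::ios::in | std::ios::binary);
  if (!in) {
    // On Android this also fails for names that only exist inside the APK assets.
    std::cerr << "cannot open model file: " << path << '\n';
    return std::nullopt;
  }
  // Copy every byte; binary mode keeps the serialized model intact.
  std::string blob((std::istreambuf_iterator<char>(in)),
                   std::istreambuf_iterator<char>());
  if (blob.empty()) {
    std::cerr << "model file is empty: " << path << '\n';
    return std::nullopt;
  }
  return blob;
}
```

Usage, assuming the `tokenizers::Tokenizer::FromBlobSentencePiece` factory from the example: call it only after `ReadModelBlob(path)` has returned a value, passing `*blob`. If the helper succeeds with the expected file size and encoding still crashes, the blob can be ruled out, which narrows down what to include in a reproduction.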

@aidevmin

aidevmin commented Oct 8, 2024

@cs-jlopezr
Could you share the source code and configuration needed to reproduce it?
