Segmentation Fault when calling libsais_int #14

Open
julianmukaj opened this issue May 16, 2024 · 0 comments


julianmukaj commented May 16, 2024

```
Suffix array initialized with length: 3055148
Calling libsais_int with parameters:
  buffer.as_ptr(): 0x75fee1dff010
  suffix_array.as_mut_ptr(): 0x6967ee0
  buffer.len() as i32: 3055148
  vocab_size: 100000
  symbol_frequency_table: 0
Segmentation fault (core dumped)
```

Running the datastore creation scripts, they crash on the finalize step in lib.rs. This is on Ubuntu 22 with Python 3.9, and the same thing happens on Windows.

I opened the issue prematurely: I fixed this by adding 1 to the vocabulary size in lib.rs (https://discourse.julialang.org/t/segfault-calling-c-function-any-advice/94730/8) and rebuilding the wheel. Maybe it is a model-dependent issue, I'm not sure. It might be something to track down and handle in future releases?
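The +1 fix is consistent with how libsais documents `libsais_int`: the `k` argument is the alphabet size, so every symbol in the input is expected to lie in `[0, k)`. If the buffer contains a token ID equal to `vocab_size`, the sort most likely indexes past its internal bucket arrays. A minimal sketch, assuming the token buffer can be viewed as `&[i32]` (the actual element type in lib.rs is an assumption), of deriving `k` from the data instead of trusting the tokenizer's reported size; `required_alphabet_size` and `effective_alphabet_size` are hypothetical helpers, not part of libsais or this project:

```rust
/// Smallest alphabet size `k` covering every symbol in `tokens`, i.e.
/// `max(tokens) + 1`. Returns None for an empty slice or if `max + 1`
/// would overflow i32.
fn required_alphabet_size(tokens: &[i32]) -> Option<i32> {
    let max_id = *tokens.iter().max()?;
    max_id.checked_add(1)
}

/// Alphabet size that is at least the tokenizer's reported vocabulary size
/// and also large enough for the IDs actually present in the buffer.
fn effective_alphabet_size(tokens: &[i32], tokenizer_vocab_size: i32) -> i32 {
    match required_alphabet_size(tokens) {
        Some(k) => k.max(tokenizer_vocab_size),
        None => tokenizer_vocab_size,
    }
}

fn main() {
    // Hypothetical buffer: ID 100000 is out of range when k = 100000.
    let tokens = vec![5, 99_999, 100_000, 42];
    let k = effective_alphabet_size(&tokens, 100_000);
    println!("k = {k}");
}
```

Taking the maximum of both values keeps `k` stable across datastores while still covering any IDs (for example added special tokens) that exceed the configured vocabulary size.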

I am also having trouble with the data reader/search part:

```rust
let end_of_indices = end_of_indices.unwrap();
```

panics when `end_of_indices` is `None`; that case is not handled.

Edit again: increasing the vocabulary size above the tokenizer's vocabulary size seems to fix the segmentation fault. Whether it crashes seems to depend on the datastore data.
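Since whether the crash happens depends on which token IDs appear in the datastore, a check before the unsafe call could turn the segfault into a readable error. A minimal sketch, assuming `i32` tokens and the `libsais_int` declaration quoted in the comment (taken from libsais.h as I understand it; verify against the header the project vendors). `validate_sais_input` and `SaisInputError` are hypothetical names:

```rust
use std::fmt;

// libsais.h declares (check against the vendored version):
//   int32_t libsais_int(int32_t *T, int32_t *SA, int32_t n, int32_t k, int32_t fs);
// The checks below cover the `n` and `k` preconditions only.

/// Reasons the input is not safe to pass to libsais_int.
#[derive(Debug, PartialEq)]
enum SaisInputError {
    TooLong(usize),
    SymbolOutOfRange { index: usize, symbol: i32, k: i32 },
}

impl fmt::Display for SaisInputError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SaisInputError::TooLong(len) => write!(f, "input length {len} does not fit in i32"),
            SaisInputError::SymbolOutOfRange { index, symbol, k } => {
                write!(f, "symbol {symbol} at index {index} is outside [0, {k})")
            }
        }
    }
}

/// Returns `n` as i32 if the length fits and every symbol is in [0, k);
/// otherwise reports the first problem found.
fn validate_sais_input(tokens: &[i32], k: i32) -> Result<i32, SaisInputError> {
    if tokens.len() > i32::MAX as usize {
        return Err(SaisInputError::TooLong(tokens.len()));
    }
    if let Some(index) = tokens.iter().position(|&s| s < 0 || s >= k) {
        return Err(SaisInputError::SymbolOutOfRange { index, symbol: tokens[index], k });
    }
    Ok(tokens.len() as i32)
}

fn main() {
    // Hypothetical buffer containing an ID equal to vocab_size.
    let tokens = vec![3, 7, 100_000, 1];
    match validate_sais_input(&tokens, 100_000) {
        Ok(n) => println!("ok, n = {n}"),
        Err(e) => println!("refusing to call libsais_int: {e}"),
    }
}
```

Scanning the buffer is a single linear pass, which is cheap next to building the suffix array itself.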

```rust
if end_of_indices.is_none() {
    return;
}
```

I couldn't figure out why `end_of_indices` is sometimes `None`, so I just return early in that case.
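An early return avoids the panic, but in a function without a return value it also hides the "no match" case from the caller. A sketch of the same idea using `let ... else` (Rust 1.65+), assuming the `None` comes from a lookup that found no matching range. `find_range` and `matches` are hypothetical stand-ins, not the project's reader/search code:

```rust
/// Hypothetical lookup: half-open range [start, end) of entries equal to
/// `query` in a sorted slice, or None when there is no match.
fn find_range(sorted: &[u32], query: u32) -> Option<(usize, usize)> {
    let start = sorted.partition_point(|&x| x < query);
    let end = sorted.partition_point(|&x| x <= query);
    if start == end { None } else { Some((start, end)) }
}

/// Positions of `query`, or an empty Vec when the lookup yields None,
/// instead of panicking in `unwrap()`.
fn matches(sorted: &[u32], query: u32) -> Vec<usize> {
    let Some((start, end_of_indices)) = find_range(sorted, query) else {
        // No match: the caller gets an empty result it can inspect.
        return Vec::new();
    };
    (start..end_of_indices).collect()
}

fn main() {
    let sorted = [1, 3, 3, 3, 7, 9];
    println!("{:?}", matches(&sorted, 3)); // [1, 2, 3]
    println!("{:?}", matches(&sorted, 4)); // []
}
```

Returning an empty collection (or an `Option`/`Result` propagated with `?`) keeps the caller aware that nothing matched, which also makes it easier to log how often the `None` case occurs while tracking down its cause.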
