Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault with extremely large vectors #560

Closed
hummy123 opened this issue May 5, 2024 · 6 comments · Fixed by #565
Closed

Segmentation fault with extremely large vectors #560

hummy123 opened this issue May 5, 2024 · 6 comments · Fixed by #565

Comments

@hummy123
Copy link

hummy123 commented May 5, 2024

Hi there.

This isn't something that affects me (it seems to rely on various factors coalescing - a use case I don't have), but I accidentally managed to trigger a segmentation fault and thought it might be worth mentioning here (although I felt bad about possibly adding to the maintenance burden so a bit hesitant).

I basically have a simplified implementation of Clojure's Persistent Vector data structure here, except that this implementation has no indexing operation or metadata needed for it, since I don't need that for my use case.

There is a function toVector which is meant to cons all of the leaf/Lf nodes from right to left, and then a call to Vector.concat to return a flattened vector result. This function is semantically fine I'm pretty sure, but it can trigger a segmentation fault in some cases and it's not clear to me why.

There are comments at lines 7 to 61 in the persistent file snapshot I linked which explain four different independent methods I found for avoiding the segmentation fault.

If it's helpful information, I'm using a build on arm64-linux (a Raspberry Pi 5), with an MLton binary from mlton-builds which was then used to compile a recent native version of mlton from a recent Git commit.

This bug report is just here in case the maintainers want to act on it; I don't have any code planned that will trigger the segfault, and I assume no one else does since this bug hasn't (to my knowledge) been reported before.

@MatthewFluet
Copy link
Member

Could you give more details about how to build and run the program to produce the segfault? It looks like this is more of a data structure library rather than a standalone program, so I'm not sure how to trigger the bad behavior.

@hummy123
Copy link
Author

hummy123 commented May 6, 2024

The program can be built with the following steps:

  1. Saving the linked file (which is from a specific Git commit than the latest commit in that repository)
  2. Adding the following .mlb file in the same directory and running mlton file.mlb

ann
  "allowVectorExps true"
in
  mlvg_vector.sml
end```

@MatthewFluet
Copy link
Member

Thanks. With that, I was able to intermittently reproduce a segfault (on amd64-linux).

@MatthewFluet
Copy link
Member

And compiling with -debug true, I consistently get:

mlvg_vector: gc/invariant.c:13: assertIsObjptrInFromSpaceOrImmutableMutableOrRootStaticHeap: Assertion `isObjptrInFromSpace (s, *opp) || isObjptrInImmutableMutableOrRootStaticHeap (s, *opp)' failed.

MatthewFluet added a commit to MatthewFluet/mlton that referenced this issue May 15, 2024
It is valid for an object pointer to point to the "end" of the
heap (`s->heap.start + s->heap.oldGenSize`) if it is a pointer to an object of
zero size, such as a zero-length array/vector or a `unit array` or `unit ref`,
in which case the object proper does not extend beyond the "end" of heap.

Closes MLton#560
@MatthewFluet
Copy link
Member

@hummy123 #560 and #559 are related in the sense that they are both triggering bugs resulting from aggressive optimization of micro-benchmarks. In particular, both of these bugs stem from MLton observing that the data structures being constructed are never meaningfully destructed (and never doing anything with the elements inserted into the data structure), so while the source program appears to be manipulating int vectors or Real32.real vectors, MLton optimizes these down to unit vectors. (Actually, MLton is missing an optimization: it could replace a unit vector with a word64 (since the only meaningful information in a unit vector is its length.) Particularly in #560, these very many unit vectors being allocated is what triggers the heap translation bug (see #565), since it becomes more likely that a zero-sized object happens to be the last one allocated/copied at a garbage collection that triggers a heap resizing.

While I'm happy to have the bug reports, be aware that these micro-benchmarks may not be giving you meaningful information about how the data structures behave in more realistic scenarios (where, presumably, the data being inserted is ultimately consumed and used to influence the program behavior).

@hummy123
Copy link
Author

@MatthewFluet Thanks for the detailed explanation and teaching something new about how MLton works. I was aware of some optimisation passes used (like defunctionalisation turning closures into sum types) but not this one.

I'm impressed by the static analysis and that you managed to correct the bug this quickly. (I wouldn't know how to start if I had a bug like that on my hands but hoping to learn more about compilers and parsing soon hopefully.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants