-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: properly track json memory usage #3641
Conversation
src/core/compact_object.cc
Outdated
@@ -1037,7 +1040,8 @@ size_t CompactObj::MallocUsed() const { | |||
|
|||
if (taglen_ == JSON_TAG) { | |||
DCHECK(u_.json_obj.json_ptr != nullptr); | |||
return zmalloc_size(u_.json_obj.json_ptr); | |||
auto* ptr = u_.json_obj.json_ptr; | |||
return zmalloc_usable_size(ptr) + ptr->capacity(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also wrong. ptr->capacity
does not return the actual memory consumed in bytes. So clearly we have a problem. The main issues are:
- There doesn't seem to exist a function in
Json
that clearly returns the size in bytes used by the object. - We can't approximate this easily since
json elements are polymoprhic
and it does not use flat storage internally such that we can usezmalloc_usable_size()
on itsinternal pointer
.
We could in theory iterate over all of its elements and write a visitor that aggregates the size of all of its elements but that would be extremely slow especially when info memory
command is used because now we need to iterate over all the json keys in the data store -- obviously a no no
And we also don't really control json
since it's an external library.
So my idea is to wrap class MiMemoryResource
and introduce a MiJsonMemoryResource
. This allocator will use the same heap
as the CompactObject::memory_resource
with one difference. It will have a member which will keep track how much memory it allocated/deallocated. Once the json operation finishes, we will reset this member value (such that subsequent operations on different json keys get accurate memory usage) and add/subtract it from a size
object which will introduce in JsonWrapper
. That way, when we ask for memory we will return this field without calculating anything and the memory consumption will properly be tracked by the allocator eliminating the need for manual tracking on an object that belongs to a library we don't own.
WDYT? @chakaz @adiholden @romange
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, using a custom memory resource for JSON sounds like a good solution. Indeed it would just call the data heap so all data will be in the same place as it was before.
This however won't play very nicely with the way we compute object memory today, and in many places we want to know the size of a specific object. So I wonder, perhaps we want to have a memory_resource per JSON object? It has the cost of 2 pointers per object, so not sure if it's worth it..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
know the size of a specific object.
Yep and that why we will introduce a single size_t per json object in CompactObj. So the overhead will only be for one element which will keep the size (which is fine because we need a way to track memory for json objects)
src/core/compact_object.cc
Outdated
@@ -696,6 +697,8 @@ void CompactObj::SetJson(JsonType&& j) { | |||
SetMeta(JSON_TAG); | |||
u_.json_obj.json_len = j.size(); | |||
u_.json_obj.json_ptr = AllocateMR<JsonType>(std::move(j)); | |||
DCHECK(std::pmr::polymorphic_allocator<char>{memory_resource()} == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bingo.
2024-09-03T16:48:59.0738486Z 60: F20240903 16:48:59.073283 22070 compact_object.cc:700] Check failed: std::pmr::polymorphic_allocator<char>{memory_resource()} == u_.json_obj.json_ptr->get_allocator()
2024-09-03T16:48:59.0740855Z 60: *** Check failure stack trace: ***
2024-09-03T16:48:59.0761492Z 60: @ 0x564941768dbb google::LogMessage::Fail()
2024-09-03T16:48:59.0781948Z 60: @ 0x564941768d02 google::LogMessage::SendToLog()
2024-09-03T16:48:59.0802555Z 60: @ 0x5649417684f7 google::LogMessage::Flush()
2024-09-03T16:48:59.0824658Z 60: @ 0x56494176c20e google::LogMessageFatal::~LogMessageFatal()
2024-09-03T16:48:59.0869710Z 60: @ 0x5649412c6855 dfly::CompactObj::SetJson()
2024-09-03T16:48:59.0895859Z 60: @ 0x564940be862b dfly::RdbLoaderBase::OpaqueObjLoader::HandleBlob()
2024-09-03T16:48:59.0930073Z 60: @ 0x564940be4b97 dfly::RdbLoaderBase::OpaqueObjLoader::operator()()
2024-09-03T16:48:59.0972643Z 60: @ 0x564940c0b783 std::__invoke_impl<>()
2024-09-03T16:48:59.1012552Z 60: @ 0x564940c05aaa std::__invoke<>()
2024-09-03T16:48:59.1059136Z 60: @ 0x564940bffb53 std::__detail::__variant::__gen_vtable_impl<>::__visit_invoke_impl()
2024-09-03T16:48:59.1091480Z 60: @ 0x564940bffb96 std::__detail::__variant::__gen_vtable_impl<>::__do_visit_invoke()
memory_resource was not propagated
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding the allocate used should have been CompactObject::memory_resource()
. I haven't checked the specific case but based on the semantics on the json family this should be the case. Let me know if I am missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think this is a problem with jsoncons, I'd try to create a simple standalone example of this happening and report it in their repo.
We've already reported similar issues before, and Daniel (the maintainer) always helped us
In the worst case we can handle json differently from the other objects by using a separate MiMemoryResource with the same data heap. We won't know "memory usage " but this is much smaller problem. |
that's the plan and it's almost ready. But I found bugs in their json impl not propagating the allocators properly. I am patching this and I will raise the issue on jsoncons lib |
Signed-off-by: kostas <kostas@dragonflydb.io>
src/core/compact_object.h
Outdated
struct JsonConsT { | ||
JsonType* json_ptr; | ||
// Owning and must be explicitly deallocated | ||
MiMemoryResource* mr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really need this but I want to keep the DCHECKS. If in the future we decide we need an extra 8
bytes for JsonConsT
, we can remove this 🕺
src/core/json/json_object.cc
Outdated
return decoder.get_result(); | ||
auto res = decoder.get_result(); | ||
// This should not trigger once pmr is fixed in jsoncons | ||
DCHECK(res.get_allocator().resource() == mr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are blocked on this until it's addressed by jsoncons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so add an issue in the comment and comment out the DCHECK I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will cause a memory leak. We will loose this mr
. Yes maybe we can release it here but then the tracking will be wrong. :/
src/core/compact_object.h
Outdated
struct JsonConsT { | ||
JsonType* json_ptr; | ||
// Owning and must be explicitly deallocated | ||
MiMemoryResource* mr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why you keep MiMemoryResource* mr;
inside each CompactObject.
- JsonType already has allocator in it
- All json objects will use the same MiMemoryResource instance unless I am missing something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know about this. I keep it because I want to DCHECK
the hell out of it. It's just breaks with pmr
in so many levels that even if we rolled in production I would add LOG(warning)
to warn us that the pmr
went brrr
src/server/json_family.cc
Outdated
@@ -14,12 +14,17 @@ | |||
#include <jsoncons_ext/jsonpath/jsonpath.hpp> | |||
#include <jsoncons_ext/jsonpointer/jsonpointer.hpp> | |||
#include <jsoncons_ext/mergepatch/mergepatch.hpp> | |||
#include <memory> | |||
#include <memory_resource> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please do not include <memory_resource>
directly and in general, can you please disable include what your use rule in your environment? it's too agressive
} | ||
pv.SetJsonSize(net_positive, zero_diff, diff); | ||
// Under any flow we must not end up with this special value. | ||
DCHECK(pv.MallocUsed() != 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth adding LOG(ERROR)
for that case so if it ever happens we get an alert.
src/core/compact_object.cc
Outdated
@@ -50,6 +52,11 @@ size_t QlMAllocSize(quicklist* ql) { | |||
return res + ql->count * 16; // we account for each member 16 bytes. | |||
} | |||
|
|||
uint32_t JsonEnconding() { | |||
static bool is_experimental = absl::GetFlag(FLAGS_experimental_flat_json); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe worth making it thread local since the compiler injects a mutex on initialization to make the static thread safe 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd do that, why not?
src/core/compact_object.cc
Outdated
@@ -50,6 +52,11 @@ size_t QlMAllocSize(quicklist* ql) { | |||
return res + ql->count * 16; // we account for each member 16 bytes. | |||
} | |||
|
|||
uint32_t JsonEnconding() { | |||
static bool is_experimental = absl::GetFlag(FLAGS_experimental_flat_json); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static bool is_experimental = absl::GetFlag(FLAGS_experimental_flat_json); | |
static bool is_flat = absl::GetFlag(FLAGS_experimental_flat_json); |
src/core/compact_object.cc
Outdated
@@ -50,6 +52,11 @@ size_t QlMAllocSize(quicklist* ql) { | |||
return res + ql->count * 16; // we account for each member 16 bytes. | |||
} | |||
|
|||
uint32_t JsonEnconding() { | |||
static bool is_experimental = absl::GetFlag(FLAGS_experimental_flat_json); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd do that, why not?
src/core/compact_object.cc
Outdated
if (u_.json_obj.cons.bytes_used == 0) { | ||
u_.json_obj.cons.bytes_used = size; | ||
} else if (net_positive) { | ||
u_.json_obj.cons.bytes_used += size; | ||
} else { | ||
DCHECK(u_.json_obj.cons.bytes_used >= size); | ||
u_.json_obj.cons.bytes_used -= size; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not accept size
(call it delta
) as some signed type (int64_t
or whatever), and then just do u_.json_obj.cons.bytes_used += delta;
?
Why all the if
s and extra parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also specifically this first if
is unneeded right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I agree we don't need the first case. Furthermore, I was afraid of overflowing/underflowing but we do the exact same thing as you suggest in AccountObjectMemory
so I guess it should not be an issue. Delta sounds simpler also. I will apply this.
@@ -445,13 +448,21 @@ class CompactObj { | |||
}; | |||
} __attribute__((packed)); | |||
|
|||
struct JsonConsT { | |||
JsonType* json_ptr; | |||
size_t bytes_used; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not reduce it to uint32_t and keep the encoding inside the object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint_32 limits to 4gb. It's extreme to expect any object would take more than that but you never know. As for enconding, I don't think it makes much sense to store this. It's fixed once it's set and that's also one of the reasons I refactored it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can also use uint48:=uint8_t[6]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for storing it per object is that I expect we would want a heuristic at some point for encoding, managed per object basis. like we do with hash maps, sets etc. flat is awful for mutable operations, for example. we do not have to do it now though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to pack this in the beginning and in fact I already mentioned to @adiholden that if in the future we wanted more space on the json we store within a compcat object we could get some. However I thought that the size_t suffices for now and until we actually need this there is no reason to take the bitfield route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also if we ever want a heuristic, how many bits would that require? 4-8 max ? For that use case we will just make the bytes_used (and the length for flat json) a bitfield and mask the last n
bits we want to use for enconding. It would be a simple change that would serve our need.
What I am trying to say here is I already thought about us needing a few more bits in the future and it's more than doable :)
}; | ||
uint32_t json_len = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not keep both fields common for both encodings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO len is how many characters. The other one is how many bytes. I find it a little clearer 🤷
src/server/json_family.cc
Outdated
// It uses RAII to set the memory usage of json objects. | ||
class JsonMemTracker { | ||
public: | ||
explicit JsonMemTracker(const OpArgs& args, std::string_view key, bool op_set = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please refrain from using default arguments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reminder :)
src/server/json_family.cc
Outdated
explicit JsonMemTracker(const OpArgs& args, std::string_view key, bool op_set = false) | ||
: op_args_(args), key_(key), op_set_{op_set} { | ||
start_size_ = static_cast<MiMemoryResource*>(CompactObj::memory_resource())->used(); | ||
auto it_res = op_args_.GetDbSlice().FindMutable(op_args_.db_cntx, key_, OBJ_JSON); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we now call Find 3 times more times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 more times in the worst case (when the json does not exist). Otherwise it's one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to use a similar pattern with already existing iterator? i.e. avoid additional finds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. The short answer is yes we can but it would clutter the code and I wanted to avoid that (unless what I do here is a huge regression on the flows which IMO I don't think it's the case as we already do this in a few places already -- OpMerge). Take for example JSON.SET
and it's implementation. It has two main flows.
- If the path is the root element, then call
FindReadOnly
and thenSetJson
which callsAddOrFind()
- If the path is not root, call
UpdateEntry()
The thing is that we need to track the memory from the start, at the moment we call a Find
variant because it could be that we replace the old json with a new one (case 1 above) and we would need to track the memory as follows (mem_of_prev_json + new_mem_delta) where new_mem_delta
can be negative.
So if I replaced the RAII
class I introduced to avoid the Find
at the start of a json operation, we would need to carefully go around the json_family
functions and track explicitly the starting memory after we call Find
on each flow which would be really error prone. I understand that we don't have many paths with this, since most operations are built around UpdateEntry
but there is also another issue.
To further complicate things, we usually call ShardJsonFromString
which we call std::move()
on its result to move it around in the interesting flows until we either store it as is, or use it to update existing jsons. This is the json item that we need to track its memory
. The problem is that we construct this item, before we call Find
on the flows I describe above but we need to know the memory before we construct this item.
So to fully answer, maybe we could avoid those finds but that would require a complicated refactor that seems to be cumbersome as it requires explicit control of memory tracking around each json flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we modify JsonMemTracker
to use the compact object directly (i.e. not to call Find()
) and modify UpdateEntry()
to use it?
And even if I missed some call that it won't work for, we can code around that instead, because it does seem like all (the majority?) of calls use it
src/core/compact_object.cc
Outdated
@@ -50,6 +52,8 @@ size_t QlMAllocSize(quicklist* ql) { | |||
return res + ql->count * 16; // we account for each member 16 bytes. | |||
} | |||
|
|||
const thread_local uint32_t kJsonEnconding = absl::GetFlag(FLAGS_experimental_flat_json); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work I'm afraid.
Globals are initialized before main()
runs, not lazily. This means that it would run after flags parsing.
Easiest workaround imho would be to wrap this in a function, and declare the thread_local
variable inside it. This would evaluate this lazily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you are right. For some reason I always forget about the order of initialization for globals. Same function as a I had before just with a thread_local within it :)
src/server/json_family.cc
Outdated
// It uses RAII to set the memory usage of json objects. | ||
class JsonMemTracker { | ||
public: | ||
explicit JsonMemTracker(const OpArgs& args, std::string_view key, bool op_set = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reminder :)
src/server/json_family.cc
Outdated
explicit JsonMemTracker(const OpArgs& args, std::string_view key, bool op_set = false) | ||
: op_args_(args), key_(key), op_set_{op_set} { | ||
start_size_ = static_cast<MiMemoryResource*>(CompactObj::memory_resource())->used(); | ||
auto it_res = op_args_.GetDbSlice().FindMutable(op_args_.db_cntx, key_, OBJ_JSON); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we modify JsonMemTracker
to use the compact object directly (i.e. not to call Find()
) and modify UpdateEntry()
to use it?
And even if I missed some call that it won't work for, we can code around that instead, because it does seem like all (the majority?) of calls use it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LG! Please see some last nits
src/core/compact_object.cc
Outdated
@@ -50,6 +52,11 @@ size_t QlMAllocSize(quicklist* ql) { | |||
return res + ql->count * 16; // we account for each member 16 bytes. | |||
} | |||
|
|||
uint32_t JsonEnconding() { | |||
static thread_local uint32_t json_enc = absl::GetFlag(FLAGS_experimental_flat_json); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static thread_local uint32_t json_enc = absl::GetFlag(FLAGS_experimental_flat_json); | |
static thread_local uint32_t json_enc = absl::GetFlag(FLAGS_experimental_flat_json) ? kEncodingJsonFlat : kEncodingJsonCons; |
src/server/json_family.cc
Outdated
@@ -36,7 +38,7 @@ | |||
ABSL_FLAG(bool, jsonpathv2, true, | |||
"If true uses Dragonfly jsonpath implementation, " | |||
"otherwise uses legacy jsoncons implementation."); | |||
ABSL_FLAG(bool, experimental_flat_json, false, "If true uses flat json implementation."); | |||
ABSL_DECLARE_FLAG(bool, experimental_flat_json); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expose the function instead of using the flag here directly
src/server/json_family.cc
Outdated
void SetJsonSize(PrimeValue& pv, bool is_op_set) { | ||
const size_t current = static_cast<MiMemoryResource*>(CompactObj::memory_resource())->used(); | ||
int64_t diff = static_cast<int64_t>(current) - static_cast<int64_t>(start_size_); | ||
// If the diff is 0 it means the object uses the same memory as before. No actio needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If the diff is 0 it means the object uses the same memory as before. No actio needed. | |
// If the diff is 0 it means the object use the same memory as before. No action needed. |
There is a problem on how we track memory usage for
JSON
family. Specifically the expression below:is wrong and it always returns the same fixed size (16) regardless of the actual memory used by the json object. It's wrong because this returns the size of the object pointed to by
json_ptr
but it does not account the objects it allocates internally (it's internal pointers). For that to happen we would need to trackJson
's object memory somehow but this is complicated (see in comments).Addresses #3661