Skip to content

Latest commit

 

History

History
457 lines (392 loc) · 27.3 KB

wuwt-e02-dchecks.md

File metadata and controls

457 lines (392 loc) · 27.3 KB

What’s Up With DCHECKs

This is a transcript of What's Up With That Episode 2, a 2022 video discussion between Sharon (yangsharon@chromium.org) and Peter (pbos@chromium.org).

The transcript was automatically generated by speech-to-text software. It may contain minor errors.


You've seen DCHECKs around and been asked to use them in code review, but what are they? What's the difference between a CHECK and a DCHECK? How do you use them? Here to answer that is special guest is Peter, who works on UI and improving crash reports.

Notes:

Links:


00:00 SHARON: Hello, and welcome to What's Up With That?, the series that demystifies all things Chrome. I'm your host, Sharon. And today, we're talking about DCHECKs. You've seen them around. You've probably been told to add one in code review before. But what are they? What are they for, and what do they do? Our guest today is Peter, who works on desktop UI and Core UI. He's also working on improving Chrome's crash reports, which includes DCHECKs. Today he'll help us answer, what's up with DCHECKs? Welcome, Peter.

00:30 PETER: Thanks for having me.

00:32 SHARON: Yeah. Thanks for being here. So the most obvious question to start with, what is a DCHECK?

00:39 PETER: So a CHECK and a DCHECK are both sort of things that make sure that what you think is true is true. Right? So this should never be called with an empty vector. You might add a CHECK for it, or you might add a DCHECK for it. And it's sort of similar to asserts, which you may have hit during earlier programming outside of Chrome. And what it means is when this line gets hit, we check and see if it's true. And if it's not true, we crash. DCHECKs differ from CHECKs in that they are traditionally only in debug builds, or local development builds, or on our try-bots. So they have zero overhead when Chrome hits stable, because the CHECK just won't be there.

01:24 SHARON: OK. So I guess the D stands for Debug. That make sense.

01:28 PETER: Yeah. I want debug to turn into developer, because now we have them by default if you're no longer - if you're doing a release build, and you're not turning them off, and you're not doing an official build, you get them.

01:42 SHARON: OK. Well, you heard it here first, or maybe you heard it before. I heard it here first. So you mentioned asserts. So something that I've seen a couple times in Chrome, and also is part of the standard library, is static_assert. So how is that similar or different to DCHECKs? And why do we use or not use them?

02:00 PETER: Right. So static_asserts are - and you're going to have to ask C++ experts, who can probably take some of the sharp edges off of this - but it's basically, if you can assert something in compile time, then you can use a static_assert, which means that you don't have to hit a code path where it's wrong. It sort of has to always hold true. And whenever you can use a static_assert, use a static_assert, because it's free. And basically, you can't compile the program if it's not true.

02:31 SHARON: OK. That's good to know, because I definitely thought that was one of the C++ standard library things we should avoid, because we have a similar thing in Chromium. But I guess that's not the case.

02:41 PETER: Yeah. Assert is the one that is - OK, so this is a little complicated, right? static_assert is a language feature, not a library feature. And someone will tell me that I'm wrong about something about this. Asserts are just sort of a poorer version of DCHECKs. So they won't go through our crash handling. It won't print the pretty stacks, et cetera. static_asserts, on the other hand, are a compile time feature. And we don't, as far as I know, have our own wrapper around it. We just use static_assert. So what you would maybe use this for is like if you have a constant - like, say you have an array, and the code makes an assumption that some constant is the size of this array, you can assert that in compile time, and that would be a good use of a static_assert.

03:26 SHARON: OK. Cool. So you mentioned that some things have changed with how DCHECKs work. So can you give us a brief overview of the history of DCHECKs - what they used to be, people who have been using them for a while, how might they have changed from the idea of what they have as a DCHECK in their mind?

03:43 PETER: Sure. So this is as best I know. I'm just sort of extrapolating from what I've seen. And what I think originally was true is that a CHECK used to be this logging statement, where you essentially compile the file name and the line number. And if this ever hits, then we'll log some stuff and then crash. Right? Which comes with a little bit of overhead, especially on size, that you basically take the file name and line number for every instance, and that generates a bunch of strings and numbers that essentially add to Chrome's binary size. I don't know how many steps between that and where we currently are. But right now, our CHECKs are just, if condition is false, crash, which means that you won't, out of the CHECK, get file name and line number. We'll get those out of debugging symbols. And you also won't get any of the logging messages that you can add to the end of a CHECK, which means that your debug info will be poorer, but it will be cheaper to use. So they've gotten from being pretty heavy CHECKs to being really cheap.

05:01 SHARON: OK. So that kind of leads us into the question that I think most people want to have answered, which is, when should I use a DCHECK? When should I use a CHECK? When should I use neither?

05:13 PETER: I would say that historically, we've said CHECKs are expensive. Don't use them unless you sort of have to. And I don't think that holds true anymore. So basically, unless you are in really performance-critical code, then use a CHECK. If there's anything that you care about where the program state will be unpredictable from this point on if it's not true, CHECK it. It's not that expensive. Right? We have a lot of code where we push a string onto a vector, and that never gets flagged in code review. And it's probably like 10 times more expensive, if not 100 times more expensive, than adding a CHECK. The exception to that is if you're in a really hot loop where you don't want to dereference a pointer, then a CHECK might add some cost. And the other is if the condition that you're trying to validate is really expensive. It's not the CHECK itself that's expensive. It's the thing you're evaluating. And if that's expensive, then you might not afford doing a CHECK. If you don't know that it's expensive, it's probably not expensive.

06:20 SHARON: Can you give us an example of something expensive to evaluate for a CHECK?

06:24 PETER: Right. So say that you have something in video code that for every video frame, for every pixel validates the alpha value is opaque, or something. That would probably make video conferencing a little bit worse performance. Another thing would just be if you have to traverse a graph on every frame, and it will sort of jump all over memory to see if some reachability problem in your graph is true, that's going to be a lot more expensive. But CHECKing that index is less than some vector bounds, I think that should fall under cheap. And -

07:02 SHARON: OK.

07:02 PETER: culturally, we've tried to avoid doing a lot of these. And I think it's just hurting us.

07:09 SHARON: OK. So since most places we should use CHECKs, are there any places where a DCHECK would be better then? Or any time you would have normally previously used a DCHECK, you should just make that a CHECK?

07:23 PETER: So we have a new construct that's called EXPENSIVE_DCHECKs, or if EXPENSIVE_DCHECKS_ARE_ON, I think we should add a corresponding macro for EXPENSIVE_DCHECK. And then you should be able to just say, either it's expensive and has to be a DCHECK, so use EXPENSIVE_DCHECK; otherwise, use CHECK. And my hunch would be like 95% of what we have as DCHECKs would probably serve us better as CHECKs. But your code owner and reviewer might disagree with that. And it's not yet documented policy that we say CHECKs are cheap; just add a billion of them. But I would like to get there eventually.

08:04 SHARON: OK. So if you put in a CHECK, and your reviewer tells them this should be a DCHECK, the person writing the CL can point them to this video, and then they can discuss from there.

08:13 PETER: I mean, yeah, you can either say Peter disagrees with you, or I can get further along this and say we make policy that CHECKs are cheap, so they are preferable. So a lot of foot-shooters with DCHECKs is that you expect this property to hold true, but you never effectively CHECK it. And that can lead to all sorts of bad stuff, right? Like if you're trying to DCHECK that some origin for some frame makes some assumptions of site iso - I don't know site isolation well enough to say this. But basically, if you're DCHECKing that the code that you're running runs under some sort of permissions, then that is effectively unchecked in stable, right? And we do care about those properties, and it would be really good if we crashed rather than leaked information between sites.

09:12 SHARON: Right.

09:14 PETER: Yeah.

09:16 SHARON: So that seems like a good tie-in for the fact that within some security people, they don't have the most positive impression of DCHECKs, shall we say? So a couple examples of this, for listeners who maybe aren't familiar with this, is one person previously on security saying DCHECKs are pronounced as "code that's not tested". Someone else I told about this episode - I said, we're going to talk about DCHECKs - they immediately said, is it going to be about why DCHECKs are bad? So amongst the Chrome security folks, they are not a huge fan of DCHECKs. Can you tell us maybe why that is?

09:51 PETER: So if we go back a little bit in time, it used to be that DCHECKs were only built for developers if they do a debug build. And Chrome has gotten so big that you don't want to do a debug build or the UI is incredibly slow. Unfortunately, it's sort of not that great an experience to work in a debug build. So people work in a release build. That doesn't mean that they don't care about the things they put under DCHECK. It just means they want to go on with their lives and not wait x minutes for the browser to launch, or however bad it is nowadays. And that means that they, unfortunately, lose coverage for the DCHECKs. So this means that if your code is not exercised well under tests, then this is completely not enforced. But it's slightly better than a comment, in that you're really expecting this thing to hold true, and that's clearly an expectation. But how good is the expectation if you don't look at it? So last year, I believe, we made it so that DCHECKs are on by default if you're not doing an official build. And this included release builds. So now, it's like at least if you're doing development and you hit this condition, it's going to explode, which is really good, because then you can find a lot of issues, and we can prevent a lot of issues from ever happening in the first place. It is really hard for you, as a developer, to make the assumption that if this invariant is ever false, I will find it during development, and it will never happen in the wild. And DCHECKs are essentially either, I will find this locally before I submit it, or all bets are off; or it is I don't care that much if this thing doesn't hold true, which is sort of a weird assertion to make. So I think we're in this little awkward in-between state. And this in-between state, remember, mostly exists as a performance optimization from when CHECKs used to be a lot more expensive, in terms of code size. So did I cover most of this?

12:06 SHARON: Yeah. I think, based on that, I think it's pretty easy to see why people who are more concerned about security are not a fan of this.

12:13 PETER: I mean, if you care about it, especially if it causes privacy or security or user-harm sort of things, just CHECK. Just CHECK, right? If it makes your code animate a thing slightly weirder, like it will just jump to the end position instead of going through your fancy little... whatever. Maybe you can make that a DCHECK. Maybe it doesn't matter. Like it's wrong, but it's not that bad. But most of the cases, you DCHECK something, where it's like the program is going to be in some indeterminate state, and we actually care about if it's ever false. So maybe we can afford to make it a CHECK. Maybe we should look more about our sort of vector pushbacks than we should look at our CHECKs, and then just have more CHECKs. More CHECKs. Because it's also like when things break, it's a lot cheaper to debug a DCHECK than your program is in some indeterminate state, because it was allowed to pass through a DCHECK that you thought was - and when you read the code, unless you're used to reading it as DCHECKs - oh, that just didn't get enforced - it's sort of hard to try to figure out why the thing was doing the wrong thing in the first place.

13:22 SHARON: OK. How is this as a summary? When in doubt, CHECK it out.

13:27 PETER: I like that. I like that. And you might get pushback by reviewers, who aren't on my side of the fence yet. And then you can decide on which hill you want to die on, at least until we've made policy to just not complain about DCHECKs, or not complain about CHECKs.

13:45 SHARON: All right. That sounds good. So you mentioned stuff failing in the wild. And for people who might not know, do you want to just briefly explain what failing in the wild means?

13:54 PETER: OK. So there's two things. Just failing in the wild just means that when this thing rolls out to Canary, Dev, Beta, Stable, if you have a CHECK that will crash and generate a crash report as if you had a memory bug, but it crashes in a deterministic way, at a deterministic spot - so you can find out exactly what assumption was violated. Say that this should never be called with a null pointer. Then you can say, look at this line where it crashed. It clearly got hit with a null pointer. And then you can try to figure out, from the stack, why that happened, rather than after you post this pointer to a task, it crashes somewhere completely irrelevant from the actual call site. Well, so in the wild specifically means it generates a crash report so you can look at it, or in the wild means it crashes at a user computer rather than - in the wildness outside of development. And as for the other part of in the wild, it's that we have started running non-crashy DCHECKs for a percentage of Windows Canary. And we're looking to expand that. And we're gathering information, basically, about which assertions or invariants that we have are violated in practice in the wild, even though we don't think that they should be. And that will sort of also culturally move the needle so that we do care about DCHECKs. And when we care about DCHECKs, sort of similarly to how we care about CHECKs, is it really that important to make the big distinction between the two? Except for the case where you have really expensive DCHECKs, they might still be worth keeping separate. And those will be things like, if you do things for - say that you zero out memory or something for every memory block that you allocate and free, or you do things for every audio sample, or for every video frame pixel, those sort of things. And then we can sort of keep expensive stuff gated out from CHECKs. And then maybe we don't need this in-between where people don't know whether they can trust a DCHECK or not.

16:04 SHARON: So you mentioned that certain release builds now have DCHECKs enabled. So for those in the wild versus regular CHECKs in the wild, if those happen to fail, do the reports for those look the same? Are they in the same place? Can they be treated the same?

16:20 PETER: Yeah. Well, they are uploaded to the same crash-reporting thing. They show up under a special branch. And you likely will get bugs filed to you if they hit very frequently, just like you would with crashes. There's a sort of slight difference, in that they say DumpWithoutCrashing. And that's just sort of a rollout strategy for us. Because if we made DCHECK builds incredibly crashy, because they hit more than CHECKs, then we can never roll this thing out. Or it gets a lot scarier for us to put this on 5% of a new platform that we haven't tested. But as it is right now, the first DCHECK that gets hit for every process gets a crash dump uploaded.

17:07 SHARON: OK. So I've been definitely told to use DumpWithoutCrashing at certain points in CLs, where it's like, OK, we think that this shouldn't happen. But if it does, we don't necessarily want to crash the browser because of it. With the changes you've mentioned to DCHECKs happening, should those just be CHECKs instead now or should those still be DumpWithoutCrashing?

17:29 PETER: So if you want DumpWithoutCrashing, and you made those a DCHECK, then you would only have coverage in the Canary channels that we are testing. Right? So if you want to get dump reports from the platforms that we're not currently testing, including all the way up to Stable, you probably still want to keep that a DumpWithoutCrashing. You want to make sure that you're not using the sort of - you want to make sure that you triage these, because you don't want to keep these generating crash dumps forever. You should still treat them as if they were crashes. And I think the same thing should hold true for DCHECKs. You should only add them for an invariant that you care about being violated, right? So as it is violated, you should either figure out why your invariant was wrong, or you should try to fix the breakage. And you can probably add more information to logging to figure out why that happened.

18:41 SHARON: So when you have a CHECK, and it crashes in the wild, you get a stack trace. And that's what you have to work on to figure out what went wrong for debugging. Right? So what are some things that you can do, as a developer, to make these CHECKs a bit more useful for you - ways to incorporate other information that you can use to help yourself debug?

19:01 PETER: So some of the stuff that we have is we have something called crash keys, which are essentially, you can write a piece of string data, essentially - there's probably some other data types - and if you write those before you're running DumpWithoutCrashing, or before you hit a CHECK, or before you hit a DCHECK, then those will be uploaded along the crash dump. And if you talk to someone who knows where to find them, you can basically go in under a crash report, and then under Fields, Product data, or something like that, you should be able to find your key-value pair. And if you have information in there, you'll be able to look at it. The other thing that I like to do, which is probably the more obvious thing, is if you have somewhat of a hypothesis that this thing should only fail if a or b or c is not true, then you can add CHECKs for those. Like, if a CHECK is failing, you can add more CHECKs to see why the CHECK was failing. In general, you're not going to get as much out of a mini-dump that you want. You're not going to have the full heap available to you, because that would be a mega-dump. You can usually find whatever is on the stack if you go in with a debugger. And I know that you wanted to lead me into talking about CHECK_GT and CHECK_EQ, which are essentially, if you want to check that x is greater than y, then you should use CHECK_GT(x,y). The problem with those, in this sort of context, is that, similarly to CHECKs - so CHECK_GT gets compiled into, basically, if not x is greater than y, crash. So unfortunately, the values of x and y are optimized out when you're doing an official build.

21:02 SHARON: So this makes me think of some stuff we mentioned in the last episode, which was with Dana. Check it out if you haven't. But one of the types we mentioned there was SafeRef, which enforces a certain condition. And if that fails - so in the case of a SafeRef, it ensures that the value you have there is not null. And if that's ever not true, then you do get a crash similar to if a CHECK fails. So in general, would you say it's better practice to enforce and make sure your assumptions are held in these other, more structural ways than relying on CHECKs instead?

21:41 PETER: So let me see if I can get at what you actually want out of that one. So if we look at - there's a RawRef type, right? So what's good with the RawRef is that you have a type that annotates that this thing cannot possibly be null. So if you assign to it, and you're assigning a null pointer, your program is going to crash, and you don't need to think about whether you throw a null pointer in or not. If you keep passing a RawRef around, then that's essentially you passing around a non-null pointer. And therefore, you don't have to check that it's not nullptr in every step of the way. You only need to do it when you're - I mean, the type will do it for you, but it only needs to happen when you're converting from a pointer to a ref, essentially, or a RawRef. And what's so good about that is now you have the - previously, you might just CHECK that this isn't called with nullptr or whatever. But then you would do that for four or five arguments. And you'd be like, null pointer CHECKs are this part of the function body. And then it just gets super-noisy. But if you're using the RawRef types, then the semantics of the type will enforce that for you. And you don't have to think about that when reading the code, because usually when you read the code, you're going to be like, it's a pointer. Can it be null or not? What does it point to? And this thing will at least tell you, it can't be null. And you still have the question of, what does it point to? And that's fine. So I like enforcing this through types more than checking those assumptions, and then checking inside of what happens. If you were assigned to this RawRef, then it's going to crash in the constructor if you have a null pointer. And then based on that stack trace, if we have good stack data, you're going to know at what line you created the RawRef. And therefore, it's equivalent to checking for not null pointer, because you can trust the type to do the checking. And since I know Dana made this, I can probably with 200% certainty say that it's a CHECK and not a DCHECK. But we do have a couple of other places where you have a WeakPtr that shouldn't be dereferenced on the wrong sequence. And those are complicated words. And that, unfortunately, is a DCHECK. So we're hitting some sort of - I don't know if that CHECK is actually expensive, or if it should be a CHECK, or if it could be a CHECK. I think, especially, if you're in core types, the size overhead of adding a CHECK is negligible, because all of the users of it benefit from that CHECK. So unless it's incredibly -

24:28 SHARON: What do you mean by core types?

24:30 PETER: Say that you make a scoped_refptr or something, that ref pointer is used everywhere. So if you CHECKed in the destructor, then you're validating all of the clients of your scoped_refptr. So for one CHECK, you get the price of a lot of CHECKing. Whereas if in your client code you're validating some parameters of an API call that only gets called once, then that's one CHECK you add for one case. But if you're re-use, then your CHECK gets a lot more value. And it's also easier to get parameters wrong sometimes if you have 500 clients that are calling your API. You can't trust all of them to get it right. Whereas if you're just developing your feature, and it's only used by your feature, then you can be a little bit more certain with how it's being called. I would say, still add CHECKs, because code evolves over time. It's sort of like how you can add unit tests to make sure that no one breaks your code in the future. If you add CHECKs, then no one can break your code in the future.

25:37 SHARON: Mm-hmm. OK. So you mentioned a few things about how CHECKs and DCHECKs are changing. [AUDIO OUT] what is currently in the works, and what is the long-term goal and plan for CHECKs and DCHECKs.

25:53 PETER: So currently what's in the work is we've made sure that some libraries that we use, like Abseil and WebRTC, which is a first-party third-party library, that they both use Chrome's crashing report system, which means that you get more predictable crash stacks because it's using the IMMEDIATE_CRASH macro. But also, you get the fatal logging field that I talked about. That gets logged as part of crash dumps. So you hopefully have more glanceable, actionable crash reports whenever a CHECK is violated inside of Abseil, or in WebRTC, as it were. And then upcoming is we want to make sure that we keep an eye out for our DCHECKs on other platforms, such as Mac. I know that there's some issues with getting that fatal log field in the GPU process, and I'm working on fixing that as well. So hopefully, it just means more reports for the things you care about and easier to action on reports. That's what we're hoping.

27:03 SHARON: If people think that this sounds really cool, want to have some more involvement, or want to ask more questions, what's a good place for them to do that?

27:11 PETER: I like Slack as a thing for this. So the #cxx channel on Slack, the #base channel on Slack, the #halp channel on Slack is really good. #halp is really, I think, unintimidating. You can just throw whatever question you have in there, and I happen to be around there. If you can find out what my last name is through sheer force of will, you can send me an email to my Chromium username. What else would we have? I think if they want to get involved, just add CHECKs to your code. That's a really good way to do it. Just make sure that your code does what you expect it to in more cases.

27:48 SHARON: Maybe if you have a CL, and you're just doing some drive-by cleanup, you can turn some DCHECKs into CHECKs also?

27:56 PETER: If your reviewer is cool with that, I'm cool with that. Otherwise, you can just try to hope for us making that policy that we use CHECKs - if it's something we care about, we use a CHECK instead of a DCHECK, unless we have a really good reason to use a DCHECK. And that would be performance.

28:15 SHARON: That sounds good. And one last question is, what do you want people to take away as their main takeaway from this discussion?

28:26 PETER: I think validating code assumptions is really valuable. So you think that you're pretty smart when you're writing something, or you remember - I mean, you're sometimes kind of smart when you're writing something. And you're like, this can't possibly be wrong. And in practice, looking at crash reports, these things are wrong all the time. So please validate any assumptions that you make. It's also, I would say, better than a comment, because it's a comment that doesn't get outdated without you noticing it. So, I think, validate your assumptions to make sure that your code is more robust. And validate properties you care about. And don't be afraid to use CHECKs.

29:13 SHARON: All right. That sounds like a good summary. Thank you very much for being here, Peter. It was great to learn about DCHECKs.

29:18 PETER: Yeah. Thanks for having me.

29:24 SHARON: Action. Hello.

29:26 PETER: Oh. Take four.

29:29 SHARON: [LAUGHS] Take four. And action.