-
Using Regarding the
This method might also work for getting the names of enum values, but I haven't tested that. Alternatively, you could try comparing the output of If that doesn't work, there are many more commands in All in all, there are some hacks / workarounds to get some more information on the decompiler in its current state. As mentioned before, there are some ambiguity issues with the output of Another interesting conclusion is that there already is code in the decompiler to print a lot more information than just the final C code. Exposing these functions to the Java side seems like significantly less development effort, while still providing a benefit by allowing Ghidra scripts to make more meaningful improvements to the decompiler output.
-
Is there a significant difference in the amount of available information between ... I should probably learn more about the decompiler capabilities. Unfortunately most of the Fun fact: I wrote a ClangTokenGroup parser that can convert ClangTokenGroup expressions back into an AST:
The nice thing about this is that this representation knows about HighVariables, in contrast to high P-Code, which still operates on varnodes. I plan to play with this a bit and see if there are useful analyses that can be done on the AST but can't be efficiently expressed in P-Code. If yes, I'll clean this up and open-source it (but it's in a rough state right now).
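To illustrate the idea of turning a flat token sequence back into a tree, here is a minimal sketch in Python. It is not the parser mentioned above and it does not use the Ghidra API: `Token`, `Var`, `BinOp`, and the tiny operator table are hypothetical stand-ins, and the token kinds only loosely mirror the variable/op/syntax token classes. It uses plain precedence climbing on binary operators.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Token:
    kind: str  # "var", "op" or "syntax" (hypothetical stand-ins for Clang*Token classes)
    text: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Var, BinOp]

# Illustrative subset of C operators; higher number binds tighter.
PRECEDENCE = {"=": 1, "==": 2, "+": 3, "-": 3, "*": 4}
RIGHT_ASSOC = {"="}


def parse_expression(tokens: List[Token]) -> Node:
    """Rebuild an expression tree from a flat token list (precedence climbing)."""
    pos = 0

    def parse_primary() -> Node:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of tokens")
        tok = tokens[pos]
        pos += 1
        if tok.kind == "syntax" and tok.text == "(":
            node = parse_binary(1)
            if pos >= len(tokens) or tokens[pos].text != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        if tok.kind == "var":
            return Var(tok.text)
        raise ValueError(f"unexpected token {tok.text!r}")

    def parse_binary(min_prec: int) -> Node:
        nonlocal pos
        left = parse_primary()
        while (
            pos < len(tokens)
            and tokens[pos].kind == "op"
            and PRECEDENCE.get(tokens[pos].text, 0) >= min_prec
        ):
            op = tokens[pos].text
            pos += 1
            prec = PRECEDENCE[op]
            # Right-associative operators recurse at the same precedence level.
            right = parse_binary(prec if op in RIGHT_ASSOC else prec + 1)
            left = BinOp(op, left, right)
        return left

    tree = parse_binary(1)
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


# Usage: the statement `HVar1 = HAL_OK` as three flat tokens.
tree = parse_expression(
    [Token("var", "HVar1"), Token("op", "="), Token("var", "HAL_OK")]
)
print(tree)
```

In a real script the leaves would presumably keep a reference to the token's HighVariable instead of just its text, which is what makes such a tree more useful than raw varnodes.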
-
I have been tinkering with a plugin to rewrite the decompiler output similar to https://github.com/patois/abyss
One of the main downsides with this is that Ghidra only really has three intermediate representations:
`Graph AST Control Flow` Action)

E.g. the decompiled code:
has the ClangToken tree:
Usually this tree even contains the whitespace tokens, but my graphing code already removes those for clarity.
The corresponding "high-level" P-Code via the `Graph AST Control Flow` action is
and as the string representations:
the lifted P-Code is:
The problem is that the ClangToken tree is not really suitable for any kind of further analysis, but it also contains significantly more information than the high-level P-Code. For example, the C statement `HVar1 = HAL_OK`, where `HVar1` is defined earlier with the type `HAL_StatusTypeDef`, is a `ClangStatement` with 3 children: `ClangVariableToken`, `ClangOpToken`, `ClangVariableToken`. The only way to know that this is an assignment is to check the text of the `ClangOpToken`. The information that this is an assignment isn't present in the type of the ClangTokenGroup itself. In theory I could also check the underlying PcodeOp, but at that point I am basically starting to invent my own Intermediate Representation on top of a mix of the ClangToken tree and high-level P-Code.

The high-level P-Code isn't suitable on its own, because for this statement it is simply `(register, 0x20, 1) COPY (const, 0x0, 1)`, which has no information about the `HighVariable` `HVar1` that is written to, or about the enum that gives the meaningful name `HAL_OK` instead of the constant `0`.

So my underlying question here is: Are there any plans for a new Intermediate Representation? How many other people in the community have run into this limitation? I know at least one person who tried implementing a vulnerability scanning tool on top of Ghidra and gave up because the provided abstractions were unsuitable compared to e.g. the Binary Ninja Intermediate Representations.
I have been tinkering with a plugin for dynamic rewrite rules on the JVM side, and the most interesting rules are hindered by the fact that I basically need to reinvent a C parser.
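To make the `HVar1 = HAL_OK` example concrete, here is a rough Python sketch of the two views. The classes below are hypothetical stand-ins that only borrow the names of the Ghidra token classes; nothing here calls the real API. The first function shows that "assignment" has to be inferred from the operator text, the second shows that the P-Code string only carries address space, offset, and size.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ClangToken:  # hypothetical stand-in, NOT the Ghidra class
    text: str


class ClangVariableToken(ClangToken):
    pass


class ClangOpToken(ClangToken):
    pass


@dataclass
class Assignment:
    target: str
    value: str


def as_assignment(children: List[ClangToken]) -> Optional[Assignment]:
    """Recognise `var = var` purely from token types plus the operator text."""
    if len(children) != 3:
        return None
    lhs, op, rhs = children
    if not (
        isinstance(lhs, ClangVariableToken)
        and isinstance(op, ClangOpToken)
        and isinstance(rhs, ClangVariableToken)
    ):
        return None
    # The token types alone match `a + b` just as well; only the text differs.
    if op.text != "=":
        return None
    return Assignment(lhs.text, rhs.text)


VARNODE = re.compile(r"\((\w+),\s*(0x[0-9a-fA-F]+),\s*(\d+)\)")


def parse_varnodes(pcode: str) -> List[Tuple[str, int, int]]:
    """Extract (space, offset, size) triples from a P-Code string."""
    return [(s, int(off, 16), int(size)) for s, off, size in VARNODE.findall(pcode)]


statement = [
    ClangVariableToken("HVar1"),
    ClangOpToken("="),
    ClangVariableToken("HAL_OK"),
]
print(as_assignment(statement))
print(parse_varnodes("(register, 0x20, 1) COPY (const, 0x0, 1)"))
```

The token side knows the names but not the statement kind; the P-Code side knows the operation but not the names. Any rewrite rule needs both, which is where the ad-hoc IR starts.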
One possibility might be to extend the underlying decompiler to emit an XML tree that has more information. The decompiler obviously knows that it is producing e.g. an assignment or an `if {} else {}` block, but this is lost when the syntax tree is produced. This could be done as a new "language" in the decompiler, but the current `printc.cc` file is 3000 lines, and a new `printcIR.cc` file might be a similar amount of new code that would need to be written.
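As a purely hypothetical illustration of what such a richer XML tree could look like from the script side, here is a small Python sketch. The element and attribute names (`assign`, `if`, `var`, `enum`, `const`, ...) as well as `example` and `param_1` are invented for this example; the decompiler does not emit this format today.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Invented format: every element and attribute name here is an assumption.
HYPOTHETICAL_IR = """
<function name="example">
  <if>
    <cond>
      <binop op="=="><var name="param_1"/><const value="0"/></binop>
    </cond>
    <then>
      <assign>
        <var name="HVar1" type="HAL_StatusTypeDef"/>
        <enum name="HAL_OK" value="0"/>
      </assign>
    </then>
    <else>
      <assign>
        <var name="HVar1" type="HAL_StatusTypeDef"/>
        <const value="1"/>
      </assign>
    </else>
  </if>
</function>
"""


def find_assignments(xml_text: str) -> List[Tuple[str, str]]:
    """Return (target, value) pairs for every <assign> node, in document order."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for node in root.iter("assign"):
        target, value = list(node)
        # Prefer the symbolic name (variable or enum), fall back to the raw constant.
        result.append((target.get("name"), value.get("name") or value.get("value")))
    return result


print(find_assignments(HYPOTHETICAL_IR))
```

With the statement kind encoded in the node itself, a rewrite rule becomes a tree query instead of a string comparison on an operator token.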