We need a way to discover all function symbols in a target executable from a source executable.
Possible directions are:
Mach-O to x86 Release.
x86 Release to x86 Debug.
Step 1
Discover the addresses of the functions in the target executable(s). This yields a long list of function addresses, without yet knowing what these functions are.
Potential approach 1
Crawl over the assembly instructions starting at the entry point, following every call and jump once, to try to discover all function addresses.
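As a rough illustration of this crawl, here is a minimal sketch. It assumes a hypothetical, already decoded instruction table (`address -> (mnemonic, target, size)`); a real implementation would get this from a disassembler. The names `discover_functions` and `program` are made up for the example.

```python
from collections import deque

def discover_functions(instructions, entry_point):
    """Follow calls and jumps from the entry point and collect call targets.

    `instructions` is a hypothetical toy model: address -> (mnemonic, target, size),
    where `target` is a direct branch target or None.
    """
    functions = {entry_point}
    visited = set()
    worklist = deque([entry_point])
    while worklist:
        addr = worklist.popleft()
        # Walk straight-line code until it ends or we reach visited code.
        while addr in instructions and addr not in visited:
            visited.add(addr)
            mnemonic, target, size = instructions[addr]
            if mnemonic == "call" and target is not None:
                functions.add(target)      # a direct call target is a function start
                worklist.append(target)
            elif mnemonic in ("jmp", "jcc") and target is not None:
                worklist.append(target)    # follow the branch later
            if mnemonic in ("ret", "jmp"):
                break                      # no fall-through after ret or unconditional jmp
            addr += size
    return sorted(functions)

# Toy program: 0x1030 is never called directly, so the crawl cannot find it.
program = {
    0x1000: ("call", 0x1010, 5),
    0x1005: ("jcc", 0x100C, 2),
    0x1007: ("call", 0x1020, 5),
    0x100C: ("ret", None, 1),
    0x1010: ("ret", None, 1),
    0x1020: ("jmp", 0x1010, 5),
    0x1030: ("ret", None, 1),
}
found = discover_functions(program, 0x1000)
print([hex(a) for a in found])
```

Functions that are only reached through indirect calls (function pointers, virtual tables) are not found this way, which is one reason to combine it with the other approaches.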
Potential approach 2
Crawl over all assembly instructions from top to bottom and detect the function headers and footers (prologues and epilogues) of the functions that have them.
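A minimal sketch of this scan for 32-bit x86, assuming the common `push ebp; mov ebp, esp` prologue in its two encodings (`55 8B EC` and `55 89 E5`). It works on raw bytes rather than decoded instructions, so it can report false positives when the pattern happens to appear inside other instructions or data. The function name and base address are illustrative only.

```python
# push ebp; mov ebp, esp in its two common x86 (32-bit) encodings
PROLOGUES = (b"\x55\x8b\xec", b"\x55\x89\xe5")

def find_prologues(code, base_address):
    """Return the addresses of every prologue byte pattern found in `code`."""
    hits = []
    for pattern in PROLOGUES:
        start = 0
        while True:
            index = code.find(pattern, start)
            if index < 0:
                break
            hits.append(base_address + index)
            start = index + 1
    return sorted(hits)

# Three tiny functions; the last one (xor eax, eax; ret) has no prologue.
code = bytes.fromhex(
    "558bec5dc3"   # push ebp; mov ebp, esp; pop ebp; ret
    "cc"           # int3 padding
    "5589e55dc3"   # same function body with the alternative mov encoding
    "33c0c3"       # xor eax, eax; ret
)
starts = find_prologues(code, 0x401000)
print([hex(a) for a in starts])
```

The third function is missed, which matches the caveat above that this only works for functions that actually have a recognizable header.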
Potential approach 3
Export a list of function addresses from a disassembler tool, if it can already generate one.
Check quality
To verify the quality of the address list, compare it with the addresses known from the disassembler tool.
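One way to put numbers on this comparison is sketched below. It treats the disassembler's list as the reference and reports what was missed, what was found in addition, and simple precision and recall figures. The function name and the sample addresses are made up for the example.

```python
def compare_address_lists(discovered, reference):
    """Compare discovered function addresses against a reference list."""
    discovered, reference = set(discovered), set(reference)
    common = discovered & reference
    return {
        "missed": sorted(reference - discovered),     # in the reference, not found by us
        "spurious": sorted(discovered - reference),   # found by us, not in the reference
        "precision": len(common) / len(discovered) if discovered else 0.0,
        "recall": len(common) / len(reference) if reference else 0.0,
    }

report = compare_address_lists(
    discovered=[0x1000, 0x1010, 0x1040],
    reference=[0x1000, 0x1010, 0x1020, 0x1030],
)
print(report)
```

An address reported as spurious is not necessarily wrong, since the disassembler's own list can be incomplete, so those entries are worth a manual look.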
Step 2
Discover the names and properties that belong to the function addresses in the target executable(s). This yields a long list of function properties for the known function addresses.
Possible approach 1
Process the assembly instructions of the source and target executables and break them down into simplified instructions that can be matched across the executables. Pick a function of the source executable and try to match it against all functions of the target executable. Score the quality of each match and let the best score win. Search locality and score optimizations can be applied when nearby function addresses are already known.
Simplified instructions can contain, among others:
reading a global X or offset N
jumping short
calling a global X or offset N
returning a value of size N
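To make the list above a little more concrete, here is a minimal sketch of such a simplification step. It assumes a hypothetical pre-decoded input of `(operation, operand_kind, value)` tuples; the token names, the `known_globals` map and the wildcard `?` for globals that are not mapped yet are all assumptions for illustration, not a fixed design.

```python
from typing import NamedTuple, Optional

class Simplified(NamedTuple):
    kind: str                      # "read", "jump_short", "call" or "return"
    operand: Optional[str] = None

def simplify(instruction, known_globals):
    """Map one hypothetical pre-decoded instruction to a simplified one, or None."""
    op, operand_kind, value = instruction
    if operand_kind == "global":
        # Raw addresses differ between executables, so use the name if the
        # global is already mapped and a wildcard otherwise.
        operand = "global:" + known_globals.get(value, "?")
    elif operand_kind == "offset":
        operand = "offset:%d" % value
    elif operand_kind == "size":
        operand = "size:%d" % value
    else:
        operand = None
    if op == "load":
        return Simplified("read", operand)
    if op == "jmp" and operand_kind == "short":
        return Simplified("jump_short")
    if op == "call":
        return Simplified("call", operand)
    if op == "ret":
        return Simplified("return", operand)
    return None  # instruction is not modelled in this sketch

def simplify_function(instructions, known_globals):
    """Simplify a whole function body, dropping instructions that are not modelled."""
    simplified = (simplify(i, known_globals) for i in instructions)
    return [s for s in simplified if s is not None]

body = [
    ("load", "global", 0x5000),
    ("load", "offset", 8),
    ("add", None, None),
    ("jmp", "short", None),
    ("call", "global", 0x6000),
    ("ret", "size", 4),
]
tokens = simplify_function(body, known_globals={0x5000: "g_counter"})
for token in tokens:
    print(token)
```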
Ordered combinations of simplified, universal instructions should give a function a reasonably unique fingerprint. The more complex the function is, the better the match quality can be. Simplified instructions may not compare well when compilers use different strategies to generate code, such as unrolling loops or wild jumps. In that case the matching strategy needs to be smart and lenient. In theory, even a bad matching score can still identify the right winner, as long as it is the best score among all candidates.
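A minimal sketch of lenient, order-aware scoring, using Python's standard `difflib.SequenceMatcher` as a stand-in for whatever similarity measure is eventually chosen. The token strings and addresses are invented for the example; functions are represented as plain lists of simplified instruction names.

```python
from difflib import SequenceMatcher

def score(source_tokens, target_tokens):
    """Similarity in [0, 1] between two ordered simplified instruction lists."""
    matcher = SequenceMatcher(None, source_tokens, target_tokens, autojunk=False)
    return matcher.ratio()

def best_match(source_tokens, target_functions):
    """Return (address, score) of the best scoring target function."""
    scored = [(score(source_tokens, tokens), address)
              for address, tokens in target_functions.items()]
    best_score, best_address = max(scored)
    return best_address, best_score

source = ["read", "call", "jump_short", "read", "return"]
targets = {
    0x2000: ["read", "return"],
    # Same function, but the compiler emitted one extra short jump.
    0x2100: ["read", "call", "jump_short", "jump_short", "read", "return"],
    0x2200: ["call", "call", "return"],
}
address, match_score = best_match(source, targets)
print(hex(address), round(match_score, 3))
```

Because the score is relative, the extra jump in `0x2100` lowers its score without changing the winner. Restricting `target_functions` to addresses near already known neighbours would be one way to apply the search locality mentioned earlier.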
Possible approach 2
Train an AI model to recognize how a function body in a source executable corresponds to one in a target executable. To do this, we would need to compile programs with the relevant compilers, crawl through their symbols, and train the model to recognize how the function instructions of the source executable translate to the target executable. After the training is complete, the model can be used to match all functions of the target executable.
Check quality
To verify the quality of the matching, check for collisions, that is, several source functions matched to the same target function. If too many matches collide, the matching cannot be trusted.
Also, compare the result with the addresses and symbols from the disassembler that have already been mapped by hand.
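Both checks can be sketched in a few lines. The example assumes the matching result is a dictionary from source function name to target address, and that the hand-mapped symbols are available in the same shape; the names and addresses are made up.

```python
from collections import defaultdict

def find_collisions(matches):
    """Return target addresses that more than one source function was matched to."""
    by_target = defaultdict(list)
    for source_name, target_address in matches.items():
        by_target[target_address].append(source_name)
    return {address: sorted(names)
            for address, names in by_target.items() if len(names) > 1}

def collision_rate(matches):
    """Fraction of source functions that share their target with another one."""
    if not matches:
        return 0.0
    colliding = sum(len(names) for names in find_collisions(matches).values())
    return colliding / len(matches)

def agreement_with_hand_mapped(matches, hand_mapped):
    """Fraction of hand-mapped symbols for which the matcher chose the same address."""
    if not hand_mapped:
        return 0.0
    agreeing = sum(1 for name, address in hand_mapped.items()
                   if matches.get(name) == address)
    return agreeing / len(hand_mapped)

matches = {
    "Foo::Update": 0x2000,
    "Foo::Draw": 0x2100,
    "Bar::Update": 0x2000,   # collides with Foo::Update
    "Baz::Init": 0x2200,
}
hand_mapped = {"Foo::Draw": 0x2100, "Baz::Init": 0x2300}

collisions = find_collisions(matches)
print(collisions, collision_rate(matches), agreement_with_hand_mapped(matches, hand_mapped))
```

What counts as "too many" collisions is left open here; a threshold would have to be chosen by experiment.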