Side-channel attack in pyc program reversing
A simple example
Recently, I was reversing a heavily obfuscated pyc program. It contains a lot of junk code, and restoring the logic of every function would be very time-consuming and laborious. But I got an interesting idea while reviewing the function calls. Here is a simple example to illustrate it.
import random
import base64
def checkstr(_str,encrypt_str):
    random.seed(233)
    encodestr = ""
    for i in _str:
        tmp = chr(ord(i) ^ random.randint(0,255))
        encodestr += tmp
    try:
        if(encrypt_str == base64.b64encode(str(encodestr.encode('hex'))).encode('hex')):
            return 1
    except:
        return 0
def checklen(_str):
    if len(_str) <=0 :
        return 0
    else:
        return 1
def test():
    encrypt_str = "5a47466a4e575931596a673d" #test
    print 'Input your string'
    _str = raw_input()
    if checklen(_str):
        if checkstr(_str,encrypt_str):
            print 'ok'
            return 1
        else:
            print 'error'
            return 0
    else:
        print 'len error'
test()
Next, I apply some code obfuscation measures and compile this code into a pyc file that is difficult to decompile.
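For reference, the plain (non-obfuscated) compile step can be sketched with the standard py_compile module. This is only a minimal sketch in Python 3 (the example program above is Python 2), the file names are hypothetical, and the obfuscation measures themselves are not shown here.

```python
import importlib.util
import os
import py_compile
import tempfile

# Hypothetical stand-in source; the real target in this post is a Python 2 script.
source_text = "print('hello')\n"

with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "example.py")
    pyc_path = os.path.join(workdir, "example.pyc")
    with open(src_path, "w") as handle:
        handle.write(source_text)

    # doraise=True raises an exception on compile errors instead of printing them.
    compiled_path = py_compile.compile(src_path, cfile=pyc_path, doraise=True)

    with open(compiled_path, "rb") as handle:
        header = handle.read(4)

    # The first 4 bytes of a pyc file are the interpreter's magic number.
    magic_matches = header == importlib.util.MAGIC_NUMBER
    print(compiled_path, magic_matches)
```

The magic number ties a pyc to one interpreter version, which is why the analysis steps later in this post use the same Python version as the target.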
First of all, we need to know roughly what the program does. The example above, for instance, is an authentication program.
So somewhere near the end it must execute an instruction that checks whether two strings are equal. (In other situations the comparison may happen elsewhere, for example in middleware or a database, but then it is usually easy to find in the logs.)
if func1(your_input) == func2(encrypted_password)
Now we build a patched ("magically modified") Python interpreter that prints the left and right operands of each string equality comparison to stdout.
The string comparison function, string_richcompare, is in Objects/stringobject.c
We change the code like this:
string_richcompare(PyStringObject *a, PyStringObject *b, int op)
{
    ...
    if (op == Py_EQ) {
        /* Supporting Py_NE here as well does not save
           much time, since Py_NE is rarely used. */
        printf("left string : %s\n",a->ob_sval);
        printf("right string: %s\n",b->ob_sval);
        if (Py_SIZE(a) == Py_SIZE(b)
            && (a->ob_sval[0] == b->ob_sval[0]
            && memcmp(a->ob_sval, b->ob_sval, Py_SIZE(a)) == 0)) {
            result = Py_True;
        } else {
            result = Py_False;
        }
        goto out;
    }
    ...
}
After compiling it, we get a Python interpreter that prints the strings being compared.
We then use this interpreter to run the obfuscated example file.
In the output we can see a static string, 5a47466a4e575931596a673d.
if func1(your_input) == func2(encrypted_password)
But we don't know func1 and func2.
We can use marshal to take a quick look at the names of the functions it calls.
>>> code.co_names
('sys', 'zlib', 'base64', 'marshal', '_getframe', 'f_code', 'yield finally', 'co_code', 'continue as', 'len', '^ + dict', 'from --', 'elif &&', 'as as assert', 'range', '/ with', 'chr', 'ord', 'loads', 'decompress', 'b64decode', '&& isdecoded with', 'True')
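A minimal sketch of that marshal step, assuming the pyc is loaded by the same interpreter version that produced it (the marshal format is version-specific). The helper names and the in-memory demo module are hypothetical. The pyc header is 8 bytes for Python 2.7 and 16 bytes for Python 3.7+, so its size is passed as a parameter. Nested functions keep their own co_names, so the sketch walks co_consts recursively.

```python
import marshal
import types


def collect_names(code):
    """Gather co_names from a code object and every nested code object."""
    names = set(code.co_names)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= collect_names(const)
    return names


def names_from_pyc_bytes(data, header_size):
    """Skip the pyc header, unmarshal the code object, and list its names."""
    code = marshal.loads(data[header_size:])
    return collect_names(code)


# Demo on an in-memory module instead of a real obfuscated file.
demo_src = "import base64\ndef f(s):\n    return base64.b64encode(s)\n"
demo_code = compile(demo_src, "<demo>", "exec")
fake_pyc = b"\x00" * 16 + marshal.dumps(demo_code)  # dummy 16-byte header

demo_names = names_from_pyc_bytes(fake_pyc, 16)
print(sorted(demo_names))
```

For a real file, read it in binary mode and pass the bytes with the header size that matches the interpreter. Obfuscators can rename identifiers (as the junk names above show), but names of library functions such as b64decode or decompress must stay intact to resolve at runtime.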
We can also use the strace command to see which library files are loaded after the user input.
strace python aaa.pyc 2>&1 | grep "python2.7.*pyc" | grep -v "No such"
The last file loaded is hex_codec.py, so we inject some code into it to print useful information.
We can see that hex_encode is called twice.
One of the values is a base64 string, so we can patch the base64 functions in the same way.
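Editing the library file on disk is one way to inject the logging. The same effect can be sketched at runtime by wrapping a module function. This is an illustrative Python 3 sketch, not the exact patch used here; spy and call_log are hypothetical names.

```python
import base64
import functools

call_log = []  # (function name, positional args, result) for every wrapped call


def spy(module, name):
    """Replace module.name with a wrapper that records arguments and result."""
    original = getattr(module, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)
        call_log.append((name, args, result))
        return result

    setattr(module, name, wrapper)
    return original  # returned so the caller can restore it later


original_b64encode = spy(base64, "b64encode")
encoded = base64.b64encode(b"dac5f5b8")   # any caller now leaks its data
base64.b64encode = original_b64encode     # restore the real function

print(call_log)
```

Patching the file on disk, as done in this post, has the advantage that it also works when the obfuscated program cannot be modified or wrapped before it runs.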
Now we can easily crack the password.
Because the program is simple, we can guess its logic by trying a few different inputs.
We know the final encrypted string is 5a47466a4e575931596a673d
after hex_decode => ZGFjNWY1Yjg=
after b64decode => dac5f5b8
after hex_decode => \xda\xc5\xf5\xb8
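The decoding chain above can be reproduced with the standard library. A minimal Python 3 sketch (the variable names are mine):

```python
import base64
import binascii

target = "5a47466a4e575931596a673d"

step1 = binascii.unhexlify(target)   # undo the outer hex encoding
step2 = base64.b64decode(step1)      # undo the base64 layer
step3 = binascii.unhexlify(step2)    # undo the inner hex encoding

print(step1)  # b'ZGFjNWY1Yjg='
print(step2)  # b'dac5f5b8'
print(step3)  # b'\xda\xc5\xf5\xb8'
```

The last value is the raw 4-byte result of the per-character transformation, so the expected input is 4 characters long.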
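We don't need to recover the code logic beforehand. Since each input character only affects the output byte at the same position, we can just brute-force the input byte by byte.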
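A sketch of the byte-by-byte search, under stated assumptions: leak_left_operand is a hypothetical stand-in for "run the target under the patched interpreter and decode the printed left operand to raw bytes", and its fixed keystream is invented for the demo (it is not the real sequence produced by random.seed(233) in Python 2). Only the search loop is the point.

```python
import string

# Invented keystream for the demo oracle; NOT the target's real key.
_KEYSTREAM = bytes([0x5A, 0x13, 0xC7, 0x88, 0x21, 0x9E])


def leak_left_operand(candidate: bytes) -> bytes:
    """Stand-in oracle: returns the transformed input, as the patched
    interpreter would reveal it after decoding the encoding layers."""
    return bytes(c ^ k for c, k in zip(candidate, _KEYSTREAM))


def recover(target: bytes, oracle, alphabet: bytes) -> bytes:
    """Recover the input one position at a time by matching leaked bytes."""
    known = b""
    for pos in range(len(target)):
        for ch in alphabet:
            guess = known + bytes([ch])
            if oracle(guess)[pos] == target[pos]:
                known = guess
                break
        else:
            raise ValueError("no candidate byte matches at position %d" % pos)
    return known


secret = b"s3cret"                      # demo secret, unknown to recover()
target = leak_left_operand(secret)      # what the comparison would expose
recovered = recover(target, leak_left_operand, string.printable.encode())
print(recovered)
```

The cost is at most len(alphabet) oracle calls per position, instead of len(alphabet) ** length for a blind brute force.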
Others
I think this kind of side-channel attack can be used against obfuscated bytecode programs in many interpreted languages in a black-box setting. It is suitable for programs that are inconvenient to debug or that disable debugging.