Friday, July 5, 2013

An Introduction to SWF Obfuscation

I keep running into people who need a good explanation of obfuscation. Since it's a pretty difficult subject to cover in any detail, I thought I should write a post about it so I don't have to spend half an hour explaining it every time someone wants to learn more.

So, what is obfuscation in general?
Obfuscation is the art of making code, compiled or otherwise, unreadable. In programming, it's generally used to make code harder to decompile, disassemble and reverse-engineer.

There are two main forms of obfuscation used on SWFs: name obfuscation and bytecode obfuscation. Normally the two are used together, and SWFs that use only one are quite rare.

Name obfuscation is simple. Take the name of every class, variable and function and change it to some randomly generated junk.
For example, if you had a class named "player", it might be renamed to "d+.{8]R0%9r".
This makes it difficult to identify which classes, functions and properties do what.
Name obfuscation is impossible to reverse, because the original names are no longer stored anywhere in the SWF. The best you can do is rename things to something like class1, class2, class3... function1, function2, function3... etc. Alternatively, you can reverse-engineer the classes and manually rename them as you figure out what they do, but that is incredibly time-consuming and impossible to automate, and you still won't end up "reversing" the obfuscation as such, just giving useful labels to the obfuscated classes.
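
To make that concrete, here is a minimal Python sketch of the idea (not how any particular SWF obfuscator works). The function names `obfuscate_names` and `relabel`, the 11-character junk length and the sample class names are all made up for illustration. Once the mapping is thrown away, only the junk names remain, and the best an automated tool can do is hand out numbered placeholders.

```python
import random
import string


def obfuscate_names(names, length=11, seed=None):
    """Map each readable identifier to a unique string of random junk."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits + string.punctuation
    mapping = {}
    used = set()
    for name in names:
        # Keep drawing until the junk name has not been handed out yet.
        while True:
            junk = "".join(rng.choice(alphabet) for _ in range(length))
            if junk not in used:
                break
        used.add(junk)
        mapping[name] = junk
    return mapping


def relabel(obfuscated_names, prefix):
    """Give junk names numbered placeholders: class1, class2, ..."""
    return {junk: f"{prefix}{i}" for i, junk in enumerate(obfuscated_names, start=1)}


original = ["Player", "Enemy", "ScoreBoard"]   # hypothetical class names
mapping = obfuscate_names(original, seed=1)
shipped = [mapping[name] for name in original]  # only these survive in the SWF
labels = relabel(shipped, "class")              # all a tool can recover

for junk in shipped:
    print(repr(junk), "->", labels[junk])
```

Nothing in `shipped` or `labels` lets you get back to "Player", which is why the only real fix is reading the code and renaming things by hand.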

[Image: two lines of AS3 with the compiled AVM2 ActionScript bytecode]
Bytecode obfuscation is a much more complicated subject.
There are many forms of bytecode obfuscation. Some add junk code, some add extra code branches, some restructure the code. There's a crapload of different ways to do bytecode obfuscation.
In order to understand bytecode obfuscation, you have to understand the difference between AS3 and AVM2 ActionScript bytecode. A SWF does not contain any AS3 source code, but rather a compiled, lower-level bytecode. Comparing the code in a SWF to AS3 is like comparing Assembly to C++ (google them if you want).
This difference in languages means that you could, in theory, have AVM2 bytecode that has no equivalent in AS3. Bytecode obfuscators use principles like this to their advantage.
They can crash decompilers by adding invalid code to the SWF that will, in practice, never be run. The decompiler is then unable to decompile the code, since the obfuscated AVM2 code in the SWF has no AS3 equivalent.
Over the years, bytecode obfuscation has gotten more and more advanced. Back in the day, you could remove it with a hex editor if you knew what to search for and replace. Nowadays, removing such defenses takes complex programs purpose-written to strip out specific obfuscation algorithms.
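
The "invalid code that never runs" trick can be sketched with a toy example in Python. To be clear, this is a made-up three-instruction stack machine, not AVM2, and `run`, `naive_decompile` and `add_dead_junk` are hypothetical names. The obfuscator inserts a jump followed by bogus instructions: the interpreter hops over them and never notices, while a naive tool that walks every instruction in order chokes on them.

```python
def run(program):
    """Execute the toy program and return the final stack."""
    stack = []
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "jump":
            pc += args[0]  # relative jump: skip the next N instructions
        else:
            raise ValueError(f"invalid opcode {op!r}")
    return stack


def naive_decompile(program):
    """Translate every instruction in order, whether it is reachable or not."""
    lines = []
    for op, *args in program:
        if op == "push":
            lines.append(f"push {args[0]}")
        elif op == "add":
            lines.append("add")
        elif op == "jump":
            lines.append(f"skip {args[0]}")
        else:
            raise ValueError(f"cannot decompile opcode {op!r}")
    return lines


def add_dead_junk(program, position):
    """Insert a jump over two bogus instructions (assumes no existing jumps)."""
    junk = [("bogus",), ("bogus",)]
    return program[:position] + [("jump", len(junk))] + junk + program[position:]


program = [("push", 2), ("push", 3), ("add",)]
obfuscated = add_dead_junk(program, 1)

print(run(program), run(obfuscated))  # both programs compute the same result
try:
    naive_decompile(obfuscated)
except ValueError as error:
    print("decompiler failed:", error)
```

Removing this particular junk is easy once you follow the jumps instead of reading linearly, which is roughly why simple obfuscation like this could once be patched out by hand and newer schemes need dedicated tools.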

Apart from completely thwarting decompilers, bytecode obfuscation has another use. Adding lots of junk bytecode and restructuring the real bytecode makes the disassembled code much harder to reverse-engineer. This, coupled with name obfuscation, can make it nearly impossible for most hackers to make hacks or reverse-engineer the SWF in general. As you can see above, two lines of AS3 came out to be about 15 lines of AVM2 bytecode (ABC). It is not uncommon for an obfuscated class to contain well over a thousand lines of such code, which is barely readable even without obfuscation.

However, most if not all bytecode obfuscation is theoretically removable. That being said, it's almost always impractically difficult to do so.

So, how do we actually deal with obfuscated SWFs, you ask?
The free and open-source decompiler JPEXS FFDec has some very good deobfuscation routines in it. If that doesn't work:
My biggest piece of advice would be to look for useful unobfuscated strings. Name obfuscation does not obfuscate all strings. Names and packages of Adobe classes tend to stay unobfuscated, and so do event names a lot of the time. You can always go through the string constant pool (using Yogda or another bytecode editor). Finding unobfuscated strings can quite often let you figure out what's what, and a lot of the time it's pretty much the only option you have.
Another piece of advice: don't tackle obfuscated SWFs until you are very confident working with bytecode. It's not easy. Even the best hackers tend to dread dealing with obfuscated games.
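
As a rough illustration of the string-hunting advice above, here is a small Python sketch that pulls printable text out of a SWF. It relies on the SWF header layout (a 3-byte signature, a version byte and a 4-byte length, with everything after those 8 bytes zlib-compressed when the signature is "CWS"). It does not parse the constant pool properly the way a bytecode editor does; it just scans for runs of printable ASCII, so expect some noise. The function names and the sample bytes are made up for the example.

```python
import re
import zlib


def swf_body(data):
    """Return the bytes after the 8-byte SWF header, inflating "CWS" files."""
    signature = data[:3]
    if signature == b"FWS":   # uncompressed SWF
        return data[8:]
    if signature == b"CWS":   # zlib-compressed SWF
        return zlib.decompress(data[8:])
    raise ValueError(f"unsupported SWF signature {signature!r}")


def printable_strings(blob, min_length=4):
    """Find runs of printable ASCII at least min_length characters long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(blob)]


# Fake file contents standing in for a real SWF read with open(path, "rb").
body = b"\x00\x01flash.display\x00\x0bd+.{\x01\x02MouseEvent\x00"
fake_swf = (
    b"CWS"
    + bytes([10])
    + (8 + len(body)).to_bytes(4, "little")
    + zlib.compress(body)
)

strings = printable_strings(swf_body(fake_swf))
adobe_like = [s for s in strings if s.startswith("flash.") or "Event" in s]
print(adobe_like)
```

Filtering on "flash." and "Event" is just one guess at what counts as useful; in practice you would eyeball the whole list and follow whatever readable names turn up.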

I would also recommend using a variable scanner if you can. However, there are not many (if any?) good AS3 var scanners (there's my shitty one, and AS3Watson), since they only really started being publicly released a year ago, and I don't think there's a single tutorial on the internet on AS3 var scanning. I'll probably make one some time, but not for a while.

I could have written more. Maybe I'll do another post, or a series of posts on obfuscation in the future. There's heaps to write about, and I only scratched the surface on most of the things I mentioned.

Hopefully this is readable and makes sense. I'm pretty exhausted right now.