
# faked Adobe PDF.SWF exploit on milw0rm

on July-23, milw0rm uploaded “Adobe Flash (Embedded in PDF) LIVE VIRUS/MALWARE Exploit” written by @hdmoore, who states that it’s (I quote) a “live exploit sample for the new Flash bug (embedded in PDF)”, which is far from the truth.

the truth is: it’s the old getIcon exploit, which has nothing to do with the new vulnerability in the ActionScript Virtual Machine. the real worms (described here) use a PDF with two embedded SWF files: one triggers the bug, the other performs heap-spraying and generates the shell-code on the fly! yeah! it uses ActionScript byte-code (which is not plain text like JavaScript, it’s more like Java byte-code) to generate the shell-code, so there are no unescape strings, and so my shell-detector fails to find it (of course it fails, it does not support ActionScript byte-code, at least not yet).

I will write about the real SWF exploit tomorrow. today we’re going to talk about that faked exploit. it’s pretty interesting as well. the first thing we have to do is decompress all the streams. it’s easy. zlib supports that format, we just have to write a PDF parser… should we?! oh, not really!!!

according to RFC-1950, a zlib stream starts with a two-byte header: CMF followed by FLG. so, we can just look for the CMF/FLG header, trying to decompress every stream we meet. it makes a very useful universal decompressor, supporting not only PDF, but much more (HTTP streams, for example).

the FLG field has a 5-bit FCHECK checksum (it makes the 16-bit value CMF*256 + FLG a multiple of 31) and the header itself is quite predictable, so it’s easy to find a potential zlib header inside a byte stream. how to defeat false positives? (a 2-byte header is too short to be reliable enough). well, no problem guys! if we find something that looks like CMF/FLG, just try to unpack the first 512 bytes with the zlib inflate() function. if it fails, it’s a false positive; otherwise we call it again to unpack the rest.
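the idea can be sketched in a few lines of Python. this is only a minimal illustration, not the code of my locator: the names `looks_like_zlib_header` and `find_zlib_streams` are made up for the example, and Python's standard `zlib` module plays the role of inflate(). the header test follows RFC-1950 (CM must be 8, CINFO not above 7, CMF*256 + FLG divisible by 31).

```python
import zlib


def looks_like_zlib_header(cmf: int, flg: int) -> bool:
    """Cheap RFC 1950 test on the two header bytes."""
    if cmf & 0x0F != 8:        # CM: 8 means deflate
        return False
    if cmf >> 4 > 7:           # CINFO: a window above 32K is not allowed
        return False
    return (cmf * 256 + flg) % 31 == 0   # FCHECK rule


def find_zlib_streams(data: bytes, probe: int = 512):
    """Yield (offset, unpacked_bytes) for every candidate that really inflates."""
    pos = 0
    while pos + 2 <= len(data):
        if not looks_like_zlib_header(data[pos], data[pos + 1]):
            pos += 1
            continue
        try:
            # pass 1: inflate a small probe only, to throw away false positives
            zlib.decompressobj().decompress(data[pos:pos + probe])
            # pass 2: the header survived, so unpack the whole stream
            inflater = zlib.decompressobj()
            unpacked = inflater.decompress(data[pos:])
        except zlib.error:
            pos += 1
            continue
        if not inflater.eof:   # truncated stream: skip this candidate
            pos += 1
            continue
        yield pos, unpacked
        # go on right after the stream (unused_data is whatever follows it)
        pos = len(data) - len(inflater.unused_data)
```

feeding it the raw bytes of a file (`list(find_zlib_streams(open(path, "rb").read()))`) returns every stream that unpacks cleanly, no matter what container it sits in.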

ok, all streams of hereEvil.pdf are decompressed. the 15th stream is JavaScript with a large Array that contains an escaped string. looks like a shell-code, but hell no! decode it with a simple deURI converter and… oops!!! another JavaScript!!! yes!!! exploit inside exploit, nested obfuscation. can you believe it?! I have just improved my shell-code locator, adding recursive filtering support (the zlib-decompressor and the unescape decoder are basically external filters for the locator engine). I have not released the new version yet, I was just testing it and… wow!!! I met an exploit that really uses nested JavaScripts for better obfuscation! well, just in time, just in time…

NOTE: if you have no idea how to write a deURI decoder, download ECMA-262.pdf (ECMAScript Language Specification) and go to section “B.2.2 unescape (string)”. there you will find the unescape decoder, written in pseudo-code.
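for those who prefer running code to pseudo-code, here is a small Python sketch of that algorithm. the function names `js_unescape` and `to_memory_bytes` are my own, and the second helper rests on an assumption: that the script keeps the decoded string as 16-bit little-endian code units, which is how such exploits usually lay the shell-code out in memory.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def js_unescape(text: str) -> str:
    """Decode %XX and %uXXXX sequences the way ECMA-262 B.2.2 describes."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i] == "%":
            # %uXXXX: lowercase 'u' followed by exactly four hex digits
            if (i + 6 <= n and text[i + 1] == "u"
                    and all(ch in HEX_DIGITS for ch in text[i + 2:i + 6])):
                out.append(chr(int(text[i + 2:i + 6], 16)))
                i += 6
                continue
            # %XX: exactly two hex digits
            if i + 3 <= n and all(ch in HEX_DIGITS for ch in text[i + 1:i + 3]):
                out.append(chr(int(text[i + 1:i + 3], 16)))
                i += 3
                continue
        # anything else (including a malformed escape) is copied as-is
        out.append(text[i])
        i += 1
    return "".join(out)


def to_memory_bytes(decoded: str) -> bytes:
    """Assumption: each character is stored as a 16-bit little-endian unit."""
    return b"".join(ord(ch).to_bytes(2, "little") for ch in decoded)
```

for nested layers like the one above, the output of `js_unescape` is simply fed to the locator (and to `js_unescape`) once more.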

the second (underlying) layer is not interesting. it’s just an Array with an escaped string that holds the real shell-code and includes the well-known ["doc"]["Collab"]["getIcon"]. do they look familiar?! of course they do!!! it’s the old getIcon exploit, just more obfuscated.

now, about the shell-code. it’s very simple, not even encrypted. this is what my shell-code locator said:

XOR key: 00 00 00 00 (00000000h)

ok, open the file with HIEW, go to 19h offset and see:

00000019: mov eax,[eax][0C]
0000001C: mov esi,[eax][1C]
0000001F: lodsd
00000020: mov eax,[eax][08]
00000023: jmps 00000002E —

yep, a typical KERNEL32 base address finder. what else?! the most interesting thing is that the shell-code has text strings. just look at them:

URLMON.DLL, URLDownloadToFileA, update.exe, crash.php,

wow!!! the domain name!!! I checked it and found out that it is down, so I went to a whois service and… oops! surprise!!!

WHOIS information for
* Registration Service Provided By: DOMAIN NAMES REGISTRAR REG.RU LTD.
* Contact: +7.4955801111
* Domain Name: VIORFJOJ-2.COM

Private person
Dmitry Ostupin (
ul. Malaya Semenovskaya, d.5, kv. 28
g. Moskva
g. Moskva,107023
Tel. +7.4952240537

Creation Date: 08-Jul-2009
Expiration Date: 08-Jul-2010

Russian guy! that’s a deal! I have no idea whether he is the author of the exploit or maybe his server was used by another person, but I wonder… I wonder… going to give him a call tomorrow just out of curiosity.

well, maybe I should not publish his contact info here because of etiquette, but… why not?! the exploit was taken from a public source, the hard-coded domain name was found, so… everyone can use the whois service to get this contact info.

well, what are we going to do on the ISP side? if you meet a packet from/to that domain, it means the host is infected and the packet should be blocked. well, since the server is down, obviously all major ISPs have blocked it already.

faked exploit on milw0rm - it has nothing to do with the real SWF security hole



# IDA-Pro steals RIP — introduction in relative addressing

intraarterial injection: I was involved in a project to design a software-level protection based on anti-dbg tricks that should work in 32- and 64-bit environments, causing no conflicts with legal apps. also, my shell-code locator has to learn how to recognize x86-64 exploits, so… I took a deep breath and dived into the 64-bit world. well, I’m not a newbie here, but digging up anti-dbg tricks that work everywhere sounds like a challenge. ok, anti-dbg tricks, shell-codes… a good point to begin with.

kotal: x86 does not allow addressing the EIP register directly (PDP-11 does), but it supports relative addressing in the flow control commands (“the” means “all”), for example: CALL L1 is a relative call. in machine representation it looks like E8 61 06 00 00, where E8h is the opcode of CALL and 61 06 00 00 is the relative 32-bit signed offset of the target (little-endian, 661h here), calculated from the _end_ of the CALL.
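the arithmetic is easy to check by hand, but here is a tiny Python sketch anyway. the helper `call_target` and the load address 401000h are hypothetical, picked only for illustration.

```python
import struct


def call_target(insn: bytes, address: int) -> int:
    """Return the target of a 5-byte near CALL (E8 rel32) located at `address`."""
    if len(insn) != 5 or insn[0] != 0xE8:
        raise ValueError("not an E8 rel32 CALL")
    (rel32,) = struct.unpack("<i", insn[1:5])   # signed, little-endian
    # the offset is counted from the END of the instruction (address + 5)
    return address + 5 + rel32


# the bytes from the text, placed at a hypothetical address
target = call_target(bytes.fromhex("E861060000"), 0x401000)
```

move the same five bytes to another address and the target moves with them, which is exactly why such code does not care where it is loaded.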

it’s very important for shell-codes, because it gives them the ability to work when loaded at any offset. for protections it’s useful as well. to prevent dumping, just allocate memory on the heap and copy your procedure there. no dumper is able to create a workable PE image out of the heap!

drawbacks: besides its benefits, relative addressing has its own disadvantages. guess what happens if we copy our function, which calls a function we can’t copy (for example, an API). the delta between the CALL and the target will change, forcing us to recalculate all relative addresses, or… (turn your mind on) start to use absolute addressing, for example: mov eax, offset API_func/CALL eax;

home and dry: x86-64 does not allow using RIP (former EIP) as a general purpose register (MOV RAX, RIP does not work), but it supports relative addressing almost everywhere (let me quote the Intel manuals: “RIP-relative addressing allows specific ModR/M modes to address memory relative to the 64-bit RIP using a signed 32-bit displacement. This provides an offset range of -/+2GB from the RIP”). what does it mean?! for shell-code writers it means a lot!!! from now on we don’t need a GetPC subroutine (usually, CALL L1/L1:POP r32) and can use RIP directly. and this is the part where we meet the problem of the stolen RIP.

anaphylactic shock: please, consider the following code. this is how IDA-Pro 5.5 disassembles it. remember: it’s a piece of a real shell-code, so concentrate your mind into fuming acid and do not miss the point (see the picture below as well):

.code:0000000000401000 start proc near
.code:0000000000401000 mov ecx, 69h
.code:0000000000401005 jmp short loc_40100C
.code:0000000000401007 loc_401007:
.code:0000000000401007 nop
.code:0000000000401008 xor [eax+ecx], cl
.code:000000000040100C loc_40100C:
.code:000000000040100C lea rax, loc_401013
.code:0000000000401013 loc_401013:
.code:0000000000401013 loop loc_401007
.code:0000000000401015 mov r9d, 0

how do you like it?! ok, let me be more specific. how do you like the line “lea rax, loc_401013”?! what?! did you say “looks clear!”? hell no!!! look closely!!! Options -> Text representation -> Number of opcode bytes -> 9. do you see _now_ what IDA-Pro hides from us?!

.code:000000000040100C 48 8D 05 00 00 00 00 lea rax, loc_401013
.code:0000000000401013 loc_401013:

oh, my unholy cow!!! “LEA RAX, loc_401013” turns out to be “LEA RAX, [RIP]”, thus we’re dealing with position-independent code. in a way, IDA-Pro is correct. she just calculates RIP on the fly and replaces it with the effective offset. but we, hackers, want to know if the code is position independent or not!!!

breakdown: HIEW also replaces RIP with the effective offset. please consider the following line: 0040100C: 488D0500000000 LEA RAX, [000401013]

ok, do you want to get high? well, let’s do it, ppl!

00000000: 488D0500000000 lea rax,[7]
00000007: 488D0500000000 lea rax,[00000000E]
0000000E: 488D0500000000 lea rax,[000000015]

the same opcodes produce different targets, how funny!!! of course, it’s the opcode of the LEA RAX, [RIP] command, and I would like to have an option which enables/disables showing RIP, because I do need it very much!!!
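what the disassemblers do behind our backs can be reproduced with a short Python sketch. the function `lea_rax_rip_target` is a name I made up; it understands only the one encoding discussed here (48 8D 05 disp32) and nothing else.

```python
import struct

LEA_RAX_RIP = b"\x48\x8d\x05"   # REX.W, LEA, ModR/M = RAX <- [RIP + disp32]


def lea_rax_rip_target(code: bytes, offset: int, base: int = 0) -> int:
    """Effective address of LEA RAX, [RIP+disp32] found at code[offset:]."""
    if code[offset:offset + 3] != LEA_RAX_RIP or offset + 7 > len(code):
        raise ValueError("not a 48 8D 05 disp32 instruction")
    (disp32,) = struct.unpack_from("<i", code, offset + 3)
    # RIP already points to the NEXT instruction, i.e. 7 bytes further
    return base + offset + 7 + disp32


# three identical instructions in a row, as in the HIEW listing
code = bytes.fromhex("488D0500000000") * 3
targets = [lea_rax_rip_target(code, off) for off in (0, 7, 14)]
```

identical bytes, three different effective addresses: the displacement is the same zero every time, only RIP differs.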

updated: Igor Skochinsky pointed out (see his comment below) that IDA-Pro allows us to show RIP (Options -> Analysis options -> Processor specific analysis options -> Explicit RIP-addressing). ok, let’s enable it and see what happens:

.code:000000000040100C loc_40100C: ; CODE XREF: start+5j
.code:000000000040100C lea rax, [rip+0]

well, say hello to “RIP”! it’s explicit now, but… the rest of the code is almost damaged and unvoyageable (means: inconvenient for navigation):

.code:000000000040101B lea r8, [rip+0FDEh] ; “x86-64 program!”
.code:0000000000401022 lea rdx, [rip+0FEEh] ; “hello world!”
.code:0000000000401029 mov rcx, 0 ; hWnd
.code:0000000000401030 call qword ptr [rip+2016h]
.code:0000000000401036 mov ecx, eax ; uExitCode

we see relative offsets like 0FDEh, 0FEEh, 2016h, etc. they’re red colored (means: IDA-Pro does not recognize these offsets) and if we move cursor to the constant – we can’t jump by ENTER and we need to calculate the target address manually. so, the problem is still unsolved.

in passing: look at the encoder again. don’t you think that it damages the loop?! ok, let’s trace the code with any debugger, or with our own mind if we have no 64-bit debugger at hand.

“loop loc_401007” has the E2h F2h opcode. in binary representation F2h is “11110010”, so the lowest bit is zero; thus, when ECX == 1, the XOR sets that bit and the target of the loop changes from 401007h to 401008h (F2h ^ 1 = F3h, one byte further). as a result, the NOP will be skipped. of course, it might be INC EBX (opcode 43h): in that case, EBX would be increased not by ECX (as expected), but by (ECX – 1). how interesting…

well, when ECX = = 0, LOOP just does not pass the control to the target, so everything works fine.
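the patched target can be verified with a couple of lines of Python. the helper `loop_target` is hypothetical; it only decodes the two-byte LOOP rel8 form (E2 xx) and says nothing about whether the jump is actually taken.

```python
def loop_target(insn: bytes, address: int) -> int:
    """Target of a 2-byte LOOP rel8 (E2 xx) located at `address`."""
    if len(insn) != 2 or insn[0] != 0xE2:
        raise ValueError("not an E2 rel8 LOOP")
    rel8 = insn[1] - 0x100 if insn[1] >= 0x80 else insn[1]   # sign-extend
    return address + 2 + rel8          # counted from the end of LOOP


original = loop_target(bytes([0xE2, 0xF2]), 0x401013)
# the encoder XORs the byte at 401014h (the rel8 itself) with CL == 1
patched = loop_target(bytes([0xE2, 0xF2 ^ 0x01]), 0x401013)
```

here `original` is 401007h and `patched` is 401008h, i.e. the patched jump lands right after the NOP.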

updated: Sol_Ksacap pointed out that (let me quote him): “the target of loop will indeed change, but there won’t be any loop – “loop” instruction first decreases RCX, and only then checks if it’s zero”. and he is definitely right. this post was written in a hurry. sorry for the mistakes I made, and big thanks to all the guys who pointed them out.

off the record: in normal shell-codes you will probably meet something like LEA EAX, [RIP-1] (opcode: 8D05FFFFFFFF), since commands with positive offsets have zeros in their opcodes and shell-codes do not like zeros very much (because of ASCIIZ, where zero is the string terminator).

updated on:
Wed, July-15: enable-RIP option in IDA-Pro, loop patching bugs;

an example of real 64bit shell-code with hidden RIP



# MS DirectShow MPEG2 (msvidctl.dll) worm was fired out!

Internet is under attack, the Chinese worm hits, the disease is spreading fast, the number of infected machines is growing rapidly. it’s a new outbreak!

Let me let you in on a little inside info. McAfee has a solution, but due to the size of the company does not apply it. as a part of the former Endeavor Security Team I’m working on the shell-code locator. this is my own project here and some modules were integrated into Active Malware Protection commercial product, now renamed to NTR. and it works!!! it did catch the worm giving the green flag (means: 99% chances of invasion).

long before McAfee showed interest in it, the locator was demonstrated to NDS company (Jerusalem, Israel), Sec++ Group (Tel-Aviv, Israel), Sense-Post company (Pretoria, South Africa), Soft-Forum (Seoul, South Korea) and many other hackers, so, basically, it’s a collective solution. it’s not only about me and my ideas… yeah, it was my idea from the beginning, but it has been improved through the discussions, and of course it was discussed inside our company, big thanks to Alice Chang, Kun Luo, Zheng Bu, Yichong Lin, Vitaly Zaytsev and many others.

ok, let’s take the shell-code locator, feed the worm to it and check out what the heuristic module says:

$shell-codes_locator_v3.exe stream.bin
+DETECTED FSTENV-based encoder @ 000002ABh
XOR key: B9 8E A9 13 (13A98EB9h)

in fact, there is an encoder and the rest of the worm is encrypted, but it does not help the worm to escape. my shell-code locator was designed to perform heuristic search inside encrypted streams, without decrypting them, without emulation and of course without brute forcing, because we should care about resources and we just don’t have enough CPU power for virtualization of any kind.

well, let’s load the worm into HIEW and see the encoder with our own eyes (the picture below). wow! indeed, the encoder is located at the same address, but… it uses another key. just look at the following code (taken from the encoder): “XOR D, [EBX+13], 0A98EB913”, and compare the key with my shell-code locator’s output: 13A98EB9h.

is my locator wrong?! not at all! because A98EB913h and 13A98EB9h are the same key, just rotated by 8 bits. since XOR is a stream operation, it does not matter which byte comes first and which comes second as long as the offset is given. if we apply A98EB913h at offset 318h, we get the same result.
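if that sounds like hand-waving, a few lines of Python settle it. this is only a sketch: `rotr32` and `xor_stream` are names invented for the example, the key DWORD is assumed to sit in memory in little-endian order (as on x86), and the offsets 18h/19h are arbitrary, not taken from the worm.

```python
def rotr32(value: int, bits: int) -> int:
    """Rotate a 32-bit value right by `bits`."""
    bits %= 32
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def xor_stream(data: bytes, key: int, anchor: int) -> bytes:
    """XOR `data` with a repeating DWORD key whose first byte is aligned at `anchor`."""
    key_bytes = key.to_bytes(4, "little")
    return bytes(b ^ key_bytes[(i - anchor) % 4] for i, b in enumerate(data))


sample = bytes(range(64))                               # any test data will do
by_encoder_key = xor_stream(sample, 0xA98EB913, 0x18)   # key as seen in the encoder
by_locator_key = xor_stream(sample, 0x13A98EB9, 0x19)   # key as printed by the locator
same_result = by_encoder_key == by_locator_key
```

`same_result` comes out true: rotating the key by one byte and moving the anchor by one byte produce exactly the same key stream.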

it proves that my shell-code locator does not look for “XOR” in order to extract the immediate DWORD (the plain key). my shell-code locator does not need it at all. if the encoder is missed or not recognized, never mind! isn’t it impressive?

however, at this moment URI decoder (Chinese worm keeps the shell-code inside an unescaped string) is still under construction (pre-pre alpha stage of development), so the worm was caught by the internal version of the shell-code locator, but it inspires me to continue working on it.

FSTENV-based encoder recognized by my shell-code locator



# shell-codes analysis: where is EP?

as a reverser working for a security company, I used to analyze a lot of shell-codes every day. most of them were caught by AMP and stored in .pcap format with packet header(s) and other stuff like that, which has nothing to do with the actual machine code. a typical shell-code looks like this:

typical shell-code


where is our Entry Point? how to find it fast? a common approach is just to start disassembling the code from different locations until we get the most sensible listing, but sometimes the shell-code starts with an encrypted block followed by the decryptor, and we just waste our time.

by accident I came upon a better way. just look for the “EBh” (jmp short) opcode. it works well in most cases, bringing us very close to the real entry point. maybe you will get a few false positives, but not more.

it works coz almost every shell-code has a “jmp short” command near the entry point, so looking for the EBh opcode is a good approach to find the entry point very fast! don’t ask me, “why EBh, man? why does EBh work better than anything else?!” I don’t know. I just know this works. theoretically, CALL and LOOP should work as well, but… they don’t. EBh works much better, helping me to reverse shell-codes faster :-) it takes about 3 seconds on average to find the Entry Point with HIEW.
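I do it by eye in HIEW, but the search is trivial to script. a minimal Python sketch follows; the function `find_short_jumps` is hypothetical, and dropping the hits whose target falls outside the buffer is my own extra filter for the example, not a part of the trick itself.

```python
def find_short_jumps(data: bytes, base: int = 0):
    """List (jmp_offset, target_offset) for every EB xx whose target is inside data."""
    hits = []
    for off in range(len(data) - 1):
        if data[off] != 0xEB:
            continue
        rel8 = data[off + 1]
        if rel8 >= 0x80:               # sign-extend the 8-bit displacement
            rel8 -= 0x100
        target = off + 2 + rel8        # counted from the end of the jump
        if 0 <= target < len(data):    # extra filter: ignore jumps to nowhere
            hits.append((base + off, base + target))
    return hits


# toy buffer: a forward jump at offset 4 and a jump-to-self at offset 9
demo = b"\x90" * 4 + b"\xeb\x02" + b"\xcc\xcc" + b"\x90\xeb\xfe"
demo_hits = find_short_jumps(demo)
```

every hit is just a candidate: go to the offset in the disassembler and look a few lines above and below it.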

in our case, EBh has been found at offset 544h. of course, this is _not_ the real entry point, but something very close to it. EBX is not used (it’s reinitialized by the following commands), so just skip the 53Ch line. the next two lines open a stack frame to allocate some memory for local variables, thus 53Dh _probably_ is the real entry point. “probably” coz we can’t be sure until we analyze the rest of the shell-code, but in this case our guess is absolutely correct, and finding the entry point took just a second.

Entry Point has been found
