1. Abstract

This assembly source code file, polytest.asm, serves as an exploratory study of polymorphic engines, focusing primarily on x86-64 assembly language. The code is intended for educational purposes and was written as part of an individual learning exercise. It offers an in-depth analysis of the components involved in a polymorphic engine: a random number generator, a junk code generator, and a decryptor generator. ##Components of the Engine

The code begins with an exploration of the core components of a polymorphic engine, adhering to principles outlined in referenced works. It identifies three key elements:

Random Number Generator: Generates random numbers to be used in other components.
Junk Code Generator: Adds 'junk' or 'garbage' code to the real code to obscure its actual functionality.
Decryptor Generator: Generates code that can decrypt the payload of the engine.

2. Instruction Encoding

A significant part of the discussion revolves around the intricacies of instruction encoding. It provides insights into how opcodes, modes, and operands are encoded into machine instructions, taking examples from the x86-64 architecture.

3. Encryption Primitives

The file also delves into the encryption primitives required before building a polymorphic engine. It outlines the basic assembly techniques for encryption and discusses four types of ciphers:

Substitution
Sliding Key
Long Key
Transposition

4. Permutations and Random Registers

The code then talks about permuting the decryptor and the usage of random registers. This is an important aspect as it gives each instance of the polymorphic code a different ‘signature’, making it harder to detect. Additional Insights

Additional notes on code obfuscation and optimization are provided. This includes notes on the generation of ‘garbage code’ and its potential drawbacks, as well as the efficacy of using simple XOR, ADD, and SUB ciphers.

The file serves as a comprehensive resource for understanding polymorphic engines and includes extensive code snippets and references for further reading.

;  <polytest.asm>   -   polytest source
;                         December 2020
;
;  ---------------------------- abstract ----------------------------
;  This file contains a discussion of polymorphic engines, written
;  as an individual learning exercise that may be shared with others
;  for educational purposes.
;  ------------------------------------------------------------------
;
;           I. Introduction - the Components of the Engine
;
;  Distilling the poly engine into three components according to [1]:
;  the first component of the engine is a random number generator. 
;  Easy enough. The second is a junk code generator. Slightly harder.
;  The last component is a decryptor generator. Enough said.
;
;  Focusing on the last of the three first - the decryptor generator -
;  According to [1] the algorithm is as follows:
;       1. Select random set of registers
;       2. Choose a compressed pre-coded decryptor 
;       3. Enter a loop where junk code is added to the real decryptor
;           (potentially uneccesary [10])
;
;               II. Instruction Encoding In-Depth
;
;  Before even adding junk code, how is the pre-coded decryptor made 
;  compatible with the random selection of registers? From what I 
;  understand from [1], it appears that the polymorphic vxer will 
;  take advantage of patterns in opcodes related to registers/addres-
;  sing modes. This is not a trivial detail but in my experience most 
;  other sources see to gloss overer it. I could be wrong. Regardless,
;  the basics are this:
;
;  Each basic instruction in (for example) x86 64-bit mode has
;  different bits set for different addressing modes. Take XOR
;  for example [5]:
;
;       hex      bin       mode operand 1   mode operand 2
;       0x30     110000    r/m8             r8
;       0x31     110001    r/m16/32/64      r16/32/64
;       0x32     110010    r8               r/m8
;       0x33     110011    r16/32/64        r/m16/32/64
;       0x34     110100    AL               imm8
;       0x35     110101    rAX              imm16/32       fig. 1
;   
;  The commonality between all six instructions is they begin with
;  the first two bits set and the third unset. What differs between
;  them is the last three bits, each apparently indicating a different
;  addressing mode.
;
;  In [1], the author mentions building a 'skeleton instruction table'.
;  Cursory internet searches don't turn up anything like this. To
;  start off with some quick wins in creating one, we can see that its 
;  readily apparent from [5] that ADD, OR, ADC, SBB, AND, SUB, XOR,
;  and CMP share close opcode values for each addressing mode, 
;  (off by 1). But to build a skeleton table, we need only pick out
;  the first examples of these. To make things easier, these groups of  
;  different opcodes for each operation are aligned nicely (0x00-0x08, 
;  0x1-=0x0d). Other opcodes aren't as neatly organized (e.g. PUSH, POP). 
;  Regardless, just by using [5] we can simply pick out the skeleton  
;  instructions we need:
;
;       hex      opcode
;       0x00     ADD    
;       0x08     OR
;       0x10     ADC
;       0x18     SBB
;       0x20     AND
;       0x28     SUB
;       0x30     XOR
;       0x38     CMP        fig. 2
;
;  But if we look at the encoding of real instructions from an assembl-
;  ed encryption loop, we can quickly see that theres much more going
;  on than the addressing mode variations between each instruction:
;
;       a) 48 C7 C3 9A 02 00    // mov rbx, 29Ah  
;       b) 48 2B CB             // sub rcx, rbx 
;
;  Why do both instructions begin with a 48? We can assume that C7 is
;  an opcode for MOV since 2B is an opcode for SUB. What does the
;  third byte do, and why does the assembler use four bytes to repres-
;  ent an integer that fits in a word?
;
;  It took some effort to track down useful resources and make sense
;  of all the different interpretation of the data describing these
;  encodings, but between [5, 12, 13, and 14], I think I was able to
;  understand it. If we take the following instruction as an example, 
;  we can decode the meanings of the bit positions [5] [12]:
;
;   -----------------------------------------------------------------------
;       hex            
;                       __REX.W  
;                      /   ____MOV r/m16/32/64
;                     /  /  ______________Mod-Reg-R/M Byte
;                    /  /  /  ,______,____________________QWORD immediate
;                a) 48 C7 C3 9A 02 00 
;                b) 48 2B CB
;                c) 89 05 B9 1F 00 00
;                   \   
;                    \
;                     \__MOV r/m16/32/64
;       instruction         
;                a) mov rbx, 29Ah              
;                b) sub rcx, rbx           
;                c) mov dword ptr ds:0x1fb9,eax               
;
;       binary                         
;                                      ______0x29a_____
;       a)                            /                \
;         01001000 11000111 11000011 [10011010 00000010] 00000000 000000000  
;         \_____/  \_____/       \/   \__________________________________/
;           |          |         |                     |    
;   rex prefix      MOV           register             quad word  
;               
;       b)         
;                   01001000         01010110        11001011
;                   \__/wrxb         \_____/           \/ \/
;            fixed___/  \_/            |               |   \             
;                        |         sub immediate       rcx   rbx
;           64-bit operand ('W' bit set) 
;
;       c) 
;        10001001 00000101 10111001 00011111 00000000 000000000
;                                                                    fig. 3
;   -----------------------------------------------------------------------
;
;   i. Note on Quick Visualization of Binary with Python
;
;       # One off
;       ... data = "48 C7 C3 9A 02 00 00"
;       ... bytes = bytearray.fromhex((data))
;       ... [print(bin(bytes[i]), end=" ") for i in range(len(bytes))]
;       ...
;       # As a function
;       ... def bin_from_bytes(data):
;       ...   bytes = bytearray.fromhex(data)
;       ...   [print(bin(bytes[i]), end=" ") for i in range(len(bytes))]
;   
;                   III. Encryption Primitives                        
;  
;  Before the poly engine comes the encryption primitives. One mistake
;  I seem to have made is in seeking out to understand the concepts in
;  section I. without understanding how to perform the basic encryption
;  operations in assembler. I now see that I need to have some mastery
;  over that subject to move forward.
;
;  In [6, 8-9], the author 'MidNyte' discusses the fundamentals of 
;  encryption/enciphering in the context of the virus or poly engine.
;  In Part I, four x86 assembly techniques (or as I like to think of
;  them, 'primitives') are presented. In Part II, they then present
;  four methods of 'armouring' the encryption. These articles seem
;  to have everything necessary to understand the basics.
;
;  The four ciphers presented in [6] are the following:
;       - substitution
;       - sliding key
;       - long key
;       - transposition
;   
;  In [8], the following armoring techniques are presented:
;       - variable length transposition
;       - boundary scrambling
;       - integrity-dependent decryption
;       - date-dependent decryption
;
;  While they're important, I'm going to avoid the latter four
;  techniques for the time being as I'm trying to focus on the
;  basics.
;
;       ii. A Note on Generating Garbage and Simple Ciphers
;
;  Garbage code generation is still code generation. At that, the
;  quality of the garbage matters [10]. It is even suggested that
;  it is more worthwhile to use stronger and more complicated 
;  encryption than to add junk code at all [10]. Since generating
;  garbage does not seem reward the time investment at the moment,
;  I will revisit this later.
;
;  One contesting idea regarding this however is the benefits of
;  the sheer simplicity of XOR, ADD, and SUB ciphers. For example,
;  in [18] one of the most advanced cyber attacks in history, a s-
;  liding XOR cipher was employed to decent success.
;
;                   IV. Permuting the Decryptor
;
;  [11] Provides a notation and description for a decryption algor-
;  ithm very similar to the ones provided by [6]. The following is
;  an adaptation of the described algorithm and the permutation ru-
;  les provided:
;
;   -----------------------------------------------------------------------
;     decrypt proc near
;       1)  mov length_register, length         ; get the code length       
;       2)  mov pointer_register, startcode     ; load pointer register     
;       3)  mov destination_register, startcode ; load the destination reg. 
;       4)  mov key_register, key               ; get the key               
;     main_loop
;       5)  mov code_register, pointer_register ; take an encrypted word    
;       6)  call unscramble                     ; decrypt it (*)            
;       7)  mov destination_register, code_reg. ; write the decrypted word  
;       8)  add key_register, key_increment     ; increment the key         
;       9)  dec length_register                 ; decrement length          
;       10) inc pointer_register                ; increment pointer (x2)    
;       11) jnz main_loop                       ; loop until length=0       
;       12) ret                                 ; return pc                 
;     decrypt endp                                                  fig. 4
;   -----------------------------------------------------------------------
;       1) permutable, can be placed anywhere
;       2) permutable, can be placed anywhere
;       3) permutable, can be placed anywhere
;       4) permutable, can be placed anywhere
;       5) not permutable
;       6) not permutable
;       7) permutable, can be placed anywhere after [6]
;       8) permutable, can be placed anywhere after [6]
;       9) permutable, can be placed anywhere after [6]
;       10) permutable, can be placed anywhere after [6]
;       11) not permutable
;       12) not permutable                                          fig. 5                    
;   -----------------------------------------------------------------------
;
;  Just as in [11], this general description of the algorithm with 
;  permutation rules can be used to "make a matrix of bytes". In
;  this case, since the algorithm is slightly different. We can n-
;  ow describe permutations of the same algorithm that are logica-
;  lly equivalent but different in signature, for example:
;
;   matrix a)
;       permutation: [4, 1, 2, 3, 5, 6, 8, 9, 7, 10, 11, 12]
;       place:       [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
;
;   matrix b)
;       permutation: [3, 1, 2, 4, 5, 6, 7, 10, 8, 9, 11, 12]
;       place:       [1, 2, 3, 4, 5, 6, 7, 8,  9, 10, 11, 12]
;
;                   V: Random Registers
; 
;  Now that we know what logical operations must be included and 
;  have an idea of how what bit-level characteristics each instru-
;  ction might have (fig 3.), the next fundamental characteristic
;  of the polymorphic engine to understand is the random selection
;  of registers mechanism.
;
;       [rax, rbx, rcx, rdx, r8, r9, r11, r12, r13, r14, r15]
;
;  To get a random register from this list, one need only to gene-
;  rate a random number in the range and use it as an index. This
;  is a good starting point for the discussion on randomness in
;  general. Fortunately for the poly-engine author, there are so-
;  me extremely simple algorithms that are 'good-enough' for now.
;  The following code is lifted straight from [16]:
;
;      __uint128_t g_lehmer64_state;
;      
;      uint64_t lehmer64() {
;        g_lehmer64_state *= 0xda942042e4dd58b5;
;        return g_lehmer64_state >> 64;
;      }
;
;  This could be implemented in asm as follows:
;   
;      get_lehmer:
;        mov     rax, rcx
;        mov     rcx, 0da942042e4dd58b5h
;        mul    rax, rcx
;        shr     rax, 64
;        ret
;
;  And after invocation, an index for a register can simply be
;  derivied by taking the generated random, shifting right 59 bits
;  (max value is now 0b11111 or 31 in base-10). From there, calc-
;  ulating the modulo 10. This should be sufficient for an index.
;  This is just my off-handed way of deriving a number withing a
;  range from another larger number. I'm sure there's other and 
;  better ways to do it. In assembly:
;
;       get_rand_reg:
;         shr   rax, 59
;         xor   rdx, rdx
;         mov   rcx, 10
;         div   rcx
;         mov   rax, rcx
;
;  Given the time elapsed between this writing and the introduct-
;  ion of the RDRAND instruction, its a good bet that it will be
;  available in most target environments, however it is a magic
;  black box that in my opinion is better left untouched. 
;  Instead, entropy an be collected in other ways. For future
;  reference, [17] is a good introduction to understanding and
;  calculating entropy.
;
;  The last bit to do is to go about shuffling the list of regi-
;  sters. The initial list can be described as follows:
;
;   [length_reg, source_reg, dest_reg, key_reg]
;
;  References:
;   [1] https://vx-underground.org/archive/VxHeaven/lib/vbb01.html
;   [2] https://vx-underground.org/archive/VxHeaven/lib/vda01.html
;   [3] https://paul.bone.id.au/blog/2018/09/05/x86-addressing/
;   [4] https://www.agner.org/optimize/instruction_tables.pdf
;   [5] http://ref.x86asm.net/coder64.html
;   [6] https://vx-underground.org/archive/VxHeaven/lib/vmn04.html
;   [7] https://harrisonwl.github.io/assets/courses/malware/spring2017/slides/FinalWeeks/EncryptedOligomorphic.pdf
;   [8] https://vx-underground.org/archive/VxHeaven/lib/vmn05.html
;   [9] https://vx-underground.org/archive/VxHeaven/lib/vmn06.html
;   [10] https://vx-underground.org/archive/VxHeaven/lib/vts01.html
;   [11] https://vx-underground.org/archive/VxHeaven/lib/vlj04.html
;   [12] https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4.html
;   [13] https://wiki.osdev.org/X86-64_Instruction_Encoding
;   [14] http://www.c-jump.com/CIS77/CPU/x86/X77_0060_mod_reg_r_m_byte.htm
;   [15] https://github.com/vxunderground/MalwareSourceCode/blob/6919f569b56cdbf91fad247753571673b1eac083/LegacyWindows/Win98/Win98.BlackBat.asm
;   [16] https://lemire.me/blog/2019/03/19/the-fastest-conventional-random-number-generator-that-can-pass-big-crush/
;   [17] https://machinelearningmasteyr.com/what-is-informatin-entropy
;   [18] https://vxug.fakedoma.in/samples/Exotic/UNC2452/SolarWinds%20Breach/
;
;  Not directly related but still relevant:
;   [i] https://github.com/Battelle/sandsifter
;   [ii] https://en.wikipedia.org/wiki/Hexspeak#Notable_magic_numbers

option win64:3      ; init shadow space, reserve stack at PROC level

DATA$00 SEGMENT PAGE 'DATA'

DATA$00 ENDS

TEXT$00 SEGMENT ALIGN(10h) 'CODE' READ WRITE EXECUTE

    Main PROC	
        call    simple_substitution_cipher
        call    simple_substitution_cipher
        call    sliding_key_cipher
        call    sliding_key_cipher
        call    long_key_cipher
        call    long_key_cipher
        call    transposition_cipher
        call    transposition_cipher
        ret
    Main ENDP

;-----------------------------------------------------------------------------
;  Encryption Primitives
;-----------------------------------------------------------------------------
    
;  Simple substition encryptor [6]
;
simple_substitution_cipher:
    simple_substitution_cipher_setup:
        mov     rcx, (offset payload_code_ends - offset payload_code) / 2
                                                        ;  Calculate payload body size in words                         
                                                                                                                        
        mov     rsi, rbx                                ;  source = start of encrypted code                             
        mov     rdi, rsi                                ;  destination = same as the source                             
        mov     rbx, 029Ah                              ;  rbx = key                                                                                                                                              
                                                                                                                        
    simple_substitution_cipher_loop_begin:                                                                              
        lodsw                                           ;  MOV's word from [si] to ax, and increases si by 2            
        xor     ax, bx                                  ;  The actual decryption                                        
        stosw                                           ;  MOV's word from ax to [di], and increases di by 2            
                                                        ;  Notice the segment must be marked RWX to modify the code     
        loop    simple_substitution_cipher_loop_begin   ;  DEC's cx, and jumps to start_loop if CX > 0                  
    simple_substitution_cipher_end:
        ret

;   Sliding key cipher [6]
;
sliding_key_cipher:
    sliding_key_cipher_setup:
        mov     rbx, offset payload_code
        mov     rcx, offset payload_code_ends
        sub     rcx, rbx
        shr     cx, 1                                   ;  Calculate payload body size in words

        mov     rsi, rbx                                ;  source = start of encrypted code
        mov     rdi, rsi                                ;  destination = same as source
        mov     rbx, 02828h                             ;  bx = decryption key
    sliding_key_cipher_loop_begin:
        lodsw                                           ;  MOV's word from [si] to ax, and increases si by 2
        xor     rax, rbx                                ;  The actual decryption
        inc     rbx                                     ;  Increment the loop
        stosw                                           ;  MOV's word from ax to [di], and increases rdi by 2
        loop    sliding_key_cipher_loop_begin           ;  DEC's cx, and jumps to looop head if CX > 0
    sliding_key_cipher_end:
        ret

;   Long key encryption [6]
;
long_key_cipher:
    long_key_cipher_setup:
        mov     rbx, offset payload_code
        mov     rcx, offset payload_code_ends
        sub     rcx, rbx
        shr     cx, 1                                   ;  Calculate payload body size in words 

        mov     rsi, rbx                                ;  source = start of the encrypted code
        mov     rdi, rsi                                ;  dest = same as source
        mov     rbx, offset long_key                    ;  bx = key indexing register
        mov     rdx, long_key_ends - long_key           ;  Length of key (even sized key)

    long_key_cipher_loop_begin:
        lodsw                                           ;  MOV's the word from [si] to ax, and increases si by 2
        xor     rax, [rbx]                              ;  The actual cryption
        add     rbx, 2                                  ;  Moves the key register to the next word in the key
        cmp     bx, dx                                  ;  Compare index to key length
        jb      long_key_cipher_loop_next               ;  Skip next instruction if not yet reached

    long_key_cipher_loop_next:
        stosw                                           ;  MOV's word from ax to [di], and increases di by 2
        loop    long_key_cipher_loop_begin              ;  DEC'c cx, and jumpts to start_loop if CX > 0

    long_key_cipher_loop_end:
        ret    

    long_key:
        dq      0FEEDDEADBEEFh
    long_key_ends:

;   Transposition (Order) Encryption [6] (TODO: Make this work)
;
transposition_cipher:
    transposition_cipher_setup:
        mov     rcx, (offset payload_code_ends - offset payload_code) / 2
        mov     rsi, offset payload_code                ;  source = start of encrypted code
        mov     rdi, rsi                                ;  dest = same as source

    transposition_cipher_loop_start:
        lodsw                                           ;  Load first word from source
        mov     bx, ax                                  ;  Stores first word in bx
        lodsw                                           ;  Loads second word from source
        stosw                                           ;  Puts the second word into the first word's place in dest
        mov     ax, bx                                  ;  Restores first word from bx to ax
        stosw                                           ;  Puts first word in second word's place in dset
        loop transposition_cipher_loop_start            ;  Decrements cx and jumps to loop head if cx > 0

    transposition_cipher_loop_end:
        ret

;  Get random seed
;
get_rand_seed:
    mov     rax, 0FEEDDEADBEEFh                         ;  Just return a test seed for now
    ret


; qword get_lehmer64(ecx=lehmer_state);
;
; __uint128_t g_lehmer64_state;
; 
; uint64_t lehmer64() {
;   g_lehmer64_state *= 
;
;   return g_lehmer64_state >> 64;
; }
;
get_rand:
    mov     rax, rcx
    mov     rcx, 0da942042e4dd58b5h
    imul    rax, rcx
    shr     rax, 64
    ret

; Get a register index from a random uint64
;
get_reg_from_rand:
    shr     rax, 59
    xor     rdx, rdx
    mov     rcx, 10
    div     rcx
    mov     rax, rcx
    ret

; Get a random number from a range
; uint64 RandFromRange(rcx=max)
rand_from_range:
    push    rcx                                        ;  Store the max value
    call    get_rand_seed                              ;  Get a uint64 seed
    mov     rcx, rax                                   ;  Move the seed to param_1
    call    get_rand                                   ;  Get a random uint64
    xor     rcx, rcx                                   ;  Create a counter
_rand_range_loop:
    shr     rcx, 1                                     ;  Remove one bit place
    inc     rcx                                        ;  Increment the counter
    test    rcx, rcx                                   ;  Did that shift zero it?
    jnz     _rand_range_loop                           ;  No, there's more data. 
    mov     r8, 64                                     ;  Max number of places
    sub     r8, rcx                                    ;  Subtract our number of places
    shr     rax, r8                                    ;  Shift the random down into our range
    pop     rcx                                        ;  Get the max value back
    xor     rdx, rdx                                   ;  Clear for division
    div     rcx                                        ;  Get the random's representation within the max value by modulo
    mov     rax, rcx                                   ;  
    ret                                                ;

;-----------------------------------------------------------------------------
;  Payload Code
;-----------------------------------------------------------------------------

payload_code:
    mov     rax, 1
    ret

payload_code_ends:

TEXT$00 ENDS

END