There was a time when antivirus evasion was easy. There was even a time, around 2015/2016, when it was trivial, and several open-source “silver bullets” existed that could evade defenses almost at will. From reflectively embedding payloads in memory, to shellcode packers, to PE encryption wrappers, the means of achieving stealth were as numerous as they were accessible.

In my experience, however, not only has this not been the case for some years, the situation is also worsening: evasion techniques are scarcer, and they are technology-dependent. Tooling used to be a cat-and-mouse race where the mouse often had the upper hand, but now the tables have turned. In the rare cases where a somewhat universal evasion technique is found, it usually becomes obsolete within months. This creates a vicious circle: attackers are less likely to share their tradecraft so as not to lose months of work, which means defenders have fewer techniques to optimize against, which means techniques are made obsolete even faster, which means attackers are even more likely to keep their tradecraft to themselves, and so on.

It has come to the point where, I feel, as an attacker, you either develop your own tooling (and keep it to yourself), or you use existing tools and commit to the painful, time-consuming task of customizing them until their signatures and event traces are sufficiently distinct (a process you must repeat for every single one of your tools). Either way, the task is complex. The time you must sink into developing your own tools is significant, and I believe this problem is shared by most pentesters and red teams worldwide.

It’s with all that in mind that I stumbled across a blog post from Fox-IT a year ago, called “Red Teaming in the age of EDR: Evasion of Endpoint Detection Through Malware Virtualisation”, written by Boudewijn Meijer and Rick Veldhoven.

In this article, I’ll give a quick breakdown of the current state of detection mechanisms as I understand them, how this approach can help us bypass them, and a glimpse into my own implementation.

Evolution of antivirus

Historically, antivirus products were mostly glorified pattern-searching engines. Given enough bytes in common with a previously discovered virus, a file was deemed malicious. Attackers abused two characteristics to evade those kinds of engines.

First, this byte-sequence comparison meant that, if I managed to produce a PE with the same functionality but different byte sequences, the antivirus wouldn’t catch the payload, even though the end result was exactly the same. This was done in several ways. Manually editing the source code was, of course, one of them. But the easy way out was to modify the whole payload at once through encoding (e.g., shikata_ga_nai), encryption (e.g., Veil-Evasion), or polymorphism (e.g., well, also shikata_ga_nai, but for its decoding stub).

Secondly, this file-centered paradigm meant that, if we somehow managed to execute our payload without it being an actual file, the antivirus would be completely blind. This was mostly done by creating small launchers that fetched the actual payload through an HTTP request (or any other channel) and launched it reflectively, meaning the bulk of what got executed never touched the disk to begin with. This discovery, tremendously popularized by PowerSploit and Empire, meant that a complete antivirus bypass was as easy as writing a single PowerShell line. This was the golden age of antivirus evasion.

However, as time went by, techniques to catch both approaches were either invented or improved with new optics to guide their judgment. We can classify those detection methods into two broad categories: pre-execution heuristics and runtime monitoring. Here’s a basic rundown of the main (though not all) detections that those categories encompass:

Pre-execution heuristics

Entropy analysis

Entropy is one of the most effective ways of catching most encryption-based wrappers. Indeed, truly random data should be somewhat rare inside executable files. Instructions are not random, strings are not random, and resource files of most kinds should not be random (compressed data like images or archives being the notable exception here). Thus, higher-than-average entropy is considered a decent indicator that a given file is malicious.
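As an illustration (not taken from the article), Shannon entropy over a byte buffer takes only a few lines; encrypted or packed data tends to sit near the 8-bits-per-byte maximum, while plain code and strings sit much lower:

```rust
/// Shannon entropy of a byte slice, in bits per byte (0.0..=8.0).
fn shannon_entropy(data: &[u8]) -> f64 {
    // Count occurrences of each byte value.
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let len = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

A constant buffer scores 0.0 bits per byte, while a buffer containing every byte value equally often scores the 8.0 maximum; scanners flag files whose sections drift toward the top of that range.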

Import Address Table Analysis

The IAT contains the external functions (resolved by the loader) the executable might use at run-time. As such, an executable referencing well-known functions often seen in malicious code (such as VirtualAlloc, VirtualProtect, etc.) is another mark that the PE might be malicious.

Pattern matching

Of course, the historical way of catching payloads still exists: section data such as strings or sequences of instructions found in previously identified malware are matched against analyzed files, in order to detect whether or not they are malicious.

Runtime monitoring

Userland API hooking

Userland API hooking intercepts calls to sensitive Windows APIs within user-mode processes. EDRs commonly monitor functions related to memory allocation, code injection, and process creation (e.g., VirtualAlloc, WriteProcessMemory, CreateRemoteThread). By capturing these calls and their arguments, the agent is able to flag suspicious behavior even after any obfuscation has been stripped away at run-time.

Event Tracing for Windows

ETW works on a provider-controller-consumer basis. Parts of the operating system, ranging from user-mode applications to the kernel, provide events. Providers can be enabled or disabled through controllers. Security products can start a session with the adequate providers through their controller and use their consumer agent to access events, then take action based on them.

Kernel level callback routines

EDRs often include a kernel-level agent (i.e., a driver) that registers callbacks on process and object notifications. Those notifications occur when key objects are created or modified. This allows kernel-level monitoring of process spawns and suspicious handle access, among other things. The monitoring of process spawning is what, I believe, triggered the switch in popular C2s from fork&run to in-agent execution.

Memory scanning

Agents can inspect process memory for indicators such as executable pages without file backing, regions marked RWX, byte patterns resembling known shellcode, etc. Memory scanning is often triggered when a suspicious event is identified through another sensor.

It’s also important to note that the runtime sensors collect events, but unless one specific event is absolutely known to be malicious, events are often correlated with each other to classify a process as benign or not. This correlation can be done either through human-written rules or through ML heuristics. Also, many of the sensors we just described only raise the payload’s “suspicion” score, and only when this score exceeds a particular threshold is the payload actually deemed malicious.
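To make the threshold idea concrete, here is a purely hypothetical scoring sketch (the event names and weights are invented for illustration, not taken from any real product): each sensor event adds to a suspicion score, and only crossing a threshold flips the verdict.

```rust
#[derive(Clone, Copy)]
enum SensorEvent {
    UnbackedExecutablePage, // memory scanner finding
    RwxAllocation,          // hooked allocation API
    RemoteThreadCreation,   // kernel callback notification
    BenignFileIo,           // ordinary activity, contributes nothing
}

/// Hypothetical per-event weights; real products tune these constantly.
fn weight(e: SensorEvent) -> u32 {
    match e {
        SensorEvent::UnbackedExecutablePage => 40,
        SensorEvent::RwxAllocation => 30,
        SensorEvent::RemoteThreadCreation => 50,
        SensorEvent::BenignFileIo => 0,
    }
}

/// The verdict only flips once the accumulated score crosses the threshold.
fn is_malicious(events: &[SensorEvent], threshold: u32) -> bool {
    events.iter().copied().map(weight).sum::<u32>() >= threshold
}
```

Under this toy model, a lone RWX allocation (30) stays under a threshold of 100, while the same allocation combined with an unbacked executable page and a remote thread crosses it; this is exactly the correlation that the interleaving trick described later tries to disrupt.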

While, as I said, the given list of detection mechanisms is not exhaustive, it provides a basic checklist of what we want our evasion tools to bypass.

Rundown of the approach

Before describing the approach itself, let’s draw some parallels with known systems that work in analogous ways.

In Java or .NET, for example, source code is compiled into an intermediate representation (Java bytecode for Java, CIL for .NET), which is then executed in a runtime environment (the Java Virtual Machine for Java, the Common Language Runtime for .NET). This managed runtime has multiple responsibilities, the main one being turning the intermediate language into native code, but also, for example, managing memory through garbage collection, ensuring thread synchronization, etc.

In the approach Boudewijn Meijer and Rick Veldhoven described, instead of turning source code into intermediate representation, an executable is transpiled into an intermediate representation, which is then executed in a runtime environment. 

The transpiler’s responsibilities are:

  1. Transform assembly instructions into encrypted instructions
  2. Do so in a manner that allows the managed runtime to decrypt instructions one at a time
  3. Do so in a manner that does not raise entropy

While Boudewijn Meijer and Rick Veldhoven do not do this, in my project I also transpile other sections of the executable, mainly .rdata (containing read-only data, i.e., strings) and .data (mainly containing global variables), into encrypted blobs.

The intermediate file is then executed in the runtime environment, which has a few responsibilities:

  1. Executing the intermediate file
  2. Ensuring that cleartext incriminating data remains in memory for the shortest possible time
  3. Confusing event driven analysis
Phantomerie execution flow

Pre-execution evasion

This way of executing intrinsically displays several interesting properties.

Because all instructions and data sections are stored in an encrypted format that does not raise entropy, pexes contain no recognizable instruction sequences or strings. This makes them immune to classical byte-pattern-matching signatures. Other forms of static analysis (e.g., ML classifiers, anomalous section structure) could still apply, but raw byte matching is no longer effective.

Another way of catching encrypted instructions is to wait for them to be stored decrypted in memory. This method is also ineffective: instructions are decrypted, executed, and re-encrypted one at a time. Unlike standard shellcode execution, there is never a window during which a large decrypted payload exists in memory. Detection would require flagging a single instruction at the precise moment it is decrypted, or relying on a sensor reached post-decryption (e.g., going through a hooked VirtualAlloc function; we’ll explore later what evasion tools this approach enables against those sensors).

Just like that, we have pretty good defenses against static pattern matching, memory analysis, and entropy analysis, at least from the pexe’s perspective. The runtime environment could also be detected, which we will address later.

Currently, the runtime environment has no linker or loader. Instead, functions are imported reflectively within the payload. As a result, the runtime environment does not import suspicious functions directly, rendering Import Address Table analysis ineffective. However, I will mention that PEB walking, which you have to do in order to reflectively resolve functions, can itself be seen as suspicious by EDRs. Pragmatically, I’ve never seen a payload detected because of this alone, but it’s something to keep in mind.

What about event analysis based on what sensors detect?

While this approach does not directly make your suspicious events disappear, it does provide a means to complicate heuristic and event-tree-based detection. The architecture of this technique allows us to run multiple runtime engines from the same thread: one executing our malicious payload, the other pouring out legitimate events at the same time, or in between suspicious calls.

From the outside, all events appear tied to a single thread. An EDR attempting to reconstruct a timeline will see legitimate API usage surrounding or overlapping with malicious activity, making it harder to separate intent from noise. In practice, this disrupts correlation: the same thread may appear to allocate memory, free it, perform harmless file I/O, then suddenly inject code, but without a clear causal chain, thus obscuring the malicious pattern.

In short, this design does not remove visibility, but it corrupts context. Security tools still see events, yet the interleaving of benign and malicious actions makes it harder to assemble a conclusive picture, which can foil some event-based detections.
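The interleaving idea can be sketched with a toy model: two “runtimes” advancing in lock-step on one thread, so any per-thread event timeline alternates between the two streams. `ToyRuntime` and its string steps are illustrative stand-ins, not the real engine:

```rust
/// Toy stand-in for a runtime engine: a queue of named steps
/// executed one at a time.
struct ToyRuntime {
    steps: Vec<&'static str>,
    pc: usize,
}

impl ToyRuntime {
    /// Execute one step, appending its name to the shared event log.
    /// Returns false once this runtime has no steps left.
    fn step(&mut self, log: &mut Vec<&'static str>) -> bool {
        if self.pc < self.steps.len() {
            log.push(self.steps[self.pc]);
            self.pc += 1;
            true
        } else {
            false
        }
    }
}

/// Round-robin two runtimes on the same (conceptual) thread and
/// return the event timeline an outside observer would see.
fn interleave(a: &mut ToyRuntime, b: &mut ToyRuntime) -> Vec<&'static str> {
    let mut log = Vec::new();
    loop {
        let more_a = a.step(&mut log);
        let more_b = b.step(&mut log);
        if !more_a && !more_b {
            break;
        }
    }
    log
}
```

With a “payload” runtime emitting `alloc`/`write` and a decoy runtime emitting `open_file`/`read_file`, the resulting per-thread log alternates between the two, which is exactly the context corruption described above.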

Executing multiple runtimes inside the same thread

It will not, however, make unitary incriminating events appear legitimate. Drowning the creation of an LSASS handle in legitimate events won’t help, since that handle is usually enough proof of malicious activity on its own. Interleaving only helps against detections that depend on chaining multiple events together; uniquely incriminating events, such as sensitive handle creation, remain incriminating regardless of surrounding noise.

Transpilation process

The transpilation itself is actually quite straightforward. A PE is made of several sections, the one we’re most interested in being the .text section, which contains the encoded assembly instructions executed by the CPU. Going from an assembly instruction to a Phantomerie instruction (pinstruction for short) is the main objective of transpilation. This transformation operates as follows:

First, the instruction is decoded through the iced Rust library, then re-encoded in a specific format.

pub struct Instruction {
    pub opcode: u8,
    pub left_operand_type: u8,
    pub right_operand_type: u8,
    // left_operand and right_operand can be:
    // RegisterOperand, MemoryOperand, ImmediateOperand, or NoneOperand
    pub left_operand: u64,
    pub right_operand: u64,
}

The opcode, like in assembly, represents the operation specified by an instruction. In our runtime environment, which reimplements basic assembly operations, the opcode’s byte value is mapped to the corresponding instruction that should be executed.

e.g., if transpilation created a pinstruction with opcode 0x0D, which maps in our implementation to POP, the following code will get executed by the runtime:

opcode maps the pinstruction to the actual code
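Since the handler itself appears only as an image in the original post, here is a hedged, self-contained sketch of what a POP handler could look like. The real implementation operates on the virtual CPU described later; `MiniCpu` and its fields are illustrative only:

```rust
/// Illustrative miniature CPU: one register and a byte-addressed stack.
struct MiniCpu {
    rax: u64,
    rsp: usize,
    stack: Vec<u8>,
}

impl MiniCpu {
    /// POP: read 8 bytes at the virtual RSP into the destination register,
    /// then move the virtual RSP up by 8.
    fn pop_into_rax(&mut self) {
        let bytes: [u8; 8] = self.stack[self.rsp..self.rsp + 8]
            .try_into()
            .expect("stack underflow");
        self.rax = u64::from_le_bytes(bytes);
        self.rsp += 8;
    }
}
```

The point is that “executing” a pinstruction is just a plain Rust function mutating plain Rust data; no real CPU state is ever touched.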

Most assembly operations come with one operand, two operands, or none. These are encoded in the left_operand and right_operand fields. This means that, for instructions with fewer than two operands, all fields but opcode are actually padding. As in assembly, operands can be either immediate values or indirect values, i.e., values pulled from a register or from memory. That information (which type of operand we’re dealing with) is encoded in the left_operand_type and right_operand_type fields.

In our case, the registers and memory are virtual and maintained by the runtime environment. We’ll delve into how virtual memory and virtual registers are implemented by the runtime later. For now, all there is to know is that operands are encoded in an 8-byte structure that closely follows the way they are encoded in assembly.

pub struct RegisterOperand {
    /// The index of the register. Only the 64 bits registers are indexable (RAX, RBX, etc.)
    pub name: Registers,
    /// Specifies the chunk of the register to start at (e.g., low byte, high byte, word).
    /// In practice, this is how we encode registers that do not start at the lower bytes (e.g., AH).
    pub chunk: u8,
    /// The size of the operand in bits (e.g., 8, 16, 32, or 64).
    /// For example, if the name is RAX, then a size of 32 will actually encode EAX, while a size of 16 will encode AX, etc.
    pub size: u16,
    /// Reserved space to align the struct to 64 bits.
    pub padding: u32,
}
pub struct MemoryOperand {
    /// The effective size pointed by the operand in bits (e.g., 8, 16, 32, or 64).
    pub size: u8,
    /// The index of the base register. This is the starting address for the calculation.
    pub base: u8,
    /// The index of the register used for scaled indexing.
    pub index: u8,
    /// A multiplier for the index register (valid values are 1, 2, 4, or 8).
    pub scale: u8,
    /// A constant value added to the calculated address.
    pub displacement: i32,
}
pub struct ImmediateOperand {
    // As described in the original article, using an union facilitates the translation to different register sizes.
    pub value: Value,
}
pub union Value {
    pub u8: u8,
    pub u16: u16,
    pub u32: u32,
    pub u64: u64,
}
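As a quick sanity check on that layout (assuming `#[repr(C)]` and a `#[repr(u8)]` register enum, attributes the snippets above don’t show), EAX encodes as RAX with size 32, and the operand packs into exactly 8 bytes:

```rust
// Trimmed-down register enum; the real one covers all GPRs.
#[repr(u8)]
#[derive(Clone, Copy, PartialEq, Debug)]
enum Registers {
    None,
    Rax,
    Rbx,
}

// Mirror of the RegisterOperand layout from the article.
#[repr(C)]
struct RegisterOperand {
    name: Registers,
    chunk: u8,
    size: u16,
    padding: u32,
}

/// EAX is the low 32 bits of RAX: same register index, size 32, chunk 0.
fn encode_eax() -> RegisterOperand {
    RegisterOperand { name: Registers::Rax, chunk: 0, size: 32, padding: 0 }
}
```

The 1 + 1 + 2 + 4 byte fields line up with no extra padding, so every operand variant occupies the same 8 bytes inside the pinstruction.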

This data format, which is the exact format from the original article, allows us to represent most assembly instructions, but not all. For example, one form of the IMUL operation uses three operands, and some operations work differently depending on whether a prefix is present, such as the REP prefix, which indicates that the current instruction has to be repeated until the counter register reaches 0.

For those instructions, we added a “reserved” field, so additional data can be encoded when the standard fields are not sufficient.

pub struct Instruction {
    pub opcode: u8,
    pub left_operand_type: u8,
    pub right_operand_type: u8,
    pub left_operand: u64,
    pub right_operand: u64,
    // for now, only used for IMUL and REP prefix
    pub reserved: u32
}
Instruction vs pinstruction

We intentionally use a degenerate keystream (even-multiplier LCG with low-byte output) so that XOR obfuscation does not materially increase sliding-window entropy. Our goal is obfuscation while preserving the statistical profile of .text/.rdata, not confidentiality.

use std::num::Wrapping;

use crate::{ encryption_key::{ ENCRYPTION_SEED, LCG_CONSTANT_1, LCG_CONSTANT_2 }, Instruction };

pub struct SimpleStreamCipher {
    state: Wrapping<u32>,
}

impl SimpleStreamCipher {
    pub fn new(seed: u32) -> Self {
        SimpleStreamCipher {
            state: Wrapping(seed),
        }
    }

    pub fn next(&mut self) -> u8 {
        // Simple Linear Congruential Generator (LCG) - not cryptographically secure but we don't care
        self.state = self.state * Wrapping(LCG_CONSTANT_1) + Wrapping(LCG_CONSTANT_2);
        (self.state.0 & 0xff) as u8
    }

    // Encrypt/Decrypt data by XORing with generated keystream
    pub fn apply_keystream(&mut self, data: &mut [u8]) {
        for byte in data.iter_mut() {
            *byte ^= self.next();
        }
    }
}

pub fn encrypt_decrypt_instruction(instr: &mut Instruction) {
    let instr_bytes = unsafe {
        std::slice::from_raw_parts_mut(
            instr as *mut Instruction as *mut u8,
            std::mem::size_of::<Instruction>()
        )
    };

    let mut cipher = SimpleStreamCipher::new(ENCRYPTION_SEED);
    cipher.apply_keystream(instr_bytes);
}
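Since the scheme is a plain XOR stream, applying the keystream twice restores the original bytes. Here is a self-contained roundtrip check, with hypothetical constants standing in for the project’s generated key material:

```rust
use std::num::Wrapping;

// Hypothetical constants; the real project generates these per build.
const ENCRYPTION_SEED: u32 = 0xDEAD_BEEF;
const LCG_CONSTANT_1: u32 = 0x0001_9660; // even multiplier, as the degenerate scheme requires
const LCG_CONSTANT_2: u32 = 0x3C6E_F35F;

struct SimpleStreamCipher {
    state: Wrapping<u32>,
}

impl SimpleStreamCipher {
    fn new(seed: u32) -> Self {
        SimpleStreamCipher { state: Wrapping(seed) }
    }

    fn next(&mut self) -> u8 {
        self.state = self.state * Wrapping(LCG_CONSTANT_1) + Wrapping(LCG_CONSTANT_2);
        (self.state.0 & 0xff) as u8
    }

    fn apply_keystream(&mut self, data: &mut [u8]) {
        for byte in data.iter_mut() {
            *byte ^= self.next();
        }
    }
}

/// Encrypting then "encrypting" again with a freshly seeded cipher
/// decrypts, because XOR is its own inverse.
fn roundtrip(data: &mut [u8]) {
    SimpleStreamCipher::new(ENCRYPTION_SEED).apply_keystream(data);
    SimpleStreamCipher::new(ENCRYPTION_SEED).apply_keystream(data);
}
```

This is why the runtime can call the same `encrypt_decrypt_instruction` function both before and after executing a pinstruction: there is no separate decryption path to maintain.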

Instructions from PEs are not the only thing that gets encoded. Some headers are also stored in the resulting pexe, so the runtime environment knows how to deserialize and/or execute the file.

pub struct PhantomerieHeaders {
    pub entry_point: u64,
    pub phantomerie_headers_size: u8,
    pub instructions_section_size: u64,
    pub instructions_number: u32,
    pub arguments_section_size: u32,
    pub arguments_number: u16,
    pub rdata_size: u32,
}

Finally, we decided to also put other sections (for now, .rdata and .data) inside the transpiled pexe. This is the first real departure from the original implementation described by Boudewijn Meijer and Rick Veldhoven. The goal was to support more than PIC or stringless PEs, even though I don’t know how useful this really is in practice.

The sections are imported as-is from the PE; the only difference is that they are encrypted using the same algorithm as the one used for instructions.

Exe and pexe matching
pub struct Phantomexe {
    pub headers: PhantomerieHeaders,
    pub arguments: Vec<Argument>,
    pub instructions: Vec<Instruction>,
    pub rdata: Section,
    pub data: Section,
}

Transpilation quirks: offset translation

As described, one of the main differences between an instruction and a pinstruction is their size: native instructions are variable-length, while each pinstruction is exactly 23 bytes.
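The 23-byte figure follows directly from the extended struct’s layout, assuming a packed representation (the snippets above don’t show the attribute, so `#[repr(C, packed)]` is an assumption here):

```rust
// Mirror of the extended pinstruction struct, packed so no
// alignment padding is inserted between fields.
#[repr(C, packed)]
#[allow(dead_code)]
struct Instruction {
    opcode: u8,
    left_operand_type: u8,
    right_operand_type: u8,
    left_operand: u64,
    right_operand: u64,
    reserved: u32,
}
// 1 + 1 + 1 + 8 + 8 + 4 = 23 bytes per pinstruction.
```

A fixed instruction size is what makes the “RIP as instruction index” model described next possible in the first place.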

This means that, with the transpilation as described so far, when our runtime executes pinstructions, all offsets that were originally expressed in bytes will be wrong.

For example, in the original PE, a JMP +0x230 would jump 0x230 bytes forward relative to RIP.

But in our runtime, +0x230 from RIP does not point to the same place, since the size of instructions has changed. Therefore, during transpilation, we must translate every offset.

In practice, contrary to how the actual RIP functions, our virtual RIP value will not represent a byte offset from the base of the image, but an instruction index. Similarly, offsets used by operations such as JMP or CALL must no longer be expressed in bytes, but in number of instructions.

Consequently, at transpilation, we must list every instruction that uses a RIP-relative offset, determine which instruction the offset actually refers to, and then compute the instruction delta. We then replace the original offset with this adjusted value.

Thus, if JMP +0x230 in the native binary lands 10 instructions ahead, the transpiled pinstruction becomes JMP +10.
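That translation pass can be sketched as follows, assuming we keep each native instruction’s byte offset and length while decoding (the `Decoded` struct and its fields are illustrative, not the project’s actual types):

```rust
use std::collections::HashMap;

/// Illustrative decoded instruction: its byte offset in the native
/// .text section, its encoded length, and an optional RIP-relative
/// branch displacement (measured, as on x86, from the *next* instruction).
struct Decoded {
    byte_offset: u64,
    len: u64,
    rel_target: Option<i64>,
}

/// Replace byte displacements with instruction-index deltas.
fn translate_offsets(instrs: &[Decoded]) -> Vec<Option<i64>> {
    // Map: byte offset of each instruction -> its index in the list.
    let by_offset: HashMap<u64, i64> = instrs
        .iter()
        .enumerate()
        .map(|(i, d)| (d.byte_offset, i as i64))
        .collect();

    instrs
        .iter()
        .enumerate()
        .map(|(i, d)| {
            d.rel_target.map(|disp| {
                // Resolve the byte address the branch lands on, then
                // look up which instruction starts there.
                let target = (d.byte_offset + d.len) as i64 + disp;
                let target_index = by_offset[&(target as u64)];
                target_index - i as i64
            })
        })
        .collect()
}
```

For three one-byte instructions at offsets 0, 1, and 2, a branch at offset 0 with displacement +1 resolves to byte 2, i.e., an instruction-index delta of +2; a real pass would also have to reject branches landing mid-instruction.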

Runtime environment

After transpiling an exe or a dll into a pexe, the runtime environment is what executes it on the targeted system.

First, it deserializes the pexe file back into a Phantomexe struct.

It then initializes its virtual CPU, which contains virtual registers, flags, and stack memory. These are what pinstructions modify. For example, the pinstruction MOV RAX 0 will set the virtual RAX to 0. MOV [RBP+0x10] 0x1 will write 0x1 to the virtual stack, at the virtual address contained in the virtual RBP register plus 16 bytes. Finally, an instruction such as TEST EAX EAX will update the virtual flags of the CPU.

To summarize, the pexe is made of pinstructions which, when executed by the runtime environment, update the components of our virtual CPU. All of this works very much like exes, assembly, and the actual CPU do, albeit in a simplified manner.

The structures used by our implementation are as follows:

pub struct RuntimeEnvironment {
    pub cpu: CPU,
    pub phantomexe: Phantomexe,
}
pub struct CPU {
    /// Number of instructions executed since the runtime initialization
    pub number_of_instructions_executed: usize,
    /// Stores the state of the CPU flags (e.g., zero, carry, overflow).
    pub flags: Flags,
    /// Contains general-purpose registers.
    pub registers: [Register; 17],
    /// Manages the stack
    pub stack_manager: StackManager,
    /// used to access and write sections from the pexe
    pub sections_manager: SectionsManager,
}
pub struct Register {
    pub register_name: Registers,
    pub value: Value,
}
pub enum Registers {
    NONE,
    RAX,
    RBX,
    ...
}
pub struct Flags {
    /// Carry Flag (CF): Indicates an overflow for unsigned arithmetic operations.
    /// The carry flag is set if the addition of two numbers causes a carry
    /// out of the most significant (leftmost) bits added.
    /// 1111 + 0001 = 0000 (carry flag is turned on)
    pub cf: bool,

    /// Zero Flag (ZF): Indicates if the result of an operation is zero.
    pub zf: bool,

    /// Sign Flag (SF): Indicates the sign of the result (0 for positive, 1 for negative).
    pub sf: bool,
    
    ...
}    


The stack manager handles initialization, access, and writes to the stack.

pub struct StackManager {
    pub stack: Vec<u8>,
    pub maximum_stack_size: usize,
}

impl StackManager {
    pub fn new() -> Self {
        // standard stack size seems to be 1 MB.
        // https://learn.microsoft.com/en-us/windows/win32/procthread/thread-stack-size
        let maximum_stack_size = 1 * 1024 * 1024;
        StackManager {
            stack: vec![0; maximum_stack_size],
            maximum_stack_size,
        }
    }
}

The sections manager handles initialization, access, and writes to sections.

pub struct SectionsManager {
    pub idata: Section,
    pub rdata: Section,
    pub data: Section,
}

pub struct Section {
    /// size taken on disk
    pub disk_size: u32,
    /// size in memory
    pub memory_size: u32,
    pub is_encrypted: bool,
    pub raw: Vec<u8>,
}

Since there are a lot of similarities between these two components, they both implement the same Rust trait, called EncryptedReaderWriter.

pub trait EncryptedReaderWriter {
    fn write_u8(&mut self, address: usize, value: u8) {
        let bytes = [value];
        self.write_bytes(address, &bytes);
    }

    fn write_u16(&mut self, address: usize, value: u16) {
        let bytes = value.to_le_bytes();
        self.write_bytes(address, &bytes);
    }

    fn write_u32(&mut self, address: usize, value: u32) {
        let bytes = value.to_le_bytes();
        self.write_bytes(address, &bytes);
    }

    fn write_u64(&mut self, address: usize, value: u64) {
        let bytes = value.to_le_bytes();
        self.write_bytes(address, &bytes);
    }

    fn read_u8(&self, address: usize) -> u8 {
        self.read_bytes(address, 1)[0]
    }

    fn read_u16(&self, address: usize) -> u16 {
        let bytes = self.read_bytes(address, 2);
        u16::from_le_bytes([bytes[0], bytes[1]])
    }

    fn read_u32(&self, address: usize) -> u32 {
        let bytes = self.read_bytes(address, 4);
        u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
    }

    fn read_u64(&self, address: usize) -> u64 {
        let bytes = self.read_bytes(address, 8);
        u64::from_le_bytes([
            bytes[0],
            bytes[1],
            bytes[2],
            bytes[3],
            bytes[4],
            bytes[5],
            bytes[6],
            bytes[7],
        ])
    }

    /// Reads bytes from the specified section/stack address and length, returning a Value.
    ///
    /// # Arguments
    ///
    /// * `address` - The address to read from.
    /// * `length` - The number of bits to read (must be 8, 16, 32, or 64).
    ///
    /// # Returns
    /// A `Value` containing the read data.
    fn read_bytes_as_value(&self, address: usize, length: usize) -> Value {
        match length {
            8 => Value::new_u8(self.read_u8(address)),
            16 => Value::new_u16(self.read_u16(address)),
            32 => Value::new_u32(self.read_u32(address)),
            64 => Value::new_u64(self.read_u64(address)),
            // the only case where size "should" be 0 is for memory operands in LEA instruction.
            // Those operands are not supposed to actually be accessed, only the effective address is supposed to be calculated
            // Thus their size is 0.
            // Here, we treat it as a 64 bit value for debugging purpose.
            _ => Value::new_u64(self.read_u64(address)),
        }
    }

    /// Reads bytes from the specified section/stack address and size, returning a Value.
    ///
    /// # Arguments
    ///
    /// * `address` - The address to read from.
    /// * `size` - The number of bits to read (must be 8, 16, 32, or 64).
    ///
    /// # Returns
    /// A `Value` containing the read data.
    fn read_encrypted_bytes_as_value(&mut self, address: u64, size: u8) -> Value {
        self.decrypt();
        let unencrypted_value = self.read_bytes_as_value(address as usize, size as usize);
        self.encrypt();
        unencrypted_value
    }

    fn write_encrypted_value(&mut self, address: u64, value: Value, size: u8) {
        self.decrypt();
        match size {
            8 => self.write_u8(address as usize, unsafe { value.u8 }),
            16 => self.write_u16(address as usize, unsafe { value.u16 }),
            32 => self.write_u32(address as usize, unsafe { value.u32 }),
            64 => self.write_u64(address as usize, unsafe { value.u64 }),
            _ => self.write_u64(address as usize, unsafe { value.u64 }),
        }
        self.encrypt();
    }

    fn with_decryption<F, R>(&mut self, f: F) -> R where F: FnOnce(&mut Self) -> R {
        self.decrypt();
        let result = f(self);
        self.encrypt();
        result
    }

    fn decrypt(&mut self);

    fn encrypt(&mut self);

    fn write_bytes(&mut self, address: usize, data: &[u8]);

    fn read_bytes(&self, address: usize, length: usize) -> Vec<u8>;
}

By doing so, we only have to implement decrypt, encrypt, write_bytes, and read_bytes for the Section and StackManager structures. Here, for example, is the Section implementation:

impl EncryptedReaderWriter for Section {
    fn read_bytes(&self, address: usize, length: usize) -> Vec<u8> {
        self.raw[address..address + length].to_vec()
    }

    fn write_bytes(&mut self, address: usize, data: &[u8]) {
        self.raw.splice(address..address + data.len(), data.iter().cloned());
    }

    fn encrypt(&mut self) {
        if self.is_encrypted {
            return;
        }
        encrypt_decrypt_raw(&mut self.raw);
        self.is_encrypted = true;
    }

    fn decrypt(&mut self) {
        if !self.is_encrypted {
            return;
        }
        encrypt_decrypt_raw(&mut self.raw);
        self.is_encrypted = false;
    }
}

Because all code and data exist only inside our virtual sections/stack, decryption/re-encryption happens on ordinary heap buffers. This does not trigger the same telemetry as decrypting real executable pages in place (which looks like self-modifying code). To outside sensors, it’s just variable churn inside a process.

We talked about the effect of executing a pinstruction on our virtual CPU, but what does it mean for our runtime environment to “execute” a pinstruction?

First, execution follows a principle analogous to the principle of least privilege, which I call the least exposure principle. This principle stipulates that, for each instruction, the runtime should only decrypt the minimum amount of data necessary to execute the instruction.

Thus, for a given instruction, the instruction itself is decrypted. If the instruction needs to access the stack, the current stack frame is decrypted. If the instruction accesses a section, the section is decrypted. It would be ideal to decrypt only what is accessed within the section, but since we don’t have type knowledge post-compilation, this is either complex to implement or outright impossible.

Once the runtime is done executing the instruction, everything is encrypted again: the instruction and, if needed, the sections or the current stack frame. Here’s a visual illustration of what would be encrypted and decrypted for three instructions, one of them accessing .rdata:

rdata is decrypted only when the instruction that needs it is executed

As an example, let’s follow the execution of a simple pinstruction, such as MOV RAX [RBP+0x10]. After the runtime deserializes the pexe, it dispatches each instruction in turn. At some point, it will reach the aforementioned pinstruction.

    fn dispatch_instruction(&mut self) -> Option<()> {
        let original_rip = self.get_rip() as usize;
        encrypt_decrypt_instruction(&mut self.context.instructions[original_rip]);
        let execution_result = self.cpu.execute_instruction(&mut self.context.instructions[original_rip]);
        // We'll explain what this does later
        self.cpu.update_pointers(&mut self.context.instructions[original_rip]);
        encrypt_decrypt_instruction(&mut self.context.instructions[original_rip]);

        match execution_result {
            // if we executed the last RET we notify execution has stopped
            Some(_) => {
                self.next_instruction(original_rip as u64);
                Some(())
            }
            None => {
                return None;
            }
        }
    }

When execute_instruction() is called for the pinstruction MOV RAX [RBP+0x10], the opcode is matched to the correct virtual operation.

    pub fn execute_instruction(&mut self, instruction: &Instruction) -> Option<()> {
        match instruction.opcode {
            OperationCode::ADD => self.add(instruction),
            OperationCode::AND => self.and(instruction),
            OperationCode::SUB => self.sub(instruction),
            ...
            OperationCode::MOV => self.mov(instruction),
            ...
            OperationCode::RET => {
                if self.ret().is_none() {
                    return None;
                }
            }
            _ => panic!("Opcode not supported : 0x{:X}", instruction.opcode),
        }

        return Some(());
    }

The runtime keeps every instruction encrypted but the current one