The Lost Art of Assembly Programming: Self-modifying Code

2024-04-30

On von Neumann architectures, programs are stored as data, so a program can modify its own code while it is running.

Self-modifying code is mostly a matter of machine code, even if some high-level languages like LISP allow programs to manipulate their own code. So this practice was more common when developers wrote programs directly in assembly.

Self-modifying code makes decompilation impossible in the general case, unless you can solve the halting problem. Fortunately, with high-level languages and modern architectures, it has become a forgotten technique that is almost impractical today. Decompilation is difficult enough already, and as far as I know, existing decompilers usually ignore modified code, except to detect self-deciphering code.

Self-modifying code is sometimes used to make reverse engineering more difficult. Just-in-time (JIT) compilation and other kinds of on-the-fly code generation (for graphics rendering, matrix multiplication, expression evaluation, ...) can be considered self-modifying code too. I'll ignore those cases and focus on the simple instruction changes that were common on old architectures, and I'll show how these simple yet common cases can be automatically decompiled.

Examples

Here are a few examples using the Z80 instruction set.

Storing a variable

If a variable is read as an immediate value instead of an indirect memory access, it saves a memory access:

4100 3A 00 50   ld a, ($5000)
...
4200 21 00 50   ld hl, $5000
4203 34         inc (hl)
...
5000 01

Just store the variable at 4101 instead of 5000 and use an immediate addressing mode.

4100 3E 01      ld a, 1
...
4200 21 01 41   ld hl, $4101
4203 34         inc (hl)

This works for a single read site only, but other parts of the program can still read the variable with ld a, ($4101).
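
To make the effect concrete, here is a minimal Python sketch of the patched version. It is only a toy model of memory, not a Z80 emulator: the helper names (run_ld_a_imm, run_ld_hl_inc) are mine, and it assumes the operand of ld hl, $4101 is encoded little-endian as 01 41.

```python
# Toy model of the 64 KB address space; only the instructions from the listing are modelled.
mem = bytearray(0x10000)
mem[0x4100:0x4102] = bytes([0x3E, 0x01])              # 4100: ld a, 1
mem[0x4200:0x4204] = bytes([0x21, 0x01, 0x41, 0x34])  # 4200: ld hl, $4101 / inc (hl)

def run_ld_a_imm(mem, pc):
    """Execute 'ld a, n' at pc and return the new value of A."""
    assert mem[pc] == 0x3E
    return mem[pc + 1]

def run_ld_hl_inc(mem, pc):
    """Execute 'ld hl, nn' followed by 'inc (hl)' at pc."""
    assert mem[pc] == 0x21 and mem[pc + 3] == 0x34
    hl = mem[pc + 1] | (mem[pc + 2] << 8)   # 16-bit operands are little-endian
    mem[hl] = (mem[hl] + 1) & 0xFF          # the byte at HL wraps around at 8 bits

a_before = run_ld_a_imm(mem, 0x4100)
run_ld_hl_inc(mem, 0x4200)                  # increments the operand byte at 4101
a_after = run_ld_a_imm(mem, 0x4100)
```

The second read returns a different value even though the instruction at 4100 is still "load an immediate": the variable now lives in its operand byte.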

Switch/case

This is a similar example, but with an indirect jump.

4100 C3 03 41   jp $4103 ; Initial state: mode 1
4103            ; The mode '1' case
...
4140            ld hl, $4200 ; now switch to mode 2
4143            ld ($4101), hl
4146            ret

4200            ; The mode '2' case
...
4240            ld hl, $4300 ; now switch to mode 3
4243            ld ($4101), hl
4246            ret

4300            ; The mode '3' case
...

This is a rough equivalent to a C switch:

int mode = 1;

switch (mode)
{
    case 1:
        ...
        mode = 2;
        break;
    case 2:
        ...
        mode = 3;
        break;
    case 3:
        ...
}

Not only do you save a variable, but you also save a lookup table. Instead of storing a mode and then looking up the corresponding label in a table, you just store the label in the jump instruction. The mode is the label. It works well when the value itself is not relevant and when there is only one switch statement for this enumeration.
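
The following Python sketch models the patched jump from the listing. It is an illustration under simplifying assumptions: the case bodies are reduced to the single store that patches the jump, and the names jump_target, patch_jump and handlers are hypothetical.

```python
mem = bytearray(0x10000)
mem[0x4100:0x4103] = bytes([0xC3, 0x03, 0x41])   # 4100: jp $4103 (initial state: mode 1)

def jump_target(mem, pc):
    """Decode the target of the 'jp nn' instruction stored at pc."""
    assert mem[pc] == 0xC3
    return mem[pc + 1] | (mem[pc + 2] << 8)

def patch_jump(mem, pc, target):
    """Model 'ld hl, target' followed by 'ld (pc+1), hl'."""
    mem[pc + 1] = target & 0xFF
    mem[pc + 2] = (target >> 8) & 0xFF

# One entry per case label; each body only performs the mode switch.
handlers = {
    0x4103: lambda m: patch_jump(m, 0x4100, 0x4200),   # mode 1 switches to mode 2
    0x4200: lambda m: patch_jump(m, 0x4100, 0x4300),   # mode 2 switches to mode 3
    0x4300: lambda m: None,                            # mode 3 stays in mode 3
}

visited = []
for _ in range(4):                       # call the routine at 4100 four times
    target = jump_target(mem, 0x4100)
    visited.append(target)
    handlers[target](mem)
```

There is no mode variable and no table lookup at run time: the handlers dictionary only exists so the sketch can dispatch, while on the Z80 the jump itself goes straight to the stored label.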

Double indirection

Another use is to do a double indirection that is not natively supported by the processor:

4000            ld hl, ($8000)
4003            ld ($4101), hl
...
4100            ld a, ($0) ; this is virtually a 'ld a, (($8000))'
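
As a rough Python model of these three instructions (the pointer value $9000 and the byte $2A are made up for the illustration, and read16/write16 are hypothetical helpers):

```python
mem = bytearray(0x10000)

def read16(mem, address):
    """Read a little-endian 16-bit word."""
    return mem[address] | (mem[address + 1] << 8)

def write16(mem, address, value):
    """Write a little-endian 16-bit word."""
    mem[address] = value & 0xFF
    mem[address + 1] = (value >> 8) & 0xFF

write16(mem, 0x8000, 0x9000)                     # pointer stored at $8000 (made-up value)
mem[0x9000] = 0x2A                               # byte the pointer refers to (made-up value)
mem[0x4100:0x4103] = bytes([0x3A, 0x00, 0x00])   # 4100: ld a, ($0)

hl = read16(mem, 0x8000)        # 4000: ld hl, ($8000)
write16(mem, 0x4101, hl)        # 4003: ld ($4101), hl  (patches the operand at 4101)
a = mem[read16(mem, 0x4101)]    # 4100: ld a, (nn) executed with the patched nn
```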

These examples are not just theoretical: they have been used in real programs, for good or bad reasons.

Why is it good?

Performance

The main reason to use self-modifying code is certainly performance. There are many cases where it is more efficient. Using an immediate addressing mode instead of an indirect addressing mode saves a memory access. It can also save registers, which are very limited on 8-bit processors.

Memory

On old machines with a few kilobytes of memory, each byte counted. Saving a few bytes with this technique could make the difference between a program that fits in memory and one that doesn't.

Bypassing CPU limitations

Some CPUs have no indirect jump instruction, so a self-modifying instruction is the only way to perform an indirect jump. See the blog post Subroutine calls in the ancient world by Raymond Chen.

Why is it bad?

With new generations of CPUs, self-modifying code became a bad practice for multiple reasons:

Cache: if an instruction is modified, the instruction cache must be invalidated, which hurts performance. Some CPUs, such as the Motorola 68020, did not invalidate the instruction cache automatically, so some programs written for the 68000 did not work properly on a 68020.
Prefetch: a modification of the next instruction may not be taken into account because the instruction has already been fetched.
Reentrancy: it is usually incompatible with reentrancy or recursion.
Read-only memory: it does not work with programs stored in read-only memory (ROM, or RAM write-protected by a Memory Management Unit).

Can it be decompiled?

Yes. In some limited cases at least, and it is surprisingly easy with a decompiler that can do multiple passes on the disassembly.

The easiest and most common case is where immediate values are modified (loading a constant into a register, adding a constant to a register, jumping to a fixed address, ...).

The principle is to translate an immediate access into an indirect access when a memory address is both identified as code and written by a machine instruction. So a global variable is created in the middle of the code like any other global variable.
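
A minimal sketch of that flagging step, in Python. The function name flag_patched_bytes and the two input lists are assumptions for the illustration: I assume an earlier pass has already produced the decoded instruction ranges and the memory stores whose target address is known.

```python
def flag_patched_bytes(code_ranges, writes):
    """Return the byte addresses that are both decoded as code and written to.

    code_ranges: iterable of (start, length) pairs, one per decoded instruction.
    writes: iterable of (address, size) pairs, one per store with a known target.
    """
    code_bytes = set()
    for start, length in code_ranges:
        code_bytes.update(range(start, start + length))
    flagged = set()
    for address, size in writes:
        for byte_address in range(address, address + size):
            if byte_address in code_bytes:
                flagged.add(byte_address)
    return flagged

# Instructions of the switch example: jp, ld hl, ld (nn) hl, ret.
code_ranges = [(0x4100, 3), (0x4140, 3), (0x4143, 3), (0x4146, 1)]
# 'ld ($4101), hl' writes two bytes; the write to $5000 is an ordinary variable.
writes = [(0x4101, 2), (0x5000, 1)]
flagged = flag_patched_bytes(code_ranges, writes)
```

Only the two operand bytes of the jump end up flagged; the ordinary variable at $5000 is left alone because it is not inside any decoded instruction.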

The disassembler is made aware of those addresses and can disassemble the instructions differently. For instance, to disassemble an instruction with a 16-bit immediate value such as:

4100 21 CD AB   ld hl, $ABCD

the disassembler has a readImmediate16() function to read an immediate operand.

If this function finds that the immediate value is flagged as data, it returns an indirect access that uses the address where the immediate value is stored, rather than the value itself. That is, instead of disassembling 21 CD AB as ld hl, $ABCD, it disassembles it as ld hl, ($4101).
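
Here is one way such a function could look, as a Python sketch. Only the name readImmediate16() comes from the description above; the Immediate and Indirect operand classes, the data_flags set and disassemble_ld_hl are hypothetical, and a real disassembler would certainly be organized differently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Immediate:
    value: int
    def __str__(self):
        return f"${self.value:04X}"

@dataclass(frozen=True)
class Indirect:
    address: int
    def __str__(self):
        return f"(${self.address:04X})"

def read_immediate16(mem, address, data_flags):
    """Read a 16-bit operand, or return an indirect access if it is flagged as data."""
    if address in data_flags or address + 1 in data_flags:
        return Indirect(address)              # the operand is a variable stored in the code
    return Immediate(mem[address] | (mem[address + 1] << 8))   # little-endian constant

def disassemble_ld_hl(mem, pc, data_flags):
    """Disassemble the 'ld hl, nn' instruction (opcode 21) stored at pc."""
    assert mem[pc] == 0x21
    return f"ld hl, {read_immediate16(mem, pc + 1, data_flags)}"

mem = bytearray(0x10000)
mem[0x4100:0x4103] = bytes([0x21, 0xCD, 0xAB])

plain = disassemble_ld_hl(mem, 0x4100, set())                 # no flag set
patched = disassemble_ld_hl(mem, 0x4100, {0x4101, 0x4102})    # operand flagged as data
```

The instruction decoder itself does not change: the only decision is made inside the operand reader, which is why the same trick carries over to every instruction that uses it.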

So it's just a little hack in the disassembler. It's not even handled by the decompiler: the decompiler only has to know that an instruction must be disassembled again once the data marks are set.

Implement this in every disassembler function that reads an immediate value, and it is handled for all instructions: copies, arithmetic operations, logical operations, jumps, ...

Self-modifying jump instructions just become indirect jumps and are handled as other indirect jumps.

Indirect accesses and relative jumps are trickier, but they can be handled with special addressing modes that may not exist in the CPU but can still be decoded as valid expressions.

4100            ld hl, ($1234)

can be interpreted as:

4100            ld hl, (($4101))

This is not valid Z80 code, but let's call it 'extended' Z80 and let the decompiler handle it, and it's done.

4100            jr $4122

can be interpreted as:

4100            jr $4102 + ($4101)

This addressing mode requires sign-extending the value, but that is not difficult to handle.
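
The computation behind jr $4102 + ($4101) can be sketched in Python as follows. The helper names are mine, and the sketch assumes the unconditional jr (opcode 18) with a one-byte signed displacement counted from the address of the following instruction.

```python
def sign_extend8(byte):
    """Interpret an unsigned byte (0..255) as a signed displacement (-128..127)."""
    return byte - 0x100 if byte & 0x80 else byte

def jr_target(mem, pc):
    """Target of the 'jr e' instruction at pc: pc + 2 + sign-extended operand."""
    assert mem[pc] == 0x18
    return (pc + 2 + sign_extend8(mem[pc + 1])) & 0xFFFF

mem = bytearray(0x10000)
mem[0x4100:0x4102] = bytes([0x18, 0x20])   # 4100: jr $4122
forward = jr_target(mem, 0x4100)

mem[0x4101] = 0xFE                         # patched displacement: -2
backward = jr_target(mem, 0x4100)          # jumps back to the jr itself
```

Without the sign extension, the patched displacement FE would be read as +254 and the backward jump would be decoded as a forward one.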

The tricky part is not disassembling the self-modifying code, but knowing when to re-decompile a function because of it. The worst case is the switch example shown above: each possible value must be tracked, and each new case requires a new decompilation, which in turn can add new cases. A function is decompiled up to n times, where n is the number of distinct cases in the switch statement.
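
This iteration can be pictured as a small worklist loop. The Python sketch below is only a model under a strong assumption: the table targets_stored_by, which says which jump targets each case stores into the patched instruction, is given up front, whereas a real decompiler would discover it pass after pass. The function name is hypothetical.

```python
def decompile_until_stable(initial_target, targets_stored_by):
    """Count the decompilation passes needed to discover every case.

    targets_stored_by maps a case entry address to the set of addresses
    that the code of this case writes into the patched jump.
    """
    known = [initial_target]     # cases discovered so far, in discovery order
    passes = 0
    index = 0
    while index < len(known):
        case = known[index]
        index += 1
        passes += 1              # one new decompilation per newly discovered case
        for target in sorted(targets_stored_by.get(case, ())):
            if target not in known:
                known.append(target)   # a new case triggers another pass later
    return known, passes

# The switch example: mode 1 stores $4200, mode 2 stores $4300, mode 3 stores nothing.
targets_stored_by = {0x4103: {0x4200}, 0x4200: {0x4300}, 0x4300: set()}
cases, passes = decompile_until_stable(0x4103, targets_stored_by)
```

For the three-mode example the loop stops after three passes, one per case, which matches the bound of n decompilations.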