The Lost Art of Assembly Programming: Self-modifying Code
2024-04-30
On Von Neumann architectures, programs are data, so they can modify their own code while running.
Self-modifying code is mostly a matter of machine code, even if some high-level languages like LISP allow programs to manipulate their own code. So this practice was more common when developers wrote programs directly in assembly.
Self-modifying code makes decompilation impossible in the general case unless you solve the halting problem. Fortunately, with the use of high-level languages and modern architectures, it has become a forgotten technique and is almost impractical today. Decompilation is difficult enough, and as far as I know, existing decompilers usually ignore modified code except to detect self-decrypting code.
Self-modifying code is sometimes used to make reverse engineering more difficult. Just In Time (JIT) compilation or other kinds of on-the-fly code generation for graphics rendering, matrix multiplication, expression evaluation, ... can be considered self-modifying code too. I'll ignore those cases and focus on simple instruction changes that were common on old architectures, and I'll show how these simple yet common cases can be automatically decompiled.
Examples
Here are a few examples using the Z80 instruction set.
Storing a variable
If a variable is read as an immediate value instead of an indirect memory access, it saves a memory access:
4100 3A 00 50 ld a, ($5000)
...
4200 21 00 50 ld hl, $5000
4203 34 inc (hl)
...
5000 01
Just store the variable at 4101 instead of 5000 and use an immediate addressing mode.
4100 3E 01 ld a, 1
...
4200 21 01 41 ld hl, $4101
4203 34 inc (hl)
This works for only one read site, but other accesses can still read the variable with ld a, ($4101).
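To make the effect concrete, here is a minimal sketch in Python (chosen only for illustration). It is a toy model of memory, not a Z80 emulator, and the helper names run_ld_a_immediate and inc_hl_indirect are hypothetical. It shows that incrementing the byte at $4101 changes what the instruction at $4100 loads.

```python
# Toy model of Z80 memory: just enough to show the trick, not an emulator.
memory = bytearray(0x10000)
memory[0x4100:0x4102] = bytes([0x3E, 0x01])  # ld a, 1 (the variable lives at $4101)


def run_ld_a_immediate(pc):
    """Execute 'ld a, n' located at pc and return the value loaded into A."""
    assert memory[pc] == 0x3E  # opcode of 'ld a, n'
    return memory[pc + 1]


def inc_hl_indirect(hl):
    """Execute 'inc (hl)': increment the byte at address hl, wrapping at 8 bits."""
    memory[hl] = (memory[hl] + 1) & 0xFF


first = run_ld_a_immediate(0x4100)   # reads the initial immediate operand
inc_hl_indirect(0x4101)              # ld hl, $4101 / inc (hl)
second = run_ld_a_immediate(0x4100)  # same instruction, new operand
```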
Switch/case
A similar example, but with an indirect jump.
4100 C3 03 41 jp $4103 ; Initial state: mode 1
4103 ; The mode '1' case
...
4140 ld hl, $4200 ; now switch to mode 2
4143 ld ($4101), hl
4146 ret
4200 ; The mode '2' case
...
4240 ld hl, $4300 ; now switch to mode 3
4243 ld ($4101), hl
4246 ret
4300 ; The mode '3' case
...
This is a rough equivalent to a C switch:
int mode = 1;
switch(mode)
{
    case 1:
        ...
        mode = 2;
        break;
    case 2:
        ...
        mode = 3;
        break;
    case 3:
        ...
}
Not only do you save a variable, you also save a lookup table. Instead of storing a mode and then looking up the corresponding label in a table, you just store the label in the jump instruction. The mode is the label. It works well when the value itself is not relevant and when there is only one switch statement for this enumeration.
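The "mode is the label" idea has a rough high-level analogy, sketched here in Python for illustration only. Nothing is self-modifying in this sketch: the attribute target simply plays the role of the patched operand of the jp instruction, and the class and method names are hypothetical.

```python
class StateMachine:
    """Stores the handler itself instead of a mode number plus a lookup table."""

    def __init__(self):
        self.target = self.mode1  # like the operand of 'jp $4103'
        self.trace = []

    def step(self):
        self.target()  # like executing the 'jp' at $4100

    def mode1(self):
        self.trace.append("mode 1")
        self.target = self.mode2  # like 'ld hl, $4200' / 'ld ($4101), hl'

    def mode2(self):
        self.trace.append("mode 2")
        self.target = self.mode3  # like 'ld hl, $4300' / 'ld ($4101), hl'

    def mode3(self):
        self.trace.append("mode 3")  # stays in mode 3


machine = StateMachine()
for _ in range(4):
    machine.step()
```

As in the assembly version, there is no integer mode to compare and no table to index: the stored value is directly the place to go next.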
Double indirection
Another use is to perform a double indirection, which the processor does not support natively:
4000 ld hl, ($8000)
4003 ld ($4101), hl
...
4100 ld a, ($0) ; this is virtually a 'ld a, (($8000))'
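The sequence above can be traced with a toy memory model, sketched here in Python for illustration. The pointer value $9000 and the byte $42 stored there are made-up test data, and read16 and write16 are hypothetical helpers.

```python
memory = bytearray(0x10000)


def read16(addr):
    """Read a little-endian 16-bit word, as the Z80 does."""
    return memory[addr] | (memory[addr + 1] << 8)


def write16(addr, value):
    """Write a little-endian 16-bit word."""
    memory[addr] = value & 0xFF
    memory[addr + 1] = (value >> 8) & 0xFF


memory[0x4100] = 0x3A    # opcode of 'ld a, (nn)'
write16(0x4101, 0x0000)  # placeholder operand, patched at run time
write16(0x8000, 0x9000)  # made-up pointer stored at $8000
memory[0x9000] = 0x42    # made-up value the pointer refers to

hl = read16(0x8000)      # 4000: ld hl, ($8000)
write16(0x4101, hl)      # 4003: ld ($4101), hl  (patches the operand)
a = memory[read16(0x4101)]  # 4100: ld a, (nn) with the patched nn
```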
These examples are not just theoretical: they have been used in real programs, for good or bad reasons.
Why is it good?
Performance
The main reason to use self-modifying code is certainly performance. There are many cases where it can be more efficient. Using an immediate addressing mode instead of an indirect addressing mode saves a memory access. It can also save registers, which are very limited on 8-bit processors.
Memory
On old machines with a few kilobytes of memory, each byte counted. Saving a few bytes with this technique could make the difference between a program that fits in memory and one that doesn't.
Bypassing CPU limitations
Some CPUs have no indirect jump instruction, so a self-modifying instruction is the only way to do an indirect jump. Read the blog post "Subroutine calls in the ancient world" by Raymond Chen.
Why is it bad?
With new generations of CPUs, self-modifying code became a bad practice for multiple reasons:
- Cache: if an instruction is modified, the instruction cache must be invalidated, which hurts performance. Some CPUs, such as the Motorola 68020, did not invalidate the instruction cache automatically, so some programs written for the 68000 did not work properly on a 68020.
- Prefetch: a modification of the next instruction may not be taken into account because the instruction has already been fetched.
- It is usually not compatible with reentrancy or recursion.
- It does not work with programs stored in read-only memory (ROM, or RAM write-protected by a Memory Management Unit).
Can it be decompiled?
Yes. In some limited cases at least, and it is surprisingly easy with a decompiler that can do multiple passes on the disassembly.
The easiest and most common case is where immediate values are modified (loading a constant into a register, adding a constant to a register, jumping to a fixed address, ...).
The principle is to translate an immediate access into an indirect access when a memory address is both identified as code and written by a machine instruction. So a global variable is created in the middle of the code like any other global variable.
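The detection step can be sketched as follows. This is a minimal Python illustration under assumptions, not the decompiler's actual code: it supposes that an earlier pass has produced a map from instruction address to instruction length and a set of addresses written by store instructions. The function name find_modified_code is hypothetical.

```python
def find_modified_code(instructions, written):
    """Return {written address: address of the instruction containing it}.

    instructions: dict mapping an instruction's start address to its length.
    written: set of byte addresses that some instruction writes to.
    """
    patched = {}
    for start, length in instructions.items():
        for addr in range(start, start + length):
            if addr in written:
                patched[addr] = start
    return patched


# Data from the 'Storing a variable' example: $4101 is inside 'ld a, 1'.
instructions = {0x4100: 2, 0x4200: 3, 0x4203: 1}
written = {0x4101, 0x5000}  # $5000 is plain data, so it is not reported
patched = find_modified_code(instructions, written)
```

Every address reported this way is then treated as a global variable that happens to live inside the code.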
The disassembler is made aware of those addresses and can disassemble the instructions differently. For instance, to disassemble an instruction with a 16-bit immediate value such as:
4100 21 CD AB ld hl, $ABCD
the disassembler has a readImmediate16() function to read an immediate operand.
If this function finds that the location of the immediate value is flagged as data, it returns an indirect access that uses the address where the immediate value is stored instead of the value itself. That is, instead of disassembling 21 CD AB as ld hl, $ABCD, it disassembles it as ld hl, ($4101).
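Here is a minimal sketch of that idea in Python. Only the name readImmediate16() comes from the text above; its signature, the data_marks set and the disassemble_ld_hl helper are assumptions made for illustration.

```python
def read_immediate16(memory, addr, data_marks):
    """Format the 16-bit immediate operand stored at addr.

    If addr is flagged as data (written by another instruction), return an
    indirect access to addr instead of the value currently stored there.
    """
    if addr in data_marks:
        return f"(${addr:04X})"
    value = memory[addr] | (memory[addr + 1] << 8)  # little-endian
    return f"${value:04X}"


def disassemble_ld_hl(memory, pc, data_marks):
    """Disassemble 'ld hl, nn' (opcode $21) located at pc."""
    assert memory[pc] == 0x21
    return f"ld hl, {read_immediate16(memory, pc + 1, data_marks)}"


memory = bytearray(0x10000)
memory[0x4100:0x4103] = bytes([0x21, 0xCD, 0xAB])

plain = disassemble_ld_hl(memory, 0x4100, set())       # no data mark
patched = disassemble_ld_hl(memory, 0x4100, {0x4101})  # operand is written elsewhere
```

The instruction decoder itself does not change: only the operand reader knows about data marks, which is why the same hook serves every instruction that takes an immediate.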
So it's just a little hack in the disassembler. It's not even handled by the decompiler: the decompiler only has to know that an instruction must be disassembled again when the data marks are set.
Implement this in every disassembler function that reads immediate values and it is handled for all instructions: copies, arithmetic operations, logical operations, jumps, ...
Self-modifying jump instructions just become indirect jumps and are handled as other indirect jumps.
Indirect accesses and relative jumps are trickier, but they can be handled with special addressing modes that may not exist in the CPU but can be decoded as valid expressions.
4100 ld hl, ($1234)
can be interpreted as:
4100 ld hl, (($4101))
This is not valid Z80 code, but let's call it 'extended' Z80 and let the decompiler handle it, and it's done.
4100 jr $4122
can be interpreted as:
4100 jr $4102 + ($4101)
This addressing mode requires sign-extending the value, but that is not difficult to handle.
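The relative jump case can be sketched the same way. This is a Python illustration under assumptions: the data_marks set and the function names are hypothetical, and only the unconditional jr (opcode $18) is handled.

```python
def sign_extend8(value):
    """Interpret an unsigned byte as a signed 8-bit displacement."""
    return value - 0x100 if value & 0x80 else value


def jr_target(memory, pc):
    """Target of the 'jr e' at pc: address after the instruction plus e."""
    return (pc + 2 + sign_extend8(memory[pc + 1])) & 0xFFFF


def disassemble_jr(memory, pc, data_marks):
    """Disassemble 'jr e' (opcode $18), keeping a patched displacement symbolic."""
    assert memory[pc] == 0x18
    if pc + 1 in data_marks:
        return f"jr ${pc + 2:04X} + (${pc + 1:04X})"
    return f"jr ${jr_target(memory, pc):04X}"


memory = bytearray(0x10000)
memory[0x4100:0x4102] = bytes([0x18, 0x20])  # jr $4122

plain = disassemble_jr(memory, 0x4100, set())
patched = disassemble_jr(memory, 0x4100, {0x4101})
```

The sign extension matters as soon as the patched displacement is negative: a byte of $FE at $4101 means -2, so the jump goes back to $4100 rather than forward to $4200.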
The tricky part is not disassembling the self-modifying code, but knowing when to re-decompile a function because of it. The worst case is the switch example shown above: each possible value must be tracked, and each new case requires a new decompilation that can in turn add new cases. A function may be decompiled up to n times, where n is the number of distinct cases in the switch statement.
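This re-decompilation loop can be sketched as a fixed-point iteration. The Python below is a minimal illustration, not an actual decompiler: decompile_until_stable and toy_decompile are hypothetical names, and the real decompilation pass is replaced by a lookup in a hand-written table built from the switch example.

```python
def decompile_until_stable(decompile_once, initial_targets):
    """Re-run a decompilation pass until it discovers no new jump target.

    decompile_once takes the set of known targets and returns the set of
    values it saw being written into the jump operand.
    """
    known = set(initial_targets)
    passes = 0
    while True:
        passes += 1
        discovered = decompile_once(frozenset(known))
        if discovered <= known:  # nothing new: the result is stable
            return known, passes
        known |= discovered


# Stand-in for a real pass: which target each case of the switch example stores.
transitions = {0x4103: {0x4200}, 0x4200: {0x4300}, 0x4300: set()}


def toy_decompile(known):
    found = set()
    for target in known:
        found |= transitions[target]
    return found


targets, passes = decompile_until_stable(toy_decompile, {0x4103})
```

With the three cases of the switch example, the loop runs three passes: the first two each reveal one new case, and the third confirms that nothing new appears.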