My Take on Exceptions
2024-12-14
Some time ago, I tried to implement exceptions in my main programming language. It was an interesting experiment even if in the end I dropped everything.
Before recounting that journey, I'll give a bit more context on my view of error handling.
Part 1 - Random Thoughts on Exceptions
Correct error handling is hard, with or without exceptions.
Handling Errors in Memory Allocation
I make extensive use of a kind of Model View Controller (MVC) pattern in my applications: there can be many views, none of which is known to the model. When the model changes, it notifies all views so they can update their state accordingly.
What happens if I want to handle memory allocation failures in a program using MVC?
Let's say I write a 3D modeller with a 3D model of a scene, a 3D view and a side 2D view. When I change the model, it notifies all views, and each view updates its state to reflect the change:
- the model first notifies the 2D view, which updates its state successfully
- then the model notifies the 3D view, but this view cannot allocate the memory it needs to reflect the change.
I cannot simply cancel the change, since the first view may itself fail while reverting.
The correct solution is a kind of transaction: the model announces the change to the views, each view prepares its update, and if all views are ready the change is committed; otherwise the operation is rolled back and the error is reported to the user. This is fine for database operations but seems overkill for a graphical user interface. I can see a couple of alternatives:
- ignore errors, just crash
- render with a special message in the view ("Something went wrong")
The latter seems better at first glance, since at least it does not crash, but if there is not enough memory it's likely that the user won't even be able to save their document.
For a GUI application, it's better to assume that memory allocation never fails and avoid all the burden of safe memory management for something that will likely never happen. Of course, this does not apply to critical software such as embedded systems, database engines, kernels, ...
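For reference, the transaction approach mentioned above could be sketched as follows in Python. This is only an illustration under assumptions: the View and Model classes, the prepare/commit/rollback method names and the fail switch are all hypothetical, and MemoryError stands in for a failed allocation.

```python
class View:
    """Hypothetical view with a two-phase update."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail          # simulates an allocation failure
        self.state = None
        self._pending = None

    def prepare(self, change):
        # Everything that can fail (e.g. allocation) happens here.
        if self.fail:
            raise MemoryError(f"{self.name}: cannot allocate")
        self._pending = change

    def commit(self):
        # Must not fail: only swaps in the already prepared state.
        self.state, self._pending = self._pending, None

    def rollback(self):
        self._pending = None


class Model:
    def __init__(self, views):
        self.views = views
        self.value = None

    def change(self, new_value):
        prepared = []
        try:
            for view in self.views:
                view.prepare(new_value)
                prepared.append(view)
        except MemoryError:
            # One view could not prepare: undo the others, keep the old state.
            for view in prepared:
                view.rollback()
            return False
        self.value = new_value
        for view in self.views:
            view.commit()
        return True
```

Even this toy version doubles the surface of every view, which gives an idea of why it feels heavy for a GUI.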
The Forgotten Error Code
A usual argument against error codes is that the return code can be forgotten. This is an issue with C and its descendants, where it is possible to ignore the value returned by a function.
Some C compilers solve this with a warn_unused_result attribute on functions, which makes the compiler warn when the return value is not used. And C++ finally added a [[nodiscard]] attribute in C++17.
I've solved it by attaching a mustcheck directive to a type. With a proper error type, whether an integer or an object, any error returned by a function cannot be accidentally ignored.
Avoid Errors
I would say that the best way to handle errors is to have no errors. There are many ways to reduce or eliminate error handling:
- Read a whole file into memory when possible instead of reading it sequentially. Once it is loaded, you cannot get IO errors anymore.
- Parse, sanitize or validate input early, so the core of the processing does not have to deal with related errors in the middle.
- Embed resources in the binary instead of loading them from files
- ...
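As an illustration of the "validate early" item, here is a minimal Python sketch. The Endpoint type and the parse_endpoint and label functions are hypothetical names chosen for the example; the point is only that all checks happen once at the boundary, so the rest of the code receives a value that is already well-formed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


def parse_endpoint(text: str) -> Endpoint:
    """Validate once, at the boundary; raises ValueError on bad input."""
    host, sep, port_text = text.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError(f"expected host:port, got {text!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return Endpoint(host, port)


def label(endpoint: Endpoint) -> str:
    # Core logic: no error handling needed, the value was checked on entry.
    return f"{endpoint.host} on port {endpoint.port}"
```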
My favorite trick was to eliminate error handling in the syntactic analyser of my compilers. I described it briefly in my post on how to write a fast compiler.
Programmer's Errors
Some languages make no distinction between programmer's errors (division by zero, invalid memory access, index out of bounds, ...) and legitimate errors (IO errors, invalid inputs, ...) and treat everything as an exception.
Programmer's errors are just bugs that should be fixed by the development team, and they should be handled differently depending on the goal:
- If correctness matters most, the program should halt as soon as such an anomaly is detected.
- If robustness matters most, the program should try to continue anyway with reasonable defaults.
Handling other errors, on the other hand, is just a normal part of the program:
- retry the failed operation,
- report the error to the user,
- ...
Exception-Oriented Programming
Do you know exception-oriented programming? It is when exceptions are used not just for exceptional events or even regular error handling, but as part of the algorithm: don't check for null pointers, just catch the exception; don't check the array size, just catch the out-of-bounds exception.
The most common technique is the use of the NullPointerException in Java: instead of checking for null pointers, catch the exception.
The following horror comes from a real program. It is a simple function that compares two strings.
bool matches(string str1, string str2) {
    try {
        byte[] bytes1 = Encoding.Unicode.GetBytes(str1);
        byte[] bytes2 = Encoding.Unicode.GetBytes(str2);
        for(int i = 0; i < bytes1.Length; i++) {
            if (bytes1[i] != bytes2[i])
                return false;
        }
        return true;
    } catch (IndexOutOfRangeException) {
        return false;
    }
}
Here are a few hints on what's wrong:
- It makes it impossible to turn off bounds checking to speed up execution.
- It makes the control flow of a simple iteration convoluted.
- It may catch an out-of-bounds exception from a sub-function that is not related to the loop.
- It can make debugging more difficult if you want to break on exceptions.
- The algorithm is not even correct: if the first array is shorter, the function will return true, which is not the expected behavior.
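For contrast, here is a sketch of the same comparison without exception-driven control flow, written in Python rather than the C# of the original. The only assumption is the encoding: "utf-16-le" is used because that is what .NET's Encoding.Unicode produces. The lengths are compared explicitly, so neither array can be indexed out of bounds.

```python
def matches(str1: str, str2: str) -> bool:
    bytes1 = str1.encode("utf-16-le")
    bytes2 = str2.encode("utf-16-le")
    # Explicit length check: a shorter first string no longer "matches".
    if len(bytes1) != len(bytes2):
        return False
    for b1, b2 in zip(bytes1, bytes2):
        if b1 != b2:
            return False
    return True
```

In real code a plain string equality test would of course do the job; the loop is kept only to mirror the original.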
I don't want to throw the baby out with the bathwater, but exceptions have led to the worst abuses.
Part 2 - My experiments
Beyond the usual throw and catch, I needed a few additions such as an unwind statement and a no-exception barrier.
The unwind statement
My language does not have RAII facilities like C++; instead I have a defer statement that is executed when leaving a block. With exceptions, I need a special construct for creation or initialization functions.
class MyObject
    def init
        self.obj1 = createObject1
        self.obj2 = createObject2
        self.obj3 = createObject3
    end
If createObject3 fails, the object is not considered initialized but obj1 and obj2 won't be cleaned up.
It can be solved with an unwind statement, similar to defer but invoked only when an exception is thrown:
class MyObject
    def init
        self.obj1 = createObject1
        unwind self.obj1.destroy
        self.obj2 = createObject2
        unwind self.obj2.destroy
        self.obj3 = createObject3
    end
I found later that Zig had an equivalent: errdefer.
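For readers more familiar with Python, a rough analogue of unwind can be built with contextlib.ExitStack: cleanups are registered as the objects are created, and pop_all() cancels them once initialization has succeeded. This is a sketch under assumptions, not how my compiler does it; the Resource class, its log list and the fail switches are hypothetical.

```python
from contextlib import ExitStack


class Resource:
    """Hypothetical resource that records its own destruction."""

    def __init__(self, name, log, fail=False):
        if fail:
            raise RuntimeError(f"cannot create {name}")
        self.name = name
        self.log = log

    def destroy(self):
        self.log.append(f"destroy {self.name}")


class MyObject:
    def __init__(self, log, fail_third=False):
        with ExitStack() as unwind:
            self.obj1 = Resource("obj1", log)
            unwind.callback(self.obj1.destroy)   # runs only if we leave early
            self.obj2 = Resource("obj2", log)
            unwind.callback(self.obj2.destroy)
            self.obj3 = Resource("obj3", log, fail=fail_third)
            # Success: move the callbacks out of the stack so none of them run.
            unwind.pop_all()
```

If the third creation raises, the registered callbacks run in reverse order and the exception keeps propagating to the caller.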
The 'No Exception' Barrier
Whatever language I use, I always think in an object-oriented way. One of the principles of OO is encapsulation: an object must always keep its integrity. When I have to deal with errors, I usually split a method into two parts:
- The first part can have errors and the state of the object is not modified.
- The second part cannot have errors and is reached if no error occurred. Here I can safely modify the state of the object.
def setPeriod(start: String, duration: String)
    // Part 1: exceptions can be thrown, I don't change the state of the object
    var f = parseTime(start)
    var d = parseDuration(duration)
    // Part 2: exceptions cannot be thrown after this point, I can change
    // the state of the object
    self.from = f
    self.to = f + d
end
This example seems obvious: while the state of the object is being changed, no exception can be thrown. But what if an exception is thrown by the + operator for any reason? The from attribute will be modified but not the to attribute, leaving the object in an inconsistent state.
To clearly separate those two parts, I've created a special statement: ---. The compiler detects and forbids exceptions after it. I did not use a keyword but three minus signs instead, to make it more visually explicit.
def setPeriod(start: String, duration: String)
    var f = parseTime(start)
    var d = parseDuration(duration)
    ---
    self.from = f
    self.to = f + d
end
This statement partially solves the problem of hidden exit points: the function can still be interrupted anywhere above the ---, but there can be no hidden exit below it.
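In a language without such a barrier, the same discipline can only be followed by convention. Here is a minimal Python sketch, assuming a hypothetical Schedule class and plain integers instead of times and durations: everything that may raise, including the addition, is computed into locals first, and the second part is reduced to attribute assignments.

```python
class Schedule:
    def __init__(self):
        self.start = 0
        self.end = 0

    def set_period(self, start: str, duration: str) -> None:
        # Part 1: anything here may raise; self is left untouched.
        f = int(start)
        d = int(duration)
        t = f + d
        # Part 2: plain attribute assignments only.
        # Nothing checks this in Python; it is a convention, not a guarantee.
        self.start, self.end = f, t
```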
Exception Dispatching
My language is a low-level language, a kind of C with generics and some object-oriented helpers. It does not have the Run Time Type Information (RTTI) needed to discriminate between kinds of exceptions, so the program must implement its own system to handle it. The standard library can help but a manual dispatch is needed:
catch e
    if e.is(FileException)
        ...
    elsif e.is(NetworkException)
        ...
    else
        throw e // forward
    end
end
Here, is is a function from the standard library that reads the class attribute of the exception and searches its hierarchy for the expected type.
// 'is' is a generic function (one instance generated for each value of T)
def is(T: *): Bool
    var myClass = self.klass
    var expectedClass = T.klass // a class constant
    // search in the hierarchy
    ...
end
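The hierarchy search elided above is not shown in this post. One plausible shape for it, sketched in Python under assumptions, is a walk up a chain of parent pointers. The Klass descriptor, the is_a function and the IOException intermediate class are hypothetical names introduced for the example.

```python
class Klass:
    """Hypothetical class descriptor: a name and an optional parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def is_a(klass, expected):
    # Walk up the hierarchy until the expected class is found or the root is passed.
    while klass is not None:
        if klass is expected:
            return True
        klass = klass.parent
    return False


EXCEPTION = Klass("Exception")
IO_EXCEPTION = Klass("IOException", EXCEPTION)          # hypothetical
FILE_EXCEPTION = Klass("FileException", IO_EXCEPTION)
NETWORK_EXCEPTION = Klass("NetworkException", IO_EXCEPTION)
```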
I may be wrong, but in practice you rarely care about the cause of an error; you just want to know whether an operation failed. So the manual dispatch is a minor inconvenience.
Anddd It's Gone
In the end, the experiment was mostly positive: exceptions did not add too much complexity to the language and they were useful. But I decided to drop them for several reasons:
- Even if it adds only 1,000 lines of code to the compiler, that is still a lot for a 20,000 LoC program.
- I want to keep my language minimalistic. It's easy to add features, it's difficult to remove them. Look at this presentation by the author of Lua that sums it up very well: How much does it cost.
- I'm still reluctant to use exceptions, mainly because of the hidden control flow but also because they raise too many questions: should they be for something unexpected (exceptional) or for any kind of error? Should a parseInt function return a status or throw an exception (some chose both)? ...
Later I did unrelated experiments on sum types and found that they could make exceptions almost useless, so I think I made the right choice.
Sum and Union Types
Here, I'm going to show that, given an implementation of exceptions and an implementation of union types, both can be strictly equivalent.
When implementing exceptions in my compiler, I chose a simple way to raise an exception: return it like a regular return value, but with the Carry Flag (CF) set to distinguish it from a normal return. It made the implementation very simple, especially for unwinding (freeing all memory, closing files...), and it is very fast even if not zero cost:
- Insert a CLC (clear carry flag) for normal returns in functions that can throw.
- Insert an STC (set carry flag) when throwing an exception.
- Insert a JC handleException after calling a function that can throw.
After giving up and working on union types (tagged unions) for yet another programming language, I realized that my implementation of exceptions was just the equivalent of union types.
A union type is a list of two or more alternative types.
// T is a union type
const T = String | Int | Bool
// v can take a string, an integer or a bool
var v: T
v = "hello" // ok
v = 123 // ok
v = true // ok
The implementation stores the variable as a pair (tag, value), where the tag is an enumeration giving the type of the value. For the example above, the tag takes the value:
- 0 for String
- 1 for Int
- 2 for Bool
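The (tag, value) representation can be mimicked in Python as follows. This is only a model of the layout, not of my compiler: the Tag enumeration and the box function are hypothetical names, and the tag values follow the list above.

```python
from enum import IntEnum


class Tag(IntEnum):
    STRING = 0
    INT = 1
    BOOL = 2


def box(value):
    """Return the (tag, value) pair for a String | Int | Bool value."""
    # bool is tested before int because, in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return (Tag.BOOL, value)
    if isinstance(value, int):
        return (Tag.INT, value)
    if isinstance(value, str):
        return (Tag.STRING, value)
    raise TypeError(f"unsupported type: {type(value).__name__}")
```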
For an x64 CPU, the return value in the standard Application Binary Interface (ABI) is in rax. We can extend this calling convention with the pair rdx:rax when there is a tag. If the union type has only two types, A|B, we don't need a 64-bit register for the tag: 1 bit is enough, 0 for A, 1 for B, so the ABI can use the carry flag instead, cf:rax.
In my language with exceptions, I write:
def getItem(index: Int): Item throws OutOfBoundException
    if index >= self.size
        throw OutOfBoundException.new
    else
        return self.array[index]
    end
end
In my language with union types, I write:
def getItem(index: Int): Item|OutOfBoundException
    if index >= self.size
        return OutOfBoundException.new
    else
        return self.array[index]
    end
end
With the ABI described above, both versions will generate exactly the same code!
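To make the equivalence concrete, here is a small Python model of what both versions compile down to: a function that returns a (carry, value) pair, and a caller that tests the carry right after the call. It is a simulation under assumptions, since Python has no access to CPU flags; get_item mirrors getItem above, while first_or_default is a hypothetical caller added for the example.

```python
class OutOfBoundException:
    pass


def get_item(array, index):
    """Return (carry, value), mimicking the cf:rax pair."""
    if index >= len(array):
        return True, OutOfBoundException()   # STC, then return the error
    return False, array[index]               # CLC, then return the item


def first_or_default(array, default):
    carry, value = get_item(array, 0)
    if carry:                                # JC handleException
        return default
    return value
```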
To handle union types, I've created a match statement that works like a switch but tests against the type of the value.
def f(v: String|Int|Bool)
    match v
        case String as str
            print("It is a string")
        case Int as i
            print("It is an integer")
        else as b
            print("It is a boolean")
    end
end
The match statement is also an expression: to avoid too many nesting levels, it is possible to extract the remaining type. Both examples below are equivalent:
def f(v: A|B)
    match v
        case A as a
            // handle a
        else as b
            // handle b
    end
end

def f(v: A|B)
    var b = match v
        case A as a
            // handle a
            return
    end
    // handle b
end
With functions that return either a value of type T or an Error, the code looks like:
def f(name: String, out: Buffer): T|Error
    var file = match File.open(name) // returns File|Error
        case Error as e
            return e
    end
    defer file.deinit
    var count = match file.read(out) // returns Int|Error
        case Error as e
            return e
    end
    match file.close // returns None|Error
        case Error as e
            return e
    end
    // ...
end
This pattern quickly becomes very common and very verbose, so it's tempting to create a shortcut. I've created the ... operator, so I can rewrite the code above as:
def f(name: String, out: Buffer): T|Error
    var file = File.open(name)...
    defer file.deinit
    var count = file.read(out)...
    file.close...
    // ...
end
Compared to a language with exceptions:
def f(name: String, out: Buffer): T throws Error
    var file = File.open(name)
    defer file.deinit
    var count = file.read(out)
    file.close
    // ...
end
It is the same code without the ... operators.
In the end, a language with union types plus a shortcut for early exit on errors allows the same conciseness as a language with exceptions, but without the hidden exits.
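Python has no such shortcut, which makes it a convenient way to show what the ... operator expands to. In this sketch, Error, parse_int and add are hypothetical names; each isinstance check followed by an early return is what a single ... would generate, and every exit point stays visible in the source.

```python
class Error:
    def __init__(self, message: str):
        self.message = message


def parse_int(text: str) -> "int | Error":
    # Wraps Python's exception-based int() into a value-or-Error result.
    try:
        return int(text)
    except ValueError:
        return Error(f"not an integer: {text!r}")


def add(a: str, b: str) -> "int | Error":
    x = parse_int(a)
    if isinstance(x, Error):   # expansion of: var x = parse_int(a)...
        return x
    y = parse_int(b)
    if isinstance(y, Error):   # expansion of: var y = parse_int(b)...
        return y
    return x + y
```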
Rust developers will have noticed that my ... is equivalent to the Rust ? operator. I couldn't use the same operator as I had already reserved T? as a shortcut for T|None.
Conclusion
At a small performance cost, it is possible to get the same feature with sum types and without the main drawback, the hidden control flow.
From my understanding:
- Sum types make exceptions obsolete (unless you absolutely need zero-cost exceptions)
- Sum types are fantastic. They are a general solution that can definitively eliminate the billion dollar mistake and put an end to 50 years of debates on exceptions.
I've used the terms sum and union interchangeably. There does not seem to be a universally adopted terminology yet. For some people, union type means untagged union. I prefer to consider union types as sum types where A|A is not allowed or is equivalent to A. Anyway, the distinction was not relevant in this post.
My evolution as a programmer regarding exceptions:
- 80s: exceptions? you mean hardware exceptions?
- 90s: looks nice, seems to be the future. Java, Python, Ruby, C++, ...
- 200x: why so many heated debates?
- 201x: what! Go, Rust ... modern languages without exceptions?
- 202x: definitely a thing of the past