Character Types
Copper 3 no longer has a built-in character type; character types are implemented entirely in the library. The standard library defines a CodeUnit type for UTF-8 or UTF-16 strings, depending on the platform, and a 32-bit Unicode CodePoint type.
To show how it is implemented, let's start with a simple example by defining an ASCII Char type.
The compiler uses the Smalltalk style for character literals, e.g. $A for the 'A' character. From the compiler's point of view, there is absolutely no difference between $A and 65; both are the same integer literal. So $A can be a signed 32-bit integer as well as an unsigned 8-bit integer, depending on the context.
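The idea that a character literal is nothing more than an integer can be illustrated outside Copper. The short Python sketch below is only an analogy: Python has no $A literal, so ord("A") stands in for it, and the struct format codes are simply a convenient way to show the same value stored at two different widths.

```python
import struct

# Python has no $A literal; ord("A") stands in for it here.
a = ord("A")

# The "character" is just the integer 65.
assert a == 65

# The same value fits in an unsigned 8-bit slot ("B") ...
as_u8 = struct.pack("<B", a)
# ... and in a signed 32-bit slot ("i"), little-endian.
as_i32 = struct.pack("<i", a)

print(as_u8)
print(as_i32)
```

Which width is used depends only on the type the value is stored into, which mirrors how the context decides the type of $A.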
Creating the Type
Just create a sub-type of Unsigned8.
    stype Char : Unsigned8
        ....
    end
Now we have a character type that is an Unsigned8: it inherits all operations from its parent type.
Defining Special Characters
There is no syntax for special characters. If you want a tab character, just use 9 or define a symbol for it.
    stype Char : Unsigned8
        'nul = 0
        'tab = 9
        'lf = 10
        'cr = 13
    end
Defining Additional Methods
In addition to inherited operations (addition, increment, ...), you may want to add useful methods to the new type.
    stype Char : Unsigned8
        'nul = 0
        'tab = 9
        'lf = 10
        'cr = 13

        function isNul
            return self == 'nul
        end

        function isUpper
            return self >= $A and self <= $Z
        end

        // isLower is needed by toUpper below
        function isLower
            return self >= $a and self <= $z
        end

        function toUpper
            return self isLower cond self + $A - $a else self
        end
    end
Now we have a fully operational character type we can use.
    var i : Int32
    i = $B // valid integer value 66

    var c : Char
    c = 'tab
    c = $y
    c = c + 1
    if c isNul
        return
    else
        c = c toUpper
    end
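For readers who want to experiment with the same design outside Copper, here is a rough Python analogue. It is only a sketch under stated assumptions: the class name Char, the method names, and the range check are hypothetical choices for this illustration, and a real Unsigned8 may behave differently on overflow. The is_lower helper is included because to_upper relies on it.

```python
class Char(int):
    """Hypothetical Python stand-in for a Char sub-type of Unsigned8."""

    # Named special characters, mirroring 'nul, 'tab, 'lf and 'cr.
    NUL = 0
    TAB = 9
    LF = 10
    CR = 13

    def __new__(cls, value):
        # Assumption: reject values outside the unsigned 8-bit range.
        if not 0 <= value <= 0xFF:
            raise ValueError("Char must fit in an unsigned 8-bit integer")
        return super().__new__(cls, value)

    def is_nul(self):
        return self == Char.NUL

    def is_upper(self):
        return ord("A") <= self <= ord("Z")

    def is_lower(self):
        return ord("a") <= self <= ord("z")

    def to_upper(self):
        # Same arithmetic as the Copper version: self + $A - $a
        return Char(self + ord("A") - ord("a")) if self.is_lower() else self


c = Char(ord("y"))
c = Char(c + 1)      # int arithmetic returns a plain int, so wrap it again
c = c.to_upper()
print(chr(c))        # prints "Z"
```

Because Char derives from int, comparison and arithmetic come for free, which is the same benefit the Copper type gets from inheriting Unsigned8 operations.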
Unicode Character
To implement a Unicode character type, repeat the same steps, but make the Char type a sub-type of Unsigned32.
    stype Char : Unsigned32
        'nul = 0
        ...
    end
Strings will most likely be encoded in UTF-8 or UTF-16 rather than stored as arrays of Unicode characters, so the String type is implemented as an array of code units.
    stype CodeUnit : Unsigned8
        ...
    end

    // A String is a pointer to an array of code units
    stype String : *[]CodeUnit
        ...
        function eachChar
            // Reassemble all code units into characters
            // before passing them to a block.
        end
        ...
    end
The user does not have to worry about the encoding: they can iterate through the characters using eachChar:
    var str = ... // a string
    str eachChar do c
        // c is a 32-bit Unicode character
    end
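The reassembly step performed by eachChar can be sketched in Python. This is not the Copper library's implementation, only an illustration of the general technique, assuming 8-bit code units that hold well-formed UTF-8. The function name each_char is a hypothetical stand-in, and the sketch does no validation of malformed input.

```python
def each_char(code_units):
    """Yield code points reassembled from UTF-8 code units.

    Minimal sketch: assumes well-formed UTF-8 and does no validation.
    """
    i = 0
    n = len(code_units)
    while i < n:
        lead = code_units[i]
        if lead < 0x80:            # 0xxxxxxx: one code unit (ASCII)
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:   # 110xxxxx: two code units
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:  # 1110xxxx: three code units
            cp, extra = lead & 0x0F, 2
        else:                      # 11110xxx: four code units
            cp, extra = lead & 0x07, 3
        for unit in code_units[i + 1 : i + 1 + extra]:
            # Each continuation unit (10xxxxxx) contributes six bits.
            cp = (cp << 6) | (unit & 0x3F)
        yield cp
        i += 1 + extra


units = "héllo €".encode("utf-8")   # the string as UTF-8 code units
for cp in each_char(units):
    print(hex(cp))
```

The caller only ever sees whole code points, so the same loop body would work unchanged if the underlying code units were UTF-16 and the decoder were swapped accordingly.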