Identifiers in Zinc
Zinc allows whitespace in identifiers. This syntax makes the language quite unusual, so I think the feature deserves its own page.
Languages with infix operators do not permit the character '-' in identifiers, because it would make expressions ambiguous: is a-b a single word or the subtraction of two variables?
Such languages usually leave only two ways to combine several words:
- The camel case notation
- The underscore notation
There are plenty of passionate discussions, and a few studies, about which one is best. But compared to whitespace, both are equally ugly.
How it Works
To design a language that allows whitespace in identifiers, the grammar must follow this rule:
(1) Two identifiers are never adjacent.
An additional rule has been added to enable the use of keywords in the middle of identifiers:
(2) A keyword is never preceded by an identifier.
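Both rules constrain which lexical units may follow each other, so they can be stated as a check over a token sequence. The Python sketch below assumes tokens are `(kind, text)` pairs with hypothetical kinds such as `"identifier"` and `"keyword"`; it only illustrates the two rules and is not part of any Zinc tool.

```python
def check_rules(tokens):
    """Return (index, message) pairs for every violation of rules (1) and (2).

    `tokens` is a list of (kind, text) pairs; the kind names are assumptions.
    """
    violations = []
    for index in range(1, len(tokens)):
        previous_kind = tokens[index - 1][0]
        kind = tokens[index][0]
        # Rule (1): an identifier is never directly followed by another identifier.
        if previous_kind == "identifier" and kind == "identifier":
            violations.append((index, "rule (1): two adjacent identifiers"))
        # Rule (2): a keyword is never directly preceded by an identifier.
        if previous_kind == "identifier" and kind == "keyword":
            violations.append((index, "rule (2): keyword preceded by an identifier"))
    return violations


print(check_rules([("keyword", "def"), ("identifier", "x"),
                   ("operator", "="), ("constant", "1")]))  # []
```

A grammar that obeys both rules only ever produces sequences for which this check returns an empty list.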
There is no magic, and there is no ambiguity: the compiler uses a classical syntax analyser on top of a lexical analyser.
As a reminder, a lexical analyser converts a stream of characters (the source code) into a stream of lexical units. For example:
- def x = 1
is converted to:
- keyword "def"
- identifier "x"
- operator "="
- constant 1
The syntax analyser converts the stream of lexical units into an abstract syntax tree.
As long as an identifier never follows another identifier, identifiers are easy to recognize during lexical analysis: any word or number that follows a word is concatenated to it, and consecutive blanks count as one.
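The lexing rule can be sketched as a small Python tokenizer. This is a minimal sketch, not the Zinc compiler: the keyword list, the operator set and the `(kind, text)` token representation are assumptions made for illustration. A word or number that directly follows an identifier is appended to it with a single space; otherwise a word is a keyword if it belongs to the (hypothetical) keyword set, and an identifier if not.

```python
import re

# Hypothetical keyword subset; the real Zinc list (around 20 keywords) is not given here.
KEYWORDS = {"def", "each", "do", "end", "or", "and", "if", "else", "return"}

# Hypothetical operators, longest first so "||" is not read as two "|".
OPERATORS = ["||", "&&", "==", "!=", "<=", ">=",
             "=", "+", "-", "*", "/", "<", ">", "(", ")", ",", "?"]

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
NUMBER = re.compile(r"[0-9]+")


def tokenize(source):
    """Return (kind, text) pairs; adjacent words merge into one identifier."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1  # any run of blanks acts as a single separator
            continue
        previous_is_identifier = bool(tokens) and tokens[-1][0] == "identifier"
        match = WORD.match(source, pos) or NUMBER.match(source, pos)
        if match:
            text = match.group()
            pos = match.end()
            if previous_is_identifier:
                # Rule (1): an identifier is never followed by another identifier,
                # so this word (even a keyword, by rule (2)) extends the current one.
                kind, name = tokens[-1]
                tokens[-1] = (kind, name + " " + text)
            elif text[0].isdigit():
                tokens.append(("constant", text))
            elif text in KEYWORDS:
                tokens.append(("keyword", text))
            else:
                tokens.append(("identifier", text))
            continue
        for op in OPERATORS:
            if source.startswith(op, pos):
                tokens.append(("operator", op))
                pos += len(op)
                break
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
    return tokens
```

With this sketch, `tokenize("def x = 1")` yields the four lexical units listed earlier, and `tokenize("def total price = unit price * 2")` yields the identifiers `total price` and `unit price`.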
It is not difficult to design a compliant grammar: identifiers are used in types and expressions, so it is enough to ensure that expressions are separated by a special character (usually an operator).
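To illustrate why such a grammar never meets two identifiers in a row, here is a minimal recursive-descent parser sketch in Python. It is a toy, not Zinc's grammar: it takes `(kind, text)` tokens such as a lexer would produce, handles only `+ - * /` and parentheses, and builds nested tuples as the syntax tree. Every rule puts an operator or a parenthesis between two operands, so a multi-word identifier like `unit price` is always a single operand.

```python
def parse_expression(tokens):
    """Parse + - * / expressions over (kind, text) tokens into nested tuples."""
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else (None, None)

    def advance():
        nonlocal position
        token = tokens[position]
        position += 1
        return token

    def primary():
        kind, text = peek()
        if kind in ("identifier", "constant"):
            advance()
            return (kind, text)
        if (kind, text) == ("operator", "("):
            advance()
            node = additive()
            if peek() != ("operator", ")"):
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {kind} {text!r}")

    def multiplicative():
        node = primary()
        while peek() in (("operator", "*"), ("operator", "/")):
            op = advance()[1]
            node = (op, node, primary())
        return node

    def additive():
        node = multiplicative()
        while peek() in (("operator", "+"), ("operator", "-")):
            op = advance()[1]
            node = (op, node, multiplicative())
        return node

    tree = additive()
    if position != len(tokens):
        # E.g. an identifier directly after an identifier: no rule accepts it.
        raise SyntaxError(f"unexpected token {peek()}")
    return tree
```

For the tokens of `unit price * quantity + shipping cost`, the sketch returns `("+", ("*", ("identifier", "unit price"), ("identifier", "quantity")), ("identifier", "shipping cost"))`, while two identifiers in a row raise `SyntaxError`.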
After working with Zinc for years and writing over 100,000 lines of code, here are my thoughts on this syntax.
It is great. Everything in lowercase makes the code easy to type and to read. As types live in a separate namespace, the names of structures and enumerations do not have to follow the usual convention of starting with a capital letter, so even type names are in lowercase.
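One way to picture a separate type namespace is a symbol table with one dictionary per namespace, so a lowercase type and a variable can share a name. This is a hypothetical Python sketch; the `Scope` class and its methods are illustrative names, and nothing here describes how Zinc actually stores symbols. It assumes the parser knows from the syntactic context whether it expects a type or an expression, and picks the namespace accordingly.

```python
class Scope:
    """Toy symbol table: types and values live in separate namespaces (hypothetical)."""

    def __init__(self):
        self.namespaces = {"type": {}, "value": {}}

    def define(self, namespace, name, entry):
        self.namespaces[namespace][name] = entry

    def lookup(self, namespace, name):
        # The caller chooses the namespace from the syntactic context.
        return self.namespaces[namespace][name]


scope = Scope()
scope.define("type", "point", ("x", "y"))          # a structure called "point"
scope.define("value", "point", {"x": 1, "y": 2})   # a variable also called "point"
```

Because the two entries never collide, there is no need for a capitalization convention to tell type names apart.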
Once you are used to it, it is really easy to read. Syntax highlighting clearly separates keywords from identifiers, and in any case the list of keywords is very short (around 20), so they are easy to remember.
The main drawback is the number of parentheses needed to prevent ambiguities.
A drawback of rule (2) is that special characters and operators are needed where keywords would have read better. Instead of:
each do item ... end
we have to write:
each ? item ... end
and instead of:
a or b
we have to write:
a || b
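The `a or b` case follows from the lexing rule: `or` comes right after the identifier `a`, and since rule (2) guarantees that a keyword never follows an identifier, the lexer treats it as part of the identifier. The compact Python sketch below shows this; for simplicity it assumes tokens are already separated by spaces, and its keyword and operator sets are hypothetical.

```python
KEYWORDS = {"def", "each", "do", "end", "or"}   # hypothetical subset
OPERATORS = {"=", "?", "||"}                    # hypothetical subset


def classify(words):
    """Turn space-separated words into (kind, text) tokens, merging identifier runs."""
    tokens = []
    for word in words:
        if word in OPERATORS:
            tokens.append(("operator", word))
        elif tokens and tokens[-1][0] == "identifier":
            # A word after an identifier extends it, even if it is a keyword.
            tokens[-1] = ("identifier", tokens[-1][1] + " " + word)
        elif word in KEYWORDS:
            tokens.append(("keyword", word))
        else:
            tokens.append(("identifier", word))
    return tokens


print(classify("a or b".split()))   # [('identifier', 'a or b')]
print(classify("a || b".split()))   # [('identifier', 'a'), ('operator', '||'), ('identifier', 'b')]
```

The keyword version silently becomes a single three-word identifier, which is why the operator is required.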