FAQ
1. How do I rebuild the library for 64-bit systems?
You can't. AsmXml is written in 32-bit assembly and would have to be completely rewritten before it could be linked with 64-bit applications.
A 64-bit version is not planned yet.
2. Why is fast parsing important even if the CPU is not the bottleneck?
- It uses less CPU time, so it consumes less power.
- It allows the use of cheaper hardware.
- It leaves the processor free to do other work.
3. Why assembly?
- Assembly is fun.
Do you really think I would have written a parser in assembly in my spare time otherwise?
- Assembly is at least as fast as any other language.
At worst, you can take a compiler's output as a starting point and improve on it.
- Assembly is simple.
Few operands, few mnemonics, few addressing modes, ...
- Assembly does not favor any paradigm: use an OO approach, use a functional approach or just use GOTOs.
- Assembly is portable: this library runs on Mac OS X, Linux and Windows. This is far more portable than many applications written in C++ :-)
4. Why is everything written in assembly?
The parser's main loop is one huge loop with almost no subroutine calls. It is essentially a big finite state automaton, so moving some parts into a high-level language is not really possible without losing the advantage of assembly.
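To illustrate what "one big finite state automaton" means (this is not AsmXml's code, which is assembly), here is a minimal C sketch that counts elements in a simplified XML subset: no comments, CDATA, processing instructions, or '>' inside attribute values. The function `count_elements` and the state names are hypothetical. Every character is handled inside a single loop, and the current state alone decides what the next character means, so there is no per-character function call to split out into another language.

```c
#include <stddef.h>

/* Hypothetical states for a tiny XML subset. */
enum state { TEXT, TAG_OPEN, START_TAG, END_TAG };

/* Count start tags (including empty-element tags such as <b/>) in one pass
   over [p, end). All transitions happen inside one loop. */
size_t count_elements(const char *p, const char *end)
{
    enum state s = TEXT;
    size_t count = 0;

    for (; p < end; ++p) {
        char c = *p;
        switch (s) {
        case TEXT:
            if (c == '<')
                s = TAG_OPEN;          /* a tag begins */
            break;
        case TAG_OPEN:
            if (c == '/') {
                s = END_TAG;           /* "</name>" closes an element */
            } else {
                count++;               /* "<name ...>" opens one */
                s = START_TAG;
            }
            break;
        case START_TAG:
        case END_TAG:
            if (c == '>')
                s = TEXT;              /* back to character data */
            break;
        }
    }
    return count;
}
```

For example, scanning `<a><b/>x</a>` yields 2. A real parser has many more states, but the shape stays the same, which is why the hot path resists being split into separately compiled pieces.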
5. Why not DOM or SAX?
Because AsmXml is not just a parser: it is a parser and a decoder.
DOM and SAX parsers are not only slower at parsing, they also do only half of the job: you still have to perform lookups and string comparisons to find attributes and elements, so you cannot process XML documents efficiently with these APIs.
Doing the parsing and decoding in one pass saves a lot of memory accesses and CPU time, because elements and attributes can then be accessed in O(1) time.
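The contrast can be sketched in C. This is an illustration under assumptions, not AsmXml's API: `dom_attr`, `dom_find`, `item`, `attr_slot` and the `id`/`price` attributes are hypothetical. With DOM-style storage, every access searches a name/value list with string comparisons. With parse-and-decode, a schema assigns each known attribute a fixed slot, the name is matched once while parsing, and the application then reads a struct field directly.

```c
#include <stddef.h>
#include <string.h>

/* A string referenced by a (begin, end) pointer pair, not null-terminated. */
typedef struct { const char *begin; const char *end; } str_ref;

/* DOM-style storage: a generic list of (name, value) pairs. */
typedef struct { const char *name; str_ref value; } dom_attr;

/* Every access searches by name with string comparisons: O(n) per lookup. */
const str_ref *dom_find(const dom_attr *attrs, size_t n, const char *name)
{
    for (size_t i = 0; i < n; ++i)
        if (strcmp(attrs[i].name, name) == 0)
            return &attrs[i].value;
    return NULL;
}

/* Decoded storage: a hypothetical schema gives each known attribute a
   fixed slot, so a decoded element is a plain struct. */
enum { ATTR_ID, ATTR_PRICE, ATTR_COUNT };
typedef struct { str_ref attr[ATTR_COUNT]; } item;

/* Called once per attribute while parsing; returns -1 for unknown names.
   After that, the application reads it.attr[ATTR_PRICE] in O(1). */
int attr_slot(const char *name)
{
    if (strcmp(name, "id") == 0)
        return ATTR_ID;
    if (strcmp(name, "price") == 0)
        return ATTR_PRICE;
    return -1;
}
```

The name matching is not eliminated, only moved into the single parsing pass, so it is paid once per attribute instead of once per application access.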
6. Why an in-situ parser?
It saves a lot of copying. Strings are copied only when they contain escape characters or when the text is fragmented by comments, CDATA sections or processing instructions.
The strings are referenced with a pair of pointers (begin, end) and are not null-terminated.
Converting values to null-terminated strings would be a terrible waste of time when the strings are numerical values or symbolic constants that will be stored as numbers.
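As a sketch of why the (begin, end) representation is enough, the C functions below decode a value straight from the source buffer without copying it or adding a terminating '\0'. The `str_ref` type and the helpers `ref_to_u32` and `ref_equals` are hypothetical illustrations, not part of AsmXml's interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A string referenced in place by a (begin, end) pointer pair. */
typedef struct { const char *begin; const char *end; } str_ref;

/* Decode an unsigned decimal integer directly from the source buffer.
   The end pointer bounds the scan, so no terminator is needed.
   Returns false on an empty range, a non-digit, or 32-bit overflow. */
bool ref_to_u32(str_ref s, uint32_t *out)
{
    if (s.begin == s.end)
        return false;
    uint32_t v = 0;
    for (const char *p = s.begin; p < s.end; ++p) {
        if (*p < '0' || *p > '9')
            return false;
        uint32_t d = (uint32_t)(*p - '0');
        if (v > (UINT32_MAX - d) / 10)   /* v * 10 + d would overflow */
            return false;
        v = v * 10 + d;
    }
    *out = v;
    return true;
}

/* Compare a range with a known symbolic constant (e.g. "yes") in place. */
bool ref_equals(str_ref s, const char *lit)
{
    size_t n = strlen(lit);
    return (size_t)(s.end - s.begin) == n && memcmp(s.begin, lit, n) == 0;
}
```

Because the length comes from the pointer pair, the same buffer can hold adjacent values (for example `1234yes` split into `1234` and `yes`) and each one is still decoded correctly.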