Documentation
Building the Library
The library is released as an object file (.o or .obj) instead of a dynamically linked library. It will increase the size of your application by about 20K. I'm sure that if you are working with XML, you won't worry about 20K :-)
The binaries are already available in the distribution in the MS-COFF, ELF and Mach-O formats, so you shouldn't have to build the library from sources unless you want to modify it.
Adding the Library to Your Project
- Include the "include/asm-xml.h" file in your source file.
- Link your project with the AsmXml object file.
Here are some tips for using it with various configurations:
- MSVC 6: Add the obj\ms-coff\asm-xml.obj file to your project.
- MinGW: Link your project with obj\ms-coff\asm-xml.obj.
- Linux: Link your project with obj/elf/asm-xml.o.
- Mac OS X: Link your project with obj/mach-o/asm-xml.o.
Defining a Schema
Like DOM, the parser creates an in-memory tree structure from the source document. Attributes and text-only elements can be accessed directly: you don't need a lookup to find the value of an attribute or text element.
To decode attributes and elements, the parser needs a description of the structure of the document. AsmXml does not yet support DTD, XSD or Relax NG. Instead, it uses a simple XML schema file.
The object that defines the possible attributes and child elements of a given element is referred to as a class.
Example:
<schema>
  <document name="employees">
    <collection name="employee">
      <attribute name="id"/>
      <attribute name="managerId"/>
      <text name="firstName"/>
      <text name="lastName"/>
    </collection>
  </document>
</schema>
This schema describes a document where the root element is <employees> and includes a list of <employee> elements.
Example of document matching this schema:
<?xml version="1.0" encoding="UTF-8"?>
<!-- List of employees of the company -->
<employees>
  <employee id="001">
    <firstName>Brian</firstName>
    <lastName>Williams</lastName>
  </employee>
  <employee managerId="001" id="123">
    <lastName>Smith</lastName>
    <firstName>John</firstName>
  </employee>
</employees>
A parsed document will give an indexed access to attributes and text elements. The index depends on the order of definition of properties of elements. For instance, in the previous example, the id attribute will be at position 0, managerId at 1, firstName at 2 and lastName at 3.
The parser does not care about the order in which text elements appear in the document, since each one is assigned to a particular slot determined by the class definition.
Child elements will be added in a linked list in their original order.
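The indexed access and the linked list of children described above can be sketched in C as follows. This is only an illustration: SketchElement and SketchAttribute are hypothetical stand-ins that copy the fields listed in the AXElement and AXAttribute tables of this document (with a fixed number of slots), and set_slot, slot_length, count_children and print_slot are helper names invented for the sketch. Real code would include "include/asm-xml.h" and use the structures it declares.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for AXAttribute and AXElement: only the
   documented fields, with a fixed number of slots for this sketch. */
typedef struct SketchAttribute {
    const char* begin;   /* beginning of the value */
    const char* limit;   /* last char + 1          */
} SketchAttribute;

typedef struct SketchElement {
    int id;
    struct SketchElement* nextSibling;
    struct SketchElement* firstChild;
    SketchAttribute attributes[4];
} SketchElement;

/* Slot indices follow the order of definition in the <employee> class. */
enum { EMP_ID = 0, EMP_MANAGER_ID = 1, EMP_FIRST_NAME = 2, EMP_LAST_NAME = 3 };

/* Point a slot at a string, standing in for what a parser would fill in. */
void set_slot(SketchElement* e, int slot, const char* value)
{
    e->attributes[slot].begin = value;
    e->attributes[slot].limit = value + strlen(value);
}

/* Length of a slot's value; an unset slot (assumed NULL here) has length 0. */
size_t slot_length(const SketchElement* e, int slot)
{
    const SketchAttribute* a = &e->attributes[slot];
    return a->begin ? (size_t)(a->limit - a->begin) : 0;
}

/* Count the children of a parent by following the linked list. */
int count_children(const SketchElement* parent)
{
    int n = 0;
    const SketchElement* child;
    for (child = parent->firstChild; child != NULL; child = child->nextSibling)
        n++;
    return n;
}

/* Print one slot of every child that has it; returns how many were printed. */
int print_slot(const SketchElement* parent, int slot)
{
    int printed = 0;
    const SketchElement* child;
    for (child = parent->firstChild; child != NULL; child = child->nextSibling) {
        size_t len = slot_length(child, slot);
        if (len > 0) {
            printf("%.*s\n", (int)len, child->attributes[slot].begin);
            printed++;
        }
    }
    return printed;
}
```

The point of the sketch is that no name lookup happens at read time: the slot number is fixed by the schema, so reading lastName is an array access at index 3 regardless of where <lastName> appeared inside each <employee>.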
The schema element
The root element is the <schema> element and it includes:
- Zero or more <collection> elements that can be reused. This element cannot contain <attribute> or <element> items.
- One <document> element. This element defines the root class, i.e. the definition of the root element. It supports the same attributes and elements as a <collection>, see below.
The collection element
A collection defines an element that can occur zero or more times in a parent element. These elements are accessible from the parent's linked list of child elements.
Attributes
Name | Comment |
---|---|
name | The name of the element. |
type | The type of the element. It defines what should be found between the tags, for example 'text'. |
id | An integer that uniquely identifies the class of the element among its siblings. This id is unnecessary when an element can contain only one kind of child, but it becomes mandatory if there are several types of children. |
Elements
Name | Comment |
---|---|
attribute | An attribute. |
text | A text element. |
collection | A list of child elements. |
element | A single element. |
reference | A reference to a collection defined under the <schema> element. |
include | Include the content of a group defined under the <schema> element. |
The element element
The element is similar to a collection, except that it can occur at most once in the parent element and you have direct access to its value instead of enumerating the list of children.
In fact, a cell of the attributes[] array holds an AXElement* instead of an AXAttribute.
The attribute element
Adds an attribute to the class.
Attributes are accessed directly from an array. Their index corresponds to their order of definition, starting at 0, or at 1 if the element is of type 'text'.
Attributes
Name | Comment |
---|---|
name | The name of the attribute. |
ignore | Forces the parser to ignore this attribute. The attribute is simply skipped, which improves performance and saves memory. |
The text element
Adds a child element that has no attributes and contains only text.
Text elements are accessed directly from an array. Their index corresponds to their order of definition (including attributes), starting at 0.
Attributes
Name | Comment |
---|---|
name | The name of the element. |
ignore | Forces the parser to ignore this text element. The element is simply skipped, which improves performance and saves memory. |
The reference element
Includes a collection that is defined under the <schema> element.
The reference can appear before or after the target definition.
Attributes
Name | Comment |
---|---|
name | The name of the collection defined under the <schema> element. |
The include element
Includes the content of a group that is defined under the <schema> element.
The target must be defined before the include.
Attributes
Name | Comment |
---|---|
name | The name of the group defined under the <schema> element. |
The group element
A group is just a container to be included in a collection or an element. It supports the same child elements as a collection and is identified by a name.
Exploring the Document
If it succeeds, the parse function returns a pointer to an AXElement object. The AXElement and AXAttribute structures are all you need to read the parsed document.
AXElement
Name | Type | Comment |
---|---|---|
id | int | The id (the type) of the element |
nextSibling | AXElement* | The next sibling element |
firstChild | AXElement* | The first child element |
attributes | AXAttribute[] | The array of attributes and text elements |
The first attribute corresponds to the first <attribute>, <text> or <element> declared in the class definition, the second attribute corresponds to the next <attribute>, <text> or <element>, and so on.
The id is an integer that uniquely identifies the element. It is defined in a <collection> element. The id is required when you need to discriminate between one element type and another.
Example:
<schema>
  <document name="body">
    <collection name="b" id="1" type="text"/>
    <collection name="i" id="2" type="text"/>
  </document>
</schema>
The id '0' is reserved for text elements appearing in mixed content.
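As a rough illustration of how the id can be used, the sketch below walks the children of a <body> element and dispatches on the id values from the schema above (1 for <b>, 2 for <i>, and the reserved 0 for text in mixed content). SketchElement is a hypothetical stand-in that copies only the id, nextSibling and firstChild fields from the AXElement table; kind_of and count_with_id are names invented for this example, not library functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for AXElement, limited to the fields used here. */
typedef struct SketchElement {
    int id;
    struct SketchElement* nextSibling;
    struct SketchElement* firstChild;
} SketchElement;

/* Ids taken from the schema above; 0 is the reserved text id. */
enum { ID_TEXT = 0, ID_BOLD = 1, ID_ITALIC = 2 };

/* Map a child's id to a readable name. */
const char* kind_of(const SketchElement* e)
{
    switch (e->id) {
    case ID_TEXT:   return "text";
    case ID_BOLD:   return "b";
    case ID_ITALIC: return "i";
    default:        return "unknown";
    }
}

/* Count the direct children of parent that carry the given id. */
int count_with_id(const SketchElement* parent, int id)
{
    int n = 0;
    const SketchElement* child;
    for (child = parent->firstChild; child != NULL; child = child->nextSibling) {
        if (child->id == id)
            n++;
    }
    return n;
}
```

Without distinct ids, every child in the linked list would look the same to the caller, which is why the id becomes mandatory once a parent can hold several kinds of children.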
AXAttribute
Name | Type | Comment |
---|---|---|
begin | char* | Beginning of the value |
limit | char* | Last char + 1 |
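Because a value is described by a begin/limit pair, its length is limit minus begin, and it seems safest not to assume the value is NUL-terminated. The helpers below are a minimal sketch under that assumption. SketchAttribute is a hypothetical stand-in with the two documented fields, attr_length, attr_copy and attr_equals are names invented here, and treating a NULL begin as an empty value is also an assumption rather than documented behavior.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for AXAttribute with the two documented fields. */
typedef struct SketchAttribute {
    const char* begin;   /* beginning of the value */
    const char* limit;   /* last char + 1          */
} SketchAttribute;

/* Number of characters in the value (0 if begin is NULL). */
size_t attr_length(const SketchAttribute* a)
{
    return a->begin ? (size_t)(a->limit - a->begin) : 0;
}

/* Copy the value into buf as a NUL-terminated string, truncating if
   needed. Returns the number of characters copied (without the NUL). */
size_t attr_copy(const SketchAttribute* a, char* buf, size_t size)
{
    size_t n = attr_length(a);
    if (size == 0)
        return 0;
    if (n > size - 1)
        n = size - 1;
    if (n > 0)
        memcpy(buf, a->begin, n);
    buf[n] = '\0';
    return n;
}

/* Compare the value with a NUL-terminated string without copying it. */
int attr_equals(const SketchAttribute* a, const char* s)
{
    size_t n = strlen(s);
    return attr_length(a) == n && (n == 0 || memcmp(a->begin, s, n) == 0);
}
```

For printing, the same length can be passed to printf with the "%.*s" format, which avoids the copy entirely.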
Functions
ax_initialize
Initializes the library.
void ax_initialize(malloc, free)
Arguments
- malloc: The memory allocation function.
- free: The memory release function.
Description
This function initializes the library. It must be the first library function called.
ax_initializeParser
Initializes the parse context.
int ax_initializeParser(AXParseContext* context, uint chunkSize)
Arguments
- context: The parse context.
- chunkSize: The default chunk size.
Return Value
The error code, 0 if ok.
ax_releaseParser
Releases the parse context.
void ax_releaseParser(AXParseContext* context)
Arguments
- context: The parse context.
Description
This function will release all memory resources allocated by this context, i.e. all documents parsed with this context will be deleted.
ax_parse
Parses and decodes an XML string.
AXElement* ax_parse(AXParseContext* context, const char* source, AXElementClass* type, int strict)
Arguments
- context: The parse context.
- source: The source of the document to parse.
- type: The type of the document to parse.
- strict: If this value is zero, attributes and elements not defined in the schema will be ignored without error. Otherwise, the function will stop parsing at the first unknown element or attribute.
Description
Returns the root element, or NULL if parsing fails. In case of failure, check the context->errorCode value.
ax_initializeClassParser
Initializes the class parser.
int ax_initializeClassParser(AXClassContext* context)
Arguments
- context: The class context.
Return Value
Returns the error code, 0 if ok.
ax_releaseClassParser
Releases the class parser.
void ax_releaseClassParser(AXClassContext* context)
Arguments
- context: The class context.
Description
Releases all resources allocated by this context. All classes created with this context will be deleted.
ax_classFromElement
Creates a class from an element.
AXElementClass* ax_classFromElement(AXElement* element, AXClassContext* context)
Arguments
- element: The class definition.
- context: The class context.
Return Value
Returns the created class, or NULL if an error occurred.
ax_classFromString
Creates a class from a string.
AXElementClass* ax_classFromString(const char* string, AXClassContext* context)
Arguments
- string: The source of the class definition.
- context: The class context.
Return Value
Returns the created class, or NULL if an error occurred.
Error Codes
When an error occurs, check the errorCode field of the AXClassContext or AXParseContext for more information on the type of error.
Name | Comment |
---|---|
RC_OK | Everything is ok |
RC_MEMORY | Out of memory |
RC_EMPTY_NAME | name empty or not defined |
RC_ATTR_DEFINED | Attribute already defined |
RC_INVALID_ENTITY_REFERENCE | Must be amp, quot, lt, gt, or apos |
RC_UNEXPECTED_END | Found last char too early |
RC_INVALID_CHAR | Wrong char |
RC_OVERFLOW | Number too big in char reference |
RC_NO_START_TAG | XML does not start with a tag |
RC_TAG_MISMATCH | Invalid close tag |
RC_INVALID_TAG | Invalid root element |
RC_INVALID_ATTRIBUTE | Attribute not defined in schema |
RC_INVALID_PI | Invalid processing instruction (<?xml...) |
RC_INVALID_DOCTYPE | Duplicate doctype or doctype after main element |
RC_VERSION_EXPECTED | 'version' is missing in xml declaration |