[jdev] A rapidxml fork for XMPP

mat henshall mat at squareconnect.com
Thu Dec 5 18:39:06 UTC 2013


On Thu, Dec 5, 2013 at 12:54 AM, Dave Cridland <dave at cridland.net> wrote:

> On Thu, Dec 5, 2013 at 6:15 AM, mat henshall <mat at squareconnect.com>wrote:
>
>> I do agree on the need for using optimized parsers (we have an xml parser
>> that fits in a few hundred bytes of code and read only memory) that is
>> extremely fast and efficient on small devices. BTW the trick to avoid the
>> tag comparison is to convert the various tags and namespaces to hash values
>> as they come in during parsing and then only compare on the hash value. We
>> have a small utility that calculates the hash values and writes as a header
>> file as part of the compile time all the possible namespaces and tag names
>> that we are interested in. Extremely efficient in terms of both computation
>> and memory space. Careful choice of the hash algorithm means very little
>> chance of collision.
>>
>>
> That's a neat idea - I was intending to add in attribute hashing in order
> to handle the well-formedness constraints (and also optimize attribute
> searches in general), but I'd not thought about compile-time precalculation
> of hashes, that's terribly clever, and an idea I'll shamelessly steal.
> FWIW, I was also going to build a simplistic single-hash Bloom on parse to
> elide some searches, too.
>
> Are you using any particular hash algorithm, or are you generating a
> perfect hash at buildtime too?
>

Can't get much simpler….

The utility for compile time simply takes a list of name/value pairs and
calculates the hash on each string and then outputs as an enum structure.

The utility checks for any collisions… never had one yet.

At run time, the element names, attribute names and namespaces are all
computed as each character is being parsed… In a constrained embedded
environment, the saving of constant memory can be significant, as well as
the need for much smaller buffers while parsing/processing.

The use of a 32 bit hash value allows me to use 'switch' statements for the
decision of what to do for a particular namespace etc. In some 32 bit
environments using a switch is extremely efficient and results in much
smaller code than the alternatives as well (e.g. table driven).


*#define* MULTIPLIER 37


uint32_t *sq_addToHash*( uint32_t hash, *unsigned* *char* c);


uint32_t *sq_addToHash*( uint32_t hash, *unsigned* *char* c)

{

*return* MULTIPLIER *hash + c;

}



If you are interested, I'll check with my business partners and see if we
can release the XML parser and hash utility as open source.



-- 

Mat Henshall
Founder and CEO, Square Connect, Inc.
San Jose, CA
www.squareconnect.com
cell: 650.814.7585
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.jabber.org/jdev/attachments/20131205/4d72d789/attachment-0001.html>


More information about the JDev mailing list