<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Dec 5, 2013 at 12:54 AM, Dave Cridland <span dir="ltr"><<a href="mailto:dave@cridland.net" target="_blank">dave@cridland.net</a>></span> wrote:<br> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="im">On Thu, Dec 5, 2013 at 6:15 AM, mat henshall <span dir="ltr"><<a href="mailto:mat@squareconnect.com" target="_blank">mat@squareconnect.com</a>></span> wrote:<br> </div><div class="gmail_extra"><div class="gmail_quote"><div class="im"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div><div>I do agree on the need for using optimized parsers (we have an xml parser that fits in a few hundred bytes of code and read only memory) that is extremely fast and efficient on small devices. BTW the trick to avoid the tag comparison is to convert the various tags and namespaces to hash values as they come in during parsing and then only compare on the hash value. We have a small utility that calculates the hash values and writes as a header file as part of the compile time all the possible namespaces and tag names that we are interested in. Extremely efficient in terms of both computation and memory space. Careful choice of the hash algorithm means very little chance of collision.</div> <div><br></div></div></div></blockquote><div><br></div></div><div>That's a neat idea - I was intending to add in attribute hashing in order to handle the well-formedness constraints (and also optimize attribute searches in general), but I'd not thought about compile-time precalculation of hashes, that's terribly clever, and an idea I'll shamelessly steal. FWIW, I was also going to build a simplistic single-hash Bloom on parse to elide some searches, too.</div> <div><br></div><div>Are you using any particular hash algorithm, or are you generating a perfect hash at buildtime too?</div></div></div></div></blockquote><div><br></div><div>Can't get much simpler….</div><div><br></div> <div>The utility for compile time simply takes a list of name/value pairs and calculates the hash on each string and then outputs as an enum structure.</div><div><br></div><div>The utility checks for any collisions… never had one yet.</div> <div><br></div><div>At run time, the element names, attribute names and namespaces are all computed as each character is being parsed… In a constrained embedded environment, the saving of constant memory can be significant, as well as the need for much smaller buffers while parsing/processing.</div> <div><br></div><div>The use of a 32 bit hash value allows me to use 'switch' statements for the decision of what to do for a particular namespace etc. In some 32 bit environments using a switch is extremely efficient and results in much smaller code than the alternatives as well (e.g. table driven).</div> <div><br></div><div><br></div><div><p style="margin:0px;font-family:Helvetica"><span style="color:rgb(147,26,104)"><b>#define</b></span> MULTIPLIER 37</p> <p style="margin:0px;font-family:Helvetica;min-height:7px"><br></p> <p style="margin:0px;font-family:Helvetica"><span style="color:rgb(0,97,65)">uint32_t</span> <b>sq_addToHash</b>( <span style="color:rgb(0,97,65)">uint32_t</span> hash, <span style="color:rgb(147,26,104)"><b>unsigned</b></span> <span style="color:rgb(147,26,104)"><b>char</b></span> c);</p> <p style="margin:0px;font-family:Helvetica"><br></p> <p style="margin:0px;font-family:Helvetica"><span style="color:rgb(0,97,65)">uint32_t</span> <b>sq_addToHash</b>( <span style="color:rgb(0,97,65)">uint32_t</span> hash, <span style="color:rgb(147,26,104)"><b>unsigned</b></span> <span style="color:rgb(147,26,104)"><b>char</b></span> c)</p> <p style="margin:0px;font-family:Helvetica">{</p> <p style="margin:0px;font-family:Helvetica"><span class="" style="white-space:pre"> </span><span style="color:rgb(147,26,104)"><b>return</b></span> MULTIPLIER *hash + c;</p> <p style="margin:0px;font-family:Helvetica">}</p> <p style="margin:0px;font-family:Helvetica;min-height:7px"><br></p><p style="margin:0px;font-family:Helvetica;color:rgb(147,26,104)"><span style="font-family:arial;color:rgb(34,34,34)"><br></span></p><p style="margin:0px;font-family:Helvetica;color:rgb(147,26,104)"> <span style="font-family:arial;color:rgb(34,34,34)">If you are interested, I'll check with my business partners and see if we can release the XML parser and hash utility as open source. </span></p><p style="margin:0px;font-family:Helvetica;color:rgb(147,26,104)"> <span style="font-family:arial;color:rgb(34,34,34)"><br></span></p><p style="margin:0px;font-family:Helvetica;color:rgb(147,26,104)"><span style="font-family:arial;color:rgb(34,34,34)"><br></span></p><p style="margin:0px;font-family:Helvetica;color:rgb(147,26,104)"> <span style="font-family:arial;color:rgb(34,34,34)">-</span><span style="font-family:arial;color:rgb(34,34,34)">- </span><br></p></div></div><br>Mat Henshall<br>Founder and CEO, Square Connect, Inc.<br>San Jose, CA<br><a href="http://www.squareconnect.com" target="_blank">www.squareconnect.com</a><br> cell: 650.814.7585 </div></div>