[jdev] Algorithms and XMPP

Thu Mar 4 04:14:17 CST 2010

Sorry about the late reply; I've been quite distracted by other
things this week.

On Sun, Feb 21, 2010 at 8:18 PM, Sebastiaan Deckers <cbas at pandion.im> wrote:
>
> Problems with reference implementations:

I should note that while I think reference implementations are a good
idea, I wasn't suggesting an official reference implementation (not
that I wouldn't be happy to have one).

And we seem to mean entirely different things by the phrase 'reference
implementation'. I think you misunderstood what I proposed. I'll just
point to the Wikipedia article:
http://en.wikipedia.org/wiki/Reference_implementation_(computing)

Your points about official reference implementations are incorrect, however:

> - Programming language dependent (eg. does a Python reference
> implementation help an Erlang developer?)

Yes it does. I'll explain below.

> - Platform dependent

It should be reasonably platform independent. It would be useful even
if it weren't, but more useful when it is.

> - Not subject to same design goals as other implementations

That's a plus point. This isn't One Implementation To Rule Them All.
This is an implementation with the main goals being correctness and
readability above all else. It serves as an example to other
implementations.

> - Impossible to create one software which implements every XEP.

I didn't suggest that. I'm focusing specifically on isolated
algorithms which people tend to get wrong.

> Compatibility issues between various "references."

Avoiding compatibility issues is sort of the point. Having examples to
go with any complicated abstract description always helps. The spec is
the abstract description; the reference implementation is an example
of how to go about implementing it.

> - Huge resource sink (time spent on an implementation that may not be
> used by many)

The point of a reference implementation is to be used as a reference
(that is, to be used as an example which others can follow). It can
certainly be reused, but that's not the main goal. We aren't
implementing all XEPs, only isolated algorithms, which are not much of
a time sink.

> - Will still have bugs which may then become de-facto standard

That's where a reference implementation helps. It highlights bugs in
the standards. And standards do have bugs, as seen in the problems in
the caps hash algorithm. Having an implementation is likely to bring
spec bugs into focus.

> - (Perceived) reduction in openness of XSF and XEP process

I don't get this. How? I repeat: This isn't One Implementation To Rule
Them All. It's one which serves as an example. If the phrase
'reference implementation' seems too high and mighty, feel free to
call it 'sample implementation' or 'example implementation'.

> - Political fighting over which is the "official" implementation
>

What's the incentive for a fight? The reference implementation serves
as an extension and example of the specification. I haven't really
seen anyone fighting over the rights to author a specification in the
XSF, much less an example in it.

> The only meaningful references are open standards and protocol/data
> specs. I agree that there are many compatibility problems, because
> specs are not easy to understand, but that's a fact of life in such a
> heterogeneous community as XSF.
>

The reference implementation is supplementary to the specification.
Reference implementations are frequently included as part of
specifications (again, this wasn't what I was suggesting). See most
RFCs which define isolated algorithms. I'll refer you to the SHA-1 and
MD5 RFCs:

http://tools.ietf.org/html/rfc3174#section-7
http://tools.ietf.org/html/rfc1321

Think of it as pseudo-code, which just happens to be executable.

> IMO the most effective answer to these problems is testing. Create a
> list of challenge/response cases for servers or clients, validate
> logged XMPP data in all XEP namespaces, write functional tests for
> XMPP libraries, and so on. The topic of protocol test suites has come
> up often but I don't know of any real progress.
>

I'm +1 to test cases and a functional test suite. A reference
implementation is orthogonal to this. I'll explain below.

> Sebastiaan
>

What I expect of an official reference implementation:

 - correct
 - readable (clean code, with comments sprinkled where they make sense)
 - runnable (preferably without too much effort)
 - reasonably self-contained (too many dependencies make reuse and
   porting difficult)
 - reasonably simple to port (which readable code usually is)

Why I would want to have a reference/sample/example implementation,
official or otherwise:

Having an example to follow saves time, even when it isn't in the
language I'm working with. I hope that's evident without a detailed
explanation.

Having a set of test cases (list of challenge/response cases) is
certainly a good thing. But a working implementation can be just that,
and more. It can be used to generate those test cases (generating them
by hand is error-prone). If my implementation is failing, I can
compare my implementation's state at each step with the reference
implementation's, which would help track down issues. Given only test
cases, if my output isn't correct, it just tells me one thing: I did
it wrong, and nothing more. With a reference implementation I can know
at exactly which step I deviated from the standard. For complex
multi-step algorithms, this can be a huge time saver.
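
To make the step-by-step comparison concrete, here is a minimal sketch
of what such an executable example could look like for the caps hash.
It follows my reading of the verification string steps in XEP-0115,
simplified to identities and features only (data forms are left out),
and it assumes SHA-1 and byte-wise sorting of UTF-8 strings. The
function name caps_ver_trace and the trace format are made up for
illustration; this is not reviewed reference code.

```python
import base64
import hashlib


def caps_ver_trace(identities, features):
    """Return (ver, steps) for a simplified caps hash computation.

    identities: iterable of (category, type, lang, name) string tuples
    features:   iterable of feature namespace strings
    steps:      list of (label, value) pairs, one per intermediate state

    Assumptions: SHA-1, no data forms, byte-wise ordering of UTF-8.
    """
    steps = []

    # Sort identities by category, then type, then lang (name breaks ties).
    ordered_ids = sorted(
        identities, key=lambda ident: tuple(p.encode("utf-8") for p in ident)
    )
    s = "".join("/".join(ident) + "<" for ident in ordered_ids)
    steps.append(("after identities", s))

    # Sort features and append each one followed by '<'.
    ordered_features = sorted(features, key=lambda f: f.encode("utf-8"))
    s += "".join(f + "<" for f in ordered_features)
    steps.append(("after features", s))

    # Hash the UTF-8 bytes and Base64-encode the raw (binary) digest.
    digest = hashlib.sha1(s.encode("utf-8")).digest()
    steps.append(("sha1 hex", digest.hex()))
    ver = base64.b64encode(digest).decode("ascii")
    steps.append(("ver", ver))
    return ver, steps


if __name__ == "__main__":
    # Illustrative input, deliberately given out of order.
    ver, steps = caps_ver_trace(
        [("client", "pc", "", "Exodus 0.9.1")],
        [
            "http://jabber.org/protocol/muc",
            "http://jabber.org/protocol/disco#info",
            "http://jabber.org/protocol/caps",
            "http://jabber.org/protocol/disco#items",
        ],
    )
    for label, value in steps:
        print(f"{label}: {value}")
```

Another implementation can dump the same intermediate values and diff
them against this trace to find the first step at which it deviates.
The same function can also be run over many inputs to generate test
cases instead of writing them by hand.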

In addition, an example to follow can help prevent non-obvious
mistakes. I'll present a real example: authzids in DIGEST-MD5. You are
not supposed to send them when they are the same as the ID you are
authenticating with. I have filed bug reports for three stable clients
which do send them regardless. Those clients weren't even consistent
in what they sent. Just in the past 30 days, two people have asked for
help on the Jabber mailing lists with incorrect authzid usage in
DIGEST-MD5 implementations. I'd argue that having a piece of reference
code which you can see actively checking for this would prevent such
an issue. Another example: there is at least one server and one client
implementation which get the caps hash algorithm wrong in their latest
stable releases. Another example: some samples (effectively test
cases) in the JID escaping XEP were incorrect. Hand-generated test
cases are error-prone.
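
For the authzid case, here is a minimal sketch of what such a check
could look like in a DIGEST-MD5 client. The response calculation
follows my reading of RFC 2831 for qop=auth. The function names
(effective_authzid, digest_md5_fields) and the authcid_jid parameter
are hypothetical, the sketch assumes charset=utf-8 and
already-normalized bare JIDs, and it has not been tested against a
real server.

```python
import hashlib


def _b(text):
    """Encode text as UTF-8 (assumes charset=utf-8 is in use)."""
    return text.encode("utf-8")


def _hex_md5(data):
    """Lowercase hex MD5 digest of data, as bytes."""
    return hashlib.md5(data).hexdigest().encode("ascii")


def effective_authzid(authcid_jid, authzid):
    """Return the authzid to send, or None if it must be omitted.

    Assumption: both values are already-normalized bare JIDs, so a
    plain string comparison is good enough for this sketch.
    """
    if not authzid or authzid == authcid_jid:
        return None
    return authzid


def digest_md5_fields(username, realm, password, nonce, cnonce, digest_uri,
                      authcid_jid, authzid=None, nc="00000001", qop="auth"):
    """Return the directive/value pairs for the client response."""
    # The check this example exists for: drop a redundant authzid.
    authzid = effective_authzid(authcid_jid, authzid)

    # A1 = H(username:realm:password) ":" nonce ":" cnonce [":" authzid]
    a1 = hashlib.md5(_b(f"{username}:{realm}:{password}")).digest()
    a1 += _b(f":{nonce}:{cnonce}")
    if authzid is not None:
        a1 += _b(f":{authzid}")

    # A2 for qop=auth
    a2 = _b(f"AUTHENTICATE:{digest_uri}")

    # response = HEX(H(HEX(H(A1)) ":" nonce ":" nc ":" cnonce ":" qop ":" HEX(H(A2))))
    response = _hex_md5(
        _hex_md5(a1) + _b(f":{nonce}:{nc}:{cnonce}:{qop}:") + _hex_md5(a2)
    ).decode("ascii")

    fields = {
        "username": username, "realm": realm, "nonce": nonce,
        "cnonce": cnonce, "nc": nc, "qop": qop,
        "digest-uri": digest_uri, "response": response, "charset": "utf-8",
    }
    if authzid is not None:
        fields["authzid"] = authzid
    return fields


if __name__ == "__main__":
    # Made-up values; the authzid equals the authenticating JID here,
    # so it is left out of both A1 and the fields that get sent.
    fields = digest_md5_fields(
        "juliet", "example.com", "secret", "abc", "def",
        "xmpp/example.com", authcid_jid="juliet@example.com",
        authzid="juliet@example.com",
    )
    print(sorted(fields))
```

Note that the authzid affects both the fields sent and the A1 input to
the hash, so a client has to make the decision once and apply it in
both places.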

You can freely blame the developers or spec authors for this, but that
isn't in any way a helpful or useful response. My point is that an
implementation being used as a reference would help in all these
cases.

For some reason I tend to write a lot more than usual when presenting
an argument, sorry :)

--
Waqas Hussain