Adieu!

Well, that was fun. I finally tracked down the bug in Adium that was causing problems with the Bonjour protocol. A little background, I guess. Bonjour (aka Rendezvous or Zeroconf) is a protocol that works mostly through multicast packets. The idea is, you plug into a network and have no idea who or what is around, but all the devices on the network send out announcements now and then, saying “Hey, I’m a printer” or “I’ve got iChat running, so you can talk to me” and such. It’s a really neat idea, and it makes setup for printers a snap: just show up on a network, search for network printers, and there they are. It’s also what allows multiple copies of iTunes on the same network to find each other and the shared music without the computers otherwise knowing about one another (or requiring a central server to say “here they are”). Ya know what? This is going to get long-winded and probably boring real quick. Better insert a story break before that happens…


The problem was, when Adium fired up its Bonjour system, it would send out its announcement that it was listening for connections... with the wrong port! If my laptop connected to the desktop, it would work fine; but if the desktop first tried to contact the laptop, it would fail. Of course it did: the desktop was trying to talk to the wrong port. But if the laptop initiated the connection, the desktop would send its replies back over the same connection the laptop had opened. So now I just needed to figure out where the problem was. First time through, I found what I thought to be the problem, but fixing that made things worse. Well, technically the protocol worked now: the port advertised matched the port where it was listening for connections. But they were both the wrong number! However, that led me to realize what the problem was. Somewhere, where the port number was being passed to whatever function would advertise the availability, it needed to be “massaged” a bit first, and it wasn’t. So I reverted the source code back to what it was before I patched things, and went hunting again. Found it not too long after that.

The reason for the problems was a missing call to the function htons(). See, computers store multi-byte numbers in different ways. Some are “little-endian”, meaning the least significant byte comes first, and others are “big-endian”, meaning the most significant byte comes first. Have a look at Wikipedia for a full discussion of endianness. The problem is that Internet protocols are all done in big-endian notation (“network byte order”), but not all computers are. So there are certain functions which can be called to make sure that the number you have is properly translated. On a big-endian computer, calling htons() returns the same value you fed it, because from Host to Network (the “hton” part; the “s” is for a short, 16-bit value) the byte order is already the same. But on a little-endian computer, the two bytes of the number get swapped. By doing things this way, the same program can be compiled and run on either machine and get the proper answer every time (without needing to know what kind of machine it’s being compiled for).
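To make the byte-order idea a bit more concrete, here is a minimal C sketch. It is my own illustration, not code from Adium. htons() and ntohs() are the standard calls from <arpa/inet.h>; the helper names host_is_little_endian and bytes_in_memory are made up for this example, and 0x14B2 is just a sample value.

```c
#include <arpa/inet.h>  /* htons(), ntohs() */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if this machine stores the least
 * significant byte of a number first (little-endian), 0 otherwise. */
static int host_is_little_endian(void) {
    uint16_t probe = 1;
    uint8_t first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Hypothetical helper: copies the two bytes of a 16-bit value into
 * out[0] and out[1], in the order they sit in memory. */
static void bytes_in_memory(uint16_t value, uint8_t out[2]) {
    memcpy(out, &value, 2);
}
```

Calling bytes_in_memory(htons(0x14B2), b) should leave b holding 0x14 then 0xB2 on either kind of machine, which is the order the network expects. Calling bytes_in_memory(0x14B2, b) without the htons() gives 0xB2 then 0x14 on a little-endian host, and ntohs(htons(x)) always hands back x.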

So, what happened? When the listening socket was opened, the port number was fed to htons() and the result went to the function that opens the socket. That’s fine; the socket opened on the correct port. But later on, when another function was called to put together the advertisement for the service (which includes the port number where you’re listening for connections), htons() wasn’t called. So on the little-endian Intel Mac, the two bytes of the port went out in the wrong order and a completely different port number was advertised. But on a big-endian PowerPC Mac, host order and network order are the same, so there’s no difference, and they never saw the bug. Cute, eh?
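The failure can be sketched in a few lines of C. This is a simulation under my own assumptions, not Adium’s actual code: put_port_on_wire is a hypothetical stand-in for an advertising call that expects the port already in network byte order, and the other function names are invented for the illustration too.

```c
#include <arpa/inet.h>  /* htons() */
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an advertising API that assumes the port is
 * already in network byte order and copies its bytes out untouched. */
static void put_port_on_wire(uint16_t port_network_order, uint8_t wire[2]) {
    memcpy(wire, &port_network_order, 2);
}

/* What any peer does: read the two wire bytes as a big-endian number. */
static uint16_t port_read_by_peer(const uint8_t wire[2]) {
    return (uint16_t)((wire[0] << 8) | wire[1]);
}

/* The bug: the host-order port is handed over without conversion. */
static uint16_t advertise_buggy(uint16_t listen_port) {
    uint8_t wire[2];
    put_port_on_wire(listen_port, wire);
    return port_read_by_peer(wire);
}

/* The fix: convert to network byte order before handing it over. */
static uint16_t advertise_fixed(uint16_t listen_port) {
    uint8_t wire[2];
    put_port_on_wire(htons(listen_port), wire);
    return port_read_by_peer(wire);
}
```

With a listen_port of 5298 (just an example value), advertise_fixed returns 5298 on any machine. advertise_buggy returns 45588 on a little-endian machine and 5298 on a big-endian one, which is the “works on PowerPC, breaks on Intel” pattern described above.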

Once I fed the port into htons() and had that result go to the function that generates the service record, suddenly everything started working properly. And this is why writing programs that run on multiple platforms can be a nightmare: it’ll work just fine for half the people, but not for the other half. Eh, fine with me, I enjoyed tracking this down. Got to learn a bit about Apple’s Xcode development tools and such, and realized that they’re everything I loved about DDD back in the day.

2 comments

  1. Hi Steve,

    I’ve been having Adium problems and stumbled across your blog. My wife and I use Adium to communicate between floors in our house. We started having problems with Messenger, so we switched to Bonjour. It worked great for a little bit and now is flaky. We’re both on Intel MacBooks. How does someone not as advanced as you implement your htons() fix?

    Thanks,

    Herb
