Friday, November 14, 2008

Gentlemen, we can compile him. We have the technology.

Today, esr posted his thoughts on the Linux Hater's Blog which, among other things, led to a discussion about binary distribution vis-a-vis source distribution. I will now share my thoughts on the matter.

Gentoo Linux was my second Linux distro (my first was Mandrake). I know source-based distribution can have many benefits especially when the distribution is designed to cater to it as Gentoo is. Before the Ricers descend on us, let me say that I know optimization is NOT one of the benefits; you can -funroll-all-sanity all you want, but it is likely to do nothing, crash the app, or make it even slower.  USE flags, when they work properly, and the options are actually supported, can be a great way to customize a system to suit your particular needs. 

When I was new, I thought this kind of customization was awesome because it let me only use what I wanted to use. Since I used plain Fluxbox (what fun would Gnome or KDE have been with Gentoo; plus it compiled quicker), I would disable any support for KDE and Gnome, since I did not use them. If I left that support enabled, it would drag in a whole bunch of libraries, and it would take 10-15 hours to compile.  I do not want to think about the amount of time I spent messing with various USE variables to reduce the number of 'unnecessary' dependencies for an app I wanted to install. Of course, I would have to remember to add the USE variable modification to /etc/portage/package.use for that particular application, or it might screw up the next time I updated the system.

Ahh updating! I remember that well. Gentoo was a bitch to update! It always appeared to be simple: "emerge --sync; emerge --update --deep --newuse world". However, it took many hours, and the computer was rendered unusable for most of the time. Since it was such a pain, I would go months without updating the system; then I would have to go through hell because the developers changed a whole lot and I had to do the emerge world dance two or three times! Of course this was the best case scenario. If something failed to compile . . . 

Apart from the pain of installing and updating the system, I remember the rest of the time being a breeze. At its peak, around 2004-2005, Gentoo gave me the best user experience with *nix I ever had. Most applications just worked. I had very good things to say about its 32-bit chroot. I could compile lots of programs, and my desktop was still responsive. Multimedia support was second to none! Xine + MPlayer + XMMS could handle anything you threw at them. Gentoo's versions made it easy to add support for MP3, DVD, WMV, etc. I still have not found a better or more versatile pair of media players than Gentoo's Xine (for DVDs) and MPlayer (for everything else). MPlayer could play practically any file you threw at it no matter how corrupted it was (sure, sometimes there would be no sound, some sound, skipping, linear viewing only, random freezes, but it WOULD play). No matter how much experience I have with freetardism, I will never understand why the distros seem to be dumping these two great players for the utterly brain-damaged GStreamer (which is another rant for another time).

The packages seemed to work so well that, now that I think about it, I am not sure if I completely agree with Linux Hater that distro-maintained packages (or ebuilds in Gentoo's case) are NECESSARILY a bad thing. Maybe the package maintainers have more influence over the quality of the application than most people realize.  Maybe whatever Debian/RedHat and their followers do to produce binaries is the really fucked up thing. Of course, this site lists certain, uh, problems people have had with Gentoo, so maybe LHB's point still stands.

However, not all packages worked well. If I ever used ANY masked packages, I was preparing myself for a world of hurt.  Since masked packages usually featured a lot of masked dependencies, I would usually have to install a bunch of unstable applications just to run the one I wanted. Often, I managed to install all or most of the dependencies, but the app I wanted or one of its last dependencies would not install or run properly, so I was then stuck with a bunch of unstable libraries with no obvious (to me) way to go back. This was actually a general problem with Gentoo. By emerging only my essentials, I thought I was getting a lean, mean optimized machine, but as I added new applications, I would have to add a bunch of new dependencies. When I unmerged an app, its dependencies would still be there, which soon made my Gentoo system just as full of cruft as any other distro. Sure I could "emerge --depclean; revdep-rebuild", but I would first have to update the entire system (which I hated doing because it took forever), and sometimes the orphaned dependencies would trouble the install. Now, I know that some of this mess was my fault for installing unstable but 'shiny' applications. I wonder how many of Linux's annoyances are caused by the lusers themselves, who scream for the latest ub3rc001 but highly unstable application. If most of Linux's usability issues arise from the demands of the users themselves, then FLOSS has a major systemic problem with mass-market adoption.
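To make the cruft problem concrete, here is a toy sketch in Python of what "orphaned dependency" means. It is NOT how Portage implements --depclean; the package names and the DEPENDS table are made up for illustration. The idea is just that anything not reachable from the set of packages you explicitly asked for is an orphan.

```python
# Toy model of orphaned dependencies. All package names are hypothetical,
# and this is not Portage's actual algorithm.
DEPENDS = {
    "shiny-app": {"libfoo", "libbar"},
    "fluxbox": {"libx"},
    "libfoo": {"libbaz"},
    "libbar": set(),
    "libbaz": set(),
    "libx": set(),
}


def required(world, depends):
    """Return every package reachable from the explicitly requested set."""
    seen = set()
    stack = list(world)
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        stack.extend(depends.get(pkg, ()))
    return seen


def orphans(installed, world, depends):
    """Installed packages that nothing in the requested set still needs."""
    return installed - required(world, depends)


# shiny-app has been unmerged, but its libraries are still on disk.
installed = set(DEPENDS) - {"shiny-app"}
world = {"fluxbox"}
print(sorted(orphans(installed, world, DEPENDS)))
# prints ['libbar', 'libbaz', 'libfoo']
```

Three libraries stick around for one removed app, and that is with a dependency tree only two levels deep.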

Now that I have reminisced enough, let's get back to my original point regarding binary distribution. Let's take the (admittedly anecdotal) information above and apply it to source-based distribution as a whole. This model will treat all maintainers of upstream projects as a single source-based distribution. This model will treat the package maintainers of binary distributions as the users of the source-based distribution.  Now, first off, we can see that the package maintainer's task is a bit harder since he does not have automatic dependency resolution. Sure, the project documentation usually lists its dependencies, and most heavily-used libraries are already packaged in most distributions, but it still does not beat good ol' emerge app.  Now, the packager is in the same boat as the Gentoo user. He knows the specific needs of the distro better than upstream knows them, but upstream knows the software better than the packager knows it. Like Gentoo users and their USE variables, the packager can add patches to the code and configure it with various options to better integrate it into the distro, but sometimes the modifications will break certain assumptions upstream has made, and all hell will break loose.

Which one should we trust: upstream/source_distro or the packager/source_distro_user? I think we should trust upstream more, since they know the code better and can better avoid doing stupid things. Now, the best solution is when the maintainer and the distro packager are the same people, since they can then develop their app with integration in mind. This is probably why FreeBSD, despite orders of magnitude less funding, always felt more coherent and polished than any Linux distribution.

Of course esr does list some relevant problems with binary distribution:
"I actually used to build my own RPMs for distribution; I moved away from that because even within that one package format there's enough variation in where various system directories are placed to be a problem. Possibly LSB will solve this some year, but it hasn't yet."
First, why were you building RPMs? What advantages do RPMs have over plain old TGZs except signature support and various metadata that could be included in the filename? By only building RPMs, you were excluding all the other non-RPM distros for no appreciable gain.

Second, wasn't the Linux Filesystem Hierarchy Standard supposed to solve this by standardizing the system directories? Wasn't it released 15 years ago? If that didn't work, then the Linux Standard Base should definitely have fixed it, but it seems to have failed. If, after 15 years, you still cannot determine the location of a mail spooler or logfile, then OSS has a MAJOR problem!

Third, what exactly did you need in those system directories anyway? In your book, The Art of Unix Programming, you wrote about this:
"Often, you can avoid this sort of dependency by stepping back and reframing the problem. Why are you opening a file in the mail spool directory, anyway? If you're writing to it, wouldn't it be better to simply invoke the local mail transport agent to do it for you so the file-locking gets done right? If you're reading from it, might it be better to query it through a POP or IMAP server?"
If you were looking for applications, couldn't you just use /usr/bin/env? I am sure that is present on any Linux distro worth mentioning. If you were looking for libraries, then maybe you should statically link your program.
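For anyone who has not thought about what the env trick buys you: env searches the directories in PATH for the named program, so you never hard-code where a distro chose to put it. A rough Python equivalent of that lookup, as a sketch only (find_program is a name I made up; shutil.which is the standard library call doing the work):

```python
import shutil


def find_program(name, path=None):
    """Search PATH (or the given search path) for an executable.

    Returns the full path to the program, or None if it is not found.
    This mirrors the lookup env performs; it does not run the program.
    """
    return shutil.which(name, path=path)


# Works the same whether the distro put the program in /bin, /usr/bin,
# or /usr/local/bin, as long as that directory is in PATH.
print(find_program("sh"))
```

The point is that the program's location becomes the system's problem, not the packager's.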

I know there are some downsides to distributing statically linked binaries. They take up more RAM and hard drive space, but RAM and hard drive space are both really cheap nowadays. Even low end notebooks feature 2-3 GB of RAM and 300-500GB hard drives; the typical user has RAM, swap and disk space to burn. The other problem with statically linked binaries is that it is a bigger problem to update a library with a serious bug or security hole. However, the major proprietary software applications for Linux also have this problem, and they seem to have done okay. If the software developer is halfway competent, he will be tracking the development lists of all the libraries his app depends on, and he can then issue an update as soon as a patch for the affected library is released. Plus, open source has the advantage that, if the developer is being lazy or whatever, anyone who cares can (theoretically) download the source code for the app and its dependencies and produce a fixed binary. However, all of these downsides melt away after the feeling of navigating to a project's home page, downloading the Linux binary, installing it and running it just like in Windows and OS X!!!

In short, binary distribution has many advantages over source-based distribution, and Linux crusaders would do well not to dismiss them.
