RedHat 6.1 considered harmful

Note: I'm going to be adding more background and detail shortly, so check back in a few days.

I had a lot of major problems during and after my upgrade from RedHat 5.2 to 6.1, some of them show-stoppers. This was the first time I'd actually performed an upgrade rather than just wiping my OS partitions and doing a fresh install. I'd be interested to know if the increased problems I'm seeing are due to doing an upgrade rather than an install from scratch, or if they're just about 6.1 being a flakier release than 5.2.

RedHat 5.2 was really solid. However, 6.1 had a number of features that made me want to upgrade:

TrueType support
Newer versions of some packages (including Ghostscript and mkisofs, which I'd had to build myself under RedHat 5.2 in order to get features I needed)
Improvements in the kernel
Compatibility with newer third-party RPMs

Unfortunately, I had so many problems with 6.1 that I wiped my OS partitions and went back to 5.2 (and will probably be shopping around for a different distribution). I've been a RedHat customer for a long time and spent quite a lot of money on RedHat software. This is the first time I've felt like that money was wasted.

RedHat customer support

(There will eventually be a description of RedHat's hideous and almost unusable Web-based trouble-tracking system here.)

Can't see installation CD in ATAPI CD-ROM drive (potential showstopper for some people)

(Since I've already accomplished the install, this is a matter of curiosity for me and a bug report, rather than something I personally need a fix for.)

When I booted from floppy (from the boot-RHEA-1999:044.img disk), the installer couldn't see my CD in the ATAPI CD-ROM. After I was asked where the installation media was and said `Local CDROM', the screen cleared to blue and there was a very long pause (several minutes) during which my IDE CD-ROM spun up and down a couple of times and its light came on. Then the installation program said `I could not find a RedHat CD in any of your CD-ROM drives'. (Yes, there was one in the drive. :-)

Although an ATAPI CD-ROM is pretty vanilla hardware, I guessed that maybe some prior hardware probe might have been causing it to misbehave somehow, so I tried an expert-mode installation. That did not give me an option to read from an ordinary ATAPI CD-ROM drive. However, fortunately I happen to have a SCSI CD-R, which expert mode was able to see. I put the RedHat CD in the SCSI CD-R, and the installer found it and I was able to do the upgrade.

By the way, the pause after `running /sbin/loader' is extremely long; it might be useful to change that message to `running /sbin/loader; please wait' or precede it with `It is normal for the following step to take a long time' or something like that.

Update: I've discovered that:

I can mount most CDs in my ATAPI CD-ROM drive, including the Linux Applications Library and PowerTools CDs that came with my deluxe boxed set as well as lots of other CDs,
I can mount the Official RedHat 6.1 CDs (disk 1 with installation RPMs and disk 2 with SRPMs), as well as any other CDs, in my SCSI CD-R, but
I cannot mount the Official RedHat 6.1 CDs (installation CD or SRPMs CD) in my ATAPI drive.

I'm guessing that there's something slightly off about the mastering of those disks that my SCSI drive is able to handle but my ATAPI drive can't.

(Incidentally, the StarOffice CD, right out of the shrinkwrap, was badly scratched. I was able to mount it; I haven't yet tried to install from it. Of course, that CD is produced by Sun rather than by RedHat.)

I see errors like the following in the output of dmesg:

  VFS: Disk change detected on device ide0(3,64)
  ATAPI device hdb:
    Error: Unit attention -- (Sense key=0x06)
    Not ready to ready transition, medium may have changed -- (asc=0x28, ascq=0x00)
    The failed "Test Unit Ready" packet command was:
    "00 00 00 00 00 00 00 00 00 00 00 00 "
  cdrom: open failed.

Original boot disk from box would not boot

When I found the updated boot disk couldn't find the CD in my ATAPI CD-ROM drive, I had tried booting from the original floppy included in my boxed set. That got partway through booting and displayed `boot failed...press a key to continue'. (That's when I tried the expert-mode installation from the updated boot disk.)

Can't access parallel port (SOLVED)

I am unable to print, because the RedHat 6.1 kernel can't see my (perfectly normal) parallel port out of the box. It was lp1 under RedHat 5.2. Running printtool and trying to add a printer tells me `Not detected' for lp0, lp1, and lp2, and catting something to /dev/lp1 says `no such device: /dev/lp1'. rmmodding and insmodding lp doesn't help. It worked fine under RedHat 5.2.

I also get kernel messages that say `lp: driver loaded but no devices found'.

Update: I found a bunch of similar bug reports in RedHat's Bugzilla database, and it turns out that the fix is to add the line

    alias parport_lowlevel parport_pc

to /etc/conf.modules. Oddly, though, my printer now shows up as lp0, whereas it was lp1 under RedHat 5.2.

amd problems, locking and otherwise

(I originally thought this was an NFS problem; it turned out to be an amd problem.)

Both my mail spool (/var/spool/mail) and my MH folders are served via NFS from a RedHat 5.2 box. Some of the binaries in the nmh package seem to have trouble with that. When I try to send mail, I get a message (in the background, after comp has exited) saying something like `send: unable to lock and open 825: No locks available, continuing...' The mail does appear to get sent, though. The anno command fails with messages like `anno: unable to lock and open 3350: No locks available, continuing...'. In both cases the message appears after a long pause, as though something is timing out.

I do delivery directly to my mailbox via procmail (on the mail hub, on which my mail folders are local), so I don't know if inc would have similar problems, but I suspect it would.

The nmh binaries in the 5.2 distribution worked fine in this configuration (as do nmh and MH binaries on many other Unix systems I've used).

I just noticed the following in the output of dmesg:

    nsm_mon_unmon: rpc failed, status=-13
    lockd: failed to monitor 209.192.165.50

(where 209.192.165.50 is the IP address of my mail hub and NFS server, running RedHat 5.2). This suggests that the problem is actually with lockd rather than with nmh.

Hmmm... The man page for statd(8) is slightly wrong (it refers to /usr/sbin/rpc.lockd and /usr/sbin/rpc.statd, but those binaries are in /sbin), and there's no man page for lockd(8).

Update (but not solution): I found a new bug report in Bugzilla saying that symlinks aren't properly made to /etc/rc.d/init.d/nfslock to start it at boot, so I ran it by hand and the error message I got from the nmh binaries changed to `anno: unable to lock and open 866: Permission denied, continuing...' (after a long pause). I then ran chkconfig nfslock on and rebooted, in case statd and lockd needed to be running at mount time or something, but that didn't make any difference.

RedHat's response

I submitted a tech support request for this problem, and got back the suggestion to `upgrade' the NFS server to 6.1 too, since a lot of things have changed in the NFS support between 5.2 and 6.1. Hmmm... Well, the upgrade I did was originally intended as a dry run for upgrading the server, but at this point there's no way I'm putting 6.1 on another machine. In any case, RedHat seems to claim that RedHat Linux is suitable for enterprise applications. If the RedHat 6.1 NFS client code can't interoperate with the RedHat 5.2 server code, how can I be confident that it's going to interoperate with Sun, NetApp, Auspex, or Compaq NFS servers?

I would be happy with `that's thus-and-such a kernel bug (in 5.2 or 6.1), and here's the new kernel RPM', or even with `that's thus-and-such a kernel bug, and here's where you can get the source to recompile your kernel.' But the notion that I shouldn't expect this functionality across version numbers of the same vendor's OS is pretty surprising. (Admittedly, I'm reading a bit into the response I got here.)

But wait! It gets worse!

A couple days later, my machine hung (the mouse wouldn't move, and I couldn't switch virtual consoles) and I had to press the reset button. The machine rebooted and fscked cleanly, but since that time (and across reboots of both the file server running 5.2 and the NFS client running 6.1), attempts to access automounted filesystems (via amd) simply hang, and are unkillable. This pretty well makes my machine unusable. I can't figure out what state is involved here and where it's kept.

Now, I probably should have seen if it was possible to log in over the network and reboot the machine cleanly, but the fsck was clean, and there's nothing relevant that I can see in /etc/mtab. I can't find an analogue of /etc/sm on either machine, and in any case the hang doesn't seem to be related to locking.

Problem isolated to amd

It turned out that the problem was with amd. Static NFS mounts worked fine. That's not nearly so bad as I had thought at first, because I could either just get by with static NFS mounts, or figure out how to use autofs for mounting home directories. However, having had so many problems with RedHat 6.1, and still having the X server crash and hang with great frequency, I gave up and reinstalled 5.2.

SMP kernel caused swapper to panic

For no particular reason other than curiosity, I had chosen to install an SMP-capable kernel as well as the regular kernel in the installer, and the installer made that the default stanza in the generated lilo.conf. The SMP kernel, however, caused a kernel panic very early in the boot process. The panic message said the panic was in `swapper'. I'm afraid I didn't transcribe any of the register dump.

I was, however, able to boot from the (non-default) uniprocessor kernel, and I just removed the SMP stanza from lilo.conf and reliloed.

The description (rpm -qip output) of the SMP kernel said that it should boot fine on uniprocessor boxes too. On my machine it didn't. (By the way, I've got a Cyrix processor.)

X server crashes extremely frequently - SHOWSTOPPER

I've got a Matrox Millennium, so I'm using the XF86_SVGA server. It crashes extremely frequently (much more so than under the previous version I was running - unfortunately, I don't remember which version it was, but it was the most recent version for which there were update RPMs for 5.2 available from RedHat). I've had it crash several dozen times in the four days since I upgraded.

There are no signs of thrashing before it crashes. The crash is typically associated with mouse movement, but I'm not sure that's not coincidence. And yes, it happens even when Netscape isn't running. :-)

It has happened in response to nothing more than mouse movement over the panel, when nothing but a single xterm and the base KDE stuff (kpanel, kwm, krootwm) was running. However, certain things seem to cause it to crash more frequently. Trying to configure display settings (either from kcontrol or from the krootwm menu) causes it to crash quite frequently; any change I make seems to have about a 25% chance of causing the crash. Also, I found a Web page that reproducibly caused the crash for me in the version of Netscape (netscape-communicator-4.7-1.1) that came with 6.1 - visiting http://www.palmgear.com/software/showsoftware.cfm?prodID=5079 and scrolling down the page a bit will always crash the X server.

Nothing shows up in /etc/X11/xdm/xdm-errors, but /var/adm/messages shows the following when the server crashes:

    Server for display :0 terminated unexpectedly: 1536

kdm then restarts the server normally.

Here's my XF86Config file.

Unable to deiconify things in GNOME/Enlightenment

I haven't been much of a GNOME user, but when I ran into the problem above with the KDE panel, I thought I'd try it. I discovered that when I iconified windows (with the underscore button on the titlebar) they just disappeared, and there was no way I could find to deiconify them. They did not produce icons on the screen, and there was no taskbar. I tried pressing all the mouse buttons in various combinations and exploring the GNOME menu (the one that comes up from the foot icon), and couldn't find a window list of any sort. Presumably I could reconfigure Enlightenment to fix this, but if that's the normal configuration out of the box, it seems less than ideal.

Incidentally, I'd installed GNOME with the `install-gnome' script that came on the RedHat 5.2 CD, so perhaps this is a problem with bits of the RH5.2 GNOME configuration being left over. I didn't look into it very thoroughly.

KDE problems

Can't get KDE panel (kpanel) to lower

This is really an issue for the KDE developers rather than for you, but under the version of KDE I got from kde.org for RedHat 5.2, there was an option to allow other windows to be raised above the KDE panel. That option seems to be missing from the version of KDE shipped with RedHat 6.1, so there's no way to prevent the panel staying on top. I find that extremely frustrating; often I want a window to fill the screen completely (e.g. for a presentation or for doing graphics editing), and with the current version of KDE I can't do that. (I'm probably just going to figure out how to stop the panel from starting when I log in, and not use it.)

Can't easily reconfigure panel applications

Also not your fault but also slightly annoying: An ordinary user can't reconfigure the default icons on the panel. I tried to add -ls to the kvt invocation, but the change to the command to execute was silently dropped. That sort of makes sense, since maybe the panel was using a site-wide file to describe that application, and I was able to accomplish the same thing by defining my own personal application that called kvt, but (1) silently failing to make the change is confusing - it would be better to pop up an error message, and (2) creating a new personal application and adding it to the panel is a lot of work just to make a tiny change in how an application is invoked, and a steep learning curve for somebody who's new to KDE.

After I figured this all out, I remembered that it had been the same way under RedHat 5.2. (It had just been so long since I'd reconfigured the panel that I'd forgotten I wasn't using the system-wide kvt icon.)

/etc/sysconfig/desktop not created

Some of the scripts refer to /etc/sysconfig/desktop, but that file was not created by the upgrade. I managed to figure out that it was supposed to have `preferred=environment' in it by poking at a couple of scripts. I don't know if there's anything else that ought to be in there.

prefdm

By the way, /etc/X11/prefdm checks for the preferred environment by simply grepping for GNOME, KDE, or AnotherLevel. That's not robust in the face of a file that reads something like

    # I gave up on KDE, so let's try AnotherLevel
    preferred=AnotherLevel
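
To make the failure mode concrete, here is a minimal sketch in Python (not the actual prefdm script, which is a shell script). Both function names are hypothetical, and the order in which the naive version tests the three names is an assumption on my part. The first function mimics a plain substring search over the whole file; the second strips comments and only looks at the value of the `preferred' line.

```python
# Sketch only: illustrates why a bare substring search is fragile.
# Function names and the GNOME/KDE/AnotherLevel test order are assumptions.

def preferred_by_grep(text):
    """Mimic a naive grep: first environment name found anywhere wins."""
    for name in ("GNOME", "KDE", "AnotherLevel"):
        if name in text:
            return name
    return None

def preferred_by_parsing(text):
    """Ignore comments and read only the value of the 'preferred' key."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop anything after '#'
        key, sep, value = line.partition("=")
        if sep and key.strip() == "preferred":
            return value.strip()
    return None

sample = (
    "# I gave up on KDE, so let's try AnotherLevel\n"
    "preferred=AnotherLevel\n"
)

print(preferred_by_grep(sample))     # KDE (matched inside the comment)
print(preferred_by_parsing(sample))  # AnotherLevel
```

The same idea could presumably be expressed in the shell script itself by discarding comment lines before matching; the point is only that the comment should not take part in the match.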

Installer assumed mouse was right-handed

This is a very minor thing, but it might be easy to fix. My mouse is left-handed, but the installer required me to use the leftmost button as button 1. I don't know much about GTK, but if it's easy to tell the widgets to treat any button press identically to button 1, that might be a useful thing.

Installer doesn't look at existing lilo.conf

I had needed `append=linear' in my lilo.conf. It might be nice if the installer parsed the existing lilo.conf during an upgrade and set defaults appropriately, or at least gave the user the option of displaying it for reference when creating the new one. (Not a big deal, just an idea.)
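
As a rough illustration of what `parsing the existing lilo.conf' could mean, here is a small Python sketch. It is not how the installer works; the function name, the sample file contents, and the simplified view of the file format (global options first, then stanzas introduced by image= or other=, with # starting a comment) are all assumptions made for the example. It does not handle a # inside a quoted value.

```python
# Sketch only: collect the global options from an existing lilo.conf so an
# installer could offer them as defaults. Hypothetical helper, simplified format.

def global_lilo_options(text):
    """Return global options (those before the first image=/other= stanza)."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if key in ("image", "other"):
            break                             # per-image stanzas start here
        # Bare keywords such as 'prompt' are recorded as True.
        options[key] = value.strip().strip('"') if sep else True
    return options

# Made-up sample file for illustration.
sample = """\
boot=/dev/hda
prompt
timeout=50
append=linear   # needed on this machine
image=/boot/vmlinuz
    label=linux
"""

for key, value in global_lilo_options(sample).items():
    print(key, value)
```

An installer could then pre-fill the new configuration from the returned dictionary, or simply show it to the user for reference.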

Missing manual pages

EXMH-related

The EXMH documentation is divided into multiple man pages, including exmh(1), exmh-use(1), exmh-ref(1), and exmh-custom(1). Only the man page for exmh(1) is included in the RedHat 6.1 EXMH rpm (exmh-2.0.3-2); the others are missing. (That was the case under RedHat 5.2 also.)

clock(8)

The command `man -k clock' includes in its output

    clock (8)            - query and set the hardware clock (RTC)

and setclock(8) also refers to clock(8), but `man 8 clock' says `No entry for clock in section 8 of the manual', and sure enough, there's no clock.8 under /usr/man.

rhosts and hosts.equiv

There are no man pages for rhosts(5) or hosts.equiv(5).