RedHat 6.1 considered harmful
Note: I'm going to be adding more background and detail shortly,
so check back in a few days.
I had a lot of major problems during and after my upgrade from
RedHat 5.2 to 6.1, some of them show-stoppers. This was the
first time I'd actually performed an upgrade rather than just
wiping my OS partitions and doing a fresh install. I'd be interested
to know if the increased problems I'm seeing are due to doing
an upgrade rather than an install from scratch, or if they're
just about 6.1 being a flakier release than 5.2.
RedHat 5.2 was really solid. However, 6.1 had a number of features
that made me want to upgrade:
- TrueType support
- Newer versions of some packages (including Ghostscript and
mkisofs, which I'd had to build myself under RedHat 5.2 in order to get
features I needed)
- Improvements in the kernel
- Compatibility with newer third-party RPMs
Unfortunately, I had so many problems with 6.1 that I wiped my
OS partitions and went back to 5.2 (and will probably be shopping
around for a different distribution). I've been a RedHat customer
for a long time and spent quite a lot of money on RedHat software.
This is the first time I've felt like that money was wasted.
RedHat customer support
(There will eventually be a description of RedHat's hideous and
almost unusable Web-based trouble-tracking system here.)
Can't see installation CD in ATAPI CD-ROM drive (potential showstopper
for some people)
(Since I've already accomplished the install, this is a matter
of curiosity for me and a bug report, rather than something I
personally need a fix for.)
When I booted from floppy (from the
boot-RHEA-1999:044.img disk), the installer couldn't see my CD in the ATAPI CD-ROM.
After I was asked where the installation media was and said
`Local CDROM', the screen cleared to blue and there was a very
very long pause (several minutes) during which my IDE CD-ROM spun
up and down a couple times and its light came on. Then the
installation program said `I could not find a RedHat CD in any
of your CD-ROM drives'. (Yes, there was one in the drive. :-)
Although an ATAPI CD-ROM is pretty vanilla hardware, I guessed
that maybe some prior hardware probe might have been causing it
to misbehave somehow, so I tried an expert-mode installation.
That did
not give me an option to read from an ordinary ATAPI CD-ROM drive.
However, fortunately I happen to have a SCSI CD-R, which expert
mode was able to see. I put the RedHat CD in the SCSI CD-R,
and the installer found it and I was able to do the upgrade.
By the way, the pause after `running /sbin/loader' is extremely long; it might be useful to change that message
to `running /sbin/loader; please wait' or precede it with `It is normal for the following step to take a long time' or something like that.
Update: I've discovered that:
- I can mount most CDs in my ATAPI CD-ROM drive, including the
Linux Applications Library and PowerTools CDs that came with my
deluxe boxed set as well as lots of other CDs,
- I can mount the Official RedHat 6.1 CDs (disk 1 with installation
RPMs and disk 2 with SRPMs), as well as any other CDs, in my SCSI
CD-R,
but
- I can not mount the Official RedHat 6.1 CDs (installation CD
or SRPMs CD) in my ATAPI drive.
I'm guessing that there's something slightly off about the mastering
of those disks that my SCSI drive is able to handle but my ATAPI
drive can't.
(Incidentally, the StarOffice CD, right out of the shrinkwrap,
was badly scratched. I was able to mount it; I haven't yet
tried to install from it. Of course, that CD is produced by
Sun rather than by RedHat.)
I see errors like the following in the output of dmesg:
VFS: Disk change detected on device ide0(3,64)
ATAPI device hdb:
Error: Unit attention -- (Sense key=0x06)
Not ready to ready transition, medium may have changed -- (asc=0x28, ascq=0x00)
The failed "Test Unit Ready" packet command was:
"00 00 00 00 00 00 00 00 00 00 00 00 "
cdrom: open failed.
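For comparing reports like this across drives or discs, it can help to pull the numeric sense data out of the log text. What follows is only a minimal sketch, assuming the message spelling shown in the excerpt above; the helper name `parse_sense` is hypothetical and is not part of any kernel or RedHat tool.

```python
import re

# Hypothetical helper: extract the sense key, ASC and ASCQ from dmesg text
# like the excerpt above. The patterns assume exactly the
# "(Sense key=0x..)" and "(asc=0x.., ascq=0x..)" spellings seen there.
SENSE_RE = re.compile(r"Sense key=0x([0-9a-fA-F]+)")
ASC_RE = re.compile(r"asc=0x([0-9a-fA-F]+),\s*ascq=0x([0-9a-fA-F]+)")


def parse_sense(text):
    """Return (sense_key, asc, ascq) as integers, or None if not found."""
    key = SENSE_RE.search(text)
    asc = ASC_RE.search(text)
    if key is None or asc is None:
        return None
    return int(key.group(1), 16), int(asc.group(1), 16), int(asc.group(2), 16)


sample = """ATAPI device hdb:
Error: Unit attention -- (Sense key=0x06)
Not ready to ready transition, medium may have changed -- (asc=0x28, ascq=0x00)
"""
print(parse_sense(sample))  # prints (6, 40, 0)
```

Returning plain integers keeps the result easy to compare between the ATAPI and SCSI drives.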
Original boot disk from box would not boot
When I found the updated boot disk couldn't find the CD in my
ATAPI CD-ROM drive, I had tried booting from the original floppy
included in my boxed set. That got partway through booting
and displayed `boot failed...press a key to continue'. (That's when I tried the expert-mode installation from the
updated boot disk.)
Can't access parallel port (SOLVED)
I am unable to print, because the RedHat 6.1 kernel can't see my
(perfectly normal) parallel port out of the box. It was lp1 under
RedHat 5.2. Running printtool and trying to add a printer tells me
`Not detected' for lp0, lp1, and lp2, and catting something to
/dev/lp1 says `no such device: /dev/lp1'. rmmodding and insmodding
lp doesn't help. It worked fine under RedHat 5.2.
I also get kernel messages that say `lp: driver loaded but no devices found'.
Update: I found a bunch of similar bug reports in
RedHat's Bugzilla database, and it turns out that the fix is to add the line
alias parport_lowlevel parport_pc
to /etc/conf.modules. Oddly, though, my printer now shows up as
lp0, whereas it was lp1 under RedHat 5.2.
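An upgrade script could apply this fix without duplicating the line on a second run. The following is a rough sketch of that idea, not what RedHat's installer does; the function name `ensure_alias` and the sample file contents are made up for illustration, and the function works on text so the caller decides how to read and write /etc/conf.modules.

```python
def ensure_alias(conf_text, alias="parport_lowlevel", module="parport_pc"):
    """Return conf_text with an 'alias <alias> <module>' line present."""
    wanted = ["alias", alias, module]
    for line in conf_text.splitlines():
        # Ignore trailing comments and compare whitespace-separated fields,
        # so extra spaces or tabs do not cause a duplicate to be added.
        fields = line.split("#", 1)[0].split()
        if fields == wanted:
            return conf_text  # already configured, leave the text alone
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + " ".join(wanted) + "\n"


# Made-up example of existing file contents (hypothetical, for the demo only).
sample = "alias eth0 tulip\n"
updated = ensure_alias(sample)
print(updated, end="")
```

Calling `ensure_alias` again on its own output returns the text unchanged, which is the property that matters for a script that may be run more than once.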
amd problems, locking and otherwise
(I originally thought this was an NFS problem; it turned out
to be an amd problem.)
Both my mail spool (/var/spool/mail) and my MH folders are served via NFS from a RedHat 5.2 box.
Some of the binaries in the nmh package seem to have trouble
with that. When I try to send mail, I get a message (in the
background, after
comp has exited) saying something like `send: unable to lock and open 825: No locks available, continuing...' The mail does appear to get sent, though. The
anno command fails with messages like `anno: unable to lock and open 3350: No locks available, continuing...'. In both cases the message appears after a long pause, as
though something is timing out.
I do delivery directly to my mailbox via
procmail (on the mail hub, on which my mail folders are local), so I
don't know if
inc would have similar problems, but I suspect it would.
The nmh binaries in the 5.2 distribution worked fine in this configuration
(as do nmh and MH binaries on many other Unix systems I've used).
I just noticed the following in the output of dmesg:
nsm_mon_unmon: rpc failed, status=-13
lockd: failed to monitor 209.192.165.50
(where 209.192.165.50 is the IP address of my mail hub and NFS
server, running RedHat 5.2). This suggests that the problem
is actually with
lockd rather than with nmh.
Hmmm... The man page for statd(8) is slightly wrong (it refers to
/usr/sbin/rpc.lockd and /usr/sbin/rpc.statd, but those binaries are
in /sbin), and there's no man page for lockd(8).
Update (but not solution): I found a new bug report in Bugzilla saying that symlinks aren't
properly made to
/etc/rc.d/init.d/nfslock to start it at boot, so I ran it by hand and the error message
I got from the nmh binaries changed to `anno: unable to lock and open 866: Permission denied, continuing...' (after a long pause). I then ran
chkconfig nfslock on and rebooted, in case
statd and
lockd needed to be running at mount time or something, but that didn't
make any difference.
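One way to check whether a service really has boot-time start links is to look through the per-runlevel directories for entries named in the usual SysV style (`S`, two digits, then the service name). The sketch below assumes that naming convention and an /etc/rc.d/rcN.d layout; the function name `runlevels_starting` and the link numbers in the demonstration are hypothetical, and the demo runs against a throwaway directory rather than the real /etc/rc.d.

```python
import os
import re
import tempfile

# SysV-style start links are conventionally named S<two digits><service>.
START_LINK = re.compile(r"^S\d\d(.+)$")


def runlevels_starting(service, rc_root="/etc/rc.d"):
    """Return the runlevels (0-6) whose rcN.d directory has a start link for service."""
    found = []
    for level in range(7):
        rc_dir = os.path.join(rc_root, "rc%d.d" % level)
        if not os.path.isdir(rc_dir):
            continue
        for name in os.listdir(rc_dir):
            match = START_LINK.match(name)
            if match and match.group(1) == service:
                found.append(level)
                break
    return found


# Demonstration tree: a start link in runlevel 3, only a kill link in runlevel 5.
# The numbers 60 and 86 are made up for the example.
with tempfile.TemporaryDirectory() as root:
    for level, link in ((3, "S60nfslock"), (5, "K86nfslock")):
        os.makedirs(os.path.join(root, "rc%d.d" % level))
        open(os.path.join(root, "rc%d.d" % level, link), "w").close()
    demo_levels = runlevels_starting("nfslock", root)
print(demo_levels)  # prints [3]
```

An empty list for the machine's default runlevel would match the symptom in the bug report.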
RedHat's response
I submitted a tech support request for this problem, and got back
the suggestion to `upgrade' the NFS server to 6.1 too, since a
lot of things have changed in the NFS support between 5.2 and
6.1. Hmmm... Well, the upgrade I did was originally intended
as a dry run for upgrading the server, but at this point there's
no way I'm putting 6.1 on another machine. In any case,
RedHat seems to claim that RedHat Linux is suitable for enterprise
applications. If the RedHat 6.1 NFS client code can't interoperate
with the RedHat 5.2 server code, how can I be confident that it's
going to interoperate with Sun, NetApp, Auspex, or Compaq NFS
servers?
I would be happy with `that's thus-and-such a kernel bug (in 5.2
or 6.1), and here's the new kernel RPM', or even with `that's
thus-and-such a kernel bug, and here's where you can get the source
to recompile your kernel.' But the notion that I shouldn't
expect this functionality across version numbers of the same vendor's
OS is pretty surprising. (Admittedly, I'm reading a bit into
the response I got here.)
But wait! It gets worse!
A couple days later, my machine hung (the mouse wouldn't move,
and I couldn't switch virtual consoles) and I had to press the
reset button. The machine rebooted and
fscked cleanly, but since that time (and across reboots of both the
file server running 5.2 and the NFS client running 6.1), attempts
to access automounted filesystems (via amd) simply hang, and are
unkillable. This pretty well makes my machine unusable.
I can't figure out what state is involved here and where it's
kept.
Now, I probably should have seen if it was possible to log in
over the network and reboot the machine cleanly, but the
fsck was clean, and there's nothing relevant that I can see in
/etc/mtab. I can't find an analogue of
/etc/sm on either machine, and in any case the hang doesn't seem to
be related to locking.
Problem isolated to amd
It turned out that the problem was with
amd. Static NFS mounts worked fine. That's not nearly so bad
as I had thought at first, because I could either just get by
with static NFS mounts, or figure out how to use
autofs for mounting home directories. However, having had so many
problems with RedHat 6.1, and still having the X server crash
and hang with great frequency, I gave up and reinstalled 5.2.
SMP kernel caused swapper to panic
For no particular reason other than curiosity, I had chosen to
install an SMP-capable kernel as well as the regular kernel in
the installer, and the installer made that the default stanza
in the generated
lilo.conf. The SMP kernel, however, caused a kernel panic very early
in the boot process. The panic message said the panic was in
`swapper'. I'm afraid I didn't transcribe any of the register dump.
I was, however, able to boot from the (non-default) uniprocessor
kernel, and I just removed the SMP stanza from
lilo.conf and reliloed.
The description (rpm -qip output) of the SMP kernel said that it should boot fine on
uniprocessor boxes too. On my machine it did not.
(By the way, I've got a Cyrix processor.)
X server crashes extremely frequently - SHOWSTOPPER
I've got a Matrox Millennium, so I'm using the XF86_SVGA server.
It crashes extremely frequently (much more so than
under the previous version I was running - unfortunately, I don't
remember which version it was, but it was the most recent version
for which there were update RPMs for 5.2 available from RedHat).
I've had it crash several dozen times in the four days since
I upgraded.
There are no signs of thrashing before it crashes. The crash
is typically associated with mouse movement, but I'm not sure
that's not coincidence. And yes, it happens even when Netscape
isn't running. :-)
It has happened in response to nothing more than mouse movement
over the panel, when nothing but a single xterm and the base KDE
stuff (kpanel, kwm, krootwm) was running. However, certain things
seem to cause it to crash more frequently. Trying to configure
display settings (either from kcontrol or from the krootwm menu)
causes it to crash quite frequently; any change I make
seems to have about a 25% chance of causing the crash. Also,
I found a Web page that reproducibly caused the crash for me in
the version of Netscape (netscape-communicator-4.7-1.1) that came with 6.1 - visiting
http://www.palmgear.com/software/showsoftware.cfm?prodID=5079 and scrolling down the page a bit will always crash the X server.
Nothing shows up in
/etc/X11/xdm/xdm-errors, but
/var/adm/messages shows the following when the server crashes:
Server for display :0 terminated unexpectedly: 1536
kdm then restarts the server normally.
Here's my XF86Config file.
Unable to deiconify things in GNOME/Enlightenment
I haven't been much of a GNOME user, but when I ran into the problem
above with the KDE panel, I thought I'd try it. I discovered
that when I iconified windows (with the underscore button on the
titlebar) they just disappeared, and there was no way I could
find to deiconify them. They did not produce icons on the screen,
and there was no taskbar. I tried pressing all the mouse buttons
in various combinations and exploring the GNOME menu (the one that
comes up from the foot icon), and couldn't find a window list of any
sort. Presumably I could reconfigure Enlightenment to fix this,
but if that's the normal configuration out of the box, it seems
less than ideal.
Incidentally, I'd installed GNOME with the `install-gnome' script that came on the RedHat 5.2 CD, so perhaps this is a
problem with bits of the RH5.2 GNOME configuration being left
over. I didn't look into it very thoroughly.
KDE problems
Can't get KDE panel (kpanel) to lower
This is really an issue for the KDE developers rather than for
you, but under the version of KDE I got from
kde.org for RedHat 5.2, there was an option to allow other windows
to be raised above the KDE panel. That option seems to be missing
from the version of KDE shipped with RedHat 6.1, so there's no
way to prevent the panel staying on top. I find that extremely
frustrating; often I want a window to fill the screen completely
(e.g. for a presentation or for doing graphics editing), and
with the current version of KDE I can't do that. (I'm probably
just going to figure out how to stop the panel from starting when
I log in, and not use it.)
Can't easily reconfigure panel applications
Also not your fault but also slightly annoying: An ordinary user
can't reconfigure the default icons on the panel. I tried to
add -ls to the kvt invocation, but the change to the command to
execute was silently
dropped. That sort of makes sense, since maybe the panel was
using a site-wide file to describe that application, and I was
able to accomplish the same thing by defining my own personal
application that called
kvt, but (1) silently failing to make the change is confusing - it
would be better to pop up an error message, and (2) creating a
new personal application and adding it to the panel is a lot of
work just to make a tiny change in how an application is invoked,
and a steep learning curve for somebody who's new to KDE.
After I figured this all out, I remembered that it had been the
same way under RedHat 5.2. (It had just been so long since
I'd reconfigured the panel that I'd forgotten I wasn't using the
system-wide
kvt icon.)
/etc/sysconfig/desktop not created
Some of the scripts refer to
/etc/sysconfig/desktop, but that file was not created by the upgrade. I managed to
figure out that it was supposed to have `preferred=environment' in it by poking at a couple of scripts. I don't know if there's
anything else that ought to be in there.
prefdm
By the way, /etc/X11/prefdm checks for the preferred environment by
simply grepping for GNOME, KDE, or AnotherLevel. That's not robust
in the face of a file that reads something like
# I gave up on KDE, so let's try AnotherLevel
preferred=AnotherLevel
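To make the failure concrete, here is a small sketch contrasting a plain substring search with a parse that skips comments. It is an illustration only: the order in which the names are tried is an assumption (taken from the order listed above), the `preferred=` key is the one I inferred from the scripts, and the function names `naive_choice` and `parsed_choice` are hypothetical.

```python
KNOWN_DESKTOPS = ("GNOME", "KDE", "AnotherLevel")


def naive_choice(text):
    """Mimic a plain substring search, trying each name in a fixed order."""
    for name in KNOWN_DESKTOPS:
        if name in text:
            return name
    return None


def parsed_choice(text):
    """Honor only an uncommented 'preferred=<name>' line."""
    for raw in text.splitlines():
        # Drop anything after '#', then look for key=value.
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition("=")
        if sep and key.strip() == "preferred" and value.strip() in KNOWN_DESKTOPS:
            return value.strip()
    return None


sample = "# I gave up on KDE, so let's try AnotherLevel\npreferred=AnotherLevel\n"
print(naive_choice(sample))   # prints KDE
print(parsed_choice(sample))  # prints AnotherLevel
```

The substring search is fooled by the comment, while the parse returns the value that was actually set.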
Installer assumed mouse was right-handed
This is a very minor thing, but it might be easy to fix. My
mouse is left-handed, but the installer required me to use the
leftmost button as button 1. I don't know much about GTK, but
if it's easy to tell the widgets to treat any button press
identically to button 1, that might be a useful
thing.
Installer doesn't look at existing lilo.conf
I had needed `append=linear' in my lilo.conf. It might be nice if
the installer parsed the existing lilo.conf during an upgrade and
set defaults appropriately, or at least
gave the user the option of displaying it for reference when creating
the new one. (Not a big deal, just an idea.)
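As a rough idea of what reading the old file might involve, the sketch below collects the options that appear before the first `image=` or `other=` section, so they could be offered as defaults. It is deliberately simplistic and is not how the RedHat installer works: it assumes one option per line, does not handle a `#` inside a quoted value, and the function name `global_options` and the sample file are made up for the example.

```python
def global_options(lilo_conf_text):
    """Return a dict of options found before the first image=/other= section."""
    options = {}
    for raw in lilo_conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if key in ("image", "other"):
            break  # per-image sections start here; stop collecting
        # Bare keywords such as "prompt" are recorded as True.
        options[key] = value.strip().strip('"') if sep else True
    return options


# Made-up sample file for the demonstration.
sample = """\
boot=/dev/hda
prompt
timeout=50
append=linear
image=/boot/vmlinuz
    label=linux
    root=/dev/hda1
"""
print(global_options(sample))
```

With the sample above, the result includes `append` with the value `linear`, which is the setting that would otherwise have to be re-entered by hand.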
Missing manual pages
EXMH-related
The EXMH documentation is divided into multiple man pages, including
exmh(1), exmh-use(1), exmh-ref(1), and exmh-custom(1). Only the man
page for exmh(1) is included in the RedHat 6.1 EXMH rpm
(exmh-2.0.3-2); the others are missing. (That was the case under
RedHat 5.2 also.)
clock(8)
The command `man -k clock' includes in its output
clock (8) - query and set the hardware clock (RTC)
and setclock(8) also refers to clock(8), but `man 8 clock' says
`No entry for clock in section 8 of the manual', and sure enough,
there's no clock.8 under /usr/man.
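A check like this could be automated before a release ships. The sketch below assumes man pages live at `<root>/man<section>/<name>.<section>` with an optional suffix such as `.gz`; the function name `has_man_page` is hypothetical, and the demonstration uses a temporary directory standing in for /usr/man.

```python
import glob
import os
import tempfile


def has_man_page(name, section, man_root="/usr/man"):
    """Return True if a page file for name(section) exists under man_root."""
    pattern = os.path.join(man_root, "man%s" % section, "%s.%s*" % (name, section))
    return len(glob.glob(pattern)) > 0


# Demonstration tree containing setclock.8 but no clock.8.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "man8"))
    open(os.path.join(root, "man8", "setclock.8"), "w").close()
    demo = {name: has_man_page(name, 8, root) for name in ("clock", "setclock")}
print(demo)  # prints {'clock': False, 'setclock': True}
```

Running the same function over every name that `man -k` lists would flag whatis entries with no page behind them.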
rhosts and hosts.equiv
There are no man pages for rhosts(5) or hosts.equiv(5).