[dm-devel] Re: 2.6.24-rc4-mm1
Andrew Morton
2007-12-06 12:04:20 UTC
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/
Something in here broke LVM support - an initrd that has worked fine for
quite some time suddenly couldn't mount /dev/VolGroup00/root so we get the
infamous "Kernel panic - not syncing: Attempted to kill init!" when we
fall off the end of the initrd and haven't pivoted to the real disk.
[ 81.202310] sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
[ 81.214466] sd 0:0:0:0: [sda] Write Protect is off
[ 81.226467] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 81.238436] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 81.250780] sda: sda1 sda2
[ 75.396119] sd 0:0:0:0: [sda] Attached SCSI disk
but then the lvm command says it can't find the volume group VolGroup00 (which
is actually sda2 - sda1 is a small /boot partition, rest of disk is LVM).
A quick look at the rc4-mm1 announcement doesn't have any obviously tempting
patch names to start at, so it looks like it's time to play mm-bisect. It may
take me a day or two, as I have some time management issues this week...
OK, thanks.

First step would be to eliminate rewrite-rd.patch: maybe the ramdisk driver
in which that initrd resides is bust.

After that, agk-dm-dm-*.patch are of course the ones to look at.

Please keep dm-***@redhat.com cc'ed.
V***@vt.edu
2007-12-06 19:18:08 UTC
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/
Something in here broke LVM support - an initrd that has worked fine for
quite some time suddenly couldn't mount /dev/VolGroup00/root so we get the
infamous "Kernel panic - not syncing: Attempted to kill init!" when we
fall off the end of the initrd and haven't pivoted to the real disk.
[ 81.202310] sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
[ 81.214466] sd 0:0:0:0: [sda] Write Protect is off
[ 81.226467] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 81.238436] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 81.250780] sda: sda1 sda2
[ 75.396119] sd 0:0:0:0: [sda] Attached SCSI disk
but then the lvm command says it can't find the volume group VolGroup00 (which
is actually sda2 - sda1 is a small /boot partition, rest of disk is LVM).
A quick look at the rc4-mm1 announcement doesn't have any obviously tempting
patch names to start at, so it looks like it's time to play mm-bisect. It may
take me a day or two, as I have some time management issues this week...
OK, thanks.
First step would be to eliminate rewrite-rd.patch: maybe the ramdisk driver
in which that initrd resides is bust.
After that, agk-dm-dm-*.patch are of course the ones to look at.
How did I not notice them? Yeah, those guys would be on the suspicious list...
I've gotten it down to about 128 patches, but it's interesting what ended
up bracketed by GOOD/BAD:

powerpc-invalid-size-for-swapper_pg_dir-with-config_pte_64bit=y.patch GOOD
#GREGKH-DRIVER-START
gregkh-driver-nozomi.patch
gregkh-moby-patch-tree....
unbork-gregkh-driver-kset-convert-sys-devices-to-use-kset_create-vioc.patch BAD

Would I be remiss in hypothesising that something in gregkh-driver-kobject-*
changed something, and now we need an agk-dm-dm-kobject-fixupage.patch?

The actual bug is probably elsewhere, but it *manifests* due to gregkh-driver
tree. Will probably be tomorrow before I get it down further...
Greg KH
2007-12-06 19:38:43 UTC
Post by V***@vt.edu
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/
Something in here broke LVM support - an initrd that has worked fine for
quite some time suddenly couldn't mount /dev/VolGroup00/root so we get the
infamous "Kernel panic - not syncing: Attempted to kill init!" when we
fall off the end of the initrd and haven't pivoted to the real disk.
[ 81.202310] sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
[ 81.214466] sd 0:0:0:0: [sda] Write Protect is off
[ 81.226467] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 81.238436] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 81.250780] sda: sda1 sda2
[ 75.396119] sd 0:0:0:0: [sda] Attached SCSI disk
but then the lvm command says it can't find the volume group VolGroup00 (which
is actually sda2 - sda1 is a small /boot partition, rest of disk is LVM).
A quick look at the rc4-mm1 announcement doesn't have any obviously tempting
patch names to start at, so it looks like it's time to play mm-bisect. It may
take me a day or two, as I have some time management issues this week...
OK, thanks.
First step would be to eliminate rewrite-rd.patch: maybe the ramdisk driver
in which that initrd resides is bust.
After that, agk-dm-dm-*.patch are of course the ones to look at.
How did I not notice them? Yeah, those guys would be on the suspicious list...
I've gotten it down to about 128 patches, but it's interesting what ended
powerpc-invalid-size-for-swapper_pg_dir-with-config_pte_64bit=y.patch GOOD
#GREGKH-DRIVER-START
gregkh-driver-nozomi.patch
gregkh-moby-patch-tree....
unbork-gregkh-driver-kset-convert-sys-devices-to-use-kset_create-vioc.patch BAD
Would I be remiss in hypothesising that something in gregkh-driver-kobject-*
changed something, and now we need an agk-dm-dm-kobject-fixupage.patch?
I don't know, it all depends on what is in the dm patches. Hopefully
everything that I have changed will manifest with a build breakage to
obviously detect that something needs to be fixed up.

But I've been known to mess things up that I didn't intend to :)

If there's anything that I can do to test this, please let me know.

thanks,

greg k-h
V***@vt.edu
2007-12-06 20:04:25 UTC
Post by Greg KH
Post by V***@vt.edu
Would I be remiss in hypothesising that something in gregkh-driver-kobject-*
changed something, and now we need an agk-dm-dm-kobject-fixupage.patch?
I don't know, it all depends on what is in the dm patches. Hopefully
everything that I have changed will manifest with a build breakage to
obviously detect that something needs to be fixed up.
But I've been known to mess things up that I didn't intend to :)
Given that it *didn't* totally break the build, it's likely a fencepost error
or some similar issue...
Post by Greg KH
If there's anything that I can do to test this, please let me know.
I wanted to give a heads-up, in case there was a D'Oh! patch hiding. At worst,
I just need another 6 or 7 bisects to figure out which of those 120-ish patches
is the culprit. With luck I'll end up stopped on a patch that in retrospect
was obviously busticated. If not, we'll have to apply the usual more drastic
measures. If you don't have a box that's already demonstrating it, and you
don't have any obvious candidates, it's likely that the most productive
use of everybody's time is for you to chase down any other kobject issues
while I bisect the problem down further...
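
The "6 or 7 bisects" estimate follows from halving a window of roughly 120 to 128 patches until one patch is left. Below is a minimal sketch of that search, not the actual mm-bisect workflow: the function name `first_bad_patch` and the `boots_ok` callback are hypothetical, and it assumes the series boots with no patches applied, fails with all of them applied, and that a single patch flips the result.

```python
from typing import Callable, Sequence, Tuple


def first_bad_patch(
    patches: Sequence[str],
    boots_ok: Callable[[Sequence[str]], bool],
) -> Tuple[str, int]:
    """Return (first patch that breaks the boot, number of test boots).

    Assumption: applying patches[:0] boots fine and applying all of
    `patches` does not, so the culprit lies somewhere in between.
    """
    lo, hi = 0, len(patches)  # invariant: patches[:lo] is GOOD, patches[:hi] is BAD
    boots = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        boots += 1
        if boots_ok(patches[:mid]):
            lo = mid  # still good: culprit is later in the series
        else:
            hi = mid  # already bad: culprit is at or before mid
    return patches[hi - 1], boots


if __name__ == "__main__":
    # Hypothetical series of 128 patches with one known culprit.
    series = [f"patch-{i:03d}" for i in range(128)]
    culprit, boots = first_bad_patch(series, lambda applied: "patch-077" not in applied)
    print(culprit, boots)
```

Each test boot halves the window, so 128 candidates need exactly 7 boots and a window of about 120 needs 6 or 7, depending on where the culprit sits.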
Kay Sievers
2007-12-06 22:04:12 UTC
Post by Greg KH
Post by V***@vt.edu
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/
Something in here broke LVM support - an initrd that has worked fine for
quite some time suddenly couldn't mount /dev/VolGroup00/root so we get the
infamous "Kernel panic - not syncing: Attempted to kill init!" when we
fall off the end of the initrd and haven't pivoted to the real disk.
[ 81.202310] sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
[ 81.214466] sd 0:0:0:0: [sda] Write Protect is off
[ 81.226467] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 81.238436] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 81.250780] sda: sda1 sda2
[ 75.396119] sd 0:0:0:0: [sda] Attached SCSI disk
but then the lvm command says it can't find the volume group VolGroup00 (which
is actually sda2 - sda1 is a small /boot partition, rest of disk is LVM).
A quick look at the rc4-mm1 announcement doesn't have any obviously tempting
patch names to start at, so it looks like it's time to play mm-bisect. It may
take me a day or two, as I have some time management issues this week...
OK, thanks.
First step would be to eliminate rewrite-rd.patch: maybe the ramdisk driver
in which that initrd resides is bust.
After that, agk-dm-dm-*.patch are of course the ones to look at.
How did I not notice them? Yeah, those guys would be on the suspicious list...
I've gotten it down to about 128 patches, but it's interesting what ended
powerpc-invalid-size-for-swapper_pg_dir-with-config_pte_64bit=y.patch GOOD
#GREGKH-DRIVER-START
gregkh-driver-nozomi.patch
gregkh-moby-patch-tree....
unbork-gregkh-driver-kset-convert-sys-devices-to-use-kset_create-vioc.patch BAD
Would I be remiss in hypothesising that something in gregkh-driver-kobject-*
changed something, and now we need an agk-dm-dm-kobject-fixupage.patch?
I don't know, it all depends on what is in the dm patches. Hopefully
everything that I have changed will manifest with a build breakage to
obviously detect that something needs to be fixed up.
But I've been known to mess things up that I didn't intend to :)
If there's anything that I can do to test this, please let me know.
What's the value of SYSFS_DEPRECATED? Care to set it to yes, if it isn't,
and try again?
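
One way to answer that question is to read the option out of the kernel's `.config` text. The helper below is only a sketch: the function name `config_value` is made up for illustration, and it relies on the usual `.config` conventions of `CONFIG_FOO=y` for an enabled option and `# CONFIG_FOO is not set` for a disabled one.

```python
from typing import Optional


def config_value(config_text: str, option: str) -> Optional[str]:
    """Return the value of CONFIG_<option> from .config text.

    Returns "n" for the '# CONFIG_<option> is not set' form and
    None if the option is not mentioned at all.
    """
    name = "CONFIG_" + option
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {name} is not set":
            return "n"
        # Require the '=' so that longer option names sharing this
        # prefix are not matched by accident.
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None


if __name__ == "__main__":
    sample = "CONFIG_SYSFS=y\n# CONFIG_SYSFS_DEPRECATED is not set\n"
    print(config_value(sample, "SYSFS_DEPRECATED"))
```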

A fix for LVM to handle symlinks instead of directories is in the LVM
CVS tree, but there wasn't a release since August.

Kay
Alasdair G Kergon
2007-12-06 22:12:20 UTC
Post by Kay Sievers
A fix for LVM to handle symlinks instead of directories is in the LVM
CVS tree, but there wasn't a release since August.
I released it yesterday:-)

Alasdair
--
***@redhat.com
V***@vt.edu
2007-12-06 23:12:30 UTC
Post by Kay Sievers
What's the value of SYSFS_DEPRECATED? Care to set it to yes, if it isn't,
and try again?
I *knew* there was a D'Oh! error in here. ;)

Bisection is fast closing in on gregkh-driver-block-device.patch, which broke
my LVM almost the exact same way the *last* time it showed up in -mm ;)
Post by Kay Sievers
A fix for LVM to handle symlinks instead of directories is in the LVM
CVS tree, but there wasn't a release since August.
I seem to recall it was 'nash' rather than LVM that had the indigestion the
last time around.
Kay Sievers
2007-12-06 23:24:04 UTC
Post by V***@vt.edu
Post by Kay Sievers
What's the value of SYSFS_DEPRECATED? Care to set it to yes, if it isn't,
and try again?
I *knew* there was a D'Oh! error in here. ;)
Bisection is fast closing in on gregkh-driver-block-device.patch, which broke
my LVM almost the exact same way the *last* time it showed up in -mm ;)
Oh, it must not break if SYSFS_DEPRECATED=y is set. I hope we fixed all
issues. Please let us know if it does not work, then we will need to
look into it.
Post by V***@vt.edu
Post by Kay Sievers
A fix for LVM to handle symlinks instead of directories is in the LVM
CVS tree, but there wasn't a release since August.
I seem to recall it was 'nash' rather than LVM that had the indigestion the
last time around.
I think that a recent nash should work, even with SYSFS_DEPRECATED=n.
Anyway, nothing should change when SYSFS_DEPRECATED is set; nash works fine
here with that.

Kay
V***@vt.edu
2007-12-07 18:20:30 UTC
Post by Kay Sievers
Post by V***@vt.edu
Post by Kay Sievers
What's the value of SYSFS_DEPRECATED? Care to set it to yes, if it isn't,
and try again?
I *knew* there was a D'Oh! error in here. ;)
Bisection is fast closing in on gregkh-driver-block-device.patch, which broke
my LVM almost the exact same way the *last* time it showed up in -mm ;)
Oh, it must not break if SYSFS_DEPRECATED=y is set. I hope we fixed all
issues. Please let us know if it does not work, then we will need to
look into it.
I changed SYSFS_DEPRECATED to y, and it was able to boot with the same old
initrd I've been using for a while.

Note that I had it set to 'n' for at least the last 4-5 -mm kernels, so it
*was* working fine without it..
Post by Kay Sievers
Post by V***@vt.edu
Post by Kay Sievers
A fix for LVM to handle symlinks instead of directories is in the LVM
CVS tree, but there wasn't a release since August.
I seem to recall it was 'nash' rather than LVM that had the indigestion the
last time around.
I think that a recent nash should work, even with SYSFS_DEPRECATED=n.
Anyway, nothing should change when SYSFS_DEPRECATED is set; nash works fine
here with that.
It was working fine with =n here until -rc4-mm1 as well, that's why it's a bit
of a surprise. What got added to the 'deprecated' list in this iteration?

Now for the truly odd part - I just tried with a rebuilt initrd that included
the lvm.static from last night's Rawhide (lvm2-2.02.29-1.fc9). And that didn't
work any better.

So to summarize: (old lvm == 2.02.24)

release    SYSFS_DEPRECATED   lvm2   works
-rc3-mm2   N                  old    yes
-rc4-mm1   N                  old    no
-rc4-mm1   Y                  old    yes
-rc4-mm1   N                  new    no

(I'm sure looking at that, everybody is now going 'WTF??!?' ;)

gregkh-driver-driver-core-fix-class-glue-dir-cleanup-logic.patch and
gregkh-driver-block-device.patch are the only patches left in the (very small)
bisect window that reference SYSFS_DEPRECATED at all (according to grep)

Anybody got any brilliant ideas? :)
Kay Sievers
2007-12-07 18:44:29 UTC
Post by V***@vt.edu
Post by Kay Sievers
Post by V***@vt.edu
Post by Kay Sievers
What's the value of SYSFS_DEPRECATED? Care to set it to yes, if it isn't,
and try again?
I *knew* there was a D'Oh! error in here. ;)
Bisection is fast closing in on gregkh-driver-block-device.patch, which broke
my LVM almost the exact same way the *last* time it showed up in -mm ;)
Oh, it must not break if SYSFS_DEPRECATED=y is set. I hope we fixed all
issues. Please let us know if it does not work, then we will need to
look into it.
I changed SYSFS_DEPRECATED to y, and it was able to boot with the same old
initrd I've been using for a while.
Great!
Post by V***@vt.edu
Note that I had it set to 'n' for at least the last 4-5 -mm kernels, so it
*was* working fine without it..
Yeah, but the raw block kobjects got converted to devices, which are
symlinks with SYSFS_DEPRECATED=n.
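
To make the failure mode concrete: a tool that scans a sysfs-style block directory and accepts only real directories will see nothing once the entries become symlinks. The sketch below illustrates that difference only; it is not how nash or LVM actually scan sysfs, and both function names are hypothetical.

```python
import os
import stat
from typing import List


def block_devices_naive(block_dir: str) -> List[str]:
    """Accept only entries that are real directories.

    os.lstat() does not follow symlinks, so symlinked entries
    are silently skipped.
    """
    return sorted(
        name
        for name in os.listdir(block_dir)
        if stat.S_ISDIR(os.lstat(os.path.join(block_dir, name)).st_mode)
    )


def block_devices_tolerant(block_dir: str) -> List[str]:
    """Accept directories and symlinks that resolve to directories.

    os.path.isdir() follows symlinks, so both layouts are handled.
    """
    return sorted(
        name
        for name in os.listdir(block_dir)
        if os.path.isdir(os.path.join(block_dir, name))
    )
```

Pointed at a directory whose entries are symlinks to device directories elsewhere in the tree, the naive version returns an empty list while the tolerant one still finds the devices.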
Post by V***@vt.edu
Post by Kay Sievers
Post by V***@vt.edu
Post by Kay Sievers
A fix for LVM to handle symlinks instead of directories is in the LVM
CVS tree, but there wasn't a release since August.
I seem to recall it was 'nash' rather than LVM that had the indigestion the
last time around.
I think that a recent nash should work, even with SYSFS_DEPRECATED=n.
Anyway, nothing should change when SYSFS_DEPRECATED is set; nash works fine
here with that.
It was working fine with =n here until -rc4-mm1 as well, that's why it's a bit
of a surprise. What got added to the 'deprecated' list in this iteration?
Block devices got integrated in the driver model.
Post by V***@vt.edu
Now for the truly odd part - I just tried with a rebuilt initrd that included
the lvm.static from last night's Rawhide (lvm2-2.02.29-1.fc9). And that didn't
work any better.
So to summarize: (old lvm == 2.02.24)
release    SYSFS_DEPRECATED   lvm2   works
-rc3-mm2   N                  old    yes
-rc4-mm1   N                  old    no
-rc4-mm1   Y                  old    yes
-rc4-mm1   N                  new    no
(I'm sure looking at that, everybody is now going 'WTF??!?' ;)
gregkh-driver-driver-core-fix-class-glue-dir-cleanup-logic.patch and
gregkh-driver-block-device.patch are the only patches left in the (very small)
bisect window that reference SYSFS_DEPRECATED at all (according to grep)
Anybody got any brilliant ideas? :)
I guess it's nash again, which version is it?

You probably need to wait for Red Hat to catch up, and don't disable
SYSFS_DEPRECATED for now, they don't support that.

Kay
V***@vt.edu
2007-12-07 20:28:23 UTC
Post by Kay Sievers
Post by V***@vt.edu
Anybody got any brilliant ideas? :)
I guess it's nash again, which version is it?
Confirmed - nash again. 6.0.9 does not work, upgrading to 6.0.19 works.

init/Kconfig says this for SYSFS_DEPRECATED (which is where I got led astray,
as most of my laptop is Fedora Rawhide and therefore "released this week"):

If you are using a distro that was released in 2006 or later,
it should be safe to say N here.

nash 6.0.9 came out in April 2007, and I suspect, but can't prove, that the
relevant change was this one:

* Mon Aug 27 2007 Peter Jones <***@redhat.com> - 6.0.12-2
- Fix segfault in scsi vpd probe code
- Fix block device creation

That was just over 3 months ago. We probably need to fix that help text,
but I admit not being sure what guidance we should give now....
Kay Sievers
2007-12-07 20:49:41 UTC
Post by V***@vt.edu
Post by Kay Sievers
Post by V***@vt.edu
Anybody got any brilliant ideas? :)
I guess it's nash again, which version is it?
Confirmed - nash again. 6.0.9 does not work, upgrading to 6.0.19 works.
Oh, cool!

I expected 6.0.9 to contain the fix though:
http://lkml.org/lkml/2007/6/7/209
Post by V***@vt.edu
init/Kconfig says this for SYSFS_DEPRECATED (which is where I got led astray,
If you are using a distro that was released in 2006 or later,
it should be safe to say N here.
nash 6.0.9 came out in April 2007, and I suspect, but can't prove, that the
- Fix segfault in scsi vpd probe code
- Fix block device creation
That was just over 3 months ago.
Yeah right, even if it is fixed earlier, it's at least 6 months.
Post by V***@vt.edu
We probably need to fix that help text,
but I admit not being sure what guidance we should give now....
Right, the help text needs to be changed. There are distros like Red Hat
which don't support !DEPRECATED at all today, so we should probably just
remove the date and add that to the help text.

On the other hand, we are currently considering making the DEPRECATED
behavior a boot-time option, so it can be changed on the kernel command
line instead of a compile option. There would only be a compile-time
default to set.
We will get to that after the current kobject work (~100 patches) in
Greg's tree is finished.

Thanks for your help and patience,
Kay
