Hi,
I know this might sound highly improbable, but Linux has been locking up randomly. Here are the details of my problem:
Hardware:
- P4 2.8C GHz 800MHz FSB Northwood processor with HT
- 256MB Hynix RAM in single channel dynamic paging mode
- Intel D865GBF motherboard
- 120GB Samsung SATA Harddrive
- 80GB Seagate PATA Harddrive
- 20GB Seagate PATA Harddrive
- Sony DVD writer ( new )
- Netgear WGR311v2 54Mbps wifi card
- Dlink 538TX 10/100Mbps wired network card
- ATI Radeon 7000 AGP card
All hardware is about 2 years old except the one mentioned as new.
Software:
- Fedora Core 2
- Windows Server 2003
- Windows XP
Symptoms:
- The system has been locking up randomly after the addition of the Sony DVD writer.
- One peculiar symptom that my system has been exhibiting with / without the writer is that the display doesn't come on at boot up. The monitor's LED just keeps on blinking. The keyboard doesn't respond at all and the harddisk LED just keeps on glowing as if it is accessing something.
- Recently dmesg shows these suspicious entries:
    EXT3 FS on sda3, internal journal
    device-mapper: 4.3.0-ioctl (2004-09-30) initialised: dm-devel@redhat.com
    Adding 522104k swap on /dev/sda2. Priority:-1 extents:1
    program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
    program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
    .... ( 10 msgs repeatedly. Nowhere else are these messages repeated ).
I understand the swap part, but why is it complaining about a deprecated API? As I understand it, it's seeing my SATA HD as a SCSI HD. But I've been using this harddisk for a while now and I haven't seen these messages appear - ever! :(
- I've seen only 1 kernel panic and it was related to some SCSI function. I don't have the exact message, but that's all I remember!
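One way to check how often a given message repeats in the dmesg output is to count identical lines. What follows is only a rough sketch in Python, not something anyone in this thread actually ran: the function name repeated_messages and the timestamp-stripping pattern are assumptions made for illustration, and the sample text is the dmesg excerpt quoted above.

```python
import re
from collections import Counter

# Newer kernels prefix lines with "[   12.345678] "; strip it if present.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

def repeated_messages(log_text, min_count=2):
    """Return (message, count) pairs for lines seen at least min_count times."""
    cleaned = (TIMESTAMP.sub("", line).strip() for line in log_text.splitlines())
    counts = Counter(line for line in cleaned if line)
    return [(msg, n) for msg, n in counts.most_common() if n >= min_count]

# Sample taken from the dmesg excerpt quoted in this thread.
sample = """\
EXT3 FS on sda3, internal journal
Adding 522104k swap on /dev/sda2. Priority:-1 extents:1
program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
"""

for msg, n in repeated_messages(sample):
    print(f"{n}x {msg}")
```

To use it on a real machine, the saved output of dmesg could be read from a file and passed in as log_text. A high repeat count only says that one program keeps triggering the same warning; by itself it says nothing about whether the hardware is faulty.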
Troubleshooting:
- For the first problem, I've observed that after disconnecting the writer, the system has been stable for a few hours now. But I can't say with certainty if it will remain so. Will continue to test.
- I've tried removing all components from the system and keeping the bare necessities, i.e. just the keyboard, monitor, RAM, processor. The system still exhibits symptom No.2, though randomly. This problem has been occurring on and off. Initially I thought it might be due to loose contacts, dust etc... Yet even after cleaning the system and securing all components properly, the problem persists. So I'm inclined to think that the new hardware has nothing to do with the problem.
- I've also replaced the harddisk cables just in case, but to no avail. Rather, the system works better / is more stable with the old ones in place.
- I have only tried troubleshooting on Linux. I have yet to boot into Windows and see if the same problems occur. In my opinion this can't be a software-specific bug which is causing the lockups. I am stumped on problem No.1
Any inputs would be appreciated.
Bye! :)
On 11/7/06, Dinesh Joshi dinesh.a.joshi@gmail.com wrote:
Symptoms:
- The system has been locking up randomly after the addition of the Sony
DVD writer.
Put the writer back in, disable all DMA and see if system remains stable. In fact, stress-test with this config if possible.
I had a similar lock-up issue after adding an NVidia AGP card - found that my SMPS was not able to cope with all that load. But system would run stable by either:
- disabling DMA
- underclocking the CPU (lowered FSB)
I eventually went for a branded PowerSafe PSU.
. farazs
On Tuesday 07 November 2006 06:38, Faraz Shahbazker wrote:
Put the writer back in, disable all DMA and see if system remains stable. In fact, stress-test with this config if possible.
I had a similar lock-up issue after adding an NVidia AGP card - found that my SMPS was not able to cope with all that load. But system would run stable by either:
- disabling DMA
- underclocking the CPU (lowered FSB)
I eventually went for a branded PowerSafe PSU.
Will try that tomorrow. Right now I am trying to determine whether the system is stable without it. If it is then the drive definitely has something to do with my system locking up. But I still am clueless about my second problem. Umm...and won't disabling DMA severely impact the system's performance? I already have enabled the cpuspeed daemon so my system is already underclocked. I also don't think power is the issue because I used to run a DVD-ROM, CD writer along with the current config and never had any issues. My PSU is rated 300W. I think that should be sufficient.
On 11/7/06, Dinesh Joshi dinesh.a.joshi@gmail.com wrote:
On Tuesday 07 November 2006 06:38, Faraz Shahbazker wrote:
Put the writer back in, disable all DMA and see if system remains stable. In fact, stress-test with this config if possible.
about my second problem. Umm...and won't disabling DMA severely impact the system's performance? I already have enabled the cpuspeed daemon so my system is already underclocked. I also don't think power is the issue because I used to run a DVD-ROM, CD writer along with the current config and never had any issues. My PSU is rated 300W. I think that should be sufficient.
I meant that for debugging purposes - not as a permanent solution.
BTW: I ran my Celeron 2.4 underclocked @2.2 for almost a year before I bought that new PSU - for me that was a *good* solution. My old PSU was rated @ 350W - but it was the cheap local variety - so it probably didn't deliver.
The thing with hardware problems is that they are rarely deterministic - you never know which change might trigger an old itch. More precisely, the electronics are usually deterministic but the electricals are not. And it's near impossible to find someone having the *exact* same h/w config as yours for comparison.
AFAIK, cpuspeed for desktop processors is a farce. The core internally runs at the same speed, but only does work in a fraction of the cycles and no-ops the others. This translates to no power-saving and no cooling effect - just looks cool :-). It is only meaningful for Mobile processors where the clock frequency and voltage are actually scaled down in hardware.
. farazs
On Tuesday 07 November 2006 11:03, Faraz Shahbazker wrote:
I meant that for debugging purposes - not as a permanent solution.
Well, I didn't need to do that at all. The system locked up again after about 6-8 hours of operation. I had left the CPU case open just to ensure that it's getting sufficiently cooled, but it still locked up. So I don't think the new drive has anything to do with this odd behavior as it was
AFAIK, cpuspeed for desktop processors is a farce. The core internally runs at the same speed, but only does work in a fraction of the cycles and no-ops the others. This translates to no power-saving and no cooling effect - just looks cool :-). It is only meaningful for Mobile processors where the clock frequency and voltage are actually scaled down in hardware.
No...my P4's CPU frequency is definitely scaled down. I can feel the difference between the frequencies: when it gets up to 2.8GHz, it generates huge quantities of heat :P
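For anyone wanting to look at the reported clock speed directly rather than going by heat, the kernel exposes a "cpu MHz" line per logical CPU in /proc/cpuinfo on x86 Linux. Below is a small Python sketch that pulls those values out; the function name cpu_mhz and the sample text are made up for illustration. Note that this only shows what the kernel reports, so it does not by itself settle whether the core clock and voltage are really lowered or the CPU is merely being throttled.

```python
import re

def cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of each logical CPU found in cpuinfo_text."""
    pattern = r"^cpu MHz\s*:\s*([\d.]+)"
    return [float(v) for v in re.findall(pattern, cpuinfo_text, re.MULTILINE)]

# Made-up sample in the /proc/cpuinfo layout (two logical CPUs, as with HT).
sample = "processor\t: 0\ncpu MHz\t\t: 2800.000\nprocessor\t: 1\ncpu MHz\t\t: 2800.000\n"
print(cpu_mhz(sample))

# On a Linux box, read the real file; elsewhere the file simply won't exist.
try:
    with open("/proc/cpuinfo") as f:
        print(cpu_mhz(f.read()))
except OSError:
    print("/proc/cpuinfo not available on this system")
```

Running it once while idle and once under load would show whether the reported figure changes at all.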
Anyway, back on topic...I booted into a different kernel ( though it's still a 2.6.x kernel ) to see what happens. Do you think this could be a harddisk issue? I definitely think this is a low level failure...
On Wednesday 08 November 2006 00:23, Dinesh Joshi wrote:
On Tuesday 07 November 2006 11:03, Faraz Shahbazker wrote:
I meant that for debugging purposes - not as a permanent solution.
Well, I didn't need to do that at all. The system locked up again after about 6-8 hours of operation. I had left the CPU case open just to ensure that it's getting sufficiently cooled, but it still locked up. So I don't think the new drive has anything to do with this odd behavior as it was
Anyway, back on topic...I booted into a different kernel ( though it's still a 2.6.x kernel ) to see what happens. Do you think this could be a harddisk issue? I definitely think this is a low level failure...
Hmm I've faced lockups due to a hard disk failure and once a bad RAM module.
Power supply can be an issue too.
On Tuesday 07 November 2006 13:52, Mrugesh Karnik wrote:
Hmm I've faced lockups due to a hard disk failure and once a bad RAM module.
Power supply can be an issue too.
I guess the RAM could be the culprit. But I am not sure. How do I find out if RAM is the problem? memtest? Umm...the harddisk is pretty new and I would expect to see a lot of read / write timeouts in dmesg if the harddisk was dying or had bad sectors. Nothing like that except that SCSI message. Any idea what it means?
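The "look for read / write timeouts in dmesg" check can be roughly automated by filtering the log for a few tell-tale phrases. This is only a sketch: the keyword list below is an illustrative guess, not an authoritative list of kernel error strings, the function name suspicious_lines is hypothetical, and the last line of the sample log is made up to show a hit.

```python
# Illustrative guesses only; real kernel error wording varies by driver and version.
SUSPECT_KEYWORDS = ("i/o error", "timeout", "timed out", "bad sector", "medium error")

def suspicious_lines(log_text, keywords=SUSPECT_KEYWORDS):
    """Return the log lines that contain any of the keywords (case-insensitive)."""
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append(line.strip())
    return hits

# First two lines are from this thread; the third is a made-up example of a hit.
sample_log = """\
EXT3 FS on sda3, internal journal
program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
Buffer I/O error on device sda3 (made-up line)
"""

for line in suspicious_lines(sample_log):
    print(line)
```

An empty result is consistent with a healthy disk, but does not prove it; a drive's own SMART data would be a better source for that.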
Sometime on Tue, Nov 07, 2006 at 07:41:07PM +0000, Dinesh Joshi said:
On Tuesday 07 November 2006 13:52, Mrugesh Karnik wrote:
Hmm I've faced lockups due to a hard disk failure and once a bad RAM module.
Power supply can be an issue too.
I guess the RAM could be the culprit. But I am not sure. How do I find out if RAM is the problem? memtest? Umm...the harddisk is pretty new
I've seen several system freeze-ups which later turned out to be due to bad RAM modules. If Memtest reports errors, or runs for several hours, then try changing your RAM and see.
Anurag
On Tuesday 07 November 2006 14:16, Anurag wrote:
I've seen several system freeze-ups which later turned out to be due to bad RAM modules. If Memtest reports errors, or runs for several hours, then try changing your RAM and see.
Alrighty. Any particular parameters I should look at during the tests? How much time should the test take for a 256MB DDR400 ( PC3200 ) module?
On Wednesday 08 November 2006 01:29, Dinesh Joshi wrote:
On Tuesday 07 November 2006 14:16, Anurag wrote:
I've seen several system freeze-ups which later turned out to be due to bad RAM modules. If Memtest reports errors, or runs for several hours, then try changing your RAM and see.
Alrighty. Any particular parameters I should look at during the tests? How much time should the test take for a 256MB DDR400 ( PC3200 ) module?
Ideally, run memtest for 24 hours. It keeps looping. One loop is almost never enough.
Just an FYI post.
It seems my motherboard has developed some problems. I have emailed Intel Asia support for repairing / replacing it as it's still under warranty. If anybody has direct contact numbers for Intel or any helpline numbers that I can use, please let me know, as the warranty is expiring soon and I need to get it to them ASAP.
Note, Linux wasn't responsible for the lockups and those kernel messages can be safely ignored as they are just warnings. The RAM module is good as well. No problems with that.
Thanks everyone for your suggestions.
-- Dinesh A. Joshi
Dinesh Joshi wrote:
Just an FYI post.
It seems my motherboard has developed some problems. I have emailed Intel Asia support for repairing / replacing it as it's still under warranty. If anybody has direct contact numbers for Intel or any helpline numbers that I can use, please let me know, as the warranty is expiring soon and I need to get it to them ASAP.
Please do a proper cross check before you conclude that the mobo is the problem.
After that the procedure is as follows. Write a mail to their Bangalore service center. Their address is apacsupport at mailbox dot intel dot com. You have to give your full details including home/office address, tel number, model, serial number, purchase details and the problem with the board. They will reply to you and give you an ID for reference. After some days their courier will pick up the board from your place and return it after repairs in a week or two. Their email will be self-explanatory. Once you have your ID you can call them on 1800 425 6835 to track your job status.
Regards,
Rony. Send instant messages to your online friends http://uk.messenger.yahoo.com
On Monday 13 November 2006 17:51, Rony wrote:
Please do a proper cross check before you conclude that the mobo is the problem.
I have checked it properly, and so has my vendor. We have replaced everything from the SMPS to cards, RAM, everything! We were not able to replace the processor or the motherboard as I guess they are no longer available. So right now things point to a bad motherboard and / or processor.
After that the procedure is as follows. Write a mail to their Bangalore service center. Their address is apacsupport at mailbox dot intel dot com. You have to give your full details including home/office address, tel number, model, serial number, purchase details and the problem with the board. They will reply to you and give you an ID for reference. After some days their courier will pick up the board from your place and return it after repairs in a week or two. Their email will be self-explanatory. Once you have your ID you can call them on 1800 425 6835 to track your job status.
I already did that. It's been 4 days and no reply from their side. I have already given them all the details including the AA number of my board. I couldn't find the FPO number of the processor as I think I have discarded the processor's packaging. But apart from that I've given them all the problem details, troubleshooting details, everything. I tried calling them on some number given by their BKC office ( 1901... ) and I got routed to some American support who gave me a number for India, 000 6518 659....something, which apparently doesn't work..! :/
Dinesh Joshi wrote:
On Monday 13 November 2006 17:51, Rony wrote:
Please do a proper cross check before you conclude that the mobo is the problem.
I have checked it properly, and so has my vendor. We have replaced everything from the SMPS to cards, RAM, everything!
Sorry to ask again but what exactly is happening in your system? Did you test it on Windows too? Please also test it using a CD ROM and different live Linux CDs. Use CD ROM cables only, not the 'panvati' hdd cables. Have you replaced the CMOS battery and checked?
After that the procedure is as follows.
I already did that. It's been 4 days and no reply from their side. I have already given them all the details including the AA number of my board.
If the problem happens during OS usage, maybe a report on its behavior under Windows will convince them better, in case they don't know what a Linux kernel message is. ;)
Regards,
Rony.
On Monday 13 November 2006 20:52, Rony wrote:
Sorry to ask again but what exactly is happening in your system? Did you test it on Windows too? Please also test it using a CD ROM and different live Linux CDs. Use CD ROM cables only, not the 'panvati' hdd cables. Have you replaced the CMOS battery and checked?
Yup...I have disconnected all cables yaar. I have used the non panvati cables!! Replaced the battery. Done everything under the sun...! :(
Dinesh Joshi wrote:
On Tuesday 07 November 2006 13:52, Mrugesh Karnik wrote:
Hmm I've faced lockups due to a hard disk failure and once a bad RAM module.
Power supply can be an issue too.
I guess the RAM could be the culprit. But I am not sure. How do I find out if RAM is the problem? memtest? Umm...the harddisk is pretty new and I would expect to see a lot of read / write timeouts in dmesg if the harddisk was dying or had bad sectors. Nothing like that except that SCSI message. Any idea what it means?
HDD comes into the picture after bootup, but your problem occurs at power-on too. I am not sure about memtest. My system hangs within a few seconds of memtest but has been running well for 5 years. (Touch wood). Put in another RAM. Hynix has duplicates too.
Regards,
Rony.
Dinesh Joshi wrote:
Anyway, back on topic...I booted into a different kernel ( though it's still a 2.6.x kernel ) to see what happens. Do you think this could be a harddisk issue? I definitely think this is a low level failure...
Try another RAM please just to clear any doubts. :)
Regards,
Rony.
Dinesh Joshi wrote:
Anyway, back on topic...I booted into a different kernel ( though it's still a 2.6.x kernel ) to see what happens. Do you think this could be a harddisk issue? I definitely think this is a low level failure...
As Faraz has suggested, try another SMPS too.
Regards,
Rony.
Dinesh Joshi wrote:
Symptoms:
- The system has been locking up randomly after the addition of the Sony
DVD writer.
With the Intel 102 GGC mobo, Sony DVD writers are not compatible, except for the very latest drives. With the 865 it should work.
- One peculiar symptom that my system has been exhibiting with / without
the writer is that the display doesn't come on at boot up. The monitor's LED just keeps on blinking. The keyboard doesn't respond at all and the harddisk LED just keeps on glowing as if it is accessing something.
Put in a spare RAM and crosscheck.
Regards,
Rony.