
Thread: GPU Failing?

  1. #1
    Boycott shampoo! Demand the REAL poo! AOD Member AOD_Ziggy0511's Avatar
    Rank
    Private First Class
    Division
    Battlefield
    Status
    Active
    Join Date
    Aug 2014
    Age
    26
    Posts
    194

    GPU Failing?

    Specs:
    i5 3570k (stock speeds atm)
    Xigmatek Dark Knight CPU Cooler
    ASUS P8Z77V-Pro
    MSI GTX 780 Lightning
    Antec 750W HCG
    Intel 256 GB SSD (Games)
    Samsung 128 GB SSD (OS)
    Seagate 3TB HDD (Media)



    Problem:
    Hard restarts during games/benchmarks. The computer shuts off as if the power cord had been pulled and restarts a few seconds later (Event Log: Kernel-Power 41 (63)). It first started happening intermittently a few months ago (a restart every couple of days) and has increased to the point where I can no longer run any game or benchmark without a restart. It only lasts 3-5 minutes now before restarting.

    I have performed all sorts of tests trying to determine the cause, which would most likely seem to be a heat or power supply issue. Multiple passes of memtest came back clean. Prime95 and IntelBurnTest ran for extended periods just fine. Ruling out memory and CPU (which is at default clock speed for the moment) leaves the GPU and PSU as the likely sources of the error.

    This is where I came across something interesting. While running FurMark to stress test the GPU, I was getting fairly consistent restarts a couple of minutes in. However, on one run I decided to manually run the GPU fans at 100% for the entire test, and what do you know, it made it through the whole test for the first time in days. This made me rule out the PSU as the culprit, because the system made it through the whole test just fine. So I kept testing, slowly decreasing the fan % each run. Basically, it crashed every time the fans were significantly lower than 100%.

    So it must be overheating, right? No, well, maybe. Obviously I was keeping an eye on the GPU temps while running the benchmarks. Most of the restarts happen when the card is at 60-72C, which from my understanding is pretty normal for a GPU. With the fans at 100% the GPU stays right around 56C max. I don't think it is drivers, as this has been happening through multiple driver versions. It seems like it must be something to do with the GPU, but I just don't buy that it is overheating. It doesn't throttle or drop fps before it happens; it just goes down hard. I have thoroughly cleaned the heatsinks and whatnot with compressed air.


    I have put in an online ticket with MSI to troubleshoot the card, possibly RMA.

    Any other ideas?

    Thanks in advance for any consideration and help.

  2. #2
    Boycott shampoo! Demand the REAL poo! AOD Member AOD_Ziggy0511's Avatar

    Forgot to mention I am running Windows 8.1 and am currently on Nvidia driver 355.82.

  3. #3
    "Oh great, here comes Captain Dipshit in a LAV" - Pyle986 AOD Member AOD_Grady666's Avatar
    Rank
    Specialist
    Division
    Battlefield
    Status
    Active
    Join Date
    Apr 2015
    Location
    US
    Age
    22
    Posts
    1,418


    Could be a number of things. You have a 750W PSU, so there *shouldn't* be anything going on with the PSU overloading and cutting DC power (which would instantly shut down the PC). Have you tried using a different keyboard/mouse, and swapping out any other peripherals running off USB?

    Also, have you tried doing a clean install with Display Driver Uninstaller: http://www.guru3d.com/files-details/...-download.html
    Do the clean install option (it'll say "for installing a new graphics card"), uninstall your current drivers, save the installer for the latest Nvidia WHQL driver on a flash drive, shut down, reseat your graphics card, run the installer off the flash drive, then reboot once more. That's the most thorough way to cleanly uninstall and reinstall your graphics card drivers. It might work, might not, but it's worth a try.

    Give me an update if/when you try this

  4. #4
    Boycott shampoo! Demand the REAL poo! AOD Member AOD_Ziggy0511's Avatar

    Tried a clean driver install. Unfortunately the PC still restarted while running FurMark. It did seem to stay up noticeably longer, though (7.5 minutes as opposed to 3-4, which could have been due to the cold boot). I repeated my 100% fan test and the system stayed up for the whole 10-minute run again. Temps were about the same: the restart occurred at 72C, and the 100% run floated between 57-58C once it got going.

    I haven't messed with any of my peripherals yet aside from unplugging my 360 controller. All that is connected is a keyboard, mouse, headphones, speakers, and a mic. The only other keyboard and mouse I have is a wireless combo that runs off a USB adapter deelibop. Would it be better to plug my current keyboard/mouse into other USB ports? Or should I set up the wireless ones and see if anything changes? That would require installing the wireless software as well. Regardless, I don't think they are the problem, though it can't hurt to test it out.
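Because each restart wipes whatever was on screen, one way to capture the last readings before the machine goes down is to log GPU telemetry to disk and force every row out of the write cache. The following is a minimal sketch, not a tested tool: it assumes `nvidia-smi` is on the PATH and that the driver reports all of the listed fields for this card. The file name `gpu_log.csv` and the one-second interval are arbitrary choices.

```python
import csv
import os
import subprocess
import time

# Fields passed to nvidia-smi's --query-gpu option.
FIELDS = ["timestamp", "temperature.gpu", "fan.speed", "power.draw", "clocks.sm"]


def parse_sample(line):
    """Split one CSV line from nvidia-smi into a dict keyed by field name."""
    values = [v.strip() for v in line.split(",")]
    return dict(zip(FIELDS, values))


def read_sample():
    """Ask nvidia-smi for one sample (assumes nvidia-smi is on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def log_forever(path="gpu_log.csv", interval_s=1.0):
    """Append one row per interval and force it to disk each time."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        while True:
            writer.writerow(read_sample())
            f.flush()
            os.fsync(f.fileno())  # so the last row survives a hard power-off
            time.sleep(interval_s)
```

Calling `log_forever()` in a separate console before starting the stress test should leave the final temperature, fan speed, and power draw in the last row of the file after the next restart.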

  5. #5
    I get enough exercise just pushing my luck 13uckFUtter's Avatar
    Rank
    Forum Member
    Division
    None
    Status
    Active
    Join Date
    Feb 2012
    Location
    Rockford
    Age
    28
    Posts
    339


    A 780 Lightning shouldn't be dying that fast, but it's always possible.

  6. #6
    Boycott shampoo! Demand the REAL poo! AOD Member AOD_Ziggy0511's Avatar

    Quote Originally Posted by AOD_13uckFUtter View Post
    A 780 Lightning shouldn't be dying that fast, but it's always possible.
    Yeah, I purchased the card new in Jan 2014. Luckily it is still under warranty (2 years). I am considering a clean OS install. Would that effectively rule out any driver or software issues?

  7. #7
    I get enough exercise just pushing my luck Marrv's Avatar
    Rank
    Forum Member
    Division
    None
    Status
    Active
    Join Date
    May 2015
    Age
    33
    Posts
    336


    Also worth considering a mobo issue. Not that they are easy to diagnose (you basically rule everything else out by putting known-good parts in and testing, which is a pain for most people who do not have spare parts lying around...), but when a board has issues it can cause all sorts of mayhem, including what you're having.

    Just to be troublesome: while you have tested that your PSU can supply enough power to the components, it may not be providing steady power. It could be spiking, which causes issues. If you know about OC-ing before all the modern tools, you did it by adjusting voltages to the RAM modules; if the PSU is not supplying correct voltages, it will cause a hard crash. This is very rarely the case these days as most PSUs have gotten better, but some bad ones still get through.

    Also, have you disabled automatic restart to see if a full error report or any additional information gets created? (Judging by what you've done above I assume you have, but best to be sure.)
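For anyone wanting to pull the Kernel-Power 41 entries mentioned in the first post without clicking through Event Viewer, here is a rough sketch that builds a `wevtutil` query from Python. It is Windows-only, and the function names are made up for illustration. As far as I know, a BugcheckCode of 0 in the event details usually means the machine lost power or hung without a blue screen, in which case there would be no crash dump to find either way.

```python
import subprocess


def kernel_power_query(max_events=5):
    """Build a wevtutil command listing the newest Kernel-Power 41 events."""
    xpath = ("*[System[Provider[@Name='Microsoft-Windows-Kernel-Power']"
             " and (EventID=41)]]")
    return [
        "wevtutil", "qe", "System",
        "/q:" + xpath,
        "/c:" + str(max_events),  # cap the number of events returned
        "/rd:true",               # newest first
        "/f:text",                # human-readable output
    ]


def recent_kernel_power_events(max_events=5):
    """Run the query (Windows only) and return wevtutil's text output."""
    result = subprocess.run(kernel_power_query(max_events),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Printing `recent_kernel_power_events()` after a restart shows the timestamps, which can then be lined up against whatever temperature or voltage log was running at the time.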

  8. #8
    You are depriving some poor village of its idiot AOD Member AOD_Timmee45's Avatar
    Rank
    Specialist
    Division
    Battlefield
    Status
    Active
    Join Date
    Jul 2011
    Location
    Pittsburgh, PA
    Posts
    726


    Run something like HWMonitor (from CPUID, the makers of CPU-Z), which will also pull motherboard temps and supplied voltages, so you can see if you're getting PSU voltage dips when it reboots during the FurMark test. It will also tell you if your board is getting hot and causing the restarts, with the extra airflow from the increased GPU fan speed masking the problem.
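To make "voltage dips" concrete: the ATX spec allows the positive rails to drift about 5% either side of nominal, so +12V should stay between roughly 11.4V and 12.6V. Below is a small sketch of that check. The readings are made-up numbers typed in by hand, not data from this thread, and software-reported motherboard voltages can be inaccurate, so treat a flagged value as a hint to verify with a multimeter and not as proof.

```python
# Nominal rail voltages; the 5% band follows the ATX tolerance for positive rails.
NOMINAL = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05


def out_of_spec(samples):
    """Return (index, rail, volts) for every reading outside nominal +/- 5%."""
    problems = []
    for i, sample in enumerate(samples):
        for rail, nominal in NOMINAL.items():
            volts = sample.get(rail)
            if volts is None:
                continue  # rail not reported in this sample
            if abs(volts - nominal) > nominal * TOLERANCE:
                problems.append((i, rail, volts))
    return problems


# Hypothetical readings, entered by hand for illustration only.
readings = [
    {"+12V": 12.10, "+5V": 5.04, "+3.3V": 3.31},
    {"+12V": 11.30, "+5V": 5.02, "+3.3V": 3.30},
]
print(out_of_spec(readings))
```

Here the second sample's +12V reading of 11.30V falls below the 11.4V floor, so it is the only one reported.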

  9. #9
    Boycott shampoo! Demand the REAL poo! AOD Member AOD_Ziggy0511's Avatar

    I had considered the motherboard as the possible source of my pain; I just couldn't think of a good way to test it. I had been keeping an eye on mobo temps with HWMonitor throughout most of my testing. I woke up this morning and it occurred to me to try running the GPU in a different PCIe x16 slot. 3x 10-minute runs of FurMark and no restarts. GPU temp was floating around 74C for pretty much the whole second and third runs. So it seems to be a motherboard problem now. The 100% fans on the GPU must have been cooling whatever motherboard component was having the issue. I am going to try some BF4 and see how that goes. Assuming that works out, I will reapply my CPU overclock and test with FurMark, then real games again. But it seems to be solved. I will report back if anything comes up.
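One thing worth checking after moving a card to a different slot is the PCIe link it actually negotiates, since secondary x16-length slots on many boards are wired for fewer lanes. A quick sketch of that check, assuming `nvidia-smi` is on the PATH and the driver exposes these two query fields for the card. The link generation can drop at idle to save power, so it is best read while the GPU is under load.

```python
import subprocess


def parse_link(line):
    """Turn nvidia-smi output such as '3, 8' into (generation, width)."""
    gen, width = (int(part.strip()) for part in line.split(","))
    return gen, width


def current_link():
    """Query the first GPU's current PCIe link (assumes nvidia-smi is on PATH)."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=pcie.link.gen.current,pcie.link.width.current",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_link(out.splitlines()[0])
```

A result like `(3, 8)` from `current_link()` would mean the card is running at PCIe 3.0 x8 in its new slot.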

  10. #10
    You are depriving some poor village of its idiot AOD Member AOD_Timmee45's Avatar

    You can also test a BIOS update to see if there was a possible known issue with the top (blue) PCIe slot that got resolved with an update.

  11. #11
    I get enough exercise just pushing my luck Marrv's Avatar

    Quote Originally Posted by AOD_Timmee45 View Post
    You can also test a BIOS update to see if there was a possible known issue with the top (blue) PCIe slot that got resolved with an update.
    There are a lot of BIOS updates for his mobo: https://www.asus.com/Motherboards/P8...Desk_Download/

  12. #12
    Boycott shampoo! Demand the REAL poo! AOD Member AOD_Ziggy0511's Avatar

    Quote Originally Posted by AOD_Timmee45 View Post
    You can also test a BIOS update to see if there was a possible known issue with the top (blue) PCIe slot that got resolved with an update.
    Quote Originally Posted by AOD_Marrv View Post
    There are a lot of BIOS updates for his mobo: https://www.asus.com/Motherboards/P8...Desk_Download/

    I forgot to mention that I did update the motherboard BIOS before enlisting the help of AOD in troubleshooting my rig. I was on the original version from when I did the build, but updated to the latest while I was troubleshooting on my own. Performance and temps seem to be fine, so it looks like I'm in the clear. I am probably going to try to hold off on replacing the board until it's time to get a new CPU. Hopefully she can hold out like this for a year or two.

  13. #13
    Boycott shampoo! Demand the REAL poo! AOD Member AOD_Ziggy0511's Avatar

    Also wanna give you guys a round of applause for all your help. I really do appreciate it.

  14. #14
    If I'm not back in 5....wait longer! AOD Member AOD_Jayson201's Avatar
    Rank
    Private
    Division
    Battlefield
    Status
    Active
    Join Date
    Jun 2015
    Age
    21
    Posts
    95


    Question: how is the wiring in your house?
    I had issues with my rig where the PSU made noises and it shut down randomly. I thought it was the power supply, but it was really the old '50s wiring in my house. Switched to a modern outlet and bam: no noises, no shutdowns, no problem. Turns out it was a ground fault.
    Get yourself a little $5 outlet tester from Walmart; it'll tell you if your outlets are okay. My theory is that, best case, your rig is fine and your house is supplying wacky power: when you push the GPU, that extra power draw trips something somewhere, and maybe upping the fan speed balances out the iffy power.
    I'm not a professional, nor an electrician. But if bad power can make my power supply make some weird ass grinding sounds, then it can probably do worse.

    Not sure if that even is a best case; I got lucky and had good outlets in the same room. Might not be the same for everyone.


 
