Ordered Chaotic Discussions

Several BSODs in the last few days

troubleshooting
windows

#1

Windows 10 Pro (1803)
CPU: Ryzen 7 1700 (happens with both OC and stock)
RAM: G.Skill Flare X DDR4-3200 (happens with both OC and stock)
Motherboard: Asus Crosshair VII Hero Wi-Fi
GPU: MSI RX 480 Gaming X 8G (stock)
SSD: WD Black 256GB NVMe
HDD: 2x WD Red 8TB (white label) / 1x HGST 7k3000 / 1x HGST 7k4000
PSU: Seasonic Prime Platinum 850W

BIOS is the latest version.
All drivers are up to date.
I lost count of how many times I uninstalled and reinstalled the GPU driver (tried different versions, 18.12.X and 18.10.1; it doesn’t make a difference).
Did it in Safe Mode and using DDU (latest version) and AMD CleanupUtility.

It started with a BSOD IRQL_NOT_LESS_OR_EQUAL a few days ago.
Then a few MEMORY_MANAGEMENT, atikmdag.sys, dxgmms2.sys and SYSTEM_THREAD_EXCEPTION_NOT_HANDLED.

This happens in World of Warcraft after ~15-30min. On one occasion it happened when just Firefox, Twitch App and Discord were open.

I disabled hardware acceleration in FF and Twitch App as the most recent change. So far no BSODs when using FF and Twitch App simultaneously.

DISM, SFC scan and Windows Memory Diagnostic didn’t show any errors or solve the issue.

I’m at my wits’ end as nothing seems to work.

Minidump files:
https://drive.google.com/open?id=1vErnTaTsrlcwyDPR5M_yGlzR3OjVTwrN


#2

Just had another BSOD (MEMORY_MANAGEMENT).

Minidump file:
https://drive.google.com/open?id=1Oa5YiJszcWOjzwPiesZdkXHAV-B-KrHo

memory.dmp file:
https://drive.google.com/open?id=1AML0nYXmkaIW_AnjFd9i5839jIHruoDN

At the time it occurred I had FF, Twitch App, Discord App and Deluge running.
OC: CPU - 3.85GHz, 1.3V; RAM - DDR4-3200 C14-14-14-28, 1.4V


#3

Just found a way to trigger the BSODs (3 kinds so far) by running a scan with Malwarebytes. It always hangs for ~1sec and then immediately shows the BSOD at “Scanning File System”.
The 3 BSODs I was able to trigger that way (kinda hoping to see which file would cause it, no real success though; one of them was D3DX11 related) were:

  • SYSTEM_SERVICE_EXCEPTION (cause: ntfs.sys)
  • BAD_POOL_HEADER (the scan froze at D3DX11, I think)
  • KERNEL_SECURITY_CHECK_FAILURE (followed by Windows not loading correctly and sending me to the repair screen).

UPDATE: so much for that. Uninstalled Malwarebytes and installed the newest version -> no more BSODs while scanning. Makes it to the end with no threats detected.

nvm, just started a complete scan and had a BSOD again (KERNEL_SECURITY_CHECK_FAILURE).
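
For reference, the stop codes named so far map to the bug check values below; atikmdag.sys and dxgmms2.sys are driver files (AMD display driver and DirectX graphics), not stop codes, so they are left out. This is only a lookup sketch: the table covers just the names from this thread, and `format_stop_code` is a made-up helper name.

```python
# Bug check values for the stop codes mentioned in this thread.
# Names follow Microsoft's bug check reference (0x3B has no _FAILURE suffix).
BUG_CHECKS = {
    "IRQL_NOT_LESS_OR_EQUAL": 0x0A,
    "BAD_POOL_HEADER": 0x19,
    "MEMORY_MANAGEMENT": 0x1A,
    "SYSTEM_SERVICE_EXCEPTION": 0x3B,
    "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED": 0x7E,
    "KERNEL_SECURITY_CHECK_FAILURE": 0x139,
}


def format_stop_code(name: str) -> str:
    """Return the bug check value as zero-padded 8-digit hex, e.g. 0x0000001A."""
    return f"0x{BUG_CHECKS[name]:08X}"


if __name__ == "__main__":
    # Print the table sorted by numeric value.
    for name in sorted(BUG_CHECKS, key=BUG_CHECKS.get):
        print(f"{format_stop_code(name)}  {name}")
```

Six different stop codes spread across unrelated components (graphics, file system, pool memory) is the kind of pattern that tends to point at hardware rather than one bad driver, though the codes alone can’t prove that.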

Update: think I found the culprit:
(screenshot: vlcsnap-2018-12-30-02h47m12s548)

Update: memtest86 showed that 1 stick would instantly produce errors (50k+ in under a minute), while the other stick would pass just fine. I didn’t test them for hours, just 1 test pass each. But I did try them in all 4 ram slots individually. Should be safe to say that this kit is also defective.
I’m wondering, though, what caused it. Passing 15h/18 passes while OCed but then suddenly one stick dies? Weird.


(The Lazy) #4

Something is very fucky atm. These checks should almost never fail unless something is screwing with them. I’m guessing either the IMC on the CPU is dying or your RAM is dead. What BIOS settings have you adjusted regarding stuff like LLC (load-line calibration) and SoC voltage?


#5

Stock settings (auto).
When I OCed the CPU I only changed the multiplier and core voltage: 3.85GHz @1.3V and 3.9GHz @1.35V, but I wasn’t comfortable with the latter as it led to Prime95 freezing (both on the old Gigabyte board and the C7H now). But when the issues started I set the BIOS back to default settings.

Stick 1 tested okay in every RAM slot. Stick 2 produced errors right at the first test in Memtest86 in 3 of the 4 RAM slots (after the test failed in 3 slots I didn’t bother testing the last one). PC has been running stable with only stick 1 installed.
So…bad RAM or is the IMC still a possibility?
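
The swap test described above boils down to one question: does the fault follow the stick or the slot? Here is a minimal sketch of that reasoning. The slot names A1/A2/B1/B2 and the `diagnose` function are made up for illustration. A "stick" verdict is a heuristic, not proof the IMC is fine, since a marginal controller may only misbehave with both sticks installed.

```python
from typing import Dict, List, Tuple

# (stick, slot) -> True if the memtest pass finished without errors
Results = Dict[Tuple[str, str], bool]


def diagnose(results: Results) -> Tuple[str, List[str]]:
    """Classify single-stick memtest results by what the errors follow."""
    if not results:
        raise ValueError("no test results given")

    sticks = sorted({stick for stick, _ in results})
    slots = sorted({slot for _, slot in results})

    def outcomes_for_stick(stick: str) -> List[bool]:
        return [ok for (s, _), ok in results.items() if s == stick]

    def outcomes_for_slot(slot: str) -> List[bool]:
        return [ok for (_, sl), ok in results.items() if sl == slot]

    if all(results.values()):
        return ("clean", [])
    if not any(results.values()):
        # Everything fails: could be IMC, board, settings or both sticks.
        return ("inconclusive", [])

    # Errors follow a stick: it fails wherever it was tested while
    # another stick passes wherever it was tested.
    bad_sticks = [s for s in sticks if not any(outcomes_for_stick(s))]
    good_sticks = [s for s in sticks if all(outcomes_for_stick(s))]
    if bad_sticks and good_sticks:
        return ("stick", bad_sticks)

    # Errors follow a slot: every stick tested in it fails.
    bad_slots = [sl for sl in slots if not any(outcomes_for_slot(sl))]
    if bad_slots:
        return ("slot", bad_slots)
    return ("mixed", [])


# Results as reported in this thread (slot names are placeholders).
thread_results: Results = {
    ("stick1", "A1"): True, ("stick1", "A2"): True,
    ("stick1", "B1"): True, ("stick1", "B2"): True,
    ("stick2", "A1"): False, ("stick2", "A2"): False,
    ("stick2", "B1"): False,  # fourth slot was not tested
}

if __name__ == "__main__":
    print(diagnose(thread_results))
```

With the results reported here the errors follow stick 2, which fits a bad module better than a bad controller.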


#6

Or - asked in a different way - how can I tell if it’s the IMC or RAM?


#7

Do you still have the old ram kits? If it’s just this ram that’s the easier way to tell. If it’s on more than one kit of ram… Either you are extremely unlucky as far as ram goes or it’s more than just the ram.


#8

Nope.

My RAM history on this mobo:

  • Trident Z 3200MHz CL15
    That’s the one I thought was faulty as I got errors on Test 7 in Memtest. Was probably because I only applied the DDR4-3200 XMP profile, which made it unstable.
    Sent back to Amazon already.

  • Flare X 3200MHz CL14 - Kit #1
    Dead. Mobo wouldn’t even post. It showed 2 Q-Codes but I forgot what they were: one with 2 sticks installed, one with 1 stick. One of the codes wasn’t even specified in the manual and was listed as reserved.
    Sent back to Amazon already.

  • Flare X 3200MHz CL14 - Kit #2
    Current one. 1 stick seems to cause BSODs, the other one doesn’t (running since yesterday without a single BSOD, just 1 DX12 crash, no issues with DX11).
    Currently 1 of 2 sticks in use. Going back to Amazon soon.

  • Trident Z 3200MHz CL14
    Should arrive on Wednesday.


#9

also
3200MHz is very hit or miss on Ryzen gen 1
very.
so there’s also that. oh and XMP is still quite broken on AMD platforms (or… it was on gen 1, and I am not sure how much 2nd gen mobos/Ryzen improved it), so I would not recommend using XMP.
running at 2933 with lower timings might help.
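
To put the speed/timings trade-off in numbers: first-word CAS latency in nanoseconds is roughly CL × 2000 / data rate (MT/s), because DDR transfers twice per clock. A minimal sketch follows. `cas_latency_ns` is a made-up helper, and the DDR4-2933 CL14 entry is an assumed setting, not one anybody here tested.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """First-word CAS latency in nanoseconds.

    DDR moves data twice per clock, so the memory clock in MHz is
    data_rate / 2 and one clock cycle lasts 2000 / data_rate ns.
    """
    if data_rate_mts <= 0:
        raise ValueError("data rate must be positive")
    return cas_cycles * 2000.0 / data_rate_mts


if __name__ == "__main__":
    configs = [
        ("DDR4-3200 CL14", 3200, 14),
        ("DDR4-3200 CL15", 3200, 15),
        ("DDR4-2933 CL14", 2933, 14),  # assumed timing, not from the thread
    ]
    for label, rate, cl in configs:
        print(f"{label}: {cas_latency_ns(rate, cl):.2f} ns")
```

By this measure DDR4-3200 CL14 comes out at 8.75 ns, and dropping to 2933 at the same CL costs less than 1 ns, which is a small price if it buys stability.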


#10

2933 seems to work out of the box with XMP.
And yeah, 3200 was unstable with XMP but passed 15h/18 passes of Memtest with a pre-saved OC profile in the BIOS (3200MHz CL14-14-14-28-50 1.4V).

I think I’m done with OCing for a while. Just want a stable system.

And the BSODs happened without any OC applied (CPU+RAM) too.


(The Lazy) #11

Part of it was mobo/Ryzen and part of it was RAM. It’s better now but still not great from what I hear.

If you get errors from the Trident Z kit then I would strongly suggest looking into that IMC.


#12

If that happens then I’ll return it to Amazon for sure (refund) and get a 2700X.
tbh, ever since you mentioned the IMC possibly being at fault, I don’t trust the CPU 100% anymore. I’ve got no reason for that. The system’s been running stable ever since I removed the seemingly faulty RAM stick.

Got a few errors in the Event Viewer (DistributedCOM, Kernel-Power, EventLog, Kernel-Boot) but no BSODs. DistributedCOM has been a common thing but I haven’t noticed any issues. The other 3 are from the same event, notifying me about an unexpected shutdown (which must have happened when I booted yesterday, but I absolutely haven’t noticed anything; it started normally).

Out of curiosity, how does one test for that?
With currently one RAM stick working in every slot and the other one causing errors/not letting me boot, how would I go about testing the IMC as the cause? I’ve read that test 7 or 8 in Memtest are kinda heavy on the IMC, but nothing specific.