Raspberry Pi 5 losing PCIe link to NVMe SSD

I have a Raspberry Pi 5 in an Argon NEO 5 M.2 case with a Crucial P3 SSD. It has been running fine for almost two years, although the PCIe link to the SSD would occasionally drop. Until recently, this wasn’t a big deal, as the controller would reset and come back within a fraction of a second:

Mar 01 09:00:07 raspberrypi kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Mar 01 09:00:07 raspberrypi kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Mar 01 09:00:07 raspberrypi kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Mar 01 09:00:07 raspberrypi kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Mar 01 09:00:07 raspberrypi kernel: nvme nvme0: 4/0/0 default/read/poll queues
Mar 01 09:00:07 raspberrypi kernel: nvme nvme0: Ignoring bogus Namespace Identifiers

Then I did both a kernel update and an rpi-eeprom-update, and suddenly the reset would fail:

Mar 01 14:05:56 personalcloud kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Mar 01 14:05:56 personalcloud kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Mar 01 14:05:56 personalcloud kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
Mar 01 14:05:56 personalcloud kernel: nvme 0001:01:00.0: enabling device (0000 -> 0002)
Mar 01 14:05:56 personalcloud kernel: nvme nvme0: Disabling device after reset failure: -19
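To get a feel for how often the link drops and whether the controller recovers, the two kinds of events can be counted in the kernel log. The following is a minimal sketch, not a monitoring tool: the function name `count_nvme_events` is made up for illustration, and the matched strings are simply copied from the log excerpts above.

```python
# Message fragments copied from the kernel log excerpts in this post.
RESET_MARK = "controller is down; will reset"
FAILED_MARK = "Disabling device after reset failure"


def count_nvme_events(lines):
    """Count reset attempts and failed resets in an iterable of log lines."""
    counts = {"resets": 0, "failed": 0}
    for line in lines:
        if RESET_MARK in line:
            counts["resets"] += 1
        elif FAILED_MARK in line:
            counts["failed"] += 1
    return counts


# Two sample lines taken from the failing log above.
sample = [
    "Mar 01 14:05:56 personalcloud kernel: nvme nvme0: controller is down; "
    "will reset: CSTS=0xffffffff, PCI_STATUS=0x10",
    "Mar 01 14:05:56 personalcloud kernel: nvme nvme0: Disabling device "
    "after reset failure: -19",
]
print(count_nvme_events(sample))
```

Feeding it the lines of `journalctl -k` output instead of the sample list should give the counts for the current boot.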

Putting the suggested nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off into /boot/firmware/cmdline.txt and rebooting did not help. Most of the time, the issue happened under I/O load, so power saving seemed an unlikely culprit anyway.
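Before ruling the parameters out, it is worth confirming that the kernel actually picked them up. On Raspberry Pi OS, everything in cmdline.txt has to stay on a single line, and the running kernel exposes its command line in /proc/cmdline. The sketch below compares the two; the helper names `missing_kernel_params` and `read_active_cmdline` are hypothetical, and the example string is an invented stand-in for a real command line.

```python
# Parameters suggested by the nvme driver in the log message above.
WANTED = [
    "nvme_core.default_ps_max_latency_us=0",
    "pcie_aspm=off",
    "pcie_port_pm=off",
]


def missing_kernel_params(cmdline, wanted=WANTED):
    """Return the entries of `wanted` that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


def read_active_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


# Invented example: a command line where one parameter was forgotten.
example = "console=tty1 rootwait nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"
print(missing_kernel_params(example))  # ['pcie_port_pm=off']
```

On the Pi itself, `missing_kernel_params(read_active_cmdline())` should return an empty list once the edit has taken effect.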

What ended up solving the issue was putting

dtparam=pciex1
dtparam=pciex1_gen=1

at the end of /boot/firmware/config.txt and rebooting. This limits the Raspberry Pi to PCIe 1.0 with its lower transfer rate. You will then see log entries like these:

Mar 05 19:32:25 personalcloud kernel: brcm-pcie 1000110000.pcie: link up, 2.5 GT/s PCIe x1 (!SSC)
Mar 05 19:32:25 personalcloud kernel: pci 0001:01:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0001:00:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
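Besides the boot log, the negotiated link speed can also be read at runtime from sysfs, where the kernel exposes `current_link_speed`, `max_link_speed` and `current_link_width` for each PCIe device. Here is a small sketch under a few assumptions: the device address 0001:01:00.0 is taken from the log above and may differ on other kernels or boards, and the function names are my own.

```python
from pathlib import Path


def parse_link_speed(text):
    """Extract the GT/s value from a sysfs string such as '2.5 GT/s PCIe'."""
    return float(text.split()[0])


def read_link_status(device="0001:01:00.0", root="/sys/bus/pci/devices"):
    """Read negotiated and maximum link parameters of one PCIe device."""
    dev = Path(root) / device
    return {
        "current_gts": parse_link_speed((dev / "current_link_speed").read_text()),
        "max_gts": parse_link_speed((dev / "max_link_speed").read_text()),
        "current_width": int((dev / "current_link_width").read_text()),
    }


# Parsing the value expected with pciex1_gen=1 (no hardware access needed).
print(parse_link_speed("2.5 GT/s PCIe\n"))  # 2.5
```

Calling `read_link_status()` on the Pi should report a current speed of 2.5 when the limit is active.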

The Raspberry Pi 5 is capable of PCIe 3.0, but because it is not certified for 3.0, it defaults to 2.0 unless you specify pciex1_gen=3. I had actually tried pciex1_gen=3 in the past and found that it significantly increased the frequency of PCIe resets. Luckily, the same parameter can also limit the link to PCIe 1.0. PCIe 1.0’s lower data rate corresponds to a lower signal frequency, which relaxes the requirements on cable quality and reduces the sensitivity to electromagnetic interference compared to PCIe 2.0, let alone 3.0.

The Argon’s cable that connects the M.2 riser to the Raspberry Pi’s PCIe port always looked a bit flimsy to me. How could it support the multi-GHz signal frequencies of PCIe without any extra shielding? My conclusion now is: it cannot. But that is not a big deal, as most applications you would run on a Raspberry Pi won’t benefit from more bandwidth than PCIe 1.0 provides.
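The bandwidth figures in the kernel log follow from the transfer rate and the line encoding of each PCIe generation: 1.0 and 2.0 use 8b/10b encoding, so 20% of the raw bits are overhead, while 3.0 uses 128b/130b. A rough sketch of the arithmetic, with `usable_gbps` being a name I made up for this example:

```python
# Per generation: (transfer rate in GT/s, payload bits, total bits on the wire)
ENCODING = {
    1: (2.5, 8, 10),      # PCIe 1.0, 8b/10b
    2: (5.0, 8, 10),      # PCIe 2.0, 8b/10b
    3: (8.0, 128, 130),   # PCIe 3.0, 128b/130b
}


def usable_gbps(gen, lanes=1):
    """Usable link bandwidth in Gb/s, before any protocol overhead."""
    rate, payload, total = ENCODING[gen]
    return rate * payload / total * lanes


print(usable_gbps(1))                     # 2.0, i.e. 250 MB/s
print(usable_gbps(2))                     # 4.0, i.e. 500 MB/s
print(round(usable_gbps(3, lanes=4), 2))  # 31.51
```

The first value matches the 2.000 Gb/s in the log, and the last one is close to the 31.504 Gb/s the kernel reports for what the SSD could do on an x4 link; the small difference is presumably down to rounding in the kernel.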
