Skip to content

mimxrt: Fix corruption due to sdcard transfer alignment.#19304

Merged
projectgus merged 3 commits into
micropython:masterfrom
projectgus:bugfix/mimxrt_sdcard_dma_align
Jun 11, 2026
Merged

mimxrt: Fix corruption due to sdcard transfer alignment.#19304
projectgus merged 3 commits into
micropython:masterfrom
projectgus:bugfix/mimxrt_sdcard_dma_align

Conversation

@projectgus

@projectgus projectgus commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #19010. Follow-up to PR #19140 which fixed this bug on stm32, and this PR is to fix the same bug on mimxrt.

This PR has some overlap with #18451, I apologise that I haven't looked super closely at that PR yet.

First commit is to add a test:

  • Adapts the sdcard_dma_align.py test from stm32: Fix cache corruption on unaligned SDCard reads #19140 so it now works on mimxrt as well. This test relies on a high frequency timer interrupt where the CPU reads/writes memory throughout the SD transfer. It's a little less effective on mimxrt because the max timer frequency is only 1kHz - however it still seemed able to trigger the bug.

Then the fix:

  • Disable DMA if the buffer is not DCache aligned. This is to prevent corruption caused by CPU writes to the same cache line as the start or end of the buffer. The SDCard block API doesn't return until the operation is done, so disabling DMA shouldn't have much impact apart from a small power consumption increase.
  • Invalidate DCache for the DMA buffer region after the transfer completes. This avoids stale cache lines if the CPU has read into the cache while the transfer was in progress (including due to a speculative read). The cache is already correctly cleaned before the transfer (this happens in the NXP driver layer).

Testing

  • Tested sdcard_dma_align.py before and after the fix on TEENSY41 & MIMXRT1050. Both boards fail without the fix, and pass with the fix. (An interesting note, the first part of the fix - disabling DMA - doesn't seem to be necessary to pass the test on MIMXRT1050, even running a high number of repeats. I think this is because the interrupts only trigger at 1kHz so it's very hard to have one trigger while the start or end cache line is being read, and the driver itself disables DMA internally if a buffer isn't 4-byte aligned so we're aiming for a very small target. This change was, however, very necessary for the test to pass on the Teensy - not entirely sure what the difference is there.)
  • Did a special test run where I multiplied the REPEATS value by 30 on both boards, checked these extended ~20 minute test runs also pass with the fix applied.
  • Ran the updated test on PYBD_SF6 to ensure it still works on stm32 port (it does).

Trade-offs and Alternatives

  • As discussed in stm32: Fix cache corruption on unaligned SDCard reads #19140, it's not viable to always ensure a DMA buffer is aligned to DCache. Some kind of workaround or special case handler is needed.
  • By disabling DMA like this it may be harder to swap to a non-blocking API, as is being trialled in mimxrt: Fix SD card deadlock and implement proper timeout handling. #18451.
  • It might be possible to fix this using the same dma_protect_rx_region() method that we used on stm32 instead, however (a) that method doesn't scale to multiple concurrent DMA transfers, and (b) I couldn't actually figure out where the MPU is configured on mimxrt! (Although I didn't look for that long.)
  • There is support to submit a table of multiple DMA descriptors for a single transfer in the NXP USDHC driver, but it looks like the driver has been designed only for a very specific use case where the descriptors are updated dynamically during the transfer(!) We could maybe write our own USDHC driver so we could chain small aligned "head" and "tail" transfers with a large aligned transfer in the middle, but it would be a lot of work and possibly we'd be the first people to try and use the NXP DMA hardware in this way (which makes me nervous).

Generative AI

I did not use generative AI tools when creating this PR.

@projectgus

Copy link
Copy Markdown
Contributor Author

@andrewleech @robert-hh This may be relevant to your discussions in #18451 (apologies for not engaging with that PR, yet, as I see it has some overlapping changes.)

@kwagyeman Hopefully of use to you as well. 🙂

@codecov

codecov Bot commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.48%. Comparing base (aa25a4e) to head (188c61e).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #19304   +/-   ##
=======================================
  Coverage   98.48%   98.48%           
=======================================
  Files         176      176           
  Lines       22910    22910           
=======================================
  Hits        22564    22564           
  Misses        346      346           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown

Code size report:

Reference:  extmod/modtls_mbedtls: Add TLS-PSK support. [aa25a4e]
Comparison: tests/extmod_hardware/machine_sdcard_dma_align: Fix for native emitter. [merge of 188c61e]
  mpy-cross:    +0 +0.000% 
   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
      esp32:    +0 +0.000% ESP32_GENERIC
     mimxrt:   +96 +0.024% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

@robert-hh

Copy link
Copy Markdown
Contributor

@projectgus I verified that this PR has a good effect. Testing with a MIXRT1050EVK, the test reports errors during interrupted read using the master branch, and it passes fine with this PR. The test was repeated 5 times. Also tested with MIMXRT1020_EVK, Teensy 4.1 and MIMXRT1176_EVK with 3 test runs each. No fail.

@kwagyeman

kwagyeman commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

I checked the driver. I see clean on tx data, and with the changes applied, you have a clean/invalidate on rx before and an invalidate afterward. This is the correct behavior.

If the test is passing, then it should be good!

@kwagyeman kwagyeman moved this to In progress in OpenMV Features Jun 6, 2026
@projectgus projectgus added this to the release-1.29.0 milestone Jun 9, 2026
@projectgus projectgus requested a review from dpgeorge June 9, 2026 23:32

@dpgeorge dpgeorge left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good! I verified that the test still passes on PYBD_SF6. I did not test any mimxrt board.

Comment thread tests/extmod_hardware/machine_sdcard_dma_align.py
print("SKIP")
raise SystemExit

READBLOCKS_SUCCESS = (0, True) # some drivers return True for success, some return 0...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this will be fixed by #16223 ... I've rebased that so that PR should be ready to go.

Nothing to change here. I'll have to update that PR if this is merged before it.

@octoprobe-bot

octoprobe-bot commented Jun 10, 2026

Copy link
Copy Markdown

Octoprobe PR report

Test Tests
passed
Tests
skipped
Tests
xfailed
Tests
failed
format flash 1
run-tests.py 920 97
run-tests.py --via-mpy --emit native 905 112
run-tests.py --via-mpy 920 97
run-perfbench.py 24
run-natmodtests.py 37 4
run-tests.py --test-dirs=extmod_hardware 6 3 1
run-tests.py --test-dirs=extmod_hardware --emit-native 6 3 1
Total 2818 317 2
Failures

@dpgeorge

Copy link
Copy Markdown
Member

Octoprobe is picking up a real problem here! Yay for Octoprobe! @hmaerki

The issue is that sdcard_dma_align.py uses raise without arguments, which is not supported by the native emitter. I have a PR open to fix this, #19245.

Now runs on mimxrt as well (less effective due to virtual timers only).
Can be easily extended to run on other ports.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
- Invalidate the DCache after reads complete, to avoid stale cache lines
  that were read during the operation (the driver already cleans the cache
  before the operation).

- Disable DMA if the read buffer isn't DCache aligned, to avoid corruption
  if the CPU writes to the same cache line as the buffer during the
  operation.

The sdcard_dma_align.py test added in the parent commit is fixed by this
commit.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Raise without an argument isn't supported by native emitter.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
@projectgus projectgus force-pushed the bugfix/mimxrt_sdcard_dma_align branch from de903e0 to 188c61e Compare June 11, 2026 00:17
@projectgus

projectgus commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Yay Octoprobe! Have rebased with a commit to add the raise argument, and kicked off an Octoprobe run again.

@dpgeorge

Copy link
Copy Markdown
Member

Octoprobe updated the comment! And there are no failures!

@projectgus projectgus merged commit 7b06871 into micropython:master Jun 11, 2026
39 checks passed
@github-project-automation github-project-automation Bot moved this from In progress to Done in OpenMV Features Jun 11, 2026
@projectgus projectgus deleted the bugfix/mimxrt_sdcard_dma_align branch June 11, 2026 03:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

SD Card Cache Line Corruption

5 participants