ESP32-S3: Support execute in place from PSRAM #3024
I'm thinking the bulk of the code should go into a
This already looks very good from my point of view. Thanks for taking care of this
Do you think it's a good idea to move all the `SOC_MMU_` constants into `mmu.rs` as well (further dropping the `MMU_` prefix) and change the existing code references to refer to them? I would just like to avoid duplicating the constants.
Yeah I'd say move the mmu constants as well
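A minimal sketch of what a single home for the MMU constants could look like. The module layout, constant names, and helper functions here are hypothetical and are not taken from the PR; the values assume the ESP32-S3 (64 KiB pages, 512 table entries).

```rust
/// Hypothetical `mmu` module; names are illustrative, not the PR's identifiers.
pub mod mmu {
    /// Assumed MMU page size on the ESP32-S3 (64 KiB).
    pub const PAGE_SIZE: usize = 0x1_0000;
    /// Assumed number of MMU table entries on the ESP32-S3.
    pub const ENTRY_NUM: usize = 512;

    /// Index of the page that contains byte offset `offset`.
    pub const fn page_index(offset: usize) -> usize {
        offset / PAGE_SIZE
    }

    /// Number of whole pages needed to hold `len` bytes, rounding up.
    pub const fn pages_for(len: usize) -> usize {
        (len + PAGE_SIZE - 1) / PAGE_SIZE
    }
}
```

With the constants in one place, the PSRAM and flash mapping code can both refer to `mmu::PAGE_SIZE` instead of carrying their own copies.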
Previously the call to cache_dbus_mmu_set would have likely failed instead. This is more explicit.
This implementation mirrors how the ESP-IDF implementation of this feature (which is based on the `Cache_Flash_To_SPIRAM_Copy` ROM function) works, except that it differs in a few key ways:

The ESP-IDF seems to map `.text` and `.rodata` into the first and second 128 cache pages respectively (although looking at the linker scripts, I'm not sure how, but a runtime check confirmed this seemed to be the case). This is reflected in how the `Cache_Count_Flash_Pages` and `Cache_Flash_To_SPIRAM_Copy` ROM functions, and the ESP-IDF code executing them, work. The count function can only be made to count flash pages within the first 256 pages (of which there are 512 on the ESP32-S3). Likewise, the copy function will only copy flash pages which are mapped within the first 256 entries (across two calls). As the esp-hal handles mapping `.text` and `.rodata` differently, these ROM functions are technically not appropriate if more than 256 pages of flash (`.text` and `.rodata` combined) are in use by the application.

Additionally, the functions both contain bugs, one of which the IDF attempts to work around incorrectly, and the other which the IDF does not appear to be aware of. Details of these bugs can be found on the IDF issue/PR tracker[0][1].

As a result, this commit contains a heavily modified/adjusted Rust re-write of the reverse engineered ROM code combined with a vague port of the ESP-IDF code.

There are three additional noteworthy differences from the ESP-IDF version of the code:

1. The ESP-IDF allows the `.text` and `.rodata` segments to be mapped independently and separately, allowing only one to be mapped. The current version of the code does not allow this flexibility. This can be implemented by checking the address of each page entry against the segment locations to determine which segment each address belongs to.
2. The ESP-IDF calls `cache_ll_l1_enable_bus(..., cache_ll_l1_get_bus(..., SOC_EXTRAM_DATA_HIGH, 0));` (functions from the ESP-IDF) in order to "Enable the most high bus, which is used for copying FLASH `.text` to PSRAM", but on the ESP32-S3, after careful inspection, these calls result in a no-op as the address passed to `cache_ll_l1_get_bus` will result in an empty cache bus mask. It's currently unclear to me if this is a bug in the ESP-IDF code, or if this code (which from cursory investigation is probably not a no-op on the -S2) is solely targeting the ESP32-S3.
3. The ESP-IDF calls `Cache_Flash_To_SPIRAM_Copy` with an icache address when copying `.text` and a dcache address when copying `.rodata`. This affects which cache the reads will occur through. But the writes always go through a "spare page" (name I came up with during reverse engineering) via the dcache. This code performs all reads through the dcache. I don't know if there's a proper reason to read through the correct cache when doing the copy, and this doesn't appear to have any negative impact.

[0]: espressif/esp-idf#15262
[1]: espressif/esp-idf#15263
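The overall shape of the copy described above can be sketched as a host-side model. This is not the PR's code: `Entry`, `PAGE`, and `copy_flash_to_psram` are hypothetical names, the page size is shrunk to keep the model small, and plain arrays stand in for the flash, the PSRAM, and the MMU table. On hardware the writes would go through the temporarily mapped spare page rather than a direct array store.

```rust
/// Host-side model of one MMU table entry (hypothetical type, not from the PR).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Entry {
    /// Entry maps nothing.
    Invalid,
    /// Entry maps the flash page with this index.
    Flash(usize),
    /// Entry maps the PSRAM page with this index.
    Psram(usize),
}

/// Tiny page size so the model is cheap to run; the real pages are far larger.
pub const PAGE: usize = 16;

/// Walk the whole table, copy every flash-backed page into the next free
/// PSRAM page, and repoint the entry at the copy. Returns the number of pages
/// copied. Panics if `psram` has fewer pages than there are flash entries.
pub fn copy_flash_to_psram(
    table: &mut [Entry],
    flash: &[[u8; PAGE]],
    psram: &mut [[u8; PAGE]],
) -> usize {
    let mut next = 0;
    for entry in table.iter_mut() {
        if let Entry::Flash(src) = *entry {
            // Model of the copy; hardware would write via the spare page.
            psram[next] = flash[src];
            // Remap the entry so later accesses hit PSRAM instead of flash.
            *entry = Entry::Psram(next);
            next += 1;
        }
    }
    next
}
```

Because the loop covers every entry in `table`, this model does not share the 256-entry limit that the commit message attributes to the ROM functions.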
I've made all the requested changes.
Let me know if the explicitly limited visibility (
Let me know if I've gone too far in the new MMU module refactoring. I've also tried to make indices and sizes all
I have a few ideas for how the MMU module could be better but I didn't want to get too carried away (or more accurately, I already got too carried away on another branch and then scrapped the idea).
I noticed a "bug" in the previous PSRAM code. To be fair I think actually it wouldn't be a bug but more just a confusing error message.
Don't hold back any nitpicks.
I thought I could get away with running lint on only esp-hal. Apparently not. Whoops.
This is looking really good, I'll try and give it a spin myself soon.
/// Refer to
/// <https://docs.espressif.com/projects/esp-idf/en/stable/esp32s3/api-guides/external-ram.html#execute-in-place-xip-from-psram>
/// for more information.
pub execute_from_psram: bool,
Thinking about this some more, given there's more than one way to cut this XiP cake, especially once we start looking at static linking in PSRAM, I think this boolean should be separate from the `PsramConfig`. I'm currently thinking it should go in `esp_hal::Config`, then `init_psram` can take a `bool` to do the right thing.
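A rough sketch of that suggestion, using stand-in types rather than the real esp-hal ones. Everything here is hypothetical: the field names, the `PsramMode` enum, and the `init_psram` signature are invented to illustrate keeping the flag outside `PsramConfig`.

```rust
/// Stand-in for the PSRAM settings; the field is illustrative only.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct PsramConfig {
    pub size_bytes: Option<usize>,
}

/// Stand-in for the top-level HAL config, which owns the XiP flag.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct Config {
    pub psram: PsramConfig,
    pub execute_from_psram: bool,
}

/// What the (hypothetical) init routine decided to do.
#[derive(Debug, PartialEq)]
pub enum PsramMode {
    DataOnly,
    ExecuteInPlace,
}

/// Hypothetical signature: the flag is passed next to the PSRAM settings
/// instead of living inside them.
pub fn init_psram(_config: PsramConfig, execute_from_psram: bool) -> PsramMode {
    if execute_from_psram {
        PsramMode::ExecuteInPlace
    } else {
        PsramMode::DataOnly
    }
}
```

Keeping the flag out of `PsramConfig` leaves room for other XiP strategies, such as the static linking mentioned above, without changing the PSRAM settings type.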
It'd be a good idea to add a HIL test for this. I tried using this but my S3 hangs in
A question out of my total lack of knowledge, so forgive me if it's a dumb one. Because this feature involves mapping PSRAM over flash, will it conflict with (or does it support) OTA firmware updates, where you need to write to flash while running and can potentially boot from two different flash addresses?
Absolutely, I will get onto HIL tests as soon as I've gotten around to actually putting together a USB cable to connect to the magic JTAG USB port. I hope to do that this weekend.
Take this with a grain of salt, but iirc writing to flash bypasses the MMU, so this shouldn't impede that.
Submission Checklist 📝
`cargo xtask fmt-packages` command to ensure that all changed code is formatted correctly.
`CHANGELOG.md` in the proper section.
Pull Request Details 📖
Description
This implementation mirrors how the ESP-IDF implementation of this feature (which is based on the `Cache_Flash_To_SPIRAM_Copy` ROM function) works, except that it differs in a few key ways:

The ESP-IDF seems to map `.text` and `.rodata` into the first and second 128 cache pages respectively (although looking at the linker scripts, I'm not sure how, but a runtime check confirmed this seemed to be the case). This is reflected in how the `Cache_Count_Flash_Pages` and `Cache_Flash_To_SPIRAM_Copy` ROM functions, and the ESP-IDF code executing them, work. The count function can only be made to count flash pages within the first 256 pages (of which there are 512 on the ESP32-S3). Likewise, the copy function will only copy flash pages which are mapped within the first 256 entries (across two calls). As the esp-hal handles mapping `.text` and `.rodata` differently, these ROM functions are technically not appropriate if more than 256 pages of flash (`.text` and `.rodata` combined) are in use by the application.

Additionally, the functions both contain bugs, one of which the IDF attempts to work around incorrectly, and the other which the IDF does not appear to be aware of. Details of these bugs can be found on the IDF issue/PR tracker (espressif/esp-idf#15262, espressif/esp-idf#15263).

As a result, this PR contains a heavily modified/adjusted Rust re-write of the reverse engineered ROM code combined with a vague port of the ESP-IDF code.

There are three additional noteworthy differences from the ESP-IDF version of the code:

1. The ESP-IDF allows the `.text` and `.rodata` segments to be mapped independently and separately, allowing only one to be mapped. The current version of the code does not allow this flexibility. This can be implemented by checking the address of each page entry against the segment locations to determine which segment each address belongs to.
2. The ESP-IDF calls `cache_ll_l1_enable_bus(..., cache_ll_l1_get_bus(..., SOC_EXTRAM_DATA_HIGH, 0));` (functions from the ESP-IDF) in order to "Enable the most high bus, which is used for copying FLASH `.text` to PSRAM", but on the ESP32-S3, after careful inspection, these calls result in a no-op as the address passed to `cache_ll_l1_get_bus` will result in an empty cache bus mask. It's currently unclear to me if this is a bug in the ESP-IDF code, or if this code (which from cursory investigation is probably not a no-op on the -S2) is solely targeting the ESP32-S3.
3. The ESP-IDF calls `Cache_Flash_To_SPIRAM_Copy` with an icache address when copying `.text` and a dcache address when copying `.rodata`. This affects which cache the reads will occur through. But the writes always go through a "spare page" (name I came up with during reverse engineering) via the dcache. This code performs all reads through the dcache. I don't know if there's a proper reason to read through the correct cache when doing the copy, and this doesn't appear to have any negative impact.

Please let me know what this PR needs, as I am not particularly experienced with ESP32 development, embedded Rust, unsafe Rust, or this project.
Let me know if this change should have more documentation and where it's appropriate. As the psram stuff is unstable, I have assumed that it doesn't need to go in the migration guide, but I could be wrong about this.
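The per-segment flexibility mentioned in the first listed difference could be approached roughly as follows. This is a sketch under assumptions, not code from the PR: `Segment`, `classify`, and `should_copy` are hypothetical names, and the segment ranges would in practice come from linker-provided symbols rather than being passed in by hand.

```rust
use core::ops::Range;

/// Which linker segment a mapped page belongs to (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Segment {
    Text,
    Rodata,
    Other,
}

/// Classify a page's address by checking it against the segment ranges.
pub fn classify(addr: u32, text: &Range<u32>, rodata: &Range<u32>) -> Segment {
    if text.contains(&addr) {
        Segment::Text
    } else if rodata.contains(&addr) {
        Segment::Rodata
    } else {
        Segment::Other
    }
}

/// Decide whether a page should be copied, given independent per-segment flags.
pub fn should_copy(segment: Segment, copy_text: bool, copy_rodata: bool) -> bool {
    match segment {
        Segment::Text => copy_text,
        Segment::Rodata => copy_rodata,
        Segment::Other => false,
    }
}
```

A copy loop could call `classify` on each flash-mapped entry and skip the page when `should_copy` returns `false`, which would allow only `.text` or only `.rodata` to be moved.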
Testing
I've done some basic testing of this version of the code on an ESP32-S3. I plan on doing some more testing (especially of the special handling of the spare page), but I felt it was a good idea to get this on GitHub for early comments.