I own a home router (TP-Link Archer C7) and I have been running OpenWRT on it for many years, having full control over it via SSH. A few years back, I had an idea for a daemon process/service to improve a certain part of its functionality. I was delighted to find out that the MIPS chip that powers the thing was supported by Rust, so I coded a "script", cross-compiled it to a binary and scp'd that to the router. It ran beautifully. After that (and running Rust on a few other embedded systems, including an ARM Cortex-R chip) I became convinced that Rust is the next C for embedded.
The OpenWRT SDK is quite polished and convenient to use, so I usually use that for custom OpenWRT binaries. But a few days ago I needed to run something custom on my old QNAP NAS (Marvell ARMv5TE-based), and I decided to try cross-rs[1] for the first time.
It turned the usual multi-hour expedition of locating and configuring SDKs, toolchains, etc. into 3 commands and 5 minutes of downloads and compilation. The resulting executable ran successfully on the first try. I was amazed.
[1] https://github.com/cross-rs/cross
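For reference, the workflow was presumably along these lines. This is only a sketch: the ARMv5TE target triple is Rust's real Linux target for that core, but the binary name and host are hypothetical, and the toolchain-dependent commands are left as comments since they require cargo and Docker:

```shell
# cross-rs builds inside a prebuilt Docker image that already contains
# the right cross toolchain, so no SDK hunting is needed.
target="armv5te-unknown-linux-gnueabi"   # Rust target for ARMv5TE Linux

# The three commands referred to above (names are hypothetical):
#   cargo install cross
#   cross build --release --target "$target"
#   scp "target/$target/release/myapp" qnap:/share/

echo "$target"
```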
I default to Go for code I want to run on old MIPS and ARMv5/v6 appliances because, despite the "huge" binary sizes, cross-compiling is done by setting 2 environment variables.
Easily making static executables is a huge boon too.
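For anyone curious, the two variables are presumably GOOS and GOARCH (with GOARM or GOMIPS further pinning the ISA level on these old appliances). A minimal sketch, with the target values assumed; the build line itself is left as a comment since it needs the Go toolchain:

```shell
# Go reads its cross-compilation target from the environment; no
# target-specific SDK or sysroot is involved.
GOOS=linux        # target operating system
GOARCH=arm        # target architecture (mipsle/mips for MIPS routers)
GOARM=5           # ARMv5: no hardware FPU, so software floating point
CGO_ENABLED=0     # disable cgo for a fully static executable

# The build itself:
#   GOOS=$GOOS GOARCH=$GOARCH GOARM=$GOARM CGO_ENABLED=0 go build -o myapp .

echo "$GOOS/$GOARCH GOARM=$GOARM"
```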
> I became convinced that Rust is the next C for embedded.
Ada works, too, though, or Forth. :P
See more at https://learn.adacore.com/courses/Ada_For_The_Embedded_C_Dev....
Note that this was written 2024-08-08. While I haven't kept up to date with exactly what's been happening in rp-rs, I do know that probe-rs has since been updated with support for the RP2350. Other things may be outdated as well.
The author points out that his changes have already been upstreamed to https://github.com/rp-rs/rp-hal
Can anyone explain how the dual ARM/RISC-V system works, architecturally?
Are there two actual CPUs on the same die? Is it one shared architecture with two different instruction decode stages, one for ARM and the other for RISC-V that can be toggled at boot time? I like the idea conceptually but I'm not sure how much of that is a hack and/or inefficient compared to a pure ARM or RISC-V core.
To be more precise, four CPUs - two ARM and two RISC-V. There is just a mux for the instruction and data buses - see chapter 3 of the [datasheet](https://datasheets.raspberrypi.com/rp2350/rp2350-datasheet.p...).
It’s space-inefficient as half of the CPUs are shut down, but architecturally it’s all on the same bus.
> It’s space-inefficient as half of the CPUs are shut down
In practice it doesn't matter very much for a design like this. The size is already limited to a certain minimum: the die needs enough perimeter to provide wire-bonding area for all of the pins, so they can fill up the middle with whatever they want.
They should have filled it with more SRAM instead - 520KB is far too little.
What difference would the extra 16 KiB or whatever instead of the 2 RISC-V cores make? If 520KB is far too little for you, you're likely better off adding an 8 MiB PSRAM chip.
Just 16KB? Couldn’t a lot more be fitted?
PSRAM has huge latency.
SRAM takes up a tremendous amount of space compared to logic. Usually at least six transistors per bit, plus passives, plus management logic.
SRAM is big in gate count: typically 6 transistors per bit.
The i386, a 32-bit chip already dragging around a couple of generations of legacy architecture, came in at 275,000 transistors. I would imagine the Hazard3 is quite a bit more efficient in transistor usage thanks to its architecture.
16 KiB is 16384 (bytes) × 8 (bits per byte) × 6 (transistors per bit) = 786,432 transistors.
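Checking the arithmetic above: at six transistors per bit, a mere 16 KiB of SRAM already costs nearly three i386s' worth of transistors:

```shell
# 6T SRAM costs six transistors per bit, before even counting sense
# amps, decoders and other management logic.
sram_bytes=$((16 * 1024))
transistors=$((sram_bytes * 8 * 6))
echo "$transistors"                # 786432
echo "$((transistors / 275000))"   # 2 (integer division; ~2.9x an i386)
```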
It was the first CPU on my desk! 80386SX 25MHz.
(this one is only 32-bit internally; the SX has a 16-bit external bus)
Thanks for the explanations - was not aware.
…vertically stack a slab of SRAM above or beneath the CPU die, does come to mind ;)
This is way too expensive for something like a microcontroller. AMD calls this 3D V-Cache and uses it on their top end SKUs.
But doesn't the ESP32-S3-WROOM have some large on-chip RAM?
For the Pico, say, something in the line of the approach taken by many smartphone SoCs that package memory and processor together.
The ESP32-S3 has 512 KB of SRAM, and the RP2350 has 520 KB of SRAM. The ESP32-S3-WROOM does indeed come in configurations with additional PSRAM, but that would be comparing apples and pears. The WROOM is an entire module complete with program flash, PSRAM, crystal oscillator etc. It comes in a much larger footprint than the actual ESP32-S3, and it is entirely conceivable that one could create a similar module with the same amount of PSRAM using the RP2350.
Furthermore, the added RAM in both cases is indeed PSRAM. That being said, the ESP32-S3 supports octal PSRAM, not just quad PSRAM, which does make a difference for the throughput.
> "some"
And go cellphone style: Package-on-Package or Multi-Chip Module of some sort.
Wouldn't the massive increase in capabilities from adding 8MB-16MB of closely-integrated, fast RAM far outweigh the modest price increase for many applications that are currently memory-constrained on the Pico?
> But doesn't the ESP32-S3-WROOM have some large on-chip RAM?
They use the same PSRAM chips with the relatively bad latency you complained about higher up in the thread. There are boards, like those from Pimoroni, that even have them on the PCB from the factory.
> For the Pico, say, something in the line of the approach taken by many smartphone SoCs that package memory and processor together.
What for? This only saves you PCB space; the latency is not going to be affected by it. There probably won't be enough people ordering those to justify the additional inventory overhead of (at least) 2 more SKUs.
I believe there's already a separate Flash die in the same package. Probably not possible to add yet another die for DRAM.
(for various chemistry reasons, it's much more efficient to manufacture Flash, DRAM, and regular logic on separate wafers with different processing)
Wouldn't the massive increase in capabilities from adding 8MB-16MB of closely-integrated, fast RAM far outweigh the modest price increase for many applications that are currently memory-constrained on the Pico?
It may be technically space inefficient but they only added the RISC-V cores because they had area to spare. It didn't cost them much.
Source for the RISC-V cores being essentially free (Luke Wren is the creator of the RISC-V core design used):
> The final die size would likely have been exactly the same with the Hazard3 removed, as std cell logic is compressible, and there is some rounding on the die dimensions due to constraints on the pad ring design.
https://nitter.space/wren6991/status/1821582405188350417
Funny thing is that it cost them more than you might think: it was the ability to switch to the RISC-V cores that made the chip (much) easier to glitch. See the "Hazardous threes" exploit [1]
[1] https://www.raspberrypi.com/news/security-through-transparen...
I wonder if they're using the same die for one or more microprocessor products that are RISC-V-only or ARM-only? They could be binning dies that fail testing on one or the other cores that way. Such a product might be getting sold under an entirely different brand name too.
They're not currently doing that but there is a documented way to permanently disable the ARM cores, so they could sell a cheaper RISC-V-only version of the same silicon if there's enough demand to justify another SKU.
That may be the plan for the future. Right now, this is a hedge / leverage against negotiations with ARM. For developers looking to test their code against a new architecture and compare it to known good code/behavior, it doesn’t get any easier than rebooting into the other core!
I find this whole concept remarkable, and somewhat puzzling.
I've seen the same (ARM + RISC-V cores) even at larger scales before (Milk-V Duo @ ~1 GHz). But how is this economical? Is die space that cheap? Could you not market the same thing as a quad-core with just minor design changes, or would that be too hard for power budget/bus bandwidth reasons?
SRAM is very area intensive. What you're asking for is very greedy. The RISC-V core they are using is absolutely tiny.
That's also a good point. The big Milk-V systems I mentioned use external DRAM, but cache might still be a die-space issue (I'd assume it's always shared completely between the ARM/RISC-V cores, and would need to be scaled up for true multicore operation).
But I'm still amazed that this is a thing, and you can apparently just throw a full core for a different architecture on a microcontroller at basically no cost :O
two things:
1) it needs a certain perimeter to allow all the pins to go from the silicon to the package, which mandates a certain-sized square-ish die
2) only the cores are duplicated (and some switching thing is added)
so yes, there is enough space to just add another two cores without any worries, since they don't need more IO or pins or memory or anything.
> Can anyone explain how the dual ARM/RISC-V system works, architecturally?
What I really want to know is why would anyone need or want dual architectures on one chip?
This is hedging their bets.
Tapeouts are expensive. It seems that they had spare area, so they decided to add RISC-V as a sort of feature test and market test; that lets them determine what the market preference is between the two and acquire geek coolness points (plus an additional point for using Rust) at relatively low cost, certainly lower cost than doing two tapeouts for two different SKUs.
Here at $MEDIUM_SIZED_CHIP_CO, we also have some products which have a custom DSP for realtime audio processing and an ARM for "other stuff". It's more common than you think. Even the very first Pi was effectively a video-oriented vector processor with a small ARM glued on the side.
> so they decided to add RISC-V as a sort of feature test and market test;
After some thought, I understand the problem this setup tackles. Without dual architectures, there's a conundrum: how do you sell RISC-V if everyone is tooled for Arm?
So I get that offering both means they don't have to risk a tapeout of a pure RISC-V die for which there's no immediate market. The dual-arch chip presents little to no risk while allowing the same chip to bootstrap a RISC-V tooling ecosystem. Once end users have a mature RISC-V ecosystem, the chip makers can begin to cut the Arm licensing strings they are entangled in.
> It's more common than you think.
You are describing co-processors or accelerators which fulfill a different role in the same system. It makes sense to have those. However, I was initially confused as to why a chip needs two architectures that fulfill the same role. That did not make sense at first glance from a technical standpoint but does from a business one.
If anything, they must have learned by now that they can simply drop the ARM cores and go full RISC-V.
As long as they make their core at least feature-equivalent by implementing FPU and the secure boot chain.
They certainly can but the end users likely can't. I am sure there are a lot of projects that make Arm assumptions and porting isn't as simple as changing compilers. And is Risc-V tooling as mature as Arm? Doubtful. So releasing a pure Risc-V micro is Risc-Y at this time.
However, giving end users both in the same chip eliminates that risk, because it allows the RISC-V tooling to develop at the same time. And if that development fails, oh well, the Arm cores will keep working and the next tapeout might exclude the RISC-V cores. Life moves on.
>is Risc-V tooling as mature as Arm?
RISC-V is rapidly growing the strongest ecosystem.
Isn't half the mission of the Pi project that it is an educational project, also targeting hobbyists?
In that respect it makes perfect sense to me.
Which of the four RP2350 variants is used in the Raspberry Pi Pico 2 W? I see RP2350A0A2⁰ and (pardon my ignorance) guess that means SKU RP2350A package QFN60, the one with the least features.
⁰ https://datasheets.raspberrypi.com/picow/pico-2-w-schematic....
As far as I know, any Pico 2 currently sold carry the RP2350A: 30 GPIOs, no internal flash since the board carries an external flash chip.
(BTW two of the variants are called RP2354 and not RP2350, the last digit means the amount of internal flash)
Most pictures I find seem to show the RP2350A variant for both the W and non-W versions.
thejpster, where do I know that username from...
Ah, he wrote most of embedded-sdmmc-rs which I used quite a bit. Jonathan, if you see this, thanks!
Embassy has had _some_ support for the RP2350 for quite some time now
https://github.com/embassy-rs/embassy
I'm very much looking forward to giving Embassy on the RP2350 a go. I've been using it for a while with the RP2040, and it's a joy to use.
Previous post on HN:
Raspberry Pi Showcases Rust on the RP2350 Microcontroller (9 comments):
https://news.ycombinator.com/item?id=41476505
Uhm, I was looking to try it and I found this advisory: https://forums.pimoroni.com/t/rp2350-gpio-internal-pull-prob...
Would suggest people understand this before buying.
See, this is why we use conformal coating ;)