Tested on an STM32MP1 setup (100 MHz MCU, 500 MHz MPU): a 100 B query is sent to the MCU, which responds with 5 x 400 B packets.
A run of 1000 queries took 150 ms, which corresponds to an MCU read rate of 13.3 MB/s. Here is the top output:
Mem: 21040K used, 100804K free, 10868K shrd, 0K buff, 11632K cached
CPU: 0.0% usr 100% sys 0.0% nic 0.0% idle 0.0% io 0.0% irq 0.0% sirq
Load average: 0.46 0.12 0.04 2/47 363
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
25 2 root RW 0 0.0 0 30.7 [kworker/0:1+m4@]
362 147 root S 1268 1.0 0 23.0 ./rpmsg_tst.elf
100 2 root SW 0 0.0 0 15.3 [irq/44-4c001000]
99 2 root SW 0 0.0 0 15.3 [irq/43-4c001000]
363 147 root R 2572 2.1 0 7.6 top
10 2 root IW 0 0.0 0 7.6 [rcu_sched]
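For reference, the 13.3 MB/s figure can be reproduced from the numbers in the test description. This is only a sanity-check sketch: it counts response payload bytes only (no RPMsg headers, no query bytes) and assumes 1 MB = 10^6 bytes; the function and constant names are made up for illustration.

```python
# Figures taken from the test description above.
QUERIES = 1000             # queries per run
PACKETS_PER_RESPONSE = 5   # packets the MCU sends back per query
PACKET_SIZE_BYTES = 400    # payload size of each response packet
RUN_TIME_S = 0.150         # measured duration of the whole run


def read_rate_mb_per_s(queries, packets, packet_size, run_time_s):
    """Response payload received from the MCU, in MB/s.

    Assumption: 1 MB = 10**6 bytes, headers and query bytes are ignored.
    """
    total_bytes = queries * packets * packet_size
    return total_bytes / run_time_s / 1e6


rate = read_rate_mb_per_s(
    QUERIES, PACKETS_PER_RESPONSE, PACKET_SIZE_BYTES, RUN_TIME_S
)
print(f"{rate:.1f} MB/s")  # prints "13.3 MB/s"
```

In the other direction the queries add only 1000 x 100 B over the same 150 ms, so the read path dominates the traffic.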
Another setup is a VirtIO network on the same HW config: the M4 sends 990 B Ethernet packets to the A7 at a rate of 20 MB/s. Here is the load:
CPU: 0.1% usr 41.0% sys 0.0% nic 40.2% idle 0.0% io 0.0% irq 18.5% sirq
Load average: 0.66 0.56 0.37 1/48 748
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
99 2 root SW 0 0.0 0 43.4 [irq/43-4c001000]
25 2 root RW 0 0.0 0 15.8 [kworker/0:1-m4@]
A variant where the VirtIO IRQ is sent only for every 8th packet:
CPU: 0.0% usr 4.1% sys 0.0% nic 81.9% idle 0.0% io 0.0% irq 13.9% sirq
Load average: 0.25 0.11 0.18 1/48 759
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
99 2 root SW 0 0.0 0 12.9 [irq/43-4c001000]
25 2 root RW 0 0.0 0 2.3 [kworker/0:1-m4@]
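To put the two VirtIO runs side by side, here is a rough back-of-the-envelope sketch of the packet and interrupt rates and of the non-idle CPU share read from the two "CPU:" lines. It assumes 20 MB/s means 20 * 10^6 bytes/s and that the rate refers to the 990 B packets as stated; the names below are illustrative only, not part of any driver.

```python
RATE_BYTES_PER_S = 20e6   # assumption: 20 MB/s = 20 * 10**6 bytes/s
PACKET_SIZE_BYTES = 990   # Ethernet packet size sent by the M4
PACKETS_PER_IRQ = 8       # variant: one VirtIO IRQ per 8 packets

# Idle percentages copied from the two "CPU:" lines of the top output.
IDLE_IRQ_PER_PACKET = 40.2
IDLE_IRQ_PER_8_PACKETS = 81.9


def packets_per_second(rate_bytes_per_s, packet_size):
    """Number of packets per second needed to sustain the byte rate."""
    return rate_bytes_per_s / packet_size


def busy_percent(idle_percent):
    """Everything that is not idle (usr + sys + irq + sirq + ...)."""
    return 100.0 - idle_percent


pps = packets_per_second(RATE_BYTES_PER_S, PACKET_SIZE_BYTES)
irq_every_packet = pps                    # one IRQ per packet
irq_every_8th = pps / PACKETS_PER_IRQ     # one IRQ per 8 packets

print(f"packets/s:           {pps:.0f}")
print(f"IRQs/s (per packet): {irq_every_packet:.0f}")
print(f"IRQs/s (every 8th):  {irq_every_8th:.0f}")
print(f"busy CPU (per packet): {busy_percent(IDLE_IRQ_PER_PACKET):.1f}%")
print(f"busy CPU (every 8th):  {busy_percent(IDLE_IRQ_PER_8_PACKETS):.1f}%")
```

Under these assumptions the interrupt rate drops from roughly 20 thousand to roughly 2.5 thousand per second, while the non-idle CPU share goes from about 60% to about 18%.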