ARM has Thumb, a compressed 16-bit encoding of its RISC opcodes that makes better use of memory bandwidth, and a 64-way set-associative cache.
The FPU in the ARM (the ARM2 and 250 anyway) isn't stunning; IIRC it's way less efficient than the 6888x.
I don't have any benchmarks for RISC OS, but my A5000 with a 25MHz ARM3 *feels* faster than my TT. I know that's down to video hardware and OS as well (the TT doesn't have any video hardware as such, whereas the Acorn has the VIDC chip).
And there's the Wolf3D benchmark...
I think, subjectively, the Acorn machines are slightly faster (having a bus that runs at the CPU's frequency helps).