But the question is, is it actually worthwhile? Writing code for a SIMD processor is hard at the best of times, and in this case the documentation is near non-existent, and Apple's compiler is buggy. (It turns out that Apple use their own undocumented instruction naming convention which is aliased to the ARM names. But sometimes, the alias isn't quite right.)
Now this is interesting because the new processor actually has two distinct personalities - it can either be a 32-bit processor (aka ARMv7), which looks just the same as previous generation iPad/iPhone processors, or it can be a 64-bit processor with a different instruction set (ARMv8-A). By way of background, a number of "experts" on the web have stated that 64-bit would make no difference. In theory that is of course true - all other things being equal, you can build a 32-bit processor as fast as a 64-bit one - but it rather misses the point. The point being, did Apple decide to make all other things equal, or not? Given X amount of chip area to work with, Apple could choose to use that area either to make the 32-bit part of the chip fast, or the 64-bit part.
So I set out to find out (a) just how fast the new iPad is in imaging applications, and (b) whether either 64-bit mode or using the SIMD instruction set would make a significant difference.
The benchmark

My interest in this is practical, and is just about how to optimize my products, in this case PhotoRaw. So I chose to measure the performance of just one stage of PhotoRaw's pipeline which happens to be fairly "SIMD friendly", and is already SIMD accelerated for 32-bit under the older ARM processors. Note:
- This is just a single point test - the stage in question is typical of an image processing pipeline, but your results may vary. A lot. Also, it's real production code, and it's the whole stage, so when I say SIMD, that actually means a mix of SIMD and C++.
- The stage is multi-threaded, so will use all cores. Specifically, note that the iPad 1 is single core vs the later iPads' dual-core architecture.
- The NEON SIMD code is hand optimized. Interestingly, the SIMD code in the core loop on the 64-bit ARMv8-A is 23 instructions vs. 27 for the 32-bit code, so about a 15% saving there, although that's not hugely meaningful as different instructions take different numbers of cycles to complete.
- Finally, it so happens that this stage runs identically in AccuRaw, which allows me to also benchmark the same code, in X86 form, on an Intel Core i7 processor, which is quad core.
Times in ms, lower is better.

| Device | C++ | SIMD |
|---|---|---|
| iPad Air 32-bit | 321 | 474 |
| iPad Air 64-bit | 230 | 108 |
| Intel Core i7 4.2 GHz | 46 | |
The results are interesting, and probably not quite what you'd expect:
- Unsurprisingly, the iPad 1 just gets completely outclassed - it has a slow single core processor, and just can't keep up at all. On the iPad 1 however, SIMD makes a real difference, which is how SIMD originally found its way into PhotoRaw.
- The iPad 4 is much better, but there's a surprise - SIMD code only helps a little.
- On the iPad Air, there's another surprise - running in 64-bit mode instantly makes you about 40% faster - 230 ms vs 321 ms just using compiled C++ code.
- SIMD on the iPad Air is the real shocker. Firstly, in 32-bit mode, it's slower than straight C++ code. If I had to guess, I'd say that Apple deliberately built the 32-bit SIMD side of the new chip to just match the iPad 4, for compatibility reasons. However, in 64-bit mode, it's screamingly fast, clocking a full 5 times faster than the iPad 4, and twice as fast as compiled C++.
- Apple have claimed that the processor in the iPad Air is "desktop class". Well, sort of. Versus a Core i7 clocking at 4.2 GHz, it's about 1/5 the speed. But on a per core basis, that's close to half the speed. That from a device that, including memory, screen, battery, etc., takes up about 10% of the space that the Core i7's heat sink and fan take up!
First conclusion - if you were wondering whether the whole bother of rebuilding apps for 64-bit is worthwhile, then the answer is that if they are CPU intensive imaging apps, then it is probably worth the bother. On this test, plain C++ ran roughly 40% faster right there.
Second conclusion - SIMD might be worthwhile for you, but only if you're going to 64-bit mode and have a real need. Otherwise, don't bother.
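One way to act on that is to enable the SIMD path only in 64-bit builds and let everything else use plain C++. The sketch below does this at compile time using the predefined macros `__aarch64__` and `__ARM_NEON`. The stage itself (clamping samples to the range 0 to 1), the `STAGE_USE_NEON` macro, and the function names are hypothetical and only there to make the example complete.

```cpp
#include <algorithm>
#include <cstddef>

// Use NEON only when building for 64-bit ARM; 32-bit ARM and x86 builds
// take the plain C++ path.
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define STAGE_USE_NEON 1
#else
#define STAGE_USE_NEON 0
#endif

// Reports which path this build was compiled with.
constexpr bool stage_uses_simd() { return STAGE_USE_NEON != 0; }

// Hypothetical stage: clamp every sample to [0, 1] in place.
void clamp01(float* px, std::size_t n) {
    std::size_t i = 0;
#if STAGE_USE_NEON
    const float32x4_t lo = vdupq_n_f32(0.0f);
    const float32x4_t hi = vdupq_n_f32(1.0f);
    for (; i + 4 <= n; i += 4) {
        float32x4_t v = vld1q_f32(px + i);       // load 4 samples
        v = vminq_f32(vmaxq_f32(v, lo), hi);     // clamp each lane
        vst1q_f32(px + i, v);                    // store 4 results
    }
#endif
    // Scalar tail, or the whole array when the NEON path is disabled.
    for (; i < n; ++i) {
        px[i] = std::min(std::max(px[i], 0.0f), 1.0f);
    }
}
```

The output is the same on every platform; only the speed differs, which keeps the 32-bit build simple while the 64-bit build gets the fast path.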
Third conclusion - all those web "experts" that said that 64-bit doesn't matter - well, Apple made it matter.
Finally, various people have speculated as to whether Apple's 64-bit chip could find its way into a desktop product. The answer is, yes, probably. If you built a 4-core version, up-clocked it and added heat sinking, it probably still wouldn't quite compete with the top-of-the-line Intel chips. But it would be quite capable.