I used to think that floating-point was not for embedded systems: too slow, too much code overhead, and rounding is always a problem.
It turns out that while scaled integers still have a performance benefit, floating-point computations can be done with surprisingly high performance on modern embedded CPUs. This is true not only for CPUs with a floating-point unit (FPU), such as the Cortex-M4F, but also for CPUs that have to do this in software, such as a regular Cortex-M3 or an M4 without FPU.
But first things first, let’s look at how things work:
Basic floating-point operations
Let’s look at a function multiplying two integers:
int Mul(int a, int b) { return a * b; }
The generated code (in Thumb-2 instructions) can be as simple as this:
FB01F000    mul r0, r1, r0
4770        bx lr
For the multiplication of two floats the code is quite similar when the FPU is used:
float FMul(float a, float b) { return a * b; }
EE200A20    vmul.f32 s0, s0, s1
4770        bx lr
The speed is similar, too: mul and vmul.f32 each take just a few cycles.
What happens without FPU?
The good news is: Nothing changes from an application programmer's perspective. The C code does not need to be modified at all. We only tell the compiler that no FPU is available.
Let’s look at the output of FMul() again:
B508        push {r3, lr}
F000FB20    bl __aeabi_fmul
BD08        pop {r3, pc}
Without an FPU, the compiler cannot use the floating-point multiplication instruction. Instead we can see that it now adds a call to a function __aeabi_fmul to do the work.
Implicit floating-point functions
The ARM Run-time ABI defines implicit floating-point functions, 38 of which are in active use. The compiler can and typically will add calls to these functions whenever it is told to not use an FPU, i.e. the floating-point ABI type is set to “soft”, and the application program needs to perform floating-point operations, such as add, subtract, multiply, divide, or compare.
These functions perform the floating-point operation in software, using the available integer-only instructions. An implementation needs to be provided to the tool-chain, usually as part of the runtime library.
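To make the idea concrete, here is a minimal sketch of a single-precision multiplication done with integer operations only. It is not the code of any of the libraries discussed here: the name soft_fmul_sketch and the two helper functions are made up for illustration, and the sketch assumes normal, finite operands and a normal result. Zero, subnormals, Inf, NaN, overflow and underflow are not handled, although a real __aeabi_fmul has to cover all of them.

```c
#include <stdint.h>
#include <string.h>

/* Helpers (hypothetical names): reinterpret a float as its bit pattern and back. */
static uint32_t f2u(float f)    { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    u2f(uint32_t u) { float f;    memcpy(&f, &u, sizeof f); return f; }

/*
 * Illustrative binary32 multiply using integer operations only.
 * Assumption: both inputs and the result are normal, finite numbers.
 */
float soft_fmul_sketch(float a, float b) {
  uint32_t ua   = f2u(a);
  uint32_t ub   = f2u(b);
  uint32_t sign = (ua ^ ub) & 0x80000000u;              /* Sign of the product      */
  int32_t  exp  = (int32_t)((ua >> 23) & 0xFF)
                + (int32_t)((ub >> 23) & 0xFF) - 127;   /* Biased result exponent   */
  uint64_t ma   = (ua & 0x007FFFFFu) | 0x00800000u;     /* 24-bit significand of a  */
  uint64_t mb   = (ub & 0x007FFFFFu) | 0x00800000u;     /* 24-bit significand of b  */
  uint64_t prod = ma * mb;                              /* 48-bit product           */
  uint32_t mant;
  uint32_t rest;

  /* Normalize so that the leading one sits in bit 47. */
  if (prod & (1ull << 47)) {
    exp++;
  } else {
    prod <<= 1;
  }
  mant = (uint32_t)(prod >> 24);          /* Upper 24 bits: result significand */
  rest = (uint32_t)(prod & 0x00FFFFFFu);  /* Lower 24 bits: decide the rounding */
  /* Round to nearest, ties to even. */
  if (rest > 0x00800000u || (rest == 0x00800000u && (mant & 1u))) {
    mant++;
    if (mant == 0x01000000u) {            /* Rounding carried out of 24 bits */
      mant >>= 1;
      exp++;
    }
  }
  return u2f(sign | ((uint32_t)exp << 23) | (mant & 0x007FFFFFu));
}
```

For operands within the stated limits, soft_fmul_sketch(a, b) should give the same bit pattern as a * b computed by an FPU. A large part of the work in an optimized library goes into the special cases that this sketch leaves out, and into doing all of it in as few cycles as possible.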
Floating-point benchmark
We were curious to find out more about the performance of such software floating-point operations and compared some implementations:
- SEGGER RunTime Library, ASM-implementation
- Competitor A
- Competitor B, fast library
- Competitor B, small library (default)
- SEGGER RunTime Library, C-implementation
- GNU Arm Embedded, Libgcc
How we test
We called all 38 implicit functions multiple times with different parameters, to execute most, if not all, paths of the implementations. For each function, we computed the average over all calls, and then added up the average execution times of all functions to summarize the performance in a single value. For the comparison, we used the SEGGER RunTime Library ASM implementation as the 100% reference.
The code has been generated for Cortex-M4 (Arm architecture v7EM) and runs on an NXP Kinetis K66. All code is executed from RAM to eliminate degradation and variations in execution speed due to caching.
To accurately measure the execution time, the Cortex-M cycle counter, built into the CPU, has been used.
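The arithmetic behind such a measurement can be sketched as follows. This is a simplified illustration, not the benchmark itself: the function names are made up, and the counter values are passed in as plain numbers so that the code can be tried on any host. Subtracting the cost of an empty call is an assumption based on the overhead table and the empty NullVoidReturnVoid() function in the benchmark source.

```c
#include <stdint.h>

/*
 * On the target, the cycle counter is read from the DWT unit:
 *   #define DWT_CYCCNT (*(volatile unsigned *)0xE0001004)   // From the benchmark source
 * On Armv7-M, the counter typically has to be enabled first (TRCENA, bit 24 of DEMCR
 * at 0xE000EDFC, and CYCCNTENA, bit 0 of DWT_CTRL at 0xE0001000). That setup is not
 * part of the source shown in this article.
 */

/* Cycles between two reads of a free-running 32-bit counter.
 * Unsigned subtraction stays correct even if the counter wrapped once in between. */
static uint32_t cycles_elapsed(uint32_t t0, uint32_t t1) {
  return t1 - t0;
}

/* Net cost of a call: raw measurement minus the cost of calling an empty function.
 * Clamped to zero in case the measured overhead exceeds the raw value. */
static uint32_t cycles_net(uint32_t raw, uint32_t overhead) {
  return (raw > overhead) ? (raw - overhead) : 0u;
}
```

With hypothetical readings t0 = 0xFFFFFFF0 and t1 = 0x00000010, cycles_elapsed() returns 32 despite the wrap-around, and a raw value of 33 cycles with 7 cycles of call overhead yields a net cost of 26 cycles.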
Performance of the SEGGER ASM implementation
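Category | Function | Average Cycles |
---|---|---|
Float, Math | __aeabi_fadd | 31.0 |
Float, Math | __aeabi_fsub | 39.9 |
Float, Math | __aeabi_frsub | 39.9 |
Float, Math | __aeabi_fmul | 26.0 |
Float, Math | __aeabi_fdiv | 53.0 |
Float, Compare | __aeabi_fcmplt | 13.0 |
Float, Compare | __aeabi_fcmple | 13.0 |
Float, Compare | __aeabi_fcmpgt | 13.0 |
Float, Compare | __aeabi_fcmpge | 13.0 |
Float, Compare | __aeabi_fcmpeq | 7.0 |
Double, Math | __aeabi_dadd | 54.5 |
Double, Math | __aeabi_dsub | 71.2 |
Double, Math | __aeabi_drsub | 71.2 |
Double, Math | __aeabi_dmul | 56.4 |
Double, Math | __aeabi_ddiv | 134.0 |
Double, Compare | __aeabi_dcmplt | 14.0 |
Double, Compare | __aeabi_dcmple | 14.0 |
Double, Compare | __aeabi_dcmpgt | 14.0 |
Double, Compare | __aeabi_dcmpge | 14.0 |
Double, Compare | __aeabi_dcmpeq | 14.0 |
Float, Conversion | __aeabi_f2iz | 9.0 |
Float, Conversion | __aeabi_f2uiz | 6.0 |
Float, Conversion | __aeabi_f2lz | 13.5 |
Float, Conversion | __aeabi_f2ulz | 12.0 |
Float, Conversion | __aeabi_i2f | 10.5 |
Float, Conversion | __aeabi_ui2f | 7.5 |
Float, Conversion | __aeabi_l2f | 19.0 |
Float, Conversion | __aeabi_ul2f | 13.8 |
Float, Conversion | __aeabi_f2d | 9.0 |
Double, Conversion | __aeabi_d2iz | 10.0 |
Double, Conversion | __aeabi_d2uiz | 8.0 |
Double, Conversion | __aeabi_d2lz | 16.5 |
Double, Conversion | __aeabi_d2ulz | 13.5 |
Double, Conversion | __aeabi_i2d | 12.0 |
Double, Conversion | __aeabi_ui2d | 8.0 |
Double, Conversion | __aeabi_l2d | 17.9 |
Double, Conversion | __aeabi_ul2d | 12.9 |
Double, Conversion | __aeabi_d2f | 11.0 |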
Comparison with other implementations
Category | SEGGER RT Lib (ASM) | Competitor A | Competitor B (fast) | Competitor B (small) | GNU Arm libgcc | SEGGER RT Lib (C) |
---|---|---|---|---|---|---|
Float, Math | 100.0% | 95.2% | 95.2% | 234.0% | 179.9% | 334.6% |
Float, Compare | 100.0% | 144.1% | 186.4% | 81.4% | 449.2% | 142.4% |
Double, Math | 100.0% | 95.6% | 86.6% | 375.5% | 240.3% | 410.7% |
Double, Compare | 100.0% | 150.0% | 157.1% | 190.0% | 394.3% | 340.0% |
Float, Conversion | 100.0% | 110.6% | 121.1% | 249.9% | 688.2% | 638.3% |
Double, Conversion | 100.0% | 111.3% | 137.0% | 531.1% | 714.3% | 903.7% |
Total | 100.0% | 106.3% | 110.0% | 318.0% | 358.9% | 456.2% |
The SEGGER ASM library has been used as reference, making the values in the first column 100%.
Smaller values mean higher performance.
Detailed information on the results is available here: PDF.
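As a rough illustration of how one cell of the comparison table can be derived, the sketch below sums the per-function averages of a category and relates the sum of another library to the reference. The five values are the "Float, Compare" averages of the SEGGER ASM implementation from the first table; the function names and the 85-cycle figure for a second library are hypothetical and only serve the example.

```c
#include <stddef.h>

/* "Float, Compare" averages of the reference implementation (from the table above). */
static const double aFloatCompare[] = { 13.0, 13.0, 13.0, 13.0, 7.0 };

/* Sum of the per-function average cycle counts of one category. */
static double sum_cycles(const double *pCycles, size_t NumFuncs) {
  double Sum = 0.0;
  size_t i;

  for (i = 0; i < NumFuncs; ++i) {
    Sum += pCycles[i];
  }
  return Sum;
}

/* Cycle count of a library relative to the reference, in percent.
 * 100% means equal speed, larger values mean slower. */
static double relative_percent(double Cycles, double RefCycles) {
  return 100.0 * Cycles / RefCycles;
}
```

The reference category adds up to 59 cycles. A library that needs a hypothetical 85 cycles for the same five functions would be listed at relative_percent(85.0, 59.0), which is roughly 144%.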
Conclusions
It is surprising to see that it is possible to perform IEEE 754 compliant floating-point operations so efficiently in software. With only 26 cycles on average for a multiplication and 31 cycles for an addition, a Cortex-M4 can execute a floating-point computation in a fraction of a μs (such as 0.13 μs for a multiplication at 200 MHz). Modern embedded CPUs are capable of performing millions of floating-point operations per second, making floating-point “affordable” even on hardware without FPU. This is true for pretty much any system except maybe those whose primary purpose is number crunching, such as digital filter applications.
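The timing figure follows directly from the cycle count and the clock frequency. The small sketch below reproduces the arithmetic; the function names are made up for illustration, and the calculation assumes that the code runs without wait states, as in the RAM-based measurement setup.

```c
/* Execution time in microseconds for a given number of CPU cycles. */
static double cycles_to_us(double Cycles, double ClockHz) {
  return Cycles / ClockHz * 1e6;
}

/* Operations per second if every operation takes CyclesPerOp cycles
 * and nothing else is executed in between. */
static double ops_per_second(double CyclesPerOp, double ClockHz) {
  return ClockHz / CyclesPerOp;
}
```

For the 26-cycle multiplication at 200 MHz, cycles_to_us(26.0, 200e6) gives about 0.13 μs, and ops_per_second(26.0, 200e6) gives about 7.7 million multiplications per second, not counting the surrounding loads, stores and call overhead.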
However, there are differences in implementations.
All commercial toolchains (SEGGER Embedded Studio as well as competitors A and B) have put a lot of effort into their floating-point code. They all use highly optimized assembly code to perform these operations.
Competitor B also supplies a “code size optimized” library, which is actually the default. Beware! While the code is somewhat more compact, the performance of the small library is surprisingly low. We might take a closer look at things in another part of this series.
Libgcc for GNU Arm Embedded is also written in assembler, but not nearly as optimized. Actually, its performance (about 30% of the performance of the commercial implementations) is quite disappointing.
The only library written in pure C in this test is the SEGGER RunTime Library C variant. Not surprisingly, it is about 4.5 times slower than its brother written in ASM. However, it is almost as fast as the assembly-coded Libgcc. That is quite impressive for a library that can be used on basically any CPU, not just ARM.
Good work, Mr. Curtis!
The SEGGER RunTime Library is the overall winner, but performance differences are not significant in most cases, simply because competitors A and B have also “done their homework” and produced good code as well.
Benchmark program
Below, for reference, is the code we used to benchmark the different libraries.
/*********************************************************************
*                  (c) SEGGER Microcontroller GmbH                   *
*                       The Embedded Experts                         *
*                          www.segger.com                            *
**********************************************************************

-------------------------- END-OF-HEADER -----------------------------

File    : bench.c
Purpose : Benchmark Arm EABI implicit floating-point functions.

*/

/*********************************************************************
*
*       #include section
*
**********************************************************************
*/

#ifdef SEMIHOST
  #include "SEGGER_SEMIHOST.h"
#endif
#if !defined (__clang__) || defined(__CC_ARM)
  #include <string.h>
  #include <stdio.h>
#endif
#include <stdarg.h>

/*********************************************************************
*
*       Defines, configurable
*
**********************************************************************
*/

#define SPECIAL(X)  // Set to X, if specials are required

/*********************************************************************
*
*       Defines, fixed
*
**********************************************************************
*/

#define COUNTOF(X)  (sizeof(X) / sizeof(X[0]))
#define DWT_CYCCNT  (*(volatile unsigned *)0xE0001004)  // Cortex-M cycle counter

#if defined (__clang__) && !defined(__CC_ARM)
  #define memset _MEMSET
#endif

/*********************************************************************
*
*       Types, local
*
**********************************************************************
*/

typedef unsigned long long u64;
typedef unsigned long      u32;

typedef enum {
  MODE_INT_RETURN_FLOAT,
  MODE_INT_RETURN_DOUBLE,
  MODE_LLONG_RETURN_FLOAT,
  MODE_LLONG_RETURN_DOUBLE,
  MODE_FLOAT_RETURN_INT,
  MODE_FLOAT_RETURN_LLONG,
  MODE_FLOAT_RETURN_FLOAT,
  MODE_FLOAT_RETURN_DOUBLE,
  MODE_DOUBLE_RETURN_INT,
  MODE_DOUBLE_RETURN_LLONG,
  MODE_DOUBLE_RETURN_FLOAT,
  MODE_DOUBLE_RETURN_DOUBLE,
  //
  MODE_INT_INT_RETURN_INT,
  MODE_LLONG_LLONG_RETURN_LLONG,
  MODE_FLOAT_FLOAT_RETURN_INT,
  MODE_FLOAT_FLOAT_RETURN_FLOAT,
  MODE_DOUBLE_DOUBLE_RETURN_INT,
  MODE_DOUBLE_DOUBLE_RETURN_DOUBLE,
  MODE_MAX
} ExecMode;

typedef enum {
  SEQUENCE_END,
  SEQUENCE_SPECIAL_F32xF32,
  SEQUENCE_TYPICAL_F32xF32,
  SEQUENCE_SPECIAL_F64xF64,
  SEQUENCE_TYPICAL_F64xF64,
  SEQUENCE_31_INT,
  SEQUENCE_31_FLOAT,
  SEQUENCE_31_DOUBLE,
  SEQUENCE_63_LLONG,
  SEQUENCE_63_FLOAT,
  SEQUENCE_63_DOUBLE,
  SEQUENCE_SIGNED = 1<<8
} SEQUENCE;

typedef void (*VoidFunc)(void);

typedef struct {
  unsigned index;
  unsigned last;
  int      sign;
  SEQUENCE seq;
} ExecSequence;

typedef union {
  float     f;
  double    d;
  int       i;
  long long l;
} ExecValue;

typedef volatile union {
  void      (*pfVoidReturnVoid)          (void);
  float     (*pfIntReturnFloat)          (int);
  double    (*pfIntReturnDouble)         (int);
  float     (*pfLlongReturnFloat)        (long long);
  double    (*pfLlongReturnDouble)       (long long);
  float     (*pfUllongReturnFloat)       (unsigned long long);
  double    (*pfUllongReturnDouble)      (unsigned long long);
  int       (*pfIntIntReturnInt)         (int, int);
  long long (*pfLlongLlongReturnLlong)   (long long, long long);
  //
  float     (*pfFloatFloatReturnFloat)   (float, float);
  int       (*pfFloatFloatReturnInt)     (float, float);
  //
  double    (*pfDoubleDoubleReturnDouble)(double, double);
  int       (*pfDoubleDoubleReturnInt)   (double, double);
  //
  int       (*pfFloatReturnInt)          (float);
  float     (*pfFloatReturnFloat)        (float);
  long long (*pfFloatReturnLlong)        (float);
  double    (*pfFloatReturnDouble)       (float);
  //
  int       (*pfDoubleReturnInt)         (double);
  long long (*pfDoubleReturnLlong)       (double);
  float     (*pfDoubleReturnFloat)       (double);
  double    (*pfDoubleReturnDouble)      (double);
} ExecFunction;

typedef struct {
  ExecMode     Mode;
  ExecFunction Function;
  ExecValue    v0;
  ExecValue    v1;
  int          i;
  int          j;
} ExecContext;

/*********************************************************************
*
*       Prototypes (of benchmarked runtime functions)
*
**********************************************************************
*/

// ARM EABI
int       __aeabi_idiv    (int, int);
long long __aeabi_ldivmod
(long long, long long);
float              __aeabi_fadd   (float, float);
float              __aeabi_fsub   (float, float);
float              __aeabi_frsub  (float, float);
float              __aeabi_fmul   (float, float);
float              __aeabi_fdiv   (float, float);
int                __aeabi_fcmplt (float, float);
int                __aeabi_fcmple (float, float);
int                __aeabi_fcmpgt (float, float);
int                __aeabi_fcmpge (float, float);
int                __aeabi_fcmpeq (float, float);
double             __aeabi_dadd   (double, double);
double             __aeabi_dsub   (double, double);
double             __aeabi_drsub  (double, double);
double             __aeabi_dmul   (double, double);
double             __aeabi_ddiv   (double, double);
int                __aeabi_dcmplt (double, double);
int                __aeabi_dcmple (double, double);
int                __aeabi_dcmpgt (double, double);
int                __aeabi_dcmpge (double, double);
int                __aeabi_dcmpeq (double, double);
int                __aeabi_f2iz   (float);
unsigned           __aeabi_f2uiz  (float);
long long          __aeabi_f2lz   (float);
unsigned long long __aeabi_f2ulz  (float);
float              __aeabi_i2f    (int);
float              __aeabi_ui2f   (unsigned);
float              __aeabi_l2f    (long long);
float              __aeabi_ul2f   (unsigned long long);
int                __aeabi_d2iz   (double);
long long          __aeabi_d2lz   (double);
unsigned           __aeabi_d2uiz  (double);
unsigned long long __aeabi_d2ulz  (double);
double             __aeabi_i2d    (int);
double             __aeabi_ui2d   (unsigned);
double             __aeabi_l2d    (long long);
double             __aeabi_ul2d   (unsigned long long);
double             __aeabi_f2d    (float);
float              __aeabi_d2f    (double);

// GNU API
float  __addsf3 (float, float);
float  __subsf3 (float, float);
float  __mulsf3 (float, float);
float  __divsf3 (float, float);
float  __ltsf2  (float, float);
float  __lesf2  (float, float);
float  __gtsf2  (float, float);
float  __gesf2  (float, float);
float  __eqsf2  (float, float);
float  __nesf2  (float, float);
double __adddf3 (double, double);
double __subdf3 (double, double);
double __muldf3 (double, double);
double __divdf3 (double, double);
double __ltdf2  (double, double);
double __ledf2  (double, double);
double __gtdf2  (double, double);
double __gedf2  (double, double);
double __eqdf2  (double, double);
double __nedf2  (double, double);

/********************************************************************* * * Static data, const * ********************************************************************** */ // binary32 special values static u32 _aFloatSpecials[] = { 0x00000000, // +0 0x80000000, // -0 0x7F800000, // +Inf 0xFF800000, // -Inf 0x7FC00000, // NaN 0xFFC00000, // NaN }; // Random floats derived from quantum randomness (https://qrng.anu.edu.au) static float _aFloatRandomUniformDistribution1[] = { 0.7885449723, 0.9998094715, 0.3876576724, 0.8356841958, 0.3148936939, 0.9970710786, 0.8235131486, 0.3335833366, 0.1948718644, 0.8166663091, 0.1650510733, 0.3968966721, 0.3638974189, 0.9667957495, 0.3121612214, 0.9223421130, 0.7188766282, 0.2825422601, 0.0383919030, 0.5764071341, 0.4114595256, 0.4700649972, 0.8002487955, 0.3655678094, 0.6008792749, 0.4053804503, 0.3819831959, 0.7347183835, 0.4479462250, 0.3401285649, 0.0707507148, 0.4984719161, 0.3409999091, 0.8548396639, 0.5045839402, 0.7739178709, 0.0983707712, 0.5618592840, 0.1426608492, 0.5289642164, 0.1578932915, 0.9081336126, 0.4058290755, 0.8012231669, 0.8389891772, 0.0952707962, 0.4920716871, 0.3719829386, 0.0144001994, 0.7667299990, 0.6203624231, 0.7813631283, 0.6673019642, 0.7618224988, 0.6041512158, 0.8233172946, 0.6591242263, 0.6219177115, 0.6990491696, 0.5953941475, 0.7233279722, 0.3609917109, 0.1769333638, 0.1089936333 }; // Random floats derived from quantum randomness (https://qrng.anu.edu.au) static float _aFloatRandomUniformDistribution2[] = { 0.0422564714, 0.7728131769, 0.8620072105, 0.8170243470, 0.9945166426, 0.1984626113, 0.9276007395, 0.5248677401, 0.7048731442, 0.7535610915, 0.5463053182, 0.6054137050, 0.2593339109, 0.3244756924, 0.4028105685, 0.2196438660, 0.2756980496, 0.4626345033, 0.0841498048, 0.4801435920, 0.3151815446, 0.5968274530, 0.6534962360, 0.6365893527, 0.1284928145, 0.6721899283, 0.6016264597, 0.7256847994, 0.5143220404, 0.8687852838, 0.1344069993, 0.4294689739, 0.1108499650, 0.8959778614, 0.6813699648, 
0.7632335353, 0.1046082104, 0.3226924169, 0.9592376359, 0.8123961553, 0.8210336750, 0.5806060940, 0.8104785465, 0.3776035579, 0.4898308927, 0.3280951809, 0.1899302640, 0.7083087792, 0.3979903829, 0.0754221734, 0.2727227594, 0.0049476867, 0.3373789961, 0.3441676357, 0.6555256263, 0.8512584435, 0.2644237446, 0.2510367962, 0.7095772772, 0.6422276897, 0.3595680716, 0.7666331518, 0.7823634977, 0.9986928948 }; // Random doubles derived from quantum randomness (https://qrng.anu.edu.au) static double _aDoubleRandomUniformDistribution1[] = { 0.62598670877017040467, 0.49248291507389259323, 0.02726059443179837415, 0.52383376114815388239, 0.94881962108914826333, 0.23945969797938011460, 0.22132856465995987511, 0.40164002160057182308, 0.02558438713688386477, 0.12523811179432791317, 0.67056860624381301735, 0.05494466839729881311, 0.15128037511840960857, 0.93290446929135529390, 0.51819119451781587437, 0.56565829405943493592, 0.89639821540508221456, 0.48541199928732648388, 0.08836267574199602456, 0.24251967550090505148, 0.16586885359007352595, 0.48961907867477528217, 0.82618915609454883542, 0.73600718852549053876, 0.87066246033524769869, 0.86020591848752893062, 0.85699897202914135194, 0.11452935695167901460, 0.41303841463037521702, 0.80951287799563916322, 0.75378633773898919971, 0.49633766682999376297, 0.98545748812484449544, 0.34260954016749222648, 0.56915335626507813411, 0.85065987223630355238, 0.29075114535898746357, 0.24604485121860453263, 0.70681987573003776796, 0.23564755356683848740, 0.19445599747538142750, 0.26612471807353255859, 0.26043225424381303005, 0.00087537885780199165, 0.57611537016977388272, 0.21274132250868999946, 0.68576149410247520150, 0.53597164019987906463, 0.80723091306137133570, 0.48431508160461319525, 0.05117989159074980911, 0.22820212900191732869, 0.00323988328678565153, 0.28633918134445096158, 0.61724704767476312226, 0.86797895493611381017, 0.40851001880412455855, 0.04568938942160463537, 0.05128283614073092389, 0.45920412605629752877, 
0.96756956301432592105, 0.91365827487144381776, 0.44010767338302752699, 0.08153736749748720152 }; // Random doubles derived from quantum randomness (https://qrng.anu.edu.au) static double _aDoubleRandomUniformDistribution2[] = { 0.13791062199848394430, 0.30853562390856629686, 0.47255436807785749811, 0.76494137047912110909, 0.85737237712826384305, 0.50837580073361976443, 0.09879071225648270259, 0.37142974335787939576, 0.89382622662737115497, 0.11034956939165642209, 0.95260237469393842878, 0.32369926555136014278, 0.70240408851025394699, 0.95193126496005132973, 0.29833512067425684597, 0.86891023377616471572, 0.65753247170754614536, 0.15021233235108470092, 0.51993705151156395899, 0.95605688170461269955, 0.78399271749907696931, 0.35253001866313723186, 0.27301178262116164802, 0.96813863725529664873, 0.97590087719427336527, 0.12411551533291666025, 0.02730357846216959856, 0.21329428836053378704, 0.66554383622626407541, 0.76125224975509662526, 0.55864211173109079551, 0.67051126395572986824, 0.66246756407151419597, 0.97890897008569255528, 0.05455944378960619263, 0.86547464045876951401, 0.43622915074551305806, 0.98726021620151813075, 0.81792085362753240587, 0.31793107375168283805, 0.06057444961449573412, 0.03432623241446031505, 0.29130429676615284313, 0.91094642214997097136, 0.55970045530181872927, 0.12220353062299554107, 0.64201703938238601704, 0.75006836925733483649, 0.24034675143166997700, 0.17493414521907372445, 0.89125908767524068203, 0.57105276386594203357, 0.53693818935585752770, 0.43389582086499787074, 0.82302863999955747663, 0.08636985155486539717, 0.28425740151320795268, 0.78755776097024850001, 0.90255131206116015747, 0.13030404250052839406, 0.80136503668073546207, 0.80110538175970802405, 0.59736931252456376089, 0.28568214281583553721 }; // binary64 special values static u64 _aDoubleSpecials[] = { 0x0000000000000000uLL, // +0 0x8000000000000000uLL, // -0 0x7FF0000000000000uLL, // +Inf 0xFFF0000000000000uLL, // -Inf 0x7FF8000000000000uLL, // NaN 
0xFFF8000000000000uLL, // NaN }; // Random floats with magnitudes [1..2^31] static float _aFloat31[] = { 0.8620072105 * (1uLL<<1), 0.8170243470 * (1uLL<<2), 0.9945166426 * (1uLL<<3), 0.9276007395 * (1uLL<<4), 0.5248677401 * (1uLL<<5), 0.7048731442 * (1uLL<<6), 0.7535610915 * (1uLL<<7), 0.5463053182 * (1uLL<<8), 0.6054137050 * (1uLL<<9), 0.5968274530 * (1uLL<<10), 0.6534962360 * (1uLL<<11), 0.6365893527 * (1uLL<<12), 0.6721899283 * (1uLL<<13), 0.6016264597 * (1uLL<<14), 0.7256847994 * (1uLL<<15), 0.5143220404 * (1uLL<<16), 0.8687852838 * (1uLL<<17), 0.8959778614 * (1uLL<<18), 0.6813699648 * (1uLL<<19), 0.7632335353 * (1uLL<<20), 0.9592376359 * (1uLL<<21), 0.8123961553 * (1uLL<<22), 0.8210336750 * (1uLL<<23), 0.5806060940 * (1uLL<<24), 0.8104785465 * (1uLL<<25), 0.7083087792 * (1uLL<<26), 0.6555256263 * (1uLL<<27), 0.8512584435 * (1uLL<<28), 0.7095772772 * (1uLL<<29), 0.6422276897 * (1uLL<<30), 0.7666331518 * (1uLL<<31), }; // Random floats with magnitudes [1..2^63] static float _aFloat63[] = { +0.8620072105 * (1uLL<<1), +0.8170243470 * (1uLL<<2), +0.9945166426 * (1uLL<<3), +0.9276007395 * (1uLL<<4), +0.5248677401 * (1uLL<<5), +0.7048731442 * (1uLL<<6), +0.7535610915 * (1uLL<<7), +0.5463053182 * (1uLL<<8), +0.6054137050 * (1uLL<<9), +0.5968274530 * (1uLL<<10), +0.6534962360 * (1uLL<<11), +0.6365893527 * (1uLL<<12), +0.6721899283 * (1uLL<<13), +0.6016264597 * (1uLL<<14), +0.7256847994 * (1uLL<<15), +0.5143220404 * (1uLL<<16), +0.8687852838 * (1uLL<<17), +0.8959778614 * (1uLL<<18), +0.6813699648 * (1uLL<<19), +0.7632335353 * (1uLL<<20), +0.9592376359 * (1uLL<<21), +0.8123961553 * (1uLL<<22), +0.8210336750 * (1uLL<<23), +0.5806060940 * (1uLL<<24), +0.8104785465 * (1uLL<<25), +0.7083087792 * (1uLL<<26), +0.6555256263 * (1uLL<<27), +0.8512584435 * (1uLL<<28), +0.7095772772 * (1uLL<<29), +0.7728131769 * (1uLL<<30), +0.8620072105 * (1uLL<<31), +0.8170243470 * (1uLL<<32), +0.9945166426 * (1uLL<<33), +0.9276007395 * (1uLL<<34), +0.5248677401 * (1uLL<<35), +0.7048731442 * 
(1uLL<<36), +0.7535610915 * (1uLL<<37), +0.5463053182 * (1uLL<<38), +0.6054137050 * (1uLL<<39), +0.5968274530 * (1uLL<<40), +0.6534962360 * (1uLL<<41), +0.6365893527 * (1uLL<<42), +0.6721899283 * (1uLL<<43), +0.6016264597 * (1uLL<<44), +0.7256847994 * (1uLL<<45), +0.5143220404 * (1uLL<<46), +0.8687852838 * (1uLL<<47), +0.8959778614 * (1uLL<<48), +0.6813699648 * (1uLL<<49), +0.7632335353 * (1uLL<<50), +0.9592376359 * (1uLL<<51), +0.8123961553 * (1uLL<<52), +0.8210336750 * (1uLL<<53), +0.5806060940 * (1uLL<<54), +0.8104785465 * (1uLL<<55), +0.7083087792 * (1uLL<<56), +0.6555256263 * (1uLL<<57), +0.8512584435 * (1uLL<<58), +0.7095772772 * (1uLL<<59), +0.6422276897 * (1uLL<<60), +0.7666331518 * (1uLL<<61), +0.7728131769 * (1uLL<<62), +0.8620072105 * (1uLL<<63), }; // Random integers with magnitudes [1..2^31] static int _aInt31[] = { (int)(0.8620072105 * (1uLL<<1)), (int)(0.8170243470 * (1uLL<<2)), (int)(0.9945166426 * (1uLL<<3)), (int)(0.9276007395 * (1uLL<<4)), (int)(0.5248677401 * (1uLL<<5)), (int)(0.7048731442 * (1uLL<<6)), (int)(0.7535610915 * (1uLL<<7)), (int)(0.5463053182 * (1uLL<<8)), (int)(0.6054137050 * (1uLL<<9)), (int)(0.5968274530 * (1uLL<<10)), (int)(0.6534962360 * (1uLL<<11)), (int)(0.6365893527 * (1uLL<<12)), (int)(0.6721899283 * (1uLL<<13)), (int)(0.6016264597 * (1uLL<<14)), (int)(0.7256847994 * (1uLL<<15)), (int)(0.5143220404 * (1uLL<<16)), (int)(0.8687852838 * (1uLL<<17)), (int)(0.8959778614 * (1uLL<<18)), (int)(0.6813699648 * (1uLL<<19)), (int)(0.7632335353 * (1uLL<<20)), (int)(0.9592376359 * (1uLL<<21)), (int)(0.8123961553 * (1uLL<<22)), (int)(0.8210336750 * (1uLL<<23)), (int)(0.5806060940 * (1uLL<<24)), (int)(0.8104785465 * (1uLL<<25)), (int)(0.7083087792 * (1uLL<<26)), (int)(0.6555256263 * (1uLL<<27)), (int)(0.8512584435 * (1uLL<<28)), (int)(0.7095772772 * (1uLL<<29)), (int)(0.6422276897 * (1uLL<<30)), (int)(0.7666331518 * (1uLL<<31)), }; // Random doubles derived from quantum randomness (https://qrng.anu.edu.au) static double _aDouble31[] = { 
0.62598670877017040467 * (1uLL<<1), 0.49248291507389259323 * (1uLL<<2), 0.02726059443179837415 * (1uLL<<3), 0.52383376114815388239 * (1uLL<<4), 0.94881962108914826333 * (1uLL<<5), 0.23945969797938011460 * (1uLL<<6), 0.22132856465995987511 * (1uLL<<7), 0.40164002160057182308 * (1uLL<<8), 0.02558438713688386477 * (1uLL<<9), 0.12523811179432791317 * (1uLL<<10), 0.67056860624381301735 * (1uLL<<11), 0.05494466839729881311 * (1uLL<<12), 0.15128037511840960857 * (1uLL<<13), 0.93290446929135529390 * (1uLL<<14), 0.51819119451781587437 * (1uLL<<15), 0.56565829405943493592 * (1uLL<<16), 0.89639821540508221456 * (1uLL<<17), 0.48541199928732648388 * (1uLL<<18), 0.08836267574199602456 * (1uLL<<19), 0.24251967550090505148 * (1uLL<<20), 0.16586885359007352595 * (1uLL<<21), 0.48961907867477528217 * (1uLL<<22), 0.82618915609454883542 * (1uLL<<23), 0.73600718852549053876 * (1uLL<<24), 0.87066246033524769869 * (1uLL<<25), 0.86020591848752893062 * (1uLL<<26), 0.85699897202914135194 * (1uLL<<27), 0.11452935695167901460 * (1uLL<<28), 0.41303841463037521702 * (1uLL<<29), 0.80951287799563916322 * (1uLL<<30), 0.75378633773898919971 * (1uLL<<31), }; // Random doubles derived from quantum randomness (https://qrng.anu.edu.au) static double _aDouble63[] = { 0.62598670877017040467 * (1uLL<<1), 0.49248291507389259323 * (1uLL<<2), 0.02726059443179837415 * (1uLL<<3), 0.52383376114815388239 * (1uLL<<4), 0.94881962108914826333 * (1uLL<<5), 0.23945969797938011460 * (1uLL<<6), 0.22132856465995987511 * (1uLL<<7), 0.40164002160057182308 * (1uLL<<8), 0.02558438713688386477 * (1uLL<<9), 0.12523811179432791317 * (1uLL<<10), 0.67056860624381301735 * (1uLL<<11), 0.05494466839729881311 * (1uLL<<12), 0.15128037511840960857 * (1uLL<<13), 0.93290446929135529390 * (1uLL<<14), 0.51819119451781587437 * (1uLL<<15), 0.56565829405943493592 * (1uLL<<16), 0.89639821540508221456 * (1uLL<<17), 0.48541199928732648388 * (1uLL<<18), 0.08836267574199602456 * (1uLL<<19), 0.24251967550090505148 * (1uLL<<20), 
0.16586885359007352595 * (1uLL<<21), 0.48961907867477528217 * (1uLL<<22), 0.82618915609454883542 * (1uLL<<23), 0.73600718852549053876 * (1uLL<<24), 0.87066246033524769869 * (1uLL<<25), 0.86020591848752893062 * (1uLL<<26), 0.85699897202914135194 * (1uLL<<27), 0.11452935695167901460 * (1uLL<<28), 0.41303841463037521702 * (1uLL<<29), 0.80951287799563916322 * (1uLL<<30), 0.75378633773898919971 * (1uLL<<31), 0.49633766682999376297 * (1uLL<<32), 0.98545748812484449544 * (1uLL<<33), 0.34260954016749222648 * (1uLL<<34), 0.56915335626507813411 * (1uLL<<35), 0.85065987223630355238 * (1uLL<<36), 0.29075114535898746357 * (1uLL<<37), 0.24604485121860453263 * (1uLL<<38), 0.70681987573003776796 * (1uLL<<39), 0.23564755356683848740 * (1uLL<<40), 0.19445599747538142750 * (1uLL<<41), 0.26612471807353255859 * (1uLL<<42), 0.26043225424381303005 * (1uLL<<43), 0.00087537885780199165 * (1uLL<<44), 0.57611537016977388272 * (1uLL<<45), 0.21274132250868999946 * (1uLL<<46), 0.68576149410247520150 * (1uLL<<47), 0.53597164019987906463 * (1uLL<<48), 0.80723091306137133570 * (1uLL<<49), 0.48431508160461319525 * (1uLL<<50), 0.05117989159074980911 * (1uLL<<51), 0.22820212900191732869 * (1uLL<<52), 0.00323988328678565153 * (1uLL<<53), 0.28633918134445096158 * (1uLL<<54), 0.61724704767476312226 * (1uLL<<55), 0.86797895493611381017 * (1uLL<<56), 0.40851001880412455855 * (1uLL<<57), 0.04568938942160463537 * (1uLL<<58), 0.05128283614073092389 * (1uLL<<59), 0.45920412605629752877 * (1uLL<<60), 0.96756956301432592105 * (1uLL<<61), 0.91365827487144381776 * (1uLL<<62), 0.44010767338302752699 * (1uLL<<63), }; static long long _aLlong63[] = { (long long)(0.59248291507389259323 * (1uLL<<1)), (long long)(0.72726059443179837415 * (1uLL<<2)), (long long)(0.52383376114815388239 * (1uLL<<3)), (long long)(0.94881962108914826333 * (1uLL<<4)), (long long)(0.23945969797938011460 * (1uLL<<5)), (long long)(0.22132856465995987511 * (1uLL<<6)), (long long)(0.40164002160057182308 * (1uLL<<7)), (long 
long)(0.02558438713688386477 * (1uLL<<8)), (long long)(0.12523811179432791317 * (1uLL<<9)), (long long)(0.67056860624381301735 * (1uLL<<10)), (long long)(0.05494466839729881311 * (1uLL<<11)), (long long)(0.15128037511840960857 * (1uLL<<12)), (long long)(0.93290446929135529390 * (1uLL<<13)), (long long)(0.51819119451781587437 * (1uLL<<14)), (long long)(0.56565829405943493592 * (1uLL<<15)), (long long)(0.89639821540508221456 * (1uLL<<16)), (long long)(0.48541199928732648388 * (1uLL<<17)), (long long)(0.08836267574199602456 * (1uLL<<18)), (long long)(0.24251967550090505148 * (1uLL<<19)), (long long)(0.16586885359007352595 * (1uLL<<20)), (long long)(0.48961907867477528217 * (1uLL<<21)), (long long)(0.82618915609454883542 * (1uLL<<22)), (long long)(0.73600718852549053876 * (1uLL<<23)), (long long)(0.87066246033524769869 * (1uLL<<24)), (long long)(0.86020591848752893062 * (1uLL<<25)), (long long)(0.85699897202914135194 * (1uLL<<26)), (long long)(0.11452935695167901460 * (1uLL<<27)), (long long)(0.41303841463037521702 * (1uLL<<28)), (long long)(0.80951287799563916322 * (1uLL<<29)), (long long)(0.75378633773898919971 * (1uLL<<30)), (long long)(0.49633766682999376297 * (1uLL<<31)), (long long)(0.98545748812484449544 * (1uLL<<32)), (long long)(0.34260954016749222648 * (1uLL<<33)), (long long)(0.56915335626507813411 * (1uLL<<34)), (long long)(0.85065987223630355238 * (1uLL<<35)), (long long)(0.29075114535898746357 * (1uLL<<36)), (long long)(0.24604485121860453263 * (1uLL<<37)), (long long)(0.70681987573003776796 * (1uLL<<38)), (long long)(0.23564755356683848740 * (1uLL<<39)), (long long)(0.19445599747538142750 * (1uLL<<40)), (long long)(0.26612471807353255859 * (1uLL<<41)), (long long)(0.26043225424381303005 * (1uLL<<42)), (long long)(0.00087537885780199165 * (1uLL<<43)), (long long)(0.57611537016977388272 * (1uLL<<44)), (long long)(0.21274132250868999946 * (1uLL<<45)), (long long)(0.68576149410247520150 * (1uLL<<46)), (long long)(0.53597164019987906463 * (1uLL<<47)), (long 
long)(0.80723091306137133570 * (1uLL<<48)),
  (long long)(0.48431508160461319525 * (1uLL<<49)),
  (long long)(0.05117989159074980911 * (1uLL<<50)),
  (long long)(0.22820212900191732869 * (1uLL<<51)),
  (long long)(0.00323988328678565153 * (1uLL<<52)),
  (long long)(0.28633918134445096158 * (1uLL<<53)),
  (long long)(0.61724704767476312226 * (1uLL<<54)),
  (long long)(0.86797895493611381017 * (1uLL<<55)),
  (long long)(0.40851001880412455855 * (1uLL<<56)),
  (long long)(0.04568938942160463537 * (1uLL<<57)),
  (long long)(0.05128283614073092389 * (1uLL<<58)),
  (long long)(0.45920412605629752877 * (1uLL<<59)),
  (long long)(0.96756956301432592105 * (1uLL<<60)),
  (long long)(0.91365827487144381776 * (1uLL<<61)),
  (long long)(0.44010767338302752699 * (1uLL<<62)),
  (long long)(0.08153736749748720152 * (1uLL<<63)),
};

/*********************************************************************
*
*       Static data
*
**********************************************************************
*/

static u64 _aOverhead[MODE_MAX];  // Overhead of calling a function that simply returns, per mode.

/*********************************************************************
*
*       Local functions
*
**********************************************************************
*/

#if defined (__clang__) && !defined(__CC_ARM)
static void* _MEMSET(void *str, int c, int n) {
  unsigned char* p;

  p = (unsigned char*)str;
  while (n > 0) {
    *p++ = (unsigned char)c;
    n--;
  }
  return str;
}
#endif

/*********************************************************************
*
*       _Logf()
*
*  Function description
*    Log a formatted string via semihosting or toolchain internal logging.
*/
static void _Logf(const char *sFormat, ...) {
  va_list ap;
  //
  va_start(ap, sFormat);
#ifdef SEMIHOST
  SEGGER_SEMIHOST_Writef(sFormat, &ap);
#else
  vprintf(sFormat, ap);
#endif
  va_end(ap);  // Every va_start() needs a matching va_end()
}

/*********************************************************************
*
*       _GetTime()
*
*  Function description
*    Get the current time from a performance counter.
*/
static u32 _GetTime(void) {
  return DWT_CYCCNT;
}

/*********************************************************************
*
*       NullVoidReturnVoid()
*
*  Function description
*    Naked function to measure function call overhead.
*/
#if defined(__CC_ARM)
__asm void __attribute__((noinline)) NullVoidReturnVoid(void) {
  bx lr;
};
#elif defined (__ICCARM__) || defined (__SES_ARM) || defined(__GNUC__)
void __attribute__((naked, noinline, section(".fast"))) NullVoidReturnVoid(void) {
  __asm("bx lr");
};
#endif

/*********************************************************************
*
*       _Time()
*
*  Function description
*    Get the time of executing a function.
*/
static u32 __attribute__((noinline, section(".fast"))) _Time(ExecContext *pContext) {
  u32 t0;
  u32 t1;
  //
  t0 = _GetTime();
  switch (pContext->Mode) {
  case MODE_INT_RETURN_FLOAT:     pContext->Function.pfIntReturnFloat     (pContext->v0.i); break;
  case MODE_INT_RETURN_DOUBLE:    pContext->Function.pfIntReturnDouble    (pContext->v0.i); break;
  case MODE_LLONG_RETURN_FLOAT:   pContext->Function.pfLlongReturnFloat   (pContext->v0.l); break;
  case MODE_LLONG_RETURN_DOUBLE:  pContext->Function.pfLlongReturnDouble  (pContext->v0.l); break;
  case MODE_FLOAT_RETURN_INT:     pContext->Function.pfFloatReturnInt     (pContext->v0.f); break;
  case MODE_FLOAT_RETURN_LLONG:   pContext->Function.pfFloatReturnLlong   (pContext->v0.f); break;
  case MODE_FLOAT_RETURN_FLOAT:   pContext->Function.pfFloatReturnFloat   (pContext->v0.f); break;
  case MODE_FLOAT_RETURN_DOUBLE:  pContext->Function.pfFloatReturnDouble  (pContext->v0.f); break;
  case MODE_DOUBLE_RETURN_INT:    pContext->Function.pfDoubleReturnInt    (pContext->v0.d); break;
  case MODE_DOUBLE_RETURN_FLOAT:  pContext->Function.pfDoubleReturnFloat  (pContext->v0.d); break;
  case MODE_DOUBLE_RETURN_DOUBLE: pContext->Function.pfDoubleReturnDouble (pContext->v0.d); break;
  case MODE_DOUBLE_RETURN_LLONG:  pContext->Function.pfDoubleReturnLlong  (pContext->v0.d); break;
  case MODE_INT_INT_RETURN_INT:   pContext->Function.pfIntIntReturnInt
(pContext->v0.i, pContext->v1.i); break; case MODE_LLONG_LLONG_RETURN_LLONG: pContext->Function.pfLlongLlongReturnLlong (pContext->v0.l, pContext->v1.l); break; case MODE_FLOAT_FLOAT_RETURN_INT: pContext->Function.pfFloatFloatReturnInt (pContext->v0.f, pContext->v1.f); break; case MODE_FLOAT_FLOAT_RETURN_FLOAT: pContext->Function.pfFloatFloatReturnFloat (pContext->v0.f, pContext->v1.f); break; case MODE_DOUBLE_DOUBLE_RETURN_INT: pContext->Function.pfDoubleDoubleReturnInt (pContext->v0.d, pContext->v1.d); break; case MODE_DOUBLE_DOUBLE_RETURN_DOUBLE: pContext->Function.pfDoubleDoubleReturnDouble(pContext->v0.d, pContext->v1.d); break; case MODE_MAX: break; } t1 = _GetTime(); return t1 - t0; } /********************************************************************* * * _GetSeqName() * * Function description * Get the name of a sequence by its id. */ static const char * _GetSeqName(unsigned Seq) { switch (Seq) { case SEQUENCE_SPECIAL_F32xF32: case SEQUENCE_SPECIAL_F64xF64: return "+-Inf, +-NaN, +-0"; case SEQUENCE_TYPICAL_F32xF32: case SEQUENCE_TYPICAL_F64xF64: return "Random distribution over (0, 1), operands differ"; case SEQUENCE_31_INT: case SEQUENCE_31_FLOAT: case SEQUENCE_31_DOUBLE: return "Random distribution with magnitudes (1..2^31)"; case SEQUENCE_31_INT | SEQUENCE_SIGNED: case SEQUENCE_31_FLOAT | SEQUENCE_SIGNED: case SEQUENCE_31_DOUBLE | SEQUENCE_SIGNED: return "Random distribution with magnitudes (1..2^31), signed"; case SEQUENCE_63_LLONG: case SEQUENCE_63_FLOAT: case SEQUENCE_63_DOUBLE: return "Random distribution with magnitudes (1..2^63)"; case SEQUENCE_63_LLONG | SEQUENCE_SIGNED: case SEQUENCE_63_FLOAT | SEQUENCE_SIGNED: case SEQUENCE_63_DOUBLE | SEQUENCE_SIGNED: return "Random distribution with magnitudes (1..2^63), signed"; default: return "<unknown>"; } } /********************************************************************* * * _InitSeq() * * Function description * Initialize a sequence. 
*/
static void _InitSeq(ExecSequence *pSeq, SEQUENCE seq) {
  pSeq->seq   = seq;
  pSeq->index = 0;
  pSeq->last  = 0;
  pSeq->sign  = 1;
}

/*********************************************************************
*
*       _NextSeq()
*
*  Function description
*    Get and prepare the next sequence.
*/
static int _NextSeq(ExecSequence *pSeq, ExecContext *pCtx) {
  //
  // A signed sequence runs twice: first with positive, then with negative operands.
  //
  if (pSeq->last) {
    if (pSeq->seq & SEQUENCE_SIGNED) {
      pSeq->sign  = -pSeq->sign;
      pSeq->index = 0;
      pSeq->last  = 0;
      if (pSeq->sign == 1) {
        return 0;
      }
    } else {
      return 0;
    }
  }
  //
  switch (pSeq->seq & 0x1f) {
  case SEQUENCE_END:
    pSeq->last = 1;
    break;
  //
  case SEQUENCE_SPECIAL_F32xF32:
    pCtx->v0.i = _aFloatSpecials[pSeq->index % COUNTOF(_aFloatSpecials)];
    pCtx->v1.i = _aFloatSpecials[pSeq->index / COUNTOF(_aFloatSpecials)];
    ++pSeq->index;
    pSeq->last = pSeq->index >= COUNTOF(_aFloatSpecials)*COUNTOF(_aFloatSpecials);
    break;
  //
  case SEQUENCE_TYPICAL_F32xF32:
    pCtx->v0.f = _aFloatRandomUniformDistribution1[pSeq->index % COUNTOF(_aFloatRandomUniformDistribution1)];
    pCtx->v1.f = _aFloatRandomUniformDistribution2[pSeq->index / COUNTOF(_aFloatRandomUniformDistribution1)];
    ++pSeq->index;
    pSeq->last = pSeq->index >= COUNTOF(_aFloatRandomUniformDistribution1)*COUNTOF(_aFloatRandomUniformDistribution2);
    break;
  //
  case SEQUENCE_SPECIAL_F64xF64:
    pCtx->v0.l = _aDoubleSpecials[pSeq->index % COUNTOF(_aDoubleSpecials)];
    pCtx->v1.l = _aDoubleSpecials[pSeq->index / COUNTOF(_aDoubleSpecials)];
    ++pSeq->index;
    pSeq->last = pSeq->index >= COUNTOF(_aDoubleSpecials)*COUNTOF(_aDoubleSpecials);
    break;
  //
  case SEQUENCE_TYPICAL_F64xF64:
    pCtx->v0.d = _aDoubleRandomUniformDistribution1[pSeq->index % COUNTOF(_aDoubleRandomUniformDistribution1)];
    pCtx->v1.d = _aDoubleRandomUniformDistribution2[pSeq->index / COUNTOF(_aDoubleRandomUniformDistribution1)];
    ++pSeq->index;
    pSeq->last = pSeq->index >= COUNTOF(_aDoubleRandomUniformDistribution1)*COUNTOF(_aDoubleRandomUniformDistribution2);
    break;
  //
  case SEQUENCE_31_INT:
    pCtx->v0.i = pSeq->sign * _aInt31[pSeq->index];
    ++pSeq->index;
    pSeq->last = pSeq->index >= COUNTOF(_aInt31);
    break;
  //
  case SEQUENCE_31_FLOAT:
    pCtx->v0.f = pSeq->sign * _aFloat31[pSeq->index];
    ++pSeq->index;
    pSeq->last = pSeq->index >= COUNTOF(_aFloat31);
    break;
  //
  case SEQUENCE_31_DOUBLE:
    pCtx->v0.d = pSeq->sign * _aDouble31[pSeq->index];
    ++pSeq->index;
    pSeq->last = pSeq->index >= COUNTOF(_aDouble31);
    break;
  //
  case SEQUENCE_63_LLONG:
    pCtx->v0.l = pSeq->sign * _aLlong63[pSeq->index];
    ++pSeq->index;
    pSeq->last = pSeq->index >= COUNTOF(_aLlong63);
    break;
  //
  case SEQUENCE_63_FLOAT:
    pCtx->v0.f = pSeq->sign * _aFloat63[pSeq->index];
    ++pSeq->index;
    pSeq->last = pSeq->index >= COUNTOF(_aFloat63);
    break;
  //
  case SEQUENCE_63_DOUBLE:
    pCtx->v0.d = pSeq->sign * _aDouble63[pSeq->index];
    ++pSeq->index;
    pSeq->last = pSeq->index >= COUNTOF(_aDouble63);
    break;
  //
  default:
    _Logf("Unknown sequence!\n");
    pSeq->last = 1;
    break;
  }
  //
  return 1;
}

/*********************************************************************
*
*       _Benchmark()
*
*  Function description
*    Run a benchmark function.
*/
static void _Benchmark(void *pFn, ExecMode Mode, const char *sLabel, ...) {
  va_list      ap;
  u32          Min;
  u32          Max;
  u32          Cnt;
  u32          Tot;
  u32          t;
  SEQUENCE     SelSeq;
  ExecContext  Ctx;
  ExecSequence Seq;
  //
  va_start(ap, sLabel);
  //
  Ctx.Function.pfVoidReturnVoid = (VoidFunc)pFn;
  Ctx.Mode                      = Mode;
  //
  SelSeq = (SEQUENCE)va_arg(ap, unsigned);
  while (SelSeq != SEQUENCE_END) {
    _InitSeq(&Seq, SelSeq);
    Min = ~0u;
    Max = 0u;
    Tot = 0;
    Cnt = 0;
    while (_NextSeq(&Seq, &Ctx)) {
      t  = _Time(&Ctx);
      t -= _aOverhead[Ctx.Mode];  // Remove the cost of the call itself.
      Cnt += 1;
      Tot += t;
      if (t < Min) {
        Min = t;
      }
      if (t > Max) {
        Max = t;
      }
      if (t > 100) {
        t = _Time(&Ctx);  // Slow case: run it once more; this second result is not recorded.
      }
    }
    _Logf("%-15s %6u %6u %6.1f %s\n", sLabel, Min, Max, (float)Tot / Cnt, _GetSeqName(SelSeq));
    SelSeq = (SEQUENCE)va_arg(ap, unsigned);
  }
  va_end(ap);  // Every va_start() needs a matching va_end().
}

/*********************************************************************
*
*       _CalculateOverheads()
*
*  Function description
*    Get the overheads of calling functions.
*/
static void _CalculateOverheads(void) {
  ExecContext Context;
  //
  memset(&Context, 0, sizeof(Context));
  Context.Function.pfVoidReturnVoid = NullVoidReturnVoid;
  //
  Context.Mode = (ExecMode)0;
  while (Context.Mode < MODE_MAX) {
    _aOverhead[Context.Mode] = _Time(&Context);
    Context.Mode = (ExecMode)(Context.Mode+1);
  }
}

/*********************************************************************
*
*       Global functions
*
**********************************************************************
*/

/*********************************************************************
*
*       main()
*
*  Function description
*    Application entry point.
*/
int main(void) {
  _Logf("IEEE-754 Floating-point Library Benchmarks\n");
  _Logf("Copyright (c) 2018-2019 SEGGER Microcontroller GmbH.\n\n");
  //
  _Logf("Target: Cortex-M");
  _Logf("\n\n");
  //
  _Logf("Function Min Max Avg Description\n");
  _Logf("-------------- ------ ------ ------ -------------------------------\n");
  //
  _CalculateOverheads();
  //
  _Benchmark((void *)__aeabi_fadd,   MODE_FLOAT_FLOAT_RETURN_FLOAT,    "__aeabi_fadd",   SPECIAL(SEQUENCE_SPECIAL_F32xF32) SEQUENCE_TYPICAL_F32xF32, SEQUENCE_END);
  _Benchmark((void *)__aeabi_fsub,   MODE_FLOAT_FLOAT_RETURN_FLOAT,    "__aeabi_fsub",   SPECIAL(SEQUENCE_SPECIAL_F32xF32) SEQUENCE_TYPICAL_F32xF32, SEQUENCE_END);
  _Benchmark((void *)__aeabi_frsub,  MODE_FLOAT_FLOAT_RETURN_FLOAT,    "__aeabi_frsub",  SPECIAL(SEQUENCE_SPECIAL_F32xF32) SEQUENCE_TYPICAL_F32xF32, SEQUENCE_END);
  _Benchmark((void *)__aeabi_fmul,   MODE_FLOAT_FLOAT_RETURN_FLOAT,    "__aeabi_fmul",   SPECIAL(SEQUENCE_SPECIAL_F32xF32) SEQUENCE_TYPICAL_F32xF32, SEQUENCE_END);
  _Benchmark((void *)__aeabi_fdiv,   MODE_FLOAT_FLOAT_RETURN_FLOAT,    "__aeabi_fdiv",   SPECIAL(SEQUENCE_SPECIAL_F32xF32) SEQUENCE_TYPICAL_F32xF32, SEQUENCE_END);
  _Benchmark((void *)__aeabi_fcmplt, MODE_FLOAT_FLOAT_RETURN_INT,      "__aeabi_fcmplt", SPECIAL(SEQUENCE_SPECIAL_F32xF32) SEQUENCE_TYPICAL_F32xF32, SEQUENCE_END);
  _Benchmark((void *)__aeabi_fcmple, MODE_FLOAT_FLOAT_RETURN_INT,      "__aeabi_fcmple", SPECIAL(SEQUENCE_SPECIAL_F32xF32) SEQUENCE_TYPICAL_F32xF32, SEQUENCE_END);
  _Benchmark((void *)__aeabi_fcmpgt, MODE_FLOAT_FLOAT_RETURN_INT,      "__aeabi_fcmpgt", SPECIAL(SEQUENCE_SPECIAL_F32xF32) SEQUENCE_TYPICAL_F32xF32, SEQUENCE_END);
  _Benchmark((void *)__aeabi_fcmpge, MODE_FLOAT_FLOAT_RETURN_INT,      "__aeabi_fcmpge", SPECIAL(SEQUENCE_SPECIAL_F32xF32) SEQUENCE_TYPICAL_F32xF32, SEQUENCE_END);
  _Benchmark((void *)__aeabi_fcmpeq, MODE_FLOAT_FLOAT_RETURN_INT,      "__aeabi_fcmpeq", SPECIAL(SEQUENCE_SPECIAL_F32xF32) SEQUENCE_TYPICAL_F32xF32, SEQUENCE_END);
  _Benchmark((void *)__aeabi_dadd,   MODE_DOUBLE_DOUBLE_RETURN_DOUBLE, "__aeabi_dadd",   SPECIAL(SEQUENCE_SPECIAL_F64xF64) SEQUENCE_TYPICAL_F64xF64, SEQUENCE_END);
  _Benchmark((void *)__aeabi_dsub,   MODE_DOUBLE_DOUBLE_RETURN_DOUBLE, "__aeabi_dsub",   SPECIAL(SEQUENCE_SPECIAL_F64xF64) SEQUENCE_TYPICAL_F64xF64, SEQUENCE_END);
  _Benchmark((void *)__aeabi_drsub,  MODE_DOUBLE_DOUBLE_RETURN_DOUBLE, "__aeabi_drsub",  SPECIAL(SEQUENCE_SPECIAL_F64xF64) SEQUENCE_TYPICAL_F64xF64, SEQUENCE_END);
  _Benchmark((void *)__aeabi_dmul,   MODE_DOUBLE_DOUBLE_RETURN_DOUBLE, "__aeabi_dmul",   SPECIAL(SEQUENCE_SPECIAL_F64xF64) SEQUENCE_TYPICAL_F64xF64, SEQUENCE_END);
  _Benchmark((void *)__aeabi_ddiv,   MODE_DOUBLE_DOUBLE_RETURN_DOUBLE, "__aeabi_ddiv",   SPECIAL(SEQUENCE_SPECIAL_F64xF64) SEQUENCE_TYPICAL_F64xF64, SEQUENCE_END);
  _Benchmark((void *)__aeabi_dcmplt, MODE_DOUBLE_DOUBLE_RETURN_INT,    "__aeabi_dcmplt", SPECIAL(SEQUENCE_SPECIAL_F64xF64) SEQUENCE_TYPICAL_F64xF64, SEQUENCE_END);
  _Benchmark((void *)__aeabi_dcmple, MODE_DOUBLE_DOUBLE_RETURN_INT,    "__aeabi_dcmple", SPECIAL(SEQUENCE_SPECIAL_F64xF64) SEQUENCE_TYPICAL_F64xF64, SEQUENCE_END);
  _Benchmark((void *)__aeabi_dcmpgt, MODE_DOUBLE_DOUBLE_RETURN_INT,    "__aeabi_dcmpgt", SPECIAL(SEQUENCE_SPECIAL_F64xF64) SEQUENCE_TYPICAL_F64xF64, SEQUENCE_END);
  _Benchmark((void *)__aeabi_dcmpge, MODE_DOUBLE_DOUBLE_RETURN_INT,    "__aeabi_dcmpge", SPECIAL(SEQUENCE_SPECIAL_F64xF64) SEQUENCE_TYPICAL_F64xF64, SEQUENCE_END);
  _Benchmark((void *)__aeabi_dcmpeq, MODE_DOUBLE_DOUBLE_RETURN_INT,    "__aeabi_dcmpeq", SPECIAL(SEQUENCE_SPECIAL_F64xF64) SEQUENCE_TYPICAL_F64xF64, SEQUENCE_END);
  _Benchmark((void *)__aeabi_f2iz,   MODE_FLOAT_RETURN_INT,            "__aeabi_f2iz",   SEQUENCE_31_FLOAT  | SEQUENCE_SIGNED, SEQUENCE_END);
  _Benchmark((void *)__aeabi_f2uiz,  MODE_FLOAT_RETURN_INT,            "__aeabi_f2uiz",  SEQUENCE_31_FLOAT,                    SEQUENCE_END);
  _Benchmark((void *)__aeabi_f2lz,   MODE_FLOAT_RETURN_LLONG,          "__aeabi_f2lz",   SEQUENCE_63_FLOAT  | SEQUENCE_SIGNED, SEQUENCE_END);
  _Benchmark((void *)__aeabi_f2ulz,  MODE_FLOAT_RETURN_LLONG,          "__aeabi_f2ulz",  SEQUENCE_63_FLOAT,                    SEQUENCE_END);
  _Benchmark((void *)__aeabi_i2f,    MODE_INT_RETURN_FLOAT,            "__aeabi_i2f",    SEQUENCE_31_INT    | SEQUENCE_SIGNED, SEQUENCE_END);
  _Benchmark((void *)__aeabi_ui2f,   MODE_INT_RETURN_FLOAT,            "__aeabi_ui2f",   SEQUENCE_31_INT,                      SEQUENCE_END);
  _Benchmark((void *)__aeabi_l2f,    MODE_LLONG_RETURN_FLOAT,          "__aeabi_l2f",    SEQUENCE_63_LLONG  | SEQUENCE_SIGNED, SEQUENCE_END);
  _Benchmark((void *)__aeabi_ul2f,   MODE_LLONG_RETURN_FLOAT,          "__aeabi_ul2f",   SEQUENCE_63_LLONG,                    SEQUENCE_END);
  _Benchmark((void *)__aeabi_d2iz,   MODE_DOUBLE_RETURN_INT,           "__aeabi_d2iz",   SEQUENCE_31_DOUBLE | SEQUENCE_SIGNED, SEQUENCE_END);
  _Benchmark((void *)__aeabi_d2uiz,  MODE_DOUBLE_RETURN_INT,           "__aeabi_d2uiz",  SEQUENCE_31_DOUBLE,                   SEQUENCE_END);
  _Benchmark((void *)__aeabi_d2lz,   MODE_DOUBLE_RETURN_LLONG,         "__aeabi_d2lz",   SEQUENCE_63_DOUBLE | SEQUENCE_SIGNED, SEQUENCE_END);
  _Benchmark((void *)__aeabi_d2ulz,  MODE_DOUBLE_RETURN_LLONG,         "__aeabi_d2ulz",  SEQUENCE_63_DOUBLE,                   SEQUENCE_END);
  _Benchmark((void *)__aeabi_i2d,    MODE_INT_RETURN_DOUBLE,           "__aeabi_i2d",    SEQUENCE_31_INT    | SEQUENCE_SIGNED, SEQUENCE_END);
  _Benchmark((void *)__aeabi_ui2d,   MODE_INT_RETURN_DOUBLE,           "__aeabi_ui2d",   SEQUENCE_31_INT,                      SEQUENCE_END);
  _Benchmark((void *)__aeabi_l2d,    MODE_LLONG_RETURN_DOUBLE,         "__aeabi_l2d",    SEQUENCE_63_LLONG  | SEQUENCE_SIGNED, SEQUENCE_END);
  _Benchmark((void *)__aeabi_ul2d,   MODE_LLONG_RETURN_DOUBLE,         "__aeabi_ul2d",   SEQUENCE_63_LLONG,                    SEQUENCE_END);
  _Benchmark((void *)__aeabi_f2d,    MODE_FLOAT_RETURN_DOUBLE,         "__aeabi_f2d",    SEQUENCE_63_FLOAT  | SEQUENCE_SIGNED, SEQUENCE_END);
  _Benchmark((void *)__aeabi_d2f,    MODE_DOUBLE_RETURN_FLOAT,         "__aeabi_d2f",    SEQUENCE_63_DOUBLE | SEQUENCE_SIGNED, SEQUENCE_END);
  //
  _Logf("\n");
  _Logf("STOP.\n");
  //
  return 0;
}