Denormals
Denormals (or subnormals) are very small floating-point numbers: values whose magnitude is below the smallest normal number the format can represent (about 1.18e-38 for a single-precision float). Many CPUs, under many different conditions, slow down considerably when processing them. The slowdown can be as much as a factor of 100!
There are a few things we can do which may solve this problem. They vary in efficacy depending on the CPU.
Method | What to do |
---|---|
fast-math | Pass -ffast-math to GCC. |
SSE | Pass -msse -mfpmath=sse to GCC. |
SSE DAZ | Pass -msse -mfpmath=sse to GCC and switch the CPU to denormals-are-zero mode. |
SSE FTZ | Pass -msse -mfpmath=sse to GCC and switch the CPU to flush-to-zero mode. |
SSE fast-math | Pass -ffast-math -msse -mfpmath=sse to GCC. |
SSE DAZ fast-math | Do SSE fast-math and switch the CPU to denormals-are-zero mode. |
SSE FTZ fast-math | Do SSE fast-math and switch the CPU to flush-to-zero mode. |
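As a rough sketch of what "switching the CPU" to these modes can look like on x86 with GCC or clang, the helpers below set and query the FTZ and DAZ bits of the SSE control register using the macros from xmmintrin.h and pmmintrin.h. The function names are my own, chosen for illustration. This assumes an x86 target with SSE, and a CPU that actually supports DAZ; on a CPU without DAZ, do not call enable_daz().

```c
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE, _MM_GET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE, _MM_GET_DENORMALS_ZERO_MODE */

/* Illustrative helper names; they are not part of any library. */

/* Flush-to-zero: denormal results of SSE operations become zero. */
void enable_ftz(void)
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
}

/* Denormals-are-zero: denormal inputs to SSE operations are read as zero.
   Assumption: the CPU supports DAZ. */
void enable_daz(void)
{
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}

/* Return 1 if the corresponding mode is currently set, 0 otherwise. */
int ftz_enabled(void)
{
    return _MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON;
}

int daz_enabled(void)
{
    return _MM_GET_DENORMALS_ZERO_MODE() == _MM_DENORMALS_ZERO_ON;
}
```

The control register is per-thread state, so these calls belong in the thread that does the processing. They only affect SSE arithmetic, which is why the table pairs them with -msse -mfpmath=sse.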
Finally, it is usually possible to fix the plugin code so that it does not generate denormals.
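One common way to do this, sketched here under my own assumptions rather than taken from any particular plugin, is to zero any value that has fallen below the smallest normal float before it is fed back into a filter state. The names flush_denormal and one_pole_process are hypothetical, and the one-pole filter is only a stand-in for whatever recursive code the plugin contains.

```c
#include <float.h>  /* FLT_MIN: smallest positive normal float */

/* Return 0 for any value too small to be a normal float,
   and the value itself otherwise. */
float flush_denormal(float x)
{
    return (x < FLT_MIN && x > -FLT_MIN) ? 0.0f : x;
}

/* Hypothetical one-pole smoother. With silent input the state decays
   geometrically and would pass through the denormal range; flushing
   makes it land on exactly zero instead. */
float one_pole_process(float *state, float input, float coeff)
{
    *state = flush_denormal(input + coeff * (*state - input));
    return *state;
}
```

The comparison costs a little on every sample, so it is usually applied only to values that are fed back, such as filter and delay-line states.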
-msse -mfpmath=sse -ffast-math.
-msse -mfpmath=sse and DAZ set.
-ffast-math only, some do not.
-msse -mfpmath=sse and FTZ set, some do not.
Pass -msse -mfpmath=sse to GCC when building your plugins for distribution, unless you want to support 1999-ish-era CPUs, in which case build and distribute two versions: one with SSE and one without.
Pass -msse -mfpmath=sse -ffast-math to GCC if you do not mind what -ffast-math does to your FP code.
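If you do ship two builds, something has to pick between them at run time. A minimal sketch, assuming GCC or clang on an x86 target, is to ask the compiler's CPU-detection builtin whether SSE is present. The file names below are invented for illustration, and how the chosen build actually gets loaded depends on the host and is not shown.

```c
#include <stdio.h>

/* Return 1 if the running CPU reports SSE support, 0 otherwise.
   __builtin_cpu_supports is a GCC/clang builtin for x86 targets. */
int cpu_has_sse(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse") ? 1 : 0;
}

/* Hypothetical file names: pick the SSE build when the CPU can run it. */
const char *choose_plugin_build(void)
{
    return cpu_has_sse() ? "plugin_sse.so" : "plugin_nosse.so";
}
```

An installer or a small wrapper could call choose_plugin_build() once and print or load the result.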
Andrew Belt reports that the following piece of code will disable denormals with Linux GCC and MinGW:
#include <xmmintrin.h>
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
and the following with clang on OS X:
#include <fenv.h>
fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV);
Well, maybe.
I wrote a small, simple test program. It makes a 256-sample buffer of 1s and then multiplies the buffer by some value x, 10 million times. It compares the time taken to do this when x is 1 against the time taken when x is 1e-39. It performs this test using various GCC flags and other settings.
This program was run on several different platforms. The results are shown below. Numbers are the approximate factors by which the denormal test is slower than the normal test.
CPU | GCC | No flags | fast-math | SSE | SSE DAZ | SSE FTZ | SSE fast-math | SSE DAZ fast-math | SSE FTZ fast-math |
---|---|---|---|---|---|---|---|---|---|
64-bit Core i3 | 4.6.1 | 7 | 1 | 7 | 1 | 7 | 1 | 1 | 1 |
64-bit Phenom II X6 1090T | 4.4.5 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
64-bit Phenom II X4 940 | 4.6.2 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
64-bit Atom | 4.6.1 | 12 | 1 | 11 | 1 | 11 | 1 | 1 | 1 |
64-bit Athlon 64 X2 | 4.6.1 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
32-bit Xeon | 3.3.2 | 113 | 110 | 57 | 1 | 54 | 57 | 1 | 55 |
32-bit Core 2 Duo | 3.4.5 | 40 | 40 | 8 | 1 | 11 | 1 | 1 | 1 |
32-bit Athlon 64 X2 | 4.1.2 | 6 | 6 | 9 | 1 | 1 | 1 | 1 | 1 |
32-bit Pentium 3 | 4.1.2 | 12 | 12 | 6 | N/A | 4 | 4 | N/A | 4 |
32-bit VIA Nehemiah | 4.1.2 | 1.2 | 1.3 | 1.5 | N/A | 1.4 | 1.4 | N/A | 1.4 |
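For readers who want to try this on their own hardware, here is a rough reconstruction of the kind of test described above. It is not the original test program: the function names, the use of clock(), and the volatile buffers (there only to stop the compiler optimising the loop away) are all my assumptions.

```c
#include <stdio.h>
#include <time.h>

#define BUFFER_SIZE 256

/* Time `iterations` passes of out[i] = in[i] * x over a buffer of 1s.
   Returns elapsed CPU time in seconds. */
double time_multiply(float x, long iterations)
{
    /* volatile forces every load, multiply and store to really happen */
    static volatile float in[BUFFER_SIZE];
    static volatile float out[BUFFER_SIZE];

    for (int i = 0; i < BUFFER_SIZE; i++)
        in[i] = 1.0f;

    clock_t start = clock();
    for (long n = 0; n < iterations; n++)
        for (int i = 0; i < BUFFER_SIZE; i++)
            out[i] = in[i] * x;
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Factor by which the denormal case (x = 1e-39) is slower than the
   normal case (x = 1). Returns 0 if the normal case was too fast to time. */
double denormal_slowdown(long iterations)
{
    double normal = time_multiply(1.0f, iterations);
    double denormal = time_multiply(1e-39f, iterations);
    return normal > 0.0 ? denormal / normal : 0.0;
}
```

Calling denormal_slowdown(10000000) matches the iteration count described above. Building the same file with each set of flags from the table, and setting DAZ or FTZ before the call where relevant, should give figures comparable in spirit to the ones shown, though the exact numbers will depend on the CPU and compiler.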