Carl's web pages
Plugins — Denormals
Denormals (or subnormals) are floating point numbers so close to zero that they can no longer be stored in the normal format; for single precision that means nonzero values with a magnitude below about 1.18e-38. Many CPUs (under many different conditions) slow down considerably when processing such values; the slowdown can reach a factor of 100 or more.
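As a rough illustration of where that threshold lies, the following sketch classifies a single-precision value as denormal in two ways: with the standard C99 `fpclassify` macro, and by comparing its magnitude against `FLT_MIN`, the smallest normal float. The function names are made up for this example.

```c
#include <float.h>
#include <math.h>

/* Returns 1 if x is a subnormal (denormal) single-precision value, else 0. */
static int is_denormal(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}

/* The same check by magnitude: nonzero, but smaller than FLT_MIN,
   the smallest normal float (about 1.18e-38). */
static int is_denormal_by_magnitude(float x)
{
    float a = (x < 0.0f) ? -x : x;
    return a > 0.0f && a < FLT_MIN;
}
```

With these, 1e-39f counts as a denormal, while 1.0f, 0.0f and FLT_MIN itself do not. Note that the checks are only meaningful when the code is built without -ffast-math and without DAZ enabled.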
There are a few weapons that we can use against this problem; they vary in efficacy depending on the CPU.
Pass -ffast-math to GCC.
Pass -msse -mfpmath=sse to GCC.
Pass -msse -mfpmath=sse to GCC and switch the CPU to denormals-are-zero (DAZ) mode.
Pass -msse -mfpmath=sse to GCC and switch the CPU to flush-to-zero (FTZ) mode.
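Switching the CPU into denormals-are-zero or flush-to-zero mode means setting a bit in the SSE control register, MXCSR. The sketch below is one way to do that from C, using `_mm_getcsr` and `_mm_setcsr` from `xmmintrin.h`; the helper names are invented for this example. It assumes the code is compiled so that floating point maths uses SSE (GCC defines `__SSE_MATH__` in that case) and that the CPU actually supports the DAZ bit. Some early SSE CPUs do not, and setting an unsupported MXCSR bit raises a fault; detecting that is not shown here.

```c
#if defined(__SSE_MATH__)
#include <xmmintrin.h>
#endif

#define MXCSR_DAZ 0x0040u /* bit 6: treat denormal inputs as zero */
#define MXCSR_FTZ 0x8000u /* bit 15: flush denormal results to zero */

/* Sets the given bits in MXCSR for the calling thread.
   Returns 1 if the bits were set, 0 if this build does not use SSE maths. */
static int set_mxcsr_bits(unsigned int bits)
{
#if defined(__SSE_MATH__)
    _mm_setcsr(_mm_getcsr() | bits);
    return 1;
#else
    (void) bits;
    return 0;
#endif
}

/* Assumption: the CPU supports DAZ; see the note above. */
static int enable_daz(void)
{
    return set_mxcsr_bits(MXCSR_DAZ);
}

static int enable_ftz(void)
{
    return set_mxcsr_bits(MXCSR_FTZ);
}
```

MXCSR is per-thread state, so a plugin would need to call something like this in the thread that does the audio processing, not just once at load time.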
-ffast-math -msse -mfpmath=sse to GCC.
-msse -mfpmath=sse -ffast-math.
-msse -mfpmath=sse and DAZ set.
-ffast-math only, some do not.
-msse -mfpmath=sse and FTZ set, some do not.
Pass -msse -mfpmath=sse to GCC when building your plugins for distribution, unless you want to support 1999-ish-era CPUs, in which case build and distribute two versions: one with SSE and one without.
Pass -msse -mfpmath=sse -ffast-math to GCC if you do not mind what -ffast-math does to your FP code.
Well, maybe.
I wrote a small, simple test program. It makes a 256-sample buffer of 1s and then multiplies the buffer by some value x, 10 million times over. It compares the time taken to do this when x is 1 against the time taken when x is 1e-39, which is a denormal in single precision. It performs this test using various GCC flags and other settings.
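The test program itself is not reproduced here, but a minimal sketch of the same idea might look like the following. It is a reconstruction, not the original code: the function names are invented, the result is written to a separate output buffer so that every pass keeps producing denormal results (the original may well have worked differently), and the buffers are `volatile` so that the compiler cannot optimise the multiplications away.

```c
#include <time.h>

#define BUF_LEN 256

/* Multiplies a buffer of 1s by x, `passes` times over, and returns the
   CPU time taken in seconds. */
static double time_multiply(float x, long passes)
{
    static volatile float in[BUF_LEN];
    static volatile float out[BUF_LEN];
    volatile float gain = x; /* read at run time, so it cannot be folded */
    int i;
    long p;
    clock_t start, end;

    for (i = 0; i < BUF_LEN; ++i) {
        in[i] = 1.0f;
    }

    start = clock();
    for (p = 0; p < passes; ++p) {
        float g = gain;
        for (i = 0; i < BUF_LEN; ++i) {
            out[i] = in[i] * g; /* denormal result when g is 1e-39 */
        }
    }
    end = clock();

    return (double) (end - start) / CLOCKS_PER_SEC;
}

/* Returns how many times slower the denormal case (x = 1e-39) is than the
   normal case (x = 1), or 0 if the normal case was too quick to time. */
static double denormal_slowdown(long passes)
{
    double normal = time_multiply(1.0f, passes);
    double denormal = time_multiply(1e-39f, passes);
    return normal > 0.0 ? denormal / normal : 0.0;
}
```

Calling `denormal_slowdown(10000000L)` would match the 10 million passes described above; building the same file with each set of flags in turn gives one figure per configuration.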
I then asked some helpful members of the Linux audio community to run this test program on their computers. The results are shown below. Numbers are the approximate factors by which the denormal test is slower than the normal test, so 1 means no slowdown.
| CPU | GCC | No flags | fast-math | SSE | SSE DAZ | SSE FTZ | SSE fast-math | SSE DAZ fast-math | SSE FTZ fast-math |
|---|---|---|---|---|---|---|---|---|---|
| 64-bit Core i3 | 4.6.1 | 7 | 1 | 7 | 1 | 7 | 1 | 1 | 1 |
| 64-bit Phenom II X6 1090T | 4.4.5 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
| 64-bit Phenom II X4 940 | 4.6.2 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
| 64-bit Atom | 4.6.1 | 12 | 1 | 11 | 1 | 11 | 1 | 1 | 1 |
| 64-bit Athlon 64 X2 | 4.6.1 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
| 32-bit Xeon | 3.3.2 | 113 | 110 | 57 | 1 | 54 | 57 | 1 | 55 |
| 32-bit Core 2 Duo | 4.4.5 | 40 | 40 | 8 | 1 | 11 | 1 | 1 | 1 |
| 32-bit Athlon 64 X2 | 4.1.2 | 6 | 6 | 9 | 1 | 1 | 1 | 1 | 1 |
| 32-bit Pentium 3 | 4.1.2 | 12 | 12 | 6 | N/A | 4 | 4 | N/A | 4 |
| 32-bit VIA Nehemiah | 4.1.2 | 1.2 | 1.3 | 1.5 | N/A | 1.4 | 1.4 | N/A | 1.4 |
It seems that 64-bit-mode CPUs tend to be happy with just -ffast-math, or -msse -mfpmath=sse and DAZ if you prefer. 32-bit-mode CPUs are a little harder to please; -msse -mfpmath=sse with DAZ appears to be the best option, provided the CPU supports SSE.