Carl's web pages
Plugins — Denormals
Denormals (or subnormals) are simply very small floating-point numbers: values too close to zero to be stored in the normal format (below about 1.18e-38 in single precision). Many CPUs, under many different conditions, slow down considerably when processing them; the slowdown can be up to a factor of 100!
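As a rough illustration of where that threshold sits for single-precision floats, here is a minimal sketch using the standard C classification macro. The function names are made up for this example, and it assumes a build without -ffast-math, which may change how such values are treated.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Returns nonzero if x is a denormal (subnormal) single-precision value. */
int is_denormal(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Hypothetical helper: a few checks showing which values count as denormal. */
void check_denormal_examples(void)
{
    assert(!is_denormal(1.0f));        /* ordinary value */
    assert(!is_denormal(FLT_MIN));     /* smallest normal float, about 1.18e-38 */
    assert(is_denormal(FLT_MIN / 2));  /* below FLT_MIN, so denormal */
    assert(is_denormal(1e-39f));       /* also below FLT_MIN */
    assert(!is_denormal(0.0f));        /* zero itself is not a denormal */
}
```

In audio code, values like these typically turn up when a signal decays towards silence, for example in a filter or reverb tail.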
There are a few weapons that we can use against this problem; they vary in efficacy depending on the CPU.
Pass -msse -mfpmath=sse to GCC.
Pass -msse -mfpmath=sse to GCC and switch the CPU to denormals-are-zero (DAZ) mode.
Pass -msse -mfpmath=sse to GCC and switch the CPU to flush-to-zero (FTZ) mode.
Pass -ffast-math -msse -mfpmath=sse to GCC.
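Most tested CPUs show no slowdown with -msse -mfpmath=sse -ffast-math; the Pentium 3 and VIA Nehemiah still show some.
Every tested CPU that supports DAZ shows no slowdown with -msse -mfpmath=sse and DAZ set.
Some CPUs are fine with -ffast-math only, some are not.
Some CPUs are fine with -msse -mfpmath=sse and FTZ set, some are not.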
Pass -msse -mfpmath=sse to GCC when building your plugins for distribution, unless you want to support 1999-ish-era CPUs, in which case build and distribute two versions: one with SSE and one without.
Pass -msse -mfpmath=sse -ffast-math to GCC if you do not mind what -ffast-math does to your FP code.
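The denormals-are-zero and flush-to-zero modes mentioned above are controlled by two bits in the SSE control register, MXCSR. The following is a minimal sketch of switching them on with the _mm_getcsr and _mm_setcsr intrinsics from xmmintrin.h. The function and macro names are invented for this example. It assumes the code is built with SSE enabled and that the CPU supports any bit you set; on CPUs without DAZ (the N/A entries in the table below), writing the DAZ bit is invalid.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* _mm_getcsr, _mm_setcsr */
#endif

#define MXCSR_DAZ 0x0040u /* bit 6: treat denormal inputs as zero */
#define MXCSR_FTZ 0x8000u /* bit 15: flush denormal results to zero */

/* Sets the given MXCSR bits for the calling thread.
   Returns 1 if the bits were set, 0 if built without SSE support. */
int enable_denormal_flushing(unsigned int bits)
{
#if defined(__SSE__)
    _mm_setcsr(_mm_getcsr() | bits);
    assert((_mm_getcsr() & bits) == bits); /* confirm the bits are now set */
    return 1;
#else
    (void)bits;
    return 0;
#endif
}

/* Returns 1 if all of the given MXCSR bits are currently set. */
int denormal_flushing_enabled(unsigned int bits)
{
#if defined(__SSE__)
    return (_mm_getcsr() & bits) == bits;
#else
    (void)bits;
    return 0;
#endif
}
```

MXCSR is per-thread state, so a plugin would need to call something like enable_denormal_flushing(MXCSR_DAZ) on the thread that does the audio processing. The setting only affects arithmetic done with SSE instructions, which is why it goes together with -msse -mfpmath=sse.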
I wrote a small, simple test program. It makes a 256-sample buffer of 1s and then multiplies the buffer by some value x, repeating this 10 million times. It compares the time taken when x is 1 against the time taken when x is 1e-39, which is a denormal in single precision. It performs this test using various GCC flags and other settings.
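A test along these lines could look roughly like the sketch below. This is a reconstruction from the description, not the original program: the function names, the separate output buffer, the use of clock() for timing and the volatile qualifiers (there to stop the compiler optimising the multiplies away) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define BUFFER_SIZE 256

/* Multiplies a 256-sample buffer of 1s by x, `passes` times over,
   and returns the CPU time taken in seconds. */
double time_multiplies(float x, long passes)
{
    volatile float in[BUFFER_SIZE];
    volatile float out[BUFFER_SIZE];

    assert(passes > 0);
    for (size_t i = 0; i < BUFFER_SIZE; ++i) {
        in[i] = 1.0f;
    }

    clock_t start = clock();
    for (long p = 0; p < passes; ++p) {
        for (size_t i = 0; i < BUFFER_SIZE; ++i) {
            out[i] = in[i] * x; /* denormal operand and result when x is 1e-39 */
        }
    }
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Factor by which the denormal case is slower than the normal case.
   Returns 0 if the normal case was too quick to measure. */
double slowdown_factor(long passes)
{
    double normal = time_multiplies(1.0f, passes);
    double denormal = time_multiplies(1e-39f, passes);
    return normal > 0.0 ? denormal / normal : 0.0;
}
```

Calling slowdown_factor(10000000L) would match the 10 million repetitions described above. Building the same file with each set of GCC flags, and with or without DAZ or FTZ switched on first, would give one figure per column of the table.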
I then asked some helpful members of the Linux audio community to run this test program on their computers. The results are shown below. Numbers are the approximate factors by which the denormal test is slower than the normal test.
| CPU | GCC version | No flags | fast-math | SSE | SSE DAZ | SSE FTZ | SSE fast-math | SSE DAZ fast-math | SSE FTZ fast-math |
|---|---|---|---|---|---|---|---|---|---|
| 64-bit Core i3 | 4.6.1 | 7 | 1 | 7 | 1 | 7 | 1 | 1 | 1 |
| 64-bit Phenom II X6 1090T | 4.4.5 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
| 64-bit Phenom II X4 940 | 4.6.2 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
| 64-bit Athlon 64 X2 | 4.6.1 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
| 32-bit Core 2 Duo | 4.4.5 | 40 | 40 | 8 | 1 | 11 | 1 | 1 | 1 |
| 32-bit Athlon 64 X2 | 4.1.2 | 6 | 6 | 9 | 1 | 1 | 1 | 1 | 1 |
| 32-bit Pentium 3 | 4.1.2 | 12 | 12 | 6 | N/A | 4 | 4 | N/A | 4 |
| 32-bit VIA Nehemiah | 4.1.2 | 1.2 | 1.3 | 1.5 | N/A | 1.4 | 1.4 | N/A | 1.4 |
It seems that 64-bit-mode CPUs tend to be happy with just -ffast-math, or with -msse -mfpmath=sse and DAZ if you prefer. 32-bit-mode CPUs are a little harder to please; -msse -mfpmath=sse with DAZ appears to be the best option, provided the CPU supports SSE.