Audio plugins on Linux
Denormals (or subnormals) are very small floating-point numbers: nonzero values whose magnitude is below the smallest normal number of the format (about 1.18e-38 for single-precision `float`). Many CPUs, under many different conditions, slow down considerably when they process denormals. The slowdown can be as large as a factor of 100!
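For `float`, the smallest normal value is `FLT_MIN` (about 1.18e-38), so a value such as 1e-39 is a denormal. A minimal C sketch of testing for one with the standard `fpclassify` macro follows; the helper name `is_denormal` is hypothetical.

```c
#include <float.h>  /* FLT_MIN: smallest positive normal float */
#include <math.h>   /* fpclassify, FP_SUBNORMAL */

/* Hypothetical helper: nonzero if x is a denormal (subnormal) float,
 * i.e. nonzero but smaller in magnitude than FLT_MIN. */
int is_denormal(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}
```

With this, `is_denormal(1e-39f)` and `is_denormal(FLT_MIN / 2.0f)` are true, while `is_denormal(FLT_MIN)`, `is_denormal(0.0f)` and `is_denormal(1.0f)` are false.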
There are a few things we can do that may solve this problem; how well each one works depends on the CPU:
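| Setting | Meaning |
|---|---|
| No flags | pass no special floating-point flags to GCC. |
| fast-math | pass `-ffast-math`. |
| SSE | pass `-msse -mfpmath=sse`. |
| SSE DAZ | pass `-msse -mfpmath=sse` and switch the CPU to denormals-are-zero mode. |
| SSE FTZ | pass `-msse -mfpmath=sse` and switch the CPU to flush-to-zero mode. |
| SSE fast-math | pass `-msse -mfpmath=sse -ffast-math`. |
| SSE DAZ fast-math | do SSE fast-math and switch the CPU to denormals-are-zero mode. |
| SSE FTZ fast-math | do SSE fast-math and switch the CPU to flush-to-zero mode. |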
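On x86, DAZ and FTZ are bits in the SSE control/status register MXCSR. They therefore affect only SSE arithmetic (not x87 code), and they apply per thread, so a plugin would typically set them in the thread that runs its audio processing and restore the host's value afterwards. The following is a minimal sketch using the `_mm_getcsr`/`_mm_setcsr` intrinsics; the function names are hypothetical, and it assumes the CPU supports DAZ whenever `want_daz` is nonzero. The `_MM_SET_FLUSH_ZERO_MODE` and `_MM_SET_DENORMALS_ZERO_MODE` macros are an alternative way to set the same bits.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr (SSE) */

/* MXCSR control bits: flush-to-zero is bit 15, denormals-are-zero is bit 6. */
#define MXCSR_FTZ 0x8000u
#define MXCSR_DAZ 0x0040u

/* Hypothetical helper: switch the calling thread's SSE unit to FTZ mode,
 * and also to DAZ mode if want_daz is nonzero. Returns the previous MXCSR
 * value so that the caller can restore it.
 * Assumption: the CPU supports DAZ when want_daz is nonzero. */
unsigned int denormal_mode_enable(int want_daz)
{
    unsigned int old = _mm_getcsr();
    unsigned int csr = old | MXCSR_FTZ;
    if (want_daz)
        csr |= MXCSR_DAZ;
    _mm_setcsr(csr);
    return old;
}

/* Hypothetical helper: put back the value saved by denormal_mode_enable. */
void denormal_mode_restore(unsigned int old)
{
    _mm_setcsr(old);
}
```

On CPUs without DAZ support (the N/A entries in the results below), the DAZ bit is reserved, and writing it should cause a fault. Real code would check for support before setting it; this sketch does not.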
Finally, it is usually possible to fix the plugin code so that it does not generate denormals.
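A typical source of denormals is a feedback filter: after the input goes silent, its state decays geometrically toward zero and passes through the denormal range on the way. The sketch below uses a hypothetical one-pole low-pass filter (`one_pole`, `one_pole_process`) and snaps its state to zero once it falls below an assumed threshold of 1e-15. For signals in [-1, 1] that is far below audibility but far above the denormal range. Adding a tiny constant offset to the signal is another common technique and is not shown here.

```c
#include <stddef.h>

/* Assumed threshold: inaudible for signals in [-1, 1], yet far above
 * FLT_MIN (about 1.18e-38), so the state never becomes denormal. */
#define ONE_POLE_FLUSH_THRESHOLD 1e-15f

/* Hypothetical one-pole low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1]). */
typedef struct {
    float a;      /* coefficient, 0 < a <= 1 */
    float state;  /* previous output y[n-1] */
} one_pole;

void one_pole_process(one_pole *f, const float *in, float *out, size_t n)
{
    float z = f->state;
    for (size_t i = 0; i < n; ++i) {
        z += f->a * (in[i] - z);
        /* Once the state is negligibly small, make it exactly zero so it
         * does not keep decaying into the denormal range. */
        if (z > -ONE_POLE_FLUSH_THRESHOLD && z < ONE_POLE_FLUSH_THRESHOLD)
            z = 0.0f;
        out[i] = z;
    }
    f->state = z;
}
```

With `a = 0.5` and a state of 1, a silent input halves the state every sample. Without the check it would become denormal after about 127 samples. With the check it becomes exactly zero after about 50 samples and stays there.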
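In summary, from the results below:
- `-msse -mfpmath=sse -ffast-math` removes the slowdown on every tested CPU except the Pentium 3 and the VIA Nehemiah (which barely slows down in any configuration).
- `-msse -mfpmath=sse` with DAZ set removes the slowdown on every tested CPU that supports DAZ.
- Some CPUs are fixed by `-ffast-math` alone, some are not: it was enough for all of the 64-bit builds but for none of the 32-bit ones.
- Some CPUs are fixed by `-msse -mfpmath=sse` with FTZ set, some are not: the AMD CPUs are, but the Intel CPUs tested still slow down by a factor of 4 to 11.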
- Pass `-msse -mfpmath=sse` to GCC when building your plugins for distribution, unless you want to support CPUs from around 1999 or earlier that lack SSE. In that case, build and distribute two versions: one with SSE and one without.
- Pass `-msse -mfpmath=sse -ffast-math` to GCC if you do not mind what `-ffast-math` does to your FP code.
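A compile-time check can help catch a plugin that was accidentally built without SSE math. The sketch below assumes GCC or Clang, which predefine `__SSE_MATH__` when float arithmetic uses SSE (the default on x86-64, or with `-msse -mfpmath=sse` on 32-bit x86) and `__FAST_MATH__` under `-ffast-math`. The function `float_math_build_flags` and the `BUILD_*` names are hypothetical.

```c
/* Warn at build time if float math will use the x87 unit instead of SSE. */
#if !defined(__SSE_MATH__)
#  warning "float math is not using SSE; denormals may be slow (try -msse -mfpmath=sse)"
#endif

/* Hypothetical bit flags describing how this translation unit was built. */
enum { BUILD_SSE_MATH = 1, BUILD_FAST_MATH = 2 };

int float_math_build_flags(void)
{
    int flags = 0;
#if defined(__SSE_MATH__)
    flags |= BUILD_SSE_MATH;
#endif
#if defined(__FAST_MATH__)
    flags |= BUILD_FAST_MATH;
#endif
    return flags;
}
```

A plugin could log the result of `float_math_build_flags()` at startup, which makes it easy to tell from a bug report which build a user is running.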
I wrote a small, simple test program. It fills a 256-sample buffer with 1s and then multiplies the buffer by some value x 10 million times. It compares the time taken when x is 1 with the time taken when x is 1e-39, a value below the smallest normal single-precision number. It repeats this test with various GCC flags and other settings.
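The program itself is not reproduced here. A minimal sketch of the same kind of measurement might look like the following. It assumes a `float` buffer, writes each product to a separate output buffer, and uses POSIX `clock_gettime` for timing. The names `time_passes` and `denormal_slowdown` are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <stddef.h>
#include <time.h>

#define BUFFER_SIZE 256

/* Seconds elapsed on the monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Multiply `in` by `x` into `out`, `passes` times; return elapsed seconds.
 * Assumption: products go to a separate buffer, because multiplying in
 * place by 1e-39 would underflow to zero after two passes and the
 * denormals would disappear. */
double time_passes(const float *in, float *out, size_t n, float x, long passes)
{
    volatile float vx = x;  /* re-read every pass so the loop is not optimized away */
    double start = now_seconds();
    for (long p = 0; p < passes; ++p) {
        const float k = vx;
        for (size_t i = 0; i < n; ++i)
            out[i] = in[i] * k;
    }
    return now_seconds() - start;
}

/* Hypothetical helper: ratio of denormal-case time to normal-case time. */
double denormal_slowdown(long passes)
{
    static float in[BUFFER_SIZE], out[BUFFER_SIZE];
    for (size_t i = 0; i < BUFFER_SIZE; ++i)
        in[i] = 1.0f;
    double normal = time_passes(in, out, BUFFER_SIZE, 1.0f, passes);
    double denormal = time_passes(in, out, BUFFER_SIZE, 1e-39f, passes);
    return denormal / normal;
}
```

Calling `denormal_slowdown(10000000)` from a `main` built once per configuration (for example with and without `-ffast-math`) would give one slowdown factor per setting.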
The program was run on several different platforms, with the results shown below. Each number is the approximate factor by which the denormal test (x = 1e-39) is slower than the normal test (x = 1). N/A marks DAZ configurations that were not available on that CPU.
| CPU | GCC version | No flags | fast-math | SSE | SSE DAZ | SSE FTZ | SSE fast-math | SSE DAZ fast-math | SSE FTZ fast-math |
|---|---|---|---|---|---|---|---|---|---|
| 64-bit Core i3 | 4.6.1 | 7 | 1 | 7 | 1 | 7 | 1 | 1 | 1 |
| 64-bit Phenom II X6 1090T | 4.4.5 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
| 64-bit Phenom II X4 940 | 4.6.2 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
| 64-bit Athlon 64 X2 | 4.6.1 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
| 32-bit Core 2 Duo | 3.4.5 | 40 | 40 | 8 | 1 | 11 | 1 | 1 | 1 |
| 32-bit Athlon 64 X2 | 4.1.2 | 6 | 6 | 9 | 1 | 1 | 1 | 1 | 1 |
| 32-bit Pentium 3 | 4.1.2 | 12 | 12 | 6 | N/A | 4 | 4 | N/A | 4 |
| 32-bit VIA Nehemiah | 4.1.2 | 1.2 | 1.3 | 1.5 | N/A | 1.4 | 1.4 | N/A | 1.4 |