Audio plugins on Linux

Denormals

Denormals (or subnormals) are very small floating-point numbers: values whose magnitude has dropped below the smallest number the normal format can represent (about 1.18e-38 for single precision). Many CPUs, under many different conditions, become considerably slower when processing them. The slowdown can be a factor of 100 or more!
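
To make the threshold concrete, here is a minimal C sketch that tests whether a single-precision value is a denormal. The helper name is_denormal is hypothetical and only for illustration; fpclassify and FP_SUBNORMAL are standard C99 from <math.h>, and FLT_MIN from <float.h> is the smallest normal float.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true if v is a denormal (subnormal) float,
   i.e. non-zero but smaller in magnitude than FLT_MIN. */
static bool is_denormal(float v)
{
    return fpclassify(v) == FP_SUBNORMAL;
}
```

With default compiler flags, a value such as 1e-39f would be reported as a denormal, while 0.0f, 1.0f and FLT_MIN itself would not.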

What can be done?

There are a few things we can do which may solve this problem. They vary in efficacy depending on the CPU.

fast-math: pass -ffast-math to GCC.
SSE: pass -msse -mfpmath=sse to GCC.
SSE DAZ: pass -msse -mfpmath=sse to GCC and switch the CPU to denormals-are-zero mode.
SSE FTZ: pass -msse -mfpmath=sse to GCC and switch the CPU to flush-to-zero mode.
SSE fast-math: pass -ffast-math -msse -mfpmath=sse to GCC.
SSE DAZ fast-math: do SSE fast-math and switch the CPU to denormals-are-zero mode.
SSE FTZ fast-math: do SSE fast-math and switch the CPU to flush-to-zero mode.
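
As a rough sketch of what switching the CPU into these modes can look like in C: DAZ and FTZ are bits in the SSE control register MXCSR, which can be read and written with the _mm_getcsr and _mm_setcsr intrinsics from <xmmintrin.h>. The function names, the bit-mask macros and the fallback for builds without SSE are illustrative assumptions, not taken from the test program.

```c
#include <stdbool.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */
#endif

#define MXCSR_DAZ 0x0040u  /* bit 6: denormals-are-zero (inputs treated as 0) */
#define MXCSR_FTZ 0x8000u  /* bit 15: flush-to-zero (tiny results become 0) */

/* Hypothetical helper: returns the DAZ/FTZ bits currently set in MXCSR,
   or 0 when the build has no SSE support. */
static unsigned int denormal_mode_bits(void)
{
#if defined(__SSE__)
    return _mm_getcsr() & (MXCSR_DAZ | MXCSR_FTZ);
#else
    return 0u;
#endif
}

/* Hypothetical helper: enables or disables DAZ and FTZ for the calling
   thread. Returns false when the build has no SSE support.
   Assumption: the CPU supports DAZ; this sketch does not check. */
static bool set_denormal_modes(bool daz, bool ftz)
{
#if defined(__SSE__)
    unsigned int csr = _mm_getcsr();
    csr &= ~(MXCSR_DAZ | MXCSR_FTZ);
    if (daz) csr |= MXCSR_DAZ;
    if (ftz) csr |= MXCSR_FTZ;
    _mm_setcsr(csr);
    return true;
#else
    (void) daz;
    (void) ftz;
    return false;
#endif
}
```

MXCSR is per-thread state, so a call such as set_denormal_modes(true, false) would need to be made in the thread that actually runs the audio processing. On CPUs without DAZ support, setting the DAZ bit is not valid, so real code should check for support first.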

Finally, it is usually possible to fix the plugin code so that it does not generate denormals.
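
One common way to do this, sketched below in C, is to snap very small values in feedback paths to zero before they can decay into the denormal range. This is only an illustration under stated assumptions: the one-pole low-pass filter, the function names and the 1e-20 threshold are all hypothetical choices, not something taken from a particular plugin.

```c
#include <math.h>

/* Hypothetical threshold: far above the denormal range (below ~1.18e-38)
   and far below anything audible. */
#define DENORMAL_GUARD 1e-20f

/* Returns 0 for values too small to matter, otherwise v unchanged. */
static inline float flush_small(float v)
{
    return (fabsf(v) < DENORMAL_GUARD) ? 0.0f : v;
}

/* One-pole low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1]).
   With silent input the state decays geometrically, which is how
   denormals typically appear; flush_small stops the decay early. */
static void one_pole_lowpass(float *state, float a,
                             const float *in, float *out, unsigned long n)
{
    float y = *state;
    for (unsigned long i = 0; i < n; ++i) {
        y += a * (in[i] - y);
        y = flush_small(y);
        out[i] = y;
    }
    *state = y;
}
```

Feeding this filter silence after a loud signal would leave its state at exactly zero after a short while, rather than spending many samples in the denormal range.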

What works?

The following rules appear to hold:
  1. All CPUs are slower when processing denormals with GCC’s default flags.
  2. All CPUs get back to full speed with -msse -mfpmath=sse -ffast-math.
  3. All CPUs get back to full speed with -msse -mfpmath=sse and DAZ set.
  4. Some CPUs get back to full speed with -ffast-math only, some do not.
  5. Some CPUs get back to full speed with -msse -mfpmath=sse and FTZ set, some do not.
  6. The VIA Nehemiah has no serious problem with denormals no matter what GCC flags or CPU modes are used.
  7. The P3 always has problems with denormals no matter what GCC flags or CPU modes are used.
  8. Fixing the plugin code will always work, if done right; but it may not be easy.

To summarise the summary

  1. Pass -msse -mfpmath=sse to GCC when building your plugins for distribution, unless you want to support 1999-ish-era CPUs, in which case build and distribute two versions: one with SSE and one without.
  2. Pass -msse -mfpmath=sse -ffast-math to GCC if you do not mind what -ffast-math does to your FP code.
  3. If you are writing a host, set the CPU to denormals-are-zero mode if the CPU supports it.
  4. If you want to get it right from the start, fix your plugin so it does not generate denormals in the first place. My plugin torture tester may help you with this.

Are you sure about all this?

Well, maybe.

I wrote a small, simple test program. This makes a 256-sample buffer of 1s and then multiplies the buffer by some value x, 10 million times. It compares the time taken when x is 1 with the time taken when x is 1e-39. It performs this test using various GCC flags and other settings.
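
For readers who want to try something similar, here is a minimal C sketch of that kind of test. It is not the original program: the function names are hypothetical, writing the products to a separate output buffer is an assumption, and the volatile qualifiers are only there to stop the compiler from optimising the multiplications away.

```c
#include <stddef.h>
#include <time.h>

#define BUFFER_SIZE 256

/* Multiplies a buffer of 1s by x, `passes` times over, and returns the
   CPU time taken in seconds. */
static double time_multiply(float x, unsigned long passes)
{
    static volatile float in[BUFFER_SIZE];
    static volatile float out[BUFFER_SIZE];

    for (size_t i = 0; i < BUFFER_SIZE; ++i) {
        in[i] = 1.0f;
    }

    clock_t start = clock();
    for (unsigned long p = 0; p < passes; ++p) {
        for (size_t i = 0; i < BUFFER_SIZE; ++i) {
            out[i] = in[i] * x;
        }
    }
    clock_t end = clock();

    return (double) (end - start) / CLOCKS_PER_SEC;
}

/* Returns how many times slower the denormal case (x = 1e-39) is than
   the normal case (x = 1), or 0 if the normal run was too short to time. */
static double denormal_slowdown(unsigned long passes)
{
    double normal = time_multiply(1.0f, passes);
    double denormal = time_multiply(1e-39f, passes);
    return (normal > 0.0) ? denormal / normal : 0.0;
}
```

Calling denormal_slowdown(10000000UL) would correspond to the 10 million passes described above; the result depends on the CPU, the compiler flags and the current DAZ/FTZ settings.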

This program was run on several different platforms. The results are shown below. Numbers are the approximate factors by which the denormal test is slower than the normal test.

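| CPU | GCC | No flags | fast-math | SSE | SSE DAZ | SSE FTZ | SSE fast-math | SSE DAZ fast-math | SSE FTZ fast-math |
|---|---|---|---|---|---|---|---|---|---|
| 64-bit Core i3 | 4.6.1 | 7 | 1 | 7 | 1 | 7 | 1 | 1 | 1 |
| 64-bit Phenom II X6 1090T | 4.4.5 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
| 64-bit Phenom II X4 940 | 4.6.2 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
| 64-bit Atom | 4.6.1 | 12 | 1 | 11 | 1 | 11 | 1 | 1 | 1 |
| 64-bit Athlon 64 X2 | 4.6.1 | 8 | 1 | 8 | 1 | 1 | 1 | 1 | 1 |
| 32-bit Xeon | 3.3.2 | 113 | 110 | 57 | 1 | 54 | 57 | 1 | 55 |
| 32-bit Core 2 Duo | 3.4.5 | 40 | 40 | 8 | 1 | 11 | 1 | 1 | 1 |
| 32-bit Athlon 64 X2 | 4.1.2 | 6 | 6 | 9 | 1 | 1 | 1 | 1 | 1 |
| 32-bit Pentium 3 | 4.1.2 | 12 | 12 | 6 | N/A | 4 | 4 | N/A | 4 |
| 32-bit VIA Nehemiah | 4.1.2 | 1.2 | 1.3 | 1.5 | N/A | 1.4 | 1.4 | N/A | 1.4 |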