Carl's web pages

Plugins — Denormals

Denormals (or subnormals) are very small floating-point numbers: values closer to zero than the smallest normalised number the format can represent (about 1.2e-38 for single precision). Many CPUs, under many different conditions, slow down considerably when processing them; the slowdown can be as much as a factor of 100!
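To make the definition concrete, here is a minimal sketch of a check for single-precision denormals. The function name is_denormal is made up for this illustration; the only library piece it relies on is FLT_MIN from <float.h>, the smallest normalised float. It assumes the code is built without -ffast-math and run without FTZ/DAZ set, since either can make the comparison treat denormals as zero.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if x is a single-precision denormal,
 * i.e. non-zero but smaller in magnitude than FLT_MIN (about 1.18e-38).
 * Assumes default float semantics (no -ffast-math, no FTZ/DAZ). */
int is_denormal(float x)
{
    float mag = x < 0.0f ? -x : x;       /* absolute value without libm */
    return mag > 0.0f && mag < FLT_MIN;  /* NaN fails both comparisons  */
}
```

With this definition, 1e-39f counts as a denormal, while 1.0f, 0.0f and FLT_MIN itself do not.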

What's to be done?

There are a few weapons that we can use against this problem; they vary in efficacy depending on the CPU.

What works?

In summary, the following rules appear to hold:
  1. All CPUs are slower when processing denormals with GCC's default flags.
  2. Nearly all CPUs get back to full speed with -msse -mfpmath=sse -ffast-math (the 32-bit Xeon built with GCC 3.3.2 is an exception in the results below).
  3. All CPUs get back to full speed with -msse -mfpmath=sse and DAZ set.
  4. Some CPUs get back to full speed with -ffast-math only, some do not.
  5. Some CPUs get back to full speed with -msse -mfpmath=sse and FTZ set, some do not.
  6. The VIA Nehemiah does not seem to care much, and the P3 does not seem to listen to any of the remedies.
  7. Fixing the plugin code will always work, if done right; but it may not be easy.
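
Rule 7 does not say how to fix the plugin code. One common approach, shown here purely as an illustration and not as a recommendation from the measurements on this page, is to snap very small values in a feedback path to zero before they can decay into the denormal range. The filter, the function names and the threshold TINY are all assumptions made for this sketch.

```c
/* Hypothetical threshold: far above the denormal range (about 1e-38)
 * and far below anything audible. */
#define TINY 1e-15f

/* Replace values that are too small to matter with exact zero. */
static float flush_tiny(float x)
{
    return (x > -TINY && x < TINY) ? 0.0f : x;
}

/* One-pole smoothing filter, y[n] = y[n-1] + coeff * (x[n] - y[n-1]).
 * With silent input the state decays towards zero; without the flush
 * it would eventually pass through the denormal range. */
void one_pole_run(float *state, float coeff,
                  const float *in, float *out, int n)
{
    float y = *state;
    for (int i = 0; i < n; i++) {
        y += coeff * (in[i] - y);
        y = flush_tiny(y);   /* keep the feedback state out of denormal range */
        out[i] = y;
    }
    *state = y;
}
```

Feeding this filter 256 samples of silence with a starting state of 1 and a coefficient of 0.5 leaves the state at exactly zero, and no output sample is ever a denormal. The cost is one comparison per sample, which is part of why doing this everywhere in a real plugin may not be easy.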

To summarise the summary

  1. Pass -msse -mfpmath=sse to GCC when building your plugins for distribution, unless you want to support 1999-ish-era CPUs, in which case build and distribute two versions: one with SSE and one without.
  2. Pass -msse -mfpmath=sse -ffast-math to GCC if you do not mind what -ffast-math does to your FP code.
  3. If you are writing a host, set the CPU to denormals-are-zero mode if the CPU supports it.
  4. If you want to get it right from the start, fix your plugin so it does not generate denormals in the first place. My plugin torture tester may help you with this.
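
For point 3, a host can switch on DAZ (and FTZ) by setting bits in the SSE control register, MXCSR. The sketch below uses the _mm_getcsr and _mm_setcsr intrinsics from <xmmintrin.h>; the function names and the two bit-mask macros are made up for this example. It assumes the CPU supports DAZ: setting that bit on a CPU without DAZ support faults, so a careful host would check for support first, which this sketch does not do. MXCSR is per-thread, so the call belongs in the thread that runs the plugins.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */
#endif

#define MXCSR_DAZ 0x0040u  /* bit 6: treat denormal inputs as zero     */
#define MXCSR_FTZ 0x8000u  /* bit 15: flush denormal results to zero   */

/* Hypothetical helper: turn on DAZ and FTZ for the calling thread.
 * Returns 1 on success, 0 if the build has no SSE support.
 * ASSUMPTION: the CPU supports DAZ; no runtime check is made here. */
int disable_denormals(void)
{
#if defined(__SSE__)
    _mm_setcsr(_mm_getcsr() | MXCSR_DAZ | MXCSR_FTZ);
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: report whether both modes are currently on. */
int denormals_disabled(void)
{
#if defined(__SSE__)
    unsigned int want = MXCSR_DAZ | MXCSR_FTZ;
    return (_mm_getcsr() & want) == want;
#else
    return 0;
#endif
}
```

These settings only affect arithmetic done in SSE registers, so they go together with -msse -mfpmath=sse on 32-bit builds; x87 arithmetic ignores MXCSR.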

Are you sure about all this?

Well, maybe.

I wrote a small, simple test program. This makes a 256-sample buffer of 1s and then multiplies the buffer by some value x, 10 million times. It compares the time taken to do this when x is 1 against the time taken when x is 1e-39. It performs this test using various GCC flags and other settings.
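
The original test program is not reproduced here, but the measurement it describes could look roughly like the sketch below. Everything in it is an assumption for illustration: the function name time_scaling, the use of clock() for timing, and the choice to write each product into a separate output buffer (multiplying in place by 1e-39 would underflow to zero after two passes). The buffers are volatile so that an optimising compiler cannot skip the repeated multiplications.

```c
#include <time.h>

#define BUFFER_SIZE 256

/* volatile forces every load, multiply and store to actually happen. */
static volatile float in_buf[BUFFER_SIZE];
static volatile float out_buf[BUFFER_SIZE];

/* Hypothetical benchmark: multiply a buffer of 1s by x, `passes` times,
 * and return the CPU time used in seconds. */
double time_scaling(float x, long passes)
{
    for (int i = 0; i < BUFFER_SIZE; i++)
        in_buf[i] = 1.0f;

    clock_t start = clock();
    for (long p = 0; p < passes; p++)
        for (int i = 0; i < BUFFER_SIZE; i++)
            out_buf[i] = in_buf[i] * x;   /* denormal when x is 1e-39 */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_scaling(1e-39f, 10000000) and time_scaling(1.0f, 10000000) and dividing the first result by the second gives a slowdown factor of the kind reported in the table; rebuilding with different GCC flags then fills in the other columns.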

I then asked some helpful members of the Linux audio community to run this test program on their computers. The results are shown below. Each number is the approximate factor by which the denormal test is slower than the normal test, so 1 means no slowdown. In the column headings, SSE stands for -msse -mfpmath=sse and fast-math for -ffast-math.

CPU                        GCC    No flags  fast-math  SSE  SSE DAZ  SSE FTZ  SSE fast-math  SSE DAZ fast-math  SSE FTZ fast-math
64-bit Core i3             4.6.1  7         1          7    1        7        1              1                  1
64-bit Phenom II X6 1090T  4.4.5  8         1          8    1        1        1              1                  1
64-bit Phenom II X4 940    4.6.2  8         1          8    1        1        1              1                  1
64-bit Atom                4.6.1  12        1          11   1        11       1              1                  1
64-bit Athlon 64 X2        4.6.1  8         1          8    1        1        1              1                  1
32-bit Xeon                3.3.2  113       110        57   1        54       57             1                  55
32-bit Core 2 Duo          4.4.5  40        40         8    1        11       1              1                  1
32-bit Athlon 64 X2        4.1.2  6         6          9    1        1        1              1                  1
32-bit Pentium 3           4.1.2  12        12         6    N/A      4        4              N/A                4
32-bit VIA Nehemiah        4.1.2  1.2       1.3        1.5  N/A      1.4      1.4            N/A                1.4

It seems that 64-bit-mode CPUs tend to be happy with just -ffast-math, or -msse -mfpmath=sse and DAZ if you prefer. 32-bit-mode CPUs are a little harder to please; -msse -mfpmath=sse with DAZ appears to be the best option, provided the CPU supports SSE.