Petzold Book Blog - The Limits of float

PETZOLD BOOK BLOG

#### Charles Petzold on writing books, reading books, and exercising the internal UTM


## The Limits of float

May 30, 2010
Roscoe, N.Y.

One of the first things the veteran C# programmer notices when learning XNA programming is that all floating-point values are single-precision float rather than double-precision double. This not only reduces storage space (4 bytes each rather than 8 bytes) but also improves performance, at least in theory. (My extremely brief experiment with the performance difference on the PC showed an improvement of only about 5%, but it may be more substantial on other devices.)

Although float is fine for most purposes in computer graphics, it can be problematic in some circumstances. For example, suppose you want to animate some graphic or text by continuously rotating it 360° every second. Both the Draw and DrawString methods of SpriteBatch have overloads that accept a rotation angle of type float. It's common to store this rotation angle as a field:

float angle;

A new value is then calculated during every call to the Game derivative's Update method. Recently I've been using a calculation that looks something like this:

angle = MathHelper.TwoPi * (float)gameTime.TotalGameTime.TotalSeconds;

The GameTime argument to the Update method has a TotalGameTime property of type TimeSpan indicating the total time since the game began. The TotalSeconds property of the TimeSpan object is of type double. I simply cast that to a float and multiply it by MathHelper.TwoPi (an XNA static field of type float) to obtain an angle in radians.

From the very first time I typed in a statement like this, I've known two things:

• In the short term, the code works fine.
• In the long term, the code will fail.

I knew the code would fail because of the insufficiency of float to maintain accuracy with large values. But I didn't have an intuitive sense of when the "short term" became the "long term"!

As you might know from reading Chapter 23 of my book Code: The Hidden Language of Computer Hardware and Software, the ANSI/IEEE Standard 754-1985, also known as the IEEE Standard for Binary Floating-Point Arithmetic, defines single-precision floating-point values with a 1-bit sign (s in the formula below), a 23-bit significand fraction (f), and an 8-bit exponent (e). In the normal case, a number stored in this format can be calculated as:

$$(-1)^s \times 1.f \times 2^{e-127}$$

Let's take an example. Suppose I write a program with some code that calculates an angle value in the way I've shown. After the program has been running a full day, the TotalGameTime property represents a TimeSpan of 24 hours or 86,400 seconds. In binary, that's 1 0101 0001 1000 0000. Because the leading digit is always 1 in conversion to binary, it doesn't have to be stored. The remaining binary digits become the first 16 digits of the 23-bit fraction value:

s = 0
f = 010 1000 1100 0000 0000 0000
e = 143

Or:

$$86400 = (-1)^0 \times 1.0101\,0001\,1000\,0000\,0000\,000_2 \times 2^{143-127}$$

Those 7 additional bits in the significand fraction allow the representation of fractional seconds. Thus, the values are accurate to 1/128 of a second. Since the video frame rate in XNA ranges from 30 frames per second (for a Zune or Windows Phone 7) to 60 or so (for a PC), accuracy of 1/128 second is fine.
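If you'd like to check the s, f, and e values for 86,400 yourself, here is a minimal sketch that pulls the three fields out of a float. The class and method names (FloatFields, Decode, FractionBits) are invented for this illustration; only BitConverter and Convert come from .NET.

```csharp
using System;

// Hypothetical helper for illustration; not part of XNA or .NET.
static class FloatFields
{
    // Splits a float into the three IEEE 754 single-precision fields.
    public static void Decode(float value, out int sign, out int exponent, out int fraction)
    {
        // Reinterpret the 4 bytes of the float as a 32-bit integer.
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);

        sign = (bits >> 31) & 0x1;       // s: the top bit
        exponent = (bits >> 23) & 0xFF;  // e: the next 8 bits, biased by 127
        fraction = bits & 0x7FFFFF;      // f: the low 23 bits
    }

    // Formats the fraction field as exactly 23 binary digits.
    public static string FractionBits(int fraction)
    {
        return Convert.ToString(fraction, 2).PadLeft(23, '0');
    }
}
```

Calling Decode with 86400f should give a sign of 0, an exponent of 143, and a fraction whose 23 digits are 01010001100000000000000, the same bits shown above without the grouping.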

Here's a shortcut technique for visualizing the float representation of numbers greater than 1: Simply write the number in binary with a leading 1 and exactly 24 digits. For example,

86400 = 1 0101 0001 1000 0000.0000 000

Notice the binary point preceding the fractional 7 digits. This shows clearly that numbers in this region have 7-bit fractions for an accuracy of 1/128.
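The same idea can be turned into a number: the gap between adjacent float values is the weight of the last of those 24 digits. The sketch below computes that gap from the exponent field. FloatSpacing and Spacing are hypothetical names, and the method assumes a normal, finite value (not zero, a denormal, infinity, or NaN).

```csharp
using System;

// Hypothetical helper for illustration; assumes a normal, finite float.
static class FloatSpacing
{
    // Returns the gap between adjacent float values in the region of 'value'.
    public static double Spacing(float value)
    {
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        int exponent = (bits >> 23) & 0xFF;

        // The lowest of the 23 fraction bits has weight 2^(e - 127 - 23).
        return Math.Pow(2.0, exponent - 127 - 23);
    }
}
```

For 86400f the exponent is 143, so the gap is 2 to the power of -7, or 1/128.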

Now let's run the game for a week. At the end of a week, the TotalGameTime property is a TimeSpan representing 604,800 seconds. Write that as a 24-bit binary:

604800 = 1001 0011 1010 1000 0000.0000

Now the fractional part is only 4 bits, and the number is accurate to only 1/16 second. As you cast this TotalSeconds property of the TotalGameTime to a float you are essentially rounding to the nearest 1/16 second, and effectively reducing your frame rate to 16 frames per second. Between one day and seven days, as the float value becomes increasingly unable to accurately represent total time, you'll get some visible jitter and skippiness in the animation.
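One rough way to see the lost frames is to take one second's worth of frame times, cast each to float, and count how many different values survive. This is only a sketch: FrameTimes and DistinctFloatTimes are made-up names, and evenly spaced frames at exactly 60 per second are an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration.
static class FrameTimes
{
    // Counts the distinct float values obtained by casting one second of
    // evenly spaced frame times, beginning at startSeconds.
    public static int DistinctFloatTimes(double startSeconds, int framesPerSecond)
    {
        var seen = new HashSet<float>();
        for (int frame = 0; frame < framesPerSecond; frame++)
        {
            double t = startSeconds + (double)frame / framesPerSecond;
            seen.Add((float)t); // the same cast used in the angle calculation
        }
        return seen.Count;
    }
}
```

Starting at 86,400 seconds, all 60 frame times stay distinct. Starting at 604,800 seconds, only 17 distinct values remain: the sixteen 1/16-second steps plus the frames near the end that round up to the next whole second.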

Might it help to perform the calculation using the original double value of the TotalSeconds property and cast to float when storing the final value? Here's the code:

angle = (float)(2 * Math.PI * gameTime.TotalGameTime.TotalSeconds);

Now I'm using the regular .NET Math.PI static field of type double rather than the XNA MathHelper.TwoPi static field of type float. After one week, the angle value is calculated as 604,800 × 2π, which is about 3,800,070.4738, or in 24-bit binary:

604,800 × 2π = 11 1001 1111 1100 0000 0110.10

Now there's only a two-bit fractional part, so the angle is accurate to 1/4 radian, or 45°. No good! (This problem also exists with the original code but I chose to focus on the time rather than the resultant angle.) You'll have the same problem if you increment the angle based on the ElapsedGameTime property of the GameTime argument:

angle += MathHelper.TwoPi * (float)gameTime.ElapsedGameTime.TotalSeconds;

The ElapsedGameTime is (usually) the time between video frames, either 1/30 or 1/60 second or thereabouts. That's fine for a float and the multiplication is OK as well. The problem occurs when accumulating that incremental value in an already large angle field.
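To make the accumulation problem concrete, the sketch below asks whether adding a single frame's rotation changes the stored angle at all. Accumulation and IncrementIsLost are hypothetical names, the TwoPi constant stands in for MathHelper.TwoPi so the code runs without XNA, and one revolution per second is assumed.

```csharp
using System;

// Hypothetical helper for illustration; TwoPi stands in for MathHelper.TwoPi.
static class Accumulation
{
    public const float TwoPi = (float)(2 * Math.PI);

    // Returns true when adding one frame's rotation (one revolution per
    // second) leaves the float angle unchanged.
    public static bool IncrementIsLost(float angle, int framesPerSecond)
    {
        float step = TwoPi / framesPerSecond;
        float next = (float)(angle + step); // explicit cast forces float rounding
        return next == angle;
    }
}
```

With an angle of 4,000,000 radians, adjacent float values are 0.25 apart. At 60 frames per second the step is about 0.105, less than half that gap, so the sum rounds straight back to the old value and the rotation stops advancing. With an angle of 1 radian the same step is added without trouble.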

What's the solution? The easiest solution is simply assuming that nobody's going to be running your games for more than a few hours! But probably the best solution involves performing the calculations using double and then normalizing the result between 0 and 2π by finding the remainder using the modulus operation (%). Then it's safe to cast to a float:

angle = (float)(2 * Math.PI * gameTime.TotalGameTime.TotalSeconds %
(2 * Math.PI));
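As a quick check of that statement, here is the same calculation wrapped in a method that takes the total seconds directly, so it can be tried without a running game. RotationFromTotalTime and Angle are names invented for this sketch.

```csharp
using System;

// Hypothetical wrapper for illustration.
static class RotationFromTotalTime
{
    // One revolution per second: multiply in double, take the remainder
    // modulo 2π in double, and only then cast to float.
    public static float Angle(double totalSeconds)
    {
        double twoPi = 2 * Math.PI;
        double radians = twoPi * totalSeconds; // large, but still a double
        return (float)(radians % twoPi);
    }
}
```

A quarter of a second past the one-week mark (604,800.25 seconds) this returns a value very close to π/2, a quarter turn.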

Or, increment the angle field using the ElapsedGameTime property and then normalize the result:

angle += MathHelper.TwoPi * (float)gameTime.ElapsedGameTime.TotalSeconds;
angle %= MathHelper.TwoPi;
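The incremental version can be exercised the same way. The sketch below applies those two statements for a large number of frames and checks that the angle stays between 0 and 2π and still changes on every frame. RotationFromElapsedTime, Advance, and StaysHealthy are hypothetical names, TwoPi stands in for MathHelper.TwoPi, and a fixed frame time is an assumption.

```csharp
using System;

// Hypothetical helper for illustration; TwoPi stands in for MathHelper.TwoPi.
static class RotationFromElapsedTime
{
    public const float TwoPi = (float)(2 * Math.PI);

    // Adds one frame's rotation and wraps the result into [0, 2π).
    public static float Advance(float angle, double elapsedSeconds)
    {
        angle = (float)(angle + TwoPi * (float)elapsedSeconds);
        angle = (float)(angle % TwoPi);
        return angle;
    }

    // Runs 'frames' fixed-length frames and reports whether every frame
    // changed the angle and left it in the range [0, 2π).
    public static bool StaysHealthy(int frames, int framesPerSecond)
    {
        float angle = 0f;
        double elapsed = 1.0 / framesPerSecond;
        for (int i = 0; i < frames; i++)
        {
            float next = Advance(angle, elapsed);
            if (next == angle || next < 0f || next >= TwoPi)
                return false;
            angle = next;
        }
        return true;
    }
}
```

Because the stored angle never grows beyond 2π, each frame's increment is always large compared with the spacing of nearby float values.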

Either version will preserve adequate calculational accuracy well beyond a week and for at least a millennium.

It's not on the PC where single-precision improves performance -- it's on the GPU. In fact, many GPUs don't have support for double-precision and on those that do, there are fewer double-precision units than single-precision ones.

— Curt Hagenlocher, Sun, 30 May 2010 14:52:54 -0400

Thanks! — Charles

"Now there's only a two-bit fractional part, so the angle is accurate to 1/4 radian, or 45°. No good!"

1/4 radian is actually (360° / 2π) / 4 ~= 14.3°.

Still no good though.

— Michael, Sun, 30 May 2010 17:41:43 -0400

Thanks! I guess I just multiplied 0.25 by 180°. Can't imagine what I was thinking... — Charles

Actually 14.3° isn't too bad. Consider I'm trying to rotate an object 360° every second. If the frame rate is 30 fps (as it is on the phone), then the rotation should be 12° per frame. — Charles

I posted this on reddit and will do so here too:

"I knew the code would fail because of the insufficiency of float to maintain accuracy with large values. But I didn't have an intuitive sense of when the "short term" became the "long term"!"

Wait, is the author implying he's going to let the degree value keep increasing?

if(fDegreeVal > 360) fDegreeVal -= 360;

OR

I've been doing this since starting 3D programming in uni. I'm finding it very difficult to believe the author of this article did not see this simple solution, which I knew about before I could even fully understand and appreciate the way linked lists worked.

Solving the increasing inaccuracy of floating point values by switching to double precision floating point seems like a rather inelegant solution to a simple problem like this, and is extremely bad software practice. It's the kind of thing you might be able to get away with in business related apps but not in games.

The reason XNA does not support double precision floating point is probably because almost no games use double precision floating point and consoles such as the Xbox 360 and Playstation 3 do not even support double precision floating point values.

Seems like this article is a load of crap written by someone with no real game dev experience trying to apply his 'instant fix through brute force' methodology of solving problems to game development.

— linkedlist, Mon, 31 May 2010 02:53:53 -0400

As you can see on page 125 of the preview edition of my upcoming book on Windows Phone 7 programming, I know how to normalize angles. But — like I said — recently I've been using some simplified code and I was curious to know just how bad it was. I wanted to quantify the damage. — Charles

> if(fDegreeVal > 360) fDegreeVal -= 360;

> OR

I'll be, there's a reason for computing in degrees after all. It avoids accumulating a gradually increasing rounding error.

— 0.10471975511965977461542144610932 radians of separation, Mon, 31 May 2010 22:22:18 -0400

Actually, normalizing angles with the % operation is not really accurate. It is okay for 'small' angles but not for 'big' ones, the kind that you will get by accumulating values. Angle reduction is very involved mathematically; even different makes/models of scientific calculators produce different results for trig functions on 'big' angles.

Iqbal Santoso, Sat, 10 Jul 2010 03:30:23 -0400

In general I also wouldn't recommend using doubles instead of floats. My personal preference is to use floats for small counters which are wrapping and to use 64 bit integers for large timers (e.g. elapsed game time or current time stamp). The 64 bit integer may represent either milliseconds or performance counter units. Then, you're free to do any subtraction without losing _any_ precision.

— game developer, Tue, 23 Nov 2010 17:04:00 -0500
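One possible reading of the 64-bit integer suggestion, sketched in C#: take the remainder on the TimeSpan tick count, which is a long, and convert to a float angle only at the very end. TickRotation and Angle are hypothetical names, and one revolution per second is assumed; Ticks and TicksPerSecond are the standard TimeSpan members.

```csharp
using System;

// Hypothetical helper for illustration.
static class TickRotation
{
    // One revolution per second, with the wrap done in integer ticks.
    public static float Angle(TimeSpan totalGameTime)
    {
        long ticksPerRevolution = TimeSpan.TicksPerSecond;

        // Integer remainder: exact no matter how long the game has run.
        long phaseTicks = totalGameTime.Ticks % ticksPerRevolution;

        // Only the small phase value goes through floating point.
        return (float)(2 * Math.PI * phaseTicks / ticksPerRevolution);
    }
}
```

For a TimeSpan of one week plus a quarter of a second, this returns a value very close to π/2.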
