UP | HOME

Floating point woes

I was packaging my personal C++ standard library, which contains (among other things) some code I use for doing hobby robotics projects. Part of that code includes support for vectors that are usually float or double (in Go, float32 vs float64). Typically, I work on the library itself on my Linux workstation where I have a shiny keyboard and big monitor. Everything was working: unit tests passed, valgrind reported no errors, CI was happy.

Everything was great, right up until I went to build packages on my Raspberry Pi, where the unit tests suddenly started failing.

Running tests...
Test project /home/kyle/tmp/scsl/build
      Start  1: test_buffer
 1/10 Test  #1: test_buffer ......................   Passed    0.01 sec
      Start  2: test_tlv
 2/10 Test  #2: test_tlv .........................   Passed    0.01 sec
      Start  3: test_dictionary
 3/10 Test  #3: test_dictionary ..................   Passed    0.01 sec
      Start  4: test_stringutil
 4/10 Test  #4: test_stringutil ..................   Passed    0.01 sec
      Start  5: test_coord2d
 5/10 Test  #5: test_coord2d .....................   Passed    0.01 sec
      Start  6: test_madgwick
 6/10 Test  #6: test_madgwick ....................   Passed    0.01 sec
      Start  7: test_math
 7/10 Test  #7: test_math ........................   Passed    0.01 sec
      Start  8: test_orientation
 8/10 Test  #8: test_orientation .................   Passed    0.01 sec
      Start  9: test_quaternion
 9/10 Test  #9: test_quaternion ..................   Passed    0.01 sec
      Start 10: test_vector
10/10 Test #10: test_vector ......................***Failed    0.01 sec

90% tests passed, 1 tests failed out of 10

Total Test time (real) =   0.16 sec

The following tests FAILED:
     10 - test_vector (Failed)

What was going on? The Raspberry Pi is an arm64 system, so I suspected that the tests would also fail on my Macbook Air. I was, unfortunately, correct. I fired up the debugger in CLion, narrowing it down to a particular test case that was failing both in the float and double versions of the vector:

geom::Vector3F e {-2.0, 1.0, 3.0};
geom::Vector3F f {-6.0, 3.0, 9.0};

SCTEST_CHECK(e.IsParallel(f));
SCTEST_CHECK_FALSE(e.IsOrthogonal(f));

The IsParallel check is the one that failed. Time to look at how that works.

/// \brief Compute the Angle between two vectors.
///
/// \param other Another Vector<T, N>.
/// \return The angle in radians between the two vectors.
T
Angle(const Vector<T, N> &other) const
{
    auto unitA = this->UnitVector();
    auto unitB = other.UnitVector();

    // Can't compute angles with a zero vector.
    assert(!this->IsZero());
    assert(!other.IsZero());

    return static_cast<T>(std::acos(unitA * unitB));
}

I was getting an angle of \(0.00034526698\) with the float vector, and nan with the double vector. This smells like floating point math problems, so I wrote up some test code in Python to check it out… which gave me the right answer. Then I turned to Go.

Go test this somewhere else

This is on the Go playground.

// Vector math illustration. In my C++ code, this is
// a class and it packaged a little better, but I wanted
// to verify that it was an FPU issue, not a software
// issue.
package main

import (
    "fmt"
    "math"
    "runtime"
)

func goVersion() string {
    return fmt.Sprintf("go version %s %s/%s", runtime.Version(),
        runtime.GOOS, runtime.GOARCH)
}

type VT interface{ float32 | float64 }

func dot[X VT](a []X, b []X) X {
    dot := X(0)
    if len(a) != len(b) {
        panic("vectors not in the same order")
    }

    for i := range a {
        dot += (a[i] * b[i])
    }

    return dot
}

func mag[X VT](v []X) X {
    sum := X(0)
    for _, elt := range v {
        sum += (elt * elt)
    }
    return X(math.Sqrt(float64(sum)))
}

func unit[X VT](v []X) []X {
    u := make([]X, len(v))
    m := mag(v)
    for i := range v {
        u[i] = v[i] / m
    }

    return u
}

func angle[X VT](a []X, b []X) X {
    ua := unit(a)
    ub := unit(b)

    dt := float64(dot(ua, ub))
    theta := X(math.Acos(dt))
    return theta
}

var va = []float32{-2.0, 1.0, 3.0}
var vb = []float32{-6.0, 3.0, 9.0}
var vc = []float64{-2.0, 1.0, 3.0}
var vd = []float64{-6.0, 3.0, 9.0}

func main() {
    fmt.Println("floating point sketch")
    fmt.Println(goVersion())
    fmt.Println()

    fmt.Println("*** float32 ***")
    fmt.Println(unit(va))
    fmt.Println(unit(vb))
    fmt.Println(dot(unit(va), unit(vb)))
    fmt.Println(angle(va, vb))

    fmt.Println()
    fmt.Println("*** float 64 ***")
    fmt.Println(unit(vc))
    fmt.Println(unit(vd))
    fmt.Println(dot(unit(vc), unit(vd)))
    fmt.Println(angle(vc, vd))
}

Go, Linux amd64

This is on the Go playground. I assumed it would be a cloud instance running Linux amd64 (narrator: it was.)

floating point sketch
go version go1.21.3 linux/amd64

*** float32 ***
[-0.5345225 0.26726124 0.8017837]
[-0.5345225 0.26726124 0.80178374]
1
0

*** float 64 ***
[-0.5345224838248488 0.2672612419124244 0.8017837257372732]
[-0.5345224838248488 0.2672612419124244 0.8017837257372732]
1
0

arm64

Just like on my Linux amd64 desktop, this looks right. What happens if we run it on MacOS?

% go run vec.go
floating point sketch
go version go1.21.0 darwin/arm64

*** float32 ***
[-0.5345225 0.26726124 0.8017837]
[-0.5345225 0.26726124 0.80178374]
0.99999994
0.00034526698

*** float 64 ***
[-0.5345224838248488 0.2672612419124244 0.8017837257372732]
[-0.5345224838248488 0.2672612419124244 0.8017837257372732]
1.0000000000000002
NaN

And on that pesky Raspberry Pi?

floating point sketch
go version go1.21.3 linux/arm64

*** float32 ***
[-0.5345225 0.26726124 0.8017837]
[-0.5345225 0.26726124 0.80178374]
0.99999994
0.00034526698

*** float 64 ***
[-0.5345224838248488 0.2672612419124244 0.8017837257372732]
[-0.5345224838248488 0.2672612419124244 0.8017837257372732]
1.0000000000000002
NaN

Okay, so it's not a problem with something I did, except that it's a problem with something I did. The math is right, it just doesn't work in the real world. That's probably a parallel for something else, but I'll stick to debugging weird computer problems instead.

Adjusting the math to fit reality

/// \brief Determine whether two vectors are parallel.
///
/// \param other Another vector
/// \return True if the Angle between the vectors is zero.
bool
IsParallel(const Vector<T, N> &other) const
{
    if (this->IsZero() || other.IsZero()) {
        return true;
    }

    // If the two unit vectors are equal, the two vectors
    // lie on the same path.
    //
    // Context: this used to use Vector::Angle to check for
    // a zero angle between the two. However, the vagaries
    // of floating point math meant that while this worked
    // fine on Linux amd64 builds, it failed on Linux arm64
    // and MacOS builds. Parallel float vectors would have
    // an angle of ~0.0003 radians, while double vectors
    // would have an angle of +NaN. I suspect this is due to
    // tiny variations in floating point math, such that a dot
    // product of unit vectors would be just a hair over 1,
    // e.g. 1.000000001 - which would still fall outside the
    // domain of acos.
    auto unitA = this->UnitVector();
    auto unitB = other.UnitVector();

    return unitA == unitB;
}

Why does this work? To answer that, we need to understand that for two vectors to be parallel, the angle between them must be 0. The way to calculate the angle between two vectors \(\vec a\) and \(\vec b\) is first to determine their unit vectors, \(\hat{a}\) and \(\hat{b}\).

The unit vector is a normalised vector; when vectors are used to represent spatial information (whether position, velocity, and so forth), they are sometimes called the direction vector.

\[ \hat{v} = \frac{\vec v}{||v||} \]

Conversely, \(||v|| \times \hat{v} = \vec v\).

So, we want to find the angle between the directions that our two vectors \(\vec a\) and \(\vec b\) are pointing. To do that, we take the dot product of the two vectors, which must both be of the same order (that is, they have the same dimensions):

\[ \dot v = \sum \vec a \times \vec b \]

For a three-dimensional vector with basis \(i\), \(j\), \(k\), we have

\[ \dot{v} = a_i b_i + a_j b_j + a_k b_k \]

Why do we need the unit vectors to calculate the angle? We're going to compute the inverse of the cosine (the arccosine, or \(\acos\) in most programming languages). This needs a value in the domain \([-1, 1]\), and maps that to a value \([0, \pi]\) , representing the angle between the two. We need vectors that sum up to 1 for this to happen, so we take the magnitude out of the equation through the unit vectors.

In the original code, there were two distinct errors happening:

  1. The angle between the two float vectors (which should have been zero), was actually \(0.00034526698\). The default tolerance for the library is \(0.0001\); so this falls out of tolerance by about \(0.00025\). I tried changing the tolerance to \(0.001\), and that worked for this case. However, it surfaced the other issue:
  2. The angle between the two double vectors is \(NaN\), or "not a number". If an input to \(acos\) is outside the range \([-1, 1]\), it is outside the domain of \(acos\) and it can't return a meaningful answer.

Why was the angle between the float vectors \(0.00035\), and the angle between the double vectors meaningless? It has to do with how floating point numbers work. Essentially, some numbers aren't representable, and small errors creep in. This tends to have a large effect on the 32-bit numbers, because there's fewer bits with which to represent numbers. However, it can be more insidious with the 64-bit numbers because things are off by such a tiny amount, and yet it ruins everything. Take for example the dot product of the two 64-bit unit vectors: \(1.0000000000000002\). That is off by 200 quintillionths, and yet that still falls outside the domain of \(acos\). These errors only compound the more floating point math you do, so things get progressively worse (by extremely small amounts).

The textbook definition of parallel vectors that I learned was that the angle between them is zero. However, we can also note that the unit vectors for any two vectors will be the same if they're going in the same direction - which would make them parallel. So rather than computing the angle, we can compare the unit vectors. If they're the same, they're parallel. That fix turns out to work - less math, fewer chances for error, I guess.

As an aside, one way around this is to use rational numbers, but that's a whole other topic for another day and this already took up enough of my day.