For one of our smaller projects, we wanted to try ARM Mbed for the first time. Surprisingly, we found a bug, just by executing the existing unit tests on an actual embedded target. This post describes how we found the bug, details the technical reasons behind it, and analyzes the implications for a typical test setup in our embedded domain.

When using libraries, we try to make sure that they also pass our quality standards.
First, I took a quick look at what is used to assure code quality within Mbed. I found some unit tests, which was a plus point, but unfortunately, they usually only run them on the x86 platform.

The project will use an ARM Cortex M4, so I tried to recompile those tests for this MCU.
Using our ExecutionPlatform, I was able to quickly run the tests on this embedded target.

I ran the test suits one by one, until I got:

[==========] Running 23 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 23 tests from TestAT_CellularNetwork
[ RUN      ] TestAT_CellularNetwork.Create
[       OK ] TestAT_CellularNetwork.Create (0 ms)
...
[ RUN      ] TestAT_CellularNetwork.test_AT_CellularNetwork_get_network_registering_mode
../features/cellular/framework/AT/at_cellularnetwork/at_cellularnetworktest.cpp:367: Failure
Value of: mode == -1
  Actual: false
Expected: true
[  FAILED  ] TestAT_CellularNetwork.test_AT_CellularNetwork_get_network_registering_mode (0 ms)
...
[----------] 23 tests from TestAT_CellularNetwork (0 ms total)

[----------] Global test environment tear-down
[==========] 23 tests from 1 test suite ran. (0 ms total)
[  PASSED  ] 22 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestAT_CellularNetwork.test_AT_CellularNetwork_get_network_registering_mode

 1 FAILED TEST

This test was passing on x86. So why didn’t it pass for my target?

Investigation

First, I looked at this specific test case:


TEST_F(TestAT_CellularNetwork, test_AT_CellularNetwork_get_network_registering_mode)
{
    EventQueue que;
    FileHandle_stub fh1;
    ATHandler at(&fh1, que, 0, ",");

    AT_CellularNetwork cn(at, *_dev);

    ATHandler_stub::int_value = 0;
    CellularNetwork::NWRegisteringMode mode = CellularNetwork::NWModeManual;
    EXPECT_TRUE(NSAPI_ERROR_OK == cn.get_network_registering_mode(mode));
    EXPECT_TRUE(mode == CellularNetwork::NWModeAutomatic);

    ATHandler_stub::nsapi_error_value = NSAPI_ERROR_DEVICE_ERROR;
    mode = CellularNetwork::NWModeManual;
    EXPECT_TRUE(NSAPI_ERROR_DEVICE_ERROR == cn.get_network_registering_mode(mode));
    EXPECT_TRUE(mode == -1); 
}

The test failed on

EXPECT_TRUE(mode == -1);

in the last line. At first gance, everything is okay. Let’s look deeper, starting from the expectation that failed. mode should be -1, which looks alright. But what type has mode? It is an enum NWRegisteringMode, that is defined as:

/// Network registering mode
enum NWRegisteringMode {
    NWModeAutomatic = 0,    // automatic registering
    NWModeManual,           // manual registering with plmn
    NWModeDeRegister,       // deregister from network
    NWModeSetOnly,          // set only  (for read command +COPS?), do not attempt registration/deregistration
    NWModeManualAutomatic   // if manual fails, fallback to automatic
};

The first enumerator has the value 0, the following ones always increase by 1, so this enum has values from 0 to 4. The test expects that the mode will be -1. This value is not part of the enum! You might assume that this is not a problem, because the underlying type of enum is always an int, right? Unfortunately that is the wrong answer. The enumeration declaration part of the C++ standard says:

For an enumeration, whose underlying type is not fixed, the underlying type is an integral type that can represent all the enumerator values defined in the enumeration. If no integral type can represent all the enumerator values, the enumeration is ill-formed. Which integral type is used is implementation-defined as the underlying type, except that the underlying type shall not be larger than int, unless the value of an enumerator cannot fit in an int or unsigned int. If the enumerator-list is empty, the underlying type is as if the enumeration had a single enumerator with the value 0.

That means that when we use an enum, we don’t know what the underlying type will be, unless we specify it explicitly.

When we look into the test case more closely, we find that the following function is called:

nsapi_error_t AT_CellularNetwork::get_network_registering_mode(NWRegisteringMode &mode)
{
    int ret;
    nsapi_error_t error = _at.at_cmd_int("+COPS", "?", ret);
    mode = (NWRegisteringMode)ret;
    return error;
}

The integer ret variable is assigned to mode (line 205). When you look into _at.at_cmd_int, you will see that in error cases, -1 is assigned to ret. So there is indeed an assignment of -1 and a test expectation of -1. So, the problem must be connected to the enum type.

A target dependent enum-type

In my case, I was using different compilers for the x86 and Cortex-M4 targets. I checked the code with Compiler Explorer, to see which type each compiler is using:

As you see from the error messages, the ARM gcc is using an unsigned char, while the x86-64 gcc is using unsigned int as the underlying type for enum NWRegisteringMode.

That means that, during assignment, int will be converted to unsigned char (or unsigned int in the original case on our host system). The C++ Standard defines that:

If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation).  — end note ]

In case of unsigned char, truncation will occur and as a result, only 8 LSBs are assigned to mode. In case of -1, that will be 0xFFFFFFFF in a two’s complement representation, so mode will be equal to 0xFF after truncation.

In the failed expectation, mode (unsigned char) is compared to -1 (int). As the two types are different, mode is converted back to int and then compared. 255 doesn’t equal-1, and the expectation fails.

Furthermore, C++14 defines the assignment of a value that is not part of the enumeration as unspecified:

A value of integral or enumeration type can be explicitly converted to an enumeration type. The value is unchanged if the original value is within the range of the enumeration values ([dcl.enum]). Otherwise, the resulting value is unspecified (and might not be within that range).

In C++20 draft, such assignment is defined as undefined:

A value of an integral or enumeration type can be explicitly converted to a complete enumeration type. If the enumeration type has a fixed underlying type, the value is first converted to that type by integral conversion, if necessary, and then to the enumeration type. If the enumeration type does not have a fixed underlying type, the value is unchanged if the original value is within the range of the enumeration values ([dcl.enum]), and otherwise, the behavior is undefined.

Solution

There are different possibilities for fixing this problem:

  • Fix enum
  • Change error handling

Lets look at both of them.

Fix enum

There are various ways to fix this enum.

  • Add an extra value to enum that represents -1
  • Specify the underlying type
  • Both

In our case, I would prefer adding -1, which is used as an error indication. You should use only values that are defined inside enum. I would also convert this enum to a scoped one, like this:

/// Network registering mode
enum class NWRegisteringMode {
    Error = -1,
    Automatic = 0,    // automatic registering
    Manual,           // manual registering with plmn
    DeRegister,       // deregister from network
    SetOnly,          // set only  (for read command +COPS?), do not attempt registration/deregistration
    ManualAutomatic   // if manual fails, fallback to automatic
};

With scoped enums, we do not pollute namespace with all enum entries, so we don’t need prefixes (NWMode), and it doesn’t implicitly cast to an int. To access values, we have to use the enum name, followed by the value, similar to accessing static members (e.g. NWRegisteringMode::Automatic).

Change error handling

When we look at the call stack of this test case, we see these functions:

nsapi_error_t get_network_registering_mode(NWRegisteringMode &mode);
nsapi_error_t at_cmd_int(const char *cmd, const char *cmd_chr, int &resp, const char *format = "", ...);

Both follow the same pattern: the return value is used for error handling and the output is passed through reference (mode, resp). This pattern has some problems. First, what value output should there be in case of errors? We need to define some value(s) as invalid. Second, the return value could be ignored and as a result, the software would try to process an invalid value.

There are various ways to improve this, such as:

But error handling will be dealt with in a different post.

Conclusion

When you are writing software that runs on different architectures, you should check that it really behaves as expected on all these architectures. Depending on the compiler and compiler options you are using, you may get some differences.

enum is one example of these dark corners of C++, that are “implementation-defined”. There are many other things that the standard leaves to the compiler.
The index of implementation-defined behavior provides a good overview.

If you want to run unit tests on your target microcontroller, check out our ExecutionPlatform. It is a development tool that makes this really easy and automated, without any further hardware setup.


Edit (01.07.2020):
After filing a bug on the official repo, I received some more references from the C++ standard, about assigning values not defined in enum. I added these remarks. Fortunately, the Mbed team was also able to resolve this issue quickly.