
C++ Complexity and Testing

This document examines the relationship between complexity and testing, and clarifies existing tools and measures. Much of this document applies to languages other than C++.

Existing Measures

The most common complexity measure appears to be cyclomatic or McCabe complexity, which was developed in 1976. This complexity measurement finds the "number of linearly independent circuits" through a piece of code. McCabe also says in that paper that "using the total number of paths has been found to be impractical".

McCabe's idea was to find "basic paths that when taken in combination will generate every possible path". This is much less than the maximum possible number of combinations of groups of statements, and is close to the minimum number of tests required for 100% path coverage. In addition, it calculates the number of flows without analyzing the conditional expressions, which can give higher complexity for some code (such as an else if that tests the same variable as the initial if). This also means that McCabe complexity is related to logical flow, but not to data flow, threading, or arithmetic complexity.

The following examples will attempt to describe McCabe's cyclomatic complexity and show some of the limitations. Hopefully an improved measure of complexity can be found that closely indicates the number of tests that should be performed.

The paper printed in IEEE can be found here: McCabe Complexity
Another version can be found here: Structured Testing
Wikipedia has an article about some complexity measures: Programming_complexity

McCabe's original paper has some graph analysis ideas, followed by a keyword approach to finding complexity. Many of the tools use a keyword approach that counts branching keywords and operators such as if, else, case, for, while, &&, and ||.

Flow Complexity Examples

Example 1

A source code example with an if and else statement is:
  if(c1)
    {
    a();
    }
  else
    {
    b();
    }
The number of possible flows through this code is:
  Execution                  Condition c1
  Function a() is executed   true
  Function b() is executed   false

In graph theory, the edges are the connection lines between the nodes.

With McCabe complexity, an if statement always adds a complexity of one to an existing path. This is true even when the conditional evaluates to a constant true or false, so that the branch is always or never executed.
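
For example, a minimal sketch (logIfEnabled and debugEnabled are hypothetical names): the branch below can never execute, yet keyword-based counting still adds one for the if.
  void logIfEnabled()
    {
    const bool debugEnabled = false;  // constant condition
    if(debugEnabled)                  // still adds one to the McCabe count
      {
      // unreachable branch; no test can ever cover it
      }
    }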

An if/else also adds a complexity of only one, since the else path is taken exactly when the if path is not. With McCabe complexity, an else statement is counted as much simpler than a second if statement, even though the else path's condition is equivalent to "if(!c1)".
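
As a minimal sketch of that equivalence (assuming a() does not change c1), the else branch above behaves like a second if on the inverted condition, yet keyword counting gives the two forms different complexities:
  if(c1)
    {
    a();
    }
  if(!c1)     // behaves like the else above, but counts as a second if
    {
    b();
    }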

This shows that the conditional expressions are important when evaluating complexity, and that McCabe complexity is a model, but is not accurate even for purely logical path complexity.

Example 2

An example with an if and else if statement is:
  if(c1)
    {
    a();
    }
  else if(c2)
    {
    b();
    }
This example is similar to this:
  if(c1)
    {
    a();
    }
  else
    {
    if(c2)
      {
      b();
      }
    }
The number of possible flows through this code with two independent conditions is:
  Execution                   Condition c1   Condition c2
  No functions are executed   false          false
  Function a() is executed    true           true or false
  Function b() is executed    false          true

If both conditionals were testing on the same variable, then the execution is more like a switch/case, and complexity is similar to Example 1.
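
A minimal sketch of that case, following the document's conventions (x is a hypothetical variable tested in both conditions):
  if(x == 1)
    {
    a();
    }
  else if(x == 2)   // same variable as the first test; only three outcomes
    {
    b();
    }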

Example 3

A nested if example is:
  if(c1)
    {
    a();
    if(c2)
      {
      b();
      }
    }
The number of possible flows through this code is:
  Execution                            Condition c1   Condition c2
  No functions are executed            false          true or false
  Function a() is executed             true           false
  Functions a() and b() are executed   true           true

In this example, the complexity is higher than in example 2 if the statements executed alter values that affect each other. This also means that a function or method that has side effects could be more complex.
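
As a hedged sketch of such a dependency (flag is a hypothetical shared variable): when a() modifies state that the inner condition reads, the two conditions are no longer independent.
  void b();             // defined elsewhere
  bool flag = false;    // hypothetical shared state

  void a()
    {
    flag = true;        // side effect: changes what the inner condition sees
    }

  void nested(bool c1)
    {
    if(c1)
      {
      a();              // alters flag
      if(flag)          // this condition now depends on a()'s side effect
        {
        b();
        }
      }
    }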

Example 4

A source example with 2 sequential if statements is:
  if(c1)
    {
    a();
    }
  if(c2)
    {
    b();
    }
The number of possible flows through this code is:
  Execution                  Condition c1   Condition c2
  No functions are executed  false          false
  a() is executed            true           false
  b() is executed            false          true
  a() and b() are executed   true           true

McCabe's original paper can be confusing because at the beginning of the paper graph analysis is used, and later only a single count is added for each keyword. It is clear that he did not want McCabe complexity to be combinatorial complexity, but the fact that two sequential ifs have the same complexity as two nested ifs does not seem very accurate.

Example 5

A source example with 3 sequential if statements is:
  if(c1)
    {
    a();
    }
  if(c2)
    {
    b();
    }
  if(c3)
    {
    c();
    }
The number of possible flows through this code is:
  Execution                        Condition c1   Condition c2   Condition c3
  No functions are executed        false          false          false
  a() is executed                  true           false          false
  b() is executed                  false          true           false
  c() is executed                  false          false          true
  a() and b() are executed         true           true           false
  b() and c() are executed         false          true           true
  a() and c() are executed         true           false          true
  a(), b(), and c() are executed   true           true           true

This example shows that when analyzing graphs, the McCabe complexity (7) does not match the number of combinations (8). The discrepancy is that there is no extra edge for the abc path compared to the ab and bc paths.

Example 6

An example of case statements is:
  switch(c1)
    {
    case 1:
      a();
      break;

    case 2:
    case 3:
      b();
      break;
    }
There are many variations of McCabe case statement complexity. I haven't found any variations that account for fall-through case statements.

Example 7

An example with case and default statements:
  switch(c1)
    {
    case 1:
      a();
      break;

    case 4:
      b();
    case 5:
      c();
      break;

    default:
      b();
      break;
    }
The "default" keyword should not change the count since it removes the main path that would have skipped the switch if there were no matching cases.

Example 8

An example with Logical Operators and Statements:
  if(cond1() && cond2())
    a();
This is equivalent to the following.
  if(cond1())
    {
    if(cond2())
      a();
    }
C++ has short-circuit evaluation for the logical or and logical and operators. This means that not all conditions (which could involve function calls) are evaluated within a conditional test. See: Short-circuit evaluation
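
A minimal sketch of the corresponding logical or case, using the document's cond1()/cond2() convention: cond2() is evaluated only when cond1() is false, so any side effects inside cond2() occur on only some paths.
  if(cond1() || cond2())
    a();
This is equivalent to the following.
  if(cond1())
    {
    a();
    }
  else if(cond2())
    {
    a();
    }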

Example 9

An example with Logical Operators:
  if(c1 && c2)
    a();
This is equivalent to the following.
  if(c1)
    {
    if(c2)
      a();
    }
Simple counting of logical operators as keywords is not a very accurate measure compared to other keywords.

Example 10

Empty expressions
Counting keywords would generate a complexity of 2 for a function with an empty if statement.
  if(v1 == 1)
    {
    }
Since this is not common, there is no need to analyze it differently from the default behavior.

Complexity Tools

Here are some tool outputs for the above examples, compared to the number of actual path combinations and to McCabe complexity computed by graph analysis and by keyword counting.

  Example                        Combinations  McCabe  McCabe   ACQC  vsCCM  Source   Oovaide
                                               Graph   Keyword               Monitor  McCabe
  Example 1: If Else                  2          2        2       2     2       3        2
  Example 2: If/Else If               3          3        3       3     3       3        3
  Example 3: Nested If                3          3        3       3     3       3        3
  Example 4: Sequential If            4          4        3       3     3       3        3
  Example 5: 3 Sequential Ifs         8          7        4       4     4       4        4
  Example 6: case                     3          3        4       4     4       5        4
  Example 7: case/default             4          4        4       4     4       5        4
  Example 8: logical or/and
             statements               4          4       2:3      2     3       3        2
  Example 9: logical or/and           2          2       2:3      2     3       3        2

There are many documents on the web indicating that switch case counts are too high. They may be a little high, as seen in this table, but the actual reason is that if statement counts are in many cases too low. I have not found any tools that decrease the counts for case statements with no intervening statements, and I have not found any tools that account for default statements incorrectly.

Other Reference Documents

A critique of cyclomatic complexity as a software metric
Cyclomatic complexity metrics revisited
On the cyclomatic metric of program complexity
High cyclomatic complexity on switch statements
Measuring Complexity Correctly
McCabe's Cyclomatic Complexity and Why We Don't Use It

Types of Complexity

Test complexity (the number of tests required for a piece of code) is different from code complexity. Some things increase code complexity but do not increase test complexity, although sometimes extra statements and variables can indicate that more testing is needed.

There are many types of complexity, and I have not found a list of the varying types, so some of the names here are made up. The types are described in roughly increasing order in the following sections.

Data and Boundary Value Complexity

OK, so far, McCabe complexity could be tweaked in some manner to produce improved numbers for C++, and path analysis of some form seems to be a pretty good measure of complexity. What about the following code?
  int average(int a, int b) { return (a+b)/2; }
McCabe complexity indicates 1, so only one test is needed to fully test this. Obviously this is wrong. It may be possible to use some set theory similar to what is used in abstract interpretation.
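
For instance, a hedged sketch of why a single test is insufficient (testAverage is a hypothetical test, INT_MAX comes from <climits>): one mid-range test covers the only path, yet boundary inputs still fail.
  #include <cassert>
  #include <climits>

  void testAverage()
    {
    assert(average(2, 4) == 3);   // passes, and covers the only path
    // average(INT_MAX, INT_MAX) overflows (a+b), which is undefined
    // behavior for signed integers, so the boundary case is broken
    // even though path coverage is already 100%.
    }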

There is no mathematically automated way to find the correct number of tests for testing an arithmetic/numeric algorithm. Some examples of algorithms that could be difficult to test would be something like finding n digits of pi, or sorting a set of data.

The following increase test complexity

The complexity of a method in C++ will depend on the input parameters. This includes class members and global state (including files, etc.).

Structures and classes increase the complexity based on the number of variables in the struct or class that are accessed. In C++, it is more difficult to determine the number of unique members accessed because different methods may access the same class variable.

The number of values read adds to the number of tests that must be done. Fewer state variables are also simpler than more state variables. The following piece of code is an example where two independent variables are more complex than one.

  val = ((ULONG) (val1 << 24)) + ((ULONG) (val2 << 16));

The number of tests required for a boolean parameter is 2. The number of tests required for an unsigned parameter is generally at the boundaries (2), and perhaps a test in the middle. The number of tests required increases depending on how the parameter is used. The number of tests for a signed parameter is at the boundaries (2), plus usually at zero, and possibly at other values.

An interesting point for C++ is that the size_t type is an unsigned type, but "static_cast<size_t>(-1)", which is the maximum value, is used as an invalid value. Such a value should also be tested at the minimum, maximum, and a middle value.
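
A minimal sketch of this pattern (findValue and NOT_FOUND are hypothetical names, similar in spirit to std::string::npos):
  #include <cstddef>

  const size_t NOT_FOUND = static_cast<size_t>(-1);  // maximum size_t value

  size_t findValue(const int *values, size_t count, int match)
    {
    for(size_t i=0; i<count; i++)
      {
      if(values[i] == match)
        return i;
      }
    return NOT_FOUND;   // the sentinel itself is a boundary value to test
    }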

There is a school of thought that the unsigned value is dangerous in C++, but I think the complexity indication is more important, and that the danger should be understood. An example of the danger is below, where the i variable is never less than zero since it is unsigned:

  for(size_t i=5; i>=0; i--)  // NEVER FINISHES!!!
    {
    }
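
One common fix, as a hedged sketch, keeps the counter unsigned but tests before decrementing so the loop terminates:
  for(size_t i=6; i-- > 0; )  // body sees i = 5, 4, ... 0, then the loop exits
    {
    }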

The minimum number of tests required for parameters is additive for parameters that are independent of each other. For two parameters, add the complexity of each parameter to the other. It is impossible to determine whether external input parameters are independent and must be tested combinatorially; from the perspective of the function being examined, the external parameters are treated as independent.
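
As a hedged sketch using the per-parameter counts above (bool = 2 tests, signed int = 4 tests), scale() is a hypothetical function whose independent parameters add rather than multiply:
  int scale(bool negate, int value)
    {
    return negate ? -value : value;
    }
  // Independent parameters add: 2 + 4 = 6 tests,
  // not 2 * 4 = 8 combinatorial tests.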

The following do not increase test complexity

OR'ing constants together does not increase test complexity. There is some actual complexity increase, since a mistake can be made in selecting the constants.
  var.vt = VT_ARRAY | VT_UI1;
Output parameters do not increase test complexity of the function being analyzed. They increase complexity of other functions.

Intermediate variables do not increase test complexity. Buffer indexing and pointers are usually more complex (they can introduce more errors), but typically indexing is just increment operations and limit tests. These do not increase the number of tests if they are dependent on input parameters.

Of course, Wikipedia has an article about boundary value analysis: Boundary Value Analysis
And for abstract interpretation: Abstract Interpretation
There is also a tool that calculates statements in a block: Gnu complexity tool

Control Complexity

The input to an if statement is a boolean condition.

If the condition is based on an input parameter, and the parameter is not used elsewhere, the complexity should not be increased over the complexity of the input parameter.

The number of tests on an input parameter can increase the complexity. If there is one conditional based on an input parameter, and another test on it is added, the complexity increases by two if the conditionals are sequential.

Sequential if statements produce 2^n combinations, where n is the number of if statements in sequence. This is not useful as the number of tests. The minimum number of tests that is reasonable is the main path (1), plus one test per branch (n), plus additional tests for combinations (n-1), for a total of 2n.
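
As a worked example, for n = 3 sequential if statements this gives 2*3 = 6 tests. One possible selection (matching Example 5-a below) is:

  Test            Condition c1   Condition c2   Condition c3
  Main path       false          false          false
  Branch 1        true           false          false
  Branch 2        false          true           false
  Branch 3        false          false          true
  Combination 1   true           true           false
  Combination 2   false          true           true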

If a function is dominated by control complexity (instead of data complexity), the control complexity can be up to a factor of 2 more than McCabe complexity. This will happen when conditionals are combinatorial (such as sequential if statements).

Side Effect and Method or Function Call Complexity

In C++, when a function calls a method in another class, the method may modify other state that could affect the complexity of the current function. If a const method is called, then generally there are no side effects, and the complexity increases by no more than the values returned from the method. Note that there are still side effects if a mutable member or a global variable is modified.

In general, a const method will not increase complexity as much as a non-const method.
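
A minimal sketch of the difference (Counter is a hypothetical class): the const method can only read state, while the non-const method may change state that later calls depend on.
  class Counter
    {
    public:
        int get() const       // const: reads state, no side effects
            { return mCount; }
        void increment()      // non-const: modifies state that other
            { mCount++; }     // calls may depend on later
    private:
        int mCount = 0;
    };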

Since the test complexity is about the function being analyzed, complexity introduced by calling other functions is not added to the current function.

A New Measure of Complexity

A new complexity measure can be developed: Data Complexity Details. It can be somewhat difficult to determine an input variable versus an output variable. An output variable is the lhs of an assignment statement. If a pointer or reference is taken from an input variable and the reference is const, it is determined to be an input; otherwise it is assumed to be an output.
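
A minimal sketch of that rule (classify is a hypothetical function): a const reference or pointer taken from a variable keeps it an input, while the lhs of an assignment is an output.
  void classify(const int &a, int &b)
    {
    const int *in = &a;   // const pointer from a: a remains an input
    b = *in + 1;          // b is the lhs of an assignment: an output
    }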

Control Flow Details

Condition variables are not considered combinatorial if the same dynamic input parameters are in multiple conditions. This implies that variables must be examined to detect constants such as #define's, const, and enums.

If the input variable is used in a condition, the complexity is not double counted. The larger of the condition complexity or the variable complexity is used in each case.

Desired Flow Complexity Values

The Oovaide tool has a partial implementation of the desired flow complexity values. The condition matching uses a bit of a cheat: it checks expressions but does not know what is dynamic and what is a constant. The switch/case and else/if handling are also not complete. The logical operators may also not be handled correctly.
  Example                                Desired Value                   Oovaide
  Example 1: If Else                     2                               2
  Example 2: If Else If                  2 or 3 depending on condition   3 or 4
  Example 3: Nested If                   3                               3
  Example 4: Sequential If               4                               4
  Example 5-a: 3 Sequential Ifs-comb     6                               6
  Example 5-b: 3 Sequential Ifs-same     4                               4
  Example 6: case                        3                               4
  Example 7: case/default                4                               4
  Example 8: logical or/and statements   4                               ? - depends on conditions
  Example 9: logical or/and              2                               ? - depends on conditions

Data Complexity

Desired Data Complexity Values

The Oovaide tool has a partial implementation of the desired data complexity values. The write/read pointer evaluation is not complete yet. The Oovaide tool does not check to see if input variables are actually used. This means that function return types and non-const pointers and references passed to functions are also added to the complexity.

Other tools would consider all of the following to have a complexity of one. The desired data complexity values are:

  Example                                  Desired Value   Oovaide
  Example D-1: Input Boolean               2               2
  Example D-2: Input Unsigned              3               3
  Example D-3: Input Signed                4               4
  Example D-4: Input Pointer               4               4
  Example D-5: Read Member                 4               4
  Example D-6: Write Member                1               1
  Example D-7: Read Pointer Member         4               4
  Example D-9: Call Function Value Param   1               1
  Example D-10: Call Function Ref Param    4               4
  Example D-11: Call Function Return       4               4
  Example D-8: Write Pointer Member        1               4

Data Complexity Examples
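
The examples below are methods of a test class. A plausible declaration, reconstructed here from the method bodies (the original declaration is not shown in this document), might look like:

class ComplexityDataChild
    {
    public:
        void funcVal(int val);
        void funcRef(int &val);
        int funcRet();
    };

class ComplexityDataTest
    {
    public:
        // ... the test methods shown in the examples below ...
    private:
        int mMember1;
        ComplexityDataChild mChild;
    };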

Example D-1

int ComplexityDataTest::testInputBoolParam(bool v1)
    {
    return !v1;
    }

Example D-2

int ComplexityDataTest::testInputUnsignedParam(unsigned int v1)
    {
    return v1 / 2;
    }

Example D-3

int ComplexityDataTest::testInputSignedParam(int v1)
    {
    return v1 / 2;
    }

Example D-4

int ComplexityDataTest::testInputPointerParam(int *p1)
    {
    return *p1 / 2;
    }

Example D-5

int ComplexityDataTest::testReadMemberRef()
    {
    return mMember1 / 2;
    }

Example D-6

void ComplexityDataTest::testWriteMemberRef()
    {
    mMember1 = 8;
    }

Example D-7

int ComplexityDataTest::testReadPointerMemberRef()
    {
    int *p = &mMember1;
    return *p;
    }

Example D-8

void ComplexityDataTest::testWritePointerMemberRef()
    {
    int *p = &mMember1;
    *p = 8;
    }

Example D-9

void ComplexityDataTest::testMemberFuncVal()
    {
    mChild.funcVal(1);
    }

Example D-10

int ComplexityDataTest::testMemberFuncRef()
    {
    int val;
    mChild.funcRef(val);
    return val;
    }

Example D-11

int ComplexityDataTest::testMemberFuncRet()
    {
    return mChild.funcRet();
    }

Summary

After running this on some different software packages, it was found that tweaks to the algorithms will move some functions up or down in relation to others, but the more complex functions often still end up with the higher complexity numbers.

Adding the data complexity does greatly increase some functions that initially had a lower complexity. These functions sometimes do not look complex if the input data is not combinatorial, but stringently testing each function with all available inputs does require more tests.

The Oovaide program outputs both complexity figures (McCabe, and combined control and data complexity) so that they can be compared and sorted easily.