
C++ Complexity and Testing

This document examines the relationship between complexity and testing, and clarifies existing tools and measures. Much of this document applies to languages other than C++.

Existing Measures

The most common complexity measure appears to be cyclomatic or McCabe complexity, which was developed in 1976. This complexity measurement finds the "number of linearly independent circuits" through a piece of code. McCabe also says in that paper that "using the total number of paths has been found to be impractical".

McCabe's idea was to find "basic paths that when taken in combination will generate every possible path". This is much less than the maximum possible number of combinations of groups of statements, and is close to the minimum number of tests required for 100% path coverage. In addition, it calculates the number of flows without analyzing the conditional expressions, which can give higher complexity for some code (such as an else if that tests the same variable as the initial if). This also means that McCabe complexity is related to logical flow, but not to data flow, threading, or arithmetic complexity.

The following examples will attempt to describe McCabe's cyclomatic complexity and show some of the limitations. Hopefully an improved measure of complexity can be found that closely indicates the number of tests that should be performed.

The paper printed in IEEE can be found here: McCabe Complexity
Another version can be found here: Structured Testing
Wikipedia has an article about some complexity measures: Programming_complexity

McCabe's original paper has some graph analysis ideas, followed by a keyword approach to finding complexity. Many of the tools use a keyword approach that counts branching keywords and operators such as if, else, case, for, while, &&, and ||.

Flow Complexity Examples

Example 1

A source code example with an if and else statement is:
  if(c1)
    {
    a();
    }
  else
    {
    b();
    }
The number of possible flows through this code is:
  Execution                  Condition c1
  Function a() is executed   true
  Function b() is executed   false

In graph theory, the edges are the connection lines between the nodes.

With McCabe complexity, an if statement always adds a complexity of one to an existing path. This is true even when the conditional evaluates to a constant true or false, so that the branch is always or never executed.
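
For example, a minimal sketch (logIfEnabled and debugEnabled are hypothetical names): the branch below can never execute, yet keyword-based counting still adds one for the if.
  void logIfEnabled()
    {
    const bool debugEnabled = false;  // constant condition
    if(debugEnabled)                  // still adds one to the McCabe count
      {
      // unreachable branch; no test can ever cover it
      }
    }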

An if/else also adds a complexity of only one, since the else path is taken exactly when the if path is not. With McCabe complexity, an else statement is counted as much simpler than a second if statement, even though the else path's condition is equivalent to "if(!c1)".
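
As a minimal sketch of that equivalence (assuming a() does not change c1), the else branch above behaves like a second if on the inverted condition, yet keyword counting gives the two forms different complexities:
  if(c1)
    {
    a();
    }
  if(!c1)     // behaves like the else above, but counts as a second if
    {
    b();
    }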

This shows that the conditional expressions are important when evaluating complexity, and that McCabe complexity is a model, but is not accurate even for purely logical path complexity.

Example 2

An example with an if and else if statement is:
  if(c1)
    {
    a();
    }
  else if(c2)
    {
    b();
    }
This example is similar to this:
  if(c1)
    {
    a();
    }
  else
    {
    if(c2)
      {
      b();
      }
    }
The number of possible flows through this code with two independent conditions is:
  Execution                   Condition c1   Condition c2
  No functions are executed   false          false
  Function a() is executed    true           true or false
  Function b() is executed    false          true

If both conditionals were testing on the same variable, then the execution is more like a switch/case, and complexity is similar to Example 1.
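
A minimal sketch of that case, following the document's conventions (x is a hypothetical variable tested in both conditions):
  if(x == 1)
    {
    a();
    }
  else if(x == 2)   // same variable as the first test; only three outcomes
    {
    b();
    }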

Example 3

A nested if example is:
  if(c1)
    {
    a();
    if(c2)
      {
      b();
      }
    }
The number of possible flows through this code is:
  Execution                            Condition c1   Condition c2
  No functions are executed            false          true or false
  Function a() is executed             true           false
  Functions a() and b() are executed   true           true

In this example, the complexity is higher than in example 2 if the statements executed alter values that affect each other. This also means that a function or method that has side effects could be more complex.
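
As a hedged sketch of such a dependency (flag is a hypothetical shared variable): when a() modifies state that the inner condition reads, the two conditions are no longer independent.
  void b();             // defined elsewhere
  bool flag = false;    // hypothetical shared state

  void a()
    {
    flag = true;        // side effect: changes what the inner condition sees
    }

  void nested(bool c1)
    {
    if(c1)
      {
      a();              // alters flag
      if(flag)          // this condition now depends on a()'s side effect
        {
        b();
        }
      }
    }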

Example 4

A source example with 2 sequential if statements is:
  if(c1)
    {
    a();
    }
  if(c2)
    {
    b();
    }
The number of possible flows through this code is:
  Execution                  Condition c1   Condition c2
  No functions are executed  false          false
  a() is executed            true           false
  b() is executed            false          true
  a() and b() are executed   true           true

McCabe's original paper can be confusing because at the beginning of the paper graph analysis is used, and later only a single count is added for each keyword. It is clear that he did not want McCabe complexity to be combinatorial complexity, but the fact that two sequential ifs have the same complexity as two nested ifs does not seem very accurate.

Example 5

A source example with 3 sequential if statements is:
  if(c1)
    {
    a();
    }
  if(c2)
    {
    b();
    }
  if(c3)
    {
    c();
    }
The number of possible flows through this code is:
  Execution                        Condition c1   Condition c2   Condition c3
  No functions are executed        false          false          false
  a() is executed                  true           false          false
  b() is executed                  false          true           false
  c() is executed                  false          false          true
  a() and b() are executed         true           true           false
  b() and c() are executed         false          true           true
  a() and c() are executed         true           false          true
  a(), b(), and c() are executed   true           true           true

This example shows that when analyzing graphs, the McCabe complexity (7) does not match the number of combinations (8). The discrepancy is that there is no extra edge for the abc path compared to the ab and bc paths.

Example 6

An example of case statements is:
  switch(c1)
    {
    case 1:
      a();
      break;

    case 2:
    case 3:
      b();
      break;
    }
There are many variations of McCabe case statement complexity. I haven't found any variations that account for fall-through case statements.

Example 7

An example with case and default statements:
  switch(c1)
    {
    case 1:
      a();
      break;

    case 4:
      b();
    case 5:
      c();
      break;

    default:
      b();
      break;
    }
The "default" keyword should not change the count since it removes the main path that would have skipped the switch if there were no matching cases.

Example 8

An example with Logical Operators and Statements:
  if(cond1() && cond2())
    a();
This is equivalent to the following.
  if(cond1())
    {
    if(cond2())
      a();
    }
C++ has short-circuit evaluation for the logical or and logical and operators. This means that not all conditions (which could involve function calls) are evaluated within a conditional test. See: Short-circuit evaluation
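
A minimal sketch of the corresponding logical or case, using the document's cond1()/cond2() convention: cond2() is evaluated only when cond1() is false, so any side effects inside cond2() occur on only some paths.
  if(cond1() || cond2())
    a();
This is equivalent to the following.
  if(cond1())
    {
    a();
    }
  else if(cond2())
    {
    a();
    }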

Example 9

An example with Logical Operators:
  if(c1 && c2)
    a();
This is equivalent to the following.
  if(c1)
    {
    if(c2)
      a();
    }
Simple counting of logical operators as keywords is not a very accurate measure compared to other keywords.

Example 10

Empty expressions
Counting keywords would generate a complexity of 2 for a function with an empty if statement.
  if(v1 == 1)
    {
    }
Since this is not common, there is no need to analyze it differently from the default behavior.

Complexity Tools

Here are some tool outputs for the above examples, compared to the number of actual path combinations and to McCabe complexity computed by graph analysis and by keyword counting.

  Example                        Combinations  McCabe  McCabe   ACQC  vsCCM  Source   Oovaide
                                               Graph   Keyword               Monitor  McCabe
  Example 1: If Else                  2          2        2       2     2       3        2
  Example 2: If/Else If               3          3        3       3     3       3        3
  Example 3: Nested If                3          3        3       3     3       3        3
  Example 4: Sequential If            4          4        3       3     3       3        3
  Example 5: 3 Sequential Ifs         8          7        4       4     4       4        4
  Example 6: case                     3          3        4       4     4       5        4
  Example 7: case/default             4          4        4       4     4       5        4
  Example 8: logical or/and
             statements               4          4       2:3      2     3       3        2
  Example 9: logical or/and           2          2       2:3      2     3       3        2

There are many documents on the web indicating that switch case counts are too high. They may be a little high, as seen in this table, but the actual reason is that if statement counts are in many cases too low. I have not found any tools that decrease the counts for case statements with no intervening statements, and I have not found any tools that account for default statements incorrectly.

Other Reference Documents

A critique of cyclomatic complexity as a software metric
Cyclomatic complexity metrics revisited
On the cyclomatic metric of program complexity
High cyclomatic complexity on switch statements
Measuring Complexity Correctly
McCabe's Cyclomatic Complexity and Why We Don't Use It

Types of Complexity

Test complexity (the number of tests required for a piece of code) is different from code complexity. Some things increase code complexity but do not increase test complexity, although sometimes extra statements and variables can indicate that more testing is needed.

There are many types of complexity, and I have not found a list of the varying types, so some of the names here are made up. The types are described in roughly increasing order in the following sections.

Data and Boundary Value Complexity

OK, so far, McCabe complexity could be tweaked in some manner to produce improved numbers for C++, and path analysis of some form seems to be a pretty good measure of complexity. What about the following code?
  int average(int a, int b) { return (a+b)/2; }
McCabe complexity indicates 1, so only one test is needed to fully test this. Obviously this is wrong. It may be possible to use some set theory similar to what is used in abstract interpretation.
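
For instance, a hedged sketch of why a single test is insufficient (testAverage is a hypothetical test, INT_MAX comes from <climits>): one mid-range test covers the only path, yet boundary inputs still fail.
  #include <cassert>
  #include <climits>

  void testAverage()
    {
    assert(average(2, 4) == 3);   // passes, and covers the only path
    // average(INT_MAX, INT_MAX) overflows (a+b), which is undefined
    // behavior for signed integers, so the boundary case is broken
    // even though path coverage is already 100%.
    }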

There is no mathematically automated way to find the correct number of tests for testing an arithmetic/numeric algorithm. Some examples of algorithms that could be difficult to test would be something like finding n digits of pi, or sorting a set of data.

The following increase test complexity

The complexity of a method in C++ will depend on the input parameters. This includes class members and global state (including files, etc.).

Structures and classes increase the complexity based on the number of variables in the struct or class that are accessed. In C++, it is more difficult to determine the number of unique members accessed because different methods may access the same class variable.

The number of values read adds to the number of tests that must be done. Fewer state variables are also simpler than more state variables. The following piece of code is an example where two independent variables are more complex than one.

  val = ((ULONG) (val1 << 24)) + ((ULONG) (val2 << 16));

The number of tests required for a boolean parameter is 2. The number of tests required for an unsigned parameter is generally at the boundaries (2), and perhaps a test in the middle. The number of tests required increases depending on how the parameter is used. The number of tests for a signed parameter is at the boundaries (2), plus usually at zero, and possibly at other values.

An interesting point for C++ is that the size_t type is an unsigned type, but "static_cast<size_t>(-1)", which is the maximum value, is used as an invalid value. Such a value should also be tested at the minimum, maximum, and a middle value.
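
A minimal sketch of this pattern (findValue and NOT_FOUND are hypothetical names, similar in spirit to std::string::npos):
  #include <cstddef>

  const size_t NOT_FOUND = static_cast<size_t>(-1);  // maximum size_t value

  size_t findValue(const int *values, size_t count, int match)
    {
    for(size_t i=0; i<count; i++)
      {
      if(values[i] == match)
        return i;
      }
    return NOT_FOUND;   // the sentinel itself is a boundary value to test
    }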

There is a school of thought that the unsigned value is dangerous in C++, but I think the complexity indication is more important, and that the danger should be understood. An example of the danger is below, where the i variable is never less than zero since it is unsigned:

  for(size_t i=5; i>=0; i--)  // NEVER FINISHES!!!
    {
    }
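
One common fix, as a hedged sketch, keeps the counter unsigned but tests before decrementing so the loop terminates:
  for(size_t i=6; i-- > 0; )  // body sees i = 5, 4, ... 0, then the loop exits
    {
    }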

The minimum number of tests required for parameters is additive for parameters that are independent of each other. For two parameters, add the complexity of each parameter to the other. It is impossible to determine whether external input parameters are independent and must be tested combinatorially; from the perspective of the function being examined, the external parameters are treated as independent.
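
As a hedged sketch using the per-parameter counts above (bool = 2 tests, signed int = 4 tests), scale() is a hypothetical function whose independent parameters add rather than multiply:
  int scale(bool negate, int value)
    {
    return negate ? -value : value;
    }
  // Independent parameters add: 2 + 4 = 6 tests,
  // not 2 * 4 = 8 combinatorial tests.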

The following do not increase test complexity

OR'ing constants together does not increase test complexity. There is some actual complexity increase, since a mistake can be made in selecting the constants.
  var.vt = VT_ARRAY | VT_UI1;
Output parameters do not increase test complexity of the function being analyzed. They increase complexity of other functions.

Intermediate variables do not increase test complexity. Buffer indexing and pointers are usually more complex (they can introduce more errors), but typically indexing is just increment operations and limit tests. These do not increase the number of tests if they are dependent on input parameters.

Of course, Wikipedia has an article about boundary value analysis: Boundary Value Analysis
And for abstract interpretation: Abstract Interpretation
There is also a tool that calculates statements in a block: Gnu complexity tool

Control Complexity

The input to an if statement is a boolean condition.

If the condition is based on an input parameter, and the parameter is not used elsewhere, the complexity should not be increased over the complexity of the input parameter.

The number of tests on an input parameter can increase the complexity. If there is one conditional based on an input parameter, and another test on it is added, the complexity increases by two if the conditionals are sequential.

Sequential if statements produce 2^n combinations, where n is the number of if statements in sequence. This is not useful as the number of tests. The minimum number of tests that is reasonable is the main path (1), plus one test per branch (n), plus additional tests for combinations (n-1), for a total of 2n.
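
As a worked example, for n = 3 sequential if statements this gives 2*3 = 6 tests. One possible selection (matching Example 5-a below) is:

  Test            Condition c1   Condition c2   Condition c3
  Main path       false          false          false
  Branch 1        true           false          false
  Branch 2        false          true           false
  Branch 3        false          false          true
  Combination 1   true           true           false
  Combination 2   false          true           true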

If a function is dominated by control complexity (instead of data complexity), the control complexity can be up to a factor of 2 more than McCabe complexity. This will happen when conditionals are combinatorial (such as sequential if statements).

Side Effect and Method or Function Call Complexity

In C++, when a function calls a method in another class, the method may modify other state that could affect the complexity of the current function. If a const method is called, then generally there are no side effects, and the complexity increases by no more than the values returned from the method. Note that there are still side effects if a mutable member or a global variable is modified.

In general, a const method will not increase complexity as much as a non-const method.
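
A minimal sketch of the difference (Counter is a hypothetical class): the const method can only read state, while the non-const method may change state that later calls depend on.
  class Counter
    {
    public:
        int get() const       // const: reads state, no side effects
            { return mCount; }
        void increment()      // non-const: modifies state that other
            { mCount++; }     // calls may depend on later
    private:
        int mCount = 0;
    };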

Since the test complexity is about the function being analyzed, complexity introduced by calling other functions is not added to the current function.

A New Measure of Complexity

A new complexity measure can be developed: Data Complexity Details. It can be somewhat difficult to determine an input variable versus an output variable. An output variable is the lhs of an assignment statement. If a pointer or reference is taken from an input variable and the reference is const, it is determined to be an input; otherwise it is assumed to be an output.
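
A minimal sketch of that rule (classify is a hypothetical function): a const reference or pointer taken from a variable keeps it an input, while the lhs of an assignment is an output.
  void classify(const int &a, int &b)
    {
    const int *in = &a;   // const pointer from a: a remains an input
    b = *in + 1;          // b is the lhs of an assignment: an output
    }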

Control Flow Details

Condition variables are not considered combinatorial if the same dynamic input parameters are in multiple conditions. This implies that variables must be examined to detect constants such as #define's, const, and enums.

If the input variable is used in a condition, the complexity is not double counted. The larger of the condition complexity or the variable complexity is used in each case.

Desired Flow Complexity Values

The Oovaide tool has a partial implementation of the desired flow complexity values. The condition matching uses a bit of a cheat: it checks expressions but does not know what is dynamic and what is a constant. The switch/case and else/if handling are also not complete. The logical operators may also not be handled correctly.
  Example                                Desired Value                   Oovaide
  Example 1: If Else                     2                               2
  Example 2: If Else If                  2 or 3 depending on condition   3 or 4
  Example 3: Nested If                   3                               3
  Example 4: Sequential If               4                               4
  Example 5-a: 3 Sequential Ifs-comb     6                               6
  Example 5-b: 3 Sequential Ifs-same     4                               4
  Example 6: case                        3                               4
  Example 7: case/default                4                               4
  Example 8: logical or/and statements   4                               ? - depends on conditions
  Example 9: logical or/and              2                               ? - depends on conditions

Data Complexity

Desired Data Complexity Values

The Oovaide tool has a partial implementation of the desired data complexity values. The write/read pointer evaluation is not complete yet. The Oovaide tool does not check to see if input variables are actually used. This means that function return types and non-const pointers and references passed to functions are also added to the complexity.

Other tools would consider all of the following to have a complexity of one. The desired data complexity values are:

  Example                                  Desired Value   Oovaide
  Example D-1: Input Boolean               2               2
  Example D-2: Input Unsigned              3               3
  Example D-3: Input Signed                4               4
  Example D-4: Input Pointer               4               4
  Example D-5: Read Member                 4               4
  Example D-6: Write Member                1               1
  Example D-7: Read Pointer Member         4               4
  Example D-9: Call Function Value Param   1               1
  Example D-10: Call Function Ref Param    4               4
  Example D-11: Call Function Return       4               4
  Example D-8: Write Pointer Member        1               4

Data Complexity Examples
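
The examples below are methods of a test class. A plausible declaration, reconstructed here from the method bodies (the original declaration is not shown in this document), might look like:

class ComplexityDataChild
    {
    public:
        void funcVal(int val);
        void funcRef(int &val);
        int funcRet();
    };

class ComplexityDataTest
    {
    public:
        // ... the test methods shown in the examples below ...
    private:
        int mMember1;
        ComplexityDataChild mChild;
    };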

Example D-1

int ComplexityDataTest::testInputBoolParam(bool v1)
    {
    return !v1;
    }

Example D-2

int ComplexityDataTest::testInputUnsignedParam(unsigned int v1)
    {
    return v1 / 2;
    }

Example D-3

int ComplexityDataTest::testInputSignedParam(int v1)
    {
    return v1 / 2;
    }

Example D-4

int ComplexityDataTest::testInputPointerParam(int *p1)
    {
    return *p1 / 2;
    }

Example D-5

int ComplexityDataTest::testReadMemberRef()
    {
    return mMember1 / 2;
    }

Example D-6

void ComplexityDataTest::testWriteMemberRef()
    {
    mMember1 = 8;
    }

Example D-7

int ComplexityDataTest::testReadPointerMemberRef()
    {
    int *p = &mMember1;
    return *p;
    }

Example D-8

void ComplexityDataTest::testWritePointerMemberRef()
    {
    int *p = &mMember1;
    *p = 8;
    }

Example D-9

void ComplexityDataTest::testMemberFuncVal()
    {
    mChild.funcVal(1);
    }

Example D-10

int ComplexityDataTest::testMemberFuncRef()
    {
    int val;
    mChild.funcRef(val);
    return val;
    }

Example D-11

int ComplexityDataTest::testMemberFuncRet()
    {
    return mChild.funcRet();
    }

Summary

After running this on some different software packages, it was found that tweaks to the algorithms will move some functions up or down in relation to others, but the more complex functions often still end up with the higher complexity numbers.

Adding the data complexity does greatly increase some functions that initially had a lower complexity. These functions sometimes do not look complex if the input data is not combinatorial, but stringently testing each function with all available inputs does require more tests.

The Oovaide program outputs both complexity figures (McCabe, and combined control and data complexity) so that they can be compared and sorted easily.