Using PREfast and SAL for C++ code static analysis

Answered

Goals of the project

In this project we use PREfast, a static analysis tool for C(++) developed at Microsoft, and the associated annotation language SAL, on some toy C code.

What static analysis tools can do (using automated theorem proving technology behind the scenes);
The amount of hassle, in the terms of annotations, a language like C(++) needs to make code amenable to such static analysis.

to appreciate some of the many things that can go wrong in C(++) code;
to understand the capabilities and the limitations of an (almost) state-of-the-art static analysis tool;
to understand the trade-offs in the design and in the use of such a tool.

The project is due Sunday Dec 5. You can do the project individually, as a pair or as a group of three students. If you do it as a group, please say that you did and mention all names as comments in the code. Also, write the group names in the comments of the submission.

If you've followed the installation instructions for PREfast above, then you should already have a copy of the exercise file prefast_exercise.cpp. (you can find it at the end of this file)

Get rid of the warnings in prefast_exercise.cpp that PREfast gives, by fixing the code. Mark places where you changed the code with a comment to keep track of the changes you made.

Keep the changes to the code minimal; the code is completely silly, no need to completely rewrite it.

There is no need to annotate the size of the argument of execute, as its size does not really matter. You also do not need to annotate validate. Fix any new warnings this produces.

Similarly, annotate the buffers returned as results by my_alloc and do_read to specify their size, using the annotations Fix any new warnings this produces.

As last step, we will add tainting annotations to trace any input passing from input to execute without passing through the validation operation, and add calls to the validation routine validate in the right places to fix any problems with missing input validation. The steps for this are explained in more detail below.

Annotate the first parameter of input with [SA_Post(Tainted=SA_Yes)], which specifies that this parameter will be tainted as postcondition, and

Annotate the parameter of execute with [SA_Pre(Tainted=SA_No)] to specify the precondition that this parameter should not be tainted.

Now annotate all the procedures that may handle or produce tainted data using pre- and/or postconditions as above. These procedures are:

Add calls to the validation routine validate in the right places to make such warnings disappear.

As you may notice, PREfast's tainting analysis is not reliable unless you annotate all procedures that may handle tainted data correctly.

Except for the functions execute and validate, for all other functions all parameters and return values that have a pointer type should have a size annotation specifing buffer lengths;

All parameters or return values of functions that might be tainted at some stage should have tainting annotations.

PREfast tries to check annotations -- and hence the properties they express -- at compile time. An alternative approach would be to check this at runtime. Two different aspects for which this could be done are 1) bounds-checking and 2) tainting & missing input validation. This would require some additional information to be tracked at runtime: for bounds-checking this could involve something like fat pointers to check access out of bounds at runtime; for tainting data would have to be marked and traced as being tainted. Name two advantages and two disadvantages of doing these checks at runtime instead of doing them at compile-time. (I can think of two each. Hint: also think of generic advantages and disadvantages when it comes to runtime vs compiletime checking. Maybe you can think of more?)

Sometimes PREfast only warns about problems after you add annotations. For example, the tool does not complain about zero() until after you add an annotation about the size of buf. An alternative tool design would be to produce a warning about zero() if there are no annotations for it. (The warning would then not so much be that there is a potential buffer overflow problem, but rather that the tool does not have enough information to determine whether there is a buffer overflow or not.) Can you give a plausible explanation why PREfast haas been designed so that it does not complain about such unannotated methods?

Get instant help from 5000+ experts for