Write your own Go static analysis tool
Writing a static analysis tool for a programming language can be daunting for those who haven’t done it before. The good news is that for Golang, it’s actually very straightforward if you know how to leverage the existing packages the Go project provides for parsing and analyzing Go code.
In this series of blog posts, I would like to share some of the tips I learned from building my first Go static analysis tool: sqlvet.
To keep it simple, I won’t be using sqlvet as the example. Instead, I am going to build a dummy static analysis tool that warns about uses of the fmt.Println and fmt.Printf functions.
Parsing Go code
Source code is stored as a blob of text, which is easy for humans to read but hard for computers to manipulate. So the first step is to parse it into an in-memory data structure:
Yes, it’s that easy! With just two function calls, we have our source file fully parsed into an AST[1]. If you don’t know what an AST is, don’t worry about it; just think of it as source code represented as a tree data structure that’s easy for a machine to consume. The parsed AST is stored in the variable f, which has type *ast.File.
Here is what it looks like to run this code on itself:
$ go run . main.go
Parsing source file main.go...
Found imports:
"fmt"
"go/parser"
"go/token"
"os"
Finding function calls
Finding fmt.Println and fmt.Printf calls takes two steps.
First, find all function calls. Then filter those calls by function name.
All statements and expressions, including function calls, are stored as nodes in the AST we generated from the source code. If we do a full traversal of the AST, we will hit every function call.
Because AST traversal is such a common operation, the go/ast package comes with a helper function called ast.Inspect. It traverses the AST in depth-first order and calls a provided callback on each syntax tree node; if the callback returns true, Inspect continues into that node's children:
Let’s run the code and see if we can find the AST node for fmt.Printf:
$ go run . ./main.go
Parsing source file ./main.go...
<...>
0 *ast.SelectorExpr {
1 . X: *ast.Ident {
2 . . NamePos: ./main.go:13:2
3 . . Name: "fmt"
4 . }
5 . Sel: *ast.Ident {
6 . . NamePos: ./main.go:13:6
7 . . Name: "Printf"
8 . }
9 }
<...>
Notice that the function part of a fmt.Printf call is parsed into an *ast.SelectorExpr struct, with fmt as the expression (struct field X) and Printf as the selector (struct field Sel).
With this information, we can add a couple of filter rules to the callback so that it only reports print function calls:
Here is what the final output looks like:
$ go run . ./main.go
Use of `fmt.Printf` detected at ./main.go:33:9
Use of `fmt.Println` detected at ./main.go:43:7
exit status 1
Not bad for fewer than 50 lines of code, right?
Get advanced
What we have built so far only works on a single source file, which is not very useful. Using the golang.org/x/tools/go/packages package, we can parse all source files within a given package path with just a couple of lines of code.
While the go/ast package is straightforward to use, we can’t run deeper analysis with it alone without a lot of extra work. For example, it gives us no access to type information or function call graphs. Luckily, we can get both through the golang.org/x/tools/go/ssa and golang.org/x/tools/go/pointer packages with very little effort.
In the next blog post, I will cover how sqlvet leverages those packages and other techniques to discover SQL statements in a code base and analyze them at build time to prevent runtime errors.
[1] AST stands for abstract syntax tree; see https://en.wikipedia.org/wiki/Abstract_syntax_tree.