Playing with GCC’s GIMPLE: How to Generate, Save, and Modify Intermediate Code (Tutorial + Examples)
Compilers are the unsung heroes of software development, translating high-level code (like C or C++) into machine-executable instructions. At the heart of every compiler lies a series of intermediate representations (IRs)—abstract versions of your code that simplify analysis, optimization, and code generation. For GCC (GNU Compiler Collection), one of the most critical IRs is GIMPLE.
GIMPLE (the name is a play on SIMPLE, the intermediate language of McGill University's McCAT compiler, which heavily influenced its design) is GCC’s mid-level IR, sitting between the high-level abstract syntax tree (AST) and low-level Register Transfer Language (RTL). Designed to be simple and expressive, GIMPLE enables GCC’s powerful optimizations (e.g., loop unrolling, constant propagation) and transformations. Whether you’re interested in compiler internals, static analysis, custom optimizations, or code instrumentation, understanding GIMPLE is a gateway to unlocking GCC’s full potential.
In this tutorial, we’ll demystify GIMPLE. You’ll learn how to:
- Generate and save GIMPLE from your C code,
- Interpret GIMPLE’s textual syntax,
- Modify GIMPLE using GCC plugins,
- And apply these skills with hands-on examples.
No prior compiler expertise is required—just basic C knowledge and a willingness to dive into GCC’s toolchain. Let’s get started!
Table of Contents#
- What is GIMPLE?
- Prerequisites
- Generating GIMPLE: The Basics
3.1 Step 1: Write a Simple C Program
3.2 Step 2: Compile with -fdump-tree-gimple
3.3 Understanding the GIMPLE Dump File
- Decoding GIMPLE Syntax
4.1 Basic Blocks and Control Flow
4.2 Statements: Assignments, Calls, and Returns
4.3 Variables and Temporaries
- Saving GIMPLE for Later Analysis
- Modifying GIMPLE: A Hands-On Guide with GCC Plugins
6.1 Why Modify GIMPLE?
6.2 Anatomy of a GCC Plugin
6.3 Example 1: Insert a Printf Call
6.4 Example 2: Replace Addition with Multiplication
- Advanced Topics
7.1 GIMPLE in SSA Form
7.2 Debugging GIMPLE Modifications
- Conclusion
- References
What is GIMPLE?#
GIMPLE is a mid-level intermediate representation (IR) in GCC, designed to balance simplicity and expressiveness. It sits between the high-level abstract syntax tree (AST, called GENERIC in GCC) produced by the language frontend (e.g., cc1 for C) and the low-level RTL (Register Transfer Language) used by the backend to generate machine code.
Key characteristics of GIMPLE:
- Simplified Syntax: GIMPLE has a small set of statement types (e.g., assignments, function calls, returns), making it easy to analyze and transform.
- Three-Address Code: Each statement has at most three operands (e.g., x = a + b), simplifying optimization logic.
- Static Single Assignment (SSA): During optimizations, GIMPLE is converted to SSA form, where each variable is assigned exactly once. This simplifies data-flow analysis (e.g., dead code elimination).
- Textual Dump: GCC can output GIMPLE in a human-readable textual format for debugging and inspection.
GIMPLE is critical for GCC’s optimization pipeline. Passes like constant propagation, loop unrolling, and inlining operate on GIMPLE to improve code efficiency.
Prerequisites#
Before diving in, ensure you have the following:
- GCC 7+: Older versions may lack modern plugin APIs. Use gcc --version to check.
- Basic C Knowledge: Understanding variables, functions, and pointers is essential.
- Compiler Fundamentals: Familiarity with phases like parsing, optimization, and code generation helps (but isn’t strictly required).
- Command-Line Proficiency: You’ll need to compile code and navigate files via the terminal.
Generating GIMPLE: The Basics#
Generating GIMPLE from your C code is surprisingly simple, thanks to GCC’s built-in debugging flags. Let’s walk through the process.
Step 1: Write a Simple C Program#
Create a file named test.c with a basic function:
// test.c
int add(int a, int b) {
    return a + b;
}
int main() {
    int x = 5, y = 3;
    int z = add(x, y);
    return z;
}
Step 2: Compile with -fdump-tree-gimple#
GCC’s -fdump-tree-gimple flag tells the compiler to dump the GIMPLE representation of your code to a file. Compile test.c with:
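gcc -O0 -fdump-tree-gimple test.c
- -O0: Disables optimizations so that later dumps stay close to the source. (The gimple dump itself is written right after the frontend lowers the code, before any optimization pass runs.)
- -fdump-tree-gimple: Triggers the GIMPLE dump.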
Step 3: Understanding the GIMPLE Dump File#
After compilation, GCC generates a dump file named test.c.003t.gimple in the current directory. The pass number, and in newer releases a prefix such as a-, vary by GCC version, so look for a file ending in .gimple. Open it with a text editor:
;; Function add (add, funcdef_no=0, decl_uid=1952, cgraph_uid=1, symbol_order=1)
add (int a, int b)
{
  int D.1955;
  <bb 2>:
  D.1955 = a + b;
  return D.1955;
}
;; Function main (main, funcdef_no=1, decl_uid=1956, cgraph_uid=2, symbol_order=2)
main ()
{
  int x;
  int y;
  int z;
  int D.1959;
  <bb 2>:
  x = 5;
  y = 3;
  D.1959 = add (x, y);
  z = D.1959;
  return z;
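}
This dump shows GIMPLE for both add and main. Let’s break it down:
- ;; Function add ...: Metadata about the function (name, parameters, internal IDs).
- int D.1955;: A temporary variable generated by GCC (more on this later).
- <bb 2>: A basic block (a sequence of statements with no jumps except at the end).
- D.1955 = a + b;: A GIMPLE assignment statement.
Exact output differs between GCC versions. In particular, <bb N> labels are printed only in dumps taken after the control-flow graph has been built (e.g., with -fdump-tree-cfg); the plain gimple dump lists the statements without block labels.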
Decoding GIMPLE Syntax#
GIMPLE’s textual format is straightforward once you know the basics. Let’s dissect key components.
Basic Blocks and Control Flow#
GIMPLE groups statements into basic blocks (BBs)—contiguous sequences of code with a single entry and exit point. For example, in main, all code is in <bb 2> (no conditionals, so only one block).
For a function with conditionals, you’ll see multiple blocks. For example:
int max(int a, int b) {
    if (a > b) return a;
    else return b;
}
The GIMPLE dump for max would include blocks for the condition check, the if branch, and the else branch, connected by edges (e.g., <bb 2> -> <bb 3>).
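To make the block structure concrete, here is a hand-written C approximation of how max might decompose into basic blocks. It is only a sketch: the function name max_blocks and the labels bb3, bb4, and done are illustrative choices, not names GCC produces, and the real dump will differ in detail.

```c
/* Hand-written approximation of max() split into basic blocks.
   Each labelled region has one entry and ends in a single jump or return. */
int max_blocks(int a, int b) {
    int result;

    /* first block: evaluate the condition and branch */
    if (a > b) goto bb3; else goto bb4;

bb3:    /* "then" branch */
    result = a;
    goto done;

bb4:    /* "else" branch */
    result = b;
    goto done;

done:   /* exit block: the only place the function returns */
    return result;
}
```

Every goto here corresponds to an edge in the control-flow graph, which is the information the block-to-block arrows in a dump encode.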
Statements: Assignments, Calls, and Returns#
GIMPLE has a small set of statement types. The most common are:
Assignments (gimple_assign)#
The workhorse of GIMPLE: dest = op1 OP op2 (for binary operations) or dest = OP op1 (for unary operations), where OP is an operator such as + or -.
Example: D.1955 = a + b; (add a and b, store result in D.1955).
Function Calls (gimple_call)#
Represented as dest = func(args). If the function returns void, dest is omitted.
Example: D.1959 = add (x, y); (call add with x and y, store result in D.1959).
Returns (gimple_return)#
Ends a function: return expr;.
Example: return D.1955; (return the value of D.1955).
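As a way to internalize the three statement kinds, the following toy model represents them as a tagged struct and prints them in a dump-like syntax. Everything here (toy_kind, toy_stmt, toy_format) is hypothetical and far simpler than GCC's real data structures; it only mirrors the textual shapes described above.

```c
#include <stdio.h>

/* Toy model of the three statement kinds. Not GCC's representation. */
enum toy_kind { TOY_ASSIGN, TOY_CALL, TOY_RETURN };

struct toy_stmt {
    enum toy_kind kind;
    const char *dest;  /* left-hand side; NULL for a void call or a return */
    const char *op1;   /* first operand, callee name, or returned value */
    const char *op;    /* operator text for TOY_ASSIGN, e.g. "+" */
    const char *op2;   /* second operand, or argument list for TOY_CALL */
};

/* Write the statement into buf roughly as it would appear in a dump.
   Returns the snprintf result, or -1 for an unknown kind. */
int toy_format(const struct toy_stmt *s, char *buf, size_t n) {
    switch (s->kind) {
    case TOY_ASSIGN:
        return snprintf(buf, n, "%s = %s %s %s;", s->dest, s->op1, s->op, s->op2);
    case TOY_CALL:
        if (s->dest)
            return snprintf(buf, n, "%s = %s (%s);", s->dest, s->op1, s->op2);
        return snprintf(buf, n, "%s (%s);", s->op1, s->op2);  /* void call */
    case TOY_RETURN:
        return snprintf(buf, n, "return %s;", s->op1);
    }
    return -1;
}
```

Formatting a TOY_CALL with dest set to NULL drops the left-hand side, matching the rule that a void call has no destination.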
Variables and Temporaries#
GIMPLE uses two types of variables:
- User-Defined Variables: Explicitly declared in your code (e.g., x, y, z in main).
- Temporaries: Internal variables generated by GCC (e.g., D.1955, D.1959). These are prefixed with D. and numbered uniquely.
Temporaries let GCC break complex expressions into three-address code, with each intermediate result stored in its own variable. In the dump above, the statement z = add(x, y); appears as two statements, D.1959 = add (x, y); followed by z = D.1959; so the call result lands in a temporary first and is then copied into z.
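The effect of temporaries is easiest to see on a larger expression. The sketch below writes the same computation twice: once as a compound expression and once lowered by hand so that each statement performs a single operation. The names compound, lowered, and t1 to t3 are illustrative stand-ins for GCC's D.NNNN temporaries, and the exact statements GCC emits may differ.

```c
/* A compound expression, as a programmer would write it. */
int compound(int a, int b, int c) {
    return a + b * c - 4;
}

/* The same computation in three-address style:
   every statement has one operator and at most three operands. */
int lowered(int a, int b, int c) {
    int t1, t2, t3;   /* play the role of compiler temporaries */
    t1 = b * c;       /* highest-precedence operation first */
    t2 = a + t1;
    t3 = t2 - 4;
    return t3;
}
```

Both functions compute the same value; the lowered form simply makes the evaluation order and every intermediate result explicit, which is what analysis passes rely on.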
Saving GIMPLE for Later Analysis#
GCC’s -fdump-tree-gimple flag automatically saves GIMPLE to a file, but you can customize the output:
- Custom Filename: Use -fdump-tree-gimple=my_gimple.dump to specify a filename.
- Dump All Phases: -fdump-tree-all generates dumps for all GIMPLE-based passes (e.g., SSA, loop optimizations).
- Verbose Mode: -fdump-tree-gimple-details adds extra metadata (e.g., variable types, block edges).
Example:
gcc -O0 -fdump-tree-gimple=add_gimple.dump test.c
This saves GIMPLE for test.c to add_gimple.dump.
Modifying GIMPLE: A Hands-On Guide with GCC Plugins#
Generating and inspecting GIMPLE is useful, but the real power lies in modifying it. This allows you to instrument code, implement custom optimizations, or enforce coding standards. The primary way to modify GIMPLE is via GCC plugins.
Why Modify GIMPLE?#
- Code Instrumentation: Insert logging or profiling code (e.g., printf for debugging).
- Custom Optimizations: Add domain-specific optimizations GCC doesn’t support (e.g., math library replacements).
- Static Analysis: Detect bugs by analyzing GIMPLE (e.g., uninitialized variables).
Anatomy of a GCC Plugin#
A GCC plugin is a shared library (*.so) that hooks into GCC’s compilation pipeline. Plugins register passes (functions) to run at specific stages (e.g., after parsing, before optimizations).
Key steps to write a plugin:
- Include GCC headers (e.g., gcc-plugin.h, tree.h, gimple.h).
- Define a plugin_init function (entry point for the plugin).
- Register a custom pass to modify GIMPLE.
Example 1: Insert a Printf Call#
Let’s write a plugin that inserts printf("Hello from GIMPLE!"); at the start of main.
Step 1: Write the Plugin Code (gimple_plugin.c)#
// GCC plugins are C++ code, so build this file with g++ (see Step 2).
// Header order can matter and differs between GCC releases; adjust if needed.
#include <gcc-plugin.h>
#include <plugin-version.h>
#include <tree.h>
#include <tree-pass.h>
#include <context.h>
#include <function.h>
#include <basic-block.h>
#include <gimple.h>
#include <gimple-iterator.h>
#include <diagnostic.h>
// Required by GCC: asserts that the plugin's license is GPL-compatible
int plugin_is_GPL_compatible;
// Callback to run on each function
static unsigned int gimple_modify_execute(function *fun) {
    // Only modify "main"
    const char *func_name = IDENTIFIER_POINTER(DECL_NAME(fun->decl));
    if (strcmp(func_name, "main") != 0) return 0;
    // First real basic block: the single successor of the entry block
    basic_block bb = single_succ(ENTRY_BLOCK_PTR_FOR_FN(fun));
    // Build the call printf("Hello from GIMPLE!\n")
    tree printf_decl = builtin_decl_explicit(BUILT_IN_PRINTF);
    if (!printf_decl) return 0; // builtin not available in this frontend
    const char *msg = "Hello from GIMPLE!\n";
    tree format_str = build_string_literal(strlen(msg) + 1, msg);
    gimple *call = gimple_build_call(printf_decl, 1, format_str);
    gimple_set_location(call, UNKNOWN_LOCATION); // No source location
    // Insert the call at the start of the block, after any labels
    gimple_stmt_iterator gsi = gsi_after_labels(bb);
    gsi_insert_before(&gsi, call, GSI_SAME_STMT);
    return 0; // No extra TODO flags
}
// Describe the pass (fields are positional: C++ has no reordering of designators)
static const pass_data gimple_modify_pass_data = {
    GIMPLE_PASS,     // type
    "gimple_modify", // name (for debugging and dump files)
    OPTGROUP_NONE,   // optinfo_flags
    TV_NONE,         // tv_id
    PROP_cfg,        // properties_required: we walk basic blocks
    0,               // properties_provided
    0,               // properties_destroyed
    0,               // todo_flags_start
    0                // todo_flags_finish
};
class gimple_modify_pass : public gimple_opt_pass {
public:
    gimple_modify_pass(gcc::context *ctxt)
        : gimple_opt_pass(gimple_modify_pass_data, ctxt) {}
    // GCC calls this once per function and passes the function in
    unsigned int execute(function *fun) override {
        return gimple_modify_execute(fun);
    }
};
// Plugin initialization
int plugin_init(struct plugin_name_args *plugin_info,
                struct plugin_gcc_version *version) {
    // Refuse to load into a GCC other than the one the plugin was built against
    if (!plugin_default_version_check(version, &gcc_version)) {
        error("GIMPLE plugin was built for a different GCC version");
        return 1;
    }
    // Register the custom pass to run right after the CFG is built
    struct register_pass_info pass_info;
    pass_info.pass = new gimple_modify_pass(g);
    pass_info.reference_pass_name = "cfg";
    pass_info.ref_pass_instance_number = 1;
    pass_info.pos_op = PASS_POS_INSERT_AFTER;
    register_callback(plugin_info->base_name, PLUGIN_PASS_MANAGER_SETUP,
                      NULL, &pass_info);
    return 0;
}
Step 2: Compile the Plugin#
Compile the plugin into a shared library:
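g++ -fPIC -shared -fno-rtti gimple_plugin.c -o gimple_plugin.so \
-I$(gcc -print-file-name=plugin)/include \
-I$(gcc -print-file-name=include)
- g++: The plugin is C++ code; g++ compiles .c files as C++, whereas gcc would treat the file as C and fail.
- -fPIC -shared: Generate a position-independent shared library.
- -fno-rtti: GCC itself is built without RTTI, so a plugin that derives from its pass classes should be too.
- -I...: Include GCC’s internal headers (required for plugin development).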
Step 3: Use the Plugin#
Compile test.c with the plugin to modify GIMPLE:
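gcc -O0 -fplugin=./gimple_plugin.so test.c -o test
Run ./test and you’ll see:
Hello from GIMPLE!
To verify the modification, dump the GIMPLE from a later stage. The gimple dump is written before plugin passes run, so it will not show the change; -fdump-tree-optimized, which prints the final GIMPLE, will:
gcc -O0 -fplugin=./gimple_plugin.so -fdump-tree-optimized test.c
The main function’s GIMPLE will now include the printf call, roughly as sketched below. The optimized dump is in SSA form, so variable names carry numeric suffixes and the layout will differ by GCC version:
main ()
{
  int x;
  int y;
  int z;
  int D.1959;
  <bb 2>:
  printf ("Hello from GIMPLE!\n"); // Added by the plugin
  x = 5;
  y = 3;
  D.1959 = add (x, y);
  z = D.1959;
  return z;
}
Example 2: Replace Addition with Multiplication#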
Let’s modify our plugin to replace a + b with a * b in the add function.
Modified Plugin Code#
Update gimple_modify_execute to iterate over GIMPLE statements and replace additions:
// update_stmt() is declared in gimple-ssa.h; add #include <gimple-ssa.h>
// to the plugin's includes if your GCC version does not pull it in already.
static unsigned int gimple_modify_execute(function *fun) {
    // Target "add"
    const char *func_name = IDENTIFIER_POINTER(DECL_NAME(fun->decl));
    if (strcmp(func_name, "add") != 0) return 0;
    // Iterate over all basic blocks in the function
    basic_block bb;
    FOR_EACH_BB_FN(bb, fun) {
        // Iterate over all statements in the block
        for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi);
             gsi_next(&gsi)) {
            gimple *stmt = gsi_stmt(gsi);
            // Check if the statement is an addition (GCC's tree code is PLUS_EXPR)
            if (is_gimple_assign(stmt)
                && gimple_assign_rhs_code(stmt) == PLUS_EXPR) {
                // Replace PLUS_EXPR with MULT_EXPR
                gimple_assign_set_rhs_code(stmt, MULT_EXPR);
                update_stmt(stmt); // Update GCC's internal state
            }
        }
    }
    return 0;
}
Test the Modification#
Recompile the plugin and test.c:
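g++ -fPIC -shared -fno-rtti gimple_plugin.c -o gimple_plugin.so ... # Same include flags as before
gcc -O0 -fplugin=./gimple_plugin.so test.c -o test
Run ./test. The program prints nothing, but main returns the result of add, so the exit status reveals the change: since add(5, 3) now computes 5 * 3, echo $? prints 15 (instead of 8).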
Advanced Topics#
GIMPLE in SSA Form#
GCC converts GIMPLE to SSA form early in its pass pipeline (even at -O0), so that each variable is assigned exactly once. SSA uses φ functions to merge values from different control-flow paths. For example:
int foo(int a, int b) {
    int x = a;
    if (b > 0) x = b;
    return x;
}
In SSA, x becomes x_1 (from a) and x_2 (from b), with a φ function merging them:
x_3 = PHI <x_1 (2), x_2 (3)>; // x_3 = x_1 if coming from BB 2, x_2 if from BB 3
Dump SSA with -fdump-tree-ssa.
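One way to build intuition for the φ function is to rewrite foo by hand so that every name is assigned exactly once. The sketch below does that in plain C; foo_ssa and took_branch are illustrative names, and the ternary merely emulates what the PHI node expresses (picking a value according to the path taken). It is not how GCC implements φ functions.

```c
/* foo() rewritten so that each variable has a single assignment. */
int foo_ssa(int a, int b) {
    const int x_1 = a;                /* x = a */
    const int took_branch = (b > 0);  /* records which path reaches the merge */
    const int x_2 = b;                /* x = b, meaningful only on the branch */
    /* Emulates x_3 = PHI <x_1, x_2>: choose by incoming path. */
    const int x_3 = took_branch ? x_2 : x_1;
    return x_3;
}
```

Because no name is ever reassigned, the definition that reaches any use is unambiguous, which is the property that makes data-flow analyses on SSA simpler.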
Debugging GIMPLE Modifications#
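- Validate GIMPLE: Compile with -fchecking (e.g., gcc -fchecking -fplugin=./plugin.so test.c) to enable GCC’s internal consistency checks, which can catch invalid IR produced by a pass. -fsyntax-only is a poor fit here, since it is meant to stop after checking syntax rather than run the full pass pipeline.
- Check Pass Order: Insert the plugin at the right stage (use -fdump-passes to list the passes and whether they are enabled).
- GDB: The gcc driver runs the actual compiler (cc1) as a subprocess, so debug plugins with gcc -wrapper gdb,--args -fplugin=./plugin.so test.c.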
Conclusion#
GIMPLE is a powerful window into GCC’s internals. By generating, inspecting, and modifying GIMPLE, you can unlock custom optimizations, instrumentation, and analysis. This tutorial covered the basics—generating dumps, decoding syntax, and writing plugins—but there’s much more to explore (e.g., loop transformations, SSA optimizations).
Experiment with small modifications, validate your changes, and refer to GCC’s documentation for deeper dives. Happy hacking!