Week 11 - Project Stage 2: Phase 1&2 (Create New Pass + Identify Cloned Function)


Introduction
Welcome again! I hope you are doing well and are ready to learn and tackle the second stage of our project! In this article, I will guide you through the process of creating a custom GCC pass. Please note that I have built upon my previous project stage. Therefore, I will not cover the steps for adding the pass to the corresponding files inside the source directory. If you’d like more information on those steps please read my previous article on “Project Stage 1”. Nonetheless, I will detail the process of creating the new pass, the logic behind it, and how I overcame common issues.
Overview of custom GCC Pass
Before I dive into the logic and code, I will briefly explain the goal of this pass. If you read my previous article on AFMV, you will quickly understand what this pass is accomplishing. In simple terms, the goal of this pass is to identify functions that have been cloned. A function that has been cloned has a revolver and a corresponding variant. After identifying the cloned functions, our pass will attempt to determine if they are the same or different, and based on this factor, it will output PRUNE
OR NOPRUNE
. In this article, I will cover the logic for identifying cloned functions.
Algorithm for Identifying Cloned Functions
Before writing the code, I took a step back and sketched the algorithm on paper. Here is the complete logic:
Iterate over all functions:
- For each function, retrieve its complete name
Obtain the base name and suffix:
If the function name contains a dot (.) then I split the name into 2 parts
Base name → This is done by getting the name before the first dot
Suffix → The substring after the dot
First Pass - Set the resolver map
- If the suffix is resolver then I store the mapping from the base name to the full resolver function name in the map and output this
Second pass - identifying the cloned variants
- In here, I have the same logic as the first pass for majority of the pass but whats different is that when I check for the resolver name, I log that a clone variant has been found along with its resolver.
There is probably a better and more modular way to accomplish this. However I took this approach. Maybe in a next iteration, I will make it more simpler.
Prerequisite Actions
Prior to coding, I performed the following steps to ensure a smooth transition and any potential drawbacks while coding.
Create a new GitHub repository to store our custom GCC pass
- I already created one for project stage 1, but because I don’t want to override it and want to keep things separate. I created a new GitHub repository where I will push my gcc pass changes here. This is so I don’t lose any data.
Push old Stage 1 code to the new repository. No need to re-create everything again.
Compile code to identify cloned functions
Now you may be wondering why I didn’t perform the same steps as project stage 1. Although we can start from scratch, I decided to utilize my old pass to save time. In addition, I reviewed the passes.def
file and it seems like my pass is already defined pretty late into the compilation stage. If for some reason, my code does not work. Then I will come back and move my pass in the passes.def
file further down and re-build the gcc build directory.
To access the github repository, please visit the following link: https://github.com/Hamza-Teli/spo600-project-stage-2
Identifying the cloned functions!
Below is the code I initially compiled as I created this function. What I’ll do is I will show the code in stages and discuss the issues I ran into as I created it
Part 1 - Building Resolver Map
/* This pass was creatd by Hamza Teli with the help of
Professor Chris Tyler's SPO600 wiki and lecture.
This pass accomplishes the following:
*/
// These headers were taken from Professor Chris Tyler's Week 7 Lecture
// These headers were taken from Professor Chris Tyler's Week 7 Lecture
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"
#include "gimple.h"
#include "tree-pass.h"
#include "ssa.h"
#include "tree-pretty-print.h"
#include "gimple-iterator.h"
#include "gimple-walk.h"
#include "internal-fn.h"
#include "gimple-pretty-print.h"
#include "cgraph.h"
// Added headers
#include "gimple-ssa.h"
#include "attribs.h"
#include "pretty-print.h"
#include "tree-inline.h"
#include "intl.h"
#include "function.h"
#include "basic-block.h"
#include <string>
#include <map>
#include <cstdio>
// Namespace <--- This section I learned from SPO600 Week 7 - Class 1 Lecture from Professor Chris Tyler
namespace{
const pass_data pass_data_hteli1 = {
GIMPLE_PASS, /* type */
"hteli1", /* name of my pass [We will use this inside passes.def as pass_hteli1_pass]*/
OPTGROUP_NONE, /* optinfo_ flags */
TV_NONE, /* tv_id */
PROP_cfg, /* specify that we need properties */
0, /* the properties provided */
0, /* the properties destroyed */
0, /* todo_flags_start */
0, /* todo_flags_finish */
};
/*
Please refer to the instructions below from Professor CHris Tyler that helped me build the class:
Pass Code
Each pass provides a C++ source file which provides:
- A pass_data structure which defines the pass details (defined in tree-pass.h);
- A class which defines a gimple_opt_pass and includes the public methods gate and execute.
- The gate method controls whether the pass is active. It can consider multiple factors, including command-line arguments, source language, and target. This method returns a boolean.
- The execute method executes the pass. It accepts a pointer to a function object and will be called once for each function in the unit being compiled (which is not always what you want!).
- A method in the anon namespace named make_pass_name which returns a pointer to the gimple_opt_pass described above.
*/
// This is where you identify the class
class pass_hteli1 : public gimple_opt_pass {
public:
// Constructor
pass_hteli1(gcc::context *ctxt) : gimple_opt_pass(pass_data_hteli1, ctxt) {
}
// The gate function
bool gate(function *) final override {
// return
return true;
}
// The execute function: this is where the magic happens
unsigned int execute (function * /*func*/) override {
// Instantiate function name
/*
Inside function.cc, there's a function_name method that returns
the name of a function. Check out line 6454:
https://github.com/gcc-mirror/gcc/blob/master/gcc/function.cc
*/
//const char* function_Name = function_name(func);
// This is where we will get started with identifying the functions that have been cloned
if (dump_file) {
// Lets create a map that holds the functions
std::map<std::string, std::string> resolverMap;
// Use cgraph node
cgraph_node *node;
// Lets use FOR_EACH_FUNCTION
FOR_EACH_FUNCTION(node) {
// Get the function pointer
function *current_function_pointer = node->get_fun();
// Validate
if (!current_function_pointer)
continue;
// Get the complete funciton name
std::string functionName(function_name(current_function_pointer));
// Get the dot
size_t dot = functionName.find('.');
// Check if the dot is there
if (dot== std::string::npos){
continue;
}
// Get the suffix first
std::string suffix = functionName.substr(dot + 1);
// Now we check that if the function has a resolver suffix, we simply store its base name
if (suffix == "resolver") {
std::string baseName = functionName.substr(0, dot);
resolverMap[baseName] = functionName;
// Show an output
fprintf(dump_file, "Resolver was found for base function: %s\n", baseName.c_str());
}
}
// Second pass goes here where we use the names inside our map and find all the variants
}
// Return value
return 0;
}
};
}
// This is used inside the tree-pass.h file
gimple_opt_pass* make_pass_hteli1(gcc::context *ctxt) {
return new pass_hteli1(ctxt);
}
Issues I ran with the first pass above
When I created this I ran into a few issues below:
Incorrect variable names (This is a novice issue)
I was initially using the
char
data type but withchar
, I have to do some extra memory management whereas withstd::string
, I can easily manipulate strings as I please usingsubstr
. This is where I ran into issues with creating the base name, so I switched to using string and imported the<string>
libraryThe last issue I faced was I had undeclared variables like func so when I executed make inside my gcc-build-001 directory, it would return that error. Therefore, I commented that code.
Part 2: Identifying the clone variants
/* This pass was creatd by Hamza Teli with the help of
Professor Chris Tyler's SPO600 wiki and lecture.
This pass accomplishes the following:
*/
// These headers were taken from Professor Chris Tyler's Week 7 Lecture
// These headers were taken from Professor Chris Tyler's Week 7 Lecture
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"
#include "gimple.h"
#include "tree-pass.h"
#include "ssa.h"
#include "tree-pretty-print.h"
#include "gimple-iterator.h"
#include "gimple-walk.h"
#include "internal-fn.h"
#include "gimple-pretty-print.h"
#include "cgraph.h"
// Added headers
#include "gimple-ssa.h"
#include "attribs.h"
#include "pretty-print.h"
#include "tree-inline.h"
#include "intl.h"
#include "function.h"
#include "basic-block.h"
#include <string>
#include <map>
#include <cstdio>
// Namespace <--- This section I learned from SPO600 Week 7 - Class 1 Lecture from Professor Chris Tyler
namespace{
const pass_data pass_data_hteli1 = {
GIMPLE_PASS, /* type */
"hteli1", /* name of my pass [We will use this inside passes.def as pass_hteli1_pass]*/
OPTGROUP_NONE, /* optinfo_ flags */
TV_NONE, /* tv_id */
PROP_cfg, /* specify that we need properties */
0, /* the properties provided */
0, /* the properties destroyed */
0, /* todo_flags_start */
0, /* todo_flags_finish */
};
/*
Please refer to the instructions below from Professor CHris Tyler that helped me build the class:
Pass Code
Each pass provides a C++ source file which provides:
- A pass_data structure which defines the pass details (defined in tree-pass.h);
- A class which defines a gimple_opt_pass and includes the public methods gate and execute.
- The gate method controls whether the pass is active. It can consider multiple factors, including command-line arguments, source language, and target. This method returns a boolean.
- The execute method executes the pass. It accepts a pointer to a function object and will be called once for each function in the unit being compiled (which is not always what you want!).
- A method in the anon namespace named make_pass_name which returns a pointer to the gimple_opt_pass described above.
*/
// This is where you identify the class
class pass_hteli1 : public gimple_opt_pass {
public:
// Constructor
pass_hteli1(gcc::context *ctxt) : gimple_opt_pass(pass_data_hteli1, ctxt) {
}
// The gate function
bool gate(function *) final override {
// return
return true;
}
// The execute function: this is where the magic happens
unsigned int execute (function * /*func*/) override {
// Instantiate function name
/*
Inside function.cc, there's a function_name method that returns
the name of a function. Check out line 6454:
https://github.com/gcc-mirror/gcc/blob/master/gcc/function.cc
*/
//const char* function_Name = function_name(func);
// This is where we will get started with identifying the functions that have been cloned
if (dump_file) {
// Lets create a map that holds the functions
std::map<std::string, std::string> resolverMap;
// Use cgraph node
cgraph_node *node;
// Lets use FOR_EACH_FUNCTION
FOR_EACH_FUNCTION(node) {
// Get the function pointer
function *current_function_pointer = node->get_fun();
// Validate
if (!current_function_pointer)
continue;
// Get the complete funciton name
std::string functionName(function_name(current_function_pointer));
// Get the dot
size_t dot = functionName.find('.');
// Check if the dot is there
if (dot== std::string::npos){
continue;
}
// Get the suffix first
std::string suffix = functionName.substr(dot + 1);
// Now we check that if the function has a resolver suffix, we simply store its base name
if (suffix == "resolver") {
std::string baseName = functionName.substr(0, dot);
resolverMap[baseName] = functionName;
// Show an output
fprintf(dump_file, "Resolver was found for base function: %s\n", baseName.c_str());
}
}
// Second pass goes here where we use the names inside our map and find all the variants
FOR_EACH_FUNCTION(node) {
// Get the function pointer
function *current_function_pointer = node->get_fun();
// Validate
if (!current_function_pointer)
continue;
// Get the complete funciton name
std::string functionName(function_name(current_function_pointer));
// Get the dot
size_t dot = functionName.find('.');
// Check if the dot is there
if (dot== std::string::npos){
continue;
}
// Get the suffix first
std::string suffix = functionName.substr(dot + 1);
// Now we check that if the function has a resolver suffix, we simply store its base name
if (suffix == "resolver") {
continue;
}
// Get base name
std::string baseName = functionName.substr(0, dot);
// Now we check if base has a resolver
if (resolverMap.find(baseName) != resolverMap.end()) {
fprintf(dump_file, "Clone variant successfully found: %s (base function: %s) with resolver: %s\n", functionName.c_str(), baseName.c_str(), resolverMap[baseName].c_str());
}
}
}
// Return value
return 0;
}
};
}
// This is used inside the tree-pass.h file
gimple_opt_pass* make_pass_hteli1(gcc::context *ctxt) {
return new pass_hteli1(ctxt);
}
Re-building and testing the pcc
Please note as I made changes to the custom gcc pass file. I performed the following commands inside the gcc-build directory.
cd ~/gcc-build-001
→ Navigate to build directorytime make -j$(nproc) |& tee rebuild_log.log
make install
After successfully building the gcc again, I utilized the sample code provided by Professor Chris Tyler to test out my code. Within each of the SPO600 servers, there is a file containing sample code to test out our code. It is available here: /public/spo600-test-clone.tgz
. These are the steps I performed to utilize this directory
Navigate to home directory →
cd ~
Unpack this file →
tar xvf /public/spo600-test-clone.tgz
Navigate to the test-clone directory within the unpacked file above
Edit the Makefile and uncomment
DUMP_ALL
Check which gcc you are using →
which gcc
- NOTE: If you are not using the gcc you built, then please execute the following command
PATH=”$HOME/gcc-test-001/bin:$PATH”
- NOTE: If you are not using the gcc you built, then please execute the following command
Execute →
make clean
Execute →
make
Find your dump file and output it using cat
- For example: My dump file is named as such:
cat clone-test-aarch64-prune-clone-test-core.c.265t.hteli1
- For example: My dump file is named as such:
Result
As you can see, my first function works as it found the variant and its corresponding resolver. (If only you could see the smile on my face as this part did take me quite a bit of time :) ).
Subscribe to my newsletter
Read articles from Hamza Teli directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
