Week 11 - Project Stage 2: Phase 1&2 (Create New Pass + Identify Cloned Function)

Hamza TeliHamza Teli
11 min read

Introduction

Welcome again! I hope you are doing well and are ready to learn and tackle the second stage of our project! In this article, I will guide you through the process of creating a custom GCC pass. Please note that I have built upon my previous project stage. Therefore, I will not cover the steps for adding the pass to the corresponding files inside the source directory. If you’d like more information on those steps please read my previous article on “Project Stage 1”. Nonetheless, I will detail the process of creating the new pass, the logic behind it, and how I overcame common issues.

Overview of custom GCC Pass

Before I dive into the logic and code, I will briefly explain the goal of this pass. If you read my previous article on AFMV, you will quickly understand what this pass is accomplishing. In simple terms, the goal of this pass is to identify functions that have been cloned. A function that has been cloned has a revolver and a corresponding variant. After identifying the cloned functions, our pass will attempt to determine if they are the same or different, and based on this factor, it will output PRUNE OR NOPRUNE. In this article, I will cover the logic for identifying cloned functions.

Algorithm for Identifying Cloned Functions

Before writing the code, I took a step back and sketched the algorithm on paper. Here is the complete logic:

  1. Iterate over all functions:

    • For each function, retrieve its complete name
  2. Obtain the base name and suffix:

    • If the function name contains a dot (.) then I split the name into 2 parts

      • Base name → This is done by getting the name before the first dot

      • Suffix → The substring after the dot

  3. First Pass - Set the resolver map

    • If the suffix is resolver then I store the mapping from the base name to the full resolver function name in the map and output this
  4. Second pass - identifying the cloned variants

    • In here, I have the same logic as the first pass for majority of the pass but whats different is that when I check for the resolver name, I log that a clone variant has been found along with its resolver.

There is probably a better and more modular way to accomplish this. However I took this approach. Maybe in a next iteration, I will make it more simpler.

Prerequisite Actions

Prior to coding, I performed the following steps to ensure a smooth transition and any potential drawbacks while coding.

  1. Create a new GitHub repository to store our custom GCC pass

    • I already created one for project stage 1, but because I don’t want to override it and want to keep things separate. I created a new GitHub repository where I will push my gcc pass changes here. This is so I don’t lose any data.
  2. Push old Stage 1 code to the new repository. No need to re-create everything again.

  3. Compile code to identify cloned functions

Now you may be wondering why I didn’t perform the same steps as project stage 1. Although we can start from scratch, I decided to utilize my old pass to save time. In addition, I reviewed the passes.def file and it seems like my pass is already defined pretty late into the compilation stage. If for some reason, my code does not work. Then I will come back and move my pass in the passes.def file further down and re-build the gcc build directory.

To access the github repository, please visit the following link: https://github.com/Hamza-Teli/spo600-project-stage-2

Identifying the cloned functions!

Below is the code I initially compiled as I created this function. What I’ll do is I will show the code in stages and discuss the issues I ran into as I created it

Part 1 - Building Resolver Map

/* This pass was creatd by Hamza Teli with the help of
Professor Chris Tyler's SPO600 wiki and lecture. 

This pass accomplishes the following:


*/
// These headers were taken from Professor Chris Tyler's Week 7 Lecture
// These headers were taken from Professor Chris Tyler's Week 7 Lecture
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"
#include "gimple.h"
#include "tree-pass.h"
#include "ssa.h"
#include "tree-pretty-print.h"
#include "gimple-iterator.h"
#include "gimple-walk.h"
#include "internal-fn.h"
#include "gimple-pretty-print.h"
#include "cgraph.h"

// Added headers
#include "gimple-ssa.h"
#include "attribs.h"
#include "pretty-print.h"
#include "tree-inline.h"
#include "intl.h"
#include "function.h"
#include "basic-block.h"
#include <string>
#include <map>
#include <cstdio>



// Namespace <--- This section I learned from SPO600 Week 7 - Class 1 Lecture from Professor Chris Tyler
namespace{
    const pass_data pass_data_hteli1 = {
            GIMPLE_PASS, /* type */
            "hteli1", /* name of my pass [We will use this inside passes.def as pass_hteli1_pass]*/ 
            OPTGROUP_NONE, /* optinfo_ flags */
            TV_NONE, /* tv_id */
            PROP_cfg, /* specify that we need properties */
            0, /* the properties provided */
            0, /* the properties destroyed */
            0, /* todo_flags_start */
            0, /* todo_flags_finish */
    };

    /*
        Please refer to the instructions below from Professor CHris Tyler that helped me build the class:
        Pass Code
            Each pass provides a C++ source file which provides:
            -   A pass_data structure which defines the pass details (defined in tree-pass.h);
            -   A class which defines a gimple_opt_pass and includes the public methods gate and execute.
            -   The gate method controls whether the pass is active. It can consider multiple factors, including command-line arguments, source language, and target. This method returns a boolean.
            -   The execute method executes the pass. It accepts a pointer to a function object and will be called once for each function in the unit being compiled (which is not always what you want!).
            -   A method in the anon namespace named make_pass_name which returns a pointer to the gimple_opt_pass described above.
    */


    // This is where you identify the class
    class pass_hteli1 : public gimple_opt_pass {
        public:
            // Constructor
            pass_hteli1(gcc::context *ctxt) : gimple_opt_pass(pass_data_hteli1, ctxt) {

            }

            // The gate function 
            bool gate(function *) final override {
                // return 
                return true;

            }

            // The execute function: this is where the magic happens
            unsigned int execute (function * /*func*/) override {

                // Instantiate function name
                /* 
                    Inside function.cc, there's a function_name method that returns
                    the name of a function. Check out line 6454:
                    https://github.com/gcc-mirror/gcc/blob/master/gcc/function.cc

                */
                //const char* function_Name = function_name(func);

                // This is where we will get started with identifying the functions that have been cloned
                if (dump_file) {

                    // Lets create a map that holds the functions
                    std::map<std::string, std::string> resolverMap;

                    // Use cgraph node
                    cgraph_node *node;
                    // Lets use FOR_EACH_FUNCTION
                    FOR_EACH_FUNCTION(node) {
                        // Get the function pointer
                        function *current_function_pointer = node->get_fun();

                        // Validate
                        if (!current_function_pointer)
                            continue;
                        // Get the complete funciton name
                        std::string functionName(function_name(current_function_pointer));

                        // Get the dot
                        size_t dot = functionName.find('.');

                        // Check if the dot is there
                        if (dot== std::string::npos){
                            continue;
                        }



                        // Get the suffix first
                        std::string suffix = functionName.substr(dot + 1);

                        // Now we check that if the function has a resolver suffix, we simply store its base name
                        if (suffix == "resolver") {
                          std::string baseName = functionName.substr(0, dot);
                          resolverMap[baseName] = functionName;

                          // Show an output
                          fprintf(dump_file, "Resolver was found for base function: %s\n", baseName.c_str());
                        }


                    }

                    // Second pass goes here where we use the names inside our map and find all the variants
                }

                // Return value
                return 0;
            }
    };
}

// This is used inside the tree-pass.h file
gimple_opt_pass* make_pass_hteli1(gcc::context *ctxt) {
    return new pass_hteli1(ctxt);
}

Issues I ran with the first pass above

When I created this I ran into a few issues below:

  • Incorrect variable names (This is a novice issue)

  • I was initially using the char data type but with char, I have to do some extra memory management whereas with std::string, I can easily manipulate strings as I please using substr. This is where I ran into issues with creating the base name, so I switched to using string and imported the <string>library

  • The last issue I faced was I had undeclared variables like func so when I executed make inside my gcc-build-001 directory, it would return that error. Therefore, I commented that code.

Part 2: Identifying the clone variants

/* This pass was creatd by Hamza Teli with the help of
Professor Chris Tyler's SPO600 wiki and lecture. 

This pass accomplishes the following:


*/
// These headers were taken from Professor Chris Tyler's Week 7 Lecture
// These headers were taken from Professor Chris Tyler's Week 7 Lecture
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"
#include "gimple.h"
#include "tree-pass.h"
#include "ssa.h"
#include "tree-pretty-print.h"
#include "gimple-iterator.h"
#include "gimple-walk.h"
#include "internal-fn.h"
#include "gimple-pretty-print.h"
#include "cgraph.h"

// Added headers
#include "gimple-ssa.h"
#include "attribs.h"
#include "pretty-print.h"
#include "tree-inline.h"
#include "intl.h"
#include "function.h"
#include "basic-block.h"
#include <string>
#include <map>
#include <cstdio>



// Namespace <--- This section I learned from SPO600 Week 7 - Class 1 Lecture from Professor Chris Tyler
namespace{
    const pass_data pass_data_hteli1 = {
            GIMPLE_PASS, /* type */
            "hteli1", /* name of my pass [We will use this inside passes.def as pass_hteli1_pass]*/ 
            OPTGROUP_NONE, /* optinfo_ flags */
            TV_NONE, /* tv_id */
            PROP_cfg, /* specify that we need properties */
            0, /* the properties provided */
            0, /* the properties destroyed */
            0, /* todo_flags_start */
            0, /* todo_flags_finish */
    };

    /*
        Please refer to the instructions below from Professor CHris Tyler that helped me build the class:
        Pass Code
            Each pass provides a C++ source file which provides:
            -   A pass_data structure which defines the pass details (defined in tree-pass.h);
            -   A class which defines a gimple_opt_pass and includes the public methods gate and execute.
            -   The gate method controls whether the pass is active. It can consider multiple factors, including command-line arguments, source language, and target. This method returns a boolean.
            -   The execute method executes the pass. It accepts a pointer to a function object and will be called once for each function in the unit being compiled (which is not always what you want!).
            -   A method in the anon namespace named make_pass_name which returns a pointer to the gimple_opt_pass described above.
    */


    // This is where you identify the class
    class pass_hteli1 : public gimple_opt_pass {
        public:
            // Constructor
            pass_hteli1(gcc::context *ctxt) : gimple_opt_pass(pass_data_hteli1, ctxt) {

            }

            // The gate function 
            bool gate(function *) final override {
                // return 
                return true;

            }

            // The execute function: this is where the magic happens
            unsigned int execute (function * /*func*/) override {

                // Instantiate function name
                /* 
                    Inside function.cc, there's a function_name method that returns
                    the name of a function. Check out line 6454:
                    https://github.com/gcc-mirror/gcc/blob/master/gcc/function.cc

                */
                //const char* function_Name = function_name(func);

                // This is where we will get started with identifying the functions that have been cloned
                if (dump_file) {

                    // Lets create a map that holds the functions
                    std::map<std::string, std::string> resolverMap;

                    // Use cgraph node
                    cgraph_node *node;
                    // Lets use FOR_EACH_FUNCTION
                    FOR_EACH_FUNCTION(node) {
                        // Get the function pointer
                        function *current_function_pointer = node->get_fun();

                        // Validate
                        if (!current_function_pointer)
                            continue;
                        // Get the complete funciton name
                        std::string functionName(function_name(current_function_pointer));

                        // Get the dot
                        size_t dot = functionName.find('.');

                        // Check if the dot is there
                        if (dot== std::string::npos){
                            continue;
                        }

                        // Get the suffix first
                        std::string suffix = functionName.substr(dot + 1);

                        // Now we check that if the function has a resolver suffix, we simply store its base name
                        if (suffix == "resolver") {
                          std::string baseName = functionName.substr(0, dot);
                          resolverMap[baseName] = functionName;

                          // Show an output
                          fprintf(dump_file, "Resolver was found for base function: %s\n", baseName.c_str());
                        }
                    }

                    // Second pass goes here where we use the names inside our map and find all the variants
                    FOR_EACH_FUNCTION(node) {
                        // Get the function pointer
                        function *current_function_pointer = node->get_fun();

                        // Validate
                        if (!current_function_pointer)
                            continue;
                        // Get the complete funciton name
                        std::string functionName(function_name(current_function_pointer));

                        // Get the dot
                        size_t dot = functionName.find('.');

                        // Check if the dot is there
                        if (dot== std::string::npos){
                            continue;
                        }

                        // Get the suffix first
                        std::string suffix = functionName.substr(dot + 1);

                        // Now we check that if the function has a resolver suffix, we simply store its base name
                        if (suffix == "resolver") {  
                            continue;
                        }

                        // Get base name
                        std::string baseName = functionName.substr(0, dot);

                        // Now we check if base has a resolver
                        if (resolverMap.find(baseName) != resolverMap.end()) {
                            fprintf(dump_file, "Clone variant successfully found: %s (base function: %s) with resolver: %s\n", functionName.c_str(), baseName.c_str(), resolverMap[baseName].c_str());
                        }
                    }
                }

                // Return value
                return 0;
            }
    };
}

// This is used inside the tree-pass.h file
gimple_opt_pass* make_pass_hteli1(gcc::context *ctxt) {
    return new pass_hteli1(ctxt);
}

Re-building and testing the pcc

Please note as I made changes to the custom gcc pass file. I performed the following commands inside the gcc-build directory.

  • cd ~/gcc-build-001 → Navigate to build directory

  • time make -j$(nproc) |& tee rebuild_log.log

  • make install

After successfully building the gcc again, I utilized the sample code provided by Professor Chris Tyler to test out my code. Within each of the SPO600 servers, there is a file containing sample code to test out our code. It is available here: /public/spo600-test-clone.tgz. These are the steps I performed to utilize this directory

  • Navigate to home directory → cd ~

  • Unpack this file → tar xvf /public/spo600-test-clone.tgz

  • Navigate to the test-clone directory within the unpacked file above

  • Edit the Makefile and uncomment DUMP_ALL

  • Check which gcc you are using → which gcc

    • NOTE: If you are not using the gcc you built, then please execute the following command PATH=”$HOME/gcc-test-001/bin:$PATH”
  • Execute → make clean

  • Execute → make

  • Find your dump file and output it using cat

    • For example: My dump file is named as such: cat clone-test-aarch64-prune-clone-test-core.c.265t.hteli1

Result

As you can see, my first function works as it found the variant and its corresponding resolver. (If only you could see the smile on my face as this part did take me quite a bit of time :) ).

0
Subscribe to my newsletter

Read articles from Hamza Teli directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Hamza Teli
Hamza Teli