Rewriting "ls" #3: Getting the list of files and their info

In this article we will start collecting info about files

If you want to know more about command, please read Part 1.

If you want to know how we configured tests, check Part 2.

Note: Source codes can be found at: https://github.com/Miradils-Blog/linux-ls

Getting the list of files

First things first, our file structure looks like this (tree command was used for output):

File structure

We have separate folder with different types of files for us to cover all cases. For now, we will hard-code the folder where listing will happen, just focusing on functionality. So, how do we get the list of all files and their respective data about ownership, size, links etc.? Lucky for us, C provides function for this (and honestly, for everything else as well):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>

int main(int argc, char *argv[])
{

    DIR *dir;            // for reading directory
    struct dirent *file; // struct for getting file data

    dir = opendir("folder");

    while ((file = readdir(dir)) != NULL)
        printf("%s\n", file->d_name);

    return 0;
}

The output would be:

Output for code above

Okay, cool, we got the names of files. However, we need more than that. The flag -lasi requires all the data about every file in folder. For each normal and hidden file (. and .. included) we need the following data:

I-nodeBlock SizePermissionsLink CountOwnerOwner GroupFile sizeModification TimeName
2687033000drwxr-xr-x3miradilzaxusers174May 5 10:04.
4026669110drwxr-xr-x9miradilzaxusers188May 10 10:21..
702616923180-rw-r--r--1miradilzaxusers23734296Apr 24 10:43archive.tar.xz
4026716700brw-r--r--1rootroot1, 2Apr 27 19:19block1
4026691260crw-r--r--1rootroot89, 1Apr 27 19:09charfile1
4026716690-rwxr-xr-x1miradilzaxusers0Apr 27 19:06exe1
4026716640-rw-r--r--1miradilzaxusers0Apr 27 19:00file1
4026716660drwxr-xr-x4miradilzaxusers66Apr 27 19:18folder1
2687033010-rw-r--r--1miradilzaxusers0May 5 10:04.hiddenfile
226680-rw-r--r--2miradilzaxusers0Apr 27 19:06hlink1
4026716630prw-r--r--1miradilzaxusers0Apr 27 19:12pipe1
4026753970lrwxrwxrwx1miradilzaxusers24Apr 27 19:05slink1-> folder1/folder11/file112
4026754240srwxr-xr-x1miradilzaxusers0Apr 27 19:15socket1

What are these data?

  • i-node: Short for "index node". It is a file data structure storing every data (file size and type, UID and GID of owner, permissions, last modification/access/change timestamps) except its name and content.
  • block size: The number of blocks the file takes. 1 block is usually 512B, and it is the multiple of 4 (in my case).
  • permissions: The first character is file type: - for normal file, d for directory, b for block, c for character, p for pipe, l for link and s for socket. The rest can be summarized with an image:

  • link count: The number of hard links. That is how many files points to this file (soft links excluded).
  • owner: The username of the owner. When flag is -n, UUID of owner is shown.
  • owner group: The username of the group. When flag is -n, GUID is shown.
  • file size: Size in bytes. When -h flag is set, it is shown in more readable form: 2M, 12K etc. For block and character files, their device number is shown instead of size.
  • modification time: Shows the last modification time. For -t creation time, and for -u access time is shown.
  • name: Name of the file. If file is link, unless -L flag is set, it shows the file it is linked to.

Collecting all the necessary info

To easily access every attribute of file, let's use a struct. But, wait! C already has a struct stat which stores almost all needed information (we miss user name, group name, color etc.). Of course, we have to write functions to handle missing data. We can use those functions when needed (maybe when it is already print time), or we create another struct and fill it beforehand with everything. I prefer going with latter, as it is easier to handle everything. Let's create another identical struct with extra data......

WAIT, YOU CAN EXTEND STRUCTS?

Yep, I am shocked as well. I found this at time writing this post, when this question suddenly struck me. After googling (and coming across this answer, I found out that with gcc flag -fms-extensions one can indeed extend a struct. So, we have the following code:

typedef struct
{
    struct stat;
    char name[256];
    char permission[11];
    char color[16];
    char extension[256];
    char indicator; // for -F and -p
    char owner_name[256];
    char group_name[256];

} file_info;

Now, our file_info struct has its own attributes and attributes of struct stat. Question is: can we use this for C functions? Or do we need original struct stat and copy its contents? Let's check ourselves. We will try to get the size of files:

#include <stdio.h>
#include <dirent.h>
#include <options.h>
#include <file.h>
#include <unistd.h>

int main(int argc, char *argv[])
{

    DIR *dir;            // for reading directory
    struct dirent *file; // struct for getting file data
    file_info files;

    chdir("folder");
    dir = opendir(".");

    while ((file = readdir(dir)) != NULL)
    {
        stat(file->d_name, (struct stat *)&files);
        printf("%s %d\n", file->d_name, files.st_size);
    }

    return 0;
}

Output:

. 174
.. 188
folder1 66
archive.tar.xz 23734296
block1 0
charfile1 0
exe1 0
hlink1 0
file1 0
socket1 0
slink1 0
pipe1 0
.hiddenfile 0

IT WORKS! If you check with ls, these are indeed the correct numbers. However, there are some data that is missing.

Collecting the rest of the data

Now we need the following: get user and group name based on their IDs, construct permission string, parse and store extension of file, and based on file type get color and indicator. Let's go in the same order.

User and Group Names

Lucky for us, C provides to get user and group names from /etc/passwd and /etc/group files, respectively:

void get_username_from_uid(int uid, char *username)
{
    struct passwd *pws;
    pws = getpwuid(uid);

    strncpy(username, pws->pw_name, 256);
}

void get_groupname_from_gid(int gid, char *groupname)
{
    struct group *grp;
    grp = getgrgid(gid);

    strncpy(groupname, grp->gr_name, 256);
}

We are using strncpy to prevent buffer overflow (in case, user/group name is more than 256 chars).

Permission String

Unfortunately, for this, we do not have a ready function. Thus, we need to analyze every permission bit ourselves and construct a string. At least for this, C does provide needed functions:

void construct_permission_str(mode_t mode, char *permission)
{
    // Get the first character from file type
    if      (S_ISREG(mode)) permission[0] = '-';
    else if (S_ISDIR(mode)) permission[0] = 'd';
    else if (S_ISLNK(mode)) permission[0] = 'l';
    else if (S_ISCHR(mode)) permission[0] = 'c';
    else if (S_ISBLK(mode)) permission[0] = 'b';
    else if (S_ISSOCK(mode)) permission[0] = 's';
    else if (S_ISFIFO(mode)) permission[0] = 'p';

    permission[1] = (mode & S_IRUSR) ? 'r' : '-';
    permission[2] = (mode & S_IWUSR) ? 'w' : '-';
    permission[3] = (mode & S_IXUSR) ? 'x' : '-';

    permission[4] = (mode & S_IRGRP) ? 'r' : '-';
    permission[5] = (mode & S_IWGRP) ? 'w' : '-';
    permission[6] = (mode & S_IXGRP) ? 'x' : '-';

    permission[7] = (mode & S_IROTH) ? 'r' : '-';
    permission[8] = (mode & S_IWOTH) ? 'w' : '-';
    permission[9] = (mode & S_IXOTH) ? 'x' : '-';
    permission[10] = 0;  // terminator char, just in case
}

Parse extension of file

This one is quite straightforward:

void get_file_extension(char *file_name, char *extension)
{
    char *p = strrchr(file_name, '.');

    if (p)
        strcpy(extension, strrchr(file_name, '.'));
}

strchr is function which finds first occurence of given character and points there (NULL otherwise). strrchr is the same function, but reversed, meaning it shows the last occurence of given character. And for us, the last part, after dot is considered an extension.

Color and indicator

Again, nothing serious here:

#define COLOR_RESET         "\x1b[0m" // reset color settings
#define COLOR_REGULAR       COLOR_RESET
#define COLOR_EXECUTABLES   "\x1b[1;32m"
#define COLOR_DIRECTORIES   "\x1b[1;34m"
#define COLOR_LINK          "\x1b[1;36m"
#define COLOR_FIFO          "\x1b[40;33m"
#define COLOR_SOCKET        "\x1b[1;35m"
#define COLOR_BLK_CHR       "\x1b[40;33;1m"
#define COLOR_BROKEN_LINK   "\x1b[40;31;1m"
#define COLOR_ARCHIVE       "\x1b[1;31m"

void get_color(char* permission, char *extension, char *color)
{
    switch (permission[0])
    {
    case '-':
        if (permission[3] == 'x' || permission[6] == 'x' || permission[9] == 'x')
            strcpy(color, COLOR_EXECUTABLES);
        else if (is_archive_file(extension))
            strcpy(color, COLOR_ARCHIVE);
        else
            strcpy(color, COLOR_REGULAR);
        break;
    case 'd':
        strcpy(color, COLOR_DIRECTORIES);
        break;
    case 'l':
        strcpy(color, COLOR_LINK);
        break;
    case 'c':
    case 'b':
        strcpy(color, COLOR_BLK_CHR);
        break;
    case 's':
        strcpy(color, COLOR_SOCKET);
        break;
    case 'p':
        strcpy(color, COLOR_FIFO);
        break;
    default:
        break;
    }
}

char get_indicator(char* permission)
{
    switch (permission[0])
    {
    case '-':
        if (permission[3] == 'x' || permission[6] == 'x' || permission[9] == 'x')
            return '*';

        return ' '; 
    case 'd':
        return '/';
    case 'l':
        return '@';
    case 's':
        return '=';
    case 'p':
        return '|';
    default:
        return ' ';
    }
}

is_archive_file is just function checking if the extension corresponds to archive file or not. What are these weird color values? Those are MAGIC! With super useful ASCII escape codes, we can format and color our output. I got the exact values for those constants with dircolors -p command.

Collecting everything together

So, now we have everything ready as separate parts and functions, let's join them and get full array of file info:

#include <stdio.h>
#include <dirent.h>
#include <options.h>
#include <file_data.h>
#include <unistd.h>
#include <string.h>
#include <time.h>

int main(int argc, char *argv[])
{

    DIR *dir;            // for reading directory
    struct dirent *file; // struct for getting file data
    file_info files[256];  // For now static array

    chdir("folder");
    dir = opendir(".");
    int i = 0;

    while ((file = readdir(dir)) != NULL)
    {
        lstat(file->d_name, (struct stat *)&files[i]);

        strncpy(files[i].name, file->d_name, 256);
        get_username_from_uid(files[i].st_uid, files[i].owner_name);
        get_groupname_from_gid(files[i].st_gid, files[i].group_name);
        construct_permission_str(files[i].st_mode, files[i].permission);
        get_file_extension(files[i].name, files[i].extension);
        get_color(files[i].permission, files[i].extension, files[i].color);

        files[i].indicator = get_indicator(files[i].permission);
        ++i;
    }

    char filetime[20];
    struct tm *timeinfo;

    for (int j = 0; j < i; ++j)
    {
        timeinfo = localtime(&files[j].st_mtime);
        strftime(filetime, 20, "%b %-2d %H:%M", timeinfo);

        printf("%9d %5d %s %d %-8s %-7s %8d %s %s%s%s\n", files[j].st_ino, files[j].st_blocks / 2, files[j].permission, files[j].st_nlink, 
                                                    files[j].owner_name, files[j].group_name, files[j].st_size, filetime, files[j].color, files[j].name, COLOR_RESET);
    }

    return 0;
}

Guess the output?

Our first output

VS

Original ls output

VOILA! Almost perfect output :)

Conclusion

Of course, we are missing some data:

  • First and foremost, the formatting. The original ls output automatically adjusts every column by its most wide line, however, we have done that manually.
  • The order of files in outputs are different.
  • For char and block files, instead of size, device numbers should be shown. We have that data in struct stat we just need separate if statement and formatting to fix that.
  • For soft links, their reference files should be shown.

But for now, these are okay, because we are sure that we collected all needed data about all files in directory. In the upcoming posts we will handle all of these issues and have one-to-one same output.

You can get codes from this repository.

0
Subscribe to my newsletter

Read articles from Miradil Zeynalli directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Miradil Zeynalli
Miradil Zeynalli

💬 Who am I? I am Miradil Zeynalli, a graduate of ADA University, one of the leading universities in Baku, Azerbaijan. Currently studying Embedded Electronics Engineering in Lund University, Sweden. Have Embedded Software Engineer experience of 3.5 years, and one year experience as Lead Django Developer. 🔭 What else do I do? Besides embedded engineering, I am also interested in Data Science and Machine Learning. In the last year of bachelor's, I did my Senior Design Project in "Statistical Analysis and Visualization of BP DTS data", where we worked with data from oil wells of British Petroleum. Furthermore, I developed a timetable program that uses AI for assigning time slots to courses. 🤔 How do I use my time? During my leisure time, I prefer to read (I love Dan Brown’s books), watch movies, and sometimes read articles about quantum computing. As a former professional chess player (and the world champion), I actively play chess on online platforms. Feel free to challenge me! In fact, right here, right now! See below :) ⚡ Fun facts Favorite movie: The Dark Knight, Twelve Angry Men Favorite actor: Morgan Freeman Favorite author: Dan Brown Favorite book: The Da Vinci Code Favorite sports: Chess, Basketball