Creating a Simple HTTP Server

Directory Structure and Makefile

Directory Structure

.
├── build
├── include
├── Makefile
├── README.md
└── src
    └── main.c

The directory structure is organized as follows:

build: Holds the compiled output of the project: the intermediate object files (.o) and the final executable. With the Makefile below, these are build/main.o and build/main.
include: Stores header files (.h) with declarations of the functions, types, and variables used in the source code. Source files #include them to share those interfaces.
Makefile: Contains the build instructions. It defines the rules and dependencies for compiling the source code into an executable, so the whole project builds with a single make command.
README.md: A Markdown file giving an overview of the project, usually covering its purpose, how to set it up and build it, and anything else users or developers need to know.
src: Contains the source code files; this is where the main implementation of the program lives.

Makefile

# Compiler
CC = gcc

# Directories
SRC_DIR = src
INC_DIR = include
BUILD_DIR = build

# Files
SRCS = $(wildcard $(SRC_DIR)/*.c)
OBJS = $(patsubst $(SRC_DIR)/%.c,$(BUILD_DIR)/%.o,$(SRCS))
TARGET = $(BUILD_DIR)/main

# Flags
CFLAGS = -I$(INC_DIR) -Wall -Wextra -g

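# Rules
all: $(TARGET)

# Compile each src/*.c into build/*.o (build/ is an order-only prerequisite)
$(BUILD_DIR)/%.o: $(SRC_DIR)/%.c | $(BUILD_DIR)
	$(CC) $(CFLAGS) -c $< -o $@

# Link all object files into the final executable
$(TARGET): $(OBJS)
	$(CC) $(CFLAGS) $^ -o $@

# Create the build directory if it does not exist
$(BUILD_DIR):
	mkdir -p $(BUILD_DIR)

run: $(TARGET)
	./$(TARGET)

clean:
	rm -rf $(BUILD_DIR)

.PHONY: all run clean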

A Makefile simplifies our workflow. Instead of retyping a long compiler command with all the necessary flags (or hunting for it with the up arrow in the terminal), we define the commands once and run them with short targets such as make, make run, and make clean. Note that every recipe line in a Makefile must start with a tab character, not spaces.

Understanding File Descriptors

File descriptors are identifiers the operating system uses to refer to open files and other input/output resources such as pipes and sockets. A program passes the descriptor to calls like read() and write() to use the resource. Each descriptor is a small non-negative integer; 0, 1 and 2 are normally standard input, standard output and standard error, so the first file a program opens usually gets descriptor 3.

#include <stdio.h>
#include <fcntl.h>   // open() and the O_* flags
#include <unistd.h>  // close()

int main(void) {
    int filefd = open("test.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (filefd < 0) {
        perror("Failed to open file");
        return 1;
    }

    printf("File descriptor: %d\n", filefd);

    close(filefd);  // release the descriptor when done
    return 0;
}

In this program, we open the file test.txt with the flags O_WRONLY | O_CREAT | O_TRUNC. Here's a brief explanation of each:

O_WRONLY: Opens the file for writing only.
O_CREAT: Creates the file if it does not exist.
O_TRUNC: Truncates the file to zero length if it already exists.

These flags are combined with the bitwise OR operator. They are defined in the fcntl.h header as integer constants that mostly occupy one bit each, so OR-ing them produces a single integer that encodes every option, and that integer is what we pass to the open system call. The third argument, 0644, sets the file's permissions when O_CREAT actually creates it, using the same octal notation as the chmod command (owner read/write, group and others read-only); the process umask can still clear bits from it. The leading zero makes 0644 an octal literal in C.

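In editors such as VS Code, Ctrl + Click on a flag jumps to its definition. On x86_64 Debian/Ubuntu systems with glibc, the definitions below live in /usr/include/x86_64-linux-gnu/bits/fcntl-linux.h, which is included by bits/fcntl.h in the same directory.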

#define O_ACCMODE           0003
#define O_RDONLY              00
#define O_WRONLY              01
#define O_RDWR                02
#ifndef O_CREAT
# define O_CREAT            0100  /* Not fcntl.  */
#endif
#ifndef O_EXCL
# define O_EXCL             0200  /* Not fcntl.  */
#endif
#ifndef O_NOCTTY
# define O_NOCTTY           0400  /* Not fcntl.  */
#endif
#ifndef O_TRUNC
# define O_TRUNC           01000  /* Not fcntl.  */
#endif
#ifndef O_APPEND
# define O_APPEND          02000
#endif
#ifndef O_NONBLOCK
# define O_NONBLOCK        04000
#endif
#ifndef O_NDELAY
# define O_NDELAY     O_NONBLOCK
#endif
#ifndef O_SYNC
# define O_SYNC         04010000
#endif
#define O_FSYNC           O_SYNC
#ifndef O_ASYNC
# define O_ASYNC          020000
#endif
#ifndef __O_LARGEFILE
# define __O_LARGEFILE   0100000
#endif

#ifndef __O_DIRECTORY
# define __O_DIRECTORY   0200000
#endif
#ifndef __O_NOFOLLOW
# define __O_NOFOLLOW    0400000
#endif
#ifndef __O_CLOEXEC
# define __O_CLOEXEC    02000000
#endif
#ifndef __O_DIRECT
# define __O_DIRECT       040000
#endif
#ifndef __O_NOATIME
# define __O_NOATIME    01000000
#endif
#ifndef __O_PATH
# define __O_PATH      010000000
#endif
#ifndef __O_DSYNC
# define __O_DSYNC        010000
#endif
#ifndef __O_TMPFILE
# define __O_TMPFILE   (020000000 | __O_DIRECTORY)
#endif

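Above, you can see that each flag is defined as an octal constant (the leading 0 again marks octal), which is why we can combine them with the | operator and pass the result to the open syscall. For example, on Linux O_WRONLY | O_CREAT | O_TRUNC is 01 | 0100 | 01000 = 01101.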

Sockets

Introduction

Sockets are endpoints for communication between two programs, usually running on different machines. They enable data exchange over a network using protocols like TCP or UDP and are fundamental in network programming, allowing applications to send and receive data.

The main difference between ordinary file descriptors and socket descriptors lies in how they are bound. When you call open() on a file, the descriptor you get back is already tied to that specific file or device. When you create a socket, you also receive a file descriptor, but it is not yet associated with any address or port; a server must bind it to one explicitly with the bind() function.

Socket Creation

Socket creation is the first step in establishing network communication between two devices. The socket() call creates a communication endpoint and returns a file descriptor for it; the IP address and port are attached later, when a server calls bind() or a client calls connect(). This step is the same in both client-server and peer-to-peer communication models.

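#include <stdio.h>       // perror()
#include <stdlib.h>      // exit()
#include <sys/socket.h>  // socket(), AF_INET, SOCK_STREAM

// Create a socket
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
if (sockfd < 0) {
    perror("Error opening socket");
    exit(1);
}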

The call for creating a socket is shown above. socket() accepts three arguments and returns a new file descriptor, or -1 on error.

The first parameter is the address family (also called the domain). There are numerous families, including AF_INET, AF_INET6 and AF_BLUETOOTH. This parameter determines which kind of addresses the socket works with; for instance, AF_INET is used for IPv4 and AF_INET6 for IPv6.

The second parameter is the socket type. The two most common types are SOCK_STREAM and SOCK_DGRAM (others include SOCK_RAW and SOCK_SEQPACKET). The type defines a class of protocol, while the specific protocol is chosen by the third parameter. SOCK_STREAM provides an ordered, reliable, connection-based byte stream, whereas SOCK_DGRAM provides connectionless, unreliable messages (datagrams) that may arrive out of order or not at all. Note that SOCK_DGRAM does not mean UDP exclusively; UDP is just one of several datagram protocols.

The final parameter is the protocol, which selects a particular protocol within the chosen type. For instance, with SOCK_DGRAM we could use UDP or UDP-Lite (or, with AF_INET6, ICMPv6). Passing 0 selects the default protocol for the family and type, which is TCP for AF_INET with SOCK_STREAM and UDP for AF_INET with SOCK_DGRAM. You cannot combine SOCK_STREAM with a datagram protocol such as UDP or UDP-Lite, and vice versa.

Honestly, protocol numbers are hard to remember and easy to mix up. To avoid hard-coding them, we can use getprotobyname() from <netdb.h>. It takes a protocol name such as "tcp" and returns a pointer to a struct protoent, or NULL if the name is not found in the protocols database (usually /etc/protocols). The protocol number is then available as proto->p_proto.

struct protoent *proto = getprotobyname("tcp");   // declared in <netdb.h>
if (proto == NULL) {
    fprintf(stderr, "Unknown protocol: tcp\n");
    exit(1);
}
sockfd = socket(AF_INET, SOCK_STREAM, proto->p_proto);

Binding to an Address and Port

Binding assigns a specific local network address and port number to a socket. A server must bind before it can listen for incoming connections, so that clients know where to reach it. It is a fundamental step in network programming.

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

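To call bind(), we supply our socket file descriptor, a pointer to an address structure, and the size of that structure. bind() takes the generic struct sockaddr, so for IPv4 we fill in a struct sockaddr_in and cast its address to (struct sockaddr *). Let's take a closer look at the sockaddr_in structure.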

struct sockaddr_in {
    sa_family_t    sin_family;  // Always AF_INET
    in_port_t      sin_port;    // Port number (must use htons())
    struct in_addr sin_addr;    // IP address
    char           sin_zero[8]; // Padding (unused, just fill with 0s)
};

Let's proceed to construct a sockaddr_in structure.

struct sockaddr_in server_addr;

A local variable like this is not initialized, so it may contain garbage values; we reset every byte to 0 with the memset function. It takes three arguments: a pointer to the memory, the byte value to write, and the number of bytes.

memset(&server_addr, 0, sizeof(server_addr));

Let's configure the server address to bind to.

server_addr.sin_family = AF_INET;
server_addr.sin_port = htons(8080);
server_addr.sin_addr.s_addr = inet_addr("0.0.0.0");

sin_family must be set to AF_INET because sockaddr_in is the address structure for IPv4, so no other value is valid here. IPv6 uses struct sockaddr_in6 with AF_INET6 instead.

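sin_port can be set to any free port, but avoid well-known ports such as 22 (SSH). On Linux, binding to a port below 1024 also requires root privileges (or the CAP_NET_BIND_SERVICE capability).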

htons converts a 16-bit unsigned short from host byte order to network byte order, so that the value means the same thing on every system that reads it. sin_port must be stored in network byte order: on a little-endian machine, writing 8080 without htons() would bind to port 36895 (0x901F) instead.

htons -> Host to Network Short

To understand why this is necessary, consider that different CPUs store multi-byte integers in different orders: x86, for example, is little-endian, while network protocols define their header fields as big-endian, known as network byte order. htons() bridges the two; on a big-endian host it leaves the value unchanged.

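Finally, we set sin_addr through its field s_addr using inet_addr. The inet_addr function takes an IPv4 address string in dotted-decimal notation (e.g. "127.0.0.1") and returns it in network byte order, or INADDR_NONE if the string is invalid. Because INADDR_NONE is also the valid address 255.255.255.255, inet_pton() is generally preferred in new code.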

We can also set sin_addr.s_addr to INADDR_ANY, which lets the server accept connections on any available network interface. INADDR_ANY is 0, so byte order makes no practical difference, but htonl(INADDR_ANY) is the conventional form.

IANA Port Number Ranges

There are three ranges of port numbers defined by the Internet Assigned Numbers Authority (IANA):

Well-Known Ports (0-1023)
- Reserved for standard services such as HTTP (80), SSH (22), and DNS (53).
- Avoid binding your own sockets to ports in this range; on Linux it also requires root privileges.
Registered Ports (1024-49151)
- Registered with IANA for specific applications, but generally fine to use for your own servers (e.g. 8080).
Dynamic/Ephemeral Ports (49152-65535)
- Used by the operating system as temporary ports for the client side of connections (Linux uses 32768-60999 by default instead).

Listening on a Socket

The listen() function marks a socket as passive, meaning it will be used to accept incoming connection requests rather than to initiate connections. It is called on the server-side socket after bind(). Its backlog parameter sets the maximum number of pending connections that can wait in the queue; when the queue is full, new connection attempts may be refused or ignored until space frees up.

if (listen(sockfd, 5) < 0) { perror("listen"); exit(1); }

The first parameter is the socket file descriptor, and the second is the backlog, i.e. the size of the pending-connection queue.

Returns 0 on success.
Returns -1 on failure and sets errno.

Accepting a Connection

Accepting a connection involves using the accept() function, which is typically called on a socket that is already bound to a specific port and is listening for incoming connections. This function returns a new socket file descriptor that can be used to communicate with the connected client. The accept() function is usually used in conjunction with bind() and listen() functions to set up the server socket.

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

To handle an incoming connection, pass the listening socket's file descriptor to accept(). If you want to know who connected, also pass a pointer to a struct sockaddr_in (cast to struct sockaddr *) and a pointer to a socklen_t holding its size; accept() fills in the client's address and updates the length. Both can be NULL if you do not need the client's address.

struct sockaddr_in client_addr; 
socklen_t client_len = sizeof(client_addr);


int newsockfd = accept(sockfd, (struct sockaddr *)&client_addr, &client_len);
if (newsockfd < 0) {
    perror("accept");
    close(sockfd);
    return 1;
}

Note that accept() is a blocking call: it waits until a client connects and then returns a new descriptor for that connection. The listening socket stays open and can accept further connections.

printf("Accepted connection from %s:%d\n",
        inet_ntoa(client_addr.sin_addr), ntohs(client_addr.sin_port));

const char *response = "HTTP/1.1 200 OK\r\n"
                        "Content-Type: text/plain\r\n"
                        "Content-Length: 13\r\n"
                        "\r\n"
                        "Hello, World!";
write(newsockfd, response, strlen(response));
close(newsockfd);

Upon accepting a connection, the client's IP address and port are stored in the sockaddr_in structure that we provided. We read them from there, using inet_ntoa() and ntohs() to convert them to a printable form.

Next, we create an HTTP/1.1 response to send back to the client. For now, we manually send a plain "Hello, World!" body. Don't worry about the protocol details; HTTP is text-based, so you can follow what's happening just by reading the response: a status line, headers (each ending in \r\n), an empty line, and then the body. Content-Length is 13 because "Hello, World!" is 13 bytes long.

We send the response with write(), exactly as we would write to any other file descriptor: we pass the descriptor, the buffer, and the number of bytes to write.

Finally, close(newsockfd) closes the connection to this client.

`curl` and Check

Let's now utilize the curl command to establish a connection with the server.

razor@beast:~$ curl localhost:8080
Hello, World!

There it is: the "Hello, World!" text printed in the terminal. We've just created a basic HTTP server in C.

Accepting Connections with a `while` Loop

You may have noticed that if you run curl a second time, the connection fails because the server has already exited. accept() handles exactly one connection: once that connection has been served, the program reaches the end of main() and terminates. To keep serving clients, wrap the entire accept-and-respond code in an infinite while loop.

while(1){
    struct sockaddr_in client_addr; 
    socklen_t client_len = sizeof(client_addr);


    int newsockfd = accept(sockfd, (struct sockaddr *)&client_addr, &client_len);
    if (newsockfd < 0) {
        perror("accept");
        close(sockfd);
        return 1;
    }

    printf("Accepted connection from %s:%d\n",
           inet_ntoa(client_addr.sin_addr), ntohs(client_addr.sin_port));

    const char *response = "HTTP/1.1 200 OK\r\n"
                           "Content-Type: text/plain\r\n"
                           "Content-Length: 13\r\n"
                           "\r\n"
                           "Hello, World!";
    write(newsockfd, response, strlen(response));
    close(newsockfd);

}

Inside the loop, accept() blocks until a connection request arrives. The code after it then sends the response and closes the connection. Because of the infinite while loop, accept() is called again, and the process blocks once more, waiting for the next client.


Praful M
