Introduction to TCP (Transmission Control Protocol)

TCP is a connection-oriented, reliable, byte-stream transport protocol that provides:
Reliability: Guarantees delivery of data without errors, duplicates, or out-of-order packets
Flow Control: Prevents sender from overwhelming receiver
Congestion Control: Adapts the sending rate to avoid overloading the network
Full-Duplex Communication: Bidirectional data flow
Connection Management: Explicit connection setup and teardown
Ordered Delivery: Data arrives in the same order it was sent
TCP Header Size and Payload
Variable Header Length
Header Size: 20-60 bytes
Minimum: 20 bytes (fixed fields only)
Maximum: 60 bytes (20 bytes fixed + up to 40 bytes of options)
Why variable?: TCP options field allows additional functionality like:
Window scaling
Selective acknowledgments (SACK)
Timestamps
Maximum Segment Size negotiation
Maximum Segment Size (MSS)
Definition: The largest amount of data (payload only, excluding headers) that can be sent in a single TCP segment
Typical MSS: 1460 bytes for Ethernet networks
Who dictates it?:
Each host announces its MSS during TCP handshake
Calculated as: MTU - IP header - TCP header
Each sender limits its segments to the MSS announced by its peer, so the smaller of the two values typically bounds the connection
Path MTU Discovery can further reduce it if intermediate networks have smaller MTUs
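The MSS arithmetic above can be sketched in a few lines. This is a minimal illustration that assumes IPv4 and TCP headers without options (20 bytes each); the function names are hypothetical.

```python
IP_HEADER_MIN = 20   # bytes, IPv4 header without options (assumption)
TCP_HEADER_MIN = 20  # bytes, TCP header without options (assumption)


def mss_for_mtu(mtu):
    """MSS a host would announce for a link with the given MTU."""
    return mtu - IP_HEADER_MIN - TCP_HEADER_MIN


def effective_mss(local_mtu, peer_mtu):
    """Smaller of the two announced MSS values."""
    return min(mss_for_mtu(local_mtu), mss_for_mtu(peer_mtu))


print(mss_for_mtu(1500))          # 1460
print(effective_mss(1500, 1400))  # 1360
```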
Maximum Sizes - Different Scenarios
1. Theoretical Maximum (No MTU constraints)
Max IP packet size: 65,535 bytes (limited by the 16-bit Total Length field in the IPv4 header)
Max TCP payload: 65,495 bytes (65,535 - 20 byte TCP header - 20 byte IP header)
2. With Standard Ethernet MTU (1500 bytes)
MTU: 1500 bytes
Largest IP packet (IP header + TCP header + data): 1500 bytes
Max TCP payload: 1460 bytes (1500 - 20 IP - 20 TCP)
With the full 40 bytes of TCP options: 1420 bytes payload (1500 - 20 IP - 60 TCP)
3. Size Breakdown Summary
Ethernet frame: 1518 bytes (including 14-byte Ethernet header + 4-byte FCS)
IP packet within frame: 1500 bytes (MTU)
TCP segment within IP: 1480 bytes (1500 - 20 IP header)
TCP payload: 1460 bytes (1480 - 20 TCP header)
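The size breakdown can be reproduced with simple subtraction. The sketch below assumes a 20-byte IPv4 header and 18 bytes of Ethernet overhead (14-byte header plus 4-byte FCS); the function name and parameters are illustrative only.

```python
def size_breakdown(mtu=1500, ip_header=20, tcp_header=20, eth_overhead=18):
    """Return the byte count at each layer for one full-sized segment."""
    tcp_segment = mtu - ip_header            # TCP header + payload
    return {
        "ethernet_frame": mtu + eth_overhead,
        "ip_packet": mtu,
        "tcp_segment": tcp_segment,
        "tcp_payload": tcp_segment - tcp_header,
    }


print(size_breakdown())                      # standard Ethernet, no TCP options
print(size_breakdown(tcp_header=60))         # full 40 bytes of TCP options
print(size_breakdown(mtu=65535)["tcp_payload"])  # theoretical maximum: 65495
```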
TCP Header Fields (20 bytes minimum)
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0   |          Source Port          |       Destination Port        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
32  |                        Sequence Number                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
64  |                     Acknowledgment Number                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
96  |  Data |           |U|A|P|R|S|F|                               |
    | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
    |       |           |G|K|H|T|N|N|                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
128 |           Checksum            |        Urgent Pointer         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
160 |                     Options (0-40 bytes)                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             Data                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Field Details:
Source Port (16 bits): Sending application's port number
Destination Port (16 bits): Receiving application's port number
Sequence Number (32 bits): Position of first data byte in this segment
Acknowledgment Number (32 bits): Next expected sequence number
Data Offset (4 bits): TCP header length in 32-bit words
Reserved (6 bits): Set to zero
Control Flags (6 bits):
URG: Urgent pointer field significant
ACK: Acknowledgment field significant
PSH: Push function
RST: Reset connection
SYN: Synchronize sequence numbers
FIN: No more data from sender
Window (16 bits): Flow control - bytes receiver willing to accept
Checksum (16 bits): Error detection for header and data
Urgent Pointer (16 bits): Points to urgent data
Options (0-40 bytes): Additional TCP options
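To make the field layout concrete, here is a minimal sketch that unpacks the 20 fixed header bytes with Python's struct module. The TCPHeader type and parse_tcp_header function are hypothetical names, and the sample segment is hand-built rather than captured from a network.

```python
import struct
from typing import NamedTuple


class TCPHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    data_offset: int  # header length in 32-bit words
    flags: int        # low 6 bits: URG ACK PSH RST SYN FIN
    window: int
    checksum: int
    urgent_ptr: int


def parse_tcp_header(segment: bytes) -> TCPHeader:
    """Unpack the fixed 20-byte part of a TCP header (network byte order)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    src, dst, seq, ack, off_flags, window, checksum, urg = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    return TCPHeader(src, dst, seq, ack, off_flags >> 12, off_flags & 0x3F,
                     window, checksum, urg)


# Hand-built SYN segment: data offset 5 words, SYN flag (0x02) set.
sample = struct.pack("!HHIIHHHH", 54321, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
header = parse_tcp_header(sample)
print(header.src_port, header.dst_port, header.data_offset * 4, hex(header.flags))
```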
Port Identification Fields
The Source Port and Destination Port fields, each consuming 16 bits, form the fundamental addressing mechanism that allows TCP to multiplex multiple connections over a single IP address. The source port identifies the sending application's socket, while the destination port specifies where the segment should be delivered on the receiving host. Together with the source and destination IP addresses from the IP header, these ports create a unique four-tuple that identifies each TCP connection globally.
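One way to picture the four-tuple is as a dictionary key in the receiving host's connection table. The sketch below is purely illustrative: the table, function names, labels, and the documentation-range IP addresses are all assumptions, not how any particular TCP stack stores its state.

```python
connections = {}  # (src_ip, src_port, dst_ip, dst_port) -> connection label


def open_connection(src_ip, src_port, dst_ip, dst_port, label):
    connections[(src_ip, src_port, dst_ip, dst_port)] = label


def demultiplex(src_ip, src_port, dst_ip, dst_port):
    """Find the connection an incoming segment belongs to, or None."""
    return connections.get((src_ip, src_port, dst_ip, dst_port))


# Two connections from the same client to the same server port stay distinct
# because their source ports differ.
open_connection("192.0.2.10", 50000, "198.51.100.5", 80, "tab-1")
open_connection("192.0.2.10", 50001, "198.51.100.5", 80, "tab-2")

print(demultiplex("192.0.2.10", 50001, "198.51.100.5", 80))  # tab-2
print(demultiplex("192.0.2.10", 50002, "198.51.100.5", 80))  # None
```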
Sequence and Acknowledgment Numbers
The Sequence Number field, spanning 32 bits, serves as TCP's primary mechanism for ensuring ordered delivery and detecting lost segments. This field contains the sequence number of the first data byte in the current segment, with each byte of data transmitted assigned a unique sequence number. During connection establishment, each side chooses an Initial Sequence Number (ISN) randomly to prevent sequence number prediction attacks and avoid confusion with segments from previous connections.
The Acknowledgment Number field, also 32 bits, implements TCP's reliability mechanism by indicating the next sequence number the receiver expects. This cumulative acknowledgment approach means that acknowledging sequence number X confirms receipt of all bytes up to X-1. The field remains meaningful only when the ACK flag is set, creating an efficient piggybacking mechanism where data segments can simultaneously acknowledge received data.
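Because both fields are 32 bits wide, sequence arithmetic wraps around at 2^32. The following sketch shows how the next expected sequence number and a wrap-safe "comes before" comparison might be computed; the function names are hypothetical.

```python
MOD = 2 ** 32    # sequence numbers live in a 32-bit space
HALF = 2 ** 31


def next_expected(seq, payload_len):
    """ACK number a receiver would send after accepting this segment."""
    return (seq + payload_len) % MOD


def seq_before(a, b):
    """True if sequence number a precedes b, allowing for wraparound."""
    return (a - b) % MOD >= HALF


print(next_expected(1001, 5))          # 1006: bytes 1001..1005 are confirmed
print(next_expected(0xFFFFFFFE, 5))    # 3: the counter wrapped past 2**32
print(seq_before(0xFFFFFFF0, 0x10))    # True: 0x10 lies just after the wrap
```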
Header Length and Reserved Fields
The Data Offset field, despite using only 4 bits, plays a crucial role in TCP's extensibility. It specifies the TCP header length in 32-bit words, indicating where the header ends and data begins. With 4 bits, it can represent values from 0 to 15, which when multiplied by 4 bytes gives a range of 0 to 60 bytes. Since the minimum header size is 20 bytes (5 words), and the maximum is 60 bytes (15 words), this field efficiently encodes all possible header lengths while allowing for up to 40 bytes of options.
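The 6-bit Reserved field is set to zero and exists for future protocol extensions. Later standards have since assigned two of these bits to the ECN flags CWR and ECE (RFC 3168), leaving four reserved bits in current implementations.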
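As a small worked example of the Data Offset encoding, the sketch below takes the header byte that carries the field in its upper four bits (byte 12 of the header) and converts it to a length in bytes. The function names are illustrative.

```python
def header_length(offset_byte):
    """Header length in bytes from the byte holding Data Offset in its top 4 bits."""
    words = offset_byte >> 4
    if words < 5:
        raise ValueError("Data Offset below 5 words is not a valid TCP header")
    return words * 4


def options_length(offset_byte):
    """Bytes of options that follow the 20 fixed header bytes."""
    return header_length(offset_byte) - 20


print(header_length(0x50), options_length(0x50))  # 20 0
print(header_length(0x80), options_length(0x80))  # 32 12
print(header_length(0xF0), options_length(0xF0))  # 60 40
```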
Control Flags: The Command Structure
The six control flags, each occupying one bit, encode TCP's control commands and state information in an extremely compact format. Each flag fundamentally alters how the segment is processed and what state transitions occur.
The URG (Urgent) flag signals that the Urgent Pointer field contains valid data, indicating high-priority information that should bypass normal stream processing. This mechanism allows applications to send out-of-band control information, though it's rarely used in modern applications due to implementation inconsistencies and security concerns.
The ACK flag indicates that the Acknowledgment Number field is valid. After the initial SYN segment, virtually every TCP segment has this flag set, as TCP uses every opportunity to acknowledge received data. This piggybacking of acknowledgments on data segments significantly improves protocol efficiency.
The PSH (Push) flag requests immediate delivery of data to the receiving application without waiting for additional segments to fill the buffer. This flag helps reduce latency for interactive applications where small amounts of data need immediate processing, such as terminal sessions or real-time protocols.
The RST (Reset) flag forcibly terminates a connection, typically in response to error conditions like receiving segments for non-existent connections or detecting protocol violations. RST provides an immediate, non-graceful termination mechanism that bypasses the normal closing handshake.
The SYN (Synchronize) flag initiates connections and synchronizes sequence numbers. It appears only in the first segment from each side during connection establishment. The presence of SYN triggers special processing rules and state transitions in the TCP state machine.
The FIN (Finish) flag indicates the sender has no more data to transmit, initiating the graceful connection termination sequence. Unlike RST, FIN allows both sides to finish sending any remaining data before fully closing the connection.
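Since each flag is a single bit, decoding them is a matter of masking. Here is a minimal sketch using the bit positions of the six classic flags; the FLAG_BITS table and decode_flags name are hypothetical.

```python
# Bit values of the six flags within the low bits of the flags field.
FLAG_BITS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def decode_flags(flags):
    """Return the names of all flags set in the given flags value."""
    return [name for name, bit in FLAG_BITS.items() if flags & bit]


print(decode_flags(0x02))  # ['SYN']         first handshake segment
print(decode_flags(0x12))  # ['SYN', 'ACK']  second handshake segment
print(decode_flags(0x18))  # ['PSH', 'ACK']  typical data segment
print(decode_flags(0x11))  # ['FIN', 'ACK']  start of graceful close
```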
Flow Control and Error Detection
The Window field implements TCP's flow control mechanism through a 16-bit value advertising how many bytes the receiver can accept beyond the acknowledged sequence number. This sliding window protocol prevents fast senders from overwhelming slow receivers. The 16-bit limit (65,535 bytes) became constraining on high-speed networks, leading to the Window Scale option that effectively extends this field to 30 bits.
Window values dynamically adjust based on available buffer space and processing capacity. A window of zero stops the sender entirely, creating a natural backpressure mechanism. The receiver can later send a window update to resume transmission, though TCP includes persist timer mechanisms to probe for window updates in case they're lost.
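The effect of the Window Scale option can be illustrated with a left shift. This sketch assumes the maximum shift count of 14 that the option allows, which is where the 30-bit figure comes from; the function name is illustrative.

```python
MAX_SHIFT = 14  # largest shift count the Window Scale option allows


def effective_window(window_field, shift=0):
    """Receive window in bytes after applying the negotiated scale factor."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the Window field is 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift


print(effective_window(65535))                    # 65535 bytes without scaling
print(effective_window(65535, 7))                 # 8388480 bytes
print(effective_window(65535, 14))                # 1073725440 bytes, about 1 GiB
print(effective_window(65535, 14).bit_length())   # 30
```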
The Checksum field provides error detection covering both the TCP header and data payload. This 16-bit one's complement sum also covers a pseudo-header built from IP-layer fields (source and destination addresses, protocol number, and TCP length), ensuring segments aren't delivered to the wrong destination due to IP header corruption. While relatively weak by modern standards, the checksum catches most common transmission errors. TCP requires this checksum, unlike UDP over IPv4 where it is optional, reflecting TCP's commitment to reliability.
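The checksum calculation can be sketched as follows for IPv4. The helper names and the sample addresses (from the documentation range) are assumptions; the segment passed in must have its checksum field zeroed before the value is computed.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Add 16-bit big-endian words, folding carries back into the low 16 bits."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP header and payload."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(segment)))  # zero, protocol 6, length
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


# Hand-built PSH+ACK segment with the checksum field (bytes 16-17) set to zero.
header = struct.pack("!HHIIHHHH", 54321, 80, 1001, 2001, (5 << 12) | 0x18, 65535, 0, 0)
segment = header + b"Hello"
csum = tcp_checksum("192.0.2.1", "192.0.2.2", segment)

# A receiver recomputes over the segment including the checksum; 0 means it verifies.
filled = segment[:16] + struct.pack("!H", csum) + segment[18:]
print(tcp_checksum("192.0.2.1", "192.0.2.2", filled))  # 0
```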
Urgent Data Mechanism
The Urgent Pointer field, meaningful only when URG is set, holds an offset from the segment's sequence number that marks where the urgent data ends. This mechanism was designed to handle interrupt-type commands that need immediate attention, bypassing normal stream processing. The receiving TCP stack notifies the application about urgent data through special APIs, allowing it to process control commands even when the normal data stream is blocked.
TCP Options: Extending the Protocol
The Options field demonstrates TCP's extensible design, allowing new features without changing the core header format. Options use a Type-Length-Value (TLV) encoding, where each option specifies its type and length, enabling receivers to skip unrecognized options. This forward compatibility has allowed TCP to evolve significantly since its inception.
Common options include Maximum Segment Size (MSS) negotiation during connection establishment, Window Scaling to support larger windows on high-speed networks, Timestamps for round-trip time measurement and sequence number wraparound protection, and Selective Acknowledgment (SACK) for efficient retransmission of multiple lost segments.
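A rough sketch of walking the options area is shown below. It relies on the standard option kinds (0 = End of Option List and 1 = No-Operation, which are single bytes without a length; 2 = MSS; 3 = Window Scale; 4 = SACK Permitted; 8 = Timestamps). The parse_options name and the sample bytes are illustrative.

```python
def parse_options(options: bytes):
    """Return a list of (kind, value_bytes) pairs from a TCP options area."""
    parsed = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                # End of Option List
            break
        if kind == 1:                # No-Operation, used as padding
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option")
        length = options[i + 1]      # covers the kind and length bytes too
        if length < 2 or i + length > len(options):
            raise ValueError("bad option length")
        parsed.append((kind, options[i + 2:i + length]))
        i += length
    return parsed


# Options as they might appear on a SYN: MSS 1460, SACK permitted,
# timestamps, one NOP for alignment, window scale shift 7.
syn_options = (bytes([2, 4, 0x05, 0xB4]) + bytes([4, 2])
               + bytes([8, 10]) + (12345).to_bytes(4, "big") + bytes(4)
               + bytes([1]) + bytes([3, 3, 7]))

for kind, value in parse_options(syn_options):
    print(kind, value.hex())
```

Unknown kinds are skipped safely because the length byte says how far to advance, which is the forward-compatibility property described above.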
TCP and Go-Back-N ARQ Relationship
While TCP doesn't use pure Go-Back-N, it incorporates similar concepts:
Similarities:
Sequence Numbers: Each byte has a sequence number
Acknowledgments: Receiver sends ACKs for received data
Retransmission: Lost segments are retransmitted
Window-based Flow Control: Limits outstanding unacknowledged data
Key Differences:
Selective Acknowledgment (SACK): TCP can acknowledge out-of-order segments
Cumulative ACKs: The ACK number names the next expected byte, rather than the last frame received as in textbook Go-Back-N
Adaptive Retransmission: TCP uses RTT estimation for timeout values
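The adaptive retransmission point can be illustrated with the standard smoothed-RTT calculation (the one specified in RFC 6298, using gains of 1/8 and 1/4 and a 1-second floor). This is a simplified sketch: it ignores clock granularity and timer backoff, and the class name is hypothetical.

```python
class RtoEstimator:
    """Tracks smoothed RTT and RTT variance to derive a retransmission timeout."""

    ALPHA = 1 / 8    # gain for the smoothed RTT
    BETA = 1 / 4     # gain for the RTT variance
    MIN_RTO = 1.0    # seconds

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def update(self, rtt):
        """Feed one RTT sample (seconds) and return the new timeout."""
        if self.srtt is None:        # first measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return max(self.MIN_RTO, self.srtt + 4 * self.rttvar)


est = RtoEstimator()
print(est.update(2.0))  # 6.0  (SRTT 2.0, RTTVAR 1.0)
print(est.update(2.0))  # 5.0  (variance shrinks as samples agree)
```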
Connection Establishment
Client                               Server
|                                         |
|               SYN, Seq=X                |
|---------------------------------------->|
|                                         |
|         SYN+ACK, Seq=Y, Ack=X+1         |
|<----------------------------------------|
|                                         |
|          ACK, Seq=X+1, Ack=Y+1          |
|---------------------------------------->|
|                                         |
|         CONNECTION ESTABLISHED          |
Step-by-Step Process:
Client → Server: SYN segment
SYN flag = 1
Sequence number = X (randomly chosen)
ACK flag = 0
Server → Client: SYN+ACK segment
SYN flag = 1, ACK flag = 1
Sequence number = Y (server's initial sequence)
Acknowledgment number = X + 1
Client → Server: ACK segment (can contain data!)
ACK flag = 1
Sequence number = X + 1
Acknowledgment number = Y + 1
This ACK can carry application data
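The sequence and acknowledgment numbers in the three steps follow directly from the two initial sequence numbers. Here is a minimal sketch that computes them; the function name and the dictionary layout are illustrative only.

```python
MOD = 2 ** 32


def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as dictionaries."""
    return [
        {"from": "client", "flags": "SYN", "seq": client_isn, "ack": None},
        {"from": "server", "flags": "SYN+ACK", "seq": server_isn,
         "ack": (client_isn + 1) % MOD},
        {"from": "client", "flags": "ACK", "seq": (client_isn + 1) % MOD,
         "ack": (server_isn + 1) % MOD},
    ]


for segment in three_way_handshake(1000, 2000):
    print(segment)
```

The SYN consumes one sequence number even though it carries no data, which is why each side acknowledges ISN + 1.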
Data Transfer
Client (Initial Seq = 1000)     Server (Initial Seq = 2000)
        |                                         |
        |    Seq=1001, Ack=2001, Data="Hello"     |
        |---------------------------------------->|
        |                                         |
        |           Seq=2001, Ack=1006            |
        |<----------------------------------------|
        |                                         |
        |    Seq=1006, Ack=2001, Data="World!"    |
        |---------------------------------------->|
        |                                         |
        |           Seq=2001, Ack=1012            |
        |<----------------------------------------|
Key Points:
Sequence numbers increment by the number of data bytes sent
ACK numbers indicate the next expected sequence number
Each direction maintains its own sequence number space
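The exchange above can be replayed with a tiny model of each endpoint's counters. This is a sketch under simplifying assumptions: in-order delivery, no loss, and no 32-bit wraparound; the Endpoint class is hypothetical.

```python
class Endpoint:
    def __init__(self, isn, peer_isn):
        self.snd_nxt = isn + 1        # the SYN consumed one sequence number
        self.rcv_nxt = peer_isn + 1   # next byte expected from the peer

    def send(self, data=b""):
        segment = {"seq": self.snd_nxt, "ack": self.rcv_nxt, "data": data}
        self.snd_nxt += len(data)     # advance by the number of bytes sent
        return segment

    def receive(self, segment):
        if segment["seq"] == self.rcv_nxt:        # in-order data only
            self.rcv_nxt += len(segment["data"])


client = Endpoint(isn=1000, peer_isn=2000)
server = Endpoint(isn=2000, peer_isn=1000)

trace = []
for payload in (b"Hello", b"World!"):
    data_segment = client.send(payload)
    server.receive(data_segment)
    ack_segment = server.send()       # pure ACK, carries no data
    client.receive(ack_segment)
    trace += [data_segment, ack_segment]

for segment in trace:
    print(segment["seq"], segment["ack"], segment["data"])
```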
TCP Connection States
Client (active open: the client initiates the connection) state diagram:
              CLOSED
                 |
        send SYN |
                 ↓
             SYN_SENT ──────────────────────┐
                 |                          |
    recv SYN+ACK |                          | recv RST
        send ACK |                          |
                 ↓                          |
            ESTABLISHED                     |
                 |                          |
        send FIN |                          |
                 ↓                          |
            FIN_WAIT_1 ─────────┐           |
                 |              | recv FIN  |
        recv ACK |              | send ACK  |
                 ↓              ↓           |
            FIN_WAIT_2       CLOSING        |
                 |              |           |
        recv FIN |              | recv ACK  |
        send ACK |              |           |
                 ↓              |           |
             TIME_WAIT ←────────┘           |
                 |                          |
         timeout |                          |
          (2MSL) |                          |
                 ↓                          |
              CLOSED ←──────────────────────┘
Server (passive open: the server waits for a client to initiate the connection) state diagram:
              CLOSED
                 |
              bind()
             listen()
                 ↓
              LISTEN ←──────────────────────┐
                 |                          |
        recv SYN |                          |
    send SYN+ACK |                          |
                 ↓                          |
             SYN_RCVD                       |
                 |                          |
        recv ACK |                          | recv RST
                 |                          |
                 ↓                          |
            ESTABLISHED                     |
                 |                          |
        recv FIN |                          |
        send ACK |                          |
                 ↓                          |
            CLOSE_WAIT                      |
                 |                          |
        send FIN |                          |
                 ↓                          |
             LAST_ACK                       |
                 |                          |
        recv ACK |                          |
                 ↓                          |
              CLOSED ───────────────────────┘
Connection Establishment - State Correspondence:
Step | Client State | Client Action | Server State | Server Action |
1 | CLOSED → SYN_SENT | Send SYN(seq=x) | LISTEN | Wait for connection |
2 | SYN_SENT | Wait for SYN+ACK | LISTEN → SYN_RCVD | Recv SYN, Send SYN+ACK(seq=y, ack=x+1) |
3 | SYN_SENT → ESTABLISHED | Recv SYN+ACK, Send ACK(seq=x+1, ack=y+1) | SYN_RCVD → ESTABLISHED | Recv ACK |
Data Transfer - State Correspondence:
Client State | Server State | Description |
ESTABLISHED | ESTABLISHED | Both can send/receive data bidirectionally |
ESTABLISHED | ESTABLISHED | Sequence numbers track data bytes |
ESTABLISHED | ESTABLISHED | ACKs confirm receipt of data |
Connection Termination - State Correspondence:
Step | Client State | Client Action | Server State | Server Action |
1 | ESTABLISHED → FIN_WAIT_1 | Send FIN(seq=x) | ESTABLISHED → CLOSE_WAIT | Recv FIN, Send ACK(ack=x+1) |
2 | FIN_WAIT_1 → FIN_WAIT_2 | Recv ACK | CLOSE_WAIT | Wait for application close() |
3 | FIN_WAIT_2 → TIME_WAIT | Recv FIN(seq=y), Send ACK(ack=y+1) | CLOSE_WAIT → LAST_ACK | Send FIN(seq=y) |
4 | TIME_WAIT → CLOSED | Wait 2MSL, then close | LAST_ACK → CLOSED | Recv ACK |
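The state correspondence tables can be encoded as a transition table and walked event by event. The sketch below covers only the transitions listed in the tables (no simultaneous close and no RST handling), and the event strings are hypothetical labels rather than any real API.

```python
TRANSITIONS = {
    # Active opener and active closer (the client in the tables)
    ("CLOSED", "send SYN"): "SYN_SENT",
    ("SYN_SENT", "recv SYN+ACK"): "ESTABLISHED",
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
    # Passive opener and passive closer (the server in the tables)
    ("LISTEN", "recv SYN"): "SYN_RCVD",
    ("SYN_RCVD", "recv ACK"): "ESTABLISHED",
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def walk(state, events):
    """Return the list of states visited; raises KeyError on an invalid event."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


print(walk("CLOSED", ["send SYN", "recv SYN+ACK", "send FIN",
                      "recv ACK", "recv FIN", "2MSL timeout"]))
print(walk("LISTEN", ["recv SYN", "recv ACK", "recv FIN", "send FIN", "recv ACK"]))
```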
Half-Close States and Their Purpose
The TCP protocol supports an elegant feature called half-close, which allows one direction of the connection to terminate while the other remains open. This asymmetric closure is crucial for scenarios where one side has finished sending data but still needs to receive remaining information from the peer.
When a client initiates closure by sending a FIN packet and receives an ACK, it enters the FIN_WAIT_2 state. In this state, the client can no longer send application data but remains capable of receiving data from the server. This design accommodates common patterns like HTTP responses where a client finishes its request but the server may still be generating and sending a large response.
Conversely, when a server receives a FIN from the client, it enters the CLOSE_WAIT state after acknowledging the FIN. The server remains in this state until its application explicitly calls close(). During CLOSE_WAIT, the server retains full sending capability, allowing it to transmit any remaining data, complete ongoing operations, or send final status information before initiating its own closure sequence.
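At the sockets API level, a half-close is typically requested with shutdown() rather than close(). The sketch below runs both ends over the loopback interface in a single process, assuming a loopback TCP stack is available; the helper names, payloads, and timeout are illustrative.

```python
import socket


def read_until_eof(sock):
    """Collect data until recv() returns b'', which signals the peer's FIN."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def half_close_demo():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick one
        listener.listen(1)
        client = socket.create_connection(listener.getsockname(), timeout=5)
        server, _ = listener.accept()
    with client, server:
        server.settimeout(5)
        client.sendall(b"request")
        client.shutdown(socket.SHUT_WR)          # send FIN; receiving still works
        request = read_until_eof(server)         # server sees end of input
        server.sendall(b"response to " + request)  # server can still send
        server.shutdown(socket.SHUT_WR)          # server finishes its direction
        reply = read_until_eof(client)
    return request, reply


print(half_close_demo())  # (b'request', b'response to request')
```

After shutdown(SHUT_WR) the client sits on the FIN_WAIT side of the exchange while the server, in CLOSE_WAIT, keeps sending until it is done.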
Special States and Edge Cases
The CLOSING state represents a relatively rare simultaneous close scenario where both endpoints send FIN packets before receiving the peer's FIN. This state exists because TCP must handle all possible timing scenarios gracefully. When both sides decide to close at nearly the same moment, they each send a FIN, creating a race condition. The CLOSING state ensures proper handling of this situation by requiring each side to acknowledge the other's FIN before proceeding to TIME_WAIT.
The TIME_WAIT state serves as TCP's final safeguard, maintaining the connection context for twice the Maximum Segment Lifetime (2MSL). This seemingly excessive wait period serves critical purposes: ensuring the final ACK reaches the peer, preventing old packets from contaminating new connections using the same port pairs, and allowing any delayed segments to expire harmlessly in the network.
The SYN_RCVD state includes an interesting property where it can transition back to LISTEN upon receiving a RST (reset) packet. This mechanism helps servers recover from half-open connections caused by client crashes or network issues, allowing the server to return to accepting new connections on that port.
The Foundation: CLOSED and LISTEN States
The CLOSED state represents the absence of any TCP connection context. It serves as both the initial state before any connection attempt and the final state after all connection resources have been released. This clean-slate approach ensures that each new connection begins without any residual state from previous connections, preventing confusion and ensuring predictable behavior.
The LISTEN state marks a socket as a passive acceptor of connections. When a server application binds to a port and calls listen(), the TCP stack creates a structure ready to process incoming SYN packets. This state allows a single server socket to spawn multiple connection instances, each handling a different client. The server remains in LISTEN state indefinitely, processing SYN packets and creating new connection contexts in SYN_RCVD state for each valid connection request.
Connection Establishment States
The SYN_SENT state captures the client's position after initiating a connection attempt. The client has transmitted its SYN packet containing its initial sequence number and now awaits the server's response. This waiting period is necessary because TCP requires both sides to acknowledge each other's sequence numbers before data transfer can begin. The state includes timeout mechanisms that trigger SYN retransmission or connection abandonment if no response arrives within reasonable time bounds.
SYN_RCVD represents the server's intermediate state during connection establishment. Upon receiving a client's SYN, the server responds with SYN+ACK and enters this state. The server must wait for the client's final ACK to complete the three-way handshake. This state is particularly vulnerable to SYN flood attacks, where malicious clients send many SYN packets without completing the handshake, exhausting server resources with half-open connections.
The Data Transfer State: ESTABLISHED
The ESTABLISHED state represents a fully functional TCP connection where both sides have successfully synchronized their sequence numbers and confirmed the connection parameters. In this state, data flows bidirectionally with full reliability guarantees. Both endpoints can send and receive data simultaneously, with the receiver's advertised window handling flow control and the sender's congestion window handling congestion control. The connection remains in this state until either side initiates closure, making it the primary operational state for most TCP connections.
Connection Termination Sequence
FIN_WAIT_1 marks the beginning of active close from one side. When an application calls close(), TCP sends a FIN packet and enters this state. The endpoint waits for acknowledgment of its FIN, which confirms the peer has seen the close request. The state serves as a synchronization point, ensuring the closing intention has been communicated before proceeding. Depending on what arrives first (an ACK or a FIN from the peer), the connection transitions to either FIN_WAIT_2 or CLOSING state.
FIN_WAIT_2 represents a half-closed state where the local endpoint has finished sending data and received acknowledgment, but the peer hasn't initiated its own closure. This state accommodates asymmetric close scenarios where one side finishes before the other. The endpoint remains receptive to incoming data, allowing the peer to complete any ongoing transfers or computations before closing its side of the connection.
CLOSE_WAIT mirrors FIN_WAIT_2 from the peer's perspective. After receiving and acknowledging a FIN, the endpoint enters CLOSE_WAIT and notifies the application about the peer's closure. The application might need time to finish processing, flush buffers, or save state before closing. During this time, the endpoint retains full sending capability, ensuring graceful shutdown without data loss.
The LAST_ACK state occurs after an endpoint in CLOSE_WAIT finally calls close() and sends its own FIN. The endpoint now waits for acknowledgment of this FIN, which will confirm both sides have completed their closure sequences. This state ensures reliable delivery of the final FIN packet before releasing connection resources.
Understanding TIME_WAIT and Maximum Segment Lifetime
The TIME_WAIT state embodies TCP's commitment to reliable communication even during connection termination. After sending the final ACK in the closure sequence, the endpoint maintains connection state for 2MSL before fully closing. This design prevents several potential problems that could arise from immediate connection termination.
Maximum Segment Lifetime (MSL) represents the longest time a TCP segment can survive in the network before being discarded by routers due to TTL expiration or other mechanisms. Traditional implementations used 2 minutes as MSL, though modern systems often use shorter values like 60 or 30 seconds, reflecting improvements in network reliability and routing efficiency.
The 2MSL duration in TIME_WAIT serves multiple critical functions. First, it ensures the final ACK has sufficient time to reach the peer. If this ACK is lost, the peer will retransmit its FIN, and the endpoint in TIME_WAIT must be ready to retransmit the ACK. Second, it prevents delayed packets from the old connection from being misinterpreted by a new connection using the same port pairs. Without this protection, a new connection could receive data intended for the previous connection, causing data corruption or security issues.
Consider a scenario without TIME_WAIT protection: A connection between client port 5000 and server port 80 terminates. Immediately, a new connection establishes using the same addresses and ports. A delayed packet from the first connection, perhaps queued in a congested router, finally arrives and gets delivered to the new connection. If its sequence number happens to fall within the new connection's receive window, the receiving TCP stack cannot distinguish this old packet from legitimate new data, potentially corrupting the application's data stream or causing protocol violations.
Connection Termination (Four-Way Handshake)
Client                               Server
|                                         |
|               FIN, Seq=X                |
|---------------------------------------->|
|                                         |
|              ACK, Ack=X+1               |
|<----------------------------------------|
|                                         |
|               FIN, Seq=Y                |
|<----------------------------------------|
|                                         |
|              ACK, Ack=Y+1               |
|---------------------------------------->|
|                                         |
|            CONNECTION CLOSED            |
Written by

Jyotiprakash Mishra