💣 The Fundamental Failures of Fibre Network Operators in Network Management 🤯

Ronald Bartels

Fibre network operators have become critical players in delivering high-speed internet to homes and businesses worldwide. However, despite the promise of blazing-fast connectivity, many operators struggle to manage their networks effectively, leading to congestion, latency, and frustrated customers. At the heart of these issues lie outdated backhaul capacities, oversubscription practices, and inadequate monitoring systems—all of which expose the fundamental failures in their operational and commercial strategies.

The Backhaul Bottleneck | A Legacy of 1Gbps Constraints

A significant number of fibre network operators rely on backhauls—the connections that link local networks to the broader internet or core infrastructure—with capacities as low as 1 gigabit per second (Gbps). This might seem sufficient when most customers subscribe to plans of 100 megabits per second (Mbps) or less. However, this assumption unravels during periods of high demand. For instance, when companies like Microsoft or Apple release large software updates, millions of devices simultaneously download gigabytes of data, driving traffic levels far beyond typical usage patterns.

The problem is exacerbated because these 1Gbps backhauls often serve not just a single site but an entire Optical Line Terminal (OLT) or headend, which can support hundreds or thousands of users. In broadband networks, the OLT acts as a central hub in a shared service model, meaning all connected customers collectively contribute to the traffic load. When this load exceeds the backhaul’s capacity, congestion occurs, resulting in packet loss, increased latency, and a degraded user experience.
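
To put rough numbers on this (the subscriber count below is an assumed figure, not taken from any particular operator), it takes only a handful of customers running flat-out at their plan speed to fill a 1Gbps backhaul that fronts an entire OLT:

```python
# Illustrative arithmetic only; the subscriber count is an assumed figure.
BACKHAUL_MBPS = 1000        # 1Gbps backhaul feeding the OLT
PLAN_MBPS = 100             # per-customer plan speed
SUBSCRIBERS_ON_OLT = 500    # hypothetical number of customers behind the OLT

saturating_users = BACKHAUL_MBPS / PLAN_MBPS          # customers needed to fill the link
concurrency = saturating_users / SUBSCRIBERS_ON_OLT   # as a share of the whole OLT

print(f"{saturating_users:.0f} customers at {PLAN_MBPS}Mbps saturate the backhaul")
print(f"That is just {concurrency:.0%} of the {SUBSCRIBERS_ON_OLT} subscribers it serves")
```

In other words, a software update that wakes up even a small fraction of those subscribers at the same moment is enough to tip the backhaul into congestion.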

Oversubscription | A Commercial Necessity with Technical Trade-offs

Why do operators design their networks with such limited backhaul capacity? The answer lies in oversubscription, a cornerstone of their commercial model. Oversubscription is the practice of selling more bandwidth to customers than the network can physically deliver at any given time, banking on the fact that not all users will demand peak capacity simultaneously. For example, an operator might provision a 1Gbps backhaul to serve 100 customers, each with a 100Mbps plan, assuming that typical usage will only require a fraction of the total subscribed bandwidth.
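
Worked through as a simple ratio (using the same illustrative figures as the paragraph above), that design sells ten times more bandwidth than the backhaul can physically carry:

```python
# Contention ratio for the 100-customer, 100Mbps, 1Gbps-backhaul example above.
customers = 100
plan_mbps = 100
backhaul_mbps = 1000

sold_mbps = customers * plan_mbps            # 10,000 Mbps of subscribed bandwidth
contention_ratio = sold_mbps / backhaul_mbps
fair_share_mbps = backhaul_mbps / customers  # what each customer gets if all are active

print(f"Oversubscription ratio: {contention_ratio:.0f}:1")
print(f"Fair share per customer when everyone is busy: {fair_share_mbps:.0f}Mbps")
```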

This approach allows operators to maximize revenue while minimizing infrastructure costs—a key driver of profitability in a highly competitive market. However, it leaves little margin for error. When unexpected traffic spikes occur, such as during major software updates or streaming events, the oversubscribed backhaul becomes a choke point. Congestion isn’t limited to the OLT level either; it can propagate further upstream if the backhaul itself connects to an oversubscribed higher-tier link, compounding the issue.

Broadband & Shared Services | A Double-Edged Sword

To understand why this congestion is so pervasive, it’s worth exploring how broadband networks function as shared services. In fibre-to-the-home (FTTH) deployments, technologies like Gigabit-capable Passive Optical Network (GPON) use an OLT to distribute bandwidth to multiple customers over a single feeder fibre, passively split close to the premises. This shared architecture is cost-effective, as it reduces the need for dedicated infrastructure per user. However, it also means that the available bandwidth—both at the OLT and its backhaul—is a finite resource split among all active users.

When traffic levels surge, the shared nature of the service amplifies the impact. If dozens of households on the same OLT begin downloading a 5GB update simultaneously, their collective demand can easily overwhelm a 1Gbps backhaul. The result? Slowdowns, buffering, and dropped connections—issues that customers attribute to poor service rather than the inherent limits of a shared system.
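
As a rough illustration (the household count below is assumed; the 5GB update comes from the scenario above), equal sharing of a saturated 1Gbps backhaul roughly quadruples every download time:

```python
# Rough model: N households equally share a saturated 1Gbps backhaul during an update.
# The household count is an assumption for illustration; real sharing is less tidy.
UPDATE_GB = 5
BACKHAUL_MBPS = 1000
PLAN_MBPS = 100
HOUSEHOLDS = 40

per_household_mbps = BACKHAUL_MBPS / HOUSEHOLDS   # 25Mbps each instead of the 100Mbps plan
update_megabits = UPDATE_GB * 8 * 1000            # 5GB is roughly 40,000 megabits

congested_minutes = update_megabits / per_household_mbps / 60
uncontended_minutes = update_megabits / PLAN_MBPS / 60

print(f"Each household gets ~{per_household_mbps:.0f}Mbps during the surge")
print(f"The update takes ~{congested_minutes:.0f} min instead of ~{uncontended_minutes:.0f} min")
```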

LAG | A Band-Aid for Capacity, Prone to Silent Failures

To mitigate backhaul limitations, operators often deploy Link Aggregation Groups (LAGs). A LAG combines multiple physical links—say, two or more 1Gbps circuits—into a single logical connection, effectively increasing capacity. For example, a two-member LAG could provide 2Gbps of throughput, offering a buffer against congestion. This technique is widely used to scale backhaul capacity without overhauling the underlying infrastructure.

However, LAGs introduce their own vulnerabilities. If one member link fails, the total capacity drops—e.g., from 2Gbps to 1Gbps in a two-member setup. Traffic that once flowed smoothly may now exceed the reduced capacity, leading to congestion. Worse, these failures often go unnoticed because operators rely on inadequate monitoring tools.
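
Here is a minimal sketch of why that silent failure hurts (the offered traffic level is an assumed figure): the same evening load that sat comfortably on the healthy LAG is physically more than a single surviving member can carry:

```python
# Illustrative only: a two-member LAG of 1Gbps links with an assumed evening load.
MEMBER_MBPS = 1000
OFFERED_LOAD_MBPS = 1400   # assumed steady traffic across the backhaul

def utilisation(active_members: int) -> float:
    """Offered load as a fraction of the LAG's remaining capacity."""
    return OFFERED_LOAD_MBPS / (active_members * MEMBER_MBPS)

print(f"Both members up: {utilisation(2):.0%} utilised")  # 70%, looks healthy
print(f"One member down: {utilisation(1):.0%} utilised")  # 140%, queues build and packets drop
```

In practice LAG hashing adds another wrinkle: traffic is balanced per flow, so even a healthy two-member LAG never gives any single download more than one member's 1Gbps, and an uneven hash can congest one member while the other sits half idle.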

The Monitoring Misstep | ICMP & Bit Rates vs. SNMP

Effective network management hinges on robust monitoring, yet many operators fall short here. A common practice is to use Internet Control Message Protocol (ICMP) pings and bit rate graphs to assess network health. An ICMP ping simply checks whether a device is reachable end to end, while bit rate graphs track the volume of data passing through an interface. On the surface, these metrics might suggest all is well: the backhaul responds to pings, and traffic appears to flow.

But this approach is deeply flawed. ICMP doesn’t reveal interface-level issues, such as a failed LAG member, because it only tests end-to-end connectivity, not the health of individual links. Similarly, bit rate monitoring shows aggregate throughput but won’t flag a drop in capacity if one link in a LAG goes down—traffic simply squeezes through the remaining links, masking the problem until congestion emerges.

A far superior method is Simple Network Management Protocol (SNMP), which allows operators to poll specific interface statuses and metrics. SNMP can detect if a LAG member is offline, report errors on individual ports, and provide granular insight into network performance. Yet, many operators skimp on implementing SNMP-based monitoring, whether due to cost, complexity, or a lack of expertise. The result? Silent failures persist, and customers bear the brunt of the fallout.
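
As a minimal sketch of what that looks like in practice (the device address, community string, and interface indexes are placeholders, and pysnmp is just one of several ways to issue the query; the exact import path can differ between pysnmp versions), polling IF-MIB::ifOperStatus for each LAG member immediately exposes a downed link that ping and bit rate graphs will happily hide:

```python
# Minimal SNMP poll of LAG member interfaces using pysnmp (pip install pysnmp).
# The device IP, community string, and ifIndex values are placeholders.
from pysnmp.hlapi import (
    getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
    ContextData, ObjectType, ObjectIdentity,
)

DEVICE = "192.0.2.1"                 # replace with the OLT/aggregation router address
COMMUNITY = "public"                 # read-only community string (placeholder)
LAG_MEMBER_IFINDEXES = [1001, 1002]  # ifIndex of each physical member (placeholder)

for ifindex in LAG_MEMBER_IFINDEXES:
    error_indication, error_status, _, var_binds = next(
        getCmd(
            SnmpEngine(),
            CommunityData(COMMUNITY),
            UdpTransportTarget((DEVICE, 161)),
            ContextData(),
            # 1.3.6.1.2.1.2.2.1.8 is IF-MIB::ifOperStatus (1 = up, 2 = down)
            ObjectType(ObjectIdentity(f"1.3.6.1.2.1.2.2.1.8.{ifindex}")),
        )
    )
    if error_indication or error_status:
        print(f"ifIndex {ifindex}: SNMP query failed ({error_indication or error_status})")
        continue
    status = int(var_binds[0][1])
    print(f"ifIndex {ifindex}: {'UP' if status == 1 else 'DOWN, LAG degraded!'}")
```

The same poll can be extended to counters such as ifInErrors and ifOutDiscards, which is exactly the per-port visibility that an end-to-end ping can never provide.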

The Customer Experience & Operator Denial

When backhauls congest or LAGs degrade, customers notice immediately—streaming stutters, downloads crawl, and online gaming becomes unplayable. Yet, when they contact the operator’s Network Operations Center (NOC), they’re often met with denial: “Our systems show no issues.” This disconnect stems from the operator’s reliance on rudimentary tools like ICMP and bit rate graphs, which fail to expose the root cause.

The fundamental failure here isn’t just technical—it’s systemic. Operators prioritize cost-cutting over resilience, oversubscribing networks to boost profits while neglecting the monitoring and management systems needed to ensure service quality. Support and service assurance processes crumble under this approach, leaving NOCs ill-equipped to diagnose or resolve issues promptly.

Wrap | A Call for Accountability & Investment

Fibre network operators tout the transformative power of their services, but their operational shortcomings undermine this promise. Backhauls stuck at 1Gbps, oversubscription as a business model, and monitoring that leans on ICMP pings instead of SNMP reveal a troubling pattern: a focus on short-term gains at the expense of long-term reliability. Until operators invest in higher-capacity backhauls, adopt robust network management systems, and prioritize customer experience over blind cost efficiency, these fundamental failures will persist. Customers deserve better—and it’s time operators deliver.

