Istio Service Mesh Deep(ish) Dive: Architecture, Traffic Control, Security and Observability

*I’m quite enjoying this “playful and whimsical” image created by ChatGPT…
This continues from my previous blog post “Getting Started with Istio Service Mesh”.
We’ll be diving a bit deeper, specifically into Istio's architecture, traffic management, security, and observability features.
Recap
Let’s recap what a service mesh is and what Istio can help us with.
Istio is a service mesh that acts as an infrastructure layer for managing networking, security, and traffic control across microservices. It enables these capabilities without requiring any changes to the application code. Instead, Istio abstracts this functionality into a dedicated control plane and data plane, allowing teams to implement repeatable, complex configurations such as zero-trust security and advanced traffic management, independent of the applications running in their Kubernetes cluster or clusters.
Architecture
So what makes up Istio?
The Data plane
The Envoy proxy is typically injected into each pod as a sidecar container. This sidecar essentially takes over all the networking for the pod, working very closely with the other containers: every network call in and out goes via the Istio-injected sidecar. Collectively, these proxies are known as the Data Plane.
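Sidecar injection is usually switched on per namespace. As a minimal sketch (assuming you want it in the default namespace), labelling the namespace tells Istio’s admission webhook to inject the Envoy sidecar into newly created pods:

```yaml
# Label the namespace so Istio auto-injects the Envoy sidecar
# into any pod created here (existing pods need a restart to pick it up).
apiVersion: v1
kind: Namespace
metadata:
  name: default
  labels:
    istio-injection: enabled
```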
The Control Plane
The Control Plane part of this architecture is powered by the Istiod deployment in the istio-system namespace, which provides service discovery, configuration management and certificate management.
Hopefully, this diagram helps
Istiod (shown with the Istio logo at the bottom) is the control centre, storing and distributing configurations.
Pods (e.g., App/Pod 1 and 2) contain both the application container and the Envoy sidecar proxy, injected at pod creation.
All ingress and egress traffic flows through the Envoy proxies, where traffic management, authentication, and security policies are enforced based on configurations propagated from the Control Plane (Istiod).
With these core components running and working together, we now have a proxy layer: the Envoy proxy containers running in the pods, injected at pod deployment. They can then handle traffic control features like failover and fault injection, as well as security and authentication features like enforcing security policies and access control.
You can start to imagine if we wanted to introduce this and add these capabilities to our application code it would start to look very different and get more complex very quickly.
Instead, Istio decouples these concerns from the application itself, allowing teams to focus on business logic while Istio handles network, security, and observability at scale.
Traffic Management
Now we’ve had a bit of a recap of what Istio is and why it might be a good idea to introduce a decoupled network layer to configure, intercept and mediate our mesh network, let’s deep dive into just some of the traffic management capabilities it offers.
Let’s start at Gateways, after all, we probably want to learn how we get traffic and requests from outside the Kubernetes cluster to our applications and how we can control and configure them to behave how we want.
When you first install and run Istio on a cluster you can start with a demo profile which will create an Ingress Gateway and an Egress Gateway. They are deployed as Kubernetes objects and essentially act as load balancers (I’m working with Google’s GKE, so that will create a Google Cloud Load Balancer) for incoming and outgoing network requests at the outer edges of the cluster.
Here’s mine running on a GKE cluster.
The load balancer service Istio creates in GKE is a network load balancer in Google Cloud, which forwards requests to the nodes in the GKE cluster.
This is created when Istio is installed with the default profile (I don’t have an istio-egressgateway). Using kubectl describe on the service in the istio-system namespace gives us some more info about the load balancer. The istio=ingressgateway label is useful for when we create our Ingress Gateway resource, which acts as a load balancer at the edge of our cluster.
I’ll create a Gateway object. This points to the Istio ingressgateway we have above, configuring the Istio Ingress Gateway and telling it where to send incoming traffic.
We’ll configure the hosts field with *. I added this as a wildcard so I can access it without a domain name, because I haven’t gotten around to setting up the DNS yet (It’s Friday, what are you gonna do?! :shrug).
If we wanted to access the ingress based on a domain, for example ferrishall.dev, I would add that to the hosts field in the YAML file and add the istio-ingressgateway external IP address as an A record in the domain’s DNS settings.
This is a super simple gateway code snippet I used to create the Gateway object. I’ll list all the sources that helped me learn and write this up at the bottom of this post.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - '*'
I was reading that Istio supports both its own Gateway API and the Kubernetes Gateway API, and that the Kubernetes Gateway API will eventually become the default, so I’ll probably have to re-learn and re-write this… Documentation
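For comparison, here’s a rough sketch of what the equivalent resource might look like with the Kubernetes Gateway API. This is an assumption-laden sketch rather than something I’ve deployed here; the gateway.networking.k8s.io CRDs have to be installed separately:

```yaml
# Kubernetes Gateway API equivalent of the Istio Gateway above;
# gatewayClassName: istio asks Istio to implement this Gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
  namespace: default
spec:
  gatewayClassName: istio
  listeners:
  - name: http
    port: 80
    protocol: HTTP
```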
I’ll spin up a hello-world application first so I have something to manage the traffic to.
The hello-world pod has 2 containers: 1 for the hello-world application and the 2nd for the Istio Envoy proxy sidecar.
Not shown is the hello-world container, but you can see some of the istio-proxy container info when you run kubectl describe pod hello-world-xxxx.
So I’ve got an Ingress load balancer, a gateway and an application… how do we tell Istio, when it gets traffic from the ingress load balancer, to route that request to hello-world? We need a Virtual Service.
Virtual Service
A Virtual Service is used to configure the actual routing rules to the backend services we want to send the traffic to, for example the hello-world service we just created.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: hello-world-vs
spec:
  hosts:
  - "*"
  gateways:
  - gateway
  http:
  - route:
    - destination:
        host: hello-world.default.svc.cluster.local
        port:
          number: 80
Again, I’m using the * in the hosts field because I haven’t set up DNS yet. We’re telling the virtual service that it is bound to the Gateway object called… gateway, and to please route requests to the backend service hello-world. I’m using its fully qualified domain name in the cluster to avoid any potential confusion with namespaces.
Destination Rules & Virtual Services
Let’s try something else because that seems like a lot of work to get a hello-world application working!
I’ve deployed a web frontend and a customers backend application onto my cluster.
Now say there is a customers deployment version v2 which we want to test. We want to route a percentage of the traffic to the v2-labelled pods to try out some new features, i.e. we want to perform some canary testing.
We can create a Destination Rule for our service and define two subsets representing the v1 and v2 versions. A Destination Rule defines a policy that is applied to traffic intended for a service after routing has taken place; rules can specify load balancing configuration like ROUND_ROBIN, connection pool settings, etc.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: customers-dr
spec:
  host: customers.default.svc.cluster.local
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
This Destination Rule is telling Istio there are 2 versions of the customers service, labelled version: v1 and version: v2, known as subsets. Both pod versions sit behind the same customers ClusterIP service.
So when I create my Virtual Service to direct traffic to the customers service, I can configure it with some route weighting options.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: customers-vs
spec:
  hosts:
  - 'customers.default.svc.cluster.local'
  http:
  - route:
    - destination:
        host: customers.default.svc.cluster.local
        port:
          number: 80
        subset: v1
      weight: 50
    - destination:
        host: customers.default.svc.cluster.local
        port:
          number: 80
        subset: v2
      weight: 50
So now Istio knows to send 50% of traffic to version: v1 and the other 50% to version: v2 of the pods behind the customers service.
We didn’t have to change any of the application code or Kubernetes deployment manifest files; it’s all configured and deployed with these Istio YAML manifests, decoupled from the application code and the Kubernetes manifests.
I’m just scratching the surface and there is a lot more that can be done, like matching traffic requests to a service based on request header content, traffic mirroring, etc.
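As a hedged sketch of that header matching idea: requests carrying a particular header go to the v2 subset, everything else falls through to v1 (the end-user header and its tester value are made-up examples):

```yaml
# Route requests with header "end-user: tester" to v2, all others to v1.
# Rules are evaluated in order; the last route acts as the default.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: customers-vs
spec:
  hosts:
  - 'customers.default.svc.cluster.local'
  http:
  - match:
    - headers:
        end-user:
          exact: tester   # hypothetical header value
    route:
    - destination:
        host: customers.default.svc.cluster.local
        subset: v2
  - route:
    - destination:
        host: customers.default.svc.cluster.local
        subset: v1
```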
I’m just trying to keep it simple and in context with what the people reading this will hopefully find helpful and relatable. When you get playing with it you’ll find your own advanced use cases and content!
Security
Istio also helps us apply security to our distributed applications using mutual TLS.
Here is an image taken from my Kiali dashboard, demonstrating that I can access the customers service directly and also via the gateway IP address, which is managed by Istio.
Peer Authentication
What I’ve done here is reset some things: I deployed the web-frontend application without the Istio proxy injection, and deployed the customers application, which does have the Istio proxy injected. I also updated the virtual service to send traffic to the customers application via the istio-ingressgateway load balancer IP.
The unknown entry in the dashboard is the web-frontend application. Because it doesn’t have the Istio proxy injected, Istio doesn’t know what or where it is and, importantly, its traffic doesn’t get the security lock symbol: it’s not protected with mTLS.
Proxies send plain text traffic between services that do not have the sidecar injected.
I updated the virtual service for customers to point to the gateway and sent a curl command with the customers service URI in the header. The Gateway object has an Istio proxy sidecar injected, so it will send the request traffic using mTLS.
(In a nutshell, mutual TLS is where both the client and the server have to authenticate and verify each other’s identity; with regular TLS it’s one-way, where only the server’s identity is authenticated by the client. This article on mTLS explains the differences nicely.)
So how can we enforce mTLS security? We can apply peer authentication configuration.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: default
spec:
  mtls:
    mode: STRICT
Now our applications, via the Istio Envoy proxy, will only accept and transmit requests over mTLS connections. But what does this mean for the web-frontend we deployed without the Istio proxy sidecar?…
Nope, not happening.
The current web-frontend hasn’t been deployed with the Istio proxy sidecar container injected into the pod, so it’s trying to send requests to the customers service application pod in plain text. The Istio configuration, with PeerAuthentication set to STRICT, is saying no thanks. Let’s redeploy the web-frontend with the Istio injection enabled…
That looks better!
Authorisation Policies
We can also control the flow of requests. Just because all the pods in our cluster are deployed and managed by us, does that mean they should have access to everything running in the cluster? Does everything need access to that database? Or our customers application?
No, of course not. We want to be specific and apply the principle of least privilege, not just to users, service accounts and the permissions they have, but to the network connections and requests of the workloads running in the cluster.
That’s where Authorisation Policies can help (I’m not writing Authorization, you can’t make me…).
We can configure which pods in a particular namespace, and which principal (e.g. a service account), are allowed or denied access to a pod or a whole namespace, right down to which operation they are allowed to perform on a particular path.
In our case, say we want to lock down the customers application to only accept requests from the web-frontend application, which in turn is only allowed to receive traffic via the istio-ingressgateway load balancer.
First, we’ll flat out deny everything, giving us a bit of a clean slate.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: default
spec: {}
Then we can apply the first authorisation policy.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-ingress-frontend
  namespace: default
spec:
  selector:
    matchLabels:
      app: web-frontend
  action: ALLOW
  rules:
  - from:
    - source:
        namespaces: ["istio-system"]
    - source:
        principals: ["cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account"]
That covers the istio-ingressgateway being allowed to access the pods labelled app: web-frontend. Next, we need the web-frontend to be allowed to access the customers application.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-web-frontend-customers
  namespace: default
spec:
  selector:
    matchLabels:
      app: customers
      version: v1
  action: ALLOW
  rules:
  - from:
    - source:
        namespaces: ["default"]
        principals: ["cluster.local/ns/default/sa/web-frontend"]
This covers requests coming from web-frontend being allowed to access the customers pods labelled app: customers, where the requests come from the default namespace and from pods running as the web-frontend service account.
Let’s test that the deny rule is still working. I’ve just created a pod running a busybox curl image.
This pod doesn’t have the expected principal or labels we specified in the Authorisation Policies, so it’s getting RBAC: denied. Just as expected!
You might be thinking “Why not just use the Kubernetes NetworkPolicy object?” You absolutely can, but the key difference is that network policies are layer 4, IP-based policies, while Istio authorisation policies are layer 7, so you can match on HTTP headers and other request attributes. You get more granular options with Istio authorisation policies than with network policies; it all depends on your use case at the end of the day.
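To illustrate that layer 7 point, here’s a sketch of an authorisation policy matching on HTTP method and path, something a NetworkPolicy can’t express (the policy name and the /api/* path are made-up examples):

```yaml
# Only allow GET requests to paths under /api/ on the customers pods;
# the method/path matching is what makes this a layer 7 policy.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: customers-read-only
  namespace: default
spec:
  selector:
    matchLabels:
      app: customers
  action: ALLOW
  rules:
  - to:
    - operation:
        methods: ["GET"]
        paths: ["/api/*"]
```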
Again, I’m just scratching the surface here. There are a lot more security features and Authorisation Policies you can adopt and deploy with Istio. You can find more info here.
Observability
Finally, we come to observability. We’ve done some cool things with Istio, but another feature it adds is insight, using logs and metrics to show us what is actually happening within our distributed network.
Also, wouldn’t it be nice to measure performance and be alerted when it starts degrading, before it becomes unacceptable to our users? That’s what we’ll take a look at in this final section of our deep dive.
Proxy Metrics
The Istio Envoy proxies produce metrics and logs, collecting valuable insight about the traffic passing in and out of the proxies. Documentation here.
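One related knob, sketched here as an assumption rather than something shown above: Istio’s Telemetry API can switch on Envoy access logging mesh-wide, so every proxy writes a log line per request:

```yaml
# Enable Envoy access logs for the whole mesh via the Telemetry API;
# placing it in istio-system with this name makes it the mesh default.
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  accessLogging:
  - providers:
    - name: envoy
```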
Service Metrics
Istio also provides metrics at the service level, and the Istio addons in the GitHub repo supply some very handy default dashboards for visualising them in Grafana.
Control Plane Metrics
Istio itself should also be monitored, so you can keep an eye on the performance and that it is behaving as expected, especially as you scale.
Kiali dashboard
Now who doesn’t love a good dashboard?! The Kiali dashboard is a nice way to get started and get some visual insight into your applications and the Istio proxies.
Grafana
You might be acquainted with Prometheus and Grafana, so I won’t spend any time explaining the what and why. You can find some default dashboards in the samples/addons directory of the Istio GitHub repo that provide visibility using the metrics collected by Prometheus.
Here we visualise the metrics for the services we have the Envoy proxies injected into, for example the customers application metrics.
You’ll find a really nice set of addons in the Istio GitHub repo to help you get started with observability in Istio and the applications or services we are using and proxying with Istio.
I’ll leave observability there. There’s a lot more to discover, but it’s important to take away that metrics, logs, etc. are all things that Istio adds. This is super handy as our applications and architecture get more complex, more distributed and start scaling, because having as much insight into performance as possible will help us troubleshoot any issues that might arise.
The more data we have, the more informed technical and business decisions we can make!
Conclusion
Sorry, that was a bit longer than I was aiming for but you know how it is when you get into it.
In conclusion, Istio provides a robust and flexible service mesh solution that enhances the management of microservices in Kubernetes environments. By decoupling network, security, and observability concerns from application code, Istio allows development teams to focus on business logic while ensuring efficient traffic management, robust security through mutual TLS, and comprehensive observability.
The integration of tools like Kiali and Grafana further enriches the user experience and observability by providing valuable insights into service performance and network behaviour.
As you continue to explore Istio, especially in multi-cluster environments, you'll discover even more advanced capabilities that can further optimize your microservices architecture.
That’s all folks!
Hopefully, this was a handy deep dive into Istio. I’ve been learning and tinkering with Istio for the last few months and really having some fun with it (proper nerdy, I know…). Writing about it helps me understand and commit it to memory!
Hopefully, my words help explain some of the parts, and their uses, to beginners starting with Istio, or just why you would use it in the first place.
I’m going to be taking my learning to a multi-cluster Istio service mesh, so hopefully I’ll write something up about that in the near future!
As always, I’m really interested to hear people’s thoughts and what alternatives to try out (Linkerd is next on my list), and if I got anything wrong please let me know!
Helpful links
Introduction to Istio Linux Foundation course (Free!)
Linux Foundation Intro to Istio GitHub repository
Disclaimer: I have no affiliation with the Linux Foundation, this course or its authors, or claim to have created any of this code, but it is very helpful to use for learning this topic and recommend it very highly!
Written by

Ferris Hall
I’m a Google Cloud certified Platform Engineer and an authorised Google Cloud trainer, with a Linux sysadmin background, now working on the Google Cloud platform. I’m passionate about building and deploying infrastructure systems, automation, driving change and empowering people in learning and development, and I enjoy sharing what I have learnt, best practices, Google Cloud and general DevOps with people getting started on their journey.