Monitor gRPC Microservices in Kubernetes with AWS X-Ray

Microservice architecture is typically used to solve scaling problems where service decoupling/segregation is required to improve development velocity, make services more fault tolerant, or handle performance hotspots.

However, everything comes with a price, and microservices are no exception. One typical issue is simply knowing what is going on between all of those services.

Monitoring and fault resiliency are definitely more challenging in a microservice world. While there are frameworks like Hystrix and resilience4j to handle circuit breaking, rate limiting and the like, this post focuses on the first problem: how the heck are my services talking to each other?

AWS X-Ray can fill the gap here by offering a service map and distributed tracing, so you can see how requests flow between your services.

Compared to generic service monitoring, X-Ray has some additional benefits within the AWS ecosystem: when you use the AWS SDK, it will automatically surface insights for your AWS resource calls (write calls only, unfortunately). This applies to SQS, SNS and DynamoDB.

But first of all, you need to understand how X-Ray works: your code opens a segment for each request (and subsegments for downstream calls), and the X-Ray SDK ships the completed segments over UDP to a local X-Ray daemon, which batches and forwards them to the X-Ray API.
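To make that concrete, here is a rough sketch of the recorder API from aws-xray-recorder-sdk-core; the segment and subsegment names below are just placeholders:

import com.amazonaws.xray.AWSXRay;
import com.amazonaws.xray.entities.Segment;
import com.amazonaws.xray.entities.Subsegment;

public class XRayBasics {
    public static void main(String[] args) {
        // A segment represents the work this service does for one request.
        Segment segment = AWSXRay.beginSegment("my-service");
        try {
            // A subsegment represents a downstream call made while handling it.
            Subsegment subsegment = AWSXRay.beginSubsegment("downstream-call");
            try {
                // ... call a downstream service here ...
            } finally {
                AWSXRay.endSubsegment();
            }
        } finally {
            // Ending the segment hands it to the recorder, which ships it
            // to the local X-Ray daemon over UDP (port 2000 by default).
            AWSXRay.endSegment();
        }
    }
}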

For inter-service communication, gRPC is often used. Compared to JSON over REST, gRPC offers more flexibility around query design and better performance, thanks to the efficiency of (de)serialization with protobuf and the use of HTTP/2 multiplexing. The strong typing and backward compatibility from protobuf also help with documentation and maintenance, improving the overall quality of the service contracts.

However, while the X-Ray SDK offers a J2EE servlet filter for general HTTP servers, gRPC does not fit that model. The canonical gRPC Java implementation is built on Netty and knows nothing about servlets.

This means we have to write some custom code. Unfortunately the documentation around this is close to nonexistent. Luckily, gRPC has built-in interception support via io.grpc.ServerInterceptor and io.grpc.ClientInterceptor, so it's just a matter of wiring the pieces together.

Overall there are four steps:

1. Set up the Kubernetes DaemonSet
2. Grant permission to Kubernetes nodes
3. Write/use interceptors in code
4. Route metrics to the X-Ray daemon

Let’s do this step by step:

Set up the Kubernetes DaemonSet

There’s an example offered by Amazon regarding how to install it: link

Grant permission to Kubernetes nodes

This is a bit tricky depending on how your kube cluster is set up.

If you use EKS/EC2, you need to grant X-Ray write permission by attaching the canned policy to your IAM role for the worker nodes.

If you host your Kubernetes cluster outside the AWS ecosystem, chances are you don't need X-Ray but rather something more generic, like Istio's sidecar approach. But if you do need it, you can create IAM users, attach the policy, and use those users' credentials in your code.

Write/use interceptors in code

(Update: added repo here)

First, we need to make sure the server and client speak the same language. In typical HTTP this would be headers; in gRPC it is metadata, keyed by Metadata.Key.

import io.grpc.Metadata;

public class Keys {
    public static final Metadata.Key<String> TRACE_ID_HEADER =
        Metadata.Key.of("traceId", Metadata.ASCII_STRING_MARSHALLER);
    public static final Metadata.Key<String> PARENT_ID_HEADER =
        Metadata.Key.of("parentId", Metadata.ASCII_STRING_MARSHALLER);
}

Now, let's implement the client interceptor.

First you need the X-Ray dependencies on the classpath (assuming Gradle is used for dependency management; it should be similar for Maven/Ivy/sbt):

dependencies {
    compile(
        "com.amazonaws:aws-xray-recorder-sdk-core",
        "com.amazonaws:aws-xray-recorder-sdk-aws-sdk",
    )
}

Note that for demonstration purposes the version is omitted here; for actual usage you should pin the latest version available at the time.

If you want X-Ray to instrument your AWS resource calls, you also need:

compile("com.amazonaws:aws-xray-recorder-sdk-aws-sdk-instrumentor")

Now the code:
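The full version lives in the repo linked above. As a rough sketch (reusing the Keys class from earlier; the exact error handling may differ from the repo code), the client interceptor can look like this:

import com.amazonaws.xray.AWSXRay;
import com.amazonaws.xray.entities.Subsegment;
import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ClientInterceptor;
import io.grpc.ForwardingClientCall.SimpleForwardingClientCall;
import io.grpc.ForwardingClientCallListener.SimpleForwardingClientCallListener;
import io.grpc.Metadata;
import io.grpc.MethodDescriptor;
import io.grpc.Status;

public class XRayClientInterceptor implements ClientInterceptor {

    @Override
    public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
            MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
        return new SimpleForwardingClientCall<ReqT, RespT>(next.newCall(method, callOptions)) {
            @Override
            public void start(Listener<RespT> responseListener, Metadata headers) {
                // Assumes a segment is already open on this thread (e.g. by the server
                // interceptor); otherwise the recorder's context-missing strategy kicks in.
                Subsegment subsegment = AWSXRay.beginSubsegment(method.getFullMethodName());
                // Propagate the trace context to the server through gRPC metadata.
                headers.put(Keys.TRACE_ID_HEADER, subsegment.getParentSegment().getTraceId().toString());
                headers.put(Keys.PARENT_ID_HEADER, subsegment.getId());
                super.start(new SimpleForwardingClientCallListener<RespT>(responseListener) {
                    @Override
                    public void onClose(Status status, Metadata trailers) {
                        // onClose may fire on another thread, so re-attach the subsegment
                        // to this thread's context before closing it.
                        AWSXRay.getGlobalRecorder().setTraceEntity(subsegment);
                        if (!status.isOk()) {
                            subsegment.setError(true);
                        }
                        AWSXRay.endSubsegment();
                        super.onClose(status, trailers);
                    }
                }, headers);
            }
        };
    }
}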

There's quite a lot of code here, but the key gotchas are that a segment must already be open before you begin a subsegment (otherwise the recorder's context-missing strategy kicks in), and that the X-Ray context is thread-local, so the response listener may fire on a different thread and the trace entity has to be re-attached before the subsegment is closed.

After that, wire it up when you build the client:

newBlockingStub(channel).withInterceptors(new XRayClientInterceptor());

Next, let’s build the server side interceptor:
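Again, the repo has the full version; the sketch below is illustrative. The beginSegment overload that takes an explicit TraceID and parent ID, and TraceID.fromString, may vary slightly between X-Ray SDK versions, and the thread-local context handling here covers only the simple case (unary calls handled on the same thread):

import com.amazonaws.xray.AWSXRay;
import com.amazonaws.xray.entities.Segment;
import com.amazonaws.xray.entities.TraceID;
import io.grpc.ForwardingServerCallListener.SimpleForwardingServerCallListener;
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;

// @GRpcGlobalInterceptor  // add this annotation if you use the LogNet Spring Boot starter
public class XRayServerInterceptor implements ServerInterceptor {

    private final String appName;

    public XRayServerInterceptor(String appName) {
        // The segment name shows up as the node name in the X-Ray service map.
        this.appName = appName;
    }

    @Override
    public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
            ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
        String traceId = headers.get(Keys.TRACE_ID_HEADER);
        String parentId = headers.get(Keys.PARENT_ID_HEADER);

        // Open a segment for this request; if the client sent trace context,
        // join the existing trace instead of starting a new one.
        Segment segment;
        if (traceId != null) {
            segment = AWSXRay.getGlobalRecorder()
                    .beginSegment(appName, TraceID.fromString(traceId), parentId);
        } else {
            segment = AWSXRay.beginSegment(appName);
        }
        segment.putAnnotation("grpc_method", call.getMethodDescriptor().getFullMethodName());

        return new SimpleForwardingServerCallListener<ReqT>(next.startCall(call, headers)) {
            @Override
            public void onComplete() {
                AWSXRay.endSegment();
                super.onComplete();
            }

            @Override
            public void onCancel() {
                AWSXRay.endSegment();
                super.onCancel();
            }
        };
    }
}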

This has some extra flavor in that it assumes you use a Spring-based gRPC server like the LogNet Spring Boot one. The @GRpcGlobalInterceptor annotation tells the runner to inject the interceptor automatically. If that's not the case, that's fine: just supply the appName from some other source and wire up the interceptor using ServerInterceptors.intercept(serviceDefinition, interceptors), as sketched below.
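For reference, a minimal sketch of that manual wiring, where MyServiceImpl and the port are placeholders:

import io.grpc.Server;
import io.grpc.ServerBuilder;
import io.grpc.ServerInterceptors;

public class GrpcServerMain {
    public static void main(String[] args) throws Exception {
        Server server = ServerBuilder.forPort(8080)
                // Attach the X-Ray interceptor to the service definition.
                .addService(ServerInterceptors.intercept(
                        new MyServiceImpl(), new XRayServerInterceptor("my-service")))
                .build()
                .start();
        server.awaitTermination();
    }
}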

Route metrics to X-Ray daemon

Last but not least, we need to tell the X-Ray SDK where to find our daemon so it can forward trace data to it. This is done by adding a specific environment variable to your Kubernetes deployment YAML config:

spec:
  containers:
    ...
    - name: ...
      ...
      env:
        - name: AWS_XRAY_DAEMON_ADDRESS
          value: xray-daemon:2000

The value corresponds to your daemon's service name and port.

AWS_XRAY_DAEMON_ADDRESS is read by the X-Ray SDK at runtime.

Done

And that's it. Just deploy the apps to the kube cluster. Bear in mind that the service map is bound to a time range, so it won't show up until you get traffic across your apps. And if you have traffic splits like A/B testing or a service migration, you'll see how things evolve over time, which is pretty cool.

Originally published at xunnanxu.github.io on November 25, 2018.
