Making OpenTelemetry auto instrumentation work with poetry and Kubernetes
OpenTelemetry has a concept of “zero-code” instrumentation, also known as auto-instrumentation. Auto-instrumentation uses monkey-patching to instrument popular libraries present in your Python environment. The result is an almost “magical” zero-code experience that does a lot of the heavy lifting when it comes to providing observability for your applications.
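To make the monkey-patching idea concrete, here is a minimal sketch. It is not OpenTelemetry code: the instrument helper and the recorded_spans list are hypothetical stand-ins for an instrumentor and an exporter, and json.dumps is used only because it ships with Python.

```python
import functools
import json
import time

recorded_spans = []  # hypothetical stand-in for a real span exporter


def instrument(module, func_name):
    """Replace module.func_name with a wrapper that records a timing 'span'."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record the call even if the original function raised.
            recorded_spans.append(
                {
                    "name": f"{module.__name__}.{func_name}",
                    "duration_s": time.perf_counter() - start,
                }
            )

    # The actual monkey-patch: callers of json.dumps now hit the wrapper.
    setattr(module, func_name, wrapper)


instrument(json, "dumps")
payload = json.dumps({"service": "app"})  # application code is unchanged
print(payload, recorded_spans[0]["name"])
```

The calling code never changes, which is where the "zero-code" label comes from. The catch is that the patch only applies to libraries importable in the interpreter where it runs.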
Most of the services I’m interested in instrumenting are written in Python and run on Kubernetes clusters. They also use poetry as the package manager and to run the application.
The good news is that OpenTelemetry offers its own Kubernetes operator capable of auto-instrumenting code running in Kubernetes clusters. It accomplishes this by injecting an init container that contains all the instrumentation code. However, when I implemented the OpenTelemetry operator and configured auto-instrumentation, I discovered that my services weren’t emitting any metrics or traces.
The problem turned out to be with my application setup. I use the command poetry run app to launch all my services. Under the hood, this activates the virtual environment poetry created specifically for my application and loads the application within it. When the OTel auto-instrumentation code executes, it searches for the list of libraries installed in the default environment, and since poetry manages its own isolated virtual environment, the auto-instrumentation code could not find any relevant libraries to patch!
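A quick way to see which environment a given interpreter is looking at is to print its prefix and the packages it can find. This is only a diagnostic sketch using the standard library; the helper names are my own.

```python
import sys
from importlib import metadata


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


def installed_distributions() -> set:
    """Lower-cased names of the distributions visible to this interpreter."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            names.add(name.lower())
    return names


print("interpreter:", sys.executable)
print("virtualenv: ", in_virtualenv())
print("packages:   ", len(installed_distributions()))
```

Saving this as a script (say check_env.py, a hypothetical name) and running it once with the plain interpreter and once through poetry run python should show two different interpreters and two different package sets.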
How do we fix this?
Before we get to the solution, let’s understand what happens when the application starts. The Kubernetes pod specification sets the command to something like poetry run app. When this command executes, poetry (installed in the root environment) activates the virtual environment it created specifically for the application. It then runs the application code inside that virtual environment.
Now let’s see how the auto-instrumentation code works. In Kubernetes, auto-instrumentation relies on mutating admission webhooks, a Kubernetes feature that lets a controller intercept pod definitions as they are created or changed and modify them. The operator looks for a special auto-instrumentation annotation on our pods. When it finds a suitable pod, it injects an auto-instrumentation container as an init container, which runs before the actual application container. This container is simple: it copies the instrumentation code to an emptyDir volume. The operator also modifies the PYTHONPATH environment variable of the application container to include the path to this volume.
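The mutation described above can be simulated on a plain dictionary. This is a rough sketch, not the operator's code: the volume name, init container name, image and copy command are hypothetical, the annotation is the one I used to opt pods in (assumed here to be set to "true"), and the mount path is taken from the sitecustomize.py path shown later in this post.

```python
import copy

ANNOTATION = "instrumentation.opentelemetry.io/inject-python"
MOUNT_PATH = "/otel-auto-instrumentation-python"
HOOK_DIR = MOUNT_PATH + "/opentelemetry/instrumentation/auto_instrumentation"
VOLUME_NAME = "otel-python-instrumentation"  # hypothetical name


def inject_python_instrumentation(pod: dict) -> dict:
    """Return a mutated copy of the pod, or the pod itself if not annotated."""
    annotations = pod.get("metadata", {}).get("annotations", {})
    if annotations.get(ANNOTATION) != "true":
        return pod

    patched = copy.deepcopy(pod)
    spec = patched["spec"]
    mount = {"name": VOLUME_NAME, "mountPath": MOUNT_PATH}

    # Shared scratch volume that outlives the init container.
    spec.setdefault("volumes", []).append({"name": VOLUME_NAME, "emptyDir": {}})
    # Init container that copies the instrumentation code into the volume.
    spec.setdefault("initContainers", []).append(
        {
            "name": "otel-python-init",  # hypothetical
            "image": "example.invalid/python-autoinstrumentation",  # hypothetical
            "command": ["cp", "-r", "/autoinstrumentation/.", MOUNT_PATH],  # assumed
            "volumeMounts": [mount],
        }
    )
    # Application containers get the volume and a PYTHONPATH pointing into it.
    for container in spec["containers"]:
        container.setdefault("volumeMounts", []).append(dict(mount))
        env = container.setdefault("env", [])
        entry = next((e for e in env if e["name"] == "PYTHONPATH"), None)
        if entry is None:
            env.append({"name": "PYTHONPATH", "value": HOOK_DIR})
        else:
            entry["value"] = HOOK_DIR + ":" + entry["value"]
    return patched


pod = {
    "metadata": {"annotations": {ANNOTATION: "true"}},
    "spec": {"containers": [{"name": "app", "command": ["poetry", "run", "app"]}]},
}
patched = inject_python_instrumentation(pod)
print(patched["spec"]["containers"][0]["env"])
```

The important detail for the rest of this post is the last step: the application container starts with a PYTHONPATH that points at the directory holding the injected sitecustomize.py.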
If you’re familiar with Python, you might know about the sitecustomize.py file. It’s a special module that Python’s site module tries to import at interpreter startup, before the main module runs, and it is picked up when it sits in one of the paths listed in the PYTHONPATH variable. Because the auto-instrumentation setup has added its own directory to PYTHONPATH, its sitecustomize.py file is executed as soon as poetry is invoked, before poetry has a chance to start.
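The sitecustomize behaviour is easy to reproduce without OpenTelemetry. The sketch below writes a throwaway sitecustomize.py and a throwaway main.py into temporary directories (both file contents are made up for the demo) and starts a fresh interpreter with PYTHONPATH pointing at the hook directory.

```python
import os
import subprocess
import sys
import tempfile


def run_with_hook() -> str:
    """Run a tiny script with a sitecustomize.py on PYTHONPATH; return its stdout."""
    with tempfile.TemporaryDirectory() as hook_dir, tempfile.TemporaryDirectory() as app_dir:
        with open(os.path.join(hook_dir, "sitecustomize.py"), "w") as f:
            f.write('print("sitecustomize: instrumenting")\n')
        main = os.path.join(app_dir, "main.py")
        with open(main, "w") as f:
            f.write('print("main: application starting")\n')

        # Only the environment changes; main.py knows nothing about the hook.
        env = dict(os.environ, PYTHONPATH=hook_dir)
        result = subprocess.run(
            [sys.executable, main],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


output = run_with_hook()
print(output)
```

The sitecustomize line is printed before the line from main.py, even though main.py never imports anything from the hook directory.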
This code loads various exporters, configures endpoints, and calls the load_instrumentors function that handles the patching. Without going into too much detail, it essentially searches for popular libraries and knows how to patch them with the functionality needed to emit traces, logs, or metrics.
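The "search for libraries, then patch what is there" step can be approximated in a few lines. This is a simplified illustration rather than what load_instrumentors actually does: the KNOWN_TARGETS list and the find_patchable helper are hypothetical, and the real code is more involved than a hard-coded list.

```python
import importlib.util

# Hypothetical list of libraries an instrumentation package knows how to patch.
KNOWN_TARGETS = ("requests", "flask", "django", "json")


def find_patchable(targets=KNOWN_TARGETS):
    """Return the targets that are importable in the current environment."""
    return [name for name in targets if importlib.util.find_spec(name) is not None]


print(find_patchable())
```

Whatever the exact discovery mechanism, the answer depends entirely on the environment of the interpreter doing the asking. Run from poetry's own environment, a check like this will not see packages that only exist in the application's virtual environment.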
So when poetry is invoked, the auto-instrumentation code examines the installed modules and patches libraries in the current Python environment, which at this point is the environment poetry itself runs in rather than the application’s virtual environment.
Before executing the application code, poetry resets PYTHONPATH back to its default value, removing the path that was added during auto-instrumentation injection. So, when the application code initializes, the auto-instrumentation code does not execute and the libraries are not patched. This could be fixed easily if poetry allowed adding custom paths to the PYTHONPATH variable, but that does not appear to be something the poetry maintainers want to implement.
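The failure mode can be reproduced with a small launcher that stands in for poetry. Everything here is a simulation under my own assumptions: the launcher script, the "strip" and "keep" modes and the hook contents are invented for the demo, and the only thing being shown is how a missing PYTHONPATH in the child process prevents the hook from running there.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical launcher: starts the "application" as a second Python process,
# optionally dropping PYTHONPATH from the environment it passes on.
LAUNCHER = """\
import os, subprocess, sys
env = dict(os.environ)
if sys.argv[1] == "strip":
    env.pop("PYTHONPATH", None)
subprocess.run([sys.executable, "-c", "print('app started')"], env=env, check=True)
"""


def count_hook_runs(mode: str) -> int:
    """Return how many interpreters in the launcher -> app chain ran the hook."""
    with tempfile.TemporaryDirectory() as hook_dir, tempfile.TemporaryDirectory() as work_dir:
        with open(os.path.join(hook_dir, "sitecustomize.py"), "w") as f:
            f.write('print("hook ran")\n')
        launcher = os.path.join(work_dir, "launcher.py")
        with open(launcher, "w") as f:
            f.write(LAUNCHER)

        env = dict(os.environ, PYTHONPATH=hook_dir)
        result = subprocess.run(
            [sys.executable, launcher, mode],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.splitlines().count("hook ran")


print("PYTHONPATH stripped:", count_hook_runs("strip"))
print("PYTHONPATH kept:    ", count_hook_runs("keep"))
```

With PYTHONPATH stripped, the hook runs once, in the launcher only. With PYTHONPATH kept, it runs in both the launcher and the application process, which is the behaviour the fix below restores.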
One way around this situation is to manually append this path back before poetry is able to initialize our application code. That way, when the application is started, the same initialization code is called again, this time finding all the libraries installed in the application’s actual environment.
So I modified the OpenTelemetry auto-instrumentation init container image and added a few lines to the sitecustomize.py file.
Now our sitecustomize.py file looks like this:
# Copyright The OpenTelemetry Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# <http://www.apache.org/licenses/LICENSE-2.0>
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
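import os

from opentelemetry.instrumentation.auto_instrumentation import initialize

initialize()

# Put the instrumentation directory back on PYTHONPATH so that the Python
# process poetry starts for the application also picks up this file.
otel_path = '/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation'
current_python_path = os.environ.get("PYTHONPATH", "")
if otel_path not in current_python_path:
    # Avoid a leading ":" (an empty path entry) when PYTHONPATH is empty.
    os.environ['PYTHONPATH'] = f'{current_python_path}:{otel_path}' if current_python_path else otel_path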
Now, when poetry runs the application, the auto-instrumentation code executes again inside the application’s virtual environment, before the application code, where it can see all the libraries it needs to patch.
This saved me a lot of time and let me quickly test OTel implementations and compare them with vendor-specific agents.