My problem with Zabbix Docker container monitoring
Problem
The default Zabbix template for container monitoring has several problems:
- Offline containers are no longer discovered and therefore raise no error (e.g. containers that have no restart policy configured in Docker Compose) ⇒ Errors do not appear where you would expect them
- While new containers are being set up, Zabbix already sends error notifications, which can confuse other team members ⇒ Errors appear where you don’t want them
Example
On the host there are currently two containers: mysql-prod and mysql-dev.
We assume that all containers are deployed via Docker Compose and that mysql-prod has no restart policy configured.
- The mysql-prod container is mandatory, because it is used by other containers in production. After a restart of the host server, the container doesn’t start automatically (due to the missing restart policy in docker-compose.yml). We would expect Zabbix to send an error notification so that we can fix the problem immediately. But because the container is no longer discovered, in some cases no error notification is triggered.
- The mysql-dev container is unimportant for monitoring purposes. Imagine we want to test a new version of the mysql image that causes errors on container start. This leads to an error notification from Zabbix, which might confuse admins, because it’s only dev.
Next, assume that we want to test a new service (e.g. grafana). We start the container, configure something wrong and immediately trigger an error notification, which is not desired behaviour during initial setup. The notification may become appropriate once we move from the setup phase to production and depend on the availability of the container.
The conceptual solution
We make use of the macros of the default Docker monitoring template:
- {$DOCKER.LLD.FILTER.CONTAINER.MATCHES} ⇒ Add the names of all containers that are important for production and should trigger an error when down.
- {$DOCKER.LLD.FILTER.CONTAINER.NOT_MATCHES} ⇒ Add the names of all dev containers that you don’t want to monitor.
Create an item that makes sure all production containers are running:
- Scan all containers listed in the macro {$DOCKER.LLD.FILTER.CONTAINER.MATCHES}
- Check whether each container appears in the output of docker ps
- Create a trigger that raises an error if at least one required container is not running
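The check behind such an item could be sketched roughly as follows. This is only an illustration, not a tested Zabbix integration: the function names are made up, and it assumes the macro value is written as a plain list of literal container names separated by | (for example mysql-prod|grafana).

```python
import subprocess


def parse_names(macro_value):
    """Split a macro value such as "mysql-prod|grafana" into container names.

    Assumption: the macro is a plain alternation of literal names,
    not an arbitrary regular expression.
    """
    return [name.strip() for name in macro_value.split("|") if name.strip()]


def running_containers():
    """Return the names of all running containers as reported by docker ps."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def missing_containers(required, running):
    """Return the required container names that are not running, sorted."""
    return sorted(set(required) - set(running))
```

The item could then report the length of missing_containers(parse_names(value), running_containers()), and the error trigger would fire as soon as that number is greater than zero.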
Create a discovery for undecided containers:
- Scan for running container names with docker ps
- Subtract the names in {$DOCKER.LLD.FILTER.CONTAINER.MATCHES} and {$DOCKER.LLD.FILTER.CONTAINER.NOT_MATCHES}
- Create a warning trigger for every undecided container
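The subtraction step of this discovery might look like the sketch below. It takes the running, production and ignored names as plain collections and prints the result in the JSON array form that Zabbix low-level discovery rules accept. The LLD macro name {#CONTAINER.NAME} and both function names are hypothetical choices for this example.

```python
import json


def undecided_containers(running, production, ignored):
    """Return running containers that are in neither macro list, sorted."""
    return sorted(set(running) - set(production) - set(ignored))


def to_lld_json(names):
    """Build a low-level discovery payload with one entry per container.

    {#CONTAINER.NAME} is a hypothetical LLD macro name chosen for this sketch.
    """
    return json.dumps([{"{#CONTAINER.NAME}": name} for name in names])


# Example with the containers from this article: only grafana is undecided.
names = undecided_containers(
    {"mysql-prod", "mysql-dev", "grafana"}, ["mysql-prod"], ["mysql-dev"]
)
print(to_lld_json(names))
```

A trigger prototype on the discovered entries would then create one warning per undecided container.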
Why a trigger for every undecided container and not just one trigger?
Each trigger can be suppressed separately until we decide whether the container will become a production container.
Implementation in Zabbix
I haven’t implemented my solution yet. I may add the implementation later. Feel free to send screenshots to me (e.g. medium@maax.gr) if you have implemented my solution and I should add them here.
Have a nice day!
Max