$$ High Performance Cluster Monitoring System Implementation Log - 7.14.2024 $$
Author: MrDew
Date: July 14, 2024
Objective: Implement a comprehensive monitoring system for our high-performance cluster using Prometheus and Grafana.
Environment:
Initial State:
Phase 1: Enable Docker Metrics on all Nodes
Challenge: Docker was not exposing metrics on any nodes besides the manager node where Prometheus was running.
Solution:
Identify Operating System: Determine the operating system running on each node to use the correct package manager (apk for Alpine Linux, apt-get for Ubuntu).
Install Docker (if necessary):
Alpine Linux:
`sudo apk update
sudo apk add docker sudo addgroup ${USER} docker sudo rc-service docker start sudo rc-update add docker default docker --version`
content_copy Use code with caution.
Ubuntu:
`sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" sudo apt-get update sudo apt-get install -y docker-ce docker --version`
content_copy Use code with caution.
Configure Docker Daemon: Create/edit the /etc/docker/daemon.json file on each node:
`{
"metrics-addr" : "0.0.0.0:9323", "experimental" : true }`
content_copy Use code with caution.Json
Restart Docker: Restart the Docker daemon on each node to apply the changes.
Alpine Linux:
`sudo rc-service docker restart`
content_copy Use code with caution.
Ubuntu:
`sudo systemctl restart docker
sudo service docker restart`
content_copy Use code with caution.