High-Performance Cluster Monitoring System Implementation Log - 7.14.2024

https://youtu.be/bLwp75_rHOU

Author: MrDew

Date: July 14, 2024

Objective: Implement a comprehensive monitoring system for our high-performance cluster using Prometheus and Grafana.

Environment:

Initial State:

Phase 1: Enable Docker Metrics on all Nodes

Challenge: Docker was exposing metrics only on the manager node, where Prometheus was running. None of the other nodes exposed them.

Solution:

  1. Identify Operating System: Determine the operating system running on each node to use the correct package manager (apk for Alpine Linux, apt-get for Ubuntu).

  2. Install Docker (if necessary):

  3. Configure Docker Daemon: Create/edit the /etc/docker/daemon.json file on each node:

     `{
       "metrics-addr": "0.0.0.0:9323",
       "experimental": true
     }`


  4. Restart Docker: Restart the Docker daemon on each node to apply the changes.
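
Steps 1 and 3 above could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used on the cluster: the helper names `package_manager_for` and `merge_metrics_config` and the `PACKAGE_MANAGERS` table are hypothetical, and it assumes each node identifies itself through the `ID` field of /etc/os-release. The merge helper keeps any keys already present in daemon.json rather than overwriting the file.

```python
import json

# Hypothetical lookup table: os-release ID value -> package manager from step 1.
PACKAGE_MANAGERS = {"alpine": "apk", "ubuntu": "apt-get"}


def package_manager_for(os_release_text):
    """Return the package manager for the distro named in os-release text, or None."""
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        # Match the ID field exactly, not ID_LIKE or VERSION_ID.
        if sep and key.strip() == "ID":
            distro = value.strip().strip('"').strip("'").lower()
            return PACKAGE_MANAGERS.get(distro)
    return None


def merge_metrics_config(existing_text, addr="0.0.0.0:9323"):
    """Add the metrics settings from step 3 to daemon.json content, keeping other keys."""
    config = json.loads(existing_text) if existing_text.strip() else {}
    config["metrics-addr"] = addr
    config["experimental"] = True
    return json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    # Demonstration on in-memory strings only; nothing is written to disk here.
    print(package_manager_for('NAME="Alpine Linux"\nID=alpine\n'))
    print(merge_metrics_config('{"log-driver": "json-file"}'))
```

On a real node the first function would be fed the contents of /etc/os-release, and the output of the second would be written to /etc/docker/daemon.json with root privileges before restarting the daemon in step 4. After the restart, requesting `http://<node>:9323/metrics` from the Prometheus host is one way to confirm that the endpoint is reachable.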