Monitoring

The first thing you'll need to run is

kubectl get pods -A

This will give you a breakdown of all the pods running in the cluster. Here is an example:

NAMESPACE              NAME                                                     READY   STATUS                   RESTARTS        AGE
argocd                 argo-argocd-application-controller-0                     1/1     Running                  0               46h
argocd                 argo-argocd-applicationset-controller-7fdfdc786f-hzbwv   1/1     Running                  0               16d
argocd                 argo-argocd-dex-server-654c59ffdf-mj77j                  1/1     Running                  0               16d
argocd                 argo-argocd-notifications-controller-7987fc45b5-jgxwr    1/1     Running                  0               16d
argocd                 argo-argocd-redis-7f845dff5-97dch                        1/1     Running                  0               16d
argocd                 argo-argocd-repo-server-9c7bf9cb-zfdcj                   1/1     Running                  12 (25h ago)    16d
argocd                 argo-argocd-server-86b5d7ddcf-686kp                      1/1     Running                  0               46h
bot-dev                bot-dev-5d95dfc74c-6kjk9                                 1/1     Running                  1 (114m ago)    8h
bot-dev                bot-dev-6b6f9f776d-8jnts                                 0/1     ContainerStatusUnknown   1               28h
bot-dev                bot-dev-6b6f9f776d-xtf9z                                 0/1     ContainerStatusUnknown   1               25h
bot-dev                bot-dev-85db98944b-slj7p                                 0/1     ContainerStatusUnknown   2 (33h ago)     46h
bot-mat                bot-mat-b5c7d6c55-xs9zh                                  1/1     Running                  0               7h37m
bot-prod               bot-prod-7967fff9bf-sbj47                                1/1     Running                  0               8h
calico-system          calico-kube-controllers-5598966f4d-xlt4b                 1/1     Running                  0               16d
calico-system          calico-node-x4n8x                                        1/1     Running                  3 (46h ago)     16d
calico-system          calico-node-x8wbh                                        1/1     Running                  1 (4h14m ago)   10h
calico-system          calico-typha-d78cf44dc-rv2bz                             1/1     Running                  0               16d
kube-system            cloud-node-manager-5qkc2                                 1/1     Running                  0               10h
kube-system            cloud-node-manager-v78ms                                 1/1     Running                  0               16d
kube-system            coredns-autoscaler-5589fb5654-5gx7t                      1/1     Running                  0               16d
kube-system            coredns-b4854dd98-gk44p                                  1/1     Running                  0               16d
kube-system            coredns-b4854dd98-nmkrp                                  1/1     Running                  0               16d
kube-system            csi-azuredisk-node-l8bfw                                 3/3     Running                  0               10h
kube-system            csi-azuredisk-node-rvmlx                                 3/3     Running                  0               16d
kube-system            csi-azurefile-node-fkpgg                                 3/3     Running                  0               10h
kube-system            csi-azurefile-node-q8khz                                 3/3     Running                  0               16d
kube-system            konnectivity-agent-bc779f4c7-2gr86                       1/1     Running                  0               16d
kube-system            konnectivity-agent-bc779f4c7-mdbmq                       1/1     Running                  0               16d
kube-system            kube-proxy-nlvhd                                         1/1     Running                  0               16d
kube-system            kube-proxy-vbwtj                                         1/1     Running                  0               10h
kube-system            metrics-server-f77b4cd8-k9v6f                            1/1     Running                  0               16d
kube-system            metrics-server-f77b4cd8-zkfhl                            1/1     Running                  0               16d
kubernetes-dashboard   dashboard-metrics-scraper-6f669b9c9b-mjzfj               1/1     Running                  0               33h
kubernetes-dashboard   kubernetes-dashboard-758765f476-hvzdz                    1/1     Running                  0               33h
mariadb                mariadb-67f5f7896c-bcw2r                                 0/1     ContainerStatusUnknown   1               2d
mariadb                mariadb-67f5f7896c-ng5df                                 1/1     Running                  0               24h
tigera-operator        tigera-operator-7c546fc6d5-hjdbc                         1/1     Running                  9 (25h ago)     16d

You can see that the dev bot is struggling a bit, with several restarts and dead pods. This is expected: it has the lowest resource allocation in the cluster, given its marginal importance. You can also see that the production bots (prod and mat) are both healthy, and that the database is fine, with a single restart 24 hours ago.
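
If you want to dig into why one of the pods died or restarted, the standard kubectl commands apply. As a minimal sketch, using one of the dead bot-dev pods from the listing above (substitute whatever pod you are investigating):

# show the pod's events, last state and exit reason
kubectl -n bot-dev describe pod bot-dev-6b6f9f776d-8jnts

# logs of the previous (crashed) container, if they are still retained
kubectl -n bot-dev logs --previous bot-dev-6b6f9f776d-8jnts

# current CPU/memory usage, served by the metrics-server pods listed above
kubectl top pods -n bot-dev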

For a deeper look, you can use ArgoCD. Start by getting the ArgoCD admin password using

kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
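
If you want to reuse the password later (for example with the argocd CLI at the end of this page), you can keep it in a shell variable; ARGOCD_PASSWORD is just an arbitrary name used in these examples:

# capture the decoded admin password for later use
ARGOCD_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)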

then set up a port-forward from your localhost:8081 to the ArgoCD server pod

kubectl port-forward -n argocd <argo-argocd-server-pod-name> 8081:8080
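
With the pod from the example listing above, this would be (the replica hash will be different in your cluster):

kubectl port-forward -n argocd argo-argocd-server-86b5d7ddcf-686kp 8081:8080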

Now connect to ArgoCD via localhost:8081 and log in as admin with the password you retrieved above. Find an overview of ArgoCD here
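
If you have the argocd CLI installed (not covered on this page), you can also log in from the terminal while the port-forward is running and list the applications it manages. This uses the ARGOCD_PASSWORD variable captured earlier (or paste the password directly); --insecure skips TLS verification, which is usually needed with the self-signed certificate behind the port-forward:

# log in to the port-forwarded ArgoCD server as the initial admin user
argocd login localhost:8081 --username admin --password "$ARGOCD_PASSWORD" --insecure

# list the applications and their sync/health status
argocd app list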