[tech] Merry Christmas!  Have some monitoring!

tec tec at ucc.gu.uwa.edu.au
Tue Dec 24 23:14:13 AWST 2019


Hi All,
As a Christmas gift to UCC (of sorts), may I present: A server monitoring solution!
I’ve just finished this up. It was deployed using Ansible, and you can find the config at gitlab.ucc.asn.au/tec/ansiblemonitoring. This solution installs Grafana for the web GUI and Prometheus for the database. Data is pulled from machines over HTTP on :9100 using node exporter (don’t worry, it’s not Node.js, it’s Go; it is also installed using Ansible).
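For anyone curious what the pull step looks like, here is a minimal Python sketch (not part of the Ansible config) of fetching and parsing a node exporter's /metrics output. The names parse_metrics, scrape and SAMPLE are made up for illustration, and the parser assumes the plain text exposition format without trailing timestamps. Prometheus does all of this for us already.

```python
from urllib.request import urlopen

# Hypothetical sample of what a node exporter might return on /metrics.
SAMPLE = """# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""


def parse_metrics(text):
    """Parse text exposition lines into a {series: value} dict."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: no trailing timestamp, so the value follows the last space.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def scrape(host, port=9100, timeout=5):
    """Fetch and parse /metrics from a node exporter (needs network access)."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(parse_metrics(SAMPLE))
```

Running the file only parses the built-in sample. Calling scrape("some-host") would hit a real exporter, assuming :9100 is reachable from wherever you run it.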
Another promising aspect is Grafana’s thorough support for alerts. Among the many supported services, we can add a Discord webhook to alert when, say, CPU usage on a server stays over 90% for a continuous period of at least 5 minutes.
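To make the "over 90% for at least 5 minutes" condition concrete, here is a rough Python sketch of the logic. This is not how Grafana implements alert rules; the function name sustained_above and the sample data are hypothetical, and it assumes samples arrive sorted by time.

```python
def sustained_above(samples, threshold=90.0, hold_seconds=300):
    """Return True if the value stays above threshold for at least hold_seconds.

    samples: list of (unix_timestamp, cpu_percent) tuples, sorted by time.
    """
    run_start = None  # timestamp where the current over-threshold run began
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts
            if ts - run_start >= hold_seconds:
                return True
        else:
            # Any dip to or below the threshold resets the run.
            run_start = None
    return False


if __name__ == "__main__":
    # One sample per minute: six minutes of constant high load.
    busy = [(t, 95.0) for t in range(0, 361, 60)]
    # The same period with a dip at the three-minute mark.
    blip = [(t, 50.0 if t == 180 else 95.0) for t in range(0, 361, 60)]
    print(sustained_above(busy), sustained_above(blip))
```

The reset on every dip is what stops a short spike from paging anyone: only an unbroken run of high readings counts.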
The monitor host is installed on a new VM, ucc-monitor (164). The setup ended up being more of a pain than it should have been (VLAN tagging issue); thanks to Dylan for helping with that :)

Demo

How to view
ssh -L 3000:uccmonitor:3000 motsugo.ucc.asn.au

Then just view http://localhost:3000 in your browser and log in as admin with the password changeme.

Client Monitoring
For Deb10 machines, the prometheus-node-exporter package was installed from the default repo. Unfortunately, as the servers are on Deb9 and the ‘Monitor Host’ is on Deb10, the 4-year mismatch in package versions seems to somewhat break things. As a result, the Deb9 servers use the cloudalchemy.node-exporter role instead. This:
 * Downloads and installs a recent prometheus-node-exporter binary
 * Creates a node-exp user and group to run the binary as non-root

Errors
The relevant logs are here.
This ran successfully for all listed targets except for:

Molmol
fatal: [molmol.ucc.asn.au]: FAILED! => {"changed": false, "module_stderr": "Shared connection to molmol.ucc.asn.au closed.\r\n", "module_stdout": "/bin/sh: /usr/bin/python: not found\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 127

Motsugo
Not sure why this failed.

Cerberus
failed: [cerberus.ucc.asn.au] (item=sysadmin-basic) => {"attempts": 5, "changed": false, "item": "sysadmin-basic", "msg": "No package matching 'sysadmin-basic' is available"}

Monitor Host
This had a lot more happening, using the roles from cloudalchemy. The Ansible stdout of the first run is here, and the second is here. Everything seems to work.
At the moment it is not exposed to the world, but I think it would be nice if it could be found somewhere like https://monitor.ucc.asn.au in the future. If nothing else, it’s a cool thing to show prospective members.

Maintenance
Re-running the Ansible scripts every now and then would probably be a good idea. That said, the prometheus-node-exporter package should get updated as any other package would.
I ran into issues with the few-year-old Ansible version on all of our Deb9 user servers. Hence, ucc-monitor was deployed from my home computer, and the monitoring software was deployed from ucc-monitor.

More Screenshots


​
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BCD-5E022B80-D3-5CB41980
Type: image/png
Size: 167218 bytes
Desc: not available
Url : https://lists.ucc.gu.uwa.edu.au/pipermail/tech/attachments/20191224/3d8b999f/attachment-0004.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BCD-5E022B80-D5-5CB41980
Type: image/png
Size: 224483 bytes
Desc: not available
Url : https://lists.ucc.gu.uwa.edu.au/pipermail/tech/attachments/20191224/3d8b999f/attachment-0005.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BCD-5E022B80-D7-5CB41980
Type: image/png
Size: 87080 bytes
Desc: not available
Url : https://lists.ucc.gu.uwa.edu.au/pipermail/tech/attachments/20191224/3d8b999f/attachment-0006.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BCD-5E022B80-D9-5CB41980
Type: image/png
Size: 272448 bytes
Desc: not available
Url : https://lists.ucc.gu.uwa.edu.au/pipermail/tech/attachments/20191224/3d8b999f/attachment-0007.png 

