[tech] Temperature Monitoring in Server Room [repost]
Melissa Star
melissa at netexperts.com.au
Tue Mar 19 12:20:21 AWST 2019
Hi David,
Smartmontools presents the information in a way that is easy to parse. If a cron job writes its output to a file that a non-privileged user can read, PHP doesn't need to execute anything or otherwise make a hole in the security model.
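A minimal sketch of the parsing side, written in Python rather than PHP, assuming the cron job has saved the attribute table printed by `smartctl -A` as plain text. The function name, the sample text and its values are illustrative assumptions, not output from a real drive.

```python
def read_smart_attribute(report_text, attribute_name):
    """Return the integer raw value of one SMART attribute, or None if absent.

    Assumes report_text is the attribute table printed by `smartctl -A`,
    where each row starts with a numeric ID and the raw value is the
    tenth whitespace-separated column.
    """
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; skip headers and blank lines.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if fields[1] == attribute_name:
            try:
                return int(fields[9])
            except ValueError:
                # Raw value was not a plain integer; treat it as unavailable.
                return None
    return None


# Illustrative sample only (made-up values, not from a real drive).
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12345
190 Airflow_Temperature_Cel 0x0022   067   052   045    Old_age   Always       -       33 (Min/Max 25/40)
"""

if __name__ == "__main__":
    print(read_smart_attribute(SAMPLE_REPORT, "Airflow_Temperature_Cel"))
```

Only the cron job needs root to run smartctl; the reader works on the saved text and never executes anything itself.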
Writing a server status web page, which I've said I will do for Ashera-security over the winter break, makes sense to me. I also have a compelling commercial reason to do it: I run multiple servers that would get me in trouble if they lost data, even between periodic backups, and I have previously relied on data centres to look after this stuff for me. If I want to move from a "dedicated server" to a "colocation" model and be responsible for my own hardware, I need to implement such tools in any case. They need to aggregate multiple servers, send me email or SMS warnings, or both.
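A rough sketch of the aggregate-and-warn idea, assuming the per-server temperatures have already been collected into a dictionary. The host names, addresses and the 45 degree limit are hypothetical placeholders.

```python
from email.message import EmailMessage


def find_overheating(readings, limit_celsius):
    """Return sorted (host, temperature) pairs that exceed the limit."""
    return sorted(
        (host, temp) for host, temp in readings.items() if temp > limit_celsius
    )


def build_warning(hot_hosts, sender, recipient):
    """Build a plain-text warning email listing the overheating hosts."""
    msg = EmailMessage()
    msg["Subject"] = f"Temperature warning: {len(hot_hosts)} server(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"{host}: {temp} C" for host, temp in hot_hosts))
    return msg


if __name__ == "__main__":
    # Hypothetical readings; real ones would come from each server's report.
    readings = {"alpha": 31, "beta": 48, "gamma": 52}
    hot = find_overheating(readings, limit_celsius=45)
    if hot:
        warning = build_warning(hot, "monitor@example.com", "admin@example.com")
        # Sending is left out; smtplib.SMTP(...).send_message(warning) is one option.
        print(warning)
```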
What has occurred to me (and thanks to UCC I am learning and growing professionally!) is that SSDs wear at fairly predictable rates. Since the drives in an array see similar write volumes, an SSD RAID is therefore likely to fail catastrophically once its drives have exceeded their total data written (TDW) limits. If they are Intel drives (mine at OVH are), they apparently go into read-only mode and then self-destruct very close to the TDW limit.
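The arithmetic behind that prediction is simple. A minimal sketch, assuming the data written so far, the rated limit and the average daily write volume are all known in terabytes; the figures in the example are made up, not taken from any drive's specification.

```python
def estimate_days_remaining(written_tb, rated_limit_tb, tb_per_day):
    """Estimate days until a drive reaches its rated write limit.

    Assumes a constant write rate; returns 0.0 once the limit has
    already been reached or exceeded.
    """
    if tb_per_day <= 0:
        raise ValueError("tb_per_day must be positive")
    remaining_tb = max(rated_limit_tb - written_tb, 0.0)
    return remaining_tb / tb_per_day


if __name__ == "__main__":
    # Made-up figures: 120 TB written, 150 TB rated limit, 0.25 TB per day.
    print(estimate_days_remaining(120.0, 150.0, 0.25))  # prints 120.0
```

Comparing the estimate across the drives in one array would show whether they are due to reach the limit together.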
Regards,
Melissa
> On 18 Mar 2019, at 9:58 pm, David Adam <zanchey at ucc.gu.uwa.edu.au> wrote:
>
> On Mon, 18 Mar 2019, Melissa Star wrote:
>> I just realised - if you have smartmontools installed on linux machines,
>> each hard drive or SSD will provide its “Airflow Temperature”, which I
>> can extract via script.
>>
>> I'm thinking of centralising this for all the servers I run, and
>> collecting the data to chart, having a display at home that gives me
>> live info for all machines under my control.
>
> We used to do this on all the servers, but I think evil is the only one
> still running:
> https://ucc.asn.au/stats/
> It reads the fan and temperature data from lm-sensors (run `sensors` on a
> bare metal machine to get an idea of what's available), plus various
> system statistics, and writes them into some custom RRDs. It is
> approximately zero fun to maintain.
>
> Collectd (https://github.com/collectd/collectd) has both SMART and
> lm-sensors plugins, and was the most sensible tool for our use last time I
> checked, so if we were going to set anything up I'd start with that. It
> hasn't been updated for a couple of years but is fairly mature. You don't
> need root access to start playing around with it.
>
> The fanciest option would be to write a Cockpit plugin
> (https://cockpit-project.org/ currently available at
> https://secure.ucc.asn.au/missioncontrol/), but we don't have the
> timeseries store stuff set up in that and it sounds like a lot more work.
>
> David Adam
> zanchey