[tech] Tech/Wheel Meeting 2022-07-19 18:30 - 24 hour reminder

wheel-reminder at ucc.asn.au
Mon Jul 18 18:30:01 AWST 2022


Tech/Wheel Meeting Agenda - Tuesday 2022-07-19T18:30
====================================================
- VENUE: UCC Clubroom
  - and online at https://meetings.ucc.asn.au/b/tech

*Meeting opened HH:MM*

## Attendance
- Present
- Apologies
- Absent

## Next meeting
- Schedule next meeting
  - *day* 2022-mm-ddTHH:MM
- ACTION: [???] shall be this meeting's secretary! This entails recording minutes for meeting n (beware mid-meeting glitches) and ensuring meeting n+1 reminders succeed:
  - Checklist follows:
    - Clone and reopen a new issue from [[https://gitlab.ucc.asn.au/UCC/tech-todo-list/-/issues/32]]
      - This issue keeps track of the async secretarial duties detailed below
      - Type `/clone` into the "Write a comment" box as a "quick action"
      - Reopen it and assign it to yourself
      - Update the title for today
  - [ ] ACTION: Save and commit the minutes of today's meeting, both during the meeting and at the end
  - [ ] ACTION: Set and (later) verify reminders of next meeting:
    - [ ] Promptly update agenda.next with the TIME/DATE/VENUE
    - [ ] Perform initial curation of agenda.next, and move any longstanding action items out of it and into GitLab (see Action Items section below)
    - [ ] Update the crontab: `motsugo# crontab -e`
    - [ ] Check at T-7 days that the notice really went out; if it didn't, fix it in time for T-4 days
- [ ] Everyone, before next meeting: Curate `agenda.next`, and move any items you think should be tracked as GitLab issues into GitLab issues, as above
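The crontab step above means working out cron fields for each reminder by hand. A minimal sketch of a helper that prints those lines for the T-7 and T-4 day notices plus this 24-hour reminder, assuming GNU `date`. The function name `reminder_cron_line` and the command `/path/to/send-reminder` are hypothetical placeholders, not the script motsugo actually runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a crontab line that fires DAYS_BEFORE whole days
# before a meeting. Assumes GNU date (for -d and the %-d style flags).

# reminder_cron_line MEETING DAYS_BEFORE COMMAND
#   MEETING      local date and time, e.g. "2022-07-19 18:30"
#   DAYS_BEFORE  whole days before the meeting
#   COMMAND      whatever actually sends the reminder
reminder_cron_line() {
  local meeting=$1 days=$2 cmd=$3
  local meeting_epoch fire_epoch
  meeting_epoch=$(date -d "$meeting" +%s) || return 1
  # Fixed 86400-second days; fine in AWST, which has no daylight saving
  fire_epoch=$(( meeting_epoch - days * 86400 ))
  # cron fields: minute hour day-of-month month day-of-week, then the command
  printf '%s * %s\n' "$(date -d "@$fire_epoch" '+%-M %-H %-d %-m')" "$cmd"
}

# Example: T-7, T-4 and T-1 day reminders for this meeting.
# REMINDER_CMD is a placeholder, not the real reminder script.
REMINDER_CMD=/path/to/send-reminder
for days in 7 4 1; do
  reminder_cron_line "2022-07-19 18:30" "$days" "$REMINDER_CMD"
done
```

For this meeting the loop prints `30 18 12 7 * ...`, `30 18 15 7 * ...` and `30 18 18 7 * ...`; the last matches the time this reminder was sent. Cron has no year field, so these lines fire again every year until they are removed from the crontab.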

## Optional items - choose at the start of the meeting
- [ ] Ethical guidelines
- [ ] Monitoring
- [ ] Backups
- [ ] Password rotations
- [ ] New members
- [ ] Quick check of ChangeLog
- [ ] Lessons learnt

## Current Action Items
### Boilerplate
- Now maintained in GitLab at [[https://gitlab.ucc.asn.au/UCC/tech-todo-list/-/issues/]]
- Briefly discuss anything in here that's worth discussing, but don't spend too long rehashing unresolved issues that have already been discussed ;)
- Going forward:
  - New actions: when new ACTION items arise, record them in the minutes once and add them to GitLab
  - Ongoing actions: don't keep them in the agenda unless they definitely need to be discussed at the next meeting
  - Completed actions: note in the agenda that they have been completed, and briefly discuss them if need be
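Filing new ACTION items in GitLab can also be done from a shell. A minimal sketch using the GitLab REST API's create-issue endpoint (`POST /projects/:id/issues`), assuming the instance serves API v4 and that `GITLAB_TOKEN` holds a personal access token with `api` scope. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: file an ACTION item as an issue in UCC/tech-todo-list via the
# GitLab REST API. Assumes API v4 and a token in $GITLAB_TOKEN.

GITLAB_URL=https://gitlab.ucc.asn.au
PROJECT=UCC/tech-todo-list

# The API accepts a URL-encoded "namespace/project" path instead of a numeric ID
issues_endpoint() {
  printf '%s/api/v4/projects/%s/issues\n' "$GITLAB_URL" "${PROJECT//\//%2F}"
}

# file_action_item TITLE [DESCRIPTION]  (hypothetical helper)
file_action_item() {
  if [ -z "${GITLAB_TOKEN:-}" ]; then
    echo "set GITLAB_TOKEN first" >&2
    return 1
  fi
  # --data-urlencode makes curl send a form-encoded POST
  curl --fail --silent --show-error \
    --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
    --data-urlencode "title=$1" \
    --data-urlencode "description=${2:-}" \
    "$(issues_endpoint)"
}
```

Usage: `file_action_item "Short title" "Raised at the tech meeting on YYYY-MM-DD"`. The API responds with the new issue as JSON; it still needs to be mentioned once in the minutes.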

### Action items to discuss

## Known Broken Stuff

- Backups
  - Space alerts in #alerts: 2022-04-23, 2022-04-26 ...
    - https://discord.com/channels/264401248676085760/671351866071842836/967424010717642843
  - http://uccmonitor.ucc.asn.au:3000/d/V3mRaxPZk/ucc-overview?orgId=1&viewPanel=8&from=now-30d&to=now
  - hostperson email
- `May 17 To wheel at ucc.gu (   6) [wheel] [murasoi] rancid checkout failed`
  - https://svn.ucc.asn.au/rancid/ucc/configs/ `ERROR 400: Bad Request.`
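For the backup space alerts above, a quick manual check on the backup host can help before the alerting fires again. A minimal sketch, assuming POSIX `df -P` output. The function names, the mount point and the threshold in the usage line are hypothetical, not UCC's actual alerting configuration.

```shell
#!/usr/bin/env bash
# Sketch: warn when a filesystem's usage reaches a threshold.

# usage_pct PATH -> prints the "Capacity" column (without %) from POSIX df
usage_pct() {
  df -P -- "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# check_space PATH THRESHOLD
#   returns 0 if below THRESHOLD, 1 (with a warning on stderr) if at or
#   above it, 2 if usage could not be read
check_space() {
  local pct
  pct=$(usage_pct "$1")
  if [ -z "$pct" ] || [ "$pct" = "-" ]; then
    echo "could not read usage for $1" >&2
    return 2
  fi
  if [ "$pct" -ge "$2" ]; then
    echo "WARNING: $1 is ${pct}% full (threshold ${2}%)" >&2
    return 1
  fi
}

# Usage (hypothetical mount point and threshold):
# check_space /backups 90
```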

## Matters arising previously

## Extra items (rename/refile as appropriate)

### Power outage/RCD testing 2022-05-17T09:45 - lessons learnt?
- Circuit by circuit, dual-PSU servers seem to have survived happily
- UPS?
- `murasoi` booted happily about 11:01
  - Low-power one-armed backup router? Share `ucc-fw` config?
- > [MPT]: Mudkip has a disk fault and is stuck in initramfs
  - `fsck.ext4: unable to set superblock flags on /dev/mapper/pve-root`
  - > [BOB]: we should totally upgrade it to a Samsung EVO SD card - they have wear levelling (and a bunch of other nice features)
  - ...and a warm-spare SD-card image?
- > [MPT]: machop wasn't coming back up: Somehow walnut's port 11 got put on VLAN1 instead of VLAN1 native when the switch rebooted??? So *that's* fixed now
- That got enough Ceph OSDs online to start a major recovery/rebalance/backfill:
  - > [NTU]: For a lot of that time, none of the vmstore-ssd devices were writable, they were undergoal and ceph was frantically backfilling - but it succeeded!
- > [NTU]: With a solid 5/6 quorum of running hosts, it seems safe to manually migrate VMs (without local storage) off an offline host, e.g. mussel: `magikarp:/etc/pve# mv /etc/pve/nodes/mudkip/qemu-server/118.conf /etc/pve/nodes/magikarp/qemu-server/` (see https://forum.proxmox.com/threads/migrate-vm-from-offline-node.30167/)
  - > [MPT]: apache2 failed to start because it doesn't properly wait for AD users to become available; running a manual start now
- > [MPT]: samba-ad-dc started manually on samson. It seems like it doesn't autostart correctly after boot, but started first try manually
- > [NTU]: merlo booted unhappily about 10:55, no NFS mounts happened.
  - `mount -av` worked fine after boot
  - Shouldn't `systemd` recognise FS type `nfs` as a network filesystem without an explicit `_netdev` option?
    - get an autofs config into a SOE role instead?
  - rejig `/etc/rsyslog.conf` all the same?
  - manual post-outage procedure:
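```
# 1. Compare the local log with the shared coke log (cmp or diff)
merlo# diff /var/log/dispense /home/other/coke/cokelog
# 2. Stop rsyslog so the logs don't change mid-copy
merlo# systemctl stop rsyslog
# 3. Carefully! Append only the lines missing from cokelog (count elided here)
merlo# tail -n ... /var/log/dispense >> /home/other/coke/cokelog
merlo# systemctl start rsyslog
```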
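NTU's manual VM migration above is a bare `mv` inside `/etc/pve`. A minimal sketch that wraps the same move with a few sanity checks, for VMs without local storage on the dead node and only while the cluster is quorate (check with `pvecm status` first). `move_vm_conf` is a hypothetical helper, and `PVE_ROOT` is a hypothetical override so the logic can be tried outside a real cluster.

```shell
#!/usr/bin/env bash
# Sketch: move a VM's config from an offline Proxmox node to a live one.
# PVE_ROOT is a hypothetical override; on a real node it is /etc/pve.

PVE_ROOT=${PVE_ROOT:-/etc/pve}

# move_vm_conf VMID FROM_NODE TO_NODE
move_vm_conf() {
  local vmid=$1 from=$2 to=$3
  local src="$PVE_ROOT/nodes/$from/qemu-server/$vmid.conf"
  local dst_dir="$PVE_ROOT/nodes/$to/qemu-server"
  if [ ! -f "$src" ]; then
    echo "no config for VM $vmid on $from" >&2; return 1
  fi
  if [ ! -d "$dst_dir" ]; then
    echo "unknown target node $to" >&2; return 1
  fi
  if [ -e "$dst_dir/$vmid.conf" ]; then
    echo "VM $vmid already exists on $to" >&2; return 1
  fi
  mv -- "$src" "$dst_dir/"
}

# Equivalent of the command in the notes:
# move_vm_conf 118 mudkip magikarp
```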
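When merlo came up without its NFS mounts, `mount -av` fixed it by hand. A minimal sketch of a post-boot check that lists which expected mount points are absent from the kernel's mount table, so the fix is only attempted when something is actually missing. The function name and the paths in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report expected mount points that are not currently mounted.

# missing_mounts PATH... -> prints each PATH that is not a mount point;
# returns 1 if any are missing. Paths containing spaces appear as \040 in
# /proc/self/mounts and will not match this simple comparison.
missing_mounts() {
  local path status=0
  for path in "$@"; do
    if ! awk -v p="$path" '$2 == p { found = 1 } END { exit !found }' /proc/self/mounts; then
      echo "$path"
      status=1
    fi
  done
  return "$status"
}

# Usage (hypothetical paths):
# missing_mounts /home /services || mount -av
```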
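The "Carefully! tail -n ..." step in the post-outage procedure depends on picking the right line count. A minimal sketch that prints the lines of the local log that come after the last line already in the shared log, assuming the shared log is an append-only copy of the local one and that the local log has not been rotated since (otherwise it refuses to guess). `missing_tail` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print LOCAL_LOG lines that come after the last line of SHARED_LOG.

# missing_tail LOCAL_LOG SHARED_LOG
missing_tail() {
  local local_log=$1 shared_log=$2 last n
  last=$(tail -n 1 -- "$shared_log") || return 1
  if [ -z "$last" ]; then
    echo "shared log is empty or ends in a blank line; not guessing" >&2
    return 1
  fi
  # Line number of the last exact, whole-line occurrence in the local log
  n=$(grep -n -F -x -- "$last" "$local_log" | tail -n 1 | cut -d: -f1)
  if [ -z "$n" ]; then
    echo "last shared line not found in $local_log; compare by hand" >&2
    return 1
  fi
  tail -n "+$((n + 1))" -- "$local_log"
}

# Usage on merlo, reviewing the output before appending:
# missing_tail /var/log/dispense /home/other/coke/cokelog > /tmp/missing
# cat /tmp/missing >> /home/other/coke/cokelog
```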

*Meeting closed HH:MM*

----

```
# https://demo.hedgedoc.org/Hlsapf47RsqpgIjqLVfMUw
# Export the minutes from HedgeDoc and commit them to the wheel docs repo
cd /home/wheel/docs/meetings
# Evaluate the date once so all three commands agree, even across midnight
today=$(date +%Y-%m-%d)
HEDGEDOC_SERVER=https://demo.hedgedoc.org /home/wheel/bin/hedgedoc export --md Hlsapf47RsqpgIjqLVfMUw "./$today.txt"
git add "./$today.txt"
git commit -m "Tech meeting minutes $today"
```

<!-- vim: tabstop=2 softtabstop=2 shiftwidth=2 expandtab
-->
<!-- Local Variables: -->
<!-- tab-width: 2 -->
<!-- End: -->

