[tech] Downtime & R.I.P. Maltair (again!)

Felix von Perger frekk at ucc.asn.au
Wed Feb 27 09:42:55 AWST 2019


Hi all,

Unfortunately our dear Maltair has suffered the same fate as it did just
a few months ago: the built-in regulator on the /replacement/
motherboard has failed in exactly the same manner as in the original
system. The recommended firmware upgrade, which was supposed to prevent
(or at least reduce the chance of) this particular mode of failure, had
been applied, and it cooked itself anyway, so I would suggest we avoid
purchasing more secondhand IBM x3550 M4 servers.

Since Maltair is no longer in service, we have a noticeable decrease in
available VM hosting resources (particularly RAM). It would be nice to
re-use some of the still-functional hardware from the now-dead Maltair
in another server of similar vintage (but preferably of a different
brand).

Something like a Dell R620 
<https://www.ebay.com.au/itm/Dell-R620-NX3300-SERVER-2x-E5-2640-2-5ghz-6C-32GB-Ram-H310-2x-300gb-SAS-2x-PSU/223130228742> 
or R720 
<https://www.ebay.com.au/itm/Dell-PowerEdge-R720-E5-2640-V2-2GHz-NO-RAM-NO-HDD-Server/153302402555> 
would probably be worth considering - although it would definitely pay 
to check that the exact CPU and RAM types from Maltair will be 
compatible with whatever we purchase.

Alternatively we could splash out a bit and invest in something new;
suggestions are welcome.

Best regards,

Felix von Perger [FVP]
UCC President & Wheel Member

On 14/8/18 6:25 pm, bob at ucc.gu.uwa.edu.au wrote:
> Update: I managed to find the VT261 on the mobo last night. It looks like
> the one in the aliexpress link in my last email. I've ordered a couple off
> aliexpress, but they will take a few weeks to get here. When they arrive,
> we have some Damn Finicky soldering to do (it's surrounded by 0402-sized
> components). Oh, and [TPG] had a chat to a rep from Maxim, and apparently
> datasheets for the Volterra VT261 were never made public, so we kinda just
> have to hope that this chip is the thing that's broken.
>
> Andrew Adamson
> bob at ucc.asn.au
>
> |"If you can't beat them, join them, and then beat them."                |
> | ---Peter's Laws                                                        |
>
> On Thu, 9 Aug 2018, Bob Adamson wrote:
>
>> Felix and I de-racked maltair tonight and I pulled its mobo out. The Lenovo
>> page lists only a "VT261" 5V regulator as probably being damaged, so I
>> figured we should just be able to find and replace it. Famous last words.
>>
>> Google turns up VT261WFQR-ADJ as (the only) possible candidate for what
>> VT261 refers to. Unfortunately, googling further for the VT261WFQR-ADJ
>> datasheet only shows up a Maxim datasheet, which makes sense since they
>> bought out Volterra in 2013. Just to make things really interesting, the
>> kynix site (the only result that has a datasheet) links to an Intersil
>> datasheet: https://www.kynix.com/uploadfiles/pdf8827/ICL7660ACBA-T.pdf .
>> The Maxim site was a bit more forthcoming once I knew a newer part number (
>> https://datasheets.maximintegrated.com/en/ds/ICL7660-MAX1044.pdf ), but I
>> didn't have any luck looking for 7660 on any of the mobo chips.
>>
>> More googling later, and even turning to countries that have a robust market
>> for *ahem* aftermarket goods, shows up this:
>> https://ru.aliexpress.com/item/VT261WF-VT261MF-VT261WFQX-ADJ-QFN-1-integrated-circuit/32818058390.html
>> which is possibly-maybe the thing we should be looking for on the mobo.
>> There were a few shiny chips on the board, but I need to return at a
>> later date with my shiny new USB microscope to check further.
>>
>> If anyone else wants to take a look at it, please be careful about flexing
>> the board while handling (it's very big) and also be careful not to knock
>> off any components (they're very small, and I mean like >.< this big).
>>
>> Oh, and I manually migrated all network-stored VMs to medico today, and I
>> believe Felix did the remaining locally stored VMs this evening.
>>
>> --Bob
>>
>> -----Original Message-----
>> From: tech-bounces+bob=ucc.gu.uwa.edu.au at ucc.gu.uwa.edu.au
>> [mailto:tech-bounces+bob=ucc.gu.uwa.edu.au at ucc.gu.uwa.edu.au] On Behalf Of
>> Felix von Perger
>> Sent: Wednesday, 8 August 2018 11:51 PM
>> To: tech at ucc.asn.au
>> Subject: [tech] Downtime & R.I.P. Maltair
>>
>> Dear tech subscribers,
>>
>> For those of you who have not been following the committee discussions of
>> the last week or so, there was a total service outage this morning between
>> 8:00 and 10:00 which was due to RCD testing in Cameron Hall.
>> Apologies for any inconvenience.
>>
>> Sadly, in the process of turning things back on after the power was
>> restored, an IMM2 firmware bug on Maltair seems to have rendered it
>> permanently unbootable (see
>> https://support.lenovo.com/au/en/solutions/ht118532). [CFE] performed a
>> firmware upgrade this evening from v4.3 to the latest version (v6.8);
>> however, it seems the damage has already been done, and either the entire
>> motherboard or the built-in 5V voltage regulator will need to be replaced
>> or repaired.
>>
>> Due to Maltair being presently out of action, additional downtime may be
>> experienced for certain services that were previously hosted on Maltair.
>> Since Maltair accounted for most of our RAM availability, member VMs with
>> large RAM requirements may remain powered off for the time being or have
>> their maximum RAM reduced.
>>
>> Any suggestions for replacement hardware for Maltair are welcome. The
>> existing server is a 1RU IBM System x3550 M4 (7914/7915), and it is likely
>> that the majority of its parts (CPU, RAM, RAID, 10Gb NIC, PSUs) are still
>> functional despite the system board being fried.
>>
>> Best regards,
>>
>> Felix von Perger [FVP]
>> UCC Secretary & Wheel Member
>>
>> _______________________________________________
>> List Archives: http://lists.ucc.gu.uwa.edu.au/pipermail/tech
>>
>> Unsubscribe here:
>> http://lists.ucc.gu.uwa.edu.au/mailman/options/tech/bob%40ucc.gu.uwa.edu.au
>>