
Harmony Central outage


Dendy Jarrett


  • Members

The text below is taken from the link:

"Our apologies, we've been down for two days due to a catastrophic hardware failure at the server company that hosts Harmony Central. The good news: we're back. The bad news is we lost some data. Luckily our software technology partner, Nucleus Creative, was able to restore the data up to January 10th, 2016. We thank you for your patience and continued support. We've been assured that the root cause has been isolated, and that this specific hardware failure will not happen again.

Dendy Jarrett | Director Harmony Central Communities"


  • Members

Harmony Central maintains 15 years of our database on its servers. No other music community maintains this much data. We are the oldest, so it would be natural to deduce that this is why we have so much data. Removing any of that data would interfere with SEO, and Google search loves HC.

The good news is we have 99.998% of HC back. The bad news is we lost a month of good content, threads and posts. But hey... we're back.

This was one of those "being struck by lightning twice" situations, where a cataclysmic hardware failure took out our backup drives (2 of them).

Stuff happens out of our control: earthquakes, tornadoes, Cat 5 hurricanes. What matters is what we do about it, what we learn from it, and how we move forward from it.

No one will ever know how much work went into the last 48 hours to get us back to where we are.


  • Members

Quote: "No one will ever know how much work went into the last 48 hours to get us back to where we are."

As someone who has worked in a data center, I have an idea. If the hardware failure was a hard drive (and this suggests it was), then there's really nothing you can do but restore from the last backup. There's also a very good chance that the same machine hosting the HC forums also hosted about a hundred other sites, and it wasn't that people were sitting on their thumbs for those two days; they were working round the clock to get everyone up and running.

TL;DR: {censored} happens, glad we're back! :D


  • Members

Sorry, but "this specific hardware failure will not happen again" isn't that reassuring, because it implies that someone somewhere is busy inventing new hardware failures. As someone who used to run a small business building and repairing PCs, I know just how many ways computers can fail. A big round of applause to the folks behind the scenes, but pardon me if I'm still a wee bit nervous.


  • Members

Deep,

I hear you; however, think for a moment...

Suppose all the Amazon servers had a cascade failure. It is hard to imagine, but it IS possible. Most people don't realize how much of the internet runs on Amazon servers (incidentally, we don't... but...).

Imagine Facebook having a cascade failure. Would they consider it as important as HC did to restore 15 years of data? Would they be able to restore it?

I'm not saying they couldn't, but SCHTUFF happens that is outside our control.

What we ARE saying is that a post-mortem analysis will be performed, and we will implement whatever safety measures the findings recommend. If that means redundancy times 4, then that is what we will do.

No one on TEAM HC is enjoying having to rebuild weeks' worth of content, but we believe the community is worth it and that music is worth it.

D


  • Members


One production server (i.e., H-C) with RAID-5 or RAID-6 protection and at least one hot spare drive.

One smaller dedicated backup server in the same equipment rack and on the same network as the production server, with a dedicated 1 Gb/s network connection between the two servers.

The backup server has two very large mirrored (RAID-1) hard drives plus a hot spare drive.
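A minimal sketch of what these RAID levels buy you, in Python and for illustration only: `DiskArray` is a hypothetical helper, and the drive counts and sizes are assumptions, not H-C's actual hardware. It computes usable capacity and how many member drives can fail at the same moment without losing data.

```python
from dataclasses import dataclass

# Minimum number of active member drives each RAID level needs.
MIN_DRIVES = {1: 2, 5: 3, 6: 4}


@dataclass(frozen=True)
class DiskArray:
    """Hypothetical, simplified model of one RAID array.

    `drives` counts active members only; hot spares are kept separate
    because they hold no data until a rebuild starts.
    """

    level: int
    drives: int
    drive_tb: float
    hot_spares: int = 0

    def __post_init__(self) -> None:
        if self.level not in MIN_DRIVES:
            raise ValueError(f"unsupported RAID level: {self.level}")
        if self.drives < MIN_DRIVES[self.level]:
            raise ValueError(
                f"RAID-{self.level} needs at least {MIN_DRIVES[self.level]} drives"
            )

    def failures_tolerated(self) -> int:
        # Member drives that can fail at the same moment without data loss.
        # A hot spare does not raise this number; it only lets a rebuild
        # start right away, shortening the time the array runs degraded.
        if self.level == 1:
            return self.drives - 1  # every member holds a full copy
        return 1 if self.level == 5 else 2  # one or two drives' worth of parity

    def usable_tb(self) -> float:
        # For these three levels the capacity spent on redundancy equals
        # the number of failures tolerated, measured in whole drives.
        return (self.drives - self.failures_tolerated()) * self.drive_tb


# Drive counts and sizes are illustrative assumptions only.
production = DiskArray(level=6, drives=6, drive_tb=4.0, hot_spares=1)
backup = DiskArray(level=1, drives=2, drive_tb=8.0, hot_spares=1)
print(production.usable_tb(), production.failures_tolerated())  # 16.0 2
print(backup.usable_tb(), backup.failures_tolerated())  # 8.0 1
```

Under these assumptions the RAID-6 array keeps running through two simultaneous drive failures, while the two-drive RAID-1 mirror survives only one; neither protects against a failure that takes out the whole box, so copies kept somewhere else still matter.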


The dedicated backup server backs up all file systems while the production server stays online: daily incremental, weekly differential, and monthly full backups.

Database snapshots are taken at least daily.

Backups and snapshots stay on the dedicated backup server, while a mirror image of each backup is trickled off asynchronously to "cheap" cloud storage (that way, you have at least two full copies of every backup).

Backups are kept for at least 90 days, if not a year.
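A rough sketch of the rotation and retention rules in this post, again in Python and under stated assumptions: the post does not say which days the full and differential backups run, so the hypothetical `backup_kind` assumes fulls on the 1st of the month and differentials on Sundays, and the equally hypothetical `prune` applies the "at least 90 days" rule.

```python
from datetime import date, timedelta


def backup_kind(day: date) -> str:
    """Which backup runs on `day` under an assumed calendar.

    Assumption (the post gives no days): full on the 1st of the month,
    differential on Sundays, incremental on every other day.
    """
    if day.day == 1:
        return "full"          # complete copy of every file system
    if day.weekday() == 6:     # Monday is 0, Sunday is 6
        return "differential"  # changes since the last full backup
    return "incremental"       # changes since the previous backup of any kind


def prune(backup_days: list[date], today: date, keep_days: int = 90) -> list[date]:
    """Return the backup dates to keep under an "at least keep_days" policy.

    Deleting everything older than the cutoff could orphan kept
    incrementals and differentials from the full backup they build on,
    so the cutoff is moved back to the last full backup on or before it.
    """
    cutoff = today - timedelta(days=keep_days)
    fulls = [d for d in backup_days if d <= cutoff and backup_kind(d) == "full"]
    if not fulls:
        return sorted(backup_days)  # no safe base to cut at yet: keep everything
    base = max(fulls)
    return sorted(d for d in backup_days if d >= base)


# One backup per day from January 1 through May 1, 2016.
days = [date(2016, 1, 1) + timedelta(days=i) for i in range(122)]
kept = prune(days, today=date(2016, 5, 15))
print(kept[0], len(kept))  # 2016-02-01 91
```

The design choice worth noting is that retention is measured back to a restorable base: with a cutoff of February 15, the February 1 full backup and everything after it are kept, so every retained backup can still be restored.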



  • Members

It is as mysterious to us as it is to you, but we are being as transparent as we can, based on the pre-forensic info we have. We have always tried to be forthright with you guys. The site has more and better support right now than it ever has. We don't take an outage or loss of data lightly, and we will take all steps to prevent it in the future. We are, however, just people like you.


  • Members

Dendy... please don't ever say the phrase "Pre Forensic" again. It creeps me out.

I'm just glad the server is back up. And frankly... there were too many repeats happening in "Music Association Challenge". I'm personally grateful for the freshen-up.


  • Members

If you have multiple simultaneous drive failures, it is almost certainly a power problem frying the controller boards. In that case the data is still on your discs and can be recovered for a few hundred bucks.

If some numpty formatted them, you will get 99% back with a surface scan of the discs.

If somebody is telling you otherwise, I would question those responsible; something does not sound right.


Archived

This topic is now archived and is closed to further replies.
