Rick York wrote: The MTBF of SSDs and HDs are far, far lower than that of a controller.
Well, that's reassuring. But it still happened, with a controller that was just a few months old and had only seen light use.
Rick York wrote: I consider it utterly absurd to use that as a reason to avoid using RAID.
The extra steps in setup and recovery, given the results I ultimately got, make it absurd to me. I'm glad it serves you well, but you only need to lose an entire array once to question its value. YMMV, and that's the beauty of it - you can make your own choices. I base mine on my experience. I've always loved the idea, and it took me a long time to finally go ahead and do it, but this episode made me back out of the whole thing rather quickly.
Rick York wrote: Also - your issue applies only to a striped RAID array.
It was striped and mirrored. I wouldn't trust striping alone, since that actually increases the chances of failure. I'm OK with striped and mirrored, however, since the odds that enough drives fail at the same time to take out both copies of the data are much smaller. Of course, I didn't count on the controller failing, since common sense (and you yourself) says, "the MTBF of SSDs and HDs are far, far lower than that of a controller". OOPS.
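(If you want to put rough numbers on that intuition, here's a minimal back-of-the-envelope sketch in Python. The 3% annual drive-failure rate and the four-drive layout are made-up illustrations, it assumes failures are independent, and it ignores rebuild windows and controller failures entirely - which is exactly the part that bit me.)

```python
# Rough odds of losing a striped-only (RAID 0) array vs. a striped+mirrored
# (RAID 10) array. Assumes independent drive failures; the 3% annual failure
# rate is an arbitrary illustration, and rebuild windows and controller
# failures are ignored.
p = 0.03        # assumed annual failure probability of a single drive
drives = 4      # RAID 0 stripes across all 4; RAID 10 uses 2 mirrored pairs

# RAID 0: any single drive failing loses the whole array.
raid0_loss = 1 - (1 - p) ** drives

# RAID 10: data is lost only if BOTH drives of the same mirrored pair fail.
pairs = drives // 2
raid10_loss = 1 - (1 - p ** 2) ** pairs

print(f"RAID 0  (4 drives): ~{raid0_loss:.2%} chance of array loss per year")
print(f"RAID 10 (4 drives): ~{raid10_loss:.2%} chance of array loss per year")
```

Striping alone multiplies the exposure (roughly 11% a year in this toy example), while mirroring shrinks it to a fraction of a percent - and none of it helps when the controller itself dies.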
I have RAID, and it's been (mostly) good to me.
I've had two HDD failures on RAID 5 arrays, and in both of those cases I could continue working uninterrupted while ordering a new HDD and replacing the dead drive - so in that sense they were brilliant. (I'm now on my third RAID 5 NAS box.)
But ... and it's a big "but", I cannot lie ... I had a failure of software, or the controller, or something, and lost all the partitions on the RAID array when one disk failed and upset the controller - the data was not recoverable, so 11GB vanished in a moment: RAID is not a backup, or at least not a backup you can rely upon. (Fortunately, I don't - I have air gapped backups as well.)
It's also worth remembering that ransomware will find and corrupt all connected devices - so if your RAID is online and you get hit, it will get screwed just as well as your OS and data disks. An air gap is the only real safety net there!
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
Quote: It's also worth remembering that ransomware will find and corrupt all connected devices - so if your RAID is online and you get hit, it will get screwed just as well as your OS and data disks. An air gap is the only real safety net there!
Helped clean up after 3 of those. Yields tears as big as horse turds.
>64
Some days the dragon wins. Suck it up.
Re: ransomware -- Only the AOMEI Backupper software has the password to the account with write access to the backup file server (Synology). All other access is via read-only accounts, or by logging into the server's web UI and manually entering the admin password. This seems like it should protect the server from ransomware corruption. Or have I missed something?
(And, yes, I need an air gapped copy of the backup server, eventually.)
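One cheap sanity check, if the backup share is also mounted on the workstations, is to periodically confirm that the ordinary (non-backup) accounts really cannot write to it. A minimal sketch in Python; the mount point is a placeholder, not anything from my actual setup:

```python
import os
import uuid

# Placeholder mount point of the backup share as seen by a normal,
# read-only account; substitute your own path.
SHARE = r"\\synology\backups" if os.name == "nt" else "/mnt/backups"

def share_is_read_only(share: str) -> bool:
    """Return True if this account cannot create a file on the share."""
    probe = os.path.join(share, f"write-probe-{uuid.uuid4().hex}.tmp")
    try:
        with open(probe, "w") as fh:
            fh.write("this write should have been rejected")
    except OSError:
        return True          # write refused - what we want to see
    os.remove(probe)         # clean up if the write unexpectedly succeeded
    return False

if __name__ == "__main__":
    if share_is_read_only(SHARE):
        print("OK: this account cannot write to the backup share.")
    else:
        print("WARNING: this account CAN write to the backup share!")
```

It only checks the permissions as seen from the machine it runs on, of course; it says nothing about the credentials the backup software itself holds.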
"Fairy tales do not tell children the dragons exist. Children already know that dragons exist. Fairy tales tell children the dragons can be killed."
- G.K. Chesterton
I do my work on systems running as virtual machines: two W10 and one W11. I have 7 and 8 in mothballs if I want to test something. Oh, yeah, I have an XP system somewhere.
I back the VM folders up to my NAS weekly and then shut the NAS off. I also back up from the NAS to removable drives. I can take the VMs anywhere, install the hypervisor on a system, and I'm back online. I have a DR kit set up with software, keys, passwords and such, including hardware. The NAS is TrueNAS running on a decommissioned workstation.
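The weekly sync itself is just a one-liner wrapped in a script. A minimal sketch, assuming rsync is available and the NAS share is already mounted (the paths are placeholders, not my real ones):

```python
import subprocess
from pathlib import Path

# Placeholder paths for illustration only; substitute your own.
VM_DIR = Path("/home/me/vms")                  # local folder holding the VM images
NAS_MOUNT = Path("/mnt/truenas/vm-backups")    # NAS share, already mounted

def sync_vms() -> None:
    """Mirror the VM folder onto the NAS share with rsync."""
    if not NAS_MOUNT.is_dir():
        raise SystemExit(f"NAS share not mounted at {NAS_MOUNT}")
    subprocess.run(
        [
            "rsync",
            "-a",           # archive mode: preserve times, permissions, etc.
            "--delete",     # make the NAS copy an exact mirror of the source
            f"{VM_DIR}/",   # trailing slash: copy contents, not the folder itself
            str(NAS_MOUNT),
        ],
        check=True,
    )

if __name__ == "__main__":
    sync_vms()
    print("Sync complete; the NAS can be shut down now.")
```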
Murphy is out there.................... waiting.
>64
Some days the dragon wins. Suck it up.
This post reminded me that I'm currently living dangerously.
I've been a work-from-home solo developer for a small company for the last 15 years. One of my responsibilities is maintaining the company server. This box sits behind me in my home office and hums 24/7 doing all kinds of stuff like hosting a couple dozen web applications/services including our main domain name, and holding virtually all of our projects/code.
It has been a good machine (aside from an HDD data drive failure several years ago) and still performs adequately. The problem is, every component, with the exception of two 6 y/o SSDs, is now around 13 years old and has been running pretty much nonstop for that whole time. I actually have plans to migrate most of its responsibilities to the cloud but haven't decided yet what to do with the code. I like the idea of cloud-based, but I just don't trust it.
For the sake of not sounding like a complete idiot, I do perform regular weekly backups, which involve powering up my laptop and letting it synchronize the folders on the server, after which it is powered off. I also do daily backups and offsite storage for the databases.
So it all comes down to how much you are willing to lose, or how much trouble you are willing to go through to get everything working again. In my scenario, I might lose 6 days of work if the data drive on the server gave out. If the system drive gave out, I could replace and rebuild in a few hours and have services back in a day or two. In a previous bad experience, the data drive (a 4 y/o spinner) gave out completely. I had things back up quickly, but lost a few days of development and a few days of recovery. Not catastrophic, but not fun either... also not an option if downtime is costing you money! Good luck, and thanks for letting me vent!
"Go forth into the source" - Neal Morse
"Hope is contagious"
Every few years, I buy a new machine (and another external drive), and anything of interest gets migrated to it. Then the old machine slowly slips from the mind, waiting to be reawakened. I also rent a dumpster sometimes.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
A friend of mine once again asked me for help with an old machine that he took over when I retired it. Now it is overheating, even when idling (and even if the OS is not yet started and you press F2 to enter BIOS setup).
We looked back and found that I had retired it in 2010; I don't remember when I bought it. We decided that adding more fans and applying fresh thermal paste between the CPU and its heatsink wasn't worth the cost. The PC deserves to be laid to rest.
Sometimes, the fear of spending money on a new PC can go too far.
Member 14968771 wrote: especially in case of OS failure
An OS failure would be due to an update, so you would need to keep a backup from before each update. Or don't update.
But perhaps you mean due to hardware failure. Then you should be proactive rather than reactive and replace parts before they can fail: don't wait until a hard drive fails, but instead track usage and replace the drive before it reaches its maximum lifetime.
The same is true for other parts in the system (and a whole new computer counts as one such replacement).
There are limits to that, of course, as parts eventually become unavailable, so you must adapt.
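Tracking drive usage is straightforward with smartmontools. A minimal sketch that reads each drive's power-on hours with smartctl's JSON output mode (smartmontools 7 or later assumed) and flags anything past a chosen threshold; the device list and the five-year limit are illustrative assumptions:

```python
import json
import subprocess

# Illustrative assumptions: which devices to check and when to worry.
DEVICES = ["/dev/sda", "/dev/sdb"]
MAX_HOURS = 5 * 365 * 24   # flag drives with more than ~5 years of power-on time

def power_on_hours(device: str) -> int:
    """Ask smartctl (JSON mode) for the drive's power-on hours, or -1 on failure."""
    result = subprocess.run(
        ["smartctl", "-j", "-A", device],
        capture_output=True, text=True,
        check=False,   # smartctl uses nonzero exit codes for mere warnings
    )
    try:
        data = json.loads(result.stdout)
    except json.JSONDecodeError:
        return -1
    return data.get("power_on_time", {}).get("hours", -1)

if __name__ == "__main__":
    for dev in DEVICES:
        hours = power_on_hours(dev)
        if hours < 0:
            print(f"{dev}: could not read power-on hours")
        elif hours > MAX_HOURS:
            print(f"{dev}: {hours} h powered on - consider replacing proactively")
        else:
            print(f"{dev}: {hours} h powered on - within the chosen limit")
```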
And it works - I now have both "ORIGINAL" and "GRIFF" to put on my vehicles ...
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
It didn't go boom, but it didn't fly either...
"If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization." ― Gerald Weinberg
It went boom. Stage separation failed and the flight termination systems (FTS) were activated.
It was a "rapid unscheduled disassembly"
You can't make this stuff up. Unless you work for SpaceX.
To err is human. Fortune favors the monsters.
No, it was "deferred success".
(A few years ago, that was how teachers in my area were required to describe failed exam results. Seriously.)
Also: "The joy was not complete." One of my favorites to use (thankfully not that frequent) with child rearing.
RUD (Rapid Unscheduled Disassembly) has long been an 'inside joke' term in the space industry. SpaceX didn't create the term, and is very far from the first to have it apply to their operations.
"That the rocket got off the launch pad was a major success."
Participation trophies all around!
To err is human. Fortune favors the monsters.
OT: You are an embedded guru, aren't you?
I wouldn't call myself a guru. I'm still trying to get my head around ARM development, and get outside the IoT/Arduino end of things.
I'm told I'm a fast learner though, and that's what's on my plate right now.
Still, I may be able to help with your question (I saw your post downthread) just because what I do intersects with embedded principles all the time.
To err is human. Fortune favors the monsters.
Does that make it an "award winning rocket"?🙄
If nothing else, it was extremely successful in removing concrete and dispersing it widely...
I can’t say I have been in touch with everything taking place in the field, but I haven’t heard of failed rocket launches for quite some time.
SpaceX certainly has them.
There was at least one other failure of sorts. Another country perhaps? They attempted to launch a satellite (spy?) and it failed to separate or failed to activate. Something like that.
But, but it's supposed to be simple rocket science!!
CI/CD = Continuous Impediment/Continuous Despair