Single point of failrule
(lemmy.blahaj.zone)
Be sure to follow the rule before you head out.
Rule: You must post before you leave.
Thank you, I am fucking sick of people passing this comic around in relation to the CrowdStrike failure. CrowdStrike is a $90bn corporation, they're not some little guy doing a thankless task. They had all the resources and expertise required to avoid this happening, they just didn't give a shit. They want to move fast and break things, and that's exactly what they did.
Off topic but that "move fast and break things" line from Zuck irks me quite a bit. Probably because it's such a bratty corporate billionaire thing to say
It's only ok to break things internally. Never push broken code to the customer.
It works in most software because the cost of failure is cheap. It's especially cheap if you can make that failure happen early in the development process. If anything, I think the industry should be leaning into this even harder. Iterate quickly and cause failures in the staging environment.
This does not work out so well for things like cars, rockets, and medicine. And, yes, software that runs goddamn everything.
The problem is that this strategy is becoming more popular in physical product development, for things that we’ve known how to make for decades.
You don’t need to move fast and break things when you’re making a car. We’ve been making cars on assembly lines for a hundred years; innovation is going to be small.
Same thing for rockets. We put men on the moon 50 years ago, for fuck’s sake. Rocketry is a well understood engineering field at this point. We know exactly how much force needs to be exerted, we know exactly the stresses involved. You don’t need to rapidly iterate anything. Sit down, do the math, build the thing to spec, and it fucking works: see ULA, ESA, and NASA, who have, all in the past few years, built rockets and had them successfully complete missions on the first launch without blowing up a bunch to “gather data”.
Move fast and break things is for companies that have crackhead leadership who can’t make up their mind about what a product should do. It should have no place in real world engineering, where you know what your product is going to be subject to.
*Looks at SpaceX.* Iterating quickly and breaking things can work for rockets, it just depends on the development phase and the type of project. I wouldn’t “iterate quickly” with manned, extraterrestrial, or important cargo missions.
But it can be used for the early development of rockets. SpaceX had a deep well of proven technology to draw upon during the development of the Falcon rocket. They put the tech together and iterated quickly to get a final product.
Blue Origin and the Artemis program both use traditional techniques with similar proven technologies. I’d argue they aren’t as successful or were never intended to be successful (Artemis is just a jobs program for shuttle contractors at this point).
Just ask NASA what they think about breaking things in unmanned vs manned programs.
Better yet, ask NASA, ULA, and ESA about how they needed to move fast and break things for their rockets that worked flawlessly on the first launch while actually fulfilling a mission.
I understand what you're saying about failing early. That's a great strategy, but it's meant to apply to production software. As in, your product shouldn't even start up if critical parts are missing or misconfigured. The software should be capable of testing its configuration and failing when anything is wrong, before it breaks anything else. During the development process, failing early also speeds up iteration cycles, but again, only when that behavior is built into the software itself and ships with it.
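To make the "shouldn't even start up" part concrete, here's a rough sketch of a startup check in Python. The setting names (`DATABASE_URL`, `WORKER_COUNT`) and the `AppConfig`, `ConfigError`, `load_config`, and `main` names are made up for illustration; a real app would validate whatever it actually depends on.

```python
import sys
from dataclasses import dataclass


class ConfigError(Exception):
    """Raised when required configuration is missing or invalid."""


@dataclass(frozen=True)
class AppConfig:
    # Hypothetical settings, chosen only for this example.
    database_url: str
    worker_count: int


def load_config(env: dict) -> AppConfig:
    """Validate every setting up front and report all problems at once."""
    problems = []

    database_url = env.get("DATABASE_URL", "")
    if not database_url:
        problems.append("DATABASE_URL is missing")

    raw_workers = env.get("WORKER_COUNT", "")
    worker_count = 0
    try:
        worker_count = int(raw_workers)
        if worker_count < 1:
            problems.append("WORKER_COUNT must be at least 1")
    except ValueError:
        problems.append(f"WORKER_COUNT is not an integer: {raw_workers!r}")

    if problems:
        raise ConfigError("; ".join(problems))
    return AppConfig(database_url=database_url, worker_count=worker_count)


def main(env: dict) -> int:
    """Return a process exit code: 0 on success, 1 if startup is refused."""
    try:
        load_config(env)
    except ConfigError as exc:
        # Fail before touching anything else: no connections, no writes.
        print(f"refusing to start: {exc}", file=sys.stderr)
        return 1
    # Only past this point would the application begin real work.
    return 0
```

You'd call it as something like `sys.exit(main(dict(os.environ)))` (after `import os`), so a bad deploy dies at boot with a readable message instead of limping along half-configured.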
"Fail early" can also mean your product stops working and shuts down as soon as its environment changes in a disruptive way; for example, if you're using a database connection, and the database goes down, and you can't recover or reconnect, you shut down. Or you go into read-only mode until your retries finally succeed. That's a form of "fail early" where "early" means "as soon as possible after a problem arises".
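The database case could look roughly like this. It's a sketch, not any particular library's API: `Service`, `Mode`, `on_connection_lost`, and the `connect` callable are all hypothetical, and the retry policy (fixed count, fixed delay) is an assumption picked to keep the example short.

```python
import time
from enum import Enum


class Mode(Enum):
    READ_WRITE = "read_write"
    READ_ONLY = "read_only"
    SHUT_DOWN = "shut_down"


class Service:
    def __init__(self, connect, max_retries=3, retry_delay=0.0, sleep=time.sleep):
        # `connect` is any callable that raises ConnectionError on failure.
        self._connect = connect
        self._max_retries = max_retries
        self._retry_delay = retry_delay
        self._sleep = sleep
        self.mode = Mode.READ_WRITE

    def on_connection_lost(self) -> Mode:
        """Stop accepting writes immediately, then try to reconnect."""
        self.mode = Mode.READ_ONLY
        for _ in range(self._max_retries):
            try:
                self._connect()
            except ConnectionError:
                self._sleep(self._retry_delay)
                continue
            # Reconnected: resume normal operation.
            self.mode = Mode.READ_WRITE
            return self.mode
        # Retries exhausted: give up rather than keep running broken.
        self.mode = Mode.SHUT_DOWN
        return self.mode

    def write(self, record):
        """Reject writes unless the service is fully healthy."""
        if self.mode is not Mode.READ_WRITE:
            raise RuntimeError(f"writes rejected in mode {self.mode.value}")
        return record
```

The point is the ordering: the service drops to read-only the moment the problem is noticed, before the first retry, so nothing gets written against a connection that's already gone.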
You don't want your development processes to move fast and break things. If your dev and staging environments are constantly broken because you moved fast and broke things, you will ship broken software. The more bugs there are in there due to your development practices, the more bugs you'll ship, in a linear relationship.
QA and controlled development iterations with good quality practices and good understanding by all team members is how you prevent these problems. You avoid shipping bugs by detecting failures early, not by making mistakes early.
That's an easy thing to say when you haven't laid off a ton of your workforce. Might want to be careful operating like that, the way tech has been cutting jobs lately.