The "if it isn’t broken, don’t fix it" Mentality
Today I realised that every time I hear the phrase "if it isn’t broken, don’t fix it" I feel kind of uncomfortable. I fully understand the notion: the process yields the results we want, so why bother?
As it is being said, I detect the "whatever works" attitude emanating from the person saying it. No concept of continuous improvement. No interest in reflecting. Not even 60 seconds invested to consider whether the process is efficient or could be improved. In most cases it’s almost a canned response to any comment referring to the efficiency of a process that is considered to be working.
Maybe the process breaks once per month, causing Fred to stop what he’s doing (see Gerald Weinberg’s rule of thumb) and fix it. But, who cares, it works, and Fred doesn’t mind. Paper-based filing systems, typewriters and horse and cart worked but this didn’t exclude them all from being significantly improved by someone with the right mindset.
Applying the "if it isn’t broken, don’t fix it" mentality to everything is naïve; you end up with lots of suboptimal processes. Based on the fact that you never get it right the first time, processes need to evolve, and for this to happen a culture of reflection and improvement must exist.
To consider something to be either working or broken is a simple model which doesn’t leave much room for improvement. Considering the degree to which something is working is a better model that puts your mind in the right place. How reliable is the process? Can it execute quicker? Can the execution cost be reduced? We need to think about quality, not just if by some vague definition the process can be considered to be working.
I recently finished reading The Toyota Way. In the book, Jeffrey Liker explains the management philosophy behind Toyota. Interestingly, one of the principles within Toyota is to “build a culture of stopping to fix problems, to get quality right the first time”. Liker goes on to explain the apparent productivity paradox:
Toyota management say that it’s OK to run at less than 100% of the time, even when it’s possible to run full time, yet Toyota is regularly ranked amongst the most productive plants in the auto industry. Why?
Investing time and effort into quality from the start sounds like the slower and more expensive method to most people. Let’s just get something working and improve it later. The problem is, too many people don’t improve it later. The "if it isn’t broken, don’t fix it" syndrome starts instead. Liker explains how Toyota does it:
Because Toyota learned long ago that solving quality problems at the source saves time and money downstream. By continually surfacing problems and fixing them as they occur, you eliminate waste, your productivity soars and competitors who are running assembly lines flat-out and letting problems accumulate get left in the dust.
The issue of letting quality problems accumulate isn’t exclusive to the auto industry. In the software industry it results in poorly written systems being built on top of other poorly written systems that, while completed sooner than a higher quality system might have been, require increasing amounts of human attention as the accumulation of quality problems becomes too big to ignore; too big to be considered working. Needless to say, the cost of running these systems starts to increase too.
So from now on, "if it isn’t broken, don’t fix it" is banned. Stop using it. It makes you sound like you don’t care even enough to look for improvement. Things are not broken or working; they are broken or working to a certain degree and may be exposing opportunities to be improved and have their business value increased. It’s also a mistake to assume that business value can’t be increased from existing work (or processes) rather than business value just coming from new work.
You can be the one who thoughtfully creates work to a high standard, always looking to improve yourself and increase the business value in suboptimal existing work. Alternatively, be the one who doesn’t care enough to even look for improvement opportunities and would rather just churn out first versions and move on – leaving a legacy of low quality behind you. It’s your choice.
If it isn’t broken, fix it anyway.


I agree things are not black and white; it’s not all working or all broken.
So it’s not a choice between never fixing and fixing all the time; the choice requires using your brain in context.
I have two problems with your post.
First, you begin by pointing to something that might be able to be made more efficient or "being able to be improved" (very broad statement) as being broken. You finish by talking about quality. Very different subjects, and you segue into quality by talking about something being broken, and the phrase "But, who cares, it works, and Fred doesn’t mind." No, in that case it IS broken and SHOULD be fixed, another very different situation.
On quality, something is not higher quality because it was "improved". You have metrics to start with, and if it meets those metrics it is the highest quality needed for that product, the purpose it serves, and the stakeholders. For example, I consider a BMW to be higher quality than, say, a Chevy Sprint. I can probably improve the Sprint to the quality of the BMW, but that would negate what the Sprint is put into the market for: to provide a low cost vehicle. Even though it costs less to design in quality, quality still costs.
If a "thing" does not meet the quality standards that were agreed upon (you have to have standards, as Toyota does, or you would not know that the "thing" is not meeting them) it is broken. Absolutely you must stop and fix it. Better yet, plan in the quality before you begin to produce, as Toyota does, and you have to stop and fix quality issues less often; it’s much cheaper to plan it in than control it in.
Second, as far as efficiency goes, just because you believe that something can be made more efficient doesn’t necessarily mean it should be. For example, you have product abc that is produced by processes a, b and c. Processes a and c can each produce enough parts to assemble 1000 abc’s. You estimate that you can increase a and c’s efficiency by 5% each by adjusting the tooling. Process b can produce enough parts to assemble 700 abc’s, no more, no less, because of a technical constraint that cannot be controlled out without driving costs through the roof, making it unprofitable to produce above the 700 units. Anyway, you are meeting market demand for product abc with your 700 unit output.
Should you spend the money to make a and c more efficient? Why? You can still only produce 700 abc’s, so you will not produce more product to sell, and the extra capacity in a and c will only result in either people standing around for a longer duration (not that there is anything inherently wrong with that, but we are already only operating at 70% capacity of those two resources, so it is better to invest your time and money in something else), or excess inventory of parts a and c. It is not good to drive up costs and inventory charges in the name of efficiency if it isn’t going to drive up sales or revenues! Remember, we are constrained by the market as well.
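To put numbers on the abc example, here is a minimal Python sketch. The function name and the way capacities are represented are illustrative assumptions; only the figures (1000, 700, 5%, demand of 700) come from the example above. It shows that sellable output is capped by the slowest process and by the market, so speeding up a and c changes nothing.

```python
# Hypothetical sketch of the abc example. The function name and the
# dictionary layout are assumptions made for illustration only.

def sellable_output(capacities, demand):
    """Units of abc that can be both assembled and sold per period."""
    bottleneck = min(capacities.values())  # the slowest process limits assembly
    return min(bottleneck, demand)         # the market limits what can be sold


demand = 700

before = {"a": 1000, "b": 700, "c": 1000}
# a and c improved by 5% each; b stays fixed by its technical constraint.
after = {"a": 1050, "b": 700, "c": 1050}

print(sellable_output(before, demand))  # 700
print(sellable_output(after, demand))   # 700
```

Under these assumptions the only changes that could raise output are ones that lift process b and the market demand together.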
Anyway, my point is that sometimes it really is better to leave well enough alone. Sometimes the product really is good enough for the purpose it serves.
I suggest you read "The Goal" by Eli Goldratt about optimization and theory of constraints.
In web discussions "if it isn’t broken, don’t fix it" may not be a real system maintenance strategy but more like a smart-ass phrase to throw in when someone broke something because of an upgrade or other change. The comments happen after the fact and very often sound like outsiders laughing at some unlucky guy. At least this is how it often seems.
I’m sure that some/many have a real maintenance strategy like this and it may be very useful to keep things working reliably. It all depends on what the system is that is being "fixed" (or not fixed).
Joe,
Thanks for the reply. I’m pleased the post caused a couple of people to voice their opinion as that’s why I bother to write.
As you mention (and I agree), things must have metrics in order for us to measure any potential improvement. Sometimes these metrics are obvious, like "it requires human intervention once per month" means an automation metric of 30/31. However, it may be the case that human intervention for that 1 day per month is cheaper than the cost of improving that metric to 31/31. On the other hand, it may not be.
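As a rough illustration of that trade-off, here is a small Python sketch. Every cost figure in it is a made-up assumption, as is the function name; the only number taken from the text is the 30/31 automation metric.

```python
# All cost figures below are invented for illustration.

def months_to_break_even(automation_cost, manual_cost_per_month):
    """Months until a one-off automation cost is repaid by saved manual work."""
    if manual_cost_per_month <= 0:
        raise ValueError("manual_cost_per_month must be positive")
    return automation_cost / manual_cost_per_month


automation_metric = 30 / 31  # one day of human intervention in a 31-day month

# Assumed: the monthly intervention costs 400 in staff time.
cheap_fix = months_to_break_even(automation_cost=2000, manual_cost_per_month=400)
costly_fix = months_to_break_even(automation_cost=48000, manual_cost_per_month=400)

print(cheap_fix)   # 5.0 months: probably worth doing
print(costly_fix)  # 120.0 months: probably cheaper to keep intervening
```

The point is not the specific numbers but that the comparison takes a minute to make once the metric exists.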
Until you get the grey matter working on your behalf, you don’t reliably know if something can be improved under the current constraints. My underlying point is that people need to be responsible and use their brain, looking at whether something can be improved or not (and of course being pragmatic enough to know if the degree of improvement is worth it for the business).
I’m not advocating that everything should be improved, just that people should have an attitude where they want to find improvement opportunities. This is the attitude that people should have towards code in terms of refactoring.
If, for example, you were to tell me "if it ain’t broken" with regard to a process that you had measured and found no viable improvements, I would be as happy as a dog with two tails. If you rummaged through a canned response bag for the phrase as an excuse to not use your brain, I’d write a post like this one :)
Thanks for the book suggestion. Incidentally, I’m reading it at the moment and it’s good to get a positive review.
If you’re not actively trying to find problems that may affect your system, you’re not doing your job. Some problems accumulate to a breaking point, and by the time it hits, it’s too late. It’s like driving on the road and not taking into account the other drivers. When the accident occurs, you can rationalize all you want. But the reality is you weren’t aware of what’s going on outside of your immediate surroundings.
I used to run Checkstyle/FindBugs/PMD on a big Java project. It should have improved quality; however, when you change a lot of code you might introduce new bugs. That happened to me. Was it worth it? No, I got nothing for improving the code and got yelled at because of the bug I introduced. Don’t fix it if it ain’t broken.
raveman,
Sadly it’s not always possible to make changes to a system without accepting a certain amount of risk that your changes may break something else. This is because such systems have not been designed with change in mind (i.e. decoupled, testable and following solid design principles) and are not supported with test suites to verify their current behaviour automatically.
When changing a system (for example, to fix a severe bug) carries a high risk of regression, the first and most obvious improvement you can make is to reduce that risk.
Deploy a test version that allows you to manually verify your changes for any regressions before they go live and get you yelled at. Write integration tests that capture the expected behaviour of parts of the system, with a view to building a safety net around the application so that future changes don’t require as much manual effort in verifying no regressions have been introduced.
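As a minimal sketch of such a safety net, here is a test in Python that simply pins down what a piece of legacy code does today, before anyone touches it. The function `legacy_price_cents` is a hypothetical stand-in invented for this example, not code from any real system.

```python
import unittest


def legacy_price_cents(quantity, unit_price_cents):
    """Hypothetical stand-in for untested legacy code we are afraid to change."""
    total = quantity * unit_price_cents
    if quantity >= 10:
        total = total * 90 // 100  # existing bulk discount, documented nowhere
    return total


class LegacyPriceBehaviour(unittest.TestCase):
    """Records current behaviour so a later refactoring can be checked against it."""

    def test_small_order_has_no_discount(self):
        self.assertEqual(legacy_price_cents(2, 500), 1000)

    def test_bulk_order_gets_ten_percent_off(self):
        self.assertEqual(legacy_price_cents(10, 500), 4500)


# Run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LegacyPriceBehaviour)
result = unittest.TextTestRunner().run(suite)
```

Tests like these do not say whether the behaviour is right, only that it has not changed, which is exactly what you need to know after a refactoring.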
Ironically, if you think you shouldn’t improve the system because any attempts to improve it usually break it, you probably have more potential improvements than expected.
It’s impossible to unit test a big system that has no unit tests when nobody cares about your unit tests. Plus, how can you unit test something you don’t understand? Understanding 10 big classes just to change an if to a switch or something like that is pointless. So unit testing does not always work.
I’m just saying that you should think twice before you do any refactoring. Tests help a lot, but they are not always an option.
The thing I broke was because of someone else using some magic code (not even in Java, but in a FreeMarker template). Only good integration tests could have found that bug. Plus you don’t get testers after refactoring (why test if nothing has changed?) and automatic UI testing is still not used.
While I don’t have much problem with most of the substance of your post, I do have a problem with the premise (at least of the title). You said:
“I fully understand the notion: the process yields the results we want, so why bother?”
But that’s not how I hear or use the phrase (usually the more vernacular, "If it ain’t broke, don’t fix it.") I use it in response to proposals for changing something with no thought as to what you actually hope to improve by that change.
Want to refactor that long method? Fine, but why? Are you trying to do something that would be easier if that method were broken up into smaller methods? Are you trying to understand what the method does, but it’s really incomprehensible? Those sound like good reasons. Did you happen to run across it while perusing some old legacy code that hasn’t been touched in five years and you just hate long methods? Then it ain’t broke.
Want to upgrade to Tomcat 7? Fine, but why? Do you do frequent web application reloads and you’re tired of the memory leaks? Sounds promising. Are you just embarrassed that you’re still on Tomcat 4.1.x and you’ve heard that it’s going to be cool? Then it ain’t broke.
raveman,
We’re kind of getting away from the point of the post, but unit (as well as integration) testing could still provide value in the situation you describe.
But I agree; you certainly do need to think before refactoring and this is my point exactly. We need to think before throwing out a canned "if it ain’t broke" phrase as there may be underlying problems that just haven’t had time to surface yet (any change in your example).
Jason,
Don’t get me wrong, I wouldn’t want anyone simply making changes for no apparent reason. By using the word "improve" I was hoping to imply that the consequence of making the said change was already established as a positive one.
In such cases I get frustrated when an actual improvement won’t even be listened to and "if it ain’t broke" enters the debate.
I’m really just railing against people who use the phrase in the opposite way to the one you mention. When the phrase is used as a means of preventing *any* change regardless of value I think it shows an aversion to change, which I’m sure we agree is undesirable.
Thanks for your thoughts.
I somewhat disagree. The systems you were using as examples were broken, just like a car that is making funny noises. It might be running now, but that doesn’t mean it is not broken. Thus "if it’s not broke, don’t fix it" does not apply to some of your examples. However, it does apply to systems that were at one time a model of efficiency but that technology has made obsolete.
As for doing it right the first time: most of the time it is not even possible. For example, a company in the past might have had an excellent and efficient filing system based on paper. Then the 1980s came, and the excellent and efficient filing system became obsolete overnight. So the company invested in a DOS-based filing system. Then came Windows, and DOS was killed, so the company needed to change its system again. Then came smartphones, and who knows what the future might bring.
My point is that it is not possible to make a system right from the start, and what is efficient and excellent now might be burdensome and inefficient in the future. The creation of systems is an evolutionary process. And new technology does not always bring more efficiency (before Windows, businesses never had the problem of workers surfing the internet, for example).
Hey Martin, I really like your website. Your blogs are extremely interesting and obviously causing a stir.
Keep it up.
P.S. I would probably take Toyota out as the example of high quality… since their marketing s* storm they are no longer kings of quality.
I actually think Toyota is the best example here, especially after everything that happened with their braking problems. With Toyota management’s "don’t fix it" mentality, Toyota got itself into a lot of trouble! If only someone at Toyota had been allowed to just “fix it” before people started dying…
In my experience with people that use the “if it isn’t broke don’t fix it” phrase, it usually was a case like Toyota’s. There usually was a real problem but it just didn’t manifest itself to the point of being un-ignorable. Later, those problems caused us a lot more effort and stress to fix. Fixing ahead is a lot easier than fixing a process that everyone relies on after it has stopped working…