Smart People Make Smart Mistakes
Topic: Martha's Diary
Smart people don't make silly mistakes. That is good.
Smart people make smart mistakes that do not destroy their work immediately. Such mistakes have a softened impact at first and ruin everything later - with accumulated effect.
A couple of weeks ago, a developer in my project submitted a pull request with code that would override the delayed payment flow and charge the user's card instantly.
The new code was clean. It did not break any old tests - they were all green. The old payment flow still existed with all its automated tests.
However, the good old delayed payment procedure would never start - the new flow would jump in first and steal the show.
It would be quite a failure if that code got into production.
Another developer and I (there were four backenders in the project) immediately yelled "No-o!!" and expressed a lot of frustration about this code. We wondered about the reasons behind it.
The author pointed to a ticket with a vague description saying something like: "Sometimes our users are not charged… So we want them to be charged if this flag is set".
The story could be read as if there were a certain scenario - one that our product manager did not know about, but wanted us to fix - in which the usual flow did not have the desired effect.
Or it could be read literally: charge the user if the flag is set. This is what he had done, the author said.
To complete his mission, he even reached out to the product manager with a question: "Hey, do I understand correctly that you want payments to be taken always if the flag is set?"
The answer was: "Yes, the payment should always be taken if the flag is set".
"You see?", the author of the code told us. "This is exactly what I have done".
"Yes, of course", we said and declined the pull request. Later, we had a call with the product owner, discussed the problem, and things somewhat cleared up. Of course, a straightforward solution that flattens the complex flow and turns five consecutive, asynchronous charge attempts into one could not be the right path.
We declined that code, preventing it from being released. Production was saved.
Then I started thinking, asking myself questions.
Why did the issue happen? Was it miscommunication? Of course. I would even say, two or three sessions of good, diligent miscommunication. Both the developer and the product manager assumed they had heard agreement from the other. In reality they were talking about different ideas. It is called ‘confirmation bias’.
What else caused the problem? Lack of knowledge? Yes, sure. The developer did not know all code paths and the true purpose of the flag in question.
Lack of internal communication? The author of the code could have discussed the user story with the team before working on it.
Okay, the good thing is that the code did not get into production. Two of us in the team were attentive enough to catch the issue and prevent it from being released.
What if we had been out sick? Or not attentive enough? What if no one cared about other developers' commits?
If that had happened, we would have had another layer of protection. The problem would be caught by quality assurance on the test server. Our QA engineer Alexandra was very attentive; she knew the usual payment flow, and she could figure out that the new feature disabled it.
Yes, this is how the issue would be caught. Definitely. Alexandra has always been our final barrier; she could protect the project.
Unless she was sick while the project owner was in a hurry. Then he would ask someone else to test the build.
Or if Alexandra left the project. Then no one would catch the problem.
What would happen then?
Users would be charged prematurely and would not be happy; the managers would complain; the project owner would complain; we would revert the release, blaming one another.
That would not be a perfect situation, but it would be much better than having a less visible bug, one that does not appear frequently and affects only a small share of users. Such bugs are harder to catch, track, reproduce, and fix - and they accumulate much more easily over time.
Alright, moving on. Under what conditions would the procedure be more foolproof?
The problem would not have appeared if the product manager had written a better ticket or given more accurate answers to our questions. I believe he simply did not know the correct answer. Most likely, he acted as an intermediary, retelling the story from the project owner's words.
So, the issue would not have appeared if the project owner had explained the feature and answered questions himself? Without the intermediary?
Oh, yes. For sure. The probability of confirmation bias would be very low.
Okay, then under what conditions would we have more problems?
More intermediary people, less quality assurance. Sounds like a perfect formula.
It's a paradox. It looks obvious that each project usually needs more qualified engineers than managers. But often, project owners prefer to spend money on the latter.
Why? Because while a project benefits from engineering, a project owner benefits from having managers - they give him some air to breathe.
Smart project owners think they are smart enough to hire smart product managers.
They get away with that as long as the product managers are balanced out by savvy engineers or responsible people of other kinds.
As soon as the balance is upset, the project starts accumulating issues.
Hey, I am getting too dark. Looks like I have been fantasizing for some time and have drawn a scary picture. Thank God, it never happens in my projects!
Check out my article on the challenge of distinguishing skilled developers from less experienced ones in the business world.