About silos and hierarchies in software development

Disclaimer: this is NOT a rant about people. In most of the situations all devs I know want to deliver a good work. This is a rant about organisations imposing such structures calling themselves “an agile company”.

To give you some context: a digital product, sold online as a subscription. The application in my scenario is the usual admin portal to manage customers, get an overview of their payment situation, like balance, etc.
The application is built and maintained by a frontend team. The team is using the GraphQL API built and maintained by a backend team. Every team has a team lead and over all of them is at least one other lead. (Of course there are also a lot of other middle-management, etc.) 

Some time ago somebody must have decided to include in the API a field called “total” containing the balance of the customer so that it can be displayed in the portal. Obviously I cannot know what happened (I’m just a user of this product), but fact is, this total was implemented as an integer. Do you see the problem? We are talking about money displayed on the website, about a balance which is almost never an integer. This small mistake made the whole feature unusable.

Point 1: Devs implement technical requests instead of improving the product 
I don’t know if the developer who implemented this made an error by not thinking about what this total should represent or he/she simple didn’t had the experience in e-commerce but it is not my point. My point is that this person was obviously not involved in the discussion about this feature, why it is needed, what is the benefit. I can see it with my spiritual eyes how this feature became turned in code: The team lead, software lead (xyz lead) decided that this task has to be done. The task didn’t referred to the customer benefit, it stripped everything down to “include a new property called total having as value the sum of some other numbers”. I can see it because I had a lot of meetings like this. I delivered a string to the other team and this string was sometimes a URL and sometimes a name. But I did this in a company which didn’t called himself agile. 

Point 2: No chance for feedback, no chance for commitment for the product
Again: I wasn’t there as this feature was requested and built, I just can imagine that this is what it happened, but it really doesn’t matter. It is not about a special company or about special people but about the ability to deliver features or only just some lines of code sold as a product. Back to my “total”: this code was reviewed, integrated, deployed to development, then to some in-between stages and finally to production. NOBODY on this whole chain asked himself if the new field included in a public(!) API is implemented as it should. And I would bet that nobody from the frontend team was asked to review the API to see if their needs can be fulfilled.

Point 3: Power play, information hiding makes teams slow artificially (and kills innovation and the wish to commit themselves to the product they build) 
If this structure wouldn’t be built on power and position and titles then the first person observing the error could have talked to the very first developer in the team responsible for the feature to correct it. They could have changed it in a few minutes (this was the first person noticing the error ergo nobody was using it yet) and everybody would have been happy. But not if you have leads of every kind who must be involved in everything (because this is why they have their position, isn’t it?) Then somebody young and enthusiastic wanting to deliver a good product would create a JIRA ticket. In a week or two this ticket will be eventually discussed (by the leads of course)  and analyzed and it will eventually moved forward in the backlog – or not. It doesn’t matter anyway because the frontend team had a deadline and they had to solve their problem somehow.

Epilogue: the culture of “talk only to the leads” bans the cooperation between teams
at this moment I did finally understood the reason behind of another annoying behavior in the admin panel: the balance is calculated in the frontend and is equal with the sum of the shown items. I needed some time to discover this and was always wondering WTF… Now I can see what happened: the total in the API was not a total (only the integer part of the balance) and the ticket had to be finished so that somebody had this idea to create a total adding the values from the displayed items. Unfortunately this was a very short-sighted idea because it only works if you have less then 25 payments, the default number of items pro page. Or you can use the calculator app to add the single totals on every page…

All this is on so many levels wrong! For every involved person is a lose-lose situation. 

What do you think? Is this only me arguing for better “habitat for devs” or it is time that this kind of structures disappear.

Continuous Delivery Is a Journey – Part 3

In the first part I described why I think that continuous delivery is important for an adequate developer experience and in the second part I draw a rough picture about how we implemented it in a 5-teams big product development. Now it is time to discuss about the big impact – and the biggest benefits – regarding the development of the product itself.

Why do more and more companies, technical and non-technical people want to change towards an agile organisation? Maybe because the decision makers have understood that waterfall is rarely purposeful? There are a lot of motives – beside the rather wrong dumb one “because everybody else does this” – and I think there are two intertwined reasons for this: the speed at wich the digital world changes and the ever increasing complexity of the businesses we try to automate.

Companies/people have finally started to accept that they don’t know what their customer need. They have started to feel that the customer – also the market – has become more and more demanding regarding the quality of the solutions they get. This means that until Skynet is not born (sorry, I couldn’t resist 😁) we oftware developers, product owners, UX designers, etc. have to decide which solution would be the best to solve the problems in that specific business and we have to decide fast.

We have to deliver fast, get feedback fast, learn and adapt the consequences even faster. We have to do all this without down times, without breaking the existing features and – for most of us very important: without getting a heart attack every time we deploy to production.

IMHO These are the most important reasons why every product development team should invest in CI/CD.

The last missing piece of the jigsaw which allows us to deliver the features fast (respectively continuously) without disturbing anybody and without losing the control how and when features are released is called feature toggle.

feature toggle[1] (also feature switchfeature flagfeature flipperconditional feature, etc.) is a technique in software development that attempts to provide an alternative to maintaining multiple source-code branches (known as feature branches), such that a feature can be tested even before it is completed and ready for release. Feature toggle is used to hide, enable or disable the feature during run time. For example, during the development process, a developer can enable the feature for testing and disable it for other users.[2]

Wikipedia

The concept is really simple: one feature should be hidden until somebody, something decides that it is allowed to be used.

function useNewFeature(featureId) {
  const e = document.getElementById(featureId);
  const feat = config.getFeature(featureId);
  if(!feat.isEnabled)
    e.style.display = 'none';
  else
    e.style.display = 'block';
}

As you see, implementing feature toggles is really that simple. To adopt this concept will need some effort though:

  • Strive for only one toggle (one if) per feature. At the beginning it will be hard or even impossible to achieve this but it is a very important to define this as a middle-term goal. Having only one toggle per feature means the code is highly decoupled and very good structured.
  • Place this (main) toggle at the entry point (a button, a new form, a new API endpoint) the first interaction point with the user (person or machine) and in disabled state it should hide this entry point.
  • The enabled state of the toggle should lead to new services (in micro service world), new arguments or to new functions, all of them implementing the behavior for feature.enabled == true. This will lead to code duplication: yes, this is totally ok. I look at it as a very careful refactoring without changing the initial code. Implementing a new feature should not break or eliminate existing features. The tests too (all kind of them) should be organized similarly: in different files, duplicated versions, implemented for each state.
the different states of the toggle lead to clearly separated paths
  • Through the toggle you gain real freedom to make mistakes or just the wrong feature. At the same time you can always enable the feature and show it the product owner or the stake holders. This means a feedback loop is reduced to minimum.
  • This freedom has a price of course: after the feature is implemented, the feedback is collected, the decision for enabling the feature was made, after all this the source code must be cleaned up: all code for feature.enabled == false must be removed. This is why it is so important to create the different paths so that the risk of introducing a bug is virtually zero. We want to reduce workload not increase it.
  • Toggles don’t have to be temporary, business toggles (i.e. some premium features or “maintenance mode”) can stay forever. It is important to define beforehand what kind of toggle will be needed because the business toggles will be always part of your source code. The default value for this kind of toggles should be false.
  • The default value for the temporary toggles should be true and it should be deactivated on production, activated during the development.

One advice regarding the tooling: start small, with a config map in kubernetes, a database table, a json file somewhere will suffice. Later on new requirements will appear, like notifying the client UI when a toggle changes or allowing the product owner to decide, when a feature will be enabled. That will be the moment to think about next steps but for the moment it is more important to adopt this workflow, adopt this mindset of discipline to keep the source code clean, learn the techniques how to organize the code basis and ENJOY HAVING THE CONTROL over the impact of deployments, feature decisions, stress!

That’s it, I shared all of my thoughts regarding this subject: your journey of delivering continuously can start or continued 😉) now.

p.s. It is time for the one sentence about feature branches:
Feature toggles will never work with feature branches. Period. This means you have to decide: move to trunk based development or forget continuous development.

p.p.s. For the most languages exist feature toggle libraries, frameworks, even platforms, it is not necessary to write a new one. There are libraries for different complexities how the state can be calculated (like account state, persons, roles, time settings), just pick one.

Update:

As pointed out by Gergely on Twitter, on Martin Fowlers blog is a very good article describing extensively the different feature toggles and the power of this technique: Feature Toggles (aka Feature Flags)

Continuous Delivery Is a Journey – Part 2

After describing the context a little bit in part one it is time to look at the single steps the source code must pass in order to be delivered to the customers. (I’m sorry, but it is a quite long part 🙄)

The very first step starts with pushing all the current commits to master (if you work with feature branches you will probably encounter a new level of self-made complexity which I don’t intend to discuss about).

This action triggers the first checks and quality gates like licence validation and unit tests. If all checks are “green” the new version of the software will be saved to the repository manager and will be tagged as “latest”.

Successful push leads to a new version of my service/pkg/docker image

At this moment the continuous integration is done but the features are far from being used by any customer. I have a first feedback that I didn’t brake any tests or other basic constraints but that’s all because nobody can use the features, it is not deployed anywhere yet.

Well let Jenkins execute the next step: deployment to the Kubernetes environment called integration (a.k.a. development)

Continuous delivery to the first environment including the execution of first acceptance tests

At this moment all my changes are tested if they can work together with the currently integrated features developed by my colleagues and if the new features are evolving in the right direction (or are done and ready for acceptance).

This is not bad, but what if I want to be sure that I didn’t break the “platform”, what if I don’t want to disturb everybody else working on the same product because I made some mistakes – but I still want to be a human ergo be able to make mistakes 😉? This means that my behavioral and structure changes introduced by my commits should be tested before they land on integration.

These must be obviously a different set of tests. They should test if the whole system (composed by a few microservices each having it’s own data persistence, one or more UI-Apps) is working as expected, is resilient, is secure, etc.

At this point came the power of Kubernetes (k8s) and ksonnet as a huge help. Having k8s in place (and having the infrastructure as code) it is almost a no-brainer to set up a new environment to wire up the single systems in isolation and execute the system tests against it. This needs not only the k8s part as code but also the resources deployed and running on it. With ksonnet can be every service, deployment, ingress configuration (manages external access to the services in a cluster), or config map defined and configured as code. ksonnet not only supports to deploy to different environments but offers also the possibility to compare these. There are a lot of tools offering these possibilities, it is not only ksonnet. It is important to choose the fitting tool and is even more important to invest the time and effort to configure everything as code. This is a must-have in order to achieve a real automation and continuous deployment!

Good developer experience also means simplified continuous deployment

I will not include here any ksonnet examples, they have a great documentation. What is important to realize is the opportunity offered with such an approach: if everything is code then every change can be checked in. Everything checked in can be included observed/monitored, can trigger pipelines and/or events, can be reverted, can be commented – and the feature that helped us in our solution – can be tagged.

What happens in a continuous delivery? Some change in VCS triggers pipeline, the fitting version of the source code is loaded (either as source code like ksonett files or as package or docker image), the configured quality gate checks are verified (runtime environment is wired up, the specs with the referenced version are executed) and in case of success the artifact will be tagged as “thumbs up” and promoted to the next environment. We started do this manually to gather enough experience to automate the process.

Deploy manually the latest resources from integration to the review stage

If you have all this working you have finished the part with the biggest effort. Now it is time to automate and generalize the single steps. After the Continuous Integration the only changes will occur in the ksonnet repo (all other source code changes are done before), which is called here deployment repo.

Roll out, test and eventually roll back the system ready for review

I think, this post is already to long. The next part ( I think, it will be the last one) I would like to write about the last essential method, how to deploy to production, without annoying anybody (no secret here, this is why feature toggles were invented for 😉) and about some open questions or decisions what we encountered on our journey.

Every graphic is realized with plantuml thank you very much!

to be continued …