Tl;dr: Continuous integration and delivery are not about a pipeline; they are about trust, psychological safety, a common goal and real teamwork.
What is needed for CI/CD – and how do we achieve it?
No feature branches but trunk-based development and feature toggles: feature branches mean discontinuous development. CI/CD works with only one temporary branch: the local copy on your machine getting integrated at the moment you want to push. “No feature branches” also means pushing your changes at least once a day.
A feeling of safety to commit and push your code: trust in yourself and trust in your environment to help you if you fall – or steady you to not fall at all.
Quality gates to keep the customer safe
Observing and reducing the outcome of your work (as a team, of course)
Resilience: accept that errors will happen and make sure that they are not fatal, that you can live with them. This also means being aware of the risk involved in your changes.
What happens in the team, in the team-work:
It enables a growing maturity, autonomy due to fast feedback, failing fast and early
It makes us real team-workers, “we fail together, we succeed together”
It leads to better programmers due to the need for XP practices and the need to know how to deliver backwards compatible software
It has an impact on the architecture and the design (see Accelerate)
Psychological safety: eliminates the fear of coding, of making decisions, of having code reviews
It gives a common goal, valuable for everybody: customers, devs, testers, PO, company
It makes everybody involved happy because of much faster feedback from customers instead of the feedback of the PO => it allows us to validate the assumption that the new feature is valuable
It drives new ideas and new capabilities because it allows experiments
Sets the right priorities: not to jump to code but to think about how to deliver new capabilities, to solve problems (sometimes even by deleting code)
How to start:
Agree upon setting CI/CD as a goal for the whole team: focus on how to get there not on the reasons why it cannot work out
Consider all requirements (safety net, coding and review practices, creating the pipeline and the quality gates) as necessary steps and work on them, one after another
Agree upon team rules making CI/CD as a team responsibility (monitoring errors, fixing them, flickering tests, processes to improve leaks in the safety net, blameless post-mortems)
Learn to give and get feedback in a professional manner (“I am not my work”), for example by reading the book Agile Conversations and/or practicing it in the meetup
– – – – –
This bullet-point list was born during this year’s CITCON, a great un-conference on continuous improvement. I am aware that the points can trigger questions and a need for explanations – and I would be happy to answer them 🙂
In the first part I described why I think that continuous delivery is important for an adequate developer experience, and in the second part I drew a rough picture of how we implemented it in a product development organisation of 5 teams. Now it is time to discuss the big impact – and the biggest benefits – regarding the development of the product itself.
Why do more and more companies, technical and non-technical people, want to change towards an agile organisation? Maybe because the decision makers have understood that waterfall is rarely purposeful? There are a lot of motives – besides the rather dumb one, “because everybody else does this” – and I think there are two intertwined reasons for this: the speed at which the digital world changes and the ever increasing complexity of the businesses we try to automate.
Companies/people have finally started to accept that they don’t know what their customers need. They have started to feel that the customer – also the market – has become more and more demanding regarding the quality of the solutions they get. This means that as long as Skynet is not born (sorry, I couldn’t resist 😁) we software developers, product owners, UX designers, etc. have to decide which solution would be the best to solve the problems in that specific business, and we have to decide fast.
We have to deliver fast, get feedback fast, learn and adapt to the consequences even faster. We have to do all this without down times, without breaking the existing features and – for most of us very important: without getting a heart attack every time we deploy to production.
IMHO these are the most important reasons why every product development team should invest in CI/CD.
The last missing piece of the jigsaw, which allows us to deliver the features fast (or rather, continuously) without disturbing anybody and without losing control over how and when features are released, is called a feature toggle.
A feature toggle[1] (also feature switch, feature flag, feature flipper, conditional feature, etc.) is a technique in software development that attempts to provide an alternative to maintaining multiple source-code branches (known as feature branches), such that a feature can be tested even before it is completed and ready for release. Feature toggle is used to hide, enable or disable the feature during run time. For example, during the development process, a developer can enable the feature for testing and disable it for other users.[2]
Wikipedia
The concept is really simple: a feature should be hidden until somebody or something decides that it is allowed to be used.
// Shows or hides the entry point of a feature, depending on its toggle.
// `config` is the application's toggle configuration and is assumed to exist.
function useNewFeature(featureId) {
  const element = document.getElementById(featureId);
  const feature = config.getFeature(featureId);
  element.style.display = feature.isEnabled ? 'block' : 'none';
}
As you see, implementing feature toggles is really that simple. Adopting this concept takes some effort, though:
Strive for only one toggle (one if) per feature. At the beginning it will be hard or even impossible to achieve this, but it is very important to define it as a mid-term goal. Having only one toggle per feature means the code is highly decoupled and very well structured.
Place this (main) toggle at the entry point (a button, a new form, a new API endpoint), the first interaction point with the user (person or machine). In the disabled state it should hide this entry point.
The enabled state of the toggle should lead to new services (in the microservice world), new arguments or new functions, all of them implementing the behavior for feature.enabled == true. This will lead to code duplication: yes, this is totally ok. I look at it as a very careful refactoring without changing the initial code. Implementing a new feature should not break or eliminate existing features. The tests (all kinds of them) should be organized similarly: in different files, as duplicated versions, implemented for each state.
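The three points above can be sketched roughly like this. It is only an illustration under assumptions: `toggles`, `calculatePriceV1`, `calculatePriceV2`, `handlePriceRequest` and the discount rule are all hypothetical names and values, not taken from our product. The single `if` sits at the entry point, the existing path stays untouched, and the new behavior lives in its own function.

```javascript
// Hypothetical in-memory toggle state; a real system would read it from configuration.
const toggles = { newPricing: false };

// Existing behavior: left untouched while the new feature is being developed.
function calculatePriceV1(order) {
  return order.quantity * order.unitPrice;
}

// New behavior: a separate function (deliberate duplication), used only when the toggle is on.
function calculatePriceV2(order) {
  const total = order.quantity * order.unitPrice;
  // Assumed example rule: orders of 10 or more items get a flat discount of 5.
  return order.quantity >= 10 ? total - 5 : total;
}

// Entry point: the one and only `if` for this feature.
function handlePriceRequest(order) {
  return toggles.newPricing ? calculatePriceV2(order) : calculatePriceV1(order);
}
```

Removing the feature later, or removing the old path after the decision, then means deleting one function and one branch.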
Through the toggle you gain real freedom to make mistakes or simply to build the wrong feature. At the same time you can always enable the feature and show it to the product owner or the stakeholders. This means the feedback loop is reduced to a minimum.
This freedom has a price, of course: once the feature is implemented, the feedback collected and the decision to enable the feature made, the source code must be cleaned up: all code for feature.enabled == false must be removed. This is why it is so important to keep the different paths separate, so that the risk of introducing a bug during the cleanup is virtually zero. We want to reduce workload, not increase it.
Toggles don’t have to be temporary: business toggles (e.g. some premium features or a “maintenance mode”) can stay forever. It is important to define beforehand what kind of toggle will be needed, because business toggles will always be part of your source code. The default value for this kind of toggle should be false.
The default value for temporary toggles should be true; the toggle should be deactivated in production and activated during development.
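A minimal sketch of these two kinds of toggles and their defaults could look like the following. The toggle names, the `kind` field and the shape of the `overrides` object are assumptions made up for this example.

```javascript
// Hypothetical toggle definitions; `kind` and `defaultValue` follow the rules described above.
const toggleDefinitions = {
  premiumReports: { kind: 'business', defaultValue: false },
  newCheckout: { kind: 'temporary', defaultValue: true },
};

// Hypothetical per-environment overrides: the temporary toggle is switched off in production.
const overrides = {
  production: { newCheckout: false },
  development: {},
};

// Returns the override for the environment if there is one, otherwise the default value.
function isEnabled(name, environment) {
  const definition = toggleDefinitions[name];
  if (!definition) {
    throw new Error(`Unknown toggle: ${name}`);
  }
  const environmentOverrides = overrides[environment] ?? {};
  return name in environmentOverrides
    ? environmentOverrides[name]
    : definition.defaultValue;
}
```

With this layout, cleaning up a temporary toggle means deleting its definition and its override; a business toggle simply stays.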
One piece of advice regarding the tooling: start small. A config map in Kubernetes, a database table or a JSON file somewhere will suffice. Later on new requirements will appear, like notifying the client UI when a toggle changes or allowing the product owner to decide when a feature will be enabled. That will be the moment to think about next steps, but for now it is more important to adopt this workflow, adopt this mindset of discipline to keep the source code clean, learn the techniques for organizing the code base and ENJOY HAVING THE CONTROL over the impact of deployments, feature decisions, stress!
That’s it, I have shared all of my thoughts on this subject: your journey of delivering continuously can start (or continue 😉) now.
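To show how small such a start can be, here is a sketch of a toggle store built from JSON text. The file content, the toggle names and `createToggleStore` are hypothetical; how the text is loaded (file, config map, database) is deliberately left open.

```javascript
// Hypothetical content of a toggles.json file.
const exampleJson = '{ "newCheckout": true, "betaSearch": false }';

// Builds a small lookup object from JSON text.
function createToggleStore(jsonText) {
  const state = JSON.parse(jsonText);
  return {
    // Assumption for this sketch: toggles missing from the file count as disabled.
    isEnabled: (name) => state[name] === true,
  };
}

const store = createToggleStore(exampleJson);
```

Calling `store.isEnabled('newCheckout')` then answers the question at the entry point of the feature.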
p.s. It is time for the one sentence about feature branches: Feature toggles will never work with feature branches. Period. This means you have to decide: move to trunk based development or forget continuous development.
p.p.s. For most languages there are feature toggle libraries, frameworks, even platforms; it is not necessary to write a new one. There are libraries for different levels of complexity in how the state is calculated (like account state, persons, roles, time settings), just pick one.
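To give an idea of what “calculating the state” means, here is a hand-written sketch based on roles and a start date. It does not reproduce the API of any real library; the rule format, `rules` and `isEnabledFor` are assumptions for illustration only.

```javascript
// Hypothetical rule set; real libraries have their own formats.
const rules = {
  betaSearch: {
    roles: ['tester', 'product-owner'],
    enabledFrom: '2019-04-01T00:00:00Z',
  },
};

// The toggle is on when the user has one of the allowed roles and the start date has passed.
function isEnabledFor(name, user, now) {
  const rule = rules[name];
  if (!rule) {
    return false; // unknown toggles count as disabled in this sketch
  }
  const roleMatches = rule.roles.includes(user.role);
  const started = now.getTime() >= Date.parse(rule.enabledFrom);
  return roleMatches && started;
}
```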
Update:
As pointed out by Gergely on Twitter, there is a very good article on Martin Fowler’s blog describing extensively the different kinds of feature toggles and the power of this technique: Feature Toggles (aka Feature Flags)
After describing the context a little bit in part one, it is time to look at the individual steps the source code must pass in order to be delivered to the customers. (I’m sorry, but it is quite a long part 🙄)
The very first step starts with pushing all the current commits to master (if you work with feature branches you will probably encounter a new level of self-made complexity which I don’t intend to discuss).
I think, if you agree to having CD this way (commit ->…->production), then you have implicitly enforced trunk-based development.
This scenario triggered a totally new view on what we could achieve – good and bad 😉 – and made the responsibility on our shoulders palpable. – Krisztina Hirth (@YellowBrickC), March 11, 2019
This action triggers the first checks and quality gates like licence validation and unit tests. If all checks are “green” the new version of the software will be saved to the repository manager and will be tagged as “latest”.
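As a rough sketch of this first gate (not our Jenkins configuration): every check is run, and only if all of them are green is the build tagged as “latest”. `runQualityGates` and the shape of the check objects are hypothetical.

```javascript
// Runs every check and tags the build only when all of them pass.
// Each check is assumed to be an object of the form { name, run }, where run() returns a boolean.
function runQualityGates(checks) {
  const failed = checks
    .filter((check) => !check.run())
    .map((check) => check.name);
  const green = failed.length === 0;
  return { green, tag: green ? 'latest' : null, failed };
}

// Usage with two stand-in checks; real ones would call the licence validator and the test runner.
const result = runQualityGates([
  { name: 'licence validation', run: () => true },
  { name: 'unit tests', run: () => true },
]);
```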
At this moment the continuous integration is done, but the features are far from being used by any customer. I have a first feedback that I didn’t break any tests or other basic constraints, but that’s all: nobody can use the features because they are not deployed anywhere yet.
We let Jenkins execute the next step: deployment to the Kubernetes environment called integration (a.k.a. development)
At this moment all my changes are tested to see whether they work together with the currently integrated features developed by my colleagues and whether the new features are evolving in the right direction (or are done and ready for acceptance).
This is not bad, but what if I want to be sure that I didn’t break the “platform”, what if I don’t want to disturb everybody else working on the same product because I made some mistakes – but I still want to be human, ergo be able to make mistakes 😉? This means that the behavioral and structural changes introduced by my commits should be tested before they land on integration.
These must obviously be a different set of tests. They should test whether the whole system (composed of a few microservices, each having its own data persistence, and one or more UI apps) is working as expected, is resilient, is secure, etc.
At this point the power of Kubernetes (k8s) and ksonnet came as a huge help. With k8s in place (and with the infrastructure as code) it is almost a no-brainer to set up a new environment, wire up the individual systems in isolation and execute the system tests against it. This requires not only the k8s part as code but also the resources deployed and running on it. With ksonnet, every service, deployment, ingress configuration (which manages external access to the services in a cluster) or config map can be defined and configured as code. ksonnet not only supports deploying to different environments but also offers the possibility to compare them. Many tools offer these possibilities, not only ksonnet. It is important to choose the fitting tool, and even more important to invest the time and effort to configure everything as code. This is a must-have in order to achieve real automation and continuous deployment!
I will not include any ksonnet examples here; they have great documentation. What is important to realize is the opportunity such an approach offers: if everything is code, then every change can be checked in. Everything checked in can be observed/monitored, can trigger pipelines and/or events, can be reverted, can be commented and – the feature that helped us in our solution – can be tagged.
What happens in continuous delivery? Some change in the VCS triggers a pipeline, the fitting version of the source code is loaded (either as source code like ksonnet files, or as a package or Docker image), the configured quality gate checks are verified (the runtime environment is wired up, the specs with the referenced version are executed) and in case of success the artifact is tagged as “thumbs up” and promoted to the next environment. We started doing this manually to gather enough experience to automate the process.
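The promotion step can be sketched as a small function. This is not our pipeline code: the environment names other than integration and production, the artifact shape and `promote` are assumptions for illustration.

```javascript
// Assumed order of environments; 'system-test' is a made-up name for the isolated environment.
const environments = ['system-test', 'integration', 'production'];

// Tags the artifact as "thumbs up" and moves it to the next environment if the gate passed.
function promote(artifact, gatePassed) {
  if (!gatePassed) {
    return artifact; // a red gate leaves the artifact where it is, untagged
  }
  const index = environments.indexOf(artifact.environment);
  if (index === -1) {
    throw new Error(`Unknown environment: ${artifact.environment}`);
  }
  // The last environment has no successor, so the artifact stays there.
  const next = environments[index + 1] ?? artifact.environment;
  return {
    ...artifact,
    tags: [...artifact.tags, 'thumbs up'],
    environment: next,
  };
}
```

Because the function returns a new object instead of changing the old one, every promotion leaves a traceable state behind, in the same spirit as tagging in the VCS.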
If you have all this working, you have finished the part with the biggest effort. Now it is time to automate and generalize the individual steps. After the continuous integration the only changes will occur in the ksonnet repo (all other source code changes are done before), which is called the deployment repo here.
I think this post is already too long. In the next part (I think it will be the last one) I would like to write about the last essential method, how to deploy to production without annoying anybody (no secret here, this is what feature toggles were invented for 😉), and about some open questions or decisions we encountered on our journey.
Every graphic was created with PlantUML, thank you very much!