Continuous Delivery Is a Journey – Part 2

After describing the context a bit in part one, it is time to look at the individual steps the source code must pass through in order to be delivered to customers. (Sorry, this is quite a long part 🙄)

The very first step starts with pushing all the current commits to master (if you work with feature branches you will probably encounter a new level of self-made complexity, which I don't intend to discuss here).

This action triggers the first checks and quality gates, like licence validation and unit tests. If all checks are "green", the new version of the software is saved to the repository manager and tagged as "latest".

A successful push leads to a new version of my service/package/Docker image
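
To make this stage concrete, here is a minimal sketch of what the first pipeline stage boils down to; the registry URL and the check scripts are invented for illustration, not our actual setup:

```python
import subprocess

# Hypothetical registry URL, for illustration only
IMAGE = "registry.example.com/my-service"

def run(cmd: str) -> None:
    """Run a shell command and abort the pipeline if it fails."""
    subprocess.run(cmd, shell=True, check=True)

# quality gates: licence validation and unit tests; any failure stops the push
run("./check-licenses.sh")    # hypothetical check script
run("./run-unit-tests.sh")    # hypothetical test runner

# all gates are green: publish the new version and tag it as "latest"
run(f"docker build -t {IMAGE}:latest .")
run(f"docker push {IMAGE}:latest")
```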

At this moment continuous integration is done, but the features are far from being usable by any customer. I have first feedback that I didn't break any tests or other basic constraints, but that's all – nobody can use the new features, because nothing is deployed anywhere yet.

Well then, let Jenkins execute the next step: deployment to the Kubernetes environment called integration (a.k.a. development).

Continuous delivery to the first environment, including the execution of the first acceptance tests

At this point my changes are tested to see whether they work together with the features my colleagues have already integrated, and whether the new features are evolving in the right direction (or are done and ready for acceptance).
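
A rough sketch of what this Jenkins step amounts to; the namespace, the manifest directory, and the test runner are invented for illustration:

```python
import subprocess

def run(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

# roll out the "latest" version to the integration environment
run("kubectl --namespace integration apply -f deployment/")   # hypothetical manifest dir
run("kubectl --namespace integration rollout status deployment/my-service")

# a deployed system can finally give real feedback: run the first acceptance tests
run("./run-acceptance-tests.sh --target https://my-service.integration.example.com")  # hypothetical
```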

This is not bad, but what if I want to be sure that I didn't break the "platform"? What if I don't want to disturb everybody else working on the same product just because I made some mistakes – while still wanting to be human, ergo allowed to make mistakes 😉? This means that the behavioral and structural changes introduced by my commits should be tested before they land on integration.

This obviously requires a different set of tests. They should verify that the whole system (composed of a few microservices, each with its own data persistence, and one or more UI apps) works as expected, is resilient, is secure, etc.

This is where Kubernetes (k8s) and ksonnet became a huge help. With k8s in place (and the infrastructure defined as code) it is almost a no-brainer to set up a new environment, wire up the individual systems in isolation, and execute the system tests against them. This requires not only the k8s part as code but also the resources deployed and running on it. With ksonnet, every service, deployment, ingress configuration (which manages external access to the services in a cluster), or config map can be defined and configured as code. ksonnet not only supports deploying to different environments but also offers the possibility to compare them.

ksonnet is not the only tool offering these capabilities; there are many. It is important to choose a fitting tool, and it is even more important to invest the time and effort to configure everything as code. This is a must-have in order to achieve real automation and continuous deployment!
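
As an illustration of how cheap such a throwaway environment becomes once everything is code (in our case ksonnet did the heavy lifting, but the idea is tool-agnostic), here is a sketch with invented names:

```python
import subprocess
import uuid

def run(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

# every pipeline run gets its own isolated, disposable environment
namespace = f"systemtest-{uuid.uuid4().hex[:8]}"
run(f"kubectl create namespace {namespace}")
try:
    # wire up the whole system in isolation: services, ingresses, config maps
    run(f"kubectl --namespace {namespace} apply -f system-manifests/")  # hypothetical dir
    # run the system tests (functionality, resilience, security, ...) against it
    run(f"./run-system-tests.sh --namespace {namespace}")               # hypothetical runner
finally:
    # tear everything down again, whether the tests passed or not
    run(f"kubectl delete namespace {namespace}")
```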

Good developer experience also means simplified continuous deployment

I will not include any ksonnet examples here; ksonnet has great documentation. What is important to realize is the opportunity such an approach offers: if everything is code, then every change can be checked in. Everything that is checked in can be observed/monitored, can trigger pipelines and/or events, can be reverted, can be commented on – and, the feature that helped us most in our solution, can be tagged.

What happens in continuous delivery? A change in the VCS triggers a pipeline, the fitting version of the source code is loaded (either as source code, like ksonnet files, or as a package or Docker image), the configured quality gates are verified (the runtime environment is wired up, the specs for the referenced version are executed), and in case of success the artifact is tagged as "thumbs up" and promoted to the next environment. We started doing this manually to gather enough experience to automate the process.
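
Once automated, such a promotion step can be as small as this sketch; the image name and the tag convention are invented for illustration:

```python
import subprocess

IMAGE = "registry.example.com/my-service"  # hypothetical
CANDIDATE = "latest"               # the version that just passed the quality gates
PROMOTED = "integration-approved"  # our "thumbs up" marker; naming is invented

def run(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

# the artifact is never rebuilt: the exact same image is only re-tagged ...
run(f"docker pull {IMAGE}:{CANDIDATE}")
run(f"docker tag {IMAGE}:{CANDIDATE} {IMAGE}:{PROMOTED}")
run(f"docker push {IMAGE}:{PROMOTED}")

# ... and the next environment deploys the promoted tag
run(f"kubectl --namespace review set image deployment/my-service my-service={IMAGE}:{PROMOTED}")
```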

Manually deploying the latest resources from integration to the review stage

If you have all of this working, you have finished the part requiring the biggest effort. Now it is time to automate and generalize the individual steps. After continuous integration, the only changes occur in the ksonnet repo (all other source code changes happened before), which I will call the deployment repo here.

Roll out, test, and if necessary roll back the system ready for review

I think this post is already too long. In the next part (I think it will be the last one) I would like to write about the last essential step – how to deploy to production without annoying anybody (no secret here, this is what feature toggles were invented for 😉) – and about some open questions and decisions we encountered on our journey.

Every graphic was created with PlantUML – thank you very much!

to be continued …

Continuous Delivery Is a Journey – Part 1

Last year my colleagues and I had the pleasure of spending two days with @hamvocke and @diegopeleteiro from @thoughtworks reviewing the platform we had created. One essential part of our discussions was about CI/CD, described like this: "Think about continuous delivery as a journey. Imagine every git push lands on production. This is your target; this is what your CD should enable."

Even if (or maybe because) this thought scared the hell out of us, it became our vision for the next few months, because we saw the great opportunities we would gain by being able to work this way.

Let me describe the context we were working in:

  • Four business teams, 100% self-organized, each owning 1…n Self-contained Systems (SCS), creating microservices running as Docker containers, orchestrated with Kubernetes, hosted on AWS.
  • Boundaries (as in Domain Driven Design) defined based on the business we were in.
  • Each team having full ownership of and full accountability for its part of the business (represented by the SCS).
  • Basic heuristics regarding source code organisation: "share nothing" about business logic, "share everything" about utility functions (in OSS manner), about the experience you gained, the lessons you learned, the mistakes you made.
  • Ensuring code quality and software quality is 100% the team's responsibility.
  • You build it, you run it.
  • One Platform-as-a-Service team to enable these business teams to deliver features fast.
  • Gitlab as VCS, Jenkins as build server, Nexus as package repository.
  • Trunk-based development, no cherry picking, “roll fast forward” over roll back.
Teams: 4 Business Teams + 1 Platform-as-a-Service Team = One Product

The architecture we chose was meant to support our organisation: independent teams, able to work and deliver features fast and independently. They should decide for themselves when and what they deploy. In order to achieve this we defined a few rules regarding inter-system communication. The most important ones are:

  • Event-driven architecture: no synchronous communication, only asynchronous communication via the domain event bus
  • Non-blocking systems: every SCS must remain functional (possibly with reduced functionality) even if all the other systems are down (see the sketch after the next paragraph)

We had only a couple of exceptions to these rules. One example: authentication doesn't really make sense done asynchronously.
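
To illustrate the non-blocking rule, here is a toy sketch (the event shape and all names are invented, this is not our real event bus API): the consuming SCS answers from its own local copy of the data, which is updated via domain events, so reads keep working while every other system is down.

```python
# Toy illustration of the "non-blocking" rule: the SCS answers requests from
# its own data store, which is kept up to date asynchronously by domain events.
local_prices: dict[str, float] = {}  # the SCS's own persistence (here: a dict)

def on_price_changed(event: dict) -> None:
    """Asynchronous handler: keep the local copy up to date."""
    local_prices[event["product_id"]] = event["price"]

def get_price(product_id: str) -> float | None:
    """Synchronous read path: keeps working even if every other system is
    down, at worst with slightly stale data."""
    return local_prices.get(product_id)

# simulate a domain event arriving via the bus ...
on_price_changed({"product_id": "42", "price": 9.99})
# ... and a read that succeeds although the bus may be gone again
assert get_price("42") == 9.99
```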

Working in self-organized, independent teams is a really cool thing. But

with great power there must also come great responsibility

Uncle Ben to his nephew

Even though we set some guardrails regarding the overall architecture, the teams still had ownership of their internal architecture decisions. Since at the beginning we didn't have continuous delivery in place, every team alone was responsible for deploying its systems. Due to the missing automation we were not only predestined to make human errors, we were also blind to the couplings between our services. (And of course we spent a lot of time doing stuff manually instead of letting Jenkins or Gitlab or some other tool do it for us 🤔)

One example: every one of our systems had at least one React app and a GraphQL API as its main communication (read/write/subscribe) channel. One of the best things about GraphQL is the possibility to include the GraphQL schema in the React app and this way ship the API's interface definition inside the client application.

Isn't this cool? It can be. Or it can lead to some very smelly behavior, to really tight coupling, and to the inability to deploy the app and the API independently. And just like my friend @etiennedi says: "If two services cannot be deployed independently they aren't two services!"
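
One way to at least make this coupling visible (not our actual setup, just a naive sketch with invented file names and URL): let the pipeline compare the schema bundled into the app with the schema the deployed API actually serves, and fail on any difference.

```python
import urllib.request

# Naive drift check: fail the pipeline loudly if the schema shipped inside
# the React app differs from the schema the deployed GraphQL API serves.
with open("app/src/schema.graphql") as f:   # hypothetical bundled schema
    bundled = f.read()

with urllib.request.urlopen("https://api.example.com/schema.graphql") as resp:  # hypothetical URL
    deployed = resp.read().decode("utf-8")

if bundled.strip() != deployed.strip():
    raise SystemExit("GraphQL schema drift: app and API can no longer be deployed independently!")
```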

This was the first lesson we learned on this journey: if you don't have a CD pipeline, you will most probably hide the flaws of your design.

One can surely ask, "what is the problem with manual deployment?" Nothing, if you only have a few services to handle, if everyone in your team knows about the couplings and dependencies and is able to execute the very precise deployment steps that minimize downtime. But otherwise? This method doesn't scale, this method is not very professional – and the biggest problem: it ignores the possibilities Kubernetes offers to safely roll out, take down, or scale everything you have built.
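
For contrast, this is roughly what "letting Kubernetes do it" looks like – a sketch using standard kubectl commands, with only the deployment name invented:

```python
import subprocess

def run(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

# safe, observable roll-outs instead of "very precise manual deployment steps"
run("kubectl rollout status deployment/my-service")      # wait until the new version is healthy
run("kubectl scale deployment/my-service --replicas=4")  # scale without downtime
run("kubectl rollout undo deployment/my-service")        # roll back with a single command
```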

Having an automated, standardized CD pipeline as described at the beginning – with the goal that every commit lands on production in a few seconds – forces everyone to think about the consequences of their commits, to write backwards-compatible code, to become a more thoughtful developer.

to be continued …

Colleague-Bashing – Surprise, It Doesn't Help!

At every conference my colleagues and I attend, the topic of team culture pops up sooner or later as the cause of many/all problems. When we describe how we work, we inevitably end up hearing the statement "a self-organized, cross-functional organization without a boss who has THE say is naive and unrealistic." "You surely have a boss somewhere, you just don't know it!" was one of the wildest responses we heard recently, just because the person we were talking to could not process this picture: five self-organized teams, without bosses, without a CTO, without project managers, without any rules or requirements dumped on us from outside, without deadlines we would submit to without objection. Instead, with self-imposed deadlines, with budgets, with freedom and responsibility in equal measure.

I am not talking about the law and what's on paper here: of course we have a CTO, a Head of Development, and a CFO in the company; they just don't decide when, what, and how we do things. They define the frame within which the executive board invests in the product/endeavor, but the rest is done by us: POs, Scrum Masters, and developers, together.

We have been working in this constellation for more than a year, and we can add about six months of lead time before that, until we were able to start this project on the basis of Conway's Law.

“Organizations which design systems […] are constrained to produce designs which are copies of the communication structures of these organizations.” [Wikipedia]

Conversely (and freely translated), this means: "as your organization is, so will your product, your code, be structured." So we worked on our organization. The goal was to build a responsible team that is free to dream, in order to build a great new product without imposed shackles.

We now have this team, and we are living this dream – which of course has its shadows too; after all, life is no bed of roses :). The difference is: these are our problems, and we don't dodge them, we solve them together.

Before you say "that's a stroke of luck, it normally doesn't happen like that," I would disagree. It didn't just happen to us either: we worked on it (for roughly six months) and keep working on it continuously. The trick, the key to this organization, is an open feedback culture.

What does that mean, and how did we achieve it?

  • We learned to give and to receive feedback – yes, that is not so easy. These are the rules:
    • All statements are subjective: "Yesterday, while doing a review, I saw this and that. I find that not good enough/dangerous for the following reasons. I could imagine that doing it this or that way might bring us to our goal faster." As you can see: never say YOU; phrase everything in the first person, without preconceived opinions or assumptions.
    • All statements come with concrete examples. Statements like "I believe" or "I have the feeling" are opinions, not facts. You have to find an example, otherwise the feedback is not "admissible".
    • Feedback is always phrased constructively. It doesn't help to say what is bad; it is much more important to say what to work on: e.g. "I know from my own experience that pair programming is very helpful in such cases."
    • The person receiving feedback must listen without justifying herself. She decides for herself what to do with the feedback. Anyone who wants to improve will try to take the feedback to heart and work on themselves. You don't have to prescribe that!
  • One-on-ones: feedback rounds between two people on a team, initially with the Scrum Master until people got used to the phrasing (at the beginning we laughed the whole idea off), and later just the two of them. Each round goes in only one direction (only one person receives feedback), and e.g. a week later in the other direction. The result: by now we don't need scheduled appointments anymore; we do it automatically whenever there is something to give feedback about.
  • Team feedback: the last stage, following the same rules. It takes place not only between teams but also between groups/guilds, such as POs or architecture owners.

That's it. For over a year now I haven't heard sentences like "it was the idiots from the other team who messed everything up" or "they'll never get it done anyway" or "why should I care, they checked in the bug." And this working atmosphere gives you wings! (sorry for the copyright violation 😉)