Continuous Delivery Is a Journey – Part 2

After describing the context a little bit in part one, it is time to look at the individual steps the source code must pass in order to be delivered to the customers. (I’m sorry, but it is quite a long part 🙄)

The very first step starts with pushing all current commits to master (if you work with feature branches you will probably encounter a new level of self-made complexity, which I don’t intend to discuss here).

This action triggers the first checks and quality gates, like licence validation and unit tests. If all checks are “green”, the new version of the software is saved to the repository manager and tagged as “latest”.

Successful push leads to a new version of my service/pkg/docker image
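To make this first gate a bit more tangible, here is a minimal sketch in Python of what it conceptually does. Everything in it – the make targets, the registry URL, the image name – is a hypothetical stand-in, not our actual Jenkins configuration:

```python
# Hypothetical sketch of the first quality gate; the make targets and
# registry/image names are illustrative assumptions.
import subprocess
import sys

def run_gate(commands: list[list[str]]) -> bool:
    """Run each check; the gate is 'green' only if every command succeeds."""
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            print(f"Gate failed at: {' '.join(cmd)}")
            return False
    return True

if __name__ == "__main__":
    checks = [
        ["make", "license-check"],  # licence validation
        ["make", "unit-tests"],     # unit tests
    ]
    if not run_gate(checks):
        sys.exit(1)
    # All checks green: build the artifact and tag it as "latest"
    # in the repository manager.
    image = "registry.example.com/my-service:latest"
    subprocess.run(["docker", "build", "-t", image, "."], check=True)
    subprocess.run(["docker", "push", image], check=True)
```

The point is only the shape: a list of checks, all of which must pass before the artifact is built, pushed, and tagged as “latest”.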

At this moment the continuous integration is done, but the features are far from being used by any customer. I have first feedback that I didn’t break any tests or other basic constraints, but that’s all: nobody can use the features, because nothing is deployed anywhere yet.

Well, let Jenkins execute the next step: deployment to the Kubernetes environment called integration (a.k.a. development).

Continuous delivery to the first environment including the execution of first acceptance tests
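Conceptually this stage is just two steps: roll the new version out to the integration environment and run the first acceptance tests against it. A minimal sketch, again with hypothetical names (the deployment, namespace, and health URL are assumptions):

```python
# Hypothetical sketch of the deploy-and-verify stage; deployment name,
# namespace, and smoke-test URL are illustrative assumptions.
import subprocess
import urllib.request

def deploy_to_integration(image: str) -> None:
    # Point the existing deployment at the new image and wait for the rollout.
    subprocess.run(
        ["kubectl", "--namespace", "integration",
         "set", "image", "deployment/my-service", f"my-service={image}"],
        check=True,
    )
    subprocess.run(
        ["kubectl", "--namespace", "integration",
         "rollout", "status", "deployment/my-service"],
        check=True,
    )

def acceptance_tests_pass() -> bool:
    # First acceptance test: the freshly deployed service answers on its
    # health endpoint.
    try:
        url = "https://integration.example.com/my-service/health"
        with urllib.request.urlopen(url) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    deploy_to_integration("registry.example.com/my-service:latest")
    assert acceptance_tests_pass()
```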

At this moment my changes are tested to see whether they work together with the features my colleagues have already integrated, and whether the new features are evolving in the right direction (or are done and ready for acceptance).

This is not bad, but what if I want to be sure that I didn’t break the “platform”, what if I don’t want to disturb everybody else working on the same product because I made some mistakes – but I still want to be human, and therefore allowed to make mistakes 😉? This means that the behavioral and structural changes introduced by my commits should be tested before they land on integration.

This must obviously be a different set of tests. They should verify that the whole system (composed of a few microservices, each with its own data persistence, plus one or more UI apps) works as expected, is resilient, is secure, etc.

At this point Kubernetes (k8s) and ksonnet came to our aid. With k8s in place (and the infrastructure defined as code) it is almost a no-brainer to set up a new environment, wire up the individual systems in isolation, and execute the system tests against it. This requires not only the k8s part as code, but also the resources deployed and running on it. With ksonnet, every service, deployment, ingress configuration (which manages external access to the services in a cluster), or config map can be defined and configured as code. ksonnet not only supports deploying to different environments but also offers the possibility to compare them. ksonnet is not the only tool offering these capabilities; there are a lot of them. It is important to choose a tool that fits, and it is even more important to invest the time and effort to configure everything as code. This is a must-have in order to achieve real automation and continuous deployment!

Good developer experience also means simplified continuous deployment

I will not include any ksonnet examples here; the project has great documentation. What is important to realize is the opportunity such an approach offers: if everything is code, then every change can be checked in. Everything checked in can be observed and monitored, can trigger pipelines and/or events, can be reverted, can be commented on – and, the feature that helped us most in our solution, can be tagged.

What happens in a continuous delivery? A change in the VCS triggers the pipeline, the fitting version of the source code is loaded (either as source code like ksonnet files, or as a package or Docker image), the configured quality gates are verified (the runtime environment is wired up, the specs for the referenced version are executed), and in case of success the artifact is tagged as “thumbs up” and promoted to the next environment. We started doing this manually to gather enough experience to automate the process.

Deploy manually the latest resources from integration to the review stage
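Here is a minimal sketch of this promote step in Python. The helper names, tags, and registry URL are illustrative assumptions; the essential idea is that the very same artifact is verified, tagged, and promoted – never rebuilt:

```python
# Hypothetical sketch of the promote step described above; tag names and
# helpers are assumptions, not our actual tooling.
import subprocess

REGISTRY = "registry.example.com"

def quality_gates_pass(environment: str) -> bool:
    """Wire up the environment and run the specs for the referenced version."""
    return subprocess.run(["make", "system-tests", f"ENV={environment}"]).returncode == 0

def promote(image: str, version: str, from_env: str, to_env: str) -> None:
    if not quality_gates_pass(from_env):
        raise RuntimeError(f"{image}:{version} is not ready to leave {from_env}")
    # Tag the very same artifact as "thumbs up" for the next environment;
    # the image is never rebuilt, only re-tagged and re-deployed.
    src = f"{REGISTRY}/{image}:{version}"
    dst = f"{REGISTRY}/{image}:{to_env}-approved"
    subprocess.run(["docker", "tag", src, dst], check=True)
    subprocess.run(["docker", "push", dst], check=True)

if __name__ == "__main__":
    promote("my-service", "1.4.2", from_env="integration", to_env="review")
```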

If you have all this working, you have finished the part requiring the biggest effort. Now it is time to automate and generalize the individual steps. After continuous integration, the only changes occur in the ksonnet repo (all other source code changes happen before that), which I call the deployment repo here.

Roll out, test and eventually roll back the system ready for review

I think this post is already too long. In the next part (I think it will be the last one) I would like to write about the last essential piece: how to deploy to production without annoying anybody (no secret here, this is what feature toggles were invented for 😉), and about some open questions and decisions we encountered on our journey.

Every graphic was created with PlantUML – thank you very much!

to be continued …

Continuous Delivery Is a Journey – Part 1

Last year my colleagues and I had the pleasure of spending two days with @hamvocke and @diegopeleteiro from @thoughtworks, reviewing the platform we had created. One essential part of our discussions was about CI/CD, described like this: “think about continuous delivery as a journey. Imagine every git push lands on production. This is your target, this is what your CD should enable.”

Even if (or maybe because) this thought scared the hell out of us, it became our vision for the next few months, because we saw the great opportunities we would gain if we were able to work this way.

Let me describe the context we were working in:

  • Four business teams, 100% self-organized, owning 1…n Self-contained Systems, creating microservices running as Docker containers orchestrated with Kubernetes, hosted on AWS.
  • Boundaries (as in Domain Driven Design) defined based on the business we were in.
  • Each team having full ownership and full accountability for their part of the business (represented by the SCS).
  • Basic heuristics regarding source code organisation: “share nothing” about business logic; “share everything” about utility functions (in OSS manner), about the experiences you have had, the lessons you have learned, the errors you have made.
  • Ensuring code quality and software quality is 100% the team’s responsibility.
  • You build it, you run it.
  • One Platform-as-a-Service team to enable these business teams to deliver features fast.
  • GitLab as VCS, Jenkins as build server, Nexus as package repository.
  • Trunk-based development, no cherry picking, “roll fast forward” over roll back.
Teams
4 Business Teams + 1 Platform-as-a-Service Team = One Product

The architecture we chose was meant to support our organisation: independent teams able to work and deliver features fast and independently. They should decide for themselves when and what to deploy. To achieve this, we defined a few rules regarding inter-system communication. The most important ones are:

  • Event-driven Architecture: no synchronous communication, only asynchronous communication via the Domain Event Bus
  • Non-blocking systems: every SCS must remain (reduced) functional even if all the other systems are down

We had only a couple of exceptions to these rules. As an example: authentication doesn’t really make sense in an asynchronous manner.
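To picture the non-blocking rule, here is a minimal sketch (with hypothetical event names and fields): each SCS consumes domain events from the bus and keeps a local replica of the foreign data it needs, so it can still answer – with possibly stale data – while the owning system is down:

```python
# Minimal illustration of the "non-blocking SCS" rule: keep a local replica
# fed by domain events instead of calling other systems synchronously.
# Event names and fields are hypothetical.
class ProductCatalogReplica:
    """Local, eventually consistent copy of product data owned elsewhere."""

    def __init__(self) -> None:
        self._products: dict[str, str] = {}

    def on_event(self, event: dict) -> None:
        # Fed asynchronously from the Domain Event Bus.
        if event["type"] == "ProductRenamed":
            self._products[event["product_id"]] = event["name"]

    def product_name(self, product_id: str) -> str:
        # No synchronous call to the catalog system: even if that system is
        # down, we answer from the replica (reduced functionality: the data
        # may be stale or missing).
        return self._products.get(product_id, "unknown product")

replica = ProductCatalogReplica()
replica.on_event({"type": "ProductRenamed", "product_id": "42", "name": "Fancy Chair"})
print(replica.product_name("42"))  # works even if the catalog SCS is down
```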

Working in self-organized, independent teams is a really cool thing. But

with great power there must also come great responsibility

Uncle Ben to his nephew

Even though we set some guardrails regarding the overall architecture, the teams still owned the internal architecture decisions. Since at the beginning we didn’t have continuous delivery in place, every team was solely responsible for deploying its systems. Due to the missing automation we were not only predestined to make human errors, we were also blind to the couplings between our services. (And of course we spent a lot of time doing stuff manually instead of letting Jenkins or GitLab or some other tool do it for us 🤔)

One example: every one of our systems had at least one React app and a GraphQL API as the main communication (read/write/subscribe) channel. One of the best things about GraphQL is the possibility to include the GraphQL schema in the React app, thereby shipping the API interface definition inside the client application.

Isn’t this cool? It can be. Or it can lead to some very smelly behavior, to really tight coupling, and to the inability to deploy the app and the API independently. And just like my friend @etiennedi says: “If two services cannot be deployed independently they aren’t two services!”

This was the first lesson we learned on this journey: if you don’t have a CD pipeline, you will most probably hide the flaws of your design.

One can surely ask, “What is the problem with manual deployment?” Nothing, if you have only a few services to handle, if everyone in your team knows about these couplings and dependencies and is able to execute the very precise deployment steps to minimize the downtime. But otherwise? This method doesn’t scale, this method is not very professional – and the biggest problem: this method ignores the possibilities offered by Kubernetes to safely roll out, take down, or scale everything you have built.

Having an automated, standardized CD pipeline as described at the beginning – with the goal that every commit lands on production in a few seconds – forces everyone to think about the consequences of his or her commit, to write backwards-compatible code, and to become a more considerate developer.
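What does backwards-compatible mean in practice? A tiny, hypothetical illustration (the event shape and field names are my own, not from our platform): consumers should tolerate fields they don’t know, and new fields should be optional with safe defaults, so that old and new versions of a service can run side by side during a rollout.

```python
# Hypothetical illustration of a backwards-compatible change: a new field
# is added as optional with a default, so consumers built against the old
# event shape keep working while old and new producers coexist.
from dataclasses import dataclass

@dataclass
class OrderPlaced:
    order_id: str
    amount: int
    currency: str = "EUR"  # new field, optional with a safe default

def handle(event: dict) -> OrderPlaced:
    # Tolerant reader: ignore unknown fields, fall back for missing ones.
    return OrderPlaced(
        order_id=event["order_id"],
        amount=event["amount"],
        currency=event.get("currency", "EUR"),
    )

print(handle({"order_id": "7", "amount": 100}))                     # old producer
print(handle({"order_id": "8", "amount": 100, "currency": "USD"}))  # new producer
```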

to be continued …

Event Storming with Specifications by Example

Event Storming is a technique defined and refined by Alberto Brandolini (@ziobrando). I fully agree with the statement about this method: Event Storming is, for now, “The smartest approach to collaboration beyond silo boundaries”.

I don’t want to explain what Event Storming is; the concept has been present in the IT world for a few years already, and there are a lot of articles and videos explaining the basics. What I want to emphasize is WHY we need to learn and apply this technique:

The knowledge of the product experts may differ from the assumption of the developers
KanDDDinsky 2018 – Kenny Baas-Schwegler

On 18–19.10.2018 I had the opportunity not only to hear a great talk about Event Storming but also to be part of a two-hour hands-on session, all of this powered by KanDDDinsky (for me the best conference I visited this year) and by @kenny_baas (and @use case driven and @brunoboucard). In the last few years I have participated in a few Event Storming sessions, mostly at community events and twice at cleverbridge, but this time it was different. Maybe ES is like unit testing: you have to practice and reflect on what went well and what must be improved. Anyway, this time I learned and observed a few rules and principles that were new to me, and their effects on the outcome. This is what I want to share here.

  1. You need a facilitator.
    Both ES sessions I was part of at cleverbridge ended in frustration. All participants were willing to try it out, but we had nobody to keep the chaos under control. Because, as Kenny said, “There will be chaos, this is guaranteed.” But this is OK: we – devs, product owners, sales people, etc. – have to learn fast to understand each other without learning the job of the “other party” or writing a glossary (I tried that already and it didn’t help 😐). We also need somebody able to feel and steer the dynamics in the room.


    The tweets were written during a discussion about who could be a good facilitator. You can read the whole thread on Twitter if you like. Another good article summarizing the first impressions of @mathiasverraes as facilitator is this one.

  2. Explain AND visualize the rules beforehand.
    I will skip the basics for now, like the necessity of a very long free wall and the fact that the events should visualize the business process evolving over time.
    These are the additional rules I learned in the hands-on session:

      1. No dev talk! Developers are per se a species able to transform EVERYTHING into patterns and techniques and tables and columns, and this ability is not helpful if one wants to know whether we can solve a problem together. Using dev speak drives the discussion towards technical “solvability” based on current technical constraints like the architecture. With ES we want to create or deepen our ubiquitous language, and this surely does not include the word “Message Bus” 😉
      2. Every discussion should happen on the board. There will be a lot of discussions, and we tend to talk a lot about opinions and feelings. This won’t happen if we keep discussing the business processes and events visualized in front of us – on the board.
      3. No discussions regarding persons not in the room. Discussing what we think other people might think is not productive and cannot lead to real results. Do not waste time on it; time is too short anyway.
      4. Open questions occurring during the storming should not be discussed (see the point above) but marked prominently with a red sticky. Do not waste time.
      5. Do not discuss everything; look for the money! The most important goal is to generate benefit, not to create the most beautiful design!

Tips for the Storming:

  • “one person, one sharpie, one set of stickies”: everybody has important things to say, nobody should stay away from the board and the discussions.
  • start by describing the business process, the business rules, the eventually consistent business decisions (a.k.a. policies), and other constraints you – or the product owner to whom the business “belongs” – would like to model, and write the most important information somewhere visible to everybody.
  • explain how ES works: every business-relevant event should be placed on a timeline and should be formulated in the past tense. Business relevant is everything somebody (Kibana is not a person, sorry 😉) would like to know about.
  • explain the rules and the legend (you need a color legend to be able to read the results later).
  • give the participants time (we had 15 minutes) to write every business event they think is important to know about on orange stickies. Also write the business rules (the wide dark red ones) and the product decisions (the wide pink ones) on stickies and put them where they apply: the rules before the event, the policies after an event has happened.
  • start putting the stickies on the wall, throw away the duplicates, and discuss and maybe reformulate the rest. After you are done, try to tell the story based on what you can read on the wall. After this, read the stickies from the end to the start. With these methods you should be able to discover whether you have gaps or made wrong assumptions while modelling the process you wanted to describe.
  • mark known processes (like “manual process”) with the same stickies as the policies and do not waste time discussing them further.
  • start to discuss the open questions. Almost always there are different ways to answer these questions, and if you cannot decide in a few seconds then postpone the decision. But as a default: decide to create the event and measure how often it happens, so that later on you can make the right decision!
    Event Storming – measure now, decide later


    Another good article on this topic is this one from @thinkb4coding.

At this point we could have continued with the process to find aggregates and bounded contexts, but we didn’t. Instead we switched methodology to Specifications by Example – in my opinion a really good idea!

Specifications
Event Storming enhanced with Specifications by Example

We prioritized the rules and policies, and for the most important ones we defined examples – just like we do when we discuss a feature and try to find the algorithm.

Example: in our ticket reservation business we had a rule saying “no overbooking, one ticket per seat”. In order to find the algorithm we defined different examples:

  • 4 tickets should be reserved and there are 5 tickets left
  • 4 tickets should be reserved and there are 3 tickets left
  • 4 tickets should be reserved and all tickets are already reserved.
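These three examples translate almost verbatim into acceptance tests. A minimal pytest-style sketch (the Reservation class is an illustrative assumption, not the real system):

```python
# Hypothetical sketch of how the examples above become acceptance tests.
class Reservation:
    """Implements the rule: no overbooking, one ticket per seat."""

    def __init__(self, tickets_left: int) -> None:
        self.tickets_left = tickets_left

    def reserve(self, requested: int) -> bool:
        if requested > self.tickets_left:
            return False  # no overbooking
        self.tickets_left -= requested
        return True

def test_enough_tickets_left():           # 4 requested, 5 left
    assert Reservation(tickets_left=5).reserve(4) is True

def test_not_enough_tickets_left():       # 4 requested, 3 left
    assert Reservation(tickets_left=3).reserve(4) is False

def test_all_tickets_already_reserved():  # 4 requested, 0 left
    assert Reservation(tickets_left=0).reserve(4) is False
```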

With this last step we can verify whether our ideas and assumptions work out, and we can gain even more insights into the business rules and policies we defined – and all this not as a developer writing if-else blocks, but together with the other stakeholders. At the same time, the non-techie people will understand in the future what impact these rules and decisions have on the product we build together. Having the specifications already defined is also a great side benefit, as these are the acceptance tests which will be built by the developers and read and used by the product owner.

You can read more about the example and the results on the blog of Kenny Baas-Schwegler.

I hope I covered everything and have succeeded in reproducing the most important learnings of the two days (I tend to overlook things, thinking “it is obvious”). If not: feel free to ask, I will be happy to answer 🙂

Happy Storming!

Update: we had our first event storming and it was good!

Unfortunately we didn’t get to define the examples (not enough time). Most of the rules described above were accepted really well (explain the rules, create a legend for the stickies, flag everything out of scope as an open question). Where I as facilitator need more training is in keeping the discussion ON the board and not beside it. I also have a few new takeaways:

  • the PO describes his feature and gives answers, but he doesn’t write stickies. The main goal is to share his vision. This means he should test whether we understood the same vision. As a bonus, he should complete his own understanding of the feature through the questions which appear during the storming.
  • one color means one action/meaning. We had policies and processes on the same red stickies and this was misleading.
  • if you have a really complex domain (like e-commerce for SaaS products in our case) or really complex features, start with one happy-path example. Define this example and create the event “stream” with it. At the end you should still add the other, not-so-happy-path examples.

You never code alone!

I would like to call my last week at work the cleverbridge University week, it was that great. We spent three out of five days analyzing, dissecting, and improving our code, our processes, and the way we work together:

  • On Tuesday we had almost a whole day of architecture workshop to define the vision and the structures for our code.
  • We spent Friday together at dotnet-cologne 2014.

The most important part, however, took place on Thursday: we took a whole day to learn as much as possible from Ilker, my TDD mentor, about specs, tests, teams, and communication.

Nothing beats understanding!

from requirement to specifications
How does a team manage to build what the customer needs, and nothing else?

  1. Together – product owner, developers, testers – find the right wording, the “ubiquitous language”, that describes the requirement.
  2. Together, understand what it is all about.
  3. Define the feature together.
  4. Write the acceptance criteria together.

How do a few people actually become a team?

This is probably the problem of every company in the world that wants to evolve. And rightly so: imagine you are a C# developer with, for example, a focus on the web, you have probably never considered JavaScript a programming language, and you know nothing about the work of the other C# developers except that it exists and will happen at some point later, long after you have finished your own tasks. Of course you also have a department head, and your decision-making power is limited to “should I use foreach or rather LINQ?”

That is not all, though: your tasks never cross the boundaries of your department, and neither does your understanding of the requirements. And this has bothered you for a long time – and not only you, but also the JavaScript guy and the frontend developer, and if you are lucky, your department head too. If that is the case, your company is lucky: you are open to a cross-functional team. You only have to answer yes to this question:

Am I ready to fill every role in this team?
Even at the risk that I cannot? At least not yet, or not yet well enough…

Every team also needs a few ingredients:

  1. A goal
  2. Participation – a shared sense of ownership
  3. Respect for individuality
  4. Shared responsibility

or, as Ilker says, “or at least three of them, plus somebody who keeps reminding the others of them all the time”.

 

I don’t want to claim that we already have all of these – but we are well on our way. And most importantly: we will certainly not leave this path.