Continuous Delivery Is a Journey – Part 2

After describing the context a bit in part one, it is time to look at the individual steps the source code must pass through in order to be delivered to customers. (Sorry, this is quite a long part 🙄)

The very first step starts with pushing all the current commits to master (if you work with feature branches you will probably encounter a new level of self-made complexity, which I don't intend to discuss here).

This action triggers the first checks and quality gates, like licence validation and unit tests. If all checks are "green", the new version of the software is saved to the repository manager and tagged as "latest".

A successful push leads to a new version of my service/package/Docker image
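
To make this stage concrete, here is a minimal sketch of what the first pipeline stage boils down to; the registry URL and the check scripts are invented for illustration, not our actual setup:

```python
import subprocess

# Hypothetical registry URL, for illustration only
IMAGE = "registry.example.com/my-service"

def run(cmd: str) -> None:
    """Run a shell command and abort the pipeline if it fails."""
    subprocess.run(cmd, shell=True, check=True)

# quality gates: licence validation and unit tests; any failure stops the push
run("./check-licenses.sh")    # hypothetical check script
run("./run-unit-tests.sh")    # hypothetical test runner

# all gates are green: publish the new version and tag it as "latest"
run(f"docker build -t {IMAGE}:latest .")
run(f"docker push {IMAGE}:latest")
```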

At this moment continuous integration is done, but the features are far from being usable by any customer. I have first feedback that I didn't break any tests or other basic constraints, but that's all – nobody can use the new features, because nothing is deployed anywhere yet.

Well then, let Jenkins execute the next step: deployment to the Kubernetes environment called integration (a.k.a. development).

Continuous delivery to the first environment, including the execution of the first acceptance tests

At this point my changes are tested to see whether they work together with the features my colleagues have already integrated, and whether the new features are evolving in the right direction (or are done and ready for acceptance).
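
A rough sketch of what this Jenkins step amounts to; the namespace, the manifest directory, and the test runner are invented for illustration:

```python
import subprocess

def run(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

# roll out the "latest" version to the integration environment
run("kubectl --namespace integration apply -f deployment/")   # hypothetical manifest dir
run("kubectl --namespace integration rollout status deployment/my-service")

# a deployed system can finally give real feedback: run the first acceptance tests
run("./run-acceptance-tests.sh --target https://my-service.integration.example.com")  # hypothetical
```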

This is not bad, but what if I want to be sure that I didn't break the "platform"? What if I don't want to disturb everybody else working on the same product just because I made some mistakes – while still wanting to be human, ergo allowed to make mistakes 😉? This means that the behavioral and structural changes introduced by my commits should be tested before they land on integration.

This obviously requires a different set of tests. They should verify that the whole system (composed of a few microservices, each with its own data persistence, and one or more UI apps) works as expected, is resilient, is secure, etc.

This is where Kubernetes (k8s) and ksonnet became a huge help. With k8s in place (and the infrastructure defined as code) it is almost a no-brainer to set up a new environment, wire up the individual systems in isolation, and execute the system tests against them. This requires not only the k8s part as code but also the resources deployed and running on it. With ksonnet, every service, deployment, ingress configuration (which manages external access to the services in a cluster), or config map can be defined and configured as code. ksonnet not only supports deploying to different environments but also offers the possibility to compare them.

ksonnet is not the only tool offering these capabilities; there are many. It is important to choose a fitting tool, and it is even more important to invest the time and effort to configure everything as code. This is a must-have in order to achieve real automation and continuous deployment!
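
As an illustration of how cheap such a throwaway environment becomes once everything is code (in our case ksonnet did the heavy lifting, but the idea is tool-agnostic), here is a sketch with invented names:

```python
import subprocess
import uuid

def run(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

# every pipeline run gets its own isolated, disposable environment
namespace = f"systemtest-{uuid.uuid4().hex[:8]}"
run(f"kubectl create namespace {namespace}")
try:
    # wire up the whole system in isolation: services, ingresses, config maps
    run(f"kubectl --namespace {namespace} apply -f system-manifests/")  # hypothetical dir
    # run the system tests (functionality, resilience, security, ...) against it
    run(f"./run-system-tests.sh --namespace {namespace}")               # hypothetical runner
finally:
    # tear everything down again, whether the tests passed or not
    run(f"kubectl delete namespace {namespace}")
```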

Good developer experience also means simplified continuous deployment

I will not include any ksonnet examples here; ksonnet has great documentation. What is important to realize is the opportunity such an approach offers: if everything is code, then every change can be checked in. Everything that is checked in can be observed/monitored, can trigger pipelines and/or events, can be reverted, can be commented on – and, the feature that helped us most in our solution, can be tagged.

What happens in continuous delivery? A change in the VCS triggers a pipeline, the fitting version of the source code is loaded (either as source code, like ksonnet files, or as a package or Docker image), the configured quality gates are verified (the runtime environment is wired up, the specs for the referenced version are executed), and in case of success the artifact is tagged as "thumbs up" and promoted to the next environment. We started doing this manually to gather enough experience to automate the process.
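
Once automated, such a promotion step can be as small as this sketch; the image name and the tag convention are invented for illustration:

```python
import subprocess

IMAGE = "registry.example.com/my-service"  # hypothetical
CANDIDATE = "latest"               # the version that just passed the quality gates
PROMOTED = "integration-approved"  # our "thumbs up" marker; naming is invented

def run(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

# the artifact is never rebuilt: the exact same image is only re-tagged ...
run(f"docker pull {IMAGE}:{CANDIDATE}")
run(f"docker tag {IMAGE}:{CANDIDATE} {IMAGE}:{PROMOTED}")
run(f"docker push {IMAGE}:{PROMOTED}")

# ... and the next environment deploys the promoted tag
run(f"kubectl --namespace review set image deployment/my-service my-service={IMAGE}:{PROMOTED}")
```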

Manually deploying the latest resources from integration to the review stage

If you have all of this working, you have finished the part requiring the biggest effort. Now it is time to automate and generalize the individual steps. After continuous integration, the only changes occur in the ksonnet repo (all other source code changes happened before), which I will call the deployment repo here.

Roll out, test, and if necessary roll back the system ready for review

I think this post is already too long. In the next part (I think it will be the last one) I would like to write about the last essential step – how to deploy to production without annoying anybody (no secret here, this is what feature toggles were invented for 😉) – and about some open questions and decisions we encountered on our journey.

Every graphic was created with PlantUML – thank you very much!

to be continued …

Continuous Delivery Is a Journey – Part 1

Last year my colleagues and I had the pleasure of spending two days with @hamvocke and @diegopeleteiro from @thoughtworks reviewing the platform we had created. One essential part of our discussions was about CI/CD, described like this: "Think about continuous delivery as a journey. Imagine every git push lands on production. This is your target; this is what your CD should enable."

Even if (or maybe because) this thought scared the hell out of us, it became our vision for the next few months, because we saw the great opportunities we would gain by being able to work this way.

Let me describe the context we were working in:

  • Four business teams, 100% self-organized, each owning 1…n Self-contained Systems (SCS), creating microservices running as Docker containers, orchestrated with Kubernetes, hosted on AWS.
  • Boundaries (as in Domain Driven Design) defined based on the business we were in.
  • Each team having full ownership of and full accountability for its part of the business (represented by the SCS).
  • Basic heuristics regarding source code organisation: "share nothing" about business logic, "share everything" about utility functions (in OSS manner), about the experience you gained, the lessons you learned, the mistakes you made.
  • Ensuring code quality and software quality is 100% the team's responsibility.
  • You build it, you run it.
  • One Platform-as-a-Service team to enable these business teams to deliver features fast.
  • Gitlab as VCS, Jenkins as build server, Nexus as package repository.
  • Trunk-based development, no cherry picking, “roll fast forward” over roll back.
Teams: 4 Business Teams + 1 Platform-as-a-Service Team = One Product

The architecture we chose was meant to support our organisation: independent teams, able to work and deliver features fast and independently. They should decide for themselves when and what they deploy. In order to achieve this we defined a few rules regarding inter-system communication. The most important ones are:

  • Event-driven architecture: no synchronous communication, only asynchronous communication via the domain event bus
  • Non-blocking systems: every SCS must remain functional (possibly with reduced functionality) even if all the other systems are down (see the sketch after the next paragraph)

We had only a couple of exceptions to these rules. One example: authentication doesn't really make sense done asynchronously.
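
To illustrate the non-blocking rule, here is a toy sketch (the event shape and all names are invented, this is not our real event bus API): the consuming SCS answers from its own local copy of the data, which is updated via domain events, so reads keep working while every other system is down.

```python
# Toy illustration of the "non-blocking" rule: the SCS answers requests from
# its own data store, which is kept up to date asynchronously by domain events.
local_prices: dict[str, float] = {}  # the SCS's own persistence (here: a dict)

def on_price_changed(event: dict) -> None:
    """Asynchronous handler: keep the local copy up to date."""
    local_prices[event["product_id"]] = event["price"]

def get_price(product_id: str) -> float | None:
    """Synchronous read path: keeps working even if every other system is
    down, at worst with slightly stale data."""
    return local_prices.get(product_id)

# simulate a domain event arriving via the bus ...
on_price_changed({"product_id": "42", "price": 9.99})
# ... and a read that succeeds although the bus may be gone again
assert get_price("42") == 9.99
```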

Working in self-organized, independent teams is a really cool thing. But

with great power there must also come great responsibility

Uncle Ben to his nephew

Even though we set some guardrails regarding the overall architecture, the teams still had ownership of their internal architecture decisions. Since at the beginning we didn't have continuous delivery in place, every team alone was responsible for deploying its systems. Due to the missing automation we were not only predestined to make human errors, we were also blind to the couplings between our services. (And of course we spent a lot of time doing stuff manually instead of letting Jenkins or Gitlab or some other tool do it for us 🤔)

One example: every one of our systems had at least one React app and a GraphQL API as its main communication (read/write/subscribe) channel. One of the best things about GraphQL is the possibility to include the GraphQL schema in the React app and this way ship the API's interface definition inside the client application.

Isn't this cool? It can be. Or it can lead to some very smelly behavior, to really tight coupling, and to the inability to deploy the app and the API independently. And just like my friend @etiennedi says: "If two services cannot be deployed independently they aren't two services!"
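
One way to at least make this coupling visible (not our actual setup, just a naive sketch with invented file names and URL): let the pipeline compare the schema bundled into the app with the schema the deployed API actually serves, and fail on any difference.

```python
import urllib.request

# Naive drift check: fail the pipeline loudly if the schema shipped inside
# the React app differs from the schema the deployed GraphQL API serves.
with open("app/src/schema.graphql") as f:   # hypothetical bundled schema
    bundled = f.read()

with urllib.request.urlopen("https://api.example.com/schema.graphql") as resp:  # hypothetical URL
    deployed = resp.read().decode("utf-8")

if bundled.strip() != deployed.strip():
    raise SystemExit("GraphQL schema drift: app and API can no longer be deployed independently!")
```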

This was the first lesson we learned on this journey: if you don't have a CD pipeline, you will most probably hide the flaws of your design.

One can surely ask, "what is the problem with manual deployment?" Nothing, if you only have a few services to handle, if everyone in your team knows about the couplings and dependencies and is able to execute the very precise deployment steps that minimize downtime. But otherwise? This method doesn't scale, this method is not very professional – and the biggest problem: it ignores the possibilities Kubernetes offers to safely roll out, take down, or scale everything you have built.
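
For contrast, this is roughly what "letting Kubernetes do it" looks like – a sketch using standard kubectl commands, with only the deployment name invented:

```python
import subprocess

def run(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

# safe, observable roll-outs instead of "very precise manual deployment steps"
run("kubectl rollout status deployment/my-service")      # wait until the new version is healthy
run("kubectl scale deployment/my-service --replicas=4")  # scale without downtime
run("kubectl rollout undo deployment/my-service")        # roll back with a single command
```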

Having an automated, standardized CD pipeline as described at the beginning – with the goal that every commit lands on production in a few seconds – forces everyone to think about the consequences of their commits, to write backwards-compatible code, to become a more thoughtful developer.

to be continued …

Colleague-Bashing – Surprise, It Doesn't Help!

At every conference my colleagues and I attend, the topic of team culture pops up sooner or later as the cause of many/all problems. When we describe how we work, we inevitably end up hearing the statement "a self-organized, cross-functional organization without a boss who has THE say is naive and unrealistic." "You surely have a boss somewhere, you just don't know it!" was one of the wildest responses we heard recently, just because the person we were talking to could not process this picture: five self-organized teams, without bosses, without a CTO, without project managers, without any rules or requirements dumped on us from outside, without deadlines we would submit to without objection. Instead, with self-imposed deadlines, with budgets, with freedom and responsibility in equal measure.

I am not talking about the law and what's on paper here: of course we have a CTO, a Head of Development, and a CFO in the company; they just don't decide when, what, and how we do things. They define the frame within which the executive board invests in the product/endeavor, but the rest is done by us: POs, Scrum Masters, and developers, together.

We have been working in this constellation for more than a year, and we can add about six months of lead time before that, until we were able to start this project on the basis of Conway's Law.

“Organizations which design systems […] are constrained to produce designs which are copies of the communication structures of these organizations.” [Wikipedia]

Conversely (and freely translated), this means: "as your organization is, so will your product, your code, be structured." So we worked on our organization. The goal was to build a responsible team that is free to dream, in order to build a great new product without imposed shackles.

We now have this team, and we are living this dream – which of course has its shadows too; after all, life is no bed of roses :). The difference is: these are our problems, and we don't dodge them, we solve them together.

Before you say "that's a stroke of luck, it normally doesn't happen like that," I would disagree. It didn't just happen to us either: we worked on it (for roughly six months) and keep working on it continuously. The trick, the key to this organization, is an open feedback culture.

What does that mean, and how did we achieve it?

  • We learned to give and to receive feedback – yes, that is not so easy. These are the rules:
    • All statements are subjective: "Yesterday, while doing a review, I saw this and that. I find that not good enough/dangerous for the following reasons. I could imagine that doing it this or that way might bring us to our goal faster." As you can see: never say YOU; phrase everything in the first person, without preconceived opinions or assumptions.
    • All statements come with concrete examples. Statements like "I believe" or "I have the feeling" are opinions, not facts. You have to find an example, otherwise the feedback is not "admissible".
    • Feedback is always phrased constructively. It doesn't help to say what is bad; it is much more important to say what to work on: e.g. "I know from my own experience that pair programming is very helpful in such cases."
    • The person receiving feedback must listen without justifying herself. She decides for herself what to do with the feedback. Anyone who wants to improve will try to take the feedback to heart and work on themselves. You don't have to prescribe that!
  • One-on-ones: feedback rounds between two people on a team, initially with the Scrum Master until people got used to the phrasing (at the beginning we laughed the whole idea off), and later just the two of them. Each round goes in only one direction (only one person receives feedback), and e.g. a week later in the other direction. The result: by now we don't need scheduled appointments anymore; we do it automatically whenever there is something to give feedback about.
  • Team feedback: the last stage, following the same rules. It takes place not only between teams but also between groups/guilds, such as POs or architecture owners.

That's it. For over a year now I haven't heard sentences like "it was the idiots from the other team who messed everything up" or "they'll never get it done anyway" or "why should I care, they checked in the bug." And this working atmosphere gives you wings! (sorry for the copyright violation 😉)