Project vs. Product Development – a Comparison

Last week I realized that I had a blind spot: I thought every developer was aware that selling software and delivering a product are not quite the same. As it turned out, I was wrong, so I created this list to explain what I mean.

Goals And Interests In Project Development (aka Feature Factory):

  • The main stakeholder is the company paying for the features (called client further on), not the customer who is using them.
  • Maintaining and evolving the platform is not my responsibility.
  • The requirements are defined by the client: I have no way to validate them because I have no contact with the users of the features. Feedback-based decisions are not possible.
  • Fast development but slow delivery.
  • Features are defined as a whole and delivered as a whole, not iteratively. Visual requirements (mock-ups) are non-negotiable because they are ordered as-is, even if the end user might not see it that way.
  • Perfection instead of usability.
  • Innovation is limited by restricted access to the infrastructure or other 3rd party services used by the client.
  • No involvement in long- and medium-term planning, as the goals of the client are not my goals. Very limited possibility to plan the architecture aligned with the strategy of the client.
  • The product my company sells is time and/or LoC. (Disclaimer: this would not be the case when working with Extreme Contracts)
  • The most important metrics are:
    • hours per week,
    • features per unit of time,
    • LoC

Goals And Interests In Product Development:

  • The main stakeholders are the end customers and the company itself (me and my team included).
  • The main goal is to identify users’ problems, develop solutions for them and solve them in the correct order. The job is no longer to spend time at work or to move tasks on a Jira board, but to provide solutions.
  • Nowadays, with a large number of competitors who could appear any day, time-to-market is decisive, but not at the expense of quality.
  • We own the maintenance and the evolution of the platform. It is in our interest to produce high-quality, robust software.
  • Through the cooperation of business analysts, UX experts, software developers and cloud experts, we are able to deliver features (capabilities) step by step, measure their benefits and decide on the next measures.
  • I can use all my skills and my company can benefit from them.
  • The user stories are written in a business-oriented manner, so they can be taken literally. They document the proposed solution and can be cut into meaningful slices that are implemented quickly and reliably and delivered fast.
  • “Fail Fast” and “Inspect and Adapt” are the most important principles.
  • Usability, not perfection.
  • The most important metrics are:
    • customer satisfaction (measured with business metrics and the usage of delivered features),
    • lead time (time between idea and in use),
    • time to recovery,
    • change failure rate (Accelerate)
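
The delivery metrics in this list can be computed from plain deployment records. What follows is a minimal sketch under stated assumptions: the record shape and the field names (`ideaAt`, `inUseAt`, `failed`, `recoveredAt`) are invented for illustration, and times are given in hours to keep the arithmetic readable.

```javascript
// Hypothetical deployment records; field names and values are illustrative only.
// Times are hours since an arbitrary starting point.
const deployments = [
  { ideaAt: 0, inUseAt: 48, failed: false },
  { ideaAt: 10, inUseAt: 34, failed: true, recoveredAt: 36 },
  { ideaAt: 20, inUseAt: 92, failed: false },
  { ideaAt: 30, inUseAt: 54, failed: true, recoveredAt: 58 },
];

const average = (values) =>
  values.length === 0 ? 0 : values.reduce((sum, v) => sum + v, 0) / values.length;

// Lead time: average time between the idea and the feature being in use.
function leadTime(records) {
  return average(records.map((r) => r.inUseAt - r.ideaAt));
}

// Change failure rate: share of deployments that caused a failure.
function changeFailureRate(records) {
  if (records.length === 0) return 0;
  return records.filter((r) => r.failed).length / records.length;
}

// Time to recovery: average time from a failed deployment to its fix.
function timeToRecovery(records) {
  const failures = records.filter((r) => r.failed);
  return average(failures.map((r) => r.recoveredAt - r.inUseAt));
}

console.log(leadTime(deployments));          // 42 (hours)
console.log(changeFailureRate(deployments)); // 0.5
console.log(timeToRecovery(deployments));    // 3 (hours)
```

Customer satisfaction is left out on purpose: it depends on business metrics that cannot be derived from deployment data alone.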

DDD is not a goal but a means

DDD (Domain-Driven Design) is “only” one essential building block that promises evolutionary software: software that won’t let you down in a few years or after a few thousand lines of code.

That’s it. I could stop here; the title and this short description say everything. But is this true, can everybody reading this relate to that? Is it helpful? I don’t think so.

During the last year, I often heard the question “how can I introduce DDD in my company?” – mostly meaning “in my dev team”. It is time to answer this question now.

Misconception number 1: introducing DDD as a goal

Misconception number 2: DDD is a dev thing, and after the devs “fix this” by applying all the patterns, all problems will disappear – and they will live happily ever after …

Let’s imagine a successful company already sitting on 1m lines of code, with a lot of customers and a few teams, possibly working in silos, with frontend and backend development split. Or a younger company with the same organization; the only difference is the effort needed to change things. This company has an agile development process (with sprints, “stories”, one Product Owner per team, etc.), but they observe that every new feature takes longer and longer. A few devs have read about DDD, read the Blue Book, and convince one of the backend teams to introduce DDD. The company agrees to let them do it because they hope that everything will be easy and fast and bug-free afterwards. So everybody is happy 😊

Fast-forward a few months: now we are at the point where the source code owned by that team has been restructured nicely, with value types, entities and aggregates. OK, the REST API is unchanged because the frontend team is already using it and they were not involved. But the business logic is cleaned up. The names, the language were defined by the devs – DDD is a dev thing, isn’t it? The development process didn’t change either: they get requirements to implement things without including anything built by the frontend team, the part the customer sees. For this reason, the refinements are not about customer needs but about API calls and frameworks – just like before the rework towards DDD.

JIRA 1234: Please add this new flag “withVAT” to the API call generating the invoice (sounds familiar?)

What is new? Frustration: the PO does not know why the devs are asking for more and more details, and the developers don’t understand why the PO does not get the meaning of aggregates, value types, etc.

Refinement: “we cannot add it to that API call, the VAT decision belongs to another aggregate! ;( ”

And what else is new? Maintaining those API routes becomes harder and harder because they do not follow the business structure but are kept as an Anticorruption Layer between the frontend and backend (team 😉).

I am sorry if I am disappointing you, it is not my intention. It is the opposite actually: to save you from this frustration after having invested so much energy and maybe extra hours to get everything perfect.

What is DDD: a paradigm, a rule set that helps you to build an evolutionary software. Like Mathias Verraes says

This can only be achieved if DDD stops being a “dev-stuff” and starts leading to a common way to discuss features, behaviors, and capabilities.

And why not? DDD is not only about aggregates and value types; the essential component is the ubiquitous language used in your domain by the business people, by the customer. And apropos customer: DDD aims to solve the customer’s problems, so these problems should be known in the teams applying DDD. Why not create a whole team handling those needs, from the UI (or even UX) to the delivery? (This move would also change the “somehow agile” development process into a real one, based on feedback and experiments instead of a prepared backlog covering the next three months.)

We live in a great time: the accumulated knowledge and the evolution of the software industry in the last 25 years give us everything we need to build resilient, sustainable and maintainable software that won’t “die” slowly if we are careful and happy to learn. What do we need?

  1. Visual modelling techniques to discuss and learn the business (like Event Storming, Impact Mapping, etc.)
  2. Crunching and defining stories aiming to solve customer needs, always with the Ubiquitous Language (like User Story Mapping, BDD)
  3. A paradigm to make decisions based on the Business Domain, not on hopes or wishes (strategic DDD to write the right code)
  4. A guide for organizing the teams to achieve good teamwork, less cognitive load and a smooth development process (Context Maps from DDD)
  5. Open feedback culture to create psychological safety
  6. A set of patterns to organize the code (DDD again)
  7. A bunch of practices to write the code right: TDD, refactoring, pairing, mob-programming
  8. Trunk-based development (aka Continuous integration) and continuous delivery to reduce the lead time (the time from the idea to the usage, also to the possibility of feedback) and to reduce the MTTR (Mean Time To Recovery)
  9. Mature 3rd party services to fulfil all the other needs when offering digital products. Now we can focus entirely on our business proposal (AWS/Azure & co, Observability, etc.)
  10. The readiness and willingness to learn the business we want to improve and the corresponding company culture supporting this.

This list is my Silver Bullet. All items on it are essential, and all of them exist. Most of them have significant value on their own, but the sum of them? Wow, they give us a real chance in this rapidly changing world with all its competitors running after our slice of the cake.

Book List (And Other Resources)

Basic Resources For Developers

  1. The Pragmatic Programmer – by David Thomas, Andrew Hunt
  2. Test-Driven Development: By Example – by Kent Beck
  3. Code Complete – By Steve McConnell
  4. Refactoring: Improving the Design of Existing Code – by Martin Fowler
  5. Fix The Small Things – by Kent Beck
  6. Working Effectively with Legacy Code – by Michael Feathers
  7. Ian Cooper: TDD, where did it all go wrong (Video)
  8. Some Underrated Elements of Success for the Modern Programmer – J. B. Rainsberger
  9. 97 Things Every Programmer Should Know – by Kevlin Henney

Architecture (and Business)

  1. Domain-Driven Design: Tackling Complexity in the Heart of Software – by Eric Evans
  2. Martin Fowler’s Blog
  3. Community Collection of Maps, Heuristics, Methods and more – Open Source
  4. Encouraging DDD Curiosity as a Product Owner – Zsófia Herendi – KanDDDinsky (video)

Resources For Everybody Caring For Product (Project) Development and Strategy

  1. Impact Mapping: Making a Big Impact with Software Products and Projects – by Gojko Adzic
  2. Specification by Example: How Successful Teams Deliver the Right Software – by Gojko Adzic
  3. The Bottleneck Rules: How To Get More Done at Work, Without Working Harder – by Clarke Ching
  4. Agile Conversations – by Douglas Squirrel and Jeffrey Fredrick (there also is a Meetup to practice)
  5. Nick Tune’s Strategic Technology Blog
  6. Accelerate: Building and Scaling High-Performing Technology Organizations – by Nicole Forsgren, Jez Humble, Gene Kim
  7. Team Topologies: Organizing Business and Technology Teams for Fast Flow – by Matthew Skelton, Manuel Pais
  8. The Software Architect Elevator: Transforming Enterprises with Technology and Business Architecture – by Gregor Hohpe
  9. Visual Collaboration Tools – by many

Crime, History

  1. The Mythical Man-Month: Essays on Software Engineering – by Frederick P. Brooks Jr.
  2. Your Code As a Crime Scene: Use Forensic Techniques to Arrest Defects, Bottlenecks, and Bad Design in Your Programs – by Adam Tornhill
  3. The Phoenix Project – by Gene Kim, Kevin Behr, and George Spafford
  4. The Unicorn Project – by Gene Kim

THE FIRST IDEAL: Locality and Simplicity

THE SECOND IDEAL: Focus, Flow, and Joy

THE THIRD IDEAL: Improvement of Daily Work

THE FOURTH IDEAL: Psychological Safety

THE FIFTH IDEAL: Customer Focus

You Don’t Need To Work In Silos If You Don’t Want To

… but if you do then you should stop reading here. It is Ok for me.

How many of you have built features in backend services which were never used in any application? Or implemented requests in the wrong way because nobody cared to give you the whole story, the whole problem this feature should solve? Or felt demotivated because of the lack of feedback on whether what you do makes an impact or was wasted energy and time? How many of you are still working under these unsatisfying circumstances? This article is for you.

I did all of this. One case I will never forget: I was supposed to implement a feature request that resulted in returning some object property as a string. This property contained a URL, but the request didn’t say “I need to know how to navigate to X or Y” but “please include the URL X in the result”.

It turned out that two other teams used this “string” to build navigation on it or to include it in emails without ever telling me. Why should they? I was done with the feature: it was their turn. Both of them validated this string, built URLs with it (using information exclusively owned by the backend service…), etc.

Let me be more explicit:

Failure No. 1: If I had changed some internals in the backend service, I could have broken the UI code without knowing. My colleagues relied on things they had no chance to control. We were dependent on each other without being able to see it.

Failure No. 2: the company paid 3 different developers to write the same validation functions, and the customer flow had to pass the same validations 3 times instead of only once. A totally wrong decision from an economic point of view.
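
One way to avoid both failures is to make the contract explicit: the service that owns the information builds the link in one place and hands over a structured result instead of a bare string. A minimal sketch, not what we actually built back then; the base URL, the `profile` target and the function name are invented for illustration.

```javascript
// Owned by the backend service: the only place that knows how links are built.
// The base URL is a placeholder.
const BASE_URL = 'https://example.com';

// Builds a navigation link for a target and an id.
// `new URL` throws a TypeError if the result is not a valid URL,
// so consumers never need to validate or assemble the link again.
function buildNavigationLink(target, id) {
  const url = new URL(`/${target}/${encodeURIComponent(id)}`, BASE_URL);
  return { rel: target, href: url.href };
}

console.log(buildNavigationLink('profile', 42));
// { rel: 'profile', href: 'https://example.com/profile/42' }
```

The consumers now depend on a named link (`rel`) instead of on the internal structure of a string, which makes the dependency visible to both sides.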

I think that was the moment I decided to change the way we deliver features, the way we work together. This was 6 or 7 years ago, and since then I have followed the same method to reorganize not only the teams but also the source code. Because one thing is sure: changing one without the other only leads to bigger pains and even more frustration.

Step 1. Visit the “other side” of that wall and learn what they are doing and how they are doing it. You will observe bottlenecks and wasted time and energy in your value stream (the road a feature passes from the idea to the customer)

Step 2. Get the buy-in of the next level in your hierarchy: in most situations (I was in this situation in both cases) you are not the first one noticing these problems, but you could be the first one offering a solution. Grab this chance, don’t hesitate!

Step 3. Remove the wall between the silos: find a good time to make your move, after the biggest project ended or before the next one starts. Don’t wait too long, there always will be unfinished features.

Step 4. This depends on how many team members we are talking about. In both cases, we were around 15 people, and nobody wants stand-ups or even meetings with 15 people! You become even slower and even less capable of making decisions. But this step is important for a few things:

  • both “parties” should learn and understand what the others do, how the parts are connected, what language, concept, design is used to build them
  • all members should understand and accept that it is important to split up into teams – and this is always hard because it means “we have to change”! Developers are – against all expectations – very reluctant to change. Even more reluctant when they realize that they won’t work with their buddies anymore but with some hardly known people they do not really know or trust.
  • you and/or your boss, your colleagues, your buddy in this change must start to discover how the domain is shaped and how the teams can be split up – because this will be the next step.

Up to this point you haven’t improved the developer experience; it will rather become worse. What you have improved is the life of the product manager or CTO or whoever brings the requests to the teams: instead of explaining the two parts of a feature (cut in the “middle” between backend and frontend) to two teams, he/she has to explain it only once. At the same time, the Delivery Lead Time (the first key metric in measuring team performance) will become shorter because all the ping-pong between BE and FE can be eliminated before the feature development starts.

After you have all spent some time together, it is time to take the next step: align the organization to the business.

Designing Autonomous Services & Teams Together – Nick Tune – KanDDDinsky 2017

The most important part is to find the natural boundaries of the domain and create business teams who OWN these (sub)domains.

I did this 3 times in all kinds of environments: brownfield monolith or greenfield new biz, it doesn’t matter. Having a monolith as cash cow doesn’t make this change easy, of course, but it can be done, with discipline and a good plan on how to take over control. (This topic is much too complex to be included in this article.)

The last thing which must be said is when NOT to start this transformation:

  • If you don’t find anybody to support you. In this case, either the problem isn’t big enough to be felt by the others, or you are in the wrong company and maybe should start thinking about transforming yourself instead (and leave).
  • If you or your supporters and/or boss aren’t patient people. Changing is hard and should be accompanied carefully and patiently – so that one does not need to repeat it after even greater frustrations and chaos (been there, seen this :-/ )
  • If you expect that this is all. Because it isn’t: every change toward more transparency – and this is what happens when you break up silos and let others look at the existing solutions – will make issues visible. A few of these issues will be technical (like CI/CD, code coupling, infrastructure coupling, etc.). But the hard problems will be missing communication skills and missing trust. Nothing that cannot be solved – but it will take time, that is sure.

If you reach this point, you can start to form an autonomous team: one which not only decides what to do but is also in charge of doing it. Working in an environment created by you and your team allows you all to discover and live up to your creativity, to make mistakes and learn from them.

This ownership and responsibility make the difference between somebody hired to type lines of code and somebody solving problems.

What do you think? Could you start this change in your company? What would you need?

Now you know about my experience. I would be really happy to find out yours – here or on Twitter.

One last question: what would you like to read more about: how to find the right boundaries, or how your team can become a REALLY autonomous team – and how autonomous that can be?

My Reading List @KanDDDinsky

Accelerate - Building and Scaling High Performing Technology Organizations

Accelerate by Nicole Forsgren, Gene Kim, Jez Humble

This book was referenced in a lot of talks, mostly with the same phrase: “hey folks, you have to read this!”


Domain Modeling Made Functional by Scott Wlaschin

The book was described as the only real reference work currently published on DDD for functional programming.

More books and videos can be found on fsharpforfunandprofit


Functional Core, Imperative Shell by Gary Bernhardt – a talk

The comments on this tweet tell me that watching this video is long overdue …


37 Things One Architect Knows About IT Transformation by Gregor Hohpe

The name @ghohpe was also mentioned a few times at @KanDDDinsky


Domain Storytelling

A Collaborative Modeling Method

by Stefan Hofer and Henning Schwentner


Drive: The surprising truth about what motivates us by Daniel H Pink

There is also a TLDR-Version: a talk on vimeo


Sapiens – A Brief History of Humankind by Yuval Noah Harari

This book was recommended by @weltraumpirat after our short discussion about how broken our industry is. Thank you Tobias! I’m afraid the book will give me no happy ending.

UPDATE:

It is not a take-away from KanDDDinsky but still a must have book (thank you Thomas): The Phoenix Project

About silos and hierarchies in software development

Disclaimer: this is NOT a rant about people. In most situations, all devs I know want to deliver good work. This is a rant about organisations imposing such structures while calling themselves “an agile company”.

To give you some context: a digital product, sold online as a subscription. The application in my scenario is the usual admin portal to manage customers, get an overview of their payment situation, like balance, etc.
The application is built and maintained by a frontend team. The team is using the GraphQL API built and maintained by a backend team. Every team has a team lead and over all of them is at least one other lead. (Of course there are also a lot of other middle-management, etc.) 

Some time ago somebody must have decided to include in the API a field called “total” containing the balance of the customer so that it can be displayed in the portal. Obviously I cannot know what happened (I’m just a user of this product), but the fact is, this total was implemented as an integer. Do you see the problem? We are talking about money displayed on the website, about a balance which is almost never an integer. This small mistake made the whole feature unusable.
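
To illustrate the problem: cutting a balance down to an integer silently drops the cents. A common way to keep amounts exact is to store them in the smallest currency unit (cents) and to convert only for display. This is a minimal sketch; the function names and the sample amounts are assumptions, not the API of the product described here.

```javascript
// What an integer "total" does to a balance of 19.99: the cents are lost.
const truncatedTotal = Math.trunc(19.99);
console.log(truncatedTotal); // 19

// Alternative: keep the amount as an integer number of cents (minor units)
// and format it only when it is displayed.
function formatCents(cents, currency = 'EUR') {
  const sign = cents < 0 ? '-' : '';
  const abs = Math.abs(cents);
  const major = Math.floor(abs / 100);
  const minor = String(abs % 100).padStart(2, '0');
  return `${sign}${major}.${minor} ${currency}`;
}

// Sums stay exact because only integers are added.
function sumCents(amounts) {
  return amounts.reduce((sum, cents) => sum + cents, 0);
}

console.log(formatCents(1999));                      // "19.99 EUR"
console.log(formatCents(sumCents([1999, 1, 250])));  // "22.50 EUR"
```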

Point 1: Devs implement technical requests instead of improving the product 
I don’t know if the developer who implemented this made an error by not thinking about what this total should represent, or if he/she simply didn’t have the experience in e-commerce, but that is not my point. My point is that this person was obviously not involved in the discussion about this feature: why it is needed, what the benefit is. I can see in my mind’s eye how this feature was turned into code: the team lead, software lead (xyz lead) decided that this task had to be done. The task didn’t refer to the customer benefit; it stripped everything down to “include a new property called total having as value the sum of some other numbers”. I can see it because I had a lot of meetings like this. I delivered a string to the other team and this string was sometimes a URL and sometimes a name. But I did this in a company which didn’t call itself agile.

Point 2: No chance for feedback, no chance for commitment for the product
Again: I wasn’t there when this feature was requested and built, I can only imagine that this is what happened, but it really doesn’t matter. It is not about a particular company or about particular people but about the ability to deliver features instead of just some lines of code sold as a product. Back to my “total”: this code was reviewed, integrated, deployed to development, then to some in-between stages and finally to production. NOBODY in this whole chain asked themselves if the new field included in a public(!) API was implemented as it should be. And I would bet that nobody from the frontend team was asked to review the API to see if their needs could be fulfilled.

Point 3: Power play, information hiding makes teams slow artificially (and kills innovation and the wish to commit themselves to the product they build) 
If this structure weren’t built on power and position and titles, then the first person observing the error could have talked to the developer in the team responsible for the feature to correct it. They could have changed it in a few minutes (this was the first person noticing the error, ergo nobody was using it yet) and everybody would have been happy. But not if you have leads of every kind who must be involved in everything (because this is why they have their position, isn’t it?). Then somebody young and enthusiastic who wants to deliver a good product would create a JIRA ticket. In a week or two this ticket will eventually be discussed (by the leads of course) and analyzed, and it will eventually be moved forward in the backlog – or not. It doesn’t matter anyway because the frontend team had a deadline and they had to solve their problem somehow.

Epilogue: the culture of “talk only to the leads” bans the cooperation between teams
At this moment I finally understood the reason behind another annoying behavior in the admin panel: the balance is calculated in the frontend and is equal to the sum of the shown items. I needed some time to discover this and was always wondering WTF… Now I can see what happened: the total in the API was not a total (only the integer part of the balance) and the ticket had to be finished, so somebody had the idea to create a total by adding the values of the displayed items. Unfortunately this was a very short-sighted idea because it only works if you have no more than 25 payments, the default number of items per page. Or you can use the calculator app to add up the totals of every page…
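
The pagination problem is easy to reproduce. A minimal sketch with made-up data: 30 payments of 10.00 each (stored as cents) and the page size of 25 mentioned in this story; the helper names are invented.

```javascript
const PAGE_SIZE = 25; // default number of items per page in the story

// Made-up data: 30 payments of 1000 cents each.
const payments = Array.from({ length: 30 }, () => 1000);

// Returns the items of one page (pages are numbered from 1).
function page(items, pageNumber, pageSize = PAGE_SIZE) {
  const start = (pageNumber - 1) * pageSize;
  return items.slice(start, start + pageSize);
}

const sum = (items) => items.reduce((total, cents) => total + cents, 0);

// What the frontend workaround computes: the sum of the items currently shown.
const shownBalance = sum(page(payments, 1));

// What the balance really is: the sum over all payments.
const realBalance = sum(payments);

console.log(shownBalance); // 25000
console.log(realBalance);  // 30000
```

As soon as a second page exists, the displayed balance and the real one drift apart, which is exactly why the total belongs to the side that owns all the data.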

All this is wrong on so many levels! It is a lose-lose situation for every person involved.

What do you think? Is this only me arguing for a better “habitat for devs”, or is it time for this kind of structure to disappear?

Continuous Delivery Is a Journey – Part 3

In the first part I described why I think that continuous delivery is important for an adequate developer experience, and in the second part I drew a rough picture of how we implemented it in a product development with five teams. Now it is time to discuss the big impact – and the biggest benefits – regarding the development of the product itself.

Why do more and more companies, technical and non-technical people want to change towards an agile organisation? Maybe because the decision makers have understood that waterfall is rarely purposeful? There are a lot of motives – besides the rather dumb one, “because everybody else does this” – and I think there are two intertwined reasons for this: the speed at which the digital world changes and the ever-increasing complexity of the businesses we try to automate.

Companies/people have finally started to accept that they don’t know what their customers need. They have started to feel that the customer – and the market – has become more and more demanding regarding the quality of the solutions they get. This means that until Skynet is born (sorry, I couldn’t resist 😁) we software developers, product owners, UX designers, etc. have to decide which solution would be the best to solve the problems in that specific business, and we have to decide fast.

We have to deliver fast, get feedback fast, learn and adapt even faster. We have to do all this without downtime, without breaking the existing features and – very important for most of us – without getting a heart attack every time we deploy to production.

IMHO these are the most important reasons why every product development team should invest in CI/CD.

The last missing piece of the jigsaw, which allows us to deliver features fast (or rather continuously) without disturbing anybody and without losing control over how and when features are released, is called the feature toggle.

A feature toggle (also feature switch, feature flag, feature flipper, conditional feature, etc.) is a technique in software development that attempts to provide an alternative to maintaining multiple source-code branches (known as feature branches), such that a feature can be tested even before it is completed and ready for release. Feature toggle is used to hide, enable or disable the feature during run time. For example, during the development process, a developer can enable the feature for testing and disable it for other users.

Wikipedia

The concept is really simple: a feature should be hidden until somebody or something decides that it may be used.

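// Shows or hides the element that represents a feature, depending on its toggle.
// `config` is assumed to be the application's toggle store.
function useNewFeature(featureId) {
  const e = document.getElementById(featureId);
  const feat = config.getFeature(featureId);
  if (!feat.isEnabled) {
    e.style.display = 'none';  // hide the entry point while the feature is disabled
  } else {
    e.style.display = 'block'; // show it once the toggle is switched on
  }
}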

As you see, implementing feature toggles is really that simple. Adopting this concept will need some effort though:

  • Strive for only one toggle (one if) per feature. At the beginning it will be hard or even impossible to achieve this, but it is very important to define this as a medium-term goal. Having only one toggle per feature means the code is highly decoupled and very well structured.
  • Place this (main) toggle at the entry point (a button, a new form, a new API endpoint), the first interaction point with the user (person or machine); in the disabled state it should hide this entry point.
  • The enabled state of the toggle should lead to new services (in a microservice world), new arguments or new functions, all of them implementing the behavior for feature.enabled == true. This will lead to code duplication: yes, this is totally ok. I look at it as a very careful refactoring without changing the initial code. Implementing a new feature should not break or eliminate existing features. The tests too (all kinds of them) should be organized similarly: in different files, duplicated versions, implemented for each state.
the different states of the toggle lead to clearly separated paths
  • Through the toggle you gain real freedom to make mistakes or to build just the wrong feature. At the same time you can always enable the feature and show it to the product owner or the stakeholders. This means the feedback loop is reduced to a minimum.
  • This freedom has a price of course: after the feature is implemented, the feedback is collected and the decision to enable the feature is made, the source code must be cleaned up: all code for feature.enabled == false must be removed. This is why it is so important to create the different paths, so that the risk of introducing a bug is virtually zero. We want to reduce workload, not increase it.
  • Toggles don’t have to be temporary; business toggles (e.g. some premium features or “maintenance mode”) can stay forever. It is important to define beforehand what kind of toggle will be needed because the business toggles will always be part of your source code. The default value for this kind of toggle should be false.
  • The default value for temporary toggles should be true, and they should be deactivated in production and activated during development.
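
The rules in this list can be sketched in a few lines. This is a minimal illustration, not a full implementation: the toggle name `invoice-with-vat`, the `toggles` object and both invoice functions are invented for the example, and the 19 % rate is only a sample value.

```javascript
// Hypothetical toggle store; in a real project this would come from configuration.
const toggles = { 'invoice-with-vat': false };

const isEnabled = (name) => toggles[name] === true;

// Existing behavior: left untouched while the new feature is being built.
function createInvoice(netCents) {
  return { net: netCents, total: netCents };
}

// New behavior: a separate path that duplicates what it needs from the old one.
function createInvoiceWithVat(netCents, vatPercent = 19) {
  const vat = Math.round((netCents * vatPercent) / 100);
  return { net: netCents, vat, total: netCents + vat };
}

// The one and only `if` for this feature sits at the entry point.
function handleInvoiceRequest(netCents) {
  return isEnabled('invoice-with-vat')
    ? createInvoiceWithVat(netCents)
    : createInvoice(netCents);
}

console.log(handleInvoiceRequest(1000).total); // 1000 while the toggle is off
```

Cleaning up after the decision then means deleting `createInvoice` and the condition in `handleInvoiceRequest`, nothing else.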

One piece of advice regarding the tooling: start small; a config map in Kubernetes, a database table or a JSON file somewhere will suffice. Later on new requirements will appear, like notifying the client UI when a toggle changes or allowing the product owner to decide when a feature will be enabled. That will be the moment to think about next steps, but for the moment it is more important to adopt this workflow, adopt this mindset of discipline to keep the source code clean, learn the techniques for organizing the code base and ENJOY HAVING CONTROL over the impact of deployments, feature decisions, stress!
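
Starting small can look like this: a minimal sketch that reads toggle states from a JSON document and falls back to a default for unknown toggles. The JSON shape and the names `loadToggles` and `isEnabled` are assumptions for illustration; the JSON string stands in for a file, a config map entry or a database row.

```javascript
// Stand-in for the content of a JSON file, config map entry or database row.
const rawConfig = '{"new-checkout": true, "maintenance-mode": false}';

// Parses the configuration and returns a lookup function.
// Unknown toggles fall back to `defaultState` instead of throwing.
function loadToggles(json, defaultState = false) {
  const states = JSON.parse(json);
  return function isEnabled(name) {
    return Object.prototype.hasOwnProperty.call(states, name)
      ? states[name] === true
      : defaultState;
  };
}

const isEnabled = loadToggles(rawConfig);

console.log(isEnabled('new-checkout'));     // true
console.log(isEnabled('maintenance-mode')); // false
console.log(isEnabled('not-configured'));   // false (the default)
```

The default is a parameter on purpose, because temporary and business toggles may need different defaults.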

That’s it, I have shared all of my thoughts regarding this subject: your journey of delivering continuously can start (or continue 😉) now.

p.s. It is time for the one sentence about feature branches:
Feature toggles will never work with feature branches. Period. This means you have to decide: move to trunk-based development or forget continuous delivery.

p.p.s. Feature toggle libraries, frameworks and even platforms exist for most languages; it is not necessary to write a new one. There are libraries for different levels of complexity in how the state is calculated (account state, persons, roles, time settings); just pick one.
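
To give an idea of what such libraries calculate, here is a hand-written sketch of a toggle whose state depends on the user's role and on a start date. It does not show the API of any particular library; the rule shape and all names are invented.

```javascript
// Hypothetical rule: enabled for the listed roles, and only from a given date on.
const rules = {
  'premium-report': { roles: ['admin', 'premium'], from: '2024-01-01' },
};

// Evaluates a toggle for a given context (role and current date).
// Toggles without a rule are treated as disabled.
function isEnabledFor(name, context) {
  const rule = rules[name];
  if (!rule) return false;
  const roleMatches = rule.roles.includes(context.role);
  const started = context.now >= new Date(rule.from);
  return roleMatches && started;
}

console.log(isEnabledFor('premium-report', { role: 'admin', now: new Date('2024-06-01') })); // true
console.log(isEnabledFor('premium-report', { role: 'guest', now: new Date('2024-06-01') })); // false
```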

Update:

As pointed out by Gergely on Twitter, there is a very good article on Martin Fowler’s blog describing extensively the different feature toggles and the power of this technique: Feature Toggles (aka Feature Flags)

Continuous Delivery Is a Journey – Part 2

After describing the context a little bit in part one, it is time to look at the individual steps the source code must pass through in order to be delivered to the customers. (I’m sorry, but it is quite a long part 🙄)

The very first step starts with pushing all the current commits to master (if you work with feature branches you will probably encounter a new level of self-made complexity which I don’t intend to discuss).

This action triggers the first checks and quality gates like licence validation and unit tests. If all checks are “green” the new version of the software will be saved to the repository manager and will be tagged as “latest”.
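
As a sketch of this first stage: the gates run in order, and the build is published and tagged as “latest” only if every gate passes. The gate functions are placeholders that stand in for real checks, and none of the names belong to Jenkins or any other tool.

```javascript
// Placeholder gates: each returns true when the check passes.
// In a real pipeline they would call the licence checker and the test runner.
const gates = [
  { name: 'licence validation', check: (build) => build.licencesOk },
  { name: 'unit tests', check: (build) => build.failedTests === 0 },
];

// Runs all gates in order and stops at the first red one.
// Only a fully green build is published and tagged as "latest".
function integrate(build) {
  for (const gate of gates) {
    if (!gate.check(build)) {
      return { published: false, failedGate: gate.name, tags: [] };
    }
  }
  return { published: true, failedGate: null, tags: [build.version, 'latest'] };
}

console.log(integrate({ version: '1.4.0', licencesOk: true, failedTests: 0 }).tags);
// [ '1.4.0', 'latest' ]
console.log(integrate({ version: '1.4.1', licencesOk: true, failedTests: 2 }).failedGate);
// unit tests
```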

Successful push leads to a new version of my service/pkg/docker image

At this moment the continuous integration is done, but the features are far from being used by any customer. I have first feedback that I didn’t break any tests or other basic constraints, but that’s all: nobody can use the features because they are not deployed anywhere yet.

Well, let Jenkins execute the next step: deployment to the Kubernetes environment called integration (a.k.a. development)

Continuous delivery to the first environment including the execution of first acceptance tests

At this moment my changes are tested to see whether they work together with the features already integrated by my colleagues, and whether the new features are evolving in the right direction (or are done and ready for acceptance).

This is not bad, but what if I want to be sure that I didn’t break the “platform”, what if I don’t want to disturb everybody else working on the same product because I made some mistakes – but I still want to be human, ergo be able to make mistakes 😉? This means that the behavioral and structural changes introduced by my commits should be tested before they land on integration.

These must obviously be a different set of tests. They should test whether the whole system (composed of a few microservices, each having its own data persistence, and one or more UI apps) is working as expected, is resilient, is secure, etc.

At this point the power of Kubernetes (k8s) and ksonnet was a huge help. With k8s in place (and the infrastructure as code) it is almost a no-brainer to set up a new environment, wire up the individual systems in isolation and execute the system tests against it. This requires not only the k8s part as code but also the resources deployed and running on it. With ksonnet, every service, deployment, ingress configuration (which manages external access to the services in a cluster), or config map can be defined and configured as code. ksonnet not only supports deploying to different environments but also offers the possibility to compare them. ksonnet is not the only option; there are a lot of tools offering these possibilities. It is important to choose the fitting tool, and it is even more important to invest the time and effort to configure everything as code. This is a must-have in order to achieve real automation and continuous deployment!

Good developer experience also means simplified continuous deployment

I will not include any ksonnet examples here; they have great documentation. What is important to realize is the opportunity offered by such an approach: if everything is code, then every change can be checked in. Everything checked in can be observed/monitored, can trigger pipelines and/or events, can be reverted, can be commented – and the feature that helped us in our solution – can be tagged.

What happens in continuous delivery? Some change in the VCS triggers a pipeline, the fitting version of the source code is loaded (either as source code like ksonnet files or as a package or Docker image), the configured quality gates are checked (the runtime environment is wired up, the specs with the referenced version are executed) and in case of success the artifact is tagged as “thumbs up” and promoted to the next environment. We started doing this manually to gather enough experience to automate the process.
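
The flow described above (run the quality gates, tag on success, promote to the next environment) can be sketched in a few lines of Python. This is only an illustration, not our actual Jenkins setup: the names `Artifact`, `QualityGate` and `promote`, the tag string, and the order of the environments are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

# Assumed order of stages; the real list depends on your platform.
ENVIRONMENTS = ["integration", "review", "production"]


@dataclass
class Artifact:
    name: str
    version: str
    tags: Set[str] = field(default_factory=set)
    environment: str = "integration"


# A quality gate is any check that takes an artifact and says pass/fail.
QualityGate = Callable[[Artifact], bool]


def promote(artifact: Artifact, gates: List[QualityGate]) -> bool:
    """Run all gates; on success tag the artifact and move it one stage on."""
    if not all(gate(artifact) for gate in gates):
        return False  # the artifact stays where it is and gets no tag
    artifact.tags.add("thumbs-up")
    position = ENVIRONMENTS.index(artifact.environment)
    if position + 1 < len(ENVIRONMENTS):
        artifact.environment = ENVIRONMENTS[position + 1]
    return True
```

In a real pipeline each gate would wire up the runtime environment and execute the specs against the referenced version; here a gate is just a function returning a boolean.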

Deploy manually the latest resources from integration to the review stage

If you have all this working you have finished the part with the biggest effort. Now it is time to automate and generalize the individual steps. After the Continuous Integration, the only changes will occur in the ksonnet repo (all other source code changes are done before), which is called the deployment repo here.

Roll out, test and, if necessary, roll back the system ready for review

I think this post is already too long. In the next part (I think it will be the last one) I would like to write about the last essential method, how to deploy to production without annoying anybody (no secret here, this is what feature toggles were invented for 😉), and about some open questions and decisions we encountered on our journey.

Every graphic was created with PlantUML, thank you very much!

to be continued …

Continuous Delivery Is a Journey – Part 1

Last year my colleagues and I had the pleasure to spend 2 days with @hamvocke and @diegopeleteiro from @thoughtworks reviewing the platform we created. One essential part of our discussions was about CI/CD described like this: “think about continuous delivery as a journey. Imagine every git push lands on production. This is your target, this is what your CD should enable.”

Even if (or maybe because) this thought scared the hell out of us, it became our vision for the next few months because we saw the great opportunities we would gain if we were able to work this way.

Let me describe the context we were working in:

  • Four business teams, 100% self-organized, owning 1…n Self-contained Systems, creating microservices running as Docker containers orchestrated with Kubernetes, hosted on AWS.
  • Boundaries (as in Domain Driven Design) defined based on the business we were in.
  • Each team having full ownership and full accountability for their part of business (represented by the SCS).
  • Basic heuristics regarding source code organisation: “share nothing” about business logic, “share everything” about utility functions (in OSS manner), about the experiences you gained, about the lessons you learned, about the mistakes you made.
  • Ensuring the code quality and the software quality is 100% team responsibility.
  • You build it, you run it.
  • One Platform-as-a-Service team to enable these business teams to deliver features fast.
  • GitLab as VCS, Jenkins as build server, Nexus as package repository
  • Trunk-based development, no cherry picking, “roll fast forward” over roll back.
Teams
4 Business Teams + 1 Platform-as-a-Service Team = One Product

The architecture we have chosen was meant to support our organisation: independent teams able to work and deliver features fast and independently. They should decide themselves when and what they deploy. In order to achieve this we defined a few rules regarding inter-system communication. The most important ones are:

  • Event-driven architecture: no synchronous communication, only asynchronous communication via the Domain Event Bus
  • Non-blocking systems: every SCS must remain functional (possibly in reduced form) even if all the other systems are down

We had only a couple of exceptions to these rules. As an example: authentication doesn’t really make sense in an asynchronous manner.
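
A small sketch in Python may help to show how these two rules play together. It is an assumption-laden illustration, not our production code: `DomainEventBus` here is an in-memory stand-in for a real message broker (so delivery happens in-process), and `OrderSystem`, the event name `CustomerRenamed` and the payload fields are invented.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class DomainEventBus:
    """In-memory stand-in for a real message broker (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._handlers[event_type]:
            handler(payload)


class OrderSystem:
    """An SCS that keeps its own copy of the customer data it needs."""

    def __init__(self, bus: DomainEventBus) -> None:
        self._customer_names: Dict[str, str] = {}
        bus.subscribe("CustomerRenamed", self._on_customer_renamed)

    def _on_customer_renamed(self, payload: dict) -> None:
        self._customer_names[payload["customer_id"]] = payload["name"]

    def order_summary(self, customer_id: str) -> str:
        # Never calls the customer system directly; falls back to a
        # reduced answer if no event has arrived yet.
        name = self._customer_names.get(customer_id, "unknown customer")
        return f"Order for {name}"
```

Because the order system only reads from its own store, it keeps answering (with reduced information) even when the system that owns the customer data is down.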

Working in self-organized, independent teams is a really cool thing. But

with great power there must also come great responsibility

Uncle Ben to his nephew

Even though we set some guards regarding the overall architecture, the teams still had ownership of the internal architecture decisions. Since we didn’t have continuous delivery in place at the beginning, every team alone was responsible for deploying its systems. Due to the missing automation we were not only predestined to make human errors, we were also blind to the couplings between our services. (And of course we spent a lot of time doing stuff manually instead of letting Jenkins or GitLab or some other tool do it for us 🤔)

One example: every one of our systems had at least one React app and a GraphQL API as the main communication (read/write/subscribe) channel. One of the best things about GraphQL is the possibility to include the GraphQL schema in the React app and this way have the API interface definition included in the client application.

Is this not cool? It can be. Or it can lead to some very smelly behavior, to really tight coupling and to the inability to deploy the app and the API independently. And just like my friend @etiennedi says: “If two services cannot be deployed independently they aren’t two services!”

This was the first lesson we learned on this journey: If you don’t have a CD pipeline you will most probably hide the flaws of your design.

One can surely ask “what is the problem with manual deployment?” – nothing, if you have only a few services to handle, if everyone in your team knows about these couplings and dependencies and is able to execute the very precise deployment steps to minimize the downtime. But otherwise? This method doesn’t scale, this method is not very professional – and the biggest problem: this method ignores the possibilities offered by Kubernetes to safely roll out, take down, or scale everything you have built.

Having an automated, standardized CD pipeline as described at the beginning – with the goal that every commit lands on production within a few seconds – forces everyone to think about the consequences of their commits, to write backwards compatible code, to become a more considerate developer.
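
What “backwards compatible code” can mean in practice is sketched below in Python. The example is invented for illustration: the function `read_customer_name` and the field names `full_name`, `first_name` and `last_name` are assumptions, not part of our platform. The idea is that a consumer accepts both the old and the new shape of a payload, so producer and consumer can be deployed in any order.

```python
def read_customer_name(event: dict) -> str:
    """Accept both the old and the new payload shape (hypothetical fields)."""
    if "full_name" in event:
        # New shape, produced by already updated services.
        return event["full_name"]
    # Old shape, still produced by services that were not redeployed yet.
    return f'{event["first_name"]} {event["last_name"]}'
```

Once every producer sends the new shape, the old branch can be removed in a later commit.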

to be continued …

Base your decisions on heuristics and not on gut feeling

As developers we very often tackle problems which can be solved in various ways. It is ok not to know how to solve a problem. The real question is: how to decide which way to go 😯

In these situations I often have a feeling rather than a concrete logical reason for my decisions. These gut feelings are in most cases correct – but this fact doesn’t help me if I want to discuss them with others. It is not enough to KNOW something. If you are not a nerd from the 80’s (working alone in a den) it is crucial to be able to formulate, explain and share the thoughts leading to those decisions.

I finally found a solution for this problem when I saw Mathias Verraes’ session about Design Heuristics at KanDDDinsky.

The biggest takeaway seems to be a no-brainer but it makes a huge difference: formulate and visualize your heuristics so that you can talk about concrete ideas instead of having to memorize everything that was said – or what you think was said.

Using this methodology …

  • … unfounded opinions like “I think this is good and this is bad” won’t be discussed. The question is, why is something good or bad.
  • … looping back to subjects that were already discussed is avoided
  • … the participants can see all criteria at once
  • … the participants can weight the heuristics and so find what is probably the best solution

What is necessary for this method? Actually nothing but a whiteboard and/or some stickies. And maybe to take some time beforehand to list your design heuristics. These are mine (for now):

  • Is this a solution for my problem?
  • Do I have to build it or can I buy it?
  • Can it be rolled out without breaking either my features or anything else outside my control?
  • Does it break any architecture rules or any clean code rules? Do I have a valid reason to break these rules?
  • Can it lead to security leaks?
  • Is it over-engineered?
  • Is it much too simple, does it feel like a shortcut?
  • If it is a shortcut, can it be corrected in the near future without having to throw everything away? = Does my shortcut drive my code in the right direction, only in a more shallow way?
  • Does this solution introduce a new stack = a new unknown complexity?
  • Is it fast enough (for now and the near future)?
  • … to be continued 🙂
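
Weighting the heuristics, as mentioned above, usually happens with stickies on a whiteboard, but the arithmetic behind it can be sketched in Python. Everything in this sketch is made up for illustration: the function names, the three heuristics picked from my list, and the weights and ratings (0 = bad, 5 = good).

```python
from typing import Dict, List, Tuple

Ratings = Dict[str, int]  # heuristic -> rating from 0 (bad) to 5 (good)


def weighted_score(ratings: Ratings, weights: Dict[str, int]) -> int:
    """Sum of weight * rating; a heuristic nobody rated counts as 0."""
    return sum(weight * ratings.get(heuristic, 0)
               for heuristic, weight in weights.items())


def rank_options(options: Dict[str, Ratings],
                 weights: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (option, score) pairs, best first."""
    scored = [(name, weighted_score(r, weights)) for name, r in options.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented example: weights and ratings would come out of the discussion.
weights = {"solves my problem": 3, "no new stack": 2, "fast enough": 1}
options = {
    "build it": {"solves my problem": 5, "no new stack": 2, "fast enough": 4},
    "buy it": {"solves my problem": 4, "no new stack": 5, "fast enough": 3},
}
ranking = rank_options(options, weights)
# "buy it" scores 3*4 + 2*5 + 1*3 = 25, "build it" scores 3*5 + 2*2 + 1*4 = 23
```

The number itself is not the decision; it just makes visible which heuristics the group considered important and why one option came out ahead.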

The video of the talk can be found here. It was a workshop disguised as a talk (thanks again Mathias!!), we could have continued for another hour if it weren’t for the cold beer waiting 🙂