About
Here's the time I spout my opinions about DevOps!
Background
It has largely been accepted that the term DevOps was first coined in 2008 by Patrick Debois and Andrew Shafer, in recognition of how segregated IT had become due to the ITSM (Information Technology Service Management) policies and activities embedded in IT at the time. These policies largely isolated IT operations teams from development teams, in effect creating force fields between the two. The developers could not really see how their applications behaved in production, and the operations teams could not see what was being developed.
The protection mechanism was a change control process that was supposed to be fully monitored by experienced engineers, with the change creators putting in an honest effort to explain what the changes were and how to roll them back in case of issues. In truth, change creators are loath to put too much detail into either the change itself or the rollback plan. They may not fully understand the change, or may not really have any rollback choice other than falling forward (fixing problems as they deploy).
Opinion
DevOps has become such an overused buzzword that I don't know where to begin. The spirit of DevOps is simple though: Give developers tools to better understand how their application behaves in the wild, and give the operations teams better visibility into the development process. Sounds easy, but it's not; it's really not easy at all. Unfortunately, there have been attempts to oversimplify this philosophy by those who think it is easier than it is. There are a few that really stand out to me:
- Leadership mandates to become more DevOps.
- Using ego or vanity metrics to measure how DevOps the team (see the first point) is.
- Creating a DevOps team.
- Hiring DevOps engineers.
Those are the top four things that drive me nuts about DevOps. All four come from people who don't see DevOps as anything beyond a cool-sounding buzzword and have only a vague idea of what it actually means.
Mandates and Metrics
The fastest way to have a philosophy change fail at a company is to have the high-level management team declare that the entire company needs to embrace a new philosophy: I think most of us in the business have heard C-level management spout something like "We need to be more DevOps" without being specific about what that means. It brings to mind the pirate adage "The beatings will continue until morale improves", as all the individual teams hear is "great, more work for us with the same pay". This largely depends on the size of the company, as the smaller the company the less this holds true, but I am talking about larger enterprises. Larger enterprises tend to behave like oil tankers: They can turn in narrow spaces, but slowly and with an abundance of caution!
When leadership is asked to define DevOps, they normally dodge the question by recommending that each team follow its own path to the promised kingdom, but they still tend to put out metrics for all the teams to measure themselves with. These metrics are bubbled up to the leadership staff to show how well the departments are embracing the new philosophy, in this case, DevOps. Metrics are important for tracking improvements, but there are some metrics that I consider ego metrics:
- Number of code commits per day: Ugh, this one is awful. This metric signifies absolutely nothing. Any team can run up the score on this one and say "yup, we're more devops now" without really improving.
- Customer tickets: Again, this is one that can easily be 'gamed' by a team and doesn't really get at customer satisfaction or performance.
- New lines of code: I don't have the words.
- Lead time: This isn't a bad metric, but if wielded incorrectly it can encourage some pretty awful behaviors within a team: The team could just be in a rush to deploy code, to hell with the production results.
Some metrics which I feel provide a better picture of how a team is performing include:
- Percentage of failed deployments: This can show the impact of bad deployments, either through bad code or rushed work. It begins to tie in development work with operational impact.
- Mean Time To Recover: This can show the response time of both the operational teams and development teams to react to and recover from problems.
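For what it's worth, both of these numbers are cheap to compute once you record deployments somewhere. Here is a minimal sketch in Python, not a definitive implementation: the `Deployment` record, its field names, and the sample `history` are all made up for illustration, and it assumes recovery time is measured from the moment of deployment until service is restored (your team may prefer to start the clock at detection instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical deployment record; field names are assumptions."""
    deployed_at: datetime
    failed: bool
    recovered_at: Optional[datetime] = None  # set once service is restored


def failed_deployment_percentage(deployments: List[Deployment]) -> float:
    """Share of deployments that failed, as a percentage (0.0 if none ran)."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.failed)
    return 100.0 * failures / len(deployments)


def mean_time_to_recover(deployments: List[Deployment]) -> Optional[timedelta]:
    """Average time from a failed deployment to recovery.

    Assumption: the clock starts at deployment time. Failed deployments
    that have not recovered yet are skipped. Returns None if there is
    nothing to average.
    """
    durations = [
        d.recovered_at - d.deployed_at
        for d in deployments
        if d.failed and d.recovered_at is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Made-up sample data: five deployments, two of which failed.
base = datetime(2024, 1, 1, 9, 0)
history = [
    Deployment(base, failed=False),
    Deployment(base + timedelta(days=1), failed=True,
               recovered_at=base + timedelta(days=1, minutes=30)),
    Deployment(base + timedelta(days=2), failed=False),
    Deployment(base + timedelta(days=3), failed=True,
               recovered_at=base + timedelta(days=3, minutes=90)),
    Deployment(base + timedelta(days=4), failed=False),
]

print(f"Failed deployments: {failed_deployment_percentage(history):.1f}%")
print(f"Mean time to recover: {mean_time_to_recover(history)}")
```

With this sample data, two of five deployments failed (40%), and the two recoveries took 30 and 90 minutes, for a mean of one hour. The point is less the arithmetic than the fact that both numbers tie development work directly to what happened in production.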
DevOps Teams and Engineers
Calling a team "DevOps", or trying to hire "DevOps" engineers, has always irritated me, because you still need development teams, and you still need operational teams, regardless of what the corporate edict is. Each of these teams needs to be staffed by developers or system administrators, all the more so in bigger companies: The system administrator will largely have authority over vast fleets of infrastructure, and will not be assigned to just a few development teams.
The recommendation here is to continue to hire what you need, but when you are staffing the teams from the ground up, ensure that your developers have some operations skills, or have an interest in operations. This could include skills such as:
- Actual experience using log aggregation tools (Logstash, Splunk).
- Actual experience using configuration management tools (Ansible, Chef, Puppet).
- Habitually writing tests before code.
Same would go for any or all operations staff you might need. Developer-minded operational skills should include things like:
- Actual experience with code management tools (GitHub, GitLab, Bitbucket).
- Any scripting, coding, or configuration management skills. Can they read code?
- Actual experience with code deployment & CI/CD tools.
Instead of throwing together your developer and operations teams and calling them DevOps teams, it would make more sense to have monthly engineer swaps, where developers spend a month away from feature development and just work with the operations team: Yes, this would mean they would be on call. The same goes for system administrators: Get them involved with developer sprints. Have them tackle less critical code work, and teach them how to write and deploy tests. They can catch possible conflicts long before they reach your test or stage environments.
Summary
Honestly, I have no problem with the philosophy of DevOps: Anything that can reduce outage count, duration, and impact is just fine. My particular concern is when leaders use a phrase like DevOps, point to it, and say "This will solve all our problems, make it happen!" and the teams proceed to do what they can to be more DevOps-y without really understanding what it means, how to measure it, and how to improve. The real point is you don't need a catchy buzzword to improve: You just need imagination and support from your leadership stack.