I’d like to share my feelings and thoughts about working in the system administration and DevOps area of IT. I’ll try to be as clear as possible and will do my best to describe the cases I face as a sales & project manager on a team of system administrators and DevOps engineers who work with clients from all over the globe.
What are the services you're paying for?
Real-life analogy
"Why do I need system admins or DevOps? Where does my money go if everything is already working as I need it to?"
Really..?) Let’s try an analogy: imagine that you’re watching a play in a theater. The play is great, the cast is outstanding and marvelous, the script is mind-blowing and really exciting, so good that it keeps everybody on their toes, the director’s work is brilliant, and the ending will tear your heart into pieces and then put it back together. Imagined it, didn’t you?
I bet you’ve painted a very bright picture in your head: a huge theater with great lighting; beautiful scenery; high-quality sound, including not handheld microphones but badass low-profile headset mics; a hidden prompter’s box; and a huge, wisely designed hall, so people sitting at the back can easily see and hear what’s happening on stage without binoculars or hearing aids :)
The only catch is that nobody considers all these additional features to be one of the main parts of the play. If we disable and remove all of the above, the play becomes just a bunch of famous people in costumes, throwing well-rehearsed words and phrases at each other on stage. A play is a complex whole that consists of the actors’ work, the script, etc., as well as lighting, sound, the prompter’s work, etc.
Actual DevOps role
So let’s get back to our IT world and compare the above with it. Actors are developers, who create and adjust scenarios and take care of the product’s quality. Together with the client/owner, they develop the scenarios that users will follow to interact with the system. They work on a clear, understandable, and easy-to-use design, so users can enjoy interacting with the system.
System administrators and DevOps engineers work backstage: they adjust the lighting and turn it on or off when needed, so end users can see the results of the developers’ work. They also monitor the sound quality and its delivery to every user, and change the scenery behind the scenes, so the end user can see other features of the system.
The ideal play is one where all the gears of the mechanism work smoothly and in coordination, so when something new appears on stage, the lighting and sound adjust automatically. And if anything goes wrong, for example the sound stops working, the techs need to catch the issue, which can be caused by:
- a mic that’s simply broken;
- a central sound processor that burned out, so nobody receives any sound.
They also need to understand why it happened. There may be several reasons, for example:
- it broke due to an internal failure, aka the server got overloaded because of a bottleneck that nobody considered during the initial discussion;
- an actor, aka a developer, spilled coffee on the mic during the break, aka pushed broken code that disrupted certain components or the whole system.
Prevent failures instead of dealing with them!
Real-life analogy
"I want a single fix and don’t need your support, monitoring, and stuff! My goal is to solve the problem ASAP, so it won’t appear anymore!"
Ha-ha..) Frankly speaking, the statement above is a real quote from a chat with a potential client. Let’s get back to analogies.
So imagine that you’ve come to a dealership to buy yourself a brand-new car. Your friends and other buyers have told you how awesome, super-fast, and comfortable it is: a cosmic engine with around 500 horsepower, an onboard computer, climate control, etc. But during the sale nobody tells you that this girl needs around 10 liters of engine oil, only top-quality gasoline, and, to run correctly, regular maintenance every six months, right?
Only people who have bought cars before and felt all those unspoken ‘BUTs’ on their own skin would consider these points and think in advance about how to set everything up and keep it running. Many clients in the modern IT market don’t consider such issues and face problems like:
- I bought a car, it’s awesome! I’ve been driving it hard, but now there’s some strange sound under the hood.
- Do I really need maintenance every six months? Why, if it’s working fine?
- I was doing some drifting, skr-skr-skr-bang, you know :) And now when I brake, I feel like the tires have no grip on the road.
- Also, I filled up with mid-grade gasoline instead of premium. Is that OK?
- Why do I need to change the oil? I’ve already filled it, no need to change it!
How does it usually happen?
Reading all of the above is funny, right? Now let me apply the same analogy to our area of work: the client hired developers, who built a great website with an outstanding design and showed how it all works on their LOCAL MACHINE; let’s call it a test drive. The client approved everything, the contract was closed, and a happy client got what he paid for. He even got instructions on how to deploy the code to the server. WOW! :)
So the happy client comes home thinking, “Now I’ll have some fun with my brand-new car!”, and deploys the site to the server following the instructions. He/she launches it, and everything works fine. People start visiting it, as he/she has posted about the new website on every social network profile he/she has. Several days pass and all is fine, but suddenly… something goes wrong. The site slows down, then users start complaining about random errors, and finally the site stops working correctly at all.
So what happened? In many cases, we see a handoff that wasn’t completed properly: the client didn’t have enough information about the required specs, metrics, hardware, etc. Nobody told him/her about them, just like no one told the happy car buyer that this baby needs 10 liters of engine oil every 1000 kilometers.
Actual maintenance role
Server maintenance is a complex set of actions performed to prevent errors and to deal with them at the source before they become real problems. So what should you consider to run your business without issues? The most important points are listed below. Each of them matters, and ignoring even one can cause problems that may end in lost money, lost revenue, or even lost sensitive business data.
Server specs required for the website/app to run correctly
Testing on a single local machine with a single user in the DB and no load went fine, without any errors (I’m not generalizing about every development studio and freelance developer; we see fewer and fewer such cases lately, but they still happen). But in no way can we say that it simulates the real system’s behavior! Sadly, only a few people understand this, though lately I’ve been meeting more and more tech-savvy people. Or just people who learn from their own mistakes :D
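To see why a single-user test says little about real traffic, here is a minimal Python sketch (standard library only) that fires concurrent requests at a server and reports latency. It is an illustration under stated assumptions, not our tooling: `DemoHandler`, which just sleeps for about 10 ms per request, is a hypothetical stand-in for a real app, and the user counts are arbitrary. Against a real project you would point `load_test()` at a staging URL instead.

```python
import http.server
import threading
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Opener that ignores *_proxy environment variables, so local tests stay local.
NO_PROXY_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real app: each request costs ~10 ms."""

    def do_GET(self):
        time.sleep(0.01)  # pretend to do some work (DB query, rendering, ...)
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


class DemoServer(http.server.ThreadingHTTPServer):
    request_queue_size = 64  # larger listen backlog than the default of 5


def start_demo_server():
    """Start the demo server on a free local port; return (server, url)."""
    server = DemoServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"


def timed_get(url):
    """Send one GET request and return (status_code, seconds_taken)."""
    start = time.perf_counter()
    try:
        with NO_PROXY_OPENER.open(url, timeout=10) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered with a 4xx/5xx
    except OSError:
        status = 0  # connection refused, reset, or timed out
    return status, time.perf_counter() - start


def load_test(url, users, requests_per_user):
    """Simulate `users` concurrent clients, each sending several requests."""
    total = users * requests_per_user
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(timed_get, [url] * total))
    latencies = sorted(seconds for _, seconds in results)
    return {
        "requests": total,
        "errors": sum(1 for status, _ in results if status != 200),
        "median_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }


if __name__ == "__main__":
    server, url = start_demo_server()
    print("1 user:  ", load_test(url, users=1, requests_per_user=5))
    print("20 users:", load_test(url, users=20, requests_per_user=5))
    server.shutdown()
```

Comparing the one-user and many-user runs against your own app is the point: a page that answers instantly for one tester can start queuing or failing once many visitors arrive at the same time.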
Logs and server/website monitoring
How can you find out what happened to your server/website if it slowed down or just went down for no apparent reason? And how can we prevent such a situation in the future? You can find the answers in one of our previous articles, where we talk about organizing a proper monitoring system.
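The details are in that article, but the core idea of threshold-based alerting fits in a few lines. The Python sketch below is only an illustration: the metric names and the `THRESHOLDS` values are hypothetical, `p95_response_ms` would have to come from your web server logs or an HTTP probe, and a production setup would use a dedicated monitoring system that also keeps history and sends notifications.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str
    warn: float
    crit: float


# Hypothetical thresholds; tune them to your own server and workload.
THRESHOLDS = [
    Threshold("disk_used_pct", warn=80.0, crit=95.0),
    Threshold("load_per_cpu", warn=1.0, crit=2.0),
    Threshold("p95_response_ms", warn=500.0, crit=2000.0),
]


def collect_host_metrics(path="/"):
    """Read a few basic host metrics using only the standard library."""
    usage = shutil.disk_usage(path)
    metrics = {"disk_used_pct": 100.0 * usage.used / usage.total}
    if hasattr(os, "getloadavg"):  # not available on Windows
        one_minute_load = os.getloadavg()[0]
        metrics["load_per_cpu"] = one_minute_load / (os.cpu_count() or 1)
    return metrics


def evaluate(metrics, thresholds=THRESHOLDS):
    """Return (level, metric, value) for every metric that crossed a threshold."""
    alerts = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            continue  # metric not collected on this host
        if value >= t.crit:
            alerts.append(("CRITICAL", t.metric, value))
        elif value >= t.warn:
            alerts.append(("WARNING", t.metric, value))
    return alerts


if __name__ == "__main__":
    for level, name, value in evaluate(collect_host_metrics()):
        print(f"{level}: {name} = {value:.1f}")
```

Run on a schedule, a check like this flags a filling disk or a saturated CPU before visitors notice the site slowing down.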
Backups: one of the most important components of any project, commercial or non-commercial
What will you do if:
- your server fails?
- your database goes down?
How will you restore normal operation of your online project? How will you roll back to a certain system state if you need to? The answer to all of these questions is the same: restore from backup(s).
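As a minimal illustration of the backup, rotate, and restore cycle, here is a Python sketch using only the standard library. It assumes the project’s files live in one directory; a database needs its own consistent dump (made with the database’s native dump tool) before archiving, and real backups should be copied off the server, since a backup stored on the machine that died doesn’t help. The `backup-<timestamp>.tar.gz` naming scheme and the retention count are hypothetical choices.

```python
import shutil
from datetime import datetime
from pathlib import Path


def create_backup(src_dir, backup_dir):
    """Pack src_dir into backup_dir/backup-<timestamp>.tar.gz; return its path."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Fixed-width timestamp with microseconds: names are unique and sort by age.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    archive = shutil.make_archive(
        str(backup_dir / f"backup-{stamp}"), "gztar", root_dir=str(src_dir)
    )
    return Path(archive)


def prune_backups(backup_dir, keep):
    """Keep only the `keep` newest archives; return the paths that were deleted."""
    archives = sorted(Path(backup_dir).glob("backup-*.tar.gz"))  # oldest first
    to_delete = archives[:-keep] if keep > 0 else archives
    for path in to_delete:
        path.unlink()
    return to_delete


def restore_backup(archive, target_dir):
    """Unpack an archive into target_dir, e.g. a fresh directory on a new server."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive), str(target_dir))
    return target_dir
```

Periodically restoring an archive into a scratch directory with `restore_backup()` is the only way to be sure the backups actually work when you need them.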
Server overload during customer/visitor traffic spikes
How will you keep the server from getting overloaded at all? And how do you implement this so that live customers currently visiting your site won’t even notice the change? By the way, that’s exactly what our clients usually ask us for: zero downtime.
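Zero-downtime updates usually rely on running at least two instances behind a load balancer and taking them out of rotation one at a time. The Python sketch below models only that idea: `BackendPool`, `rolling_update()`, and the `update` callback are hypothetical names, and in practice the pool would be a load balancer or reverse proxy, while the update step would deploy and health-check the new version.

```python
from itertools import cycle


class BackendPool:
    """Round-robin over app instances, skipping any that are drained.

    Hypothetical model of what a load balancer does; not a real product's API.
    """

    def __init__(self, backends):
        self.backends = list(backends)
        self.available = set(self.backends)
        self._ring = cycle(self.backends)

    def drain(self, backend):
        """Stop routing new requests to `backend` (e.g. before updating it)."""
        self.available.discard(backend)

    def restore(self, backend):
        """Put `backend` back into rotation, e.g. after it passes a health check."""
        if backend in self.backends:
            self.available.add(backend)

    def next_backend(self):
        """Return the next available backend in round-robin order."""
        if not self.available:
            raise RuntimeError("no backends available: this would be downtime")
        while True:  # terminates: `available` is a non-empty subset of the ring
            candidate = next(self._ring)
            if candidate in self.available:
                return candidate


def rolling_update(pool, update):
    """Update backends one at a time so the rest keep serving visitors.

    If update() raises, the backend stays drained instead of serving a
    half-updated version.
    """
    for backend in pool.backends:
        pool.drain(backend)
        update(backend)  # hypothetical callback: deploy the new version here
        pool.restore(backend)
```

Because the drained instance receives no new requests while it is being updated, visitors keep being served by the others. With a single instance, `next_backend()` would raise during the update, which is exactly why zero downtime needs more than one server.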
Website / server / software updates
Timely installation of all available updates prevents hacks and intrusions in 90% of cases. The biggest backdoor, and, you’ll laugh, the easiest one, is a known, unpatched vulnerability or weak spot in an outdated software version. Why do you think the developers of the products and tools we use keep shipping new patches and modifications? Because hackers never sleep...)
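A first step toward timely updates is knowing what is outdated. The Python sketch below compares installed versions against the minimum versions that contain security fixes. Both dictionaries in the usage are hypothetical; in practice this data comes from your OS or language package manager and from vendor security advisories, and real version strings (like `2.0.0rc1`) need a proper version parser rather than this plain dotted-number one.

```python
from itertools import zip_longest


def parse_version(text):
    """Turn '1.24.3' into (1, 24, 3); only plain dotted numbers are supported."""
    return tuple(int(part) for part in text.split("."))


def is_older(installed, required):
    """True if `installed` < `required`, treating '2.0' and '2.0.0' as equal."""
    pairs = zip_longest(parse_version(installed), parse_version(required), fillvalue=0)
    for have, need in pairs:
        if have != need:
            return have < need
    return False


def outdated(installed, minimum_patched):
    """List (name, installed, required) for packages below their patched version."""
    report = []
    for name, version in sorted(installed.items()):
        required = minimum_patched.get(name)
        if required is not None and is_older(version, required):
            report.append((name, version, required))
    return report


if __name__ == "__main__":
    # Hypothetical inventory and advisory data, for illustration only.
    installed = {"webserver": "1.24.0", "framework": "4.2.7", "dblib": "2.0"}
    patched = {"webserver": "1.24.1", "framework": "4.2.7", "dblib": "2.0.0"}
    for name, have, need in outdated(installed, patched):
        # prints: webserver: 1.24.0 installed, update to >= 1.24.1
        print(f"{name}: {have} installed, update to >= {need}")
```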
Now let’s get back to the beginning of this topic: “I want a single fix!”. I have just one question: why let things get to the point where a fix is necessary? What’s the point of waiting and cutting costs on things you think aren’t important, only to pay for the “single fix” anyway and bear the losses caused by the issue? In fact, everything can be designed from scratch with, as I like to say, the BEST PRACTICE approach. The only work that’s unavoidable after successful delivery is support and maintenance of the working solution: preventing problems or dealing with them quickly, since the support team will catch and track them almost instantly.
Now let me explain whether a single fix can solve the problem once and for all... Unfortunately, many problems can’t be solved once and for all. They can’t even really be called problems: they’re part of routine system maintenance, just like servicing a car from time to time.
Solving such issues can be automated, but not in every case. Our team has implemented some cool, fully autonomous solutions that let the system clean up its own mess where possible.
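As a generic illustration (not our team’s actual solution), a typical self-cleaning task looks like the Python sketch below: it deletes files matching a pattern once they pass a given age. The directory, the `*.log` pattern, and the 7-day limit are hypothetical; on Linux, log rotation in particular is usually handled by an existing tool such as logrotate, and a task like this would be run daily by a scheduler such as cron or a systemd timer.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def cleanup_old_files(directory, max_age_days, pattern="*.log", now=None, dry_run=False):
    """Delete files in `directory` matching `pattern` whose modification time is
    older than `max_age_days`; return the affected paths. With dry_run=True,
    only report what would be deleted."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    affected = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            affected.append(path)
    return affected


if __name__ == "__main__":
    # Hypothetical directory; dry run first, then drop dry_run once you trust it.
    for path in cleanup_old_files("/var/log/myapp", max_age_days=7, dry_run=True):
        print("would delete", path)
```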
The misery caused by misunderstandings
How it usually is
"My developers didn’t do anything, they showed me the working solution, and I tested it! That’s definitely the server issue - deal with it!"
This claim is fundamentally wrong. We call the phenomenon at play here a virtual wall: developers don’t understand what we need, and DevOps engineers don’t understand what developers need. This is what happens:
I don’t think I need to explain the diagram above, as it’s pretty straightforward. Instead, let me give you an example of a workflow that eliminates:
- client dissatisfaction
- loss of trust in developers and DevOps
- failures and errors when deploying to prod
- financial losses for the client
How it should be
The workflow you can see below has proven itself on many of our projects that were successfully delivered and deployed to production. The client’s satisfaction was off the charts, and he was happy to keep collaborating. The main point is that problems aren’t always on one side. I’d compare the interaction between developers and system admins / DevOps engineers to a scale: it’s balanced only when both pans hold the same weight. Same here: harmony and balance in any project begin when the efforts of both teams are fully coordinated.
I could give you many more cases in this blog post, but I’d like to draw a conclusion from all of the above: harmony is achieved when all parts of the project’s mechanism work together and help each other fix their own and others’ mistakes. An understanding client will be nothing but happy with the speed and efficiency of problem-solving, as every side of the mechanism follows and considers the wishes, recommendations, and instructions of the other side(s). And they’ll also be happy that the development and DevOps teams enjoy working together and deliver something outstanding and cool!
I’d also like to mention that sometimes the mechanism’s work isn’t obvious, just like the adjustments to lighting, sound, and scenery on stage, but that doesn’t mean no work is being done. It just means the work happens in the background, almost invisibly to the system and the client.
Let’s get along! Mutual understanding leads only to positive results!
More to come..)