I’d like to share my emotions and thoughts about working in the system administration and DevOps area of IT. I’ll try to be as clear as possible and will do my best to describe the cases I face as a sales & project manager in a team of sysadmins and DevOps engineers who work with clients from all over the globe.
“Why do I need you? What am I paying for if everything is already working the way I need it to?”
Rly..?) Let’s use an analogy: imagine you’re watching a play in a theater. The play is great, the cast is just outstanding and marvelous, the script is mind-blowing and so exciting that it keeps everybody on their toes, the director’s work is brilliant, and the ending will tear your heart to pieces and then put it back together. Imagined it, didn’t you?
I bet you’ve painted a very bright picture in your head: a huge theater with great lighting; beautiful scenery; high-quality sound, including not handheld microphones but badass low-profile headset mics; a hidden prompter’s box; and a huge, wisely designed hall, so people sitting at the back can easily see and hear what’s happening on stage without binoculars or hearing aids 🙂
The thing is, nobody considers that all these additional features are one of the main parts of the play. If we disable and remove all of the above, the play becomes just a bunch of famous people in costumes throwing well-rehearsed words and phrases at each other on stage. A play is a complex whole: it consists of the actors’ work, the script, etc., as well as the lighting, sound, the prompter’s work, etc.
So let’s get back to our IT world and compare it with the picture above. The actors are developers, who create and adjust the script and take care of the quality of the product. Together with the client/owner, they develop the scenarios that users will follow to interact with the system. They work on an understandable, clear and easy-to-use design, so users can enjoy interacting with the system.
System administrators and DevOps engineers work backstage: they adjust the lighting and turn it on or off when needed, so end users can see the results of the developers’ work. They also monitor the quality of the sound and its delivery to every user, and change the scenery behind the scenes, so end users can see other features of the system.
An ideal play is one where all the gears of the mechanism work smoothly and in a coordinated manner, so when something new appears on stage, the lighting and sound are adjusted automatically. And if anything goes wrong, for example the sound stops working, the techs need to catch the issue: either a mic is broken, or the central sound processor has burnt out and nobody gets any sound. Then they need to understand the cause. Either it broke because of an internal failure (the server got overloaded and became a bottleneck that wasn’t taken into account during the initial discussion), or an actor, i.e. a developer, spilled coffee on the mic during the break (added non-working code that broke certain components or the system as a whole).
“I want single fix! I don’t need your support, monitoring and stuff! I want to solve the problem ASAP, so it won’t appear anymore!”
Ha-ha..) Frankly speaking, the statement above is a real quote from a chat with one of our potential clients. Back to analogies:
So imagine you came to a dealership to get yourself a brand-new car. You’ve been told how awesome, super-fast and comfortable it is. The engine is something cosmic with around 500 horsepower, plus an onboard computer, climate control, etc. But during the sale nobody will tell you that this girl needs around 10 litres of motor oil, takes only top-quality gasoline, and has to be serviced every six months to run properly, right?
Only people who have bought cars before and felt all those unspoken ‘BUTs’ on their own skin would consider the points above and think in advance about how to set everything up and keep it running. Many clients in today’s IT market don’t consider such issues and face problems like:
– I bought a car, it’s awesome! I was riding dirty, but now there’s some strange noise under the hood.
– Do I really need servicing every six months? But why, if it’s working fine?
– I was doing some drifting, skr-skr-skr-bang, you know 🙂 And now when I brake I feel like I have no grip on the road.
– Also, I put in mid-grade gasoline instead of premium, is that ok?
– Why do I need to change the oil? I’ve already filled it up, no need to change it!
Reading all of the above is funny, right? Now let’s map it onto our area of work: the client hired developers, who built a great website with an outstanding design and showed how it all works on their LOCAL MACHINE (call it a test drive). The client approved everything, the contract was closed, and the happy client got what he paid for. He even got instructions on how to deploy his code to the server. WOW! 🙂
So the happy client comes home thinking: “Now I’ll do some stuff with my brand-new car!” and deploys the site to the server following the instructions. The site is launched and working, and people start visiting it, since he has posted about his new website on every social network profile he has. Several days pass and everything is just fine, but suddenly… something is wrong. The site gets slow, then users start complaining about random errors, and finally the site stops working correctly at all. What happened?
In many of the cases we encounter, the handoff wasn’t completed properly, because the client didn’t have enough information about:
1. the server specs required for the website/app to run correctly
Testing was done on a single local machine, with a single user in the DB and no load, and it went fine without any errors (don’t take these words as a generalization about every development studio and freelance developer; we encounter such cases less and less often, but they still happen). But in no way can we say that this simulates the real system’s behaviour! Sadly, only a few people understand this, although recently I meet more and more tech-savvy people. Or just people who have learned from their own mistakes 😀
2. maintenance of the server(s), the website and the software it needs to run correctly
● 2.1 where will the logs be kept? Which logs are important and which are not? What about website or server monitoring?
How can you find out what happened to your server or website if it started slowing down or just went down for no apparent reason? And how can we prevent it from happening again? The answer to these questions can be found in our previous article, btw.
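To make this less abstract: at its core, monitoring is a check repeated every minute or so. Below is a minimal Python sketch of one such probe, assuming the site should answer an HTTP request within a chosen time limit. The function name and the 2-second “slow” threshold are hypothetical, chosen only for illustration; in practice this job is done by dedicated monitoring tools that also keep history and send alerts.

```python
import time
import urllib.request


def check_site(url: str, slow_after_s: float = 2.0, timeout_s: float = 10.0) -> str:
    """Probe `url` once and classify the result as 'ok', 'slow' or 'down'.

    Hypothetical helper: the thresholds are illustrative, not recommendations.
    """
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            response.read(1)  # wait until the server actually starts sending a body
    except OSError:
        # URLError, HTTPError (4xx/5xx responses), refused connections and
        # timeouts are all subclasses of OSError, so each one counts as "down".
        return "down"
    elapsed = time.monotonic() - started
    return "slow" if elapsed > slow_after_s else "ok"
```

A monitoring system would call something like `check_site("https://example.com/")` on a schedule and raise an alert when several probes in a row come back `"slow"` or `"down"`, instead of reacting to a single unlucky request.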
● 2.2 what about backups, one of the most important components of any project, commercial or not?
What will you do if your server fails completely? What will you do if your DB goes down? How will you get your online business or project back to normal operation? How will you roll the system back to a certain state if you need to? The answer to all of these questions is the same: restore it from backup(s).
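Restoring from a backup only works if fresh backups exist and old ones are rotated before they fill up the disk. Here is a minimal Python sketch of one common retention rule, “keep the N newest daily backups”, assuming a hypothetical naming scheme like `db-2018-05-31.sql.gz`; the names and the default of 7 are illustrative only.

```python
from datetime import date


def backup_date(name: str) -> date:
    """Extract the date from a hypothetical backup name like 'db-2018-05-31.sql.gz'."""
    stamp = name.removeprefix("db-").split(".", 1)[0]
    return date.fromisoformat(stamp)


def backups_to_delete(names: list[str], keep: int = 7) -> list[str]:
    """Return the backups that fall outside the `keep` newest ones, newest first."""
    newest_first = sorted(names, key=backup_date, reverse=True)
    return newest_first[keep:]
```

Given ten daily files from `db-2018-05-01.sql.gz` to `db-2018-05-10.sql.gz`, `backups_to_delete(names)` returns the three oldest ones. Whatever the rule, a backup is only worth something if restoring from it has actually been tested.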
● 2.3 and what happens if the server gets overloaded during a traffic spike of customers / visitors?
How are you going to keep the server from getting overloaded at all? And how do you implement this so that the customers currently on your site won’t even notice the change? Btw, that’s exactly what our clients usually ask us for: zero downtime.
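One common way to keep servers from being overloaded is to add or remove identical app servers behind a load balancer depending on load. As an illustration, this Python sketch implements the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current × observed / target), with no change while the ratio stays within a small tolerance. The 60% CPU target, the bounds and the function name are illustrative assumptions. Zero downtime additionally depends on the load balancer routing traffic only to instances that pass their health checks.

```python
import math


def desired_replicas(current: int, observed_util: float, target_util: float = 60.0,
                     tolerance: float = 0.1, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Proportional scaling rule (same formula as the Kubernetes HPA).

    `observed_util` and `target_util` are, e.g., average CPU percentages.
    """
    ratio = observed_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: avoid flapping back and forth
    wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))
```

For example, 3 servers averaging 90% CPU against a 60% target give `ceil(3 × 1.5) = 5` servers; the `max_replicas` cap keeps a runaway spike from turning into a runaway bill.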
● 2.4 what about website / server / software updates?
In our experience, timely installation of all available updates prevents around 90% of hacks and intrusions. The biggest backdoor, and you’ll laugh, the easiest one, is an open vulnerability or weak spot in an older software version. Why do you think the developers of the products and tools we use keep releasing new patches and updates? Because hacking never sleeps…)
Now let’s get back to the beginning of this topic: “I want single fix!” Then I have a single question: why even let things get to the point where a fix is needed? What’s the point of waiting and saving money on things you think are unimportant, only to pay for the “single fix” anyway and bear the losses caused by the incident? In fact, everything can be designed, as I like to say, with a BEST PRACTICE approach from scratch. The only work needed after successful delivery is support and maintenance of the working solution: preventing problems, or dealing with them quickly, since they will be caught and tracked almost instantly.
“…I want to solve the problem ASAP, so it won’t appear anymore!” Unfortunately, many problems can’t be solved once and for all. They can’t even really be called problems; they are more a matter of system maintenance, just like servicing a car from time to time.
Resolving such issues can be automated, but not in every case. Our team has managed to implement some cool, fully autonomous solutions that let the system clean up its own mess where possible.
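As a small example of a system cleaning up after itself, here is a Python sketch of a scheduled job that deletes old files. The directory, the `*.log` pattern and the 14-day limit are hypothetical; on Linux servers this particular job is usually handled by existing tools such as logrotate, so treat this only as an illustration of the idea.

```python
import time
from pathlib import Path


def clean_old_files(directory: Path, pattern: str = "*.log",
                    max_age_days: float = 14) -> list[Path]:
    """Delete files in `directory` matching `pattern` that were last modified
    more than `max_age_days` ago, and return the paths that were removed."""
    cutoff = time.time() - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in sorted(directory.glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Run daily from a scheduler such as cron, a job like this keeps a disk from slowly filling up, one of those “problems” that is really just maintenance.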
“My developers didn’t do anything wrong, they showed me the working solution, I tested it! That’s definitely a server issue, deal with it!”
These words are fundamentally wrong. The phenomenon you see here is what we call a virtual wall: developers don’t understand what we want, and we don’t understand what developers want. This is what happens:
– client dissatisfaction
– decreased trust in developers and sysadmins
– failures and errors when deploying to prod
– lost money for the client
A workflow in which both teams coordinate closely has proven itself on many of our projects, which were successfully delivered and deployed to production. The client’s satisfaction was sky-high, and he was happy to keep collaborating. The main point is that problems don’t always lie on one side. The interaction between developers and sysadmins / DevOps engineers can be compared to a scale, which is balanced only when both pans hold the same weight. It’s the same here: harmony and balance in any project begin when the efforts and work of both teams are fully coordinated.
I could give you many more cases in this blog post, but I’d like to draw a conclusion from all of the above: harmony is achieved when all the mechanisms of a project work together and help each other fix their own and each other’s mistakes. An understanding client will be nothing but happy with the speed and efficiency of problem-solving, since every part of the mechanism follows and takes into account the wishes, recommendations and instructions provided by the other side(s).
I’d also like to mention that sometimes the work of the mechanism isn’t clear or obvious, just like the adjustments made to the lighting, sound and scenery on stage, but that doesn’t mean no work is being done. It just means the work happens in the background, almost unnoticed by the client.
Let’s get along! Mutual understanding leads only to positive results!
More to come..)
Live long and prosper! *</:)
(c) Nick Perzhanovskiy, 2018