Channel: Spiceworks Community

Black Thursday: 24 hours, 2 outages and countless cups of coffee


This is the 292nd article in the Spotlight on IT series. If you'd be interested in writing an article for the series on the subject of backup, security, storage, virtualization, mobile, networking, wireless, cloud and SaaS, or MSPs, PM Eric to get started.

Our days are often unpredictable in IT. You might plan for a productive day of project implementations or learning about new tech, or maybe you’re expecting a boring day of simple monitoring with a few classic user tickets. But then, without warning, it can hit you: Murphy attack! Sometimes it can be a mild attack, but on Black Thursday, the attack was heavy and merciless.

When you hear the word "black" associated with a day, you may think of the hurried fun of holiday retail sales, but that’s not what I mean. Last month, I experienced Black Thursday, or Murphy Thursday if you prefer. It was my worst day at work — a day I’ll remember for a lifetime.

First, our ERP went down. It runs on our most precious asset: a recently virtualized SAP server that, despite all the challenges that come with virtualization, we had created and configured from scratch. After a year of testing and troubleshooting, we could finally say that the implementation was a success.

On Black Thursday, we got our first failure: the notorious snapshot removal step of the VM’s backup caused the server to lose its connection. The process took 30 minutes to finish, and it took another 30 minutes to start all the SAP services.

You know the feeling: everyone is breathing down your neck as you look for the solution. In my case, that meant waiting for the snapshot removal to finish and then rebooting the VM and its services. After that came breathing slowly and answering a few questions: why we were using a virtualized solution, who had sold it to us like an ice-cold soda in the desert, and, since such a simple VM backup process had failed, whether it was reliable at all. “Can we rely on something so sensitive?”

This is where I appealed to divine wisdom to respond in an appropriate, respectful manner while offering up an answer with meaning and substance, grounded in best practices.

It’s hard enough when your ERP, the heart of the company, goes down. But Murphy was far from done with me.

When it seemed that the storm had passed and had only cost us an hour under fire, we realized that the EDI server, our other major app server and our second most important server, was down, affecting hundreds of our customers. Yes, this was the same day.

And this wasn’t just for an hour. We spent eight hours attempting workarounds and troubleshooting on an old physical server. Then we decided to start from scratch and install EDI in the virtual environment.

After another 10 hours, we were finalizing the details of the new installation in the VM environment, the same environment that had brought us headaches, inconvenience and disappointment at the beginning of the day. Now it gave us the opportunity to quickly configure one of the most important servers in our company.

Lessons learned

  1. Never underestimate a process related to the production system, the heart of your company. It’s better to do this kind of work after hours or on the weekend, within a maintenance window.
  2. Never, ever underestimate the importance of taking daily backups and running drills to test the restores of key services.
  3. In times of disaster, it’s good to seek help from your peers; in the multitude of counselors there is wisdom.
  4. Our profession is sometimes very similar to that of a doctor, with the difference that doctors deal with people about to die and we deal with equipment about to die. Disasters sometimes result in more than 28 hours of work, with little time to take a breath.

That was my Black Thursday: two big server outages and almost 24 hours of work, helped along by plenty of black coffee, pizza, music (the Rocky soundtrack) and an IT peer with a good sense of humor. The worst day of work became one of the most interesting days of work, and it’s a day that I will remember for a lifetime.

--

Got a dark day of your own to share? Chime in in the comments below!

