Professional Geek

GEEKNOTE: Another Saturday Bites the Dust

GEEKNOTE:  I REALLY need to dust off my old Geeknotes regularly to remind myself of the mantra: "Check the easy stuff first."

I popped in Saturday morning to kick a couple of weekend diagnostics to the next steps on a customer machine on one of our tech benches and to do some preliminary testing on a planned hardware upgrade for one of our servers. 

As I was finishing those projects, I noticed that one of our mail servers was not responding. The drive on this particular mail server was beyond ancient, so I figured I'd image the files onto a new drive and call it a day. Nope. The server would become sluggish within a few seconds of startup and then crash and do a memory dump. Coming close to guessing the true cause of the problem, I replaced the network card. Still no joy, so I emailed a copy of one of the crash dumps to an uber-geek friend in California and went to get the mail.

When I got back from the post office, I noticed that my email hadn't gone out. I checked and, sure enough, a SECOND server was now going stupid and rebooting itself every few minutes. Hmmm. Multiple calls to California and hours later, both my geek friend and I were convinced that we had one or more severely corrupted mailbox files and/or something totally trashed in the DNS system on both machines.

I got the critical stuff from my backups of the two sick servers moved to a still functioning server, set that server to respond to the IP addresses used by the two sick machines and went home for supper, my brain totally fried from a full day of testing and emergency recovery work. After supper, I tried to do a little more rescue work on the functioning server and noticed that I couldn't touch the IP addresses I'd moved from the sick machines from home, even though I'd been able to do so from the office. Hmmm. At this point Carolyn suggested that things would be clearer in the morning.

Early Sunday morning, before dawn, I headed back in to the office to play a hunch about why I could touch the IPs from inside the office but not from home. If my hunch didn't work out, I planned to spend the whole morning migrating the rest of the non-essential stuff off the two sick servers. I removed the extra IP addresses from the running server and then power cycled our Roadrunner cable modem. I then powered up one of the two sick servers. It ran fine for several minutes, so I decided to try my luck and power up the second one. Sure enough, it came up happy as well.

I spent the balance of Sunday morning undoing the emergency changes I'd made Saturday.

Actual time to solve the REAL problem? Five minutes. Time I spent between Saturday and Sunday chasing the wrong problems? More like ten hours.

Given that we recently had the cable modem replaced because the old one went stupid from time to time, I'm not real impressed with the cable modems Brighthouse uses. The old modem would just go stupid and NOTHING would work. For the new one to kill routing to two of our IPs and leave all the other IP addresses, including the IP address between them, functioning is just plain strange.

In any event, I'm going to print out that saying "Check the easy stuff first" in big letters and frame it on the wall over my monitors. Even Geeks need to be reminded once in a while.
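
For readers who want a concrete version of "the easy stuff", here is a minimal Python sketch of a reachability check. It is an illustration only, not the procedure used above: the addresses are placeholders from the reserved documentation range, port 25 (SMTP) is an assumption based on these being mail servers, and the function names are made up for this example. To catch a routing problem like this one, it would have to be run from OUTSIDE the office network, since the servers still answered from inside.

```python
import socket

def is_reachable(host, port, timeout=3.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within timeout.

    `connect` defaults to socket.create_connection; it is a parameter
    only so the check can be exercised without touching the network.
    """
    try:
        conn = connect((host, port), timeout)
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False
    conn.close()
    return True

def unreachable_hosts(hosts, port, timeout=3.0, connect=socket.create_connection):
    """Return the subset of hosts that did not accept a connection."""
    return [h for h in hosts if not is_reachable(h, port, timeout, connect)]

# Hypothetical usage (placeholder addresses, assumed SMTP port):
#   unreachable_hosts(["192.0.2.10", "192.0.2.11", "192.0.2.12"], 25)
```

If only some of the addresses come back unreachable from outside while everything answers from inside, that points at the connection or the upstream router rather than at the servers themselves.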

Hopefully you had a more enjoyable weekend.

Feel free to drop me a note or give me a call if you have any questions about your computer.

Rob Marlowe, Senior Geek, Gulfcoast Networking, Inc.
http://www.gulfcoastnetworking.com

(Rob also serves as deputy mayor of the City of New Port Richey. Opinions expressed here are his own and do not necessarily represent the position of the city.)

Rob Marlowe

5:42 am on Monday, November 12, 2012

Arrgh... Monday am update... I'm at the office at 5:15am and just had Brighthouse technical support reset their router to fix the problem again.


Marilynn deChant

11:33 am on Monday, November 12, 2012

Oh my gosh...what a story. It was so nice to see you at the EcoMart on Sat. morning, Rob...but so sad to hear what happened...talk about spoiling a day...a weekend!

Reading what you had to do to fix the problem kinda hurts my head, but more power to ya Rob! The Geek wins the day after all! Yay!

Marilynn


Michael D.

11:38 am on Monday, November 12, 2012

Rob, we need to be reminded all the time. I spent all of Friday reviewing code trying to fix a system bug, only to notice that it was an issue with the input template: the correct information was going in, but not actually flowing through. After reviewing thousands of lines of code, I just put an asciioutput line at the beginning of the process to see what was being entered. It was because someone had shut down the services that feed the information. A right click and run later, the issue was fixed.


Rob Marlowe

1:02 pm on Monday, November 12, 2012

I think we've just nailed the actual problem... At this point, it appears the two servers were under a Denial of Service (DoS) attack. I believe we have identified the sources being used for the attack and we have them blocked.


Rob Marlowe

4:27 pm on Monday, November 12, 2012

The DoS attack is originating in California, and the Chinese are mounting a dictionary attack. It must be Monday.


Michael D.

4:55 pm on Monday, November 12, 2012

Rob,
Try being IT for a DoD contractor....

Rob Marlowe

7:36 am on Tuesday, November 13, 2012

Michael,
I bet!
Part of my testing to eliminate the DoS attack was to move the domain I suspect is being attacked to a new server that doesn't have ANYTHING running on it. It's been running for a week, but without any public duties. Apparently, just the fact that it is online was enough for the Chinese to find it. It only took them 11 hours from the point I put it online until it was under attack.


Rob Marlowe

8:16 am on Tuesday, November 13, 2012

Remind me again why anyone would expose a Windows machine to the Internet...

[I] Nov 5 13:13:06 IPAD Startup - Internet Protocol Adapter (IOA-IPAD 8.21y) build 6923
[W] Nov 5 22:39:31 [220.135.159.90:57054]POP3 Login: [root] - User not found

Later, the dictionary attack migrated from Taiwan to Mainland China.
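
Failed-login lines like the one quoted above can be tallied per source address to see where a dictionary attack is coming from. The following is a rough Python sketch, not a tool used in this thread: the pattern is modeled only on the single "[W] ... POP3 Login ... User not found" line shown here, so other message formats from the same mail server are an assumption, and the name failed_logins_by_ip is made up for this example.

```python
import re
from collections import Counter

# Pattern modeled on the one failed-login line quoted in the comment;
# any other log formats the server may produce are not covered.
FAILED_LOGIN = re.compile(
    r"^\[W\] (?P<when>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)\]"
    r"POP3 Login: \[(?P<user>[^\]]*)\] - User not found"
)

def failed_logins_by_ip(lines):
    """Count 'User not found' POP3 login attempts per source IP address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.match(line.strip())
        if match:
            counts[match.group("ip")] += 1
    return counts

# The two log lines from the comment above, used as sample input.
sample = [
    "[I] Nov 5 13:13:06 IPAD Startup - Internet Protocol Adapter (IOA-IPAD 8.21y) build 6923",
    "[W] Nov 5 22:39:31 [220.135.159.90:57054]POP3 Login: [root] - User not found",
]

# Print the noisiest source addresses first.
for ip, attempts in failed_logins_by_ip(sample).most_common():
    print(ip, attempts)
```

Addresses that show up with many attempts against names like "root" are the obvious candidates for blocking at the firewall.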


Michael D.

8:46 am on Tuesday, November 13, 2012

Why would you directly expose any machine?


Rob Marlowe

3:19 am on Wednesday, November 14, 2012

Internet servers have to be exposed, but they also need to be hardened against potential attacks.
