Victory! The evil outage monster has been slain!


onethumb
Oct-11-2005, 12:56 PM
I'm going to tentatively declare victory.

That was pretty sucky. I've been feverishly working non-stop trying to stop the site from being dog slow since early this morning. 8 hours later, I think the fire-breathing outage monster is dead for good.

For those who like the gory details, here's what's up:

One of our database servers was getting old and crusty. We bought it more than two years ago, and it lasted far longer than I could have imagined. It was one of the very first AMD Opteron servers sold on the market and contained two Opteron 240 CPUs and 8GB of RAM.

The beast we got to replace it is a 4-way dual-core (8 CPUs!) Opteron 875 monster with 32GB of RAM. It's attached to faster disks, too.

On paper, that means at *least* a 4X speed increase, and very probably more, right? Well, yes, unless you (accidentally!) misconfigure the dang thing.

We've been running it in the cluster for a few weeks and it's been screaming along. After weeks of you and me using it, it seemed like all the kinks had been worked out. So during our standard maintenance window this morning, I swapped the two servers and watched it merrily speed the site up. A lot. Off I went to bed, dreaming of how happy our customers would be to find their beloved photos cruising across the net at blazing speeds.

Hours later, awakened by my awesome customer support reps, I discovered something was wrong. Horribly wrong! The shiny new beast was crawling along far slower than the crusty old box it had replaced.

I spent the next 8 hours wrangling with the server, the operating system, the database software, and everything else in between. I called up the reserves in the form of our Enterprise-level support subscriptions for all the various pieces of software we run. Top engineers all over the world were pulling their hair out alongside me and, no doubt, my poor customers.

I managed to get the site behaving well enough to serve a painfully slow version for a few hours while we worked to figure out what had gone wrong.

Long story short, even the authors of said software were stumped. Finally, I stumbled across (after 8 hours of trial and error) the solution (it was a single line of text in a configuration file).

We're back, baby!

I'm so sorry we had such a lousy performance record this morning. I know how painful it was for you, your friends, and your customers.

Thanks for being so patient. You truly are the best customers in the world.

Silver lining moment: The site should be considerably faster than it's ever been, and will continue to improve.

Don

luke_church
Oct-11-2005, 01:04 PM
I'm going to tentatively declare victory.
Good going. I don't envy you; I've been there in the past... Has the adrenaline gone yet? Well done for getting there, anyhow.

You can sleep again now :):

Luke

PS. The outage warning message was very helpful. Good marks for communications there from me. Other people's mileage may vary.

Gator
Oct-11-2005, 01:07 PM
Thanks so much for all the hard work! You are appreciated very much!!

lynnma
Oct-11-2005, 01:12 PM
Thanks, dearie... I knew it was slow, but I thought it was me... :1drink I'd much rather it be you being slow than me :rofl Awesome service and support :thumb

flyingdutchie
Oct-11-2005, 01:21 PM
Good work, Don.

I'm a software engineer/architect myself.
I remember, years ago, I was assigned the task of finding a resource leak in the OS/2 version of our product. It took me 6 weeks and a trip to the USA (I lived in the Netherlands then) to fix the damn thing.
The fix was adding 2 lines of code! No more!
In Holland we have the proverb: "An accident sits in a little corner". A very little one indeed :thumb
-- Anton.

Cindy
Oct-11-2005, 01:23 PM
YIPPIE!!! Thanks bunches & bunches! I'm sooooooooo glad we're back.
Time to call the school so the superintendent can see it now (the timing wasn't so great for being down, but you all are fantastic and forgiven :)
It scared me this morning, thinking I'd done something to mess it up (an advance warning of maintenance & possible problems would have been great... maybe next time - please - thank you).

Thanks,
Cindy

ginger_55
Oct-11-2005, 01:36 PM
Gosh, and I am busy right now, can't do my photos, gotta watch the d movie before it burns up.

Yeah, I got 2 GB of RAM installed yesterday, up from 700 MB, and everything slowed down on smugmug. I was ignoring the fact that I was getting slower stuff after spending so much on memory. SO GLAD it was YOUR fault, not mine. Smile.

thanks,
ginger

Ric Grupe
Oct-11-2005, 01:52 PM
Finally, I stumbled across (after 8 hours of trial and error) the solution (it was a single line of text in a configuration file).
Don
:whip :whip :whip


Thanks, Don. :super

jfriend
Oct-11-2005, 01:56 PM
I'm going to tentatively declare victory.
I know from building a carrier/large enterprise-class online service that the hardest thing in software engineering and network operations is to be able to test things at scale in the lab before going online with real traffic.

We ended up devoting nearly 50% of the ongoing engineering resources for the service side of our business to being able to test things at scale before customers saw them. On the one hand, it really slowed down our ability to develop new features for the service; on the other hand, it's what our customers wanted us to do, and it's paid off for us. It sounds like you're trying to do the right thing, but it's definitely hard. Good luck.

--John

Techman1
Oct-11-2005, 02:00 PM
Don,

Thanks to you and the Smugmug Team for all the hard work getting this back up and running again. It is much faster than earlier today and seems to be running as it should.

Thanks again! :clap

Fred

Barb
Oct-11-2005, 03:12 PM
Super support from a super site. Appreciate your hard work :)

ppuga
Oct-11-2005, 04:55 PM
Hello guys!

Well, my site has looked wrong since the Maintenance Window disappeared :dunno
Everything is pushed to the left with no order. If you click on any gallery, it appears the same way: the photos are on the left, one on top of another, etc.

:cry

PLEASE HELP!

Check it out:

jfriend
Oct-11-2005, 05:02 PM
Hello guys!

Well, my site has looked wrong since the Maintenance Window disappeared :dunno
Everything is pushed to the left with no order. If you click on any gallery, it appears the same way: the photos are on the left, one on top of another, etc.

:cry

PLEASE HELP!

Check it out:
It looks OK to me:

http://jfriend.smugmug.com/photos/39651057-O.jpg

Mike Lane
Oct-11-2005, 05:04 PM
It looks OK to me:

http://jfriend.smugmug.com/photos/39651057-M.jpg


:agree

flyingdutchie
Oct-11-2005, 05:06 PM
Hello guys!

Well, my site has looked wrong since the Maintenance Window disappeared :dunno
Everything is pushed to the left with no order. If you click on any gallery, it appears the same way: the photos are on the left, one on top of another, etc.

:cry

PLEASE HELP!

Check it out:
I tried your site on Mozilla, Mozilla Firefox, IE 6.0, Opera 8, and Netscape 7 and up: your site looks fine. It seems to be a problem only with your browser.
The image I see in your post: is that Safari 1.2 or IE5 for Mac? If it is IE5 for Mac, forget about that browser... Even IE5 on Windows is no longer (officially) supported by Smugmug.
-- Anton.

ppuga
Oct-11-2005, 05:15 PM
:clap :clap :clap

Now my site is OK!

I'm using Safari 1.3.1 on my office computer, and on my laptop I have 2.0.1. A few minutes ago my site looked like it did in my first post on both, but now it's OK!

:thumb

Thanks for your answers!

asd
Oct-11-2005, 05:45 PM
I missed the outage, but I'm loving how lightning fast my site is now - thanks for the speedup!! :bow

Mac Write
Oct-11-2005, 07:50 PM
I concur; I had the exact same problem. I dumped my cache and now it's fine.

gus
Oct-11-2005, 09:38 PM
And here I was thinking you blokes just used a headless Fedora 4 box...

luke_church
Oct-11-2005, 10:46 PM
Hello guys!

Well, my site has looked wrong since the Maintenance Window disappeared :dunno

Check it out:
This happened to me immediately after the end of maintenance. In Internet Explorer, Ctrl+R forces a reload. I did that, suspecting a daft problem, and it went away.

Try refresh, if that doesn't work, purge your temporary cache and try again. It'll probably go away.

Cheers,

Luke

gus
Oct-12-2005, 12:00 AM
Actually, I've noticed it's a lot faster over here in the never-never too. :thumb

Andy
Oct-12-2005, 02:31 AM
Actually, I've noticed it's a lot faster over here in the never-never too. :thumb

gus, isn't it amazing how fast we can make those hamsters run??

gus
Oct-12-2005, 02:35 AM
gus, isn't it amazing how fast we can make those hamsters run??
I don't know how you do it with such small hamsters... let me know when you're ready & I will send some real ones over (http://astron.berkeley.edu/wombat/wombat2.jpg).

Andy
Oct-12-2005, 03:14 AM
I don't know how you do it with such small hamsters... let me know when you're ready & I will send some real ones over (http://astron.berkeley.edu/wombat/wombat2.jpg).


y'see, gus - even you can help onethumb in building a yet faster smugmug :lol3

kwalsh
Oct-12-2005, 07:51 AM
I concur that the site is now wicked fast. Keep up the good work!

Ken

costantinidis
Oct-12-2005, 08:02 AM
Hello guys!

Well, my site has looked wrong since the Maintenance Window disappeared :dunno
Everything is pushed to the left with no order. If you click on any gallery, it appears the same way: the photos are on the left, one on top of another, etc.
I had the same problem. I think my browser had cached some corrupt style sheets or something. Restarting the browser made everything appear just fine.

RichS
Oct-12-2005, 08:55 AM
Everything had cleared up late yesterday, and things were very fast.

Now I'm back with the same problem: small thumbnails for each forum lining up on the left one-eighth of the page.

I've tried multiple browsers, cleared the caches, rebooted, etc.

richs.smugmug.com

Andy
Oct-12-2005, 09:15 AM
Everything had cleared up late yesterday, and things were very fast.

Now I'm back with the same problem: small thumbnails for each forum lining up on the left one-eighth of the page.

I've tried multiple browsers, cleared the caches, rebooted, etc.

richs.smugmug.com

Hi there, Rich. Please email this to help@smugmug.com. Thanks very much, and sorry for your troubles... very strange, as your site is showing up normally for me (Safari, Firefox).

RichS
Oct-12-2005, 09:39 AM
Hmmm - I didn't change anything and now it's back to normal.

As a sometimes-software testing engineer, I hate non-reproducible problems that resolve without a known fix.

As a user, I'm happy again....:):

jamescalder
Oct-12-2005, 09:59 AM
Hmmm - I didn't change anything and now it's back to normal.

As a sometimes-software testing engineer, I hate non-reproducible problems that resolve without a known fix.

As a user, I'm happy again....:):
Did any IP addresses change as a result of the work done in the past few days? If so, that might explain the occasional regurgitation of the dud links, due to a DNS server somewhere not updating... or possibly a proxy server problem if you're on a LAN? Of course, I don't know what kind of clustering may be happening on the SM servers, so that's a whole other level of interference that could explain it.

Anyone who really understands these things wanna back me up here... or, alternatively, expose me for the pseudo-techie I really am and shoot my theory down in nice, toasty, orange flames?

:hide

j

galla47
Oct-12-2005, 10:03 AM
I just emailed help about this (I was having the same problem).

The trick they gave me was to hold Shift and then press Reload. This fixes a corrupt stylesheet.

It worked for me!!!

PS... Just noticed this is my 15th post, and Geico wants to remind you that a 15-minute call can save you a bunch of money on your car insurance.

tmanchester
Oct-13-2005, 08:41 PM
Down again.

I see an awful lot of outages at smugmug for a paid site. At my current rate, I will pay over 4 figures in US dollars in commissions over the next year.

I would expect more from a varsity letterman.

Cindy
Oct-13-2005, 08:45 PM
Down again.

I see an awful lot of outages at smugmug for a paid site. At my current rate, I will pay over 4 figures in US dollars in commissions over the next year.

I would expect more from a varsity letterman.
Mine was just down briefly also :dunno ... about 5 minutes.

Back again now though :):

Cindy