1/15/07 Database Reset: Http/1.1 Service Unavailable

Comments

  • Options
    AndyAndy Registered Users Posts: 50,016 Major grins
    edited January 15, 2007
    my concerns stated above are still valid and should be addressed.

    Nicholas
    We have acknowledged them, every time you've made them. (We even changed the © date after your last posting a while back. :D)

    Thanks again, Nick.
  • Options
    AndyAndy Registered Users Posts: 50,016 Major grins
    edited January 15, 2007
    devbobo wrote:
    you sure about that... I think Andy might beg to differ
    thumb.gif
  • Options
    nickphoto123nickphoto123 Registered Users Posts: 302 Major grins
    edited January 15, 2007
    Problem remains unsolved
    Andy wrote:
    We have acknowledged them, every time you've made them. (We even changed the © date after your last posting a while back. :D)

    Thanks again, Nick.



    The other issues I raised are even more important than the date issue.

    Your acknowledgment does not solve the problem.

    The real problem is that you have decided to cover the concerns of your larger customer base, the personal photographer (Non-Pro), without a thought for your Pro customers.
    The issues I raised as constructive suggestions were not even thought about until I raised them.

    What you have to do is determine how to solve these issues to the satisfaction of your Pro and Non-Pro customers.

    And so far you have not accomplished this solution.

    Thank you, Nicholas
  • Options
    devbobodevbobo Registered Users, Retired Mod Posts: 4,339 SmugMug Employee
    edited January 15, 2007
    The real problem is that you have decided to cover the concerns of your larger customer base, the personal photographer (Non-Pro), without a thought for your Pro customers.
    The issues I raised as constructive suggestions were not even thought about until I raised them.

    :bigbs

    come on Nick, you can't be serious. I seem to remember only a few months ago a whole load of stuff released for pros...backprinting, custom watermarks, etc.

    Grow up...
    David Parry
    SmugMug API Developer
    My Photos
  • Options
    W.W. WebsterW.W. Webster Registered Users Posts: 3,204 Major grins
    edited January 15, 2007
    my concerns stated above are still valid and should be addressed.
    From my experience, the frequency and typical period of SmugMug site outages seem to be well within industry norms, but I don't understand the implications from the point of view of a working pro.

    Out of curiosity, based on today's event, would you be willing to share with us:

    a) your average number of sales multiplied by the average net sale proceeds, calculated for the relevant day and time of day (i.e. the estimated total gross profit from sales, not the gross sales value), that you might have missed during the outage - i.e. your direct financial 'cost', assuming the sales are completely lost

    b) the average number of customer visits to your site, also calculated for the relevant day and time of day, that you might have missed during the outage - i.e. the impact for your customer goodwill

    c) your site URL.

    Thanks
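A rough way to put numbers on items (a) and (b) above is sketched below in Python. Every figure in the example call is made up for illustration, and the function and parameter names are hypothetical; nobody in this thread supplied real sales or traffic data.

```python
def estimated_lost_profit(outage_hours: float,
                          sales_per_hour: float,
                          net_proceeds_per_sale: float) -> float:
    """Item (a): direct cost, assuming every sale during the outage is lost for good."""
    return outage_hours * sales_per_hour * net_proceeds_per_sale


def estimated_missed_visits(outage_hours: float, visits_per_hour: float) -> float:
    """Item (b): visits that hit the outage page instead of the galleries."""
    return outage_hours * visits_per_hour


if __name__ == "__main__":
    # Hypothetical inputs: a 2-hour outage, one sale every two hours,
    # 12.00 net per sale, 30 visits per hour at that time of day.
    print(estimated_lost_profit(2.0, 0.5, 12.0))
    print(estimated_missed_visits(2.0, 30.0))
```

The rates should be the averages for the relevant day and time of day, as the post asks, not a whole-year average; an evening outage and a 4 a.m. outage of the same length can cost very different amounts.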
  • Options
    ZanottiZanotti Registered Users Posts: 1,411 Major grins
    edited January 15, 2007
    Zanotti wrote:
    Wait, it's not working from Florida, and Andy's not working and IN Florida... coincidence? I think not!

    It is the Andy effect. Andy, are you back in NYC and now it's working?
    It is the purpose of life that each of us strives to become actually what he is potentially. We should be obsessed with stretching towards that goal through the world we inhabit.
  • Options
    AndyAndy Registered Users Posts: 50,016 Major grins
    edited January 15, 2007

    And so far you have not accomplished this solution.

    Thank you, Nicholas
    Hi Nick,
    I'm sorry you are unhappy with our service. I wish I could do more for you. We have put a tremendous amount of time, energy, resources, people, and cash into our pro business. I deal with our pros on a daily basis; these are pros whose livelihoods depend on SmugMug for our service, our site, and our people. We take everyone's input very seriously, and we apply our development resources and energy to all facets of our business.

    I hope that we can satisfy you, too. Apologies again for the few minutes of downtime tonight.
  • Options
    devbobodevbobo Registered Users, Retired Mod Posts: 4,339 SmugMug Employee
    edited January 15, 2007
    Zanotti wrote:
    It is the Andy effect. Andy, are you back in NYC and now it's working?

    No he's still in FL :D
    David Parry
    SmugMug API Developer
    My Photos
  • Options
    nickphoto123nickphoto123 Registered Users Posts: 302 Major grins
    edited January 15, 2007
    I see the problem...
    Andy wrote:
    Hi Nick,
    I'm sorry you are unhappy with our service. I wish I could do more for you. We have put a tremendous amount of time, energy, resources, people, and cash into our pro business. I deal with our pros on a daily basis; these are pros whose livelihoods depend on SmugMug for our service, our site, and our people. We take everyone's input very seriously, and we apply our development resources and energy to all facets of our business.

    I hope that we can satisfy you, too. Apologies again for the few minutes of downtime tonight.

    I mentioned nothing about your downtime.

    It is your posted notice when you are down that is a problem.

    If you want to satisfy me, then change your outage notice and don't mention the words 'Lost data' or 'Data loss'.
    Make reference to the site the visitor just typed into their browser.
    Mention a request to re-visit soon or in the near future.

    Put some of your tremendous energy into the problem of this discussion, your outage notice.

    Thank you, Nicholas
  • Options
    AndyAndy Registered Users Posts: 50,016 Major grins
    edited January 15, 2007
    I mentioned nothing about your downtime.

    It is your posted notice when you are down that is a problem.

    If you want to satisfy me, then change your outage notice and don't mention the words 'Lost data' or 'Data loss'.
    Make reference to the site the visitor just typed into their browser.
    Mention a request to re-visit soon or in the near future.

    Put some of your tremendous energy into the problem of this discussion, your outage notice.

    Thank you, Nicholas
    Hi Nick, thanks again for posting. I've made sure that our product manager has seen this. We appreciate it.
  • Options
    WirelessWireless Registered Users Posts: 162 Major grins
    edited January 15, 2007
    Our database is cranky today.

    Unfortunately, our load balancer is showing its generic message right now. We'll pop up the other shortly. Sorry for the additional outage.
  • Options
    rainforest1155rainforest1155 Registered Users Posts: 4,566 Major grins
    edited January 15, 2007
    Just a couple of minutes ago it was working, now it's down again with the message:
    Http/1.1 Service Unavailable

    Sebastian
    Sebastian
    SmugMug Support Hero
  • Options
    nickphoto123nickphoto123 Registered Users Posts: 302 Major grins
    edited January 15, 2007
    Thanks Andy..
    Andy wrote:
    Hi Nick, thanks again for posting. I've made sure that our product manager has seen this. We appreciate it.

    Here's to an ever improving Smugmug...Cheers!!!!!!!!!

    Nicholas
  • Options
    dogwooddogwood Registered Users Posts: 2,572 Major grins
    edited January 15, 2007
    Wireless wrote:
    Our database is cranky today.

    Unfortunately, our load balancer is showing its generic message right now. We'll pop up the other shortly. Sorry for the additional outage.
    Ouch-- right in the middle of an upload. When I get cranky, sometimes it helps to eat something... wait, never mind, we don't want the database eating anything!

    Just crossing my fingers that the downtime(s) today are due to the latest updates being implemented soon?

    Holy smokes-- you folks did listen to my suggestion for the downtime message! Wow! Thank you!

    Portland, Oregon Photographer Pete Springer
    website blog instagram facebook g+

  • Options
    SeymoreSeymore Banned Posts: 1,539 Major grins
    edited January 15, 2007
    Wireless wrote:
    Our database is cranky today.

    Unfortunately, our load balancer is showing its generic message right now. We'll pop up the other shortly. Sorry for the additional outage.
    :nono
  • Options
    Steve CaviglianoSteve Cavigliano Super Moderators Posts: 3,599 moderator
    edited January 15, 2007
    Just a couple of minutes ago it was working, now it's down again with the message:
    Http/1.1 Service Unavailable

    Sebastian

    Yes, we are experiencing problems at this time. Our engineers are working feverishly to restore service.

    Sorry :cry

    Steve
    SmugMug Support Hero
  • Options
    PezpixPezpix Registered Users Posts: 391 Major grins
    edited January 15, 2007
    Would it help if I offered a big cookie to the database gremlin? :D

    Professional Ancient Smugmug Shutter Geek
    Master Of Sushi Noms
    Amateur CSS Dork
  • Options
    RichSRichS Registered Users Posts: 32 Big grins
    edited January 15, 2007
    dogwood wrote:
    I agree. Should start with something like:

    "The site you are trying to access is hosted by Smugmug. We're having some temporary technical difficulties..."

    So I'm reading through this thread, seeing what's going on, and I hop back over to refresh the other tab to see if it's available yet. And I get " The site you are trying to access is hosted by SmugMug."

    Not bad - 2.5 hours to implement customer suggestions.

    I do wish the site was back up though.
  • Options
    ppugappuga Registered Users Posts: 100 Major grins
    edited January 15, 2007
    RichS wrote:
    So I'm reading through this thread, seeing what's going on, and I hop back over to refresh the other tab to see if it's available yet. And I get " The site you are trying to access is hosted by SmugMug."

    Not bad - 2.5 hours to implement customer suggestions.

    I do wish the site was back up though.

    Agreed, that's a fast implementation of customer suggestions. But please, get us our sites back. Good luck guys! You can do it! :devbobo
  • Options
    WirelessWireless Registered Users Posts: 162 Major grins
    edited January 15, 2007
    We're turning things back up now. Sorry for the extra wait, we were gathering some additional troubleshooting data while things were offline.
  • Options
    onethumbonethumb Administrators Posts: 1,269 Major grins
    edited January 15, 2007
    Wireless wrote:
    We're turning things back up now. Sorry for the extra wait, we were gathering some additional troubleshooting data while things were offline.

    The site is back up, but we don't know for sure if it'll stay up. Sorry. :(

    We're getting a strange error message on one of our core database machines. It feels, to me, like a hardware problem, but we just can't be sure yet.

    We do have spare hardware standing by, but switching is time-consuming, so we're doing a little experimentation first. Hopefully we can solve it without resorting to new hardware.

    Thanks for your patience.

    Don
  • Options
    JeffroJeffro Registered Users Posts: 1,941 Major grins
    edited January 15, 2007
    onethumb wrote:
    The site is back up, but we don't know for sure if it'll stay up. Sorry. :(

    We're getting a strange error message on one of our core database machines. It feels, to me, like a hardware problem, but we just can't be sure yet.

    We do have spare hardware standing by, but switching is time-consuming, so we're doing a little experimentation first. Hopefully we can solve it without resorting to new hardware.

    Thanks for your patience.

    Don

    Keep up the good work, but I'm putting you on the clock... my first motocross race is April 8th! I'll need to upload some shots later that day.
    Always lurking, sometimes participating. :D
  • Options
    photodougphotodoug Registered Users Posts: 870 Major grins
    edited January 15, 2007
    down again

    http/1.1 service unavailable
  • Options
    onethumbonethumb Administrators Posts: 1,269 Major grins
    edited January 15, 2007
    onethumb wrote:
    The site is back up, but we don't know for sure if it'll stay up. Sorry. :(

    We're getting a strange error message on one of our core database machines. It feels, to me, like a hardware problem, but we just can't be sure yet.

    We do have spare hardware standing by, but switching is time-consuming, so we're doing a little experimentation first. Hopefully we can solve it without resorting to new hardware.

    Thanks for your patience.

    Don

    We're down again, but got some useable data. Definitely looks like a hardware failure. Andrew's on his way to the datacenter now to physically take a look at what's going on.

    We have a couple of avenues to take, so we'll start taking them one at a time. I wouldn't be surprised if there are some more hiccups along the way.

    We'll keep you posted here.

    Don
  • Options
    thegrepperthegrepper Registered Users Posts: 25 Big grins
    edited January 15, 2007
    Disappointing...
    Dear Smugmug:

    I recently enabled and configured my pro account and have made good progress until today, which has been an exercise in frustration. My customer is waiting to view their wedding photos and I'm frankly hesitant to enable the link given the instability.

    -Are these outages common?
    -What is SMUGMUG's SLA?
    -Do you provide availability metrics to your customers?
    -If the problem is a core server, why is there no standby?
    -What components are not redundant?

    I understand the infrastructure challenge you face and know you will resolve this problem but I would like to better understand my exposure so I can address the concerns of my customers.
  • Options
    MannyManny Registered Users Posts: 148 Major grins
    edited January 16, 2007
    thegrepper wrote:
    Dear Smugmug:

    I recently enabled and configured my pro account and have made good progress until today, which has been an exercise in frustration. My customer is waiting to view their wedding photos and I'm frankly hesitant to enable the link given the instability.

    -Are these outages common?
    -What is SMUGMUG's SLA?
    -Do you provide availability metrics to your customers?
    -If the problem is a core server, why is there no standby?
    -What components are not redundant?

    I understand the infrastructure challenge you face and know you will resolve this problem but I would like to better understand my exposure so I can address the concerns of my customers.

    I think that you already answered your own questions :-)

    If you are a Pro, you should have already mailed a DVD to your customer pronto. Nothing ever works perfectly all the time, so you must have options you can control yourself. I think your best backup is your own site and your own DVDs to send out via snail mail if needed.

    Seriously, pop some small pics onto a free Flickr account or whatever else right away and get your customer looking at the images. Worry about Smug coming back later.

    Cheers

    MG
  • Options
    SystemSystem Registered Users Posts: 8,186 moderator
    edited January 16, 2007
    Here's hoping this works itself out as fast as possible. Just when I finally get the cast from my opening Big Event Shoot to settle down from performing and actually *look* at the pictures...

    and the damn server crashes. :-}

    What the previous poster said... I'm sure you're a bit busy just now... but when you're not, those might be useful questions to address.
  • Options
    onethumbonethumb Administrators Posts: 1,269 Major grins
    edited January 16, 2007
    thegrepper wrote:
    Dear Smugmug:

    I recently enabled and configured my pro account and have made good progress until today, which has been an exercise in frustration. My customer is waiting to view their wedding photos and I'm frankly hesitant to enable the link given the instability.

    -Are these outages common?
    -What is SMUGMUG's SLA?
    -Do you provide availability metrics to your customers?
    -If the problem is a core server, why is there no standby?
    -What components are not redundant?

    I understand the infrastructure challenge you face and know you will resolve this problem but I would like to better understand my exposure so I can address the concerns of my customers.

    Hi there,

    Hopefully some of our long-term customers will weigh in here with an unbiased opinion, but the view from the inside looks something like this:

    - These outages ARE NOT common. In terms of non-scheduled downtime, we aim to have less than one hour per year (roughly 99.99% uptime). Last year (2006) we didn't make it because of one prolonged outage due to a distributed denial of service attack, but excluding that one, we were well within range. Previous years we made it with plenty of room.

    - We don't provide an SLA. You're the first one to ask for one. I'm not opposed to implementing one, but we'd need to think about how to do it. Some estimate of the time down, multiplied by your average sales dollars per second might do the trick, but we'll have to talk about that. I'm not sure our service is really high-dollar enough to warrant an SLA, but maybe I'm wrong?

    - We don't provide availability metrics, and again, you're the first to ask. We'd certainly do so if we had an SLA in place.

    - We have multiple pieces of hardware standing by, but I'm not about to replace a piece of hardware without first verifying that it's the root cause. If it wasn't, I just wasted an hour or two doing nothing useful. We're still diagnosing the problem, but we're getting closer.

    - At this point, I believe the only non-redundant components are core networking switches. (This is a common component to not be redundant - every major internet brand suffers from outages due to network switch failures) All of our routers, servers, and storage are redundant. Many of them are automatic fail-over, too, so you'll never notice an outage. With critical data, though, it's essential to have manual failover so we don't introduce data corruption due to split-brain or time-based latency.

    When you compare our uptime to any major brand, including Google, eBay, Hotmail, or Amazon, you'll find that we're comparable. No one meets 99.999% uptime, and most don't even come close to 99.99%.
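For readers who want to see what these percentages mean in practice, here is a minimal Python sketch that converts an uptime percentage into a yearly downtime allowance and back. It assumes a 365-day year and counts every minute equally; the function names are hypothetical and this is not how SmugMug measures availability, just the plain arithmetic behind the figures quoted above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_budget_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per year that a given uptime percentage allows."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


def uptime_percent_for(downtime_minutes: float) -> float:
    """Uptime percentage implied by a given number of downtime minutes per year."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        budget = downtime_budget_minutes(target)
        print(f"{target}% uptime allows about {budget:.1f} minutes of downtime per year")
    print(f"60 minutes of downtime per year is about {uptime_percent_for(60):.3f}% uptime")
```

By this arithmetic, 99.99% works out to a little under an hour of downtime per year, and 99.999% to only a few minutes, which is why the latter is so rarely met.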

    We're doing the best we can, of course, and will continue to think about how we can improve.

    Thanks for the questions, hopefully some third party validation will happen with our customers, but it's possible my world view differs from theirs. :)

    Don
  • Options
    thegrepperthegrepper Registered Users Posts: 25 Big grins
    edited January 16, 2007
    Manny wrote:
    I think that you already answered your own questions :-)

    If you are a Pro, you should have already mailed a DVD to your customer pronto. Nothing ever works perfectly all the time, so you must have options you can control yourself. I think your best backup is your own site and your own DVDs to send out via snail mail if needed.

    Seriously, pop some small pics onto a free Flickr account or whatever else right away and get your customer looking at the images. Worry about Smug coming back later.

    Cheers

    MG

    I can surely implement one of your suggestions but I moved to SMUGMUG to simplify workflow and improve the customer experience. My question was less about this outage and more about next time. I'd like to understand if this is once a year or once a month. Hopefully, my previous questions can be addressed once the incident is resolved.
  • Options
    DavidTODavidTO Registered Users, Retired Mod Posts: 19,160 Major grins
    edited January 16, 2007
    thegrepper wrote:
    I can surely implement one of your suggestions but I moved to SMUGMUG to simplify workflow and improve the customer experience. My question was less about this outage and more about next time. I'd like to understand if this is once a year or once a month. Hopefully, my previous questions can be addressed once the incident is resolved.


    Infrequent, and normally resolved in minutes, as far as I can see. I don't clock it, but outages are very rare and quickly recovered.
    Moderator Emeritus
    Dgrin FAQ | Me | Workshops