Webmaster Tools: Increase in server errors

Comments

  • RobLoud Registered Users Posts: 45 Big grins
    edited August 10, 2014
    chipj wrote: »
    Yes, I'm seeing the same. In fact, this time I have more than twice the number of reported errors as in the June 27th incident. They all appear to be 504 errors.

    I've got the same. Mine shot from zero to nearly 1,000 in just one day (8th August).
  • ian408 Administrators Posts: 21,905 moderator
    edited August 10, 2014
    chipj wrote: »
    Yes, I'm seeing the same. In fact, this time I have more than twice the number of reported errors as in the June 27th incident. They all appear to be 504 errors.

    I have seen a couple of 522s, although most are 504s.
    wikipedia wrote:
    522 Connection timed out (Cloudflare)
    This status code is not specified in any RFCs, but is used by Cloudflare's reverse proxies to signal that a server connection timed out.
    Moderator Journeys/Sports/Big Picture :: Need some help with dgrin?
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited August 11, 2014
    As Shandrew already hinted at in this post, Google Webmaster Tools is designed for small websites where bot traffic and regular traffic are handled identically. SmugMug is not a small website. Because Google has more servers than we do, we restrict the amount of bot traffic and prioritize real customer traffic over bots. So in cases where Google sends us too many requests at once, we may have to turn some of their requests down in favor of keeping your (and all other) SmugMug sites available to visitors without interruption.
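
    To illustrate the prioritization idea (a minimal sketch only, not SmugMug's actual code):

    ```python
    # Illustrative sketch only -- not SmugMug's actual code. The idea described
    # above: real visitors are always served first; crawler requests only get
    # whatever capacity is left over, and are turned away (which is what shows
    # up as errors in Webmaster Tools) when there is none.
    import queue

    human_requests = queue.Queue()  # requests from real visitors
    bot_requests = queue.Queue()    # requests from crawlers like Googlebot

    def next_request():
        """Humans first; bots only when no visitor is waiting."""
        try:
            return human_requests.get_nowait()
        except queue.Empty:
            try:
                return bot_requests.get_nowait()
            except queue.Empty:
                return None  # nothing to do
    ```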

    That means that if you get such emails from Google Webmaster Tools, the issues they report are something you can likely fix within Webmaster Tools: go to Crawl > Crawl Errors > Select All > press the 'Mark As Fixed' button. Then go to Crawl > Fetch as Google > press 'Fetch and Render' and, once it's done, press 'Submit to index'.

    Given the way the Google crawler and Google Webmaster Tools currently work, it's nothing out of the ordinary to get these kinds of errors on occasion.
    Sebastian
    SmugMug Support Hero
  • chipj Registered Users Posts: 149 Major grins
    edited August 11, 2014
    rainforest1155,

    There are a couple of inaccuracies in your reply. First of all, Google Webmaster Tools (GWT) is not only meant for small businesses. It is used by very large corporations that have many websites. In fact, I'm currently working for a Fortune 500 company and we utilize GWT extensively to uncover crawling issues. Some of these sites have a great deal of traffic that would likely rival SmugMug's. I have yet to see a 504 error reported in GWT.

    Secondly, the "Mark As Fixed" button in GWT does not "fix" the issue, it only removes the mention of the issue from the reporting. To actually fix the error you'll need to access the host server and make the correct fix there.

    A 504 error, or any 5xx error, needs to be fixed by the SmugMug team on their servers. This is a SmugMug issue that individuals will not be able to resolve on their own.
    rainforest1155 wrote: »
    As Shandrew already hinted at in this post, Google Webmaster Tools is designed for small websites where bot traffic and regular traffic are handled identically. [...]
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited August 11, 2014
    chipj wrote: »
    First of all, Google Webmaster Tools (GWT) is not only meant for small businesses. It is used by very large corporations that have many websites.
    The bulk of my previous reply comes from details that I received from our Ops team. While it may work for your company, in general, GWT is not suited for use with SmugMug, and it's possible that you'll occasionally get error reports from Google.
    chipj wrote: »
    Secondly, the "Mark As Fixed" button in GWT does not "fix" the issue, it only removes the mention of the issue from the reporting. To actually fix the error you'll need to access the host server and make the correct fix there.
    Mark as fixed will tell Google to retry later, which is all that needs to be done. As mentioned before, we may not be able to handle all requests that Google throws at our servers at one time and that may result in Google reporting an error to you. If you tell Google to retry, it should work the next time.
    Sebastian
    SmugMug Support Hero
  • Darter02 Registered Users Posts: 947 Major grins
    edited August 11, 2014
    rainforest1155 wrote: »
    Mark as fixed will tell Google to retry later, which is all that needs to be done. As mentioned before, we may not be able to handle all requests that Google throws at our servers at one time and that may result in Google reporting an error to you. If you tell Google to retry, it should work the next time.

    I did just that during the last round of errors. Now I'm getting almost exactly the same errors on keyword pages, so that didn't work for me.
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited August 11, 2014
    Darter02 wrote: »
    I did just that during the last round of errors. Now I'm getting almost exactly the same errors on keyword pages, so that didn't work for me.
    What specific errors are you seeing? Can you include a few examples?
    Sebastian
    SmugMug Support Hero
  • Darter02 Registered Users Posts: 947 Major grins
    edited August 11, 2014
    Holy smokes, I'm now at almost 8,000 errors. They are on my main categories, my HOMEPAGE, and my main business page, as well as my keywords.

    [Screenshots: Webmaster Tools crawl error listings]

    That first page, the category Pennsic, is what drives a lot of traffic to my site. Without that, I'm dead in the water. The most concerning is the third one, WedServ. It's my wedding business page!! The second error seems to indicate Google couldn't find my homepage. This is really not acceptable. Please ask the techs to address this. If Google can't find my main pages, I'm sunk.
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited August 11, 2014
    Darter02 wrote: »
    Holy smokes, I'm now at almost 8,000 errors. They are on my main categories, my HOMEPAGE, and my main business page, as well as my keywords.

    That first page, the category Pennsic, is what drives a lot of traffic to my site. Without that, I'm dead in the water. The most concerning is the third one, WedServ. It's my wedding business page!! The second error seems to indicate Google couldn't find my homepage. This is really not acceptable. Please ask the techs to address this. If Google can't find my main pages, I'm sunk.
    First of all, these errors do not mean that Google has removed any of the pages from their index. It only means that on the latest attempt, Google wasn't able to pull up the page to look for changes.
    Your site is still indexed by Google as you can see when searching for "site:" followed by your domain name. Among the first results, you should see Pennsic and WedServ.
    Google also tells me that it last indexed your site 4 days ago, which would be Aug 8. So their index is pretty current.

    Secondly, the screenshots show a detected date of Aug 9. Are you sure that you marked all the errors as fixed and also did the second part of what I suggested initially:
    "Then go to Crawl > Fetch as Google > press 'Fetch and Render' and, once it's done, press 'Submit to index'."
    If the errors still came up again, that likely means that Google is currently still sending us way too many requests and we have to turn them down for the time being to ensure that your site is fast and fully available for your visitors.
    Over time, all Google requests will get handled, but it may take some time.

    Right now, there's nothing to be done about the Google errors. Again, while Google bombards us with requests, we only turn them down to keep the site fast for actual humans wanting to view yours and other SmugMug sites.

    If we didn't do that, all of SmugMug might go down, as Google has way more server power than we could possibly handle.
    Sebastian
    SmugMug Support Hero
  • AperturePlus Registered Users Posts: 374 Major grins
    edited August 11, 2014
    rainforest1155 wrote: »
    ...in general, GWT is not suited for use with SmugMug...

    I need to ask you something, Sebastian... if GWT is not suited to track statistics on SmugMug, what is? The built-in stats are absolutely useless to us, so how do we know what the traffic is?
  • thenickdude Registered Users Posts: 1,302 Major grins
    edited August 11, 2014
    It seems like the issue is SmugMug sending a 504 response code to Google when they're sending too many requests, which causes it to squawk in their logs. Assuming that SM intentionally sends a 504 to Google, and it's not just due to the server being overloaded, how about sending a 503 Service Unavailable code instead? Google's Webmaster blog even recommends it.

    EDIT: Although judging from the comments, it seems that 503s can also appear in the log as errors, so it might not improve things after all.
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited August 11, 2014
    AperturePlus wrote: »
    I need to ask you something, Sebastian... if GWT is not suited to track statistics on SmugMug, what is? The built-in stats are absolutely useless to us, so how do we know what the traffic is?
    You can use the other parts of the GWT service, but I'd suggest disabling the crawling / health monitoring email notifications.
    To quote Shandrew from our Ops team in an earlier post on this thread:
    shandrew wrote:
    Unless you are very interested in google's bot/crawler activity itself, I recommend turning off Google Webmaster Tools' email notifications, as they simply aren't terribly useful--they are not designed well for large scale sites like SmugMug's. If you want free and simple external site monitoring, you can try a service like pingdom or siteuptime.
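
    For illustration, the simplest form of such an external check is just a periodic fetch; a minimal sketch (the URL and interval below are placeholders, not what Pingdom or SiteUptime actually run):

    ```python
    # Minimal external "is my site up?" check, the kind of thing the services
    # mentioned above automate for you. The URL and interval are placeholders.
    import time
    import urllib.request

    SITE = "https://example.smugmug.com/"  # put your own domain here

    while True:
        try:
            with urllib.request.urlopen(SITE, timeout=10) as resp:
                print("up, HTTP", resp.getcode())
        except Exception as err:  # HTTP errors, timeouts, DNS failures, ...
            print("down:", err)
        time.sleep(300)  # check again in five minutes
    ```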
    Sebastian
    SmugMug Support Hero
  • ian408 Administrators Posts: 21,905 moderator
    edited August 11, 2014
    It seems like the issue is SmugMug sending a 504 response code to Google when they're sending too many requests, which causes it to squawk in their logs. Assuming that SM intentionally sends a 504 to Google, and it's not just due to the server being overloaded, how about sending a 503 Service Unavailable code instead? Google's Webmaster blog even recommends it.

    EDIT: Although judging from the comments, it seems that 503s can also appear in the log as errors, so it might not improve things after all.

    If you are truly rate limiting, the correct response is 429. 503 and 504 imply something is wrong when it's not.
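
    For illustration, a minimal sketch of what a true rate-limit response could look like (a hypothetical handler, not SmugMug's actual code; the budget check and Retry-After value are made up):

    ```python
    # Hypothetical sketch of the suggestion above: answer excess crawler
    # traffic with 429 Too Many Requests plus a Retry-After hint, instead of
    # 504, which reads as a gateway failure.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def over_crawl_budget(user_agent):
        # Placeholder: a real server would track per-crawler request rates here.
        return "Googlebot" in user_agent

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if over_crawl_budget(self.headers.get("User-Agent", "")):
                self.send_response(429)                 # "slow down", not "broken"
                self.send_header("Retry-After", "120")  # seconds; assumed value
                self.end_headers()
                return
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"OK")

    if __name__ == "__main__":
        HTTPServer(("", 8080), Handler).serve_forever()
    ```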
    Moderator Journeys/Sports/Big Picture :: Need some help with dgrin?
  • shandrew Administrators, Vanilla Admin Posts: 33 SmugMug Employee
    edited August 11, 2014
    Hey Ian, Googlebot does not rate limit based on error code, though at one point long ago it may have. All of our dealings with Googlebot are largely manual, through our contact with their staff. We (along with a handful of other sites) are unusual in that we serve a huge number of domains with a nearly endless supply of pages to crawl, and have a lot of capacity available. I'm not sure how much detail they would want me to discuss, but they don't currently have a good way for a site like ours to control crawl rates for thousands of customer-controlled domains. All of that is handled the old-fashioned way: human-to-human, company-to-company communication. Sometimes, when we have an update that they don't handle properly, or when they roll out something new that doesn't have their company-specific rate limit working properly, they can send a huge amount of traffic our way. Curiously, none of the other crawlers/bots have this problem.

    Re: other recent posts here, the important things to remember are:
    1. Google Webmaster Tools doesn't monitor your site; it monitors Googlebot's accesses to your site. Googlebot's (and other bots') traffic goes to different servers than customer traffic, because customers are more important than robots, and are generally better behaved :) (see the sketch after this list)

    2. If you're concerned about the error rate when GWT reports something like this, try clicking on "Crawl Stats" to see how much more traffic Googlebot is sending to your site. Google is actually doing more indexing of your site (though only Google knows what impact this has on your search rankings).

    3. Google Webmaster Tools is buggy, as the data tends to lag and doesn't report properly--data can lag by several days, and the lagged data is reported at the time the data is processed rather than when it actually happened. I only know this empirically, from using GWT for a while and having a lot of experience in data/logs processing/reporting. You'll continue to see these error reports in GWT for several days even though their traffic incident ended around 8/11 02:00 Pacific. Their system is poorly designed for this, providing misleading and largely useless data.

    4. Google is an independent company and not a subsidiary of SmugMug :D . Our mission is to serve our customers, and Google's goals are different (though they are generally friendly and happy to work with us). The amount of control we have over Google's search, their robots, and their tools is limited.

    5. If you're interested in the technical side of serving a highly dynamic site to search robots, check out http://sorcery.smugmug.com/2013/12/17/using-phantomjs-at-scale/
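
    Point 1 above describes splitting traffic by requester. A minimal sketch of that routing idea (hypothetical; the hostnames and bot list are made up, and this is not SmugMug's actual setup):

    ```python
    # Sketch of point 1 above (illustrative only): crawler traffic is routed
    # to a separate server pool, so bots can never crowd out human visitors.
    import random

    CUSTOMER_POOL = ["web1.internal", "web2.internal"]  # serves real visitors
    BOT_POOL = ["crawl1.internal"]                      # serves crawlers

    KNOWN_BOTS = ("Googlebot", "bingbot", "Baiduspider", "Yahoo! Slurp")

    def choose_backend(user_agent):
        """Send known crawlers to the bot pool, everyone else to customers."""
        if any(bot in user_agent for bot in KNOWN_BOTS):
            return random.choice(BOT_POOL)
        return random.choice(CUSTOMER_POOL)

    print(choose_backend("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # crawl1.internal
    ```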
    I work at SmugMug but these opinions are usually my own.
  • chipj Registered Users Posts: 149 Major grins
    edited August 11, 2014
    rainforest1155,
    I don't want to come off as being combative, but this just isn't correct. GWT is very well suited for uncovering crawling and indexing issues for websites of any size. The issue is on the SmugMug side of things. GWT merely reports the issues it sees and flags what it considers problematic for Googlebot crawling.

    If the SmugMug servers are having issues handling all of the requests they get, then SmugMug really needs to look at hiring a specialist who is familiar with load balancing and server volumes. The LAST thing they should do is restrict the search bots.

    Secondly, the "Mark as fixed" option (again) doesn't fix the issue no matter how many times you click it. If Googlebot finds an error, it will keep reporting it no matter how many times it returns. If Googlebot can't crawl the site, then it's a potential issue when it comes to visibility on the search engine results page.

    rainforest1155 wrote: »
    The bulk of my previous reply comes from details that I received from our Ops team. [...]

    rainforest1155 wrote: »
    Mark as fixed will tell Google to retry later, which is all that needs to be done. [...]
  • ian408 Administrators Posts: 21,905 moderator
    edited August 11, 2014
    shandrew wrote: »
    Hey Ian, Googlebot does not rate limit based on error code, though at one point long ago it may have.

    I hope I didn't suggest crawlers were rate limiting based on error codes. It is possible for you to control crawling on the server side, and if you are, returning 429 would be more meaningful.
    Moderator Journeys/Sports/Big Picture :: Need some help with dgrin?
  • shandrew Administrators, Vanilla Admin Posts: 33 SmugMug Employee
    edited August 11, 2014
    Ian, agreed, it would be more meaningful to us humans. I'll look into that; however, it's rather low priority since it offers little value to customers, and some of the error codes, depending on cause, come from our edge CDN rather than from us directly.
    I work at SmugMug but these opinions are usually my own.
  • ian408 Administrators Posts: 21,905 moderator
    edited August 11, 2014
    shandrew wrote: »
    Ian, agreed, it would be more meaningful to us humans. I'll look into that; however, it's rather low priority since it offers little value to customers, and some of the error codes, depending on cause, come from our edge CDN rather than from us directly.

    Personally, I think the vast majority of the errors are from the CDN.
    Moderator Journeys/Sports/Big Picture :: Need some help with dgrin?
  • AperturePlus Registered Users Posts: 374 Major grins
    edited August 12, 2014
    A question from someone who knows nothing about bots and Google, but when we get these crawl errors, does it affect our ratings/rankings in Google? It seems that Google is pretty picky and pushes you down their list if there are things that they don't like.
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited August 12, 2014
    chipj wrote: »
    I don't want to come off as being combative, but this just isn't correct.
    I'm sorry that you feel that what I mentioned is not correct. Please see shandrew's (he's on our Ops team) post prior to your last one for more details.
    Sebastian
    SmugMug Support Hero
  • chipj Registered Users Posts: 149 Major grins
    edited August 15, 2014
    AperturePlus, it doesn't impact rankings or indexing IF it's an occasional incident. There could be an impact if it happens on a regular basis, though. So far I am seeing 6 straight days of 504 errors reported by Google. Apparently SmugMug is blocking Googlebot (and likely other search bots), and this IS an issue when it comes to site crawling and indexing.

    If Google, Bing, etc. can't crawl your website, then they can't make a determination as to the value of your content. Google has posted information about this at:
    https://support.google.com/webmasters/answer/2387297?hl=en
    AperturePlus wrote: »
    A question from someone who knows nothing about bots and Google, but when we get these crawl errors, does it affect our ratings/rankings in Google? It seems that Google is pretty picky and pushes you down their list if there are things that they don't like.
  • chipj Registered Users Posts: 149 Major grins
    edited August 15, 2014
    shandrew,

    I can appreciate that you are having issues with traffic volumes. But blocking search bots is just not a good practice when it comes to helping your customers get their content crawled and indexed for organic search.

    Three months after New SmugMug was released, I realized that its new architecture was resulting in search bot indexing issues. In fact, even today, almost a year after the release, only about 10% of the SmugMug pages Google has crawled on my site are in their search index. Other non-SmugMug sites that I operate are typically about 95% indexed.

    As a result, I moved my SmugMug gallery to a subdomain about 9 months ago so that I had better control of the indexing of my main home page. This has worked for me... I get very little search traffic to my SmugMug content, but about 99% of my NON-SmugMug pages are indexed. Fortunately, I no longer have to rely on my SmugMug site for search traffic.

    BTW, Google has posted information about blocking their bot at:
    https://support.google.com/webmasters/answer/2387297?hl=en

    shandrew wrote: »
    Hey Ian, Googlebot does not rate limit based on error code, though at one point long ago it may have. [...]
  • AperturePlus Registered Users Posts: 374 Major grins
    edited August 16, 2014
    Thanks, chipj. I presumed as much.
  • shandrew Administrators, Vanilla Admin Posts: 33 SmugMug Employee
    edited August 18, 2014
    chipj wrote: »
    So far I am seeing 6 straight days of 504 errors reported by Google.

    Google is wrong: "Google Webmaster Tools is buggy, as the data tends to lag and doesn't report properly--data can lag by several days, and the lagged data is reported on the time that the data is processed rather than when it actually happened. I only know this empirically, from using GWT for a while and having a lot of experience in data/logs processing/reporting. You'll continue to see these error reports in GWT for several days even though their traffic incident ended around 8/11 02:00 Pacific. Their system poorly designed for this, providing misleading and largely useless data."

    We restrict Googlebot's traffic volume to a reasonably large amount. During their recent experiment we served a lot more traffic to them than normal. Check the "Crawl Stats" under GWT if you'd like to see.

    I appreciate your posts, but it isn't possible for you to know more about this crawler/bot traffic than I and the Googlebot developers that I discuss these topics with. We actually observe what is going on on both ends (rather than looking through the low-quality/incorrect information provided by GWT).
    I work at SmugMug but these opinions are usually my own.
  • RobLoud Registered Users Posts: 45 Big grins
    edited August 25, 2014
    Fetch as Google Issue
    I have noticed that, since this issue first started, I have had problems when using Fetch as Google in Webmaster Tools.

    I have had a couple of 'Not Found' results, and now the status is coming back as 'Partial' even for my home page.

    Why is Google having problems reading/deciphering the submitted links when there were no issues previously?

    Has anyone else had or noticed this problem or is it just me?

    Robin
  • AperturePlus Registered Users Posts: 374 Major grins
    edited August 25, 2014
    shandrew wrote: »
    I appreciate your posts, but it isn't possible for you to know more about this crawler/bot traffic than I and the Googlebot developers that I discuss these topics with. We actually observe what is going on on both ends (rather than looking through the low-quality/incorrect information provided by GWT).

    I did a double take when I first saw this comment! Gee, that is a little presumptuous, is it not, Shandrew, or at least extremely arrogant of you?
  • shandrew Administrators, Vanilla Admin Posts: 33 SmugMug Employee
    edited August 25, 2014
    RobLoud wrote: »
    I have had a couple of 'Not Found' results, and now the status is coming back as 'Partial' even for my home page.

    "Partial" is the normal response when doing a "fetch and render" from GWT. If you click on the fetch, you'll see something like "Googlebot couldn't get all resources for this page", which is normal--there are various resources on everyone's page that are blocked to bots, even Google content like their font api.

    When did you get the "Not Found", and does GWT provide any additional information when you click on the line?
    I work at SmugMug but these opinions are usually my own.
  • DeeRich Registered Users Posts: 76 Big grins
    edited August 25, 2014
    No stats at all today - yesterday I had stats
    Can't get my stats at all today. Could yesterday. What's going on?

    Trying to install the StatCounter but finding it not so easy. It tells me to look for things that are not there.
  • RobLoud Registered Users Posts: 45 Big grins
    edited August 25, 2014
    shandrew wrote: »
    "Partial" is the normal response when doing a "fetch and render" from GWT. If you click on the fetch, you'll see something like "Googlebot couldn't get all resources for this page", which is normal--there are various resources on everyone's page that are blocked to bots, even Google content like their font api.

    When did you get the "Not Found", and does GWT provide any additional information when you click on the line?

    Thanks for the reply.

    I made a mistake with the 'Not Found' reference; it was for the same gallery, but I had changed the gallery name to try to get it to fetch. However, the same gallery would not fetch, or fetch and render, on the 8th of August using the correct name.
    The fetch would return 'Temporarily Unavailable' and the fetch & render would return 'Unreachable'. However, the same gallery returned 'Partial' when I tried again today, as per my previous post.
  • ian408 Administrators Posts: 21,905 moderator
    edited August 25, 2014
    chipj wrote: »
    shandrew,
    I can appreciate that you are having issues with traffic volumes. But blocking search bots is just not a good practice when it comes to helping your customers get their content crawled and indexed for organic search.

    On this one, I'll have to side with Shandrew. Crawling affects performance. It's one thing to have one site and a dozen bots; quite another to have 100 sites with a dozen bots each, and limiting the number of bots just means the crawl will take longer to complete.

    As far as crawling goes, Google isn't the only one. Baidu, Yahoo!, and a dozen others crawl.

    I also know firsthand what crawling can do if the target is not properly configured, and I cannot fault SmugMug for their approach, especially knowing that someone is trying to figure out how to make it better.
    Moderator Journeys/Sports/Big Picture :: Need some help with dgrin?