How to edit .htaccess to block Semalt

DCrane Registered Users Posts: 3 Beginner grinner
edited March 24, 2015 in SmugMug Support
I'm looking to block the website semalt .com (Don't go here!) from crawling my page. If you Google them, you'll find they are not a legitimate business and that they skirt many of the standards web crawlers are supposed to follow. They also mess up analytics reporting, are apparently amassing a huge database of personal information, and even send out spyware to people who visit their site.
So, I'm looking to block them from accessing my site. I've read the best way to do this is by editing the .htaccess file to block referrals from certain domains.
My question then is: how do I edit/add a .htaccess file with SmugMug, or is there an alternative way to block referrals from certain domains? Any method of blocking them also needs to block subdomains like semalt.semalt .com or 24.semalt .com, as they use an endless number of variations to avoid being blocked...

Thanks in advance!

David Crane
DCranePhoto.com
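
The matching rule asked for above (block a domain and every subdomain of it) can be sketched in Python. This is not an .htaccess rule and not something SmugMug exposes; it only illustrates the host comparison such a rule would need. The function name `is_blocked_referrer` and the `BLOCKED_DOMAINS` tuple are hypothetical names chosen for this sketch.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist for illustration; the thread names this domain.
BLOCKED_DOMAINS = ("semalt.com",)


def is_blocked_referrer(referer: str, blocked=BLOCKED_DOMAINS) -> bool:
    """Return True if the Referer host is a blocked domain or a subdomain of one."""
    host = urlsplit(referer).hostname
    if host is None and "://" not in referer:
        # Bare values such as "24.semalt.com/page" have no scheme; retry with "//".
        host = urlsplit("//" + referer).hostname
    if not host:
        return False
    host = host.rstrip(".").lower()
    # Exact match, or a suffix match on ".domain" so "notsemalt.com" is not caught.
    return any(host == d or host.endswith("." + d) for d in blocked)


print(is_blocked_referrer("http://24.semalt.com/crawler.php"))
print(is_blocked_referrer("http://notsemalt.com/"))
```

Comparing against `"." + domain` rather than the bare domain is the design choice that covers any number of subdomain variations without also blocking unrelated hosts whose names merely end in the same letters.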

Comments

  • Options
    Richard Administrators, Vanilla Admin Posts: 19,919 moderator
    edited October 2, 2014
    I never heard of these guys, but if they are as bad as you say, maybe SmugMug could do something to block access globally for all its clients.
  • Options
    brianb Registered Users Posts: 96 Big grins
    edited October 3, 2014
    I see the same traffic to my smugmug site. Please block them!
  • Options
    DCrane Registered Users Posts: 3 Beginner grinner
    edited October 5, 2014
    Email to SmugMug
    I've sent off an email to SmugMug's Heroes asking for help and voicing my concern about Semalt. I suggest you all tell them you are concerned about this issue as well, and maybe we can get them to block access site-wide. I can't imagine what kind of server traffic 10-20 site visits per day to every one of their hosted sites would cause.
  • Options
    rainforest1155 Registered Users Posts: 4,566 Major grins
    edited October 6, 2014
    All SmugMug sites are automatically configured to allow only certain web crawlers / bots. You can check which crawlers and bots we accept by adding /robots.txt to your site address. Specifically, the:
    User-agent: *
    Disallow: /
    at the end of the robots.txt file disallows any crawler / bot that wasn't specifically allowed earlier in the file.
    Sebastian
    SmugMug Support Hero
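
How a crawler would read the catch-all rule quoted above can be sketched with Python's standard `urllib.robotparser`. The robots.txt text below is a made-up stand-in, not SmugMug's actual file: the `ExampleBot` entry and the `UnlistedBot` name are assumptions for illustration, and only the final `User-agent: *` / `Disallow: /` pair comes from the post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleBot" is invented for this sketch;
# only the final catch-all pair is quoted in the thread.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# A bot with its own entry and an empty Disallow may fetch any path.
listed_ok = parser.can_fetch("ExampleBot", "/some-gallery")
# Any other bot falls through to the catch-all entry and is refused.
unlisted_ok = parser.can_fetch("UnlistedBot", "/some-gallery")
print(listed_ok, unlisted_ok)  # True False
```

Note that `can_fetch` is a check the crawler runs on its own side before requesting a page; nothing in this sketch stops a client that never performs the check.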
  • Options
    Richard Administrators, Vanilla Admin Posts: 19,919 moderator
    edited October 6, 2014
    My understanding is that robots.txt relies on the goodwill of the crawler to follow the specified requests. Legitimate crawlers will respect it, but it does nothing to prevent a malicious one from entering. If the OP is correct about this particular site, it will probably not help. Or has SmugMug done something special to enforce it somehow?
  • Options
    DCrane Registered Users Posts: 3 Beginner grinner
    edited October 6, 2014
    Correct, Richard.
    These guys have been ignoring robots.txt files, as they are not a well-meaning crawler. They are actually accumulating a list of sites, and their bot runs through all of them over and over as a referral from their site. It's been suggested that people who go to their website and sign up for a free trial then become hosts for their bots, which run from those users' computers and IPs, protecting Semalt from detection and making them harder to block. Not sure what they're trying to do here, but it's sketchy.

    Further, SmugMug has come back and basically said they are "looking into it," but I cannot have access to the .htaccess file, so there is nothing they will do about it. "You can block access by making a gallery password protected" ... Not really what I was going for there.

    I suggest anyone else with a concern about this issue email SmugMug support and ask them to consider blocking access from these domains site-wide.
    But for now, it looks like there is nothing else SmugMug is willing to do.
  • Options
    Teach Registered Users Posts: 320 Major grins
    edited October 6, 2014
    I have sent a message to our Ops team asking them to take a look and see if they have any suggestions or other ideas as to the best way to handle the .htaccess question and blocking Semalt. Please allow them some time to take a deeper look into this issue.
    Heather
    SmugMug Support Hero
  • Options
    shandrew Administrators, Vanilla Admin Posts: 33 SmugMug Employee
    edited October 7, 2014
    There are a number of questions here. I'll try to answer them all.

    1. I have a public web site, can I block some third party from accessing it?

    No. Your site is public, so it's open to the world to see. It may be possible to detect abusive patterns (such as too much activity in a short period of time), and it may be possible to deny access to certain IP blocks, but overall, if your site is public, then anyone with internet access can download the public information from your web site. If you don't want everyone to be able to access your site, you need to make it either unlisted or password-protected.

    2. Some third-party is messing up my public web site's analytics. What can I do?

    There's not too much you can do. Again, with a public web site, your analytics data is a product of public access to your web site. High end web analytics can sometimes remove activity that looks like automated crawling (when it isn't clearly identified as a crawler), but basic analytics tools like GA/SmugMug stats/statcounter/etc don't. Distinguishing human from shady bot activity is a very difficult problem.

    3. What is referrer spam? What can I do about it?

    There's a good description here: http://www.incapsula.com/blog/semalt-botnet-spam.html . Some bad actors attempt to raise the SEO profile of their own site by creating lots of redirection links from their site to other sites, hoping that appearing in the analytics logs will benefit themselves if the analytics logs are public. SmugMug's stats are not public so this does them no good. They are merely acting as an annoyance.

    4. What should I do about Semalt?

    My recommendation is to do nothing about them and to go out and shoot more photos. I don't see Semalt appearing in SmugMug's stats referers, although if you do see them, please PM me and I will take a look. If they are appearing in other stats you use such as GA or Statcounter, you should talk to those organizations about filtering them out. Referer is a voluntary header, so any client can send anything they want in the referer header; thus it's one that is easily polluted with bad data. They do have a removal tool at http://semalt.com/project_crawler.php , which may or may not be effective.
    I work at SmugMug but these opinions are usually my own.
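
The point about the referer header being voluntary can be illustrated with a few lines of standard-library Python. This is only a sketch: the helper name `build_request` and both example URLs are hypothetical, and the request is built but never sent.

```python
from urllib.request import Request


def build_request(url: str, claimed_referer: str) -> Request:
    """Build (but do not send) a GET request carrying a caller-chosen Referer."""
    # The client sets this value itself; nothing validates that the visitor
    # actually came from the claimed page.
    return Request(url, headers={"Referer": claimed_referer})


# Both URLs are placeholders invented for this example.
req = build_request("https://photos.example.com/", "http://unrelated.example.org/page")
print(req.get_header("Referer"))  # http://unrelated.example.org/page
```

Because the value is whatever the client chooses to write, an analytics tool that records it verbatim will record spam referrers just as readily as real ones.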
  • Options
    Hikin' Mike Registered Users Posts: 5,450 Major grins
    edited October 13, 2014
  • Options
    Djm3006 Registered Users Posts: 226 Major grins
    edited January 6, 2015
    brianb wrote: »
    I see the same traffic to my smugmug site. Please block them!
    Also
    Buttons-for-website.com
  • Options
    carloseo Registered Users Posts: 1 Beginner grinner
    edited March 11, 2015
    stop semalt
    Try point 2 here; it might help you stop semalt and buttons-for-website:
    http://www.ohow.co/block-referrer-spam-list/
  • Options
    Djm3006 Registered Users Posts: 226 Major grins
    edited March 24, 2015
    Djm3006 wrote: »
    Also
    Buttons-for-website.com

    In the last week this site, social-buttons.com, has turned up in Analytics.