
Robots.txt explained and blocking bad bots with htaccess

26 Aug, 2013 By Saidul Hassan · Filed Under: School

Blocking Baidu and Yandex Search Spiders
I implemented the following at the top of the .htaccess file to block the Baidu, Yandex and Sosospider bots from crawling the site.

SetEnvIfNoCase User-Agent "Baidu" spammer=yes
SetEnvIfNoCase User-Agent "Yandex" spammer=yes
SetEnvIfNoCase User-Agent "Sosospider" spammer=yes

Order Deny,Allow
Deny from env=spammer
Source
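
Note that Order and Deny are Apache 2.2 access-control directives; on Apache 2.4 they only work when the mod_access_compat module is loaded. A rough Apache 2.4 equivalent, sketched on the assumption that mod_setenvif and the stock authorization modules are available, would be:

SetEnvIfNoCase User-Agent "Baidu" spammer=yes
SetEnvIfNoCase User-Agent "Yandex" spammer=yes
SetEnvIfNoCase User-Agent "Sosospider" spammer=yes

# Allow everyone except requests whose User-Agent matched one of the patterns above
<RequireAll>
    Require all granted
    Require not env spammer
</RequireAll>

In either version, a matching request receives a 403 Forbidden response.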


Further reading:
  • Robots.txt Specifications – Webmasters – Google Developers
  • Robots.txt Frequently Asked Questions – Webmasters – Google Developers
  • Block or remove pages using a robots.txt file – Webmaster Tools Help


Example: the robots.txt used on this site. The first group keeps every crawler out of the WordPress system directories, nested category paths, trackback URLs, and URLs containing a query string or tilde; the second group blocks the Baiduspider and Yandex crawlers from the entire site.
User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/wp-includes/
Disallow: /blog/wp-content/plugins/
Disallow: /blog/wp-content/cache/
Disallow: /blog/wp-content/themes/
Disallow: /blog/wp-includes/js
Disallow: /category/*/*
Disallow: */trackback
Disallow: /*?*
Disallow: /*?
Disallow: /*~*
Disallow: /*~

User-Agent: Baiduspider
User-Agent: Baiduspider-ads
User-Agent: Baiduspider-cpro
User-Agent: Baiduspider-favo
User-Agent: Baiduspider-news
User-Agent: Baiduspider-video
User-Agent: Baiduspider-image
User-agent: Yandex
Disallow: /

Sitemap: https://saidulhassan.com/sitemap.xml
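
robots.txt is only advisory: well-behaved crawlers honour it, but a bot that ignores it has to be refused at the server level, as in the .htaccess snippet above. Another common .htaccess approach, sketched here on the assumption that mod_rewrite is enabled, matches the User-Agent header with a rewrite condition and returns 403 Forbidden:

RewriteEngine On
# Case-insensitive match anywhere in the User-Agent header
RewriteCond %{HTTP_USER_AGENT} (Baidu|Yandex|Sosospider) [NC]
# Serve 403 Forbidden and stop processing further rewrite rules
RewriteRule .* - [F,L]

This catches any user-agent string containing one of the three names, just like the SetEnvIfNoCase rules above.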


First published on 26 Aug, 2013 · Last updated 26 Aug, 2013 · Tagged With: bot, crawler, robots.txt, spider
