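If you haven't written a robots.txt for your Magento store, within a few days you'll find that Google has indexed your skin folder, and even other system folders. I've never understood why Google is so fascinated by Magento's system folders; I never ran into this with other shop systems. Below is one way to write robots.txt for a Magento store. It mainly stops Google from crawling system folders and feature pages that tend to create duplicate pages, such as product reviews and product tags. I won't annotate it line by line; you should be able to follow it.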
# Crawlers Setup
User-agent: *
# Allowable Index
Allow: /*?p=
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
# Paths (no clean URLs)
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?p=*&
Disallow: /*?SID=
# Website Sitemap
Sitemap: http://www.pidanjia.com/sitemap.xml
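
To sanity-check how the Allow and Disallow lines above interact, here is a minimal Python sketch. It assumes Google-style matching: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and a tie goes to Allow. Other crawlers may read the file differently. The names `RULES`, `pattern_to_regex` and `is_allowed` are hypothetical, and only a few of the rules are copied in to keep it short.

```python
import re

# A subset of the rules from the robots.txt above (not the full file).
RULES = [
    ("allow", "/*?p="),
    ("allow", "/catalogsearch/result/"),
    ("disallow", "/skin/"),
    ("disallow", "/catalogsearch/"),
    ("disallow", "/review/"),
    ("disallow", "/*.php$"),
    ("disallow", "/*?p=*&"),
    ("disallow", "/*?SID="),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex matched from the start of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled, assuming longest-match precedence."""
    best_len, best_allow = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            # A longer pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow
```

Under these assumptions, `is_allowed("/shoes.html?p=2")` is true because only `Allow: /*?p=` matches, while `is_allowed("/shoes.html?p=2&dir=asc")` is false because the longer `Disallow: /*?p=*&` pattern takes precedence. The path `/shoes.html` is made up for illustration.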
If you repost this, please credit the source: this article comes from 皮蛋家. Original URL: http://www.pidanjia.com/magento/283

