Blocking HTTPS indexing with .htaccess
So, this time I'd like to tell you about a small problem that came up on one of my sites. Here is what happened: Google managed to swallow the https version of the site and, on top of that, picked the https homepage as the primary mirror. Needless to say, after such an unwanted surge of duplicates the site's rankings dropped (the https version was an exact copy of the http version and was never meant for search bots in the first place).
My first step was to check what Google itself recommends (google.com/support/webmasters):
|
Block or remove your entire website using a robots.txt file
To remove your site from search engines and prevent all robots from crawling it in the future, place the following robots.txt file in your server root:
User-agent: *
Disallow: /
To remove your site from Google only and prevent just Googlebot from crawling your site in the future, place the following robots.txt file in your server root:
User-agent: Googlebot
Disallow: /
Each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, to allow Googlebot to index all http pages but no https pages, you'd use the robots.txt files below.
For your http protocol (http://yourserver.com/robots.txt):
User-agent: *
Allow: /
For the https protocol (https://yourserver.com/robots.txt):
User-agent: *
Disallow: /
|
In short: every port, and therefore every protocol, needs its own robots.txt. To let Googlebot index all http pages while blocking https, the two files for server.ru would be:
For http (http://server.ru/robots.txt):
User-agent: *
Allow: /
For https (https://server.ru/robots.txt):
User-agent: *
Disallow: /
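A quick way to see how a crawler that honors robots.txt reads these two files is a minimal Python sketch built on the standard library's urllib.robotparser. The rules are the ones quoted above; the page URL (page.html) is a hypothetical example, and Python's parser is only an approximation of how Googlebot itself interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# robots.txt served over http: allow everything
HTTP_ROBOTS = """\
User-agent: *
Allow: /
"""

# robots.txt served over https: block everything
HTTPS_ROBOTS = """\
User-agent: *
Disallow: /
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


http_rules = make_parser(HTTP_ROBOTS)
https_rules = make_parser(HTTPS_ROBOTS)

# Each protocol is judged only by its own file (page.html is a hypothetical page).
print(http_rules.can_fetch("Googlebot", "http://server.ru/page.html"))    # True
print(https_rules.can_fetch("Googlebot", "https://server.ru/page.html"))  # False
```

Because each parser only ever sees the file for its own protocol, the same page is allowed over http and blocked over https, which is exactly why each protocol needs its own robots.txt.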
But what if the site's http and https versions are served from the same directory, so both protocols hand out the same robots.txt?
In this case .htaccess comes to the rescue. We create two robots files for the site: the first contains all the directives needed for normal indexing, and the second blocks indexing completely with Disallow: /, just as Google recommends. We name the second file robots-https.txt and add the following lines to .htaccess:
RewriteEngine on
# On HTTPS requests, serve robots-https.txt in place of robots.txt
RewriteCond %{HTTPS} on
RewriteRule ^robots\.txt$ robots-https.txt [L]
What does this mean in practice? When a search bot requests the site over http, it gets the standard robots.txt; when it comes in through the https port, it gets robots-https.txt, which blocks indexing of the whole site. Because the rewrite is internal, the bot still requests /robots.txt in both cases and never sees the name robots-https.txt.
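The decision made by the two mod_rewrite directives can be modeled in a few lines of Python. This is a hypothetical simulation, not Apache itself: served_file is an illustrative name, and the sketch assumes the path arrives the way mod_rewrite sees it in a per-directory .htaccess at the site root, that is, without the leading slash.

```python
import re

# Mirrors: RewriteRule ^robots\.txt$ robots-https.txt
# The dot is escaped so only the literal name "robots.txt" matches.
ROBOTS_PATTERN = re.compile(r"^robots\.txt$")


def served_file(https_on: bool, path: str) -> str:
    """Hypothetical model: return the file the server sends for a request path.

    `path` is relative to the directory holding .htaccess, without the
    leading slash, as mod_rewrite sees it in per-directory context.
    """
    # RewriteCond %{HTTPS} on: the rule only fires for HTTPS requests
    if https_on and ROBOTS_PATTERN.match(path):
        # Internal rewrite: the client still sees the URL /robots.txt
        return "robots-https.txt"
    # No rewrite: the requested file is served as-is
    return path


for https_on in (False, True):
    scheme = "https" if https_on else "http"
    print(f"{scheme}://server.ru/robots.txt -> {served_file(https_on, 'robots.txt')}")
```

Running it prints one line per protocol: robots.txt for http and robots-https.txt for https. Requests for any other file pass through untouched, so only the robots file differs between the two versions of the site.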
By the third day after this change, all the https pages of my site had disappeared from Google's index. I hope this small experience proves useful to someone.
Source: http://www.svift.org/2007/tools/https-robots-txt