You may want to patch your robots.txt

McBride, Ian S.
We just found out that Google is blocking the indexing of some of our pages because the default Drupal robots.txt blocks /misc/ and /modules/, and modules load some CSS, JS, and image files from those directories. Apparently this comes from a change in how Google interprets robots.txt directives, which they made a few weeks ago along with their change to mobile site rankings.

Anyway, there's a patch for it here: https://www.drupal.org/node/2364343


Ian McBride
Web Technologies & Services
Middlebury College
[hidden email]

---
You are currently subscribed to monster_menus as: [hidden email].
To unsubscribe click here: http://lists.middlebury.edu/u?id=685503.6b071f880fe6a965a128164e6d09ea81&n=T&l=monster_menus&o=715460
or send a blank email to [hidden email]

RE: You may want to patch your robots.txt

Jay Dansand
Great catch!

It looks like it doesn't affect sites that use JS/CSS aggregation (probably - unless of course some module adds the files with preprocess = FALSE).


-- 
Jay Dansand '08
Senior Web Application Developer
Technology Services, Seeley G. Mudd Library
Lawrence University
Appleton, WI
920-832-6585
[hidden email]

Re: You may want to patch your robots.txt

McBride, Ian S.
In reply to this post by McBride, Ian S.
Yes, though it does affect image includes. We noticed it because the blocked /misc/grippie.png was keeping Google from indexing pages with comment textareas, which use the grippie div so they can be dragged larger.
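
To see the effect concretely, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The two `Disallow` rules are an assumed excerpt modeled on the directories mentioned in this thread, not a copy of the shipped file, and `/node/1` is only an illustrative page path. This parser does plain prefix matching and does not emulate Google's wildcard handling or page rendering; it only shows that the image itself is off-limits to a compliant crawler.

```python
from urllib.robotparser import RobotFileParser

# Assumed excerpt: only the two directories named in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /misc/
Disallow: /modules/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The image lives under /misc/, so a compliant crawler may not fetch it.
grippie_allowed = parser.can_fetch("Googlebot", "/misc/grippie.png")
# An ordinary page path (illustrative) is not covered by either rule.
page_allowed = parser.can_fetch("Googlebot", "/node/1")

print("grippie.png fetchable:", grippie_allowed)
print("/node/1 fetchable:", page_allowed)
```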

Of course, the supplied patch doesn't handle images, so you'd have to do something like:

# JS/CSS/GIF/JPG/JPEG/PNG
Allow: /misc/*.gif
Allow: /misc/*.jpg
Allow: /misc/*.jpeg
Allow: /misc/*.png
Allow: /misc/*.js
Allow: /misc/*.css
Allow: /modules/*.gif
Allow: /modules/*.jpg
Allow: /modules/*.jpeg
Allow: /modules/*.png
Allow: /modules/*.js
Allow: /modules/*.css
Allow: /profiles/*.gif
Allow: /profiles/*.jpg
Allow: /profiles/*.jpeg
Allow: /profiles/*.png
Allow: /profiles/*.js
Allow: /profiles/*.css
Allow: /themes/*.gif
Allow: /themes/*.jpg
Allow: /themes/*.jpeg
Allow: /themes/*.png
Allow: /themes/*.js
Allow: /themes/*.css

I'm sure there's a more sane approach.
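
One way to at least avoid maintaining that list by hand is to generate it. The following is a rough Python sketch, not part of the patch linked above; the names `DIRECTORIES`, `EXTENSIONS`, and `allow_rules` are made up for illustration, and the output is meant to be pasted into robots.txt manually.

```python
from itertools import product

# Directories and file extensions taken from the list above.
DIRECTORIES = ["misc", "modules", "profiles", "themes"]
EXTENSIONS = ["gif", "jpg", "jpeg", "png", "js", "css"]


def allow_rules(directories, extensions):
    """Return one 'Allow' line per directory/extension pair."""
    return [
        f"Allow: /{directory}/*.{extension}"
        for directory, extension in product(directories, extensions)
    ]


rules = allow_rules(DIRECTORIES, EXTENSIONS)
print("# JS/CSS/GIF/JPG/JPEG/PNG")
print("\n".join(rules))
```

Adding a directory or file type then means editing one list instead of several lines. Whether a given crawler honors the `*` wildcard is still up to that crawler.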


Ian McBride
Web Technologies & Services
Middlebury College
[hidden email]

Re: You may want to patch your robots.txt

Dan Wilga-2
In reply to this post by McBride, Ian S.
I don't think we've ever used the default robots.txt file that comes
with Drupal, because we have far too many customizations in it.

One of the most frequent customizations arises from requests along these
lines:

   "Help! I just discovered that Document X has been world-readable for
   weeks, and Google has cached and can reveal [particular bit of sensitive
   information] to the whole world!"

So we end up having to:

1. Move the document somewhere else where it's protected.
2. Add an entry to robots.txt excluding it.
3. Tell Google to re-crawl that page.

Why is #2 necessary? Who knows. It's Google.

