Allow robot access to protected content

Sholom 7 June 2006 22:55:01
Anyone know how to allow Google's robots to index protected content?

My company has a site that requires a subscription to access the info,
but we'd like to have Google index those pages. I see there are many
sites who've managed this.

I can't allow by user-agent since my authentication software doesn't
allow that. Is there any way to give Google a username and password? Or
is there an IP, or range of IPs, that Google uses?

Thanks.

John Bokma 7 June 2006 22:58:01
"Sholom" <sdeen@diamonds.net> wrote:

> Anyone know how to allow Google's robots to index protected content?
>
> My company has a site that requires a subscription to access the info,
> but we'd like to have google index those pages. I see there are many
> sites who've managed this.

Yup, it's called cloaking. I'll report it when I see it.

> I can't allow by user-agent since my authentication software doesn't
> allow that. Is there any way to give Google a username and password? Or
> is there an IP, or range of IPs, that google uses?

Yes, and this might get you banned.

--
John Freelance Perl programmer: http://castleamber.com/

A better start menu with Quick Launch:
http://johnbokma.com/windows/quick-launch.html

Pet @ www.gymratz.co.uk ;¬) 7 June 2006 23:03:41
Sholom wrote:

> Anyone know how to allow Google's robots to index protected content?
>
> My company has a site that requires a subscription to access the info,
> but we'd like to have google index those pages.

Huh?

Surely once indexed by Google then it's publicly available content anyway.

Why would a search engine waste resources on crawling content which it
couldn't publish?

Am I missing something here?

--
http://gymratz.co.uk - Best Gym Equipment & Bodybuilding Supplements UK.
http://trade-price-supplements.co.uk - TRADE PRICED SUPPLEMENTS for ALL!
http://fitness-equipment-uk.com - UK's No.1 Fitness Equipment Suppliers.
http://Water-Rower.co.uk - Worlds best prices on the Worlds best Rower.

Borek 7 June 2006 23:03:41
On Wed, 07 Jun 2006 20:54:28 +0200, Sholom <sdeen@diamonds.net> wrote:

> Anyone know how to allow Google's robots to index protected content?
>
> My company has a site that requires a subscription to access the info,
> but we'd like to have google index those pages. I see there are many
> sites who've managed this.

Easy way to get banned.

I hate sites that are indexed but not accessible. Usually I do two things
at the same time - first, I read cached content. Second, I report such
a site to Google.

Best,
Borek
--
http://www.chembuddy.com
http://www.ph-meter.info/pH-Nernst-equation
http://www.terapia-kregoslupa.waw.pl

Sholom 7 June 2006 23:53:42
 Thanks to all for the replies. I had no idea this was such a sensitive
issue. If our publication has information that could be helpful to
someone, I figured they should know about it. I guess that upsets some.

However, I'm pretty sure I've come across dozens of legitimate sites on
SE's, particularly on Google News, that require registration. Wall
Street Journal is one example that comes to mind. Their link shows up
with a "(subscription)" tag next to it, and I was just wondering how
they get that done.

I may be misunderstanding the terms and concepts here; perhaps Google
News is not strictly a search engine, and it is only there that it's
allowed.

(As an aside, re the cache issue, I was under the impression that a
"robots=nocache" meta tag prevents the search engine from showing a
cached page.)
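
For reference, the directive usually documented for suppressing the cached copy is `noarchive` (written as `<meta name="robots" content="noarchive">`) rather than `nocache`. A minimal sketch of checking whether a page carries that directive, assuming Python's standard `html.parser`; the class and helper names here (`RobotsMetaParser`, `has_noarchive`) are made up for illustration:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def has_noarchive(html_text):
    """Return True if the page asks robots not to show a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noarchive" in parser.directives


# Hypothetical page used only to exercise the helper.
page = '<html><head><meta name="robots" content="index, noarchive"></head></html>'
print(has_noarchive(page))
```

This only tells you what the page requests; whether a given search engine honours it is up to that engine.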

Big Bill 8 June 2006 00:46:19
On 7 Jun 2006 18:58:01 GMT, John Bokma <john@castleamber.com> wrote:

> "Sholom" <sdeen@diamonds.net> wrote:
>
>> Anyone know how to allow Google's robots to index protected content?
>>
>> My company has a site that requires a subscription to access the info,
>> but we'd like to have google index those pages. I see there are many
>> sites who've managed this.
>
> Yup, it's called cloaking. I'll report it when I see it.
>
>> I can't allow by user-agent since my authentication software doesn't
>> allow that. Is there any way to give Google a username and password? Or
>> is there an IP, or range of IPs, that google uses?
>
> Yes, and this might get you banned.

Go talk to fantomaster. www.fantomaster.com

BB
--

http://www.kruse.co.uk/seo-services.htm
http://www.here-be-posters.co.uk/lempicka-prints.htm
http://www.crystal-liaison.com/armani/index.html

John Bokma 8 June 2006 00:59:08
"Sholom" <sdeen@diamonds.net> wrote:

> Thanks to all for the replies. I had no idea this was such a sensitive
> issue.

Of course it is. Do you like it when on Google's SERP it appears that the
content is freely available and next you're greeted with a register page?

> (As an aside, re the cache issue, I was under the impression that a
> "robots=nocache" meta tag prevents the search engine from showing a
> cached page.)

Yup, that's cloaking, and I report it when I see it.

--
John Freelance Perl programmer: http://castleamber.com/

Creating a customized Command Prompt shortcut:
http://johnbokma.com/windows/command-prompt-shortcut.html

Roy Schestowitz 8 June 2006 06:34:31
__/ [ Borek ] on Wednesday 07 June 2006 20:03 \__

> On Wed, 07 Jun 2006 20:54:28 +0200, Sholom <sdeen@diamonds.net> wrote:
>
>> Anyone know how to allow Google's robots to index protected content?
>>
>> My company has a site that requires a subscription to access the info,
>> but we'd like to have google index those pages. I see there are many
>> sites who've managed this.
>
> Easy way to get banned.
>
> I hate sites that are indexed but not accessible. Usually I do two things
> at the same time - first, I read cached content. Second, I report such
> site to Google.

There is a way around this. Change user-agent string to googlebot and you're
in. To be honest, I didn't know this trick until somebody told me last week.
And I agree with Borek: it's annoying and given that it's a mild form of
cloaking (different content served to SE's and people or hiding
information), it is a basis for banishment.

Best wishes,

Roy

--
Roy S. Schestowitz | {Hide sig} {Show sig} >{Close Application}<
http://Schestowitz.com | Free as in Free Beer ¦ PGP-Key: 0x74572E8E
3:30am up 41 days 9:03, 11 users, load average: 2.47, 1.24, 0.78
http://iuron.com - semantic engine to gather information

John Bokma 8 June 2006 08:23:20
Roy Schestowitz <newsgroups@schestowitz.com> wrote:

> __/ [ Borek ] on Wednesday 07 June 2006 20:03 \__
>
>> On Wed, 07 Jun 2006 20:54:28 +0200, Sholom <sdeen@diamonds.net>
>> wrote:
>>
>>> Anyone know how to allow Google's robots to index protected content?
>>>
>>> My company has a site that requires a subscription to access the
>>> info, but we'd like to have google index those pages. I see there
>>> are many sites who've managed this.
>>
>> Easy way to get banned.
>>
>> I hate sites that are indexed but not accessible. Usually I do two
>> things at the same time - first, I read cached content. Second, I
>> report such site to Google.
>
> There is a way around this. Change user-agent string to googlebot and
> you're in.

If they check for that, yup. Some sites check for the crawlers, based on
IP or name.
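
To illustrate the "IP or name" point: a user-agent string is trivially forged, so a site that cares will usually confirm the address as well. A rough sketch, assuming the commonly described reverse-then-forward DNS check and assuming that genuine Googlebot hosts resolve under googlebot.com or google.com; the function name and the injectable resolver arguments are made up for illustration:

```python
import socket

# Assumption: genuine crawler hostnames end in one of these domains.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip):
    # PTR lookup: IP address -> hostname
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    # A lookup: hostname -> list of IPv4 addresses
    return socket.gethostbyname_ex(hostname)[2]


def looks_like_googlebot(ip, user_agent, reverse=_reverse, forward=_forward):
    """Name check first, then reverse DNS, then confirm with forward DNS."""
    if "googlebot" not in user_agent.lower():
        return False
    try:
        hostname = reverse(ip)
        if not hostname.lower().endswith(TRUSTED_SUFFIXES):
            return False
        # The hostname must map back to the same address, otherwise the
        # PTR record could have been set by whoever controls the IP.
        return ip in forward(hostname)
    except OSError:
        # Lookup failed: treat the visitor as an ordinary client.
        return False
```

The resolvers are passed in as arguments so the logic can be tried out without touching the network. A check like this is why merely changing the user-agent string does not get you past every site.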

> To be honest, I didn't know this trick until somebody told
> me last week.

Wasn't me, but 2+ years ago:
http://johnbokma.com/mexit/2004/04/24/changinguseragent.html

Funny, I notice that I have a link to report spam with google on my site
:-D My site is getting too big. Or maybe I should say: a site is getting
good when you limit Google to your site when looking for some info (which
I do now and then, I even made a special keymark for it) :-D

--
John isa Perl programmer: http://johnbokma.com/perl/perlprogrammer.html

Fox G Bar: http://johnbokma.com/firefox/google-toolbar-customizing.html

Roy Schestowitz 8 June 2006 11:22:03
__/ [ John Bokma ] on Thursday 08 June 2006 05:23 \__

> Roy Schestowitz <newsgroups@schestowitz.com> wrote:
>
>> __/ [ Borek ] on Wednesday 07 June 2006 20:03 \__
>>
>>> On Wed, 07 Jun 2006 20:54:28 +0200, Sholom <sdeen@diamonds.net>
>>> wrote:
>>>
>>>> Anyone know how to allow Google's robots to index protected content?
>>>>
>>>> My company has a site that requires a subscription to access the
>>>> info, but we'd like to have google index those pages. I see there
>>>> are many sites who've managed this.
>>>
>>> Easy way to get banned.
>>>
>>> I hate sites that are indexed but not accessible. Usually I do two
>>> things at the same time - first, I read cached content. Second, I
>>> report such site to Google.
>>
>> There is a way around this. Change user-agent string to googlebot and
>> you're in.
>
> If they check for that, yup. Some sites check for the crawlers, based on
> IP or name.


In worse scenarios, if you have no browser extensions, wget can be used to
fetch the page in question. There's the "--user-agent" option.
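
The same idea in Python, as a minimal sketch: the standard `urllib.request` module lets you set the `User-Agent` header on a request, much as wget's `--user-agent` option does. The URL and the user-agent string below are placeholders chosen for illustration, and the helper names are hypothetical. This only gets past sites that check the name, not the IP.

```python
import urllib.request

# Placeholder values for illustration only.
URL = "http://example.com/protected-page.html"
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent):
    """Build a GET request that announces the given user-agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10):
    """Fetch the page body; raises urllib.error.URLError on failure."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read()


# Inspect the header that would be sent, without making a network call.
request = build_request(URL, CRAWLER_UA)
print(request.get_header("User-agent"))
```

Calling `fetch(URL, CRAWLER_UA)` would perform the actual download; it is left out of the demonstration so the snippet runs offline.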

>> To be honest, I didn't know this trick until somebody told
>> me last week.
>
> Funny, I notice that I have a link to report spam with google on my site
> :-D My site is getting too big. Or maybe I should say: a site is getting
> good when you limit Google to your site when looking for some info (which
> I do now and then, I even made a special keymark for it :-D


*smile* I can remember the time when I ceased to maintain the sitemap and
lost that visual, conceptual idea of how my site was constructed. It is now
somewhat of a messy Web, which I sometimes try to restructure. Same
situation with E-mail accounts, Web hosts, and domain names.

Best wishes,

Roy

--
Roy S. Schestowitz | Othello for Win32/Linux: http://othellomaster.com
http://Schestowitz.com | Free as in Free Beer ¦ PGP-Key: 0x74572E8E
8:15am up 41 days 13:48, 11 users, load average: 0.95, 0.81, 0.77
http://iuron.com - semantic engine to gather information

John Bokma 8 June 2006 20:06:14
Roy Schestowitz <newsgroups@schestowitz.com> wrote:

> __/ [ John Bokma ] on Thursday 08 June 2006 05:23 \__

[..]

>> If they check for that, yup. Some sites check for the crawlers, based
>> on IP or name.
>
> In worse scenarios, if you have no browser extensions, wget can be
> used to fetch the page in question. There's the "--user-agent" option.

In worse scenarios that doesn't work, unless you work at Google.

> [ website structures ]
> *smile* I can remember the time when I ceased to maintain the sitemap
> and lost that visual, conceptual idea of how my site was constructed.
> It is now somewhat of a messy Web, which I sometimes try to
> restructure. Same situation with E-mail accounts, Web hosts, and
> domain names.

I think the messy web structure is the best. Websites are rarely a perfect
tree structure.

--
John Freelance Perl programmer: http://castleamber.com/

Creating a customized Command Prompt shortcut:
http://johnbokma.com/windows/command-prompt-shortcut.html

Roy Schestowitz 8 June 2006 20:22:15
__/ [ John Bokma ] on Thursday 08 June 2006 17:06 \__

> Roy Schestowitz <newsgroups@schestowitz.com> wrote:
>
>> __/ [ John Bokma ] on Thursday 08 June 2006 05:23 \__
>
> [..]
>
>>> If they check for that, yup. Some sites check for the crawlers, based
>>> on IP or name.
>>
>> In worse scenarios, if you have no browser extensions, wget can be
>> used to fetch the page in question. There's the "--user-agent" option.
>
> In worse scenarios that doesn't work, unless you work at Google.

Maybe they can set up an account for us. You know... to use as a proxy, via
SSH, or PHPProxy, or whatever. They could even interface it:

http://proxy.google.com

Imagine the banner. Imagine the integration with Google Wi-Fi, which is at
the moment deployed in SF and the Bay Area.

>> [ website structures ]
>> *smile* I can remember the time when I ceased to maintain the sitemap
>> and lost that visual, conceptual idea of how my site was constructed.
>> It is now somewhat of a messy Web, which I sometimes try to
>> restructure. Same situation with E-mail accounts, Web hosts, and
>> domain names.
>
> I think the messy web structure is the best. Websites are rarely a perfect
> tree structure.

Most are progress-driven. No top-down approach. No specification. No plan.
Very natural for sites that expand without a pre-allocated budget (cf.
Google.com), as well as personal sites.

Best wishes,

Roy

--
Roy S. Schestowitz | Windows O/S: chmod a-x internet; kill -9 internet
http://Schestowitz.com | Free as in Free Beer ¦ PGP-Key: 0x74572E8E
5:15pm up 41 days 22:48, 10 users, load average: 1.14, 1.35, 1.40
http://iuron.com - semantic engine to gather information

John Bokma 8 June 2006 22:08:40
Roy Schestowitz <newsgroups@schestowitz.com> wrote:

> Most are progress-driven. No top-down approach. No specification. No
> plan.

Oh, mine had a specification, and a plan. But the plan didn't include some
things that are now on my site :-)

But I think it's quite hard to organize information in a good way. What
works for one person is a disaster for the other.

--
John isa Perl programmer: http://johnbokma.com/perl/perlprogrammer.html

Fox G Bar: http://johnbokma.com/firefox/google-toolbar-customizing.html

Harlan Messinger 8 June 2006 22:45:19
John Bokma wrote:

> "Sholom" <sdeen@diamonds.net> wrote:
>
>> Anyone know how to allow Google's robots to index protected content?
>>
>> My company has a site that requires a subscription to access the info,
>> but we'd like to have google index those pages. I see there are many
>> sites who've managed this.
>
> Yup, it's called cloaking. I'll report it when I see it.

Seems strange to me that that would be verboten, considering that's the
way Google CopyrightBuster will function--indexing off-line works that
you'll then probably have to buy if you want to see them once you've
identified them via Google.

Dk_sz 9 June 2006 01:43:41

>> My company has a site that requires a subscription to access the info,
>> but we'd like to have google index those pages. I see there are many
>> sites who've managed this.
>
> Yup, it's called cloaking. I'll report it when I see it.

Have you reported webmasterworld then? :-)

--
best regards
Thomas Schulz
http://www.micro-sys.dk/products/sitemap-generator/
http://www.micro-sys.dk/products/website-analyzer/

John Bokma 9 June 2006 01:55:38
"dk_sz" <dk_sz@hotmail.com> wrote:

>>> My company has a site that requires a subscription to access the info,
>>> but we'd like to have google index those pages. I see there are many
>>> sites who've managed this.
>>
>> Yup, it's called cloaking. I'll report it when I see it.
>
> Have you reported webmasterworld then? :-)

Haven't seen them yet. Give a query and I am happy to report them.

--
John isa Perl programmer: http://johnbokma.com/perl/perlprogrammer.html

Fox G Bar: http://johnbokma.com/firefox/google-toolbar-customizing.html