Freepages-Help-L Archives



From: Elsi <>
Subject: Re: Fw: [FreeHelp] FINDING MY WEBPAGE ON ROOTSWEB
Date: Thu, 28 Nov 2002 21:13:49 -0600
References: <016101c29597$1ba82580$0100007f@pat><3.0.1.32.20021127115347.0125db90@mail.augustmail.com><5.1.0.14.0.20021128190620.022b17d0@pop3.norton.antivirus>
In-Reply-To: <3DE6D381.32B75AD7@YAHOO.com>


At 09:40 PM 11/28/2002 -0500, you wrote:
>Pat Asher wrote:
>
>> Rod,
>>
>> It is not a "problem". It is the difference in the way a spider works
>> (http), and the cron job that reads the TITLE tags for the Freepages index
>> from the server (not http).
>>
>> Pat
>
>I consider it a problem in that it is an undocumented "quirk", a deviation from
>the way users expect the title tags to be found. IMO, this needs to be either fixed
>-- so the TITLE tags are extracted in the same way that users experience it with
>their user agents -- or documented as an exception to the usefulness of the SSI
>stuff.

Rod - I'll be honest in saying that you are the very first person I've ever
run into who used SSI for the TITLE of a web page. I don't think it even
occurred to the guys who wrote the directory indexer -- hence the lack of
"documentation" for the way in which this works. The directory indexer
opens one and only one file in your xxxxx_html directory and reads the
contents of the <title>yyyy</title> line straight from that file. I don't
think anyone ever expected that the TITLE in this file would be pulled in
via SSI. SSI is typically used to bring the same information/text into many
pages, and no one ever considered that you'd want lots of different pages
with the same TITLE in them.
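To make the difference concrete: if the indexer reads the raw file as Elsi describes, an SSI directive inside the TITLE is never expanded, so the "title" it records is the directive itself. Below is a minimal Python sketch of that behaviour, assuming a simple regular-expression match on the TITLE element. The real indexer's code isn't shown in this thread; the names `extract_title`, `title_from_disk` and `title.txt` are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical sketch: matches a literal <title>...</title> element in raw file text.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def extract_title(raw_html: str) -> Optional[str]:
    """Return the text between the TITLE tags exactly as stored, or None."""
    match = TITLE_RE.search(raw_html)
    return match.group(1).strip() if match else None

def title_from_disk(path: Path) -> Optional[str]:
    # Reads the file straight off the filesystem; no web server is involved,
    # so any SSI directive comes back unexpanded.
    return extract_title(path.read_text(encoding="latin-1", errors="replace"))

# A page whose TITLE is supplied via SSI (title.txt is a made-up include file).
page = '<html><head><title><!--#include virtual="title.txt" --></title></head></html>'
print(extract_title(page))  # prints: <!--#include virtual="title.txt" -->
```

A page with a literal title, such as `<title>Asher Family Pages</title>`, would be indexed correctly by the same code. Only the SSI case breaks.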

>Also, I have not yet received a definitive answer about the spider that
>constructs the indexes for SearchThingy: Does it or does it not use the HTTP
>protocol? I don't want to wait 6 months only to discover then that, oh yeah,
>the spider doesn't use the HTTP protocol so your pages will never be indexed.

The RootsWeb spider -- like the AltaVista & Google spiders -- accesses your
pages via HTTP, receiving exactly the same page as the user at the browser
(well, minus anything written into the page by JavaScript 'document.write'
calls, since the spider doesn't run scripts).
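A rough way to see why the spider and the Freepages indexer disagree: over HTTP, the web server expands SSI directives before the page is sent, so the spider sees the included text. The Python sketch below imitates that step for the `include virtual` directive only. This is a simplification that assumes a made-up include file, `/title.txt`. Real servers such as Apache's mod_include support many more directives, and the helper names here are hypothetical.

```python
import re
from typing import Dict

# Simplified: handles only <!--#include virtual="..." --> directives.
INCLUDE_RE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def expand_includes(page: str, files: Dict[str, str]) -> str:
    """Roughly what the web server does before sending the page over HTTP."""
    # Unknown include paths are replaced with an empty string in this sketch.
    return INCLUDE_RE.sub(lambda m: files.get(m.group(1), ""), page)

def title_of(page: str) -> str:
    m = TITLE_RE.search(page)
    return m.group(1).strip() if m else ""

files = {"/title.txt": "Asher Family Pages"}  # hypothetical include file
raw = '<html><head><title><!--#include virtual="/title.txt" --></title></head></html>'

print(title_of(raw))                          # what a disk-reading indexer sees
print(title_of(expand_includes(raw, files)))  # what an HTTP spider sees
```

The first call returns the raw directive and the second returns `Asher Family Pages`. JavaScript output would be missing in both cases, because nothing here executes scripts.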

Elsi

