Freepages-Help-L Archives

From: Rod Dav4is <>
Date: Wed, 27 Nov 2002 16:57:09 -0500
References: <016101c29597$1ba82580$0100007f@pat>

Elsi wrote:

> Rod:
> I certainly need a better description of "this problem".
> The FreePages Directory is created by a job running on the server which
> simply looks in the user's web directories and extracts the <title> tag
> comments. If your index.html file doesn't have one, then the Directory
> routine won't work. It does not access the index.html file via the HTTP
> server and therefore the SSI and redirections do not affect the directory
> generation.

Yes, I discovered this flaw in trying to get my page into the directory. I
should not have had to "discover" it. It should be documented that redirection and
SSI directives are /not/ taken into account by the process extracting the title for
the directories. Or, the flaw should be fixed so that redirection and SSI directives
/are/ obeyed in the process.
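
To illustrate the behaviour described above, here is a minimal sketch in Python of a
title extractor that reads the raw file contents rather than fetching the page through
the HTTP server. It is not the actual Directory job, whose implementation is not shown
in this thread; the class name, function name, and sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a raw file."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(raw_html):
    """Return the title found in the raw file text, or None if there is none.

    Assumption: the text is exactly what is stored on disk, so SSI
    directives are still unexpanded comments and redirects are not followed.
    """
    parser = TitleParser()
    parser.feed(raw_html)
    parser.close()
    title = "".join(parser.parts).strip()
    return title or None


# Hypothetical index.html variants as they would sit on disk.
plain_page = "<html><head><title> Family Pages </title></head><body></body></html>"
redirect_page = (
    '<html><head><meta http-equiv="refresh" content="0; url=home.html">'
    "</head><body></body></html>"
)
ssi_page = '<!--#include virtual="/header.html" -->\n<body><p>Family notes</p></body>'
```

Under these assumptions only `plain_page` yields a title. The redirecting page and the
page whose head comes from an included file both yield `None`, because nothing in the
raw file supplies a title until the server has processed it.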

> The 'Search Thingy' indexer may or may not use an HTTP client. If it does,
> then the SSI substitution will have been done prior to Search Thingy seeing
> the page.

Yes. This is what we expect -- or, at least, what I expect!

> Certainly when Google or other external indexers access our
> pages, they retrieve them with a standard HTTP client.

Yes. This is well known.

> None of these
> indexers will execute any JavaScript or ActiveX that they find on the page.
> Therefore, if you have content that is generated with a JavaScript
> command, it will be invisible to the indexer.

Yes. This is also well known. I use only the most inconsequential scripts --
exclusively simple navigation aids -- and very few of them.

> And, none of the indexers will follow redirections as such.

I'm not so sure about this. I think that most /will/ follow a redirection within
the same domain. At least, Alexa and Google seem to have done so.
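
As a sketch of what "follow a redirection within the same domain" could mean for a
crawler, the Python function below resolves a redirect target against the current URL
and accepts it only if the host is unchanged. This is a hypothetical policy for
illustration; whether Alexa, Google, or any other indexer works this way is not
established here, and the function name and URLs are made up.

```python
from urllib.parse import urljoin, urlsplit


def redirect_target(current_url, location):
    """Return the absolute URL to follow, or None if it leaves the host.

    `location` is the redirect target as given by the page or server;
    it may be relative, so it is resolved against `current_url` first.
    """
    target = urljoin(current_url, location)
    if urlsplit(target).hostname == urlsplit(current_url).hostname:
        return target
    return None


# Hypothetical start page that redirects to a page deeper in the same site.
same_site = redirect_target("http://example.org/user/index.html", "start/home.html")
# Hypothetical redirect that points at a different host.
other_site = redirect_target("http://example.org/user/index.html", "http://other.example.net/")
```

With this policy `same_site` resolves to a page on the same host and would be crawled,
while `other_site` is `None` and would be skipped.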

> So -- if you want to explain how the problem manifests itself, I might have
> some insight on what (if anything) can be done.

The problem is that my pages have not been indexed by RW in the several months
since the first content was placed there. This may be attributable to the assumed
starting point being a redirection to the page in the site where I want to start,
and that redirection being ignored by the crawler.
I also worry, if SSI directives are not being processed (i.e. the crawler is
accessing by other than the standard HTTP client), that the most significant parts
of my content (namely, included files) will not be indexed in any case.

