From TNPC issue #3.21...
Pulling Down Web Content for Offline Browsing
by Al Gordon
October 12, 2000
I was complaining the other day to my colleague Dan Butler about
the problems associated with pulling multiple Web pages down from
a site--a series of newspaper articles, for example. Saving each
page one by one can be tedious.
Said Dan, "Have you tried Teleport Pro?"
Um, no. I hadn't.
So I did.
Teleport Pro ($39.95) turns out to be the product of Tennyson
Maxwell Information Systems Inc. in Cambridge, MA, which is the
city next to mine. It's the classic case of looking worldwide to
find something that was available on the next block.
The program technically is a "Web spider," but I like to think of
it as a really nifty do-it-yourself search engine. The real-world
uses are numerous. For example, you can download a complete site,
including its navigation structure, and use it to set up a mirror
Web site; you can track changes in a Web site; and you can
schedule scans of a site for a time of your choosing. (Be aware
that there are copyright issues as well as server traffic
considerations with mirroring a Web site. -- Ed.)
You can also download a collection of Web pages for offline
browsing; Teleport adjusts the links between them so they point to
your local copies. That is the feature I use constantly. News Web
sites tend to keep content up for only a short time, so if you
want to save an electronic "clip," you need to download it.
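To make the link adjustment concrete, here is a minimal Python
sketch of what an offline browser has to do. It is a generic
illustration, not Teleport's actual code. It assumes a
hypothetical "saved" dictionary that maps each downloaded URL to
the local file it was saved as, and it uses a simple regular
expression on href and src attributes instead of a full HTML
parser. Links to downloaded pages are pointed at the local files;
everything else is made absolute so it still works once you are
back online.

```python
import re
from urllib.parse import urldefrag, urljoin

# Matches href="..." and src="..." attributes. Double-quoted values
# only; a simplification, since a real tool would parse the HTML.
LINK_ATTR = re.compile(r'\b(href|src)="([^"]*)"', re.IGNORECASE)


def localize_links(html, page_url, saved):
    """Point links at local copies where they exist.

    saved is a hypothetical dict mapping each downloaded URL (without
    any #fragment) to the local file name it was saved under.
    """
    def rewrite(match):
        attr, target = match.group(1), match.group(2)
        # Resolve relative links against the page's own URL, then split
        # off any #fragment so the lookup matches the saved page.
        url, fragment = urldefrag(urljoin(page_url, target))
        if url in saved:
            local = saved[url] + ("#" + fragment if fragment else "")
            return f'{attr}="{local}"'
        # Not downloaded: make the link absolute so it works online.
        return f'{attr}="{urljoin(page_url, target)}"'

    return LINK_ATTR.sub(rewrite, html)


# Example with made-up URLs on the reserved example.com domain.
page = "http://www.example.com/news/story1.html"
saved = {
    "http://www.example.com/news/story2.html": "story2.html",
    "http://www.example.com/news/logo.gif": "logo.gif",
}
html = ('<a href="/news/story2.html#top">Next</a> <img src="logo.gif"> '
        '<a href="http://other.example/x">Elsewhere</a>')
print(localize_links(html, page, saved))
```

The link to story2.html keeps its "#top" anchor but now points at
the local file, while the off-site link is left as a full URL.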
Teleport opens with a Windows Explorer-like interface split into
two panes. You can use a convenient wizard to start a job (called a
"project") or set your parameters manually. I found that a
combination of both was the way to go: use the wizard to get
started and fine-tune with manual settings.
It is that rare bit of software that almost works too well. Until
I learned how to set up a project properly (with a little
coaching from the Tenmax folks), I tended to overdo it. I pointed
the program at a press release on Compaq's Web site and nearly
downloaded the company's entire site.
The default setting is to follow links several levels out from
the starting page of your project, which suits a typical personal
Web site or corporate intranet. However, beware of pages with
navigation bars (usually down the left side) chock full of links.
Teleport, like any good spider, sees those links and chases after
them. Unless you really do want to download an entire corporate
Web site, the best approach is to start your project with the
narrowest possible scope: zero links deep (just the page you
accessed plus its supporting graphics and other files) or one
link deep (that page plus the pages you can reach directly from
it). Then add depth one link at a time, expanding your search
until you get the level of information you need.
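The zero-links/one-link idea amounts to a depth-limited,
breadth-first crawl. The Python sketch below is a generic
illustration, not Teleport's implementation: a hypothetical
in-memory dictionary of made-up page names stands in for fetching
and parsing real pages, and supporting graphics are left out. It
shows why a link-heavy navigation bar makes the page count climb
so quickly with each extra level.

```python
from collections import deque


def crawl(start, links, max_depth):
    """Return the set of pages a depth-limited spider would fetch.

    links maps each page to the pages it links to (a stand-in for
    downloading and parsing real HTML). Depth 0 is the start page
    alone; depth 1 adds every page linked directly from it; and so on.
    """
    seen = {start}
    queue = deque([(start, 0)])  # breadth-first: nearest pages first
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't follow links past the requested depth
        for target in links.get(page, []):
            if target not in seen:  # never fetch the same page twice
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Hypothetical site: a press release whose navigation bar links out.
site = {
    "press.html": ["nav_home.html", "nav_products.html", "related.html"],
    "nav_home.html": ["nav_products.html", "investors.html", "careers.html"],
    "nav_products.html": ["servers.html", "laptops.html"],
    "related.html": [],
}

for depth in range(3):
    print(depth, len(crawl("press.html", site, depth)))
```

For this made-up site the loop reports 1, 4 and 8 pages for depths
0, 1 and 2; on a real corporate site the growth is far steeper.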
Teleport lets you search a Web site for keywords, so you can find
pages that contain the specific information you are seeking
(although I would like to see this feature expanded to support
more Boolean operators). You can also exclude unwanted URLs and
file types, again to narrow down your download. A scheduler lets
you run your project whenever you want.
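Exclusion rules and keyword searches boil down to a few simple
tests applied to each URL and page. The Python sketch below
assumes hypothetical rule formats (URL prefixes to skip, file
extensions to skip, and a match that requires every keyword to
appear); Teleport's own options may work differently, and a search
with richer Boolean operators would need a small query parser.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical project rules, for illustration only.
EXCLUDED_PREFIXES = ["http://www.example.com/careers/"]
EXCLUDED_EXTENSIONS = {".zip", ".exe"}


def should_fetch(url, excluded_prefixes, excluded_extensions):
    """Return False for URLs the project has been told to skip."""
    path = urlparse(url).path
    # Compare extensions case-insensitively (".ZIP" counts as ".zip").
    extension = posixpath.splitext(path)[1].lower()
    if extension in excluded_extensions:
        return False
    return not any(url.startswith(prefix) for prefix in excluded_prefixes)


def matches_keywords(page_text, keywords):
    """Simple AND search: every keyword must appear, ignoring case."""
    text = page_text.lower()
    return all(keyword.lower() in text for keyword in keywords)


for url in ["http://www.example.com/news/story.html",
            "http://www.example.com/files/setup.ZIP",
            "http://www.example.com/careers/jobs.html"]:
    print(url, should_fetch(url, EXCLUDED_PREFIXES, EXCLUDED_EXTENSIONS))
```

Only the news story passes both rules; the ZIP file and the careers
page are filtered out before anything is downloaded.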
Downloading Web content can be tedious; Teleport makes it easy.
http://www.TheNakedPC.com/t/321/tr.cgi?al1
You can reach Al Gordon at:
mailto:al@TheNakedPC.com
Copyright © 2000, PRIME Consulting Group, Inc. and Dan Butler.
All Rights Reserved.
The Naked PC is a trademark of PRIME Consulting Group, Inc.
ISSN: 1522-4422
You may reprint an article from TNPC as long as you show the
entire article and include the author's byline, excerpt and
subscription information as shown:
Pulling Down Web Content for Offline Browsing
by Al Gordon
(This article originally appeared in The Naked PC
newsletter #3.21, subscribe at http://www.TheNakedPC.com)