
Website Ripper

erowe
(@erowe)
Posts: 144
Estimable Member
Topic starter
 

I was wondering if someone could recommend a good website ripper.

I’m hoping to find a tool or set of tools that could download a user’s entire Facebook, Picasa, webmail, or other account pages and content.

Has anyone tried this? Any thoughts?

Colleagues of mine have done this one page (or email) at a time; however, I was looking for a way to speed up and automate the process. Something that might scale well…

(The basic assumption here is that the legal bases for doing this are covered and the user is logged on to their account.)

 
Posted : 06/04/2009 7:46 pm
(@kovar)
Posts: 805
Prominent Member
 

Greetings,

I usually take one of two approaches:

1) Use Adobe Acrobat's ability to crawl a website and turn each page into a PDF
2) Use wget or httrack to download the entire site (see the sketch below)
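
For what it's worth, something along these lines is a reasonable starting point for the wget route (example.com is a placeholder; the flags are all standard wget options):

    # Mirror the site, rewriting links so the copy browses offline,
    # and pull in the images/CSS each page needs.
    wget --mirror --convert-links --page-requisites --no-parent https://www.example.com/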

I've not found any tool that'll work 100% reliably on complex sites. What do you do with video, downloadable executables, and the like?

To get absolutely everything, I've done a video capture of a web browser session and just navigated through the entire site by hand. It isn't elegant, but it gets all the drop-down menus, streaming video, etc.

-David

 
Posted : 06/04/2009 8:12 pm
(@ronanmagee)
Posts: 145
Estimable Member
 

A tool called wget with the right parameters should be able to help you out.

If you google "wget windows" or "wget win32" you should find a number of versions you can try out.

I have used a Windows GUI version that was quite good, but I can't find the link at the minute. If I do, I'll update the thread.
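
Since the original post assumes the user is logged in: wget can also fetch pages as that user if you first export the browser's session cookies to a Netscape-format cookies.txt file (the filename and URL here are placeholders):

    # Reuse exported browser cookies (Netscape cookies.txt format)
    # so wget retrieves pages as the logged-in user.
    wget --load-cookies cookies.txt --mirror --convert-links --no-parent https://www.example.com/account/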

Ronan

[DOH - beaten to it, but yes - httrack was what I've used before]

 
Posted : 06/04/2009 8:16 pm
(@benclelland)
Posts: 21
Eminent Member
 

Another one I have used before is BlackWidow. Although I've not used it for a while, I think it can delay its downloads so that it isn't so apparent that a piece of software is copying the site.
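
BlackWidow is GUI-driven, but for anyone taking the wget route instead, a similar delaying effect is available through wget's throttling flags (the values here are arbitrary examples):

    # Cap bandwidth and wait a randomised interval between requests
    # so the crawl looks less like an automated ripper.
    wget --mirror --limit-rate=50k --wait=5 --random-wait https://www.example.com/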

 
Posted : 06/04/2009 9:10 pm
(@surfandwork)
Posts: 26
Eminent Member
 

I've used HTTrack to copy a website and saved it to a CD to view offline. It doesn't copy some of the active/script content, but it does a good job overall. And it's free.

http://www.httrack.com/

http://en.wikipedia.org/wiki/HTTrack
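
HTTrack can also be driven from the command line; a minimal sketch (the URL, output folder, and filter are placeholders):

    # Mirror the site into ./mirror; the +filter keeps the crawl on
    # the example.com domain, and -v prints progress to the screen.
    httrack "https://www.example.com/" -O "./mirror" "+*.example.com/*" -v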

 
Posted : 07/04/2009 5:31 am
Saladin
(@saladin)
Posts: 9
Active Member
 

I've used a Firefox add-on for this purpose with good results. (It also extracted the active scripts in good order, but YMMV depending on the website.)

It's called 'ScrapBook' (currently version 1.3.3.9).

(Install/run Firefox, go to Tools > Add-ons, select 'Get Add-ons', type 'scrapbook' into the search field, and select 'Add to Firefox' from there.)

And the price is right, being free.

Some of the nice options to set when the extraction starts are restricting it to the current domain (so offsite links aren't retrieved), the number of link levels to traverse, the filetypes to include/exclude, and so on.

Best of luck with it!

 
Posted : 07/04/2009 9:32 am
(@surfandwork)
Posts: 26
Eminent Member
 

How to Capture a MySpace Page for Investigative Purposes
http://www.search.org/files/pdf/MySpacePageCaptureGuide.pdf

SEARCH High-Tech Crime Publications
http://www.search.org/programs/hightech/publications/

 
Posted : 07/04/2009 10:06 pm
ecophobia
(@ecophobia)
Posts: 127
Estimable Member
 

HTTrack and IDM (Internet Download Manager, which isn't free) are my two favorite tools.

surfandwork,
good links. Thanks.

 
Posted : 08/04/2009 4:04 pm
(@dccfguru)
Posts: 22
Eminent Member
 

I've used Website Ripper Copier in the past. Works fairly well IMO.

www.tensons.com/products/websiterippercopier/

 
Posted : 08/04/2009 11:40 pm