100 Useful Command-Line Utilities

by Oliver; 2014

95. wget

wget is a tool for downloading files from the web. If the movie test.MOV is hosted on example.com, you can grab it with wget:
$ wget http://example.com/test.MOV
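Two other flags worth knowing about are -O, which saves the download under a name of your choosing, and -c, which resumes a partially finished download (both are standard wget options, but check man wget on your system):
$ wget -O movie.MOV http://example.com/test.MOV
$ wget -c http://example.com/test.MOV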
Of course, you can batch this using a loop. Suppose that there are a bunch of files on example.com and you put their names in a text file, list.txt. Instead of surfing over to the page and having to click each link in your browser, you can do:
$ cat list.txt | while read i; do echo "$i"; wget "http://example.com/${i}"; done
or, equivalently:
$ while read i; do echo "$i"; wget "http://example.com/${i}"; done < list.txt
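If list.txt held full URLs (one per line) rather than bare filenames, you could skip the loop entirely, since wget can read its targets from a file with -i (a minimal sketch under that assumption):
$ wget -i list.txt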
As a concrete example, to get and untar the latest version (as of this writing) of the GNU Coreutils, try:
$ wget http://ftp.gnu.org/gnu/coreutils/coreutils-8.23.tar.xz
$ tar -xvf coreutils-8.23.tar.xz
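If you'd rather not keep the archive around afterwards, you can have wget write to stdout with -O - and pipe it straight into tar (a sketch; the -J flag assumes a GNU tar built with xz support):
$ wget -qO- http://ftp.gnu.org/gnu/coreutils/coreutils-8.23.tar.xz | tar -xJf -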
You can also use wget to download a complete offline copy of a webpage. An article in Linux Journal describes how; I will quote their example verbatim:
$ wget \
   --recursive \
   --no-clobber \
   --page-requisites \
   --html-extension \
   --convert-links \
   --restrict-file-names=windows \
   --domains website.org \
   --no-parent \
       www.website.org/tutorials/html/
This command downloads the Web site www.website.org/tutorials/html/.

The options are:
--recursive: download the entire Web site.
--domains website.org: don't follow links outside website.org.
--no-parent: don't follow links outside the directory tutorials/html/.
--page-requisites: get all the elements that compose the page (images, CSS and so on).
--html-extension: save files with the .html extension.
--convert-links: convert links so that they work locally, off-line.
--restrict-file-names=windows: modify filenames so that they will work in Windows as well.
--no-clobber: don't overwrite any existing files (used in case the download is interrupted and resumed).
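
If you mirror somebody else's site like this, it's polite to throttle yourself. Two standard wget options help here (double-check the names in man wget for your version): --wait, which pauses between retrievals, and --limit-rate, which caps the download speed. For example:
$ wget --recursive --no-parent --wait=1 --limit-rate=200k www.website.org/tutorials/html/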
