Doug Vitale Tech Blog

Wget Download Manager

According to the official Wget FAQs, wget “is a network utility to retrieve files from the World Wide Web using HTTP and FTP, the two most widely used Internet protocols. The program supports recursive retrieval of web-authoring pages as well as FTP sites—you can use wget to make mirrors of archives and home pages or to travel the Web like a WWW robot, checking for broken links.”

Wget runs on UNIX-like operating systems (such as Linux) and has been ported to Microsoft Windows.

Most Linux distributions include Wget by default. If yours does not, install it with your package manager (Synaptic, for example). Wget for Windows is available here. Once installed, Wget is run from a command line (cmd.exe on Windows); it does not offer a graphical user interface (GUI).

Wget usage options are presented below and in the official GNU Wget user manual and the Wget Wikipedia page. Some excellent examples of wget commands can be found in this LifeHacker.com article and in the Wget Noob’s Guide (169 KB PDF).

The basic syntax for wget commands is:

wget [option]... [URL]...

Wget command options and descriptions

wget -a [logfile] or --append-output=[logfile] Appends to the file specified by [logfile]. Same as -o, only it appends to [logfile] instead of overwriting the old log file. If [logfile] does not already exist, a new file is created.
wget -A [accept_list] or --accept [accept_list] Specifies a comma-separated list of file name suffixes or patterns to accept. The command wget -A gif,jpg will restrict the download to only files ending with ‘gif’ or ‘jpg’.
wget --ask-password Prompts for a password for each connection established. Incompatible with the --password switch.
wget --auth-no-challenge Sends basic HTTP authentication information (plain text username and password) for all requests without first waiting for the server’s challenge.
wget -b or --background Sends Wget to the background immediately after startup. If no output file is specified by -o, output is redirected to wget-log.
wget -B [URL] or --base=[URL] Resolves relative URL links using [URL] as the point of reference, when reading links from an HTML file specified via the -i option (together with --force-html, or when the input file was fetched remotely from a server describing it as HTML). For example, if you specify http://foo/bar/a.html for [URL] and Wget reads ../baz/b.html from the input file, it would be resolved to http://foo/baz/b.html (the relative path is applied to the base URL, so /bar/a.html is replaced by /baz/b.html).
wget --bind-address=[address] When making client TCP/IP connections, bind to [address] on the local machine, which may be a host name or IP address. Useful for downloading on multihomed hosts.
wget -c or --continue Continues getting a partially-downloaded file (useful for when you want to finish a download started by a previous instance of Wget).
wget --ca-certificate=[file] Uses [file] as the file with the bundle of certificate authorities (CAs) to verify peers. The certificates must be in PEM format.
wget --ca-directory=[directory] Specifies the directory containing CA certificates in PEM format. Using --ca-directory is more efficient than --ca-certificate when many certificates are installed because it allows Wget to fetch certificates on demand.
wget --certificate=[file] Uses the client certificate stored in [file]. Necessary for servers that are configured to require certificates from the clients that connect to them.
wget --certificate-type=[type] Specifies the type of the client certificate. Valid values are PEM (assumed by default) and DER, also known as ASN1.
wget --config=[file] Specifies the location of a startup file you wish Wget to use.
wget --connect-timeout=[seconds] Sets the connection timeout to the number of seconds specified. Connections that take longer to establish will be aborted.
wget --content-disposition Enables experimental (not fully-functional) support for Content-Disposition headers. This option is useful for some file-downloading CGI programs that use Content-Disposition headers to describe what the name of a downloaded file should be.
wget --cut-dirs=[number] Ignores a [number] of directory components. This is useful for getting fine-grained control over the directory where recursive retrieval will be saved. For example, the command wget --cut-dirs=1 ftp.xemacs.org/pub/xemacs/ will result in files being saved locally to ftp.xemacs.org/xemacs/.
wget -d or --debug Enables the display of debugging output.
wget -D [domain_list] or --domains=[domain_list] Limits host spanning; i.e., this specifies the domains for Wget to follow, where [domain_list] is a comma-separated list of domains.
wget --default-page=[name] Uses [name] as the default file name when it isn’t known instead of index.html.
wget --delete-after Tells Wget to locally delete every file it downloads after the download is complete.
wget --dns-timeout=[seconds] Sets the DNS lookup timeout to the number of seconds specified. DNS lookups that don’t complete within the specified time will fail. By default, there is no timeout on DNS lookups within Wget.
wget -e [command] or --execute [command] Executes [command] as if it were a part of .wgetrc, the wget startup file. [Command] will be executed after the commands in .wgetrc, thus taking precedence over them. If you need to specify more than one command, use multiple instances of -e.
wget -E or --adjust-extension Causes the suffix ‘.html’ to be appended to the local file names of downloaded HTML documents whose URLs do not already end in that suffix.
wget --egd-file=[file] Uses [file] as the EGD (Entropy Gathering Daemon) socket.
wget --exclude-directories=[list] Specifies a comma-separated list of directories you wish to exclude from being downloaded; the opposite of --include-directories.
wget --exclude-domains [domain-list] Specifies the domains that are not to be followed; the opposite of -D or --domains.
wget -F or --force-html Forces input files to be treated as HTML files.
wget --follow-ftp Follows the FTP links found in HTML documents. By default Wget ignores all FTP links.
wget --follow-tags=[list] Specifies a comma-separated list of HTML tag/attribute pairs to be considered instead of Wget’s default list.
wget --ftp-user=[username] / --ftp-password=[pass] Specifies the username and password to authenticate to an FTP server. To prevent passwords from being seen, store them in .wgetrc or .netrc and change their access permissions with the chmod command.
wget -h or --help Displays the Wget help message.
wget -H or --span-hosts Enables spanning across hosts when doing recursive retrieving. In other words this configures Wget to follow and download all links it encounters. Normally Wget will not visit hosts outside of the URL domain name you specified on the command line.
wget --header=[header_line] Sends [header_line] along with the rest of the headers in each HTTP request.
wget --http-user=[username] / --http-password=[pass] Specifies the username and password to authenticate to an HTTP server. According to the type of the challenge it receives from the server, Wget will encode the credentials using either the basic (insecure), digest, or Windows NTLM authentication scheme.
wget -i [file] or --input-file=[file] Reads URLs from a local or external [file].
wget -I [list] or --include-directories=[list] Specifies a comma-separated list of directories you wish to follow during the download ([list] may contain wildcards). Other directories are ignored, and the directories are absolute paths.
wget --ignore-case Configures Wget to ignore case-sensitivity when matching files and directories.
wget --ignore-length If Wget repeatedly gets stuck attempting to download the same file, use the --ignore-length switch. Some HTTP servers and CGI programs send out bogus Content-Length headers, causing Wget to assume that not all the document was retrieved.
wget --ignore-tags=[list] Specifies a comma-separated list of certain HTML tags to skip when recursively looking for documents to download; the opposite of --follow-tags.
wget -k or --convert-links Converts the links in the document to make them suitable for viewing locally on the host that is downloading. For example, if the downloaded file /foo/doc.html links to /bar/img.gif (also downloaded), then the link in doc.html will be modified to point to '../bar/img.gif'. Alternatively if the downloaded file /foo/doc.html links to the non-downloaded /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif.
wget -K or --backup-converted Backs up the original version of a converted file with a ‘.orig’ suffix.
wget --keep-session-cookies Causes --save-cookies to also save session cookies. Session cookies are normally not saved because they are meant to be kept in memory and discarded when you close the Web browser.
wget -l [depth] or --level=[depth] Specifies the maximum depth level for recursive retrievals. The default maximum depth is five layers.
wget -L or --relative Configures Wget to follow relative links only. Relative links are those that do not refer to the web server root. 'a href="foo.gif"' is a relative link; 'a href="http://www.server.com/foo/foo.gif"' is not.
wget --limit-rate=[amount] Limits the download speed to the specified amount of bytes per second. [Amount] may be expressed in bytes (default), kilobytes with the k suffix, or megabytes with the m suffix. For example, wget --limit-rate=20k will limit the download rate to 20KB/s.
wget --load-cookies [file] Loads cookies from [file] before the first HTTP retrieval, where [file] is a text file in the format of Netscape’s cookies.txt file.
wget --local-encoding=[encoding] Configures Wget to use the specified encoding.
wget -m or --mirror Enables the relevant options for mirroring downloads, such as recursion, time-stamping, and infinite recursion depth, and it keeps FTP directory listings.
wget --max-redirect=[number] Specifies the maximum number of redirections to follow for a resource (the default is 20).
wget -N or --timestamping Turns on time-stamping, which allows Wget to be aware of the time of last modification of both local and remote files.
wget -nc or --no-clobber When running Wget without -N, -nc, -r, or -p, downloading the same file in the same directory will result in the original copy of the file being preserved and the second copy being named with a numeric suffix, as in ‘file.1’. If that file is downloaded again, the third copy will be named ‘file.2’, and so on. When -nc is specified, this behavior is suppressed and Wget will refuse to download newer copies of ‘file’.
wget -nd or --no-directories Prevents the creation of a hierarchy of directories when retrieving recursively. With this option enabled, all files will get saved to the current directory without clobbering (if a name shows up more than once, the file names will get numeric suffixes; also see the -nc switch above for reference on clobbering).
wget -nH or --no-host-directories Disables the generation of host-prefixed directories. By default, invoking wget -r http://fly.srk.fer.hr/ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.
wget --no-cache Disables server-side cache (caching is allowed by default). Useful for retrieving and flushing out-of-date documents on proxy servers.
wget --no-check-certificate Disables checking the server certificate against the available certificate authorities, and does not require that the URL host name match the common name presented by the certificate.
wget --no-cookies Disables the use of cookies. The default is to use cookies; however, the storing of cookies is not enabled by default.
wget --no-glob Disables FTP globbing.
wget --no-http-keep-alive Disables the keep-alive function of HTTP downloads.
wget --no-iri Turns off internationalized URI (IRI) support, which is enabled by default. Use --iri to re-enable it.
wget --no-passive-ftp Disables the use of the passive FTP transfer mode, which mandates that the client connect to the server to establish the data connection rather than the other way around.
wget --no-proxy Prevents the use of proxies even if the appropriate *_proxy environment variable is defined.
wget --no-remove-listing Disables the removal of the temporary .listing files generated by FTP retrievals (these files normally contain the raw directory listings received from FTP servers).
wget --no-use-server-timestamps Prevents the setting of the local file’s timestamp using the timestamp on the server. By default, when a file is downloaded its timestamps are set to match those from the remote file on the server.
wget -np, --no-parent, or no_parent = on Disallows the retrieval of links that refer to the hierarchy above the beginning directory; i.e., this disallows ascending into any parent directory or directories.
wget -nv or --no-verbose Turns off verbosity but still provides more details than -q or --quiet.
wget -o [logfile] or --output-file=[logfile] Logs all messages to the file specified.
wget -O [file] or --output-document=[file] Documents will be concatenated together and written to [file].
wget -p or --page-requisites Configures Wget to download all the files necessary to properly display HTML pages, including inlined images, sounds, and referenced style sheets.
wget -P [prefix] or --directory-prefix=[prefix] Sets the directory prefix to [prefix]. The directory prefix is the directory where all other files and sub-directories will be saved to, i.e., the top of the retrieval tree. The default directory prefix is ‘.’ (the current directory).
wget --post-data=[string] / --post-file=[file] Uses POST as the method for all HTTP requests and sends the specified data in the request body. --post-data sends [string] as data, while --post-file sends the contents of [file].
wget --prefer-family=[family] Configures Wget to connect to the addresses within the specified address family first when faced with a choice of several address families. Valid families are none, IPv4, and IPv6.
wget --private-key=[file] Reads the private key from [file], which allows you to provide the private key in a file separate from the certificate.
wget --private-key-type=[type] Specifies the type of the private key (valid options are PEM and DER).
wget --progress=[type] Selects the type of the progress indicator you wish to view. Valid options for [type] are dot and bar.
wget --protocol-directories Uses the protocol name as a directory component of local file names. For example, the command wget -r http://host will save to http/host/... rather than just to host/.
wget --proxy-user=[user] / --proxy-password=[pass] Specifies the username as [user] and the password as [pass] for authentication on a proxy server. Wget encodes the credentials using the basic authentication scheme.
wget -q or --quiet Turns off Wget’s output.
wget -Q [quota] or --quota=[quota] Specifies a maximum quota for downloaded files. The value can be specified in bytes (default), in kilobytes (with the k suffix), or megabytes (with the m suffix). Example: wget -Q2m -i sites. Setting quota to 0 or inf sets the quota to unlimited.
wget -r or --recursive Enables recursive downloading, in which Wget is capable of traversing parts of the Web (or a single HTTP/FTP server), following links and directory structures.
wget -R [reject_list] or --reject [reject_list] Specifies a comma-separated list of file name suffixes or patterns to reject. To download a whole page except for .mpeg and .au files, use wget -R mpg,mpeg,au.
wget --random-file=[file] Uses [file] as the source of random data for seeding the pseudo-random number generator on systems without /dev/random.
wget --random-wait Causes the time between requests to vary randomly between 0.5 and 1.5 times the number of seconds set with -w or --wait.
wget --read-timeout=[seconds] Sets the read (and write) timeout to the numeric value in [seconds]. If, at any point in the download, no data is received for more than the specified number of seconds, reading fails and the download is restarted. The default read timeout is 900 seconds.
wget --referer=[url] Includes the ‘Referer: [url]’ header in HTTP requests.
wget --remote-encoding=[encoding] Configures Wget to use the specified encoding. This affects how Wget converts URIs found in files from the remote encoding to UTF-8 during recursive downloads.
wget --restrict-file-names=[modes] Changes which characters found in remote URLs must be escaped during the generation of local file names. The modes are a comma-separated set of text values, such as unix, windows, nocontrol, ascii, lowercase, and uppercase.
wget --retr-symlinks Causes symbolic links to be traversed, and the pointed-to files are then retrieved.
wget --retry-connrefused Configures Wget to consider the “connection refused” response a transient error and try the download again. Normally Wget gives up on a URL when it is unable to connect to the site (the failure to connect is assumed to be a sign that the server is not running).
wget -S or --server-response Displays the headers sent by HTTP servers and the responses sent by FTP servers.
wget --save-cookies [file] Saves cookies to [file] before exiting. This will not save cookies that have expired or session cookies (that have no expiration time).
wget --save-headers Saves the headers sent by the HTTP server to the file (preceding the actual contents, with an empty line as the separator).
wget --secure-protocol=[protocol] Selects the secure protocol to be used. Valid options are auto, SSLv2, SSLv3, and TLSv1.
wget --spider Configures Wget not to act as a downloader but merely as a spider (i.e., it will seek out web pages and verify that they are there).
wget --strict-comments Enables the strict parsing of HTML comments.
wget -t [number] or --tries=[number] Sets the number of download retries to [number]. Specify 0 or inf for infinite retrying. The default is to retry 20 times, with the exception of fatal errors like ‘connection refused’ or ‘not found (404)’, which are not retried.
wget -T [seconds] or --timeout=[seconds] Sets the network timeout to the number of specified seconds, after which Wget aborts the operation.
wget --trust-server-names Upon a redirection, the last component of the redirection URL will be used as the local file name.
wget -U [agent_string] or --user-agent=[agent_string] Identifies Wget as [agent_string] in the User-Agent header field to the server.
wget --unlink Forces Wget to unlink the file instead of clobbering the existing file (see -nc above for reference on clobbering). This option is useful for downloading to the directory with hardlinks.
wget --user=[username] / --password=[pass] Specifies the username and password for both FTP and HTTP file retrieval.
wget -v or --verbose Enables verbose output from Wget, which is the default setting.
wget -V or --version Displays the version of Wget.
wget -w [seconds] or --wait=[seconds] Configures Wget to wait the specified number of seconds between retrievals. Instead of seconds, the time can be specified in minutes using the m suffix, in hours using the h suffix, or in days using the d suffix.
wget --waitretry=[seconds] For when you don’t want Wget to wait between every retrieval but only between the retries of failed downloads. Wget will wait 1 second after the first failure on a given file, then wait 2 seconds after the second failure on that file, up to the maximum number of [seconds] you specify. By default, Wget will assume a value of 10 seconds.
wget -x or --force-directories Creates a hierarchy of directories, even if one would not have been created otherwise. For example, the command wget -x http://fly.srk.fer.hr/robots.txt will save the file locally as fly.srk.fer.hr/robots.txt. This is the opposite of wget -nd.
wget -X [list], --exclude-directories=[list], or exclude_directories = [list] Specifies a comma-separated list of directories excluded from the download. For example, to exclude the /cgi-bin directory, use the command wget -X /cgi-bin. This is the opposite of -I [list] or --include-directories=[list].
wget -4 or --inet4-only Specifies connections only to hosts using IPv4.
wget -6 or --inet6-only Specifies connections only to hosts using IPv6.
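
Several of the switches above have startup-file equivalents, so settings you use often can live in a file instead of on the command line. The following is only a sketch: it writes a sample startup file and restricts its permissions with chmod, as suggested above for stored passwords. The file name wgetrc.sample and the credentials are placeholders invented for illustration; the setting names are the .wgetrc counterparts of --tries, --wait, --limit-rate, -np, --http-user, and --http-password.

```shell
# Write a sample startup file (wgetrc.sample is a placeholder name).
cat > wgetrc.sample <<'EOF'
# Same effect as --tries=10, --wait=2, --limit-rate=20k, and -np
tries = 10
wait = 2
limit_rate = 20k
no_parent = on
# Placeholder credentials; replace with real values
http_user = exampleuser
http_password = examplepass
EOF

# Make the file readable and writable by its owner only.
chmod 600 wgetrc.sample
```

Wget can then be pointed at the file with wget --config=wgetrc.sample [URL].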


Wget command examples

wget http://dougvitale.wordpress.com/
The simplest form of the command: downloads the URL provided into the current directory.

wget -i [file]
Downloads the URLs specified in [file], such as list.txt.
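
As a rough illustration, the input file is plain text with one URL per line. The sketch below creates such a file and wraps the download in a small function; the names list.txt, fetch_list, downloads, and fetch_list.log are placeholders chosen for this example.

```shell
# Write one URL per line; the file name and both URLs are placeholders.
cat > list.txt <<'EOF'
http://ftp.gnu.org/gnu/wget/wget-1.13.4.tar.gz
http://dougvitale.wordpress.com/
EOF

# fetch_list is a hypothetical helper.
# $1 = file containing URLs, $2 = directory to save into (defaults to ./downloads)
fetch_list() {
  # -i reads URLs from the file, -P sets the directory prefix,
  # -o writes Wget's messages to a log file instead of the terminal
  wget -i "$1" -P "${2:-downloads}" -o fetch_list.log
}
```

Running fetch_list list.txt would then download both files into the downloads directory.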

wget --limit-rate=100k http://ftp.gnu.org/gnu/wget/wget-1.13.4.tar.gz
Downloads the tar.gz file at a maximum of 100 KB/s (kilobytes per second).

wget --tries=45 http://ftp.gnu.org/gnu/wget/wget-1.13.4.tar.gz
Wget keeps trying to download the file until it either gets all of it or exceeds the allowed number of retries. The default is 20; this command raises the limit to 45.

wget --reject=gif http://dougvitale.wordpress.com/
Tells Wget not to download GIFs when it downloads the URL provided.

wget -r http://dougvitale.wordpress.com/ -o douglog
Creates a five-level deep mirror image of the URL provided, with the log of activities saved to douglog.

wget -r -A .mp3 http://dougvitale.wordpress.com/
Downloads only MP3s from the URL provided.

wget -p --convert-links http://dougvitale.wordpress.com/dir/page.html
Downloads only page.html and all the elements needed to properly display it, such as style sheets and images, and converts the links so the page can be viewed locally.

wget -r -nd -np http://dougvitale.wordpress.com/music/
Downloads all the files in the /music/ directory without going up into any parent directories (-np) and without locally recreating the directory structure (-nd).
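
The accept list and the directory switches combine naturally. As a sketch only, the function below collects files with chosen suffixes from one directory tree; grab_by_suffix is a hypothetical helper name and the depth of two levels is an arbitrary choice for the example.

```shell
# grab_by_suffix is a hypothetical helper.
# $1 = starting URL, $2 = comma-separated suffix list (for example mp3,ogg)
grab_by_suffix() {
  # -r recurse, -l 2 limit the depth to two levels,
  # -np stay below the starting directory,
  # -nd save everything into the current directory,
  # -A keep only files whose names match the suffix list
  wget -r -l 2 -np -nd -A "$2" "$1"
}
```

For instance, grab_by_suffix http://dougvitale.wordpress.com/music/ mp3,ogg would keep only MP3 and OGG files.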

wget -c http://linux-distro.com/iso/linuxdistro.iso
Tells Wget to continue (resume) a partially downloaded file, picking up where a previous download left off instead of starting over.
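
Note that -c by itself only resumes a partial file; how persistently Wget retries is governed by --tries and related switches from the table above. A minimal sketch that combines them for large downloads over a flaky connection follows. The function name resume_download and the 30-second back-off are assumptions made for the example.

```shell
# resume_download is a hypothetical helper; $1 is the URL to fetch.
resume_download() {
  # -c                   pick up a partial file where the previous run stopped
  # --tries=0            retry without limit (0 or inf means infinite)
  # --waitretry=30       wait progressively longer between retries, up to 30 seconds
  # --retry-connrefused  treat "connection refused" as a transient error
  wget -c --tries=0 --waitretry=30 --retry-connrefused "$1"
}
```

For example: resume_download http://linux-distro.com/iso/linuxdistro.iso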

wget -k -m http://dougvitale.wordpress.com/
Fully downloads (mirrors) the website provided and changes the links to work locally.
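
Mirroring an entire site can put a noticeable load on the server, so it is often worth adding the wait and rate-limit switches. The function below is one possible way to do that, not a recommended standard: mirror_politely, the 2-second wait, the 200 KB/s cap, and the default mirror directory are all illustrative choices.

```shell
# mirror_politely is a hypothetical helper.
# $1 = site URL, $2 = local directory to save into (defaults to ./mirror)
mirror_politely() {
  # -m mirror, -k convert links for local viewing, -p fetch page requisites,
  # -np do not ascend to parent directories,
  # -w 2 --random-wait pause a varying 1 to 3 seconds between requests,
  # --limit-rate=200k cap the download speed at 200 KB/s,
  # -P set the directory prefix
  wget -m -k -p -np -w 2 --random-wait --limit-rate=200k -P "${2:-mirror}" "$1"
}
```

For example: mirror_politely http://dougvitale.wordpress.com/ dougmirror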


Written by Doug Vitale

November 2, 2011 at 3:02 PM
