wget is an open-source, command-line utility for downloading files and web pages from the internet. It fetches data from a server and either displays it in your terminal or saves it to a file. Because the utility is non-interactive, you can get the most out of it in scripts and scheduled downloads.
Web browsers can also download files, but by default they render the content in a graphical window and require user interaction. Alternatively, Linux users can use the curl command to transfer data from a network server.
Let’s look at how to use the wget command to download web pages and files from the internet.
Installing wget on Ubuntu
To install wget on Ubuntu/Debian based Linux systems:
$ sudo apt-get install wget
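Once the installation finishes, you can confirm that wget is available and check which version you have:
$ wget --version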
Downloading a file with the wget command
You can download a file with wget by providing its URL. If the URL points to a website rather than a specific file, wget downloads the site's index page (typically index.html). By default, the content is saved to a file with the same name in your current working directory. wget can also write the download to standard output, which lets you pipe it to tools such as less or tail (see the standard output section below).
rolando@enterprise ~> wget http://example.com
--2020-10-04 03:33:33--  http://example.com/
Resolving example.com (example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to example.com (example.com)|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
Saving to: ‘index.html.1’

index.html.1        100%[======================>]   1.23K  --.-KB/s    in 0s

2020-10-04 03:33:33 (49.7 MB/s) - ‘index.html.1’ saved [1256/1256]
Sending downloaded data to standard output
You can use the --output-document option with a dash (-) as the file name to send your downloaded data to standard output.
$ wget http://example.com --output-document -
<!doctype html>
<html>
<head>
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>
</head>
<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>
-                    100%[=====================================================================================>]   1.23K  --.-KB/s    in 0s

2020-10-04 03:33:39 (86.5 MB/s) - written to stdout [1256/1256]
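Because the page is written to standard output, you can pipe it straight into another tool. Here is a minimal sketch that pages through the downloaded HTML with less; the -q flag is added only to silence wget’s progress messages, and example.com is just a placeholder URL:
$ wget -q --output-document - http://example.com | less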
Saving downloads with a different file name
You can use the --output-document option, or its short form -O, to specify a different output file name for your download.
$ wget http://example.com --output-document foo.html
or
$ wget http://example.com -O foo.html
Downloading a sequence of files
wget can download several files at once if you know their location and file name pattern. You can use Bash brace expansion to specify a range of integers that represents the sequence of file names from start to end.
$ wget http://example.com/filename_{1..7}.webp
Downloading multiple pages and files
You can download multiple files with the wget command by specifying all the URLs containing the files to download.
$ wget URL1 URL2 URL3
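For instance, the following sketch fetches two hypothetical archives from example.com in a single invocation; the file names are placeholders:
$ wget http://example.com/file1.tar.gz http://example.com/file2.tar.gz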
Resuming a partial download
If you’re downloading a large file, the download might be interrupted. wget can determine where the download stopped and resume the partial download, which is handy for large files such as the Ubuntu desktop ISO below. To resume a download, use the --continue or -c option.
$ wget --continue https://releases.ubuntu.com/20.04.3/ubuntu-20.04.3-desktop-amd64.iso
Managing recursive downloads with the wget command
Use the --recursive or -r option to turn on recursive downloads with the wget command. In recursive mode, wget crawls through the provided site URL and follows all links up to the default or a specified maximum depth level.
$ wget -r example.com
By default, the maximum recursive download depth is 5. However, wget provides the -l option to specify your maximum recursion depth.
$ wget -r -l 11 example.com
You can specify infinite recursion with the ‘-l 0’ option: setting the maximum depth to zero tells wget to follow links without a depth limit and download every file it can reach on the site, as in the sketch below.
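A minimal sketch of infinite recursion, again using example.com as a stand-in URL:
$ wget -r -l 0 example.com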
Converting links for local viewing
The --convert-links option is another essential wget feature; it rewrites the links in downloaded pages so they are suitable for local, offline viewing.
$ wget -r -l 3 --convert-links example.com
Downloading specific file types
You can use the -A option with the wget command to download specific file types during recursive downloads. For example, use the following wget command to download PDF files from a website.
$ wget -A '*.pdf' -r example.com
Downloading files from an FTP server
The wget command can come in handy when you need to download files from an FTP Server.
$ wget --ftp-user=username --ftp-password=password ftp://example.com/myfile.pdf
The FTP server can be specified as a fully qualified domain name (FQDN) or an IP address.
You can also use the -r recursive option with the FTP protocol to download FTP files recursively.
$ wget -r --ftp-user=username --ftp-password=pass ftp://192.168.1.9/
Setting max download size with the wget command
You can set the maximum download size during recursive retrievals using the --quota option. You can specify the size in bytes (the default), kilobytes (k suffix), or megabytes (m suffix). The download process is aborted once the limit is exceeded.
$ wget -r --quota=1024m fosslinux.com
Note that download quotas do not affect downloading a single file.
Setting download speed limit with the wget command
You can also use the wget --limit-rate option to limit the download speed when downloading files. For example, the following command downloads the ‘foo.tar.gz’ file and limits the download speed to 256KB/s.
$ wget --limit-rate=256k URL/foo.tar.gz
Note that you can express the desired download rate in bytes (no suffix), kilobytes (using k suffix), or megabytes (using m suffix).
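For instance, here is a sketch that caps the rate at 2 MB/s instead; the URL and file name are placeholders as before:
$ wget --limit-rate=2m URL/foo.tar.gz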
Mirroring a website with the wget command
You can download or mirror an entire site, including its directory structure, with the --mirror option. Mirroring a site is equivalent to a recursive download with no maximum depth level; the --recursive --level=inf --timestamping --no-remove-listing combination achieves the same effect.
You can also use wget to archive a site with the --no-cookies, --page-requisites, and --convert-links options. Together they download complete pages and ensure that the site copy is self-contained and similar to the original, as shown in the sketch after the commands below.
$ wget --mirror --convert-links fosslinux.com
$ wget --recursive --level=inf --timestamping --no-remove-listing fosslinux.com
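If the goal is a self-contained offline archive rather than a plain mirror, one way to combine the archiving options mentioned above looks like this; fosslinux.com is only an example target:
$ wget --mirror --page-requisites --convert-links --no-cookies fosslinux.com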
Note that archiving a site downloads a lot of data, especially if the website is old and has accumulated a lot of content.
Reading URLs from a text file
The wget command can read multiple URLs from a text file using the -i option. The input text file can contain multiple URLs, but each URL has to be on its own line.
$ wget -i URLS.txt
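For illustration, URLS.txt could contain placeholder entries like these, one URL per line:
$ cat URLS.txt
http://example.com/file1.tar.gz
http://example.com/file2.tar.gz
https://releases.ubuntu.com/20.04.3/ubuntu-20.04.3-desktop-amd64.iso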
Viewing and modifying HTTP headers
HTTP headers are part of the metadata exchanged in the packets that computers send when they communicate. For example, every time you visit a website, your browser sends HTTP request headers to the web server. You can use the --debug option to reveal the header information wget sends with each request.
rolando@enterprise ~/Test> wget --debug example.com
DEBUG output created by Wget 1.21 on linux-gnu.

Reading HSTS entries from /home/rolando/.wget-hsts
URI encoding = ‘UTF-8’
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
--2021-12-08 01:36:45--  http://example.com/
Resolving example.com (example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Caching example.com => 93.184.216.34 2606:2800:220:1:248:1893:25c8:1946
Connecting to example.com (example.com)|93.184.216.34|:80... connected.
Created socket 3.
Releasing 0x00005583c8e10870 (new refcount 1).

---request begin---
GET / HTTP/1.1
User-Agent: Wget/1.21
Accept: */*
Accept-Encoding: identity
Host: example.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Age: 548152
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Wed, 08 Dec 2021 06:36:45 GMT
Etag: "3147526947+ident"
Expires: Wed, 15 Dec 2021 06:36:45 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (mic/9A9C)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256

---response end---
200 OK
Registered socket 3 for persistent reuse.
URI content encoding = ‘UTF-8’
Length: 1256 (1.2K) [text/html]
Saving to: ‘index.html.1’

index.html.1        100%[====================================================>]   1.23K  --.-KB/s    in 0s

2021-12-08 01:36:45 (95.7 MB/s) - ‘index.html.1’ saved [1256/1256]
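If you want to modify the request headers rather than just inspect them, wget’s --header option lets you pass additional header lines; the header values below are purely illustrative:
$ wget --header='Accept-Language: en-US' --header='Referer: http://example.com/' example.com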
Running the wget command as a web spider
You can make the wget command function as a web spider using the --spider option. In essence, it does not download any web pages but only checks that they exist. Moreover, any broken URLs will be reported.
$ wget -r --spider example.com
Running the wget command in the background
You can use the -b or --background option to run the wget process in the background. This is useful if you are downloading large files that will take a long time to complete.
$ wget -b example.com/latest.tar.gz
By default, the output of the wget process is redirected to ‘wget-log’. However, you can specify a different log file with the -o option.
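For example, here is a quick sketch that runs the download in the background and writes its log to a custom file; the file names are placeholders:
$ wget -b -o download.log example.com/latest.tar.gz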
To monitor the wget process, use the tail command.
$ tail -f wget-log
Changing the User-Agent with the wget command
You can change the default User-Agent with the --user-agent option. For example, you can use ‘Mozilla/4.0’ as the wget User-Agent to retrieve fosslinux.com with the following command.
$ wget --user-agent='Mozilla/4.0' fosslinux.com