Using the wget Linux command

wget is an open-source, command-line utility for downloading files and web pages from the internet. It fetches data from the internet and either displays it in your terminal or saves it to a file. The utility is non-interactive, so you can get the most out of it in scripts or even by scheduling file downloads.

Web browsers can also download files, except that, by default, they render the information in a graphical window and require a user to interact with them. Alternatively, Linux users can use the curl command to transfer data from a network server.

Let’s show you how to use the wget command to download web pages and files from the internet.

Installing wget on Ubuntu

To install wget on Ubuntu/Debian-based Linux systems, run:

$ sudo apt-get install wget
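Once the installation finishes, you can confirm that wget is available by checking its version:

$ wget --version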

Downloading a file with the wget command

You can download a file with wget by providing a link to a specific URL. If your URL points to a site’s default index.html, then that index page is downloaded. By default, the content is saved to a file with the same filename in your current working directory. If a file with that name already exists, wget does not overwrite it; instead, it appends a numeric suffix such as .1 to the new download. The wget command can also send the downloaded data to standard output, where you can pipe it to utilities like less or tail.

rolando@enterprise ~> wget http://example.com
--2020-10-04 03:33:33-- http://example.com/
Resolving example.com (example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to example.com (example.com)|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
Saving to: ‘index.html.1’
index.html.1 100%[======================>] 1.23K --.-KB/s in 0s
2020-10-04 03:33:33 (49.7 MB/s) - ‘index.html.1’ saved [1256/1256]

Sending downloaded data to standard output

You can use --output-document with a dash (-) as the file name to send your downloaded data to standard output.

$ wget http://example.com --output-document -

<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>    
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>
-                                         100%[=====================================================================================>]   1.23K  --.-KB/s    in 0s      

2020-10-04 03:33:39 (86.5 MB/s) - written to stdout [1256/1256]

Saving downloads with a different file name

You can use the --output-document option, or -O for short, to specify a different output file name for your download.

$ wget http://example.com --output-document foo.html

or

$ wget http://example.com -O foo.html

Downloading a sequence of files

Wget can download several files if you know the location and file name pattern of the files. You can use Bash brace expansion syntax to specify a range of integers representing a sequence of file names from start to end.

$ wget http://example.com/filename_{1..7}.webp
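Note that the range expansion is performed by Bash itself before wget runs, so wget simply receives seven separate URLs. You can preview the expansion with echo:

$ echo http://example.com/filename_{1..7}.webp
http://example.com/filename_1.webp http://example.com/filename_2.webp ... http://example.com/filename_7.webp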

Downloading multiple pages and files

You can download multiple files with the wget command by specifying all the URLs of the files you want to download.

$ wget URL1 URL2 URL3

Resuming a partial download

If you’re downloading large files, there might be interruptions to the download. Wget can determine where your download stopped and continue the partial download. This is handy if you’re downloading a large file like a Linux distro ISO. To continue a download, use the --continue or -c option.

$ wget --continue https://releases.ubuntu.com/20.04.3/ubuntu-20.04.3-desktop-amd64.iso
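For example, if a download of the ISO above is interrupted (here simulated with Ctrl+C), re-running the command with -c resumes it instead of starting over:

$ wget https://releases.ubuntu.com/20.04.3/ubuntu-20.04.3-desktop-amd64.iso
^C
$ wget -c https://releases.ubuntu.com/20.04.3/ubuntu-20.04.3-desktop-amd64.iso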

Managing recursive downloads with the wget command

Use the --recursive or -r option to turn on recursive downloads with the wget command. In recursive mode, wget crawls through the provided site URL and follows all links up to the default or a specified maximum depth level.

$ wget -r example.com

By default, the maximum recursive download depth is 5. However, wget provides the -l option to specify your maximum recursion depth.

$ wget -r -l 11 example.com

You can specify infinite recursion with the ‘-l 0’ option; setting the maximum depth to zero tells wget to follow links without a depth limit, so it will download all the files it can reach on the website.
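
For example, the following command recurses with no depth limit:

$ wget -r -l 0 example.com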

Converting links for local viewing

The --convert-links option is yet another essential wget option; it rewrites the links in downloaded pages so they are suitable for local viewing.

$ wget -r -l 3 --convert-links example.com

Downloading specific file types

You can use the -A option with the wget command to download specific file types during recursive downloads. For example, use the following wget command to download PDF files from a website.

$ wget -A '*.pdf' -r example.com
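
The -A option accepts a comma-separated list of suffixes or patterns, so you can fetch several file types in a single run:

$ wget -A '*.pdf,*.jpg' -r example.com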

Downloading files from an FTP server

The wget command can come in handy when you need to download files from an FTP server.

$ wget --ftp-user=username --ftp-password=password ftp://example.com/myfile.pdf

Note that the FTP server can be specified either as a fully qualified domain name (FQDN) or as an IP address.

You can also use the -r recursive option with the FTP protocol to download FTP files recursively.

$ wget -r --ftp-user=username --ftp-password=pass ftp://192.168.1.9/
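If the FTP server allows anonymous access, you can omit the credentials entirely; wget logs in as the anonymous user by default:

$ wget ftp://192.168.1.9/myfile.pdf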

Setting a max download size with the wget command

You can set a maximum download size during recursive file retrievals using the --quota option. You can specify the download size in bytes (the default), kilobytes (k suffix), or megabytes (m suffix). The download process is aborted when the limit is exceeded.

$ wget -r --quota=1024m fosslinux.com

Note that download quotas do not affect downloading a single file.
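The quota is respected when retrieving recursively or from an input file. For example, the following command (assuming a hypothetical sites.txt list of URLs) aborts once roughly 512 megabytes have been downloaded:

$ wget -Q 512m -i sites.txt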

Setting a download speed limit with the wget command

You can also use wget’s --limit-rate option to limit the download speed when downloading files. For example, the following command downloads the ‘foofoo.tar.gz’ file and limits the download speed to 256KB/s.

$ wget --limit-rate=256k URL/foofoo.tar.gz

Note that you can express the desired download rate in bytes (no suffix), kilobytes (using k suffix), or megabytes (using m suffix).

Mirroring a website with the wget command

You can download or mirror an entire site, including its directory structure, with the --mirror option. Mirroring a site is similar to a recursive download with no maximum depth level. The option is equivalent to --recursive --level inf --timestamping --no-remove-listing, which means it’s infinitely recursive.

You can also use wget to archive a site with the --no-cookies, --page-requisites, and --convert-links options. They download complete pages and ensure that the site copy is self-contained and similar to the original site.

$ wget --mirror --convert-links fosslinux.com
$ wget --recursive --level inf --timestamping --no-remove-listing fosslinux.com

Note that archiving a site will download a lot of data, especially if the website is old.

Reading URLs from a text file

The wget command can read multiple URLs from a text file using the -i option. The input text file can contain multiple URLs, but each URL has to start on a new line.

$ wget -i URLS.txt
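A URLS.txt file for the command above might look like this (the URLs are placeholders):

http://example.com/file1.tar.gz
http://example.com/file2.tar.gz
http://example.com/file3.tar.gz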

Modifying HTTP headers

HTTP headers are metadata embedded in the packets that computers send to communicate during data exchange. For example, every time you visit a website, your browser sends HTTP request headers. You can use the --debug option to reveal the header information wget sends and receives with each request.

rolando@enterprise ~/Test> wget --debug example.com
DEBUG output created by Wget 1.21 on linux-gnu.

Reading HSTS entries from /home/rolando/.wget-hsts
URI encoding = ‘UTF-8’
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
--2021-12-08 01:36:45--  http://example.com/
Resolving example.com (example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Caching example.com => 93.184.216.34 2606:2800:220:1:248:1893:25c8:1946
Connecting to example.com (example.com)|93.184.216.34|:80... connected.
Created socket 3.
Releasing 0x00005583c8e10870 (new refcount 1).

---request begin---
GET / HTTP/1.1
User-Agent: Wget/1.21
Accept: */*
Accept-Encoding: identity
Host: example.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 200 OK
Age: 548152
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Wed, 08 Dec 2021 06:36:45 GMT
Etag: "3147526947+ident"
Expires: Wed, 15 Dec 2021 06:36:45 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (mic/9A9C)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256

---response end---
200 OK
Registered socket 3 for persistent reuse.
URI content encoding = ‘UTF-8’
Length: 1256 (1.2K) [text/html]
Saving to: ‘index.html.1’

index.html.1                          100%[====================================================>]   1.23K  --.-KB/s    in 0s      

2021-12-08 01:36:45 (95.7 MB/s) - ‘index.html.1’ saved [1256/1256]
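
To actually modify the headers wget sends, pass extra or replacement header lines with the --header option; you can repeat --header to set several headers at once. For example, the following command (the header value here is illustrative) adds an Accept-Language header to the request:

$ wget --header='Accept-Language: en-us' example.com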

Running the wget command as a web spider

You can make the wget command function as a web spider using the --spider option. In essence, it will not download any web pages; it will only check that they are there. Moreover, any broken URLs will be reported.

$ wget -r --spider example.com
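Combined with the -o option to write a log file, this makes a simple link checker; you can then search the log for failed requests (the grep pattern here is illustrative):

$ wget -r --spider -o spider.log example.com
$ grep -B2 '404' spider.log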

Running the wget command in the background

You can use the -b or --background option to run the wget process in the background. This is useful if you are downloading large files that will take a long time to complete.

$ wget -b example.com/latest.tar.gz

By default, the output of the wget process is redirected to ‘wget-log’. However, you can specify a different log file with the -o option.
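
For example, the following command sends the download to the background and writes its progress to a hypothetical download.log file:

$ wget -b -o download.log example.com/latest.tar.gz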

To monitor the wget process, use the tail command.

$ tail -f wget-log

Changing the User-Agent with the wget command

You can change the default User-Agent with the --user-agent option. For example, you can use ‘Mozilla/4.0’ as the wget User-Agent to retrieve fosslinux.com with the following command.

$ wget --user-agent='Mozilla/4.0' fosslinux.com