Title : Download files listed in a http index with wget
Date : 16 June 2020
Tags : wget internet
Sometimes I need to download files over HTTP from a list on an "autoindex"
page, and it is always painful to find the right command for this.
The easy solution is **wget**, but you need to pick the right parameters:
wget has a lot of mirroring options and only a few of them are needed to
achieve this goal.
I ended up with the following command:
wget --continue --accept "*.tgz" --no-directories --no-parent --recursive http://ftp.fr.openbsd.org/pub/OpenBSD/6.7/amd64/
This will download every tgz file available at the address given as the last parameter.
The parameters filter the download to only the **tgz** files, put the
files in the current working directory and, most importantly, prevent wget from
escaping to the parent directory and downloading from there. The `--continue`
parameter lets you interrupt wget and start it again: files already downloaded
are skipped and partially downloaded files are completed.
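
To keep track of what each parameter does, the same command can be wrapped in
a small shell function. This is only a sketch: the name `fetch_index` and its
optional pattern argument are hypothetical, not something wget provides, and
the flags are exactly the ones from the command above.

```shell
# Hypothetical helper: fetch_index URL [PATTERN]
# PATTERN defaults to "*.tgz", as in the command above.
fetch_index() {
    url=$1
    pattern=${2:-*.tgz}
    # --continue        resume partially downloaded files, skip complete ones
    # --accept          keep only the files matching the pattern
    # --no-directories  save everything in the current working directory
    # --no-parent       never climb above the starting directory
    # --recursive       follow the links found on the index page
    wget --continue --accept "$pattern" --no-directories --no-parent --recursive "$url"
}
```

Calling `fetch_index http://ftp.fr.openbsd.org/pub/OpenBSD/6.7/amd64/` would
then run the same download as before.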
**Do not reuse this command if files changed on the remote server**, because
the continue feature only works if your local file and the remote file are the
same. wget does not compare contents: it simply finds a local file with the
same name as the remote one and asks the server to resume sending at the byte
offset matching the size of your local file. If the remote file changed in the
meantime, you will end up with a mix of the old and new file.
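
The mix of old and new content can be reproduced without any network. The
sketch below is a simulation only: the files `remote_v1` and `remote_v2` are
made-up stand-ins for two versions of a remote file, and `tail -c` plays the
role of the server answering a resume request.

```shell
# Simulation only: local files stand in for the remote server.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# First version of the "remote" file, and an interrupted download
# that only got its first 5 bytes.
printf 'AAAAAAAAAA' > remote_v1
head -c 5 remote_v1 > local.tgz

# The remote file is replaced before the download is resumed.
printf 'BBBBBBBBBB' > remote_v2

# Resuming means asking for everything after the bytes already on disk.
offset=$(wc -c < local.tgz)
tail -c +"$((offset + 1))" remote_v2 >> local.tgz

# The result has the right size but is neither version of the file.
result=$(cat local.tgz)
echo "$result"   # prints AAAAABBBBB
```

The local file ends up with the expected size, so nothing looks wrong until
you try to use it.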
Obviously the FTP protocol would be better suited to this download job, but FTP
is less and less available, so I find **wget** to be a nice workaround.