I tried to download an image using wget but got an error like the following.
--2011-10-01 16:45:42-- http://www.icerts.com/images/logo.jpg
Resolving www.icerts.com... 97.74.86.3
Connecting to www.icerts.com|97.74.86.3|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2011-10-01 16:45:43 ERROR 404: Not Found.
My browser has no problem loading the image.
What’s the problem?
curl can’t download either.
Thanks.
Sam
asked Oct 1, 2011 at 23:47
You need to add the Referer field to the headers of the HTTP request. With wget, you just need the --header argument:
wget http://www.icerts.com/images/logo.jpg --header "Referer: www.icerts.com"
And the result:
--2011-10-02 02:00:18-- http://www.icerts.com/images/logo.jpg
Resolving www.icerts.com (www.icerts.com)... 97.74.86.3
Connecting to www.icerts.com (www.icerts.com)|97.74.86.3|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6102 (6.0K) [image/jpeg]
Saving to: 'logo.jpg'
answered Oct 2, 2011 at 0:05
blotus
I had the same problem with a Google Docs URL. Enclosing the URL in quotes did the trick for me:
wget "https://docs.google.com/spreadsheets/export?format=tsv&id=1sSi9f6m-zKteoXA4r4Yq-zfdmL4rjlZRt38mejpdhC23" -O sheet.tsv
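The quotes matter because an unquoted & is a shell control operator: the shell backgrounds everything before it and tries to run the id=... part as a separate command, so wget never sees the full URL. A quick sanity check that a quoted URL survives as a single argument (SHEETID is a placeholder, not the real sheet id):

```shell
# Placeholder URL; quoted, the '&' is just data, not a control operator.
url='https://docs.google.com/spreadsheets/export?format=tsv&id=SHEETID'
set -- "$url"      # pass it as an argument list, the way wget would see it
echo "argc=$#"     # the whole URL arrived as one argument
echo "argv1=$1"
```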
answered May 5, 2015 at 21:52
e18r
You will also get a 404 error if you are using IPv6 and the server only accepts IPv4.
To force IPv4, add -4 to the request:
wget -4 http://www.php.net/get/php-5.4.13.tar.gz/from/this/mirror
answered Mar 17, 2013 at 19:27
Eli White
I had the same problem.
I solved it using single quotes, like this:
$ wget 'http://www.icerts.com/images/logo.jpg'
wget version in use:
$ wget --version
GNU Wget 1.11.4 Red Hat modified
answered Nov 9, 2017 at 13:57
Mauricio
A wget 404 error also commonly happens when you try to download the pages of a WordPress website by typing
wget -r http://somewebsite.com
If the website is built with WordPress, you'll get an error like this:
ERROR 404: Not Found.
You can't mirror a WordPress website this way, because the content is stored in a database and wget cannot fetch the .php files that generate it. That's why you get the wget 404 error.
I know it’s not this question’s case, because Sam only wants to download a single picture, but it can be helpful for others.
answered Jan 6, 2019 at 12:55
I don't know the exact reason, but I have faced this kind of problem.
If you have the domain's IP address (e.g. 208.113.139.4), try using the IP address instead of the domain name (in this case www.icerts.com):
wget 192.243.111.11/images/logo.jpg
You can find the IP for a URL at https://ipinfo.info/html/ip_checker.php
answered Aug 27, 2020 at 11:23
Mafei
I want to add something to @blotus's answer: in case adding the Referer header does not solve the issue, you may be using the wrong referrer (sometimes the referrer differs from the URL's domain name).
Paste the URL into a web browser and find the referrer in the developer tools (Network -> Request Headers).
answered Jun 25, 2021 at 11:26
I met exactly the same problem while setting up GitHub Actions with Cygwin. Only after I used wget --debug <url> did I realize that a 0x0d byte, i.e. \r (carriage return), was appended to the URL.
For this kind of problem there is a solution described in the docs:
you can also use igncr in the SHELLOPTS environment variable
So I added the following lines to my YAML script to make wget work properly, as well as other shell commands in my GHA workflow:
env:
SHELLOPTS: igncr
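If setting SHELLOPTS isn't an option, you can also strip the stray carriage return in the shell itself before passing the value to wget; a minimal sketch (the URL is a placeholder):

```shell
# Simulate a URL that picked up a trailing carriage return (0x0d)
# from a CRLF-ended file or here-doc:
url=$'http://example.com/logo.jpg\r'
# Strip a trailing \r, if present, before handing the URL to wget:
url="${url%$'\r'}"
printf '%s\n' "$url"
```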
answered Nov 24, 2022 at 12:51
Rom098
When trying to use the command line on Windows to wget a file, I'm getting a 404 error:
C:\Users\xxxx>wget + http://www.restaurantanzu.com/PDFs/Dinner908.pdf;
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files\GnuWin32/etc/wgetrc
--2011-06-07 15:59:19-- http://+/
Resolving +... 67.199.65.121
Connecting to +|67.199.65.121|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2011-06-07 15:59:21 ERROR 404: Not Found.
blahdiblah
asked Jun 7, 2011 at 23:06
Why is the + right after the wget command? It looks like wget is trying to resolve/pull that as a URL, based on the end of the output: "http://+/".
Try it without the +. In fact, try it with just:
wget http://www.restaurantanzu.com/PDFs/Dinner908.pdf
This works for me on a Linux box.
answered Jun 8, 2011 at 0:05
baraboom
Drop the + (as mentioned by @baraboom) and also drop the semi-colon at the end of the line.
answered Jun 8, 2011 at 3:08
jdigital
If you use brace expansion with wget, you can fetch sequentially-numbered images with ease:
$ wget 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'
It fetches the first ten files, 90.jpg to 99.jpg, just fine, but 100.jpg and onward return a 404: File not found error (I only have 100 images stored on the server). These non-existent files become more of "a problem" with a larger range such as {00..200}: with 100 non-existent files, the script's execution time increases, and the extra requests might even become a slight burden (or at least an annoyance) on the server.
Is there any way for wget to stop after it receives its first 404 error? (Or, even better, two in a row, in case a file in the range is missing for some other reason.) The answer does not need to use brace expansion; loops are fine too.
asked Jul 22, 2014 at 6:06
If you’re happy with a loop:
for url in 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'
do
wget "$url" || break
done
That will run wget for each URL in your expansion until it fails, and then break out of the loop.
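The || break pattern relies on wget exiting non-zero on failure; the same control flow can be seen with a stand-in fetch function (hypothetical, failing from item 3 on):

```shell
# fetch stands in for wget; here it "succeeds" for items 1-2 and fails after.
fetch() { [ "$1" -lt 3 ]; }

for n in 1 2 3 4; do
    fetch "$n" || break   # stop at the first failure, like `wget "$url" || break`
    echo "fetched $n"
done
```

This prints fetched 1 and fetched 2, then stops at the first failing call.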
If you want two failures in a row it gets a bit more complicated:
for url in 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'
do
if wget "$url"
then
failed=
elif [ "$failed" ]
then
break
else
failed=yes
fi
done
You can shrink that a little with && and || instead of if, but it gets pretty ugly.
I don't believe wget has anything built in to do that.
answered Jul 22, 2014 at 6:13
Michael Homer
You could use the $? variable to get wget's return code. If it's non-zero, an error occurred; tally the errors up, and once they reach a threshold, break out of the loop.
Something like this, off the top of my head:
#!/bin/bash
threshold=0
for x in {90..110}; do
wget 'http://www.iqandreas.com/sample-images/100-100-color/'$x'.jpg'
wgetreturn=$?
if [[ $wgetreturn -ne 0 ]]; then
threshold=$(($threshold+$wgetreturn))
if [[ $threshold -eq 16 ]]; then
break
fi
fi
done
The for loop can be cleaned up a bit, but you get the general idea.
Changing the $threshold -eq 16 to -eq 24 would mean it fails 3 times before stopping; however, that counts any two failures within the loop, not two in a row.
The reason 16 and 24 are used is that they are totals of the return codes: wget returns 8 when it receives a response code that corresponds to an error from the server, so 16 is the total after 2 such errors.
Stopping only when failures occur twice in a row can be done by resetting the threshold whenever wget succeeds, i.e. when the return code is 0.
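A sketch of that reset logic, with a fetch function standing in for wget (it is made to fail on items 3 and 4 only, just to exercise the counter):

```shell
# fetch stands in for wget; it fails on items 3 and 4 to trigger the stop.
fetch() { [ "$1" -ne 3 ] && [ "$1" -ne 4 ]; }

fails=0
for x in 1 2 3 4 5; do
    if fetch "$x"; then
        fails=0              # success resets the consecutive-failure counter
    else
        fails=$((fails + 1))
    fi
    if [ "$fails" -ge 2 ]; then
        echo "two failures in a row at item $x, stopping"
        break
    fi
done
```

With the failures on items 3 and 4, the loop stops at item 4 and never requests item 5.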
A list of wget return codes can be found here: http://www.gnu.org/software/wget/manual/html_node/Exit-Status.html
answered Jul 22, 2014 at 6:13
Lawrence
IMO, focusing on wget's exit code/status may be too naive for some use cases, so here is an approach that also considers the HTTP status code, for more granular decision-making.
wget provides a -S/--server-response flag to print the HTTP response headers on the command's STDERR, which we can extract and act upon.
#!/bin/bash
set -eu
error_max=2
error_count=0
urls=( 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg' )
for url in "${urls[@]}"; do
set +e
http_status=$( wget --server-response -c "$url" 2>&1 )
exit_status=$?
http_status=$( awk '/HTTP\//{ print $2 }' <<<"$http_status" | tail -n 1 )
if (( http_status >= 400 )); then
# Considering only HTTP Status errors
case "$http_status" in
# Define your actions for each 4XX Status Code below
410) : Gone
;;
416) : Requested Range Not Satisfiable
error_count=0 # Reset error_count in case of `wget -c`
;;
403) : Forbidden
;&
404) : Not Found
;&
*) (( error_count++ ))
;;
esac
elif (( http_status >= 300 )); then
# We're unlikely to reach here in case of 1XX, 3XX in $http_status
# but ..
exit_status=0
elif (( http_status >= 200 )); then
# 2XX in $http_status considered successful
exit_status=0
elif (( exit_status > 0 )); then
# Where wget's exit status is one of
# 1 Generic error code.
# 2 Parse error
# - when parsing command-line options, the .wgetrc or .netrc...
# 3 File I/O error.
# 4 Network failure.
# 5 SSL verification failure.
# 6 Username/password authentication failure.
# 7 Protocol errors.
(( error_count++ ))
fi
echo "$url -> http_status: $http_status, exit_status=$exit_status, error_count=$error_count" >&2
if (( error_count >= error_max )); then
echo "error_count $error_count >= $error_max, bailing out .." >&2
exit "$exit_status"
fi
done
answered Jul 2, 2017 at 14:55
With GNU Parallel this ought to work:
parallel --halt 1 wget ::: 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'
From version 20140722 you can almost have your "two in a row" failure: --halt 2% will allow 2% of the jobs to fail:
parallel --halt 2% wget ::: 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'
answered Jul 22, 2014 at 23:12
Ole Tange
What I’ve used successfully is
wget 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg' 2>&1 | grep -q 'ERROR 404: Not Found'
grep -q looks for the 404 error message pattern in its input and exits as soon as it sees it. wget receives a SIGPIPE signal as soon as it tries to write to the pipe that grep is no longer reading from. In practice, wget dies pretty quickly after getting that first 404.
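The same mechanism can be seen without hitting the network, with printf standing in for wget's log output (a sketch, not real wget output handling):

```shell
# Simulated wget log fed through the same pipeline; grep -q exits 0 the
# moment the pattern appears, so the if-branch runs.
if printf '%s\n' 'HTTP request sent, awaiting response... 404 Not Found' \
                 'ERROR 404: Not Found.' | grep -q 'ERROR 404: Not Found'; then
    echo "404 seen, stop fetching"
fi
```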
answered Aug 14, 2021 at 0:58
Kyle Jones
In Python you can do:
from subprocess import check_output, CalledProcessError
import sys

def main():
    for i in range(90, 110):
        try:
            url = "url/" + str(i)
            check_output(["wget", url])
        except CalledProcessError:
            print("wget returned non-zero output, quitting")
            sys.exit(0)

if __name__ == "__main__":
    main()
Check out the documentation for subprocess if you want to do more: https://docs.python.org/2/library/subprocess.html
answered Jun 20, 2017 at 8:28
Good afternoon. I wanted to automate a Zabbix installation in a bash script, but ran into a problem: at the download step, wget (run from bash) does not download the file. More precisely, it gives the error: HTTP request sent, awaiting response... 404 Not Found ERROR 404: Not Found. I don't know how to fix this. If I simply run the command outside of bash, the file downloads fine. I also played with the wget command's options, to no avail.
Here is the script:
#!/usr/bin/bash
#Download
wget "https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb"
dpkg -i zabbix-release_5.0-1+focal_all.deb
apt update
apt upgrade
#Install Zabbix server, frontend, agent, database, httpd
apt install zabbix-server-mysql
apt install zabbix-frontend-php
apt install zabbix-apache-conf
apt install zabbix-agent
apt install mysql-server
#Create DB (example1)
mysql -uroot -p <<EOF
create database zabbix character set utf8 collate utf8_bin;
create user 'zabbix'@'localhost' identified by 'password';
grant all privileges on zabbix.* to 'zabbix'@'localhost';
EOF
#Import initial schema and data
zcat /usr/share/doc/zabbix-server-mysql*/create.sql.gz | mysql -uzabbix -p zabbix
#Configure the database for Zabbix server
echo DBPassword=password >> /etc/zabbix/zabbix_server.conf
#Configure frontend
sed -i 's:# php_value date.timezone.*:php_value date.timezone Europe/Riga:g' /etc/zabbix/apache.conf;
#Start zabbix server processes start at system boot
systemctl restart zabbix-server zabbix-agent apache2
systemctl enable zabbix-server zabbix-agent apache2
I am trying to copy a file from a datastore into my one of the directories in my Ubuntu VM.
The path to the file in the datastore (datastore1) is,
software/Ubuntu/Ubuntu/EnterpriseRepository_11.1.1.3.0 and the filename is OER111130_generic.zip
The command I give in the Ubuntu terminal shell is,
wget --http-user=root --http-password=vmpwd 'http://172.18.12.20/software/Ubuntu/Ubuntu/EnterpriseRepository_11.1.1.3.0/OER111130_generic.zip dsName=datastore1' --no-check-certificate
It is returning with this error
HTTP request sent, awaiting response... 404 Not Found