I tried to download an image using wget but got an error like the following.
--2011-10-01 16:45:42-- http://www.icerts.com/images/logo.jpg
Resolving www.icerts.com... 97.74.86.3
Connecting to www.icerts.com|97.74.86.3|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2011-10-01 16:45:43 ERROR 404: Not Found.
My browser has no problem loading the image.
What’s the problem?
curl can’t download either.
Thanks.
Sam
asked Oct 1, 2011 at 23:47
You need to add the Referer field to the headers of the HTTP request. With wget, you just need the --header argument:
wget http://www.icerts.com/images/logo.jpg --header "Referer: www.icerts.com"
And the result:
--2011-10-02 02:00:18-- http://www.icerts.com/images/logo.jpg
Resolving www.icerts.com (www.icerts.com)... 97.74.86.3
Connecting to www.icerts.com (www.icerts.com)|97.74.86.3|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6102 (6.0K) [image/jpeg]
Saving to: 'logo.jpg'
answered Oct 2, 2011 at 0:05
blotus
I had the same problem with a Google Docs URL. Enclosing the URL in quotes did the trick for me:
wget "https://docs.google.com/spreadsheets/export?format=tsv&id=1sSi9f6m-zKteoXA4r4Yq-zfdmL4rjlZRt38mejpdhC23" -O sheet.tsv
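The quotes matter because an unquoted & is a shell control operator: the shell backgrounds everything before it and tries to run the id=... part as a separate command, so wget never sees the full URL. A quick sanity check that a quoted URL survives as a single argument (SHEETID is a placeholder, not the real sheet id):

```shell
# Placeholder URL; quoted, the '&' is just data, not a control operator.
url='https://docs.google.com/spreadsheets/export?format=tsv&id=SHEETID'
set -- "$url"      # pass it as an argument list, the way wget would see it
echo "argc=$#"     # the whole URL arrived as one argument
echo "argv1=$1"
```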
answered May 5, 2015 at 21:52
e18r
You will also get a 404 error if you are using IPv6 and the server only accepts IPv4.
To force IPv4, add -4 to the request:
wget -4 http://www.php.net/get/php-5.4.13.tar.gz/from/this/mirror
answered Mar 17, 2013 at 19:27
Eli White
I had the same problem.
I solved it using single quotes, like this:
$ wget 'http://www.icerts.com/images/logo.jpg'
wget version in use:
$ wget --version
GNU Wget 1.11.4 Red Hat modified
answered Nov 9, 2017 at 13:57
Mauricio
A wget 404 error also commonly happens when you try to download the pages of a WordPress website by typing
wget -r http://somewebsite.com
If the website is built with WordPress, you'll get an error like this:
ERROR 404: Not Found.
You can't mirror a WordPress website this way, because the content is stored in a database and wget cannot fetch the .php files that generate it. That's why you get the wget 404 error.
I know it’s not this question’s case, because Sam only wants to download a single picture, but it can be helpful for others.
answered Jan 6, 2019 at 12:55
I don't know the exact reason, but I have faced this kind of problem.
If you have the domain's IP address (e.g. 208.113.139.4), try using the IP address instead of the domain name (in this case www.icerts.com):
wget 192.243.111.11/images/logo.jpg
You can find the IP for a URL at https://ipinfo.info/html/ip_checker.php
answered Aug 27, 2020 at 11:23
Mafei
I want to add something to @blotus's answer: in case adding the Referer header does not solve the issue, you may be using the wrong referrer (sometimes the referrer differs from the URL's domain name).
Paste the URL into a web browser and find the referrer in the developer tools (Network -> Request Headers).
answered Jun 25, 2021 at 11:26
I met exactly the same problem while setting up GitHub Actions with Cygwin. Only after I used wget --debug <url> did I realize that a 0x0d byte, i.e. \r (carriage return), was appended to the URL.
For this kind of problem there is a solution described in the docs:
you can also use igncr in the SHELLOPTS environment variable
So I added the following lines to my YAML script to make wget work properly, as well as other shell commands in my GHA workflow:
env:
SHELLOPTS: igncr
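If setting SHELLOPTS isn't an option, you can also strip the stray carriage return in the shell itself before passing the value to wget; a minimal sketch (the URL is a placeholder):

```shell
# Simulate a URL that picked up a trailing carriage return (0x0d)
# from a CRLF-ended file or here-doc:
url=$'http://example.com/logo.jpg\r'
# Strip a trailing \r, if present, before handing the URL to wget:
url="${url%$'\r'}"
printf '%s\n' "$url"
```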
answered Nov 24, 2022 at 12:51
Rom098
When trying to use the command line on Windows to wget a file, I'm getting a 404 error:
C:\Users\xxxx>wget + http://www.restaurantanzu.com/PDFs/Dinner908.pdf;
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files\GnuWin32/etc/wgetrc
--2011-06-07 15:59:19-- http://+/
Resolving +... 67.199.65.121
Connecting to +|67.199.65.121|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2011-06-07 15:59:21 ERROR 404: Not Found.
blahdiblah
asked Jun 7, 2011 at 23:06
Why is the + right after the wget command? It looks like wget is trying to resolve/pull that as a URL, based on the end of the output: "http://+/".
Try it without the +. In fact, try it with just:
wget http://www.restaurantanzu.com/PDFs/Dinner908.pdf
This works for me on a Linux box.
answered Jun 8, 2011 at 0:05
baraboom
Drop the + (as mentioned by @baraboom) and also drop the semi-colon at the end of the line.
answered Jun 8, 2011 at 3:08
jdigital
If you use brace expansion with wget, you can fetch sequentially-numbered images with ease:
$ wget 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'
It fetches the first ten files, 90.jpg to 99.jpg, just fine, but 100.jpg and onward return a 404: File not found error (I only have 100 images stored on the server). These non-existent files become more of "a problem" with a larger range such as {00..200}: with 100 non-existent files, the script's execution time increases, and the extra requests might even become a slight burden (or at least an annoyance) on the server.
Is there any way for wget to stop after it receives its first 404 error? (Or, even better, two in a row, in case a file in the range is missing for some other reason.) The answer does not need to use brace expansion; loops are fine too.
asked Jul 22, 2014 at 6:06
If you’re happy with a loop:
for url in 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'
do
wget "$url" || break
done
That will run wget for each URL in your expansion until it fails, and then break out of the loop.
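The || break pattern relies on wget exiting non-zero on failure; the same control flow can be seen with a stand-in fetch function (hypothetical, failing from item 3 on):

```shell
# fetch stands in for wget; here it "succeeds" for items 1-2 and fails after.
fetch() { [ "$1" -lt 3 ]; }

for n in 1 2 3 4; do
    fetch "$n" || break   # stop at the first failure, like `wget "$url" || break`
    echo "fetched $n"
done
```

This prints fetched 1 and fetched 2, then stops at the first failing call.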
If you want two failures in a row it gets a bit more complicated:
for url in 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'
do
if wget "$url"
then
failed=
elif [ "$failed" ]
then
break
else
failed=yes
fi
done
You can shrink that a little with && and || instead of if, but it gets pretty ugly.
I don't believe wget has anything built in to do that.
answered Jul 22, 2014 at 6:13
Michael Homer
You could use the $? variable to get wget's return code. If it's non-zero, an error occurred; tally the errors up, and once they reach a threshold, break out of the loop.
Something like this, off the top of my head:
#!/bin/bash
threshold=0
for x in {90..110}; do
wget 'http://www.iqandreas.com/sample-images/100-100-color/'$x'.jpg'
wgetreturn=$?
if [[ $wgetreturn -ne 0 ]]; then
threshold=$(($threshold+$wgetreturn))
if [[ $threshold -eq 16 ]]; then
break
fi
fi
done
The for loop can be cleaned up a bit, but you get the general idea.
Changing the $threshold -eq 16 to -eq 24 would mean it fails 3 times before stopping; however, that counts any two failures within the loop, not two in a row.
The reason 16 and 24 are used is that they are totals of the return codes: wget returns 8 when it receives a response code that corresponds to an error from the server, so 16 is the total after 2 such errors.
Stopping only when failures occur twice in a row can be done by resetting the threshold whenever wget succeeds, i.e. when the return code is 0.
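A sketch of that reset logic, with a fetch function standing in for wget (it is made to fail on items 3 and 4 only, just to exercise the counter):

```shell
# fetch stands in for wget; it fails on items 3 and 4 to trigger the stop.
fetch() { [ "$1" -ne 3 ] && [ "$1" -ne 4 ]; }

fails=0
for x in 1 2 3 4 5; do
    if fetch "$x"; then
        fails=0              # success resets the consecutive-failure counter
    else
        fails=$((fails + 1))
    fi
    if [ "$fails" -ge 2 ]; then
        echo "two failures in a row at item $x, stopping"
        break
    fi
done
```

With the failures on items 3 and 4, the loop stops at item 4 and never requests item 5.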
A list of wget return codes can be found here: http://www.gnu.org/software/wget/manual/html_node/Exit-Status.html
answered Jul 22, 2014 at 6:13
Lawrence
IMO, focusing on wget's exit code/status may be too naive for some use cases, so here is an approach that also considers the HTTP status code, for more granular decision-making.
wget provides a -S/--server-response flag to print the HTTP response headers on the command's STDERR, which we can extract and act upon.
#!/bin/bash
set -eu
error_max=2
error_count=0
urls=( 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg' )
for url in "${urls[@]}"; do
set +e
http_status=$( wget --server-response -c "$url" 2>&1 )
exit_status=$?
http_status=$( awk '/HTTP\//{ print $2 }' <<<"$http_status" | tail -n 1 )
if (( http_status >= 400 )); then
# Considering only HTTP Status errors
case "$http_status" in
# Define your actions for each 4XX Status Code below
410) : Gone
;;
416) : Requested Range Not Satisfiable
error_count=0 # Reset error_count in case of `wget -c`
;;
403) : Forbidden
;&
404) : Not Found
;&
*) (( error_count++ ))
;;
esac
elif (( http_status >= 300 )); then
# We're unlikely to reach here in case of 1XX, 3XX in $http_status
# but ..
exit_status=0
elif (( http_status >= 200 )); then
# 2XX in $http_status considered successful
exit_status=0
elif (( exit_status > 0 )); then
# Where wget's exit status is one of
# 1 Generic error code.
# 2 Parse error
# - when parsing command-line options, the .wgetrc or .netrc...
# 3 File I/O error.
# 4 Network failure.
# 5 SSL verification failure.
# 6 Username/password authentication failure.
# 7 Protocol errors.
(( error_count++ ))
fi
echo "$url -> http_status: $http_status, exit_status=$exit_status, error_count=$error_count" >&2
if (( error_count >= error_max )); then
echo "error_count $error_count >= $error_max, bailing out .." >&2
exit "$exit_status"
fi
done
answered Jul 2, 2017 at 14:55
With GNU Parallel this ought to work:
parallel --halt 1 wget ::: 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'
From version 20140722 you can almost have your "two in a row" failure: --halt 2% will allow 2% of the jobs to fail:
parallel --halt 2% wget ::: 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'
answered Jul 22, 2014 at 23:12
Ole Tange
What I’ve used successfully is
wget 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg' 2>&1 | grep -q 'ERROR 404: Not Found'
grep -q looks for the 404 error message pattern in its input and exits as soon as it sees it. wget receives a SIGPIPE signal as soon as it tries to write to the pipe that grep is no longer reading from. In practice, wget dies pretty quickly after getting that first 404.
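The same mechanism can be seen without hitting the network, with printf standing in for wget's log output (a sketch, not real wget output handling):

```shell
# Simulated wget log fed through the same pipeline; grep -q exits 0 the
# moment the pattern appears, so the if-branch runs.
if printf '%s\n' 'HTTP request sent, awaiting response... 404 Not Found' \
                 'ERROR 404: Not Found.' | grep -q 'ERROR 404: Not Found'; then
    echo "404 seen, stop fetching"
fi
```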
answered Aug 14, 2021 at 0:58
Kyle Jones
In Python you can do:
from subprocess import check_output, CalledProcessError
import sys

def main():
    for i in range(90, 110):
        try:
            url = "url/" + str(i)
            check_output(["wget", url])
        except CalledProcessError:
            print("wget returned non-zero output, quitting")
            sys.exit(0)

if __name__ == "__main__":
    main()
Check out the documentation for subprocess if you want to do more: https://docs.python.org/2/library/subprocess.html
answered Jun 20, 2017 at 8:28
Good afternoon. I wanted to automate a Zabbix installation in a bash script, but ran into a problem: at the download step, wget (run from bash) does not download the file. More precisely, it gives the error: HTTP request sent, awaiting response... 404 Not Found ERROR 404: Not Found. I don't know how to fix this. If I simply run the command outside of bash, the file downloads fine. I also played with the wget command's options, to no avail.
Here is the script:
#!/usr/bin/bash
#Download
wget "https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb"
dpkg -i zabbix-release_5.0-1+focal_all.deb
apt update
apt upgrade
#Install Zabbix server, frontend, agent, database, httpd
apt install zabbix-server-mysql
apt install zabbix-frontend-php
apt install zabbix-apache-conf
apt install zabbix-agent
apt install mysql-server
#Create DB (example1)
mysql -uroot -p <<EOF
create database zabbix character set utf8 collate utf8_bin;
create user 'zabbix'@'localhost' identified by 'password';
grant all privileges on zabbix.* to 'zabbix'@'localhost';
EOF
#Import initial schema and data
zcat /usr/share/doc/zabbix-server-mysql*/create.sql.gz | mysql -uzabbix -p zabbix
#Configure the database for Zabbix server
echo DBPassword=password >> /etc/zabbix/zabbix_server.conf
#Configure frontend
sed -i 's:# php_value date.timezone.*:php_value date.timezone Europe/Riga:g' /etc/zabbix/apache.conf;
#Start zabbix server processes start at system boot
systemctl restart zabbix-server zabbix-agent apache2
systemctl enable zabbix-server zabbix-agent apache2
I am trying to copy a file from a datastore into my one of the directories in my Ubuntu VM.
The path to the file in the datastore (datastore1) is,
software/Ubuntu/Ubuntu/EnterpriseRepository_11.1.1.3.0 and the filename is OER111130_generic.zip
The command I give in the Ubuntu terminal shell is,
wget --http-user=root --http-password=vmpwd 'http://172.18.12.20/software/Ubuntu/Ubuntu/EnterpriseRepository_11.1.1.3.0/OER111130_generic.zip dsName=datastore1' --no-check-certificate
It is returning with this error
HTTP request sent, awaiting response... 404 Not Found