psql: ignore errors

I am using psql with a PostgreSQL database and the following copy command:

COPY isa (np1, np2, sentence) FROM 'c:\Downloads\isa.txt' WITH DELIMITER '|'

I get:

ERROR:  extra data after last expected column

How can I skip the lines with errors?


asked Apr 14, 2016 at 21:30

Superdooperhero

You cannot skip the errors without skipping the whole command up to and including Postgres 14. There is currently no more sophisticated error handling.

\copy is just a wrapper around SQL COPY that channels results through psql. The manual for COPY:

COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will
already have received earlier rows in a COPY FROM. These rows will
not be visible or accessible, but they still occupy disk space. This
might amount to a considerable amount of wasted disk space if the
failure happened well into a large copy operation. You might wish to
invoke VACUUM to recover the wasted space.

Bold emphasis mine. And:

COPY FROM will raise an error if any line of the input file contains
more or fewer columns than are expected.

COPY is an extremely fast way to import / export data. Sophisticated checks and error handling would slow it down.

There was an attempt to add error logging to COPY in Postgres 9.0, spearheaded by Aster Data, but it was never committed. The company was later acquired by Teradata, so I doubt they are still pursuing the project.

Solution

Fix your input file instead.

If you have one or more additional columns in your input file and the file is otherwise consistent, you might add dummy columns to your table isa and drop those afterwards. Or (cleaner with production tables) import to a temporary staging table and INSERT selected columns (or expressions) to your target table isa from there.
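If only a handful of rows carry extra columns, another way to "fix the input file" is to pre-filter it so that only rows with the expected column count reach COPY. A minimal Python sketch, assuming the three-column, |-delimited layout from the question (the file names and helper names are illustrative, and the naive split breaks if a field itself contains the delimiter):

```python
# Pre-filter for a |-delimited file: keep only rows with the expected
# column count, so COPY no longer hits "extra data after last expected
# column".
EXPECTED_COLUMNS = 3  # isa has (np1, np2, sentence)

def filter_rows(lines, expected=EXPECTED_COLUMNS, delimiter="|"):
    """Yield only lines whose column count matches `expected`."""
    for line in lines:
        # naive split; assumes the delimiter never occurs inside a field
        if len(line.rstrip("\n").split(delimiter)) == expected:
            yield line

def filter_file(src, dst, expected=EXPECTED_COLUMNS):
    """Write the filtered copy of `src` to `dst`."""
    with open(src, encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        fout.writelines(filter_rows(fin, expected))
```

After `filter_file('isa.txt', 'isa_clean.txt')`, point COPY at the cleaned file.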

Related answers with detailed instructions:

  • How to update selected rows with values from a CSV file in Postgres?
  • COPY command: copy only specific columns from csv

answered Apr 14, 2016 at 22:46

Erwin Brandstetter


It is too bad that after 25 years Postgres doesn’t have an ignore-errors flag or option for the COPY command. In this era of Big Data you get a lot of dirty records, and it can be very costly for a project to fix every outlier.

I had to make a work-around this way:

  1. Create a copy of the original table and call it dummy_original_table.
  2. On the dummy table, create a BEFORE INSERT trigger like this (the function redirects every row into the original table and returns NULL, so nothing is ever stored in the dummy):
    CREATE OR REPLACE FUNCTION on_insert_in_original_table() RETURNS trigger AS $$
    DECLARE
        v_rec   RECORD;
    BEGIN
        -- we use the trigger to prevent 'duplicate index' errors by returning NULL on duplicates
        SELECT * FROM original_table WHERE primary_key = NEW.primary_key INTO v_rec;
        IF v_rec IS NOT NULL THEN
            RETURN NULL;
        END IF;
        BEGIN
            INSERT INTO original_table(datum, primary_key) VALUES (NEW.datum, NEW.primary_key)
                ON CONFLICT DO NOTHING;
        EXCEPTION
            WHEN OTHERS THEN
                NULL;
        END;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    -- attach the function to the dummy table
    CREATE TRIGGER redirect_to_original
        BEFORE INSERT ON dummy_original_table
        FOR EACH ROW EXECUTE FUNCTION on_insert_in_original_table();
  3. Run a copy into the dummy table. No record will be inserted there, but all of them will end up in original_table:

psql dbname -c "\copy dummy_original_table(datum, primary_key) FROM '/home/user/data.csv' delimiter E'\t'"

answered Dec 30, 2020 at 22:59

Nulik

Workaround: remove the reported errant line using sed and run copy again

Later versions of Postgres (including Postgres 13) report the line number of the error. You can then remove that line with sed and run copy again, e.g.,

#!/bin/bash
bad_line_number=5  # assuming line 5 is the bad line
sed ${bad_line_number}d < input.csv > filtered.csv

[per the comment from @Botond_Balázs ]
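The same idea can be automated by parsing the line number out of the error message instead of hard-coding it. A hedged Python sketch (the CONTEXT message format shown is an assumption based on what recent Postgres versions print; the helper names are illustrative):

```python
import re

# COPY reports the failing input row in its error context, e.g.
#   CONTEXT:  COPY isa, line 5: "bad|row|with|extra"
# (format assumed from recent Postgres versions).
LINE_RE = re.compile(r"COPY \S+, line (\d+)")

def bad_line_number(error_message):
    """Extract the 1-based line number from a COPY error, or None."""
    m = LINE_RE.search(error_message)
    return int(m.group(1)) if m else None

def drop_line(lines, number):
    """Return a copy of `lines` with the given 1-based line removed."""
    return [l for i, l in enumerate(lines, start=1) if i != number]
```

Re-running COPY after each `drop_line` call repeats the cycle until the file imports cleanly.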

answered Jan 12, 2021 at 0:24

Rob Bednark


Here’s one solution — import the batch file one line at a time. The performance can be much slower, but it may be sufficient for your scenario:

#!/bin/bash

input_file=./my_input.csv
tmp_file=/tmp/one-line.csv
cat "$input_file" | while read -r input_line; do
    echo "$input_line" > "$tmp_file"
    psql my_database \
     -c "
     COPY my_table
     FROM '$tmp_file'
     DELIMITER '|'
     CSV;
    "
done

Additionally, you could modify the script to capture the psql stdout/stderr and exit status and, if the exit status is non-zero, echo $input_line and the captured output to stdout and/or append them to a file.
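That capture step can be sketched in Python instead of shell. This is a sketch under assumptions: the psql command line (database name, table, delimiter) is illustrative, and the command is a parameter so the logic can be exercised without a database; COPY ... FROM STDIN reads each row from the subprocess's stdin:

```python
import subprocess

# Per-line import with failure capture: each line is fed on stdin to a
# command (psql by default). Failed lines and their stderr are
# collected instead of aborting the whole import.
PSQL_CMD = ["psql", "my_database", "-c",
            "COPY my_table FROM STDIN DELIMITER '|' CSV;"]

def import_lines(lines, cmd=PSQL_CMD):
    """Run `cmd` once per line; return (ok_count, failed) where
    `failed` is a list of (line, stderr) pairs."""
    ok, failed = 0, []
    for line in lines:
        result = subprocess.run(cmd, input=line, text=True,
                                capture_output=True)
        if result.returncode == 0:
            ok += 1
        else:
            failed.append((line, result.stderr))
    return ok, failed
```

The `failed` list gives you exactly the rows to inspect or retry later.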

answered Jan 11, 2021 at 23:20

Rob Bednark

Assume this simple SQL query:

INSERT INTO table (col1,col2) VALUES (val1,val2),(val3,val4),(val5,val6);

Let’s say val3 is an invalid value for col1. This causes psql to abort the whole INSERT command: it inserts neither (val1,val2) nor (val5,val6).

Is it possible to make PostgreSQL ignore this error, so that it skips the (val3,val4) pair but still inserts (val1,val2) and (val5,val6)?

I’m getting daily database dumps as text files from my partner (I can’t change that), which I use to maintain my copy of the database. Sometimes his huge INSERT queries cause errors like:

ERROR:  invalid byte sequence for encoding "UTF8": 0x00

… which leaves all 30,000+ values uninserted, because one of them is wrong.

I have a table that stores all my project’s RSS channel URLs. I noticed that some of the stored URLs end with ‘/’ while others don’t, and my app has to handle both cases everywhere. I therefore want to normalize the data by removing the trailing ‘/’ wherever it appears. I wrote this UPDATE:

UPDATE rss_sub_source 
SET sub_url = SUBSTRING(sub_url, 1, CHAR_LENGTH(sub_url) - 1) 
WHERE sub_url LIKE '%/';

When I execute it, I get:

SQL Error [23505]: ERROR: duplicate key value violates unique constraint "unique_sub_url"
  Detail: Key (sub_url)=(https://physicsworld.com/feed) already exists.

The error shows that some URLs already exist without the trailing ‘/’, so updating the ‘/’-suffixed variants collides with the existing rows because of the unique constraint I added on sub_url. The table contains thousands of URLs, so fixing them one by one is obviously impossible. I would like to skip the rows that would violate the unique constraint, update only those that succeed, and finally delete the remaining rows that still end with ‘/’.

Is it possible to ignore such update errors in PostgreSQL? If not, what should I do to ensure that no RSS URL ends with ‘/’?



# Import the COPY section of a plain-text pg_dump file row by row via
# the Django ORM, printing (and skipping) any row that raises an error.
import re

from .models import Url


def get_columns(line):
    column_str = re.search(r'COPY [-_\w.]+ \(([^)]+)\)', line).group(1)
    columns = column_str.split(',')
    return [column.strip() for column in columns]


def is_start(text):
    return text.startswith('COPY ')


def is_end(text):
    # the COPY data section of a dump ends with a "\." line
    return text.startswith('\\.')


def input_data_in_db(path, db):
    start = None
    columns = None
    with open(path) as f:
        for line in f:
            if not start:
                start = is_start(line)
                if start:
                    columns = get_columns(line)
                continue
            if is_end(line):
                break
            dic = {}
            for i, column in enumerate(columns):
                dic[column] = line.split('\t')[i].strip()
                if dic[column] == '\\N' or dic[column] == '':
                    dic[column] = None  # \N is the dump's NULL marker
            print(dic)
            try:
                obj = db(**dic)
                obj.save()
            except Exception as e:
                print(e)


path = 'data.dump'
db = Url
input_data_in_db(path=path, db=db)
