I am using psql with a PostgreSQL database and the following copy command:

COPY isa (np1, np2, sentence) FROM 'c:\Downloads\isa.txt' WITH DELIMITER '|'
I get:
ERROR: extra data after last expected column
How can I skip the lines with errors?
asked Apr 14, 2016 at 21:30
Superdooperhero
You cannot skip the errors without skipping the whole command up to and including Postgres 14. There is currently no more sophisticated error handling.
\copy is just a wrapper around SQL COPY that channels results through psql. The manual for COPY:
COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will already have received earlier rows in a COPY FROM. These rows will not be visible or accessible, but they still occupy disk space. This might amount to a considerable amount of wasted disk space if the failure happened well into a large copy operation. You might wish to invoke VACUUM to recover the wasted space.
Bold emphasis mine. And:
COPY FROM will raise an error if any line of the input file contains more or fewer columns than are expected.
COPY is an extremely fast way to import / export data. Sophisticated checks and error handling would slow it down.
There was an attempt to add error logging to COPY in Postgres 9.0, spearheaded by Aster Data, but it was never committed.
Solution
Fix your input file instead.
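For the specific error above ("extra data after last expected column"), one way to fix the input file is to pre-filter lines that do not have the expected number of fields before running COPY. A minimal sketch, assuming the '|' delimiter and three columns from the question (the function name and naive field counting are my own; it assumes the delimiter never appears inside a field):

```python
def split_rows(lines, expected_cols, delimiter="|"):
    """Split input lines into (good, bad) by delimited field count."""
    good, bad = [], []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # Lines with exactly the expected field count are safe to COPY
        (good if len(fields) == expected_cols else bad).append(line)
    return good, bad

# Write the clean rows to a new file and COPY from that instead;
# inspect the rejected rows separately.
rows = ["a|b|c\n", "a|b|c|d\n", "a|b\n"]
good, bad = split_rows(rows, 3)
```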
If you have one or more additional columns in your input file and the file is otherwise consistent, you might add dummy columns to your table isa and drop those afterwards. Or (cleaner with production tables) import to a temporary staging table and INSERT selected columns (or expressions) to your target table isa from there.
Related answers with detailed instructions:
- How to update selected rows with values from a CSV file in Postgres?
- COPY command: copy only specific columns from csv
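The staging-table route can be sketched like this, assuming the file consistently carries one extra '|'-delimited field (the staging table and "extra" column names are illustrative):

```sql
-- Permissive staging table with a dummy column for the extra field
CREATE TEMP TABLE isa_staging (np1 text, np2 text, sentence text, extra text);

-- Load the raw file into staging (run inside psql)
\copy isa_staging FROM 'c:\Downloads\isa.txt' WITH DELIMITER '|'

-- Move only the wanted columns into the target table
INSERT INTO isa (np1, np2, sentence)
SELECT np1, np2, sentence
FROM isa_staging;
```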
answered Apr 14, 2016 at 22:46
Erwin Brandstetter
It is too bad that in 25 years Postgres doesn't have an --ignore-errors flag or option for the COPY command. In this era of Big Data you get a lot of dirty records, and it can be very costly for the project to fix every outlier.
I had to make a workaround this way:

- Copy the original table and call it dummy_original_table
- On the original table, create a trigger like this:
CREATE OR REPLACE FUNCTION on_insert_in_original_table() RETURNS trigger AS $$
DECLARE
    v_rec RECORD;
BEGIN
    -- we use the trigger to prevent 'duplicate index' errors by returning NULL on duplicates
    SELECT * FROM original_table WHERE primary_key = NEW.primary_key INTO v_rec;
    IF v_rec IS NOT NULL THEN
        RETURN NULL;
    END IF;
    BEGIN
        INSERT INTO original_table(datum, primary_key) VALUES (NEW.datum, NEW.primary_key)
        ON CONFLICT DO NOTHING;
    EXCEPTION
        WHEN OTHERS THEN
            NULL;  -- swallow any other error for this row
    END;
    RETURN NULL;  -- returning NULL cancels the insert into the dummy table
END;
$$ LANGUAGE plpgsql;

-- attach the function to the dummy table (the trigger name is illustrative)
CREATE TRIGGER redirect_insert
BEFORE INSERT ON dummy_original_table
FOR EACH ROW EXECUTE FUNCTION on_insert_in_original_table();
- Run a copy into the dummy table. No record will be inserted there, but all of them will be inserted into original_table:

psql dbname -c "\copy dummy_original_table(datum,primary_key) FROM '/home/user/data.csv' delimiter E'\t'"
answered Dec 30, 2020 at 22:59 by Alon Barad
Workaround: remove the reported errant line using sed and run copy again

Later versions of Postgres (including Postgres 13) report the line number of the error. You can then remove that line with sed and run copy again, e.g.:
#!/bin/bash
bad_line_number=5 # assuming line 5 is the bad line
sed ${bad_line_number}d < input.csv > filtered.csv
[per the comment from @Botond_Balázs]
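The sed step can also be done in pure Python if sed is not available (a small sketch; the function name is my own, and reading/writing the files is left to the caller):

```python
def drop_line(lines, bad_line_number):
    """Return the input lines with the given 1-based line number removed."""
    return [line for i, line in enumerate(lines, start=1) if i != bad_line_number]
```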
answered Jan 12, 2021 at 0:24
Rob Bednark
Here's one solution: import the batch file one line at a time. The performance can be much slower, but it may be sufficient for your scenario:
#!/bin/bash
input_file=./my_input.csv
tmp_file=/tmp/one-line.csv
cat $input_file | while read input_line; do
    echo "$input_line" > $tmp_file
    psql my_database \
        -c "
        COPY my_table
        FROM '$tmp_file'
        DELIMITER '|'
        CSV;
        "
done
Additionally, you could modify the script to capture the psql stdout/stderr and exit status; if the exit status is non-zero, echo $input_line and the captured stdout/stderr, and/or append them to a file.
answered Jan 11, 2021 at 23:20
Rob Bednark
Assume this simple SQL query:
INSERT INTO table (col1,col2) VALUES (val1,val2),(val3,val4),(val5,val6);
Let's say val3 is an invalid value for col1. This would cause psql to abort the whole INSERT command; it would not insert (val1,val2) or (val5,val6) either.

Is it possible to make PostgreSQL ignore this error, so that it skips the (val3,val4) pair but still proceeds with (val1,val2) and (val5,val6)?
I'm getting daily database dumps in text files from my partner (can't change that), which I use to maintain my copy of the database. Sometimes his huge INSERT queries cause errors like:
ERROR: invalid byte sequence for encoding "UTF8": 0x00
… which makes the whole batch of 30,000+ values fail to insert, because one of those values is wrong.
I have a table that stores all my project's RSS channel URLs. I found that some URLs end with '/' while others do not, and in my app I have to handle both cases everywhere. So I want to store every sub URL without the trailing '/': if a URL ends with '/', I want to delete that trailing '/'. I wrote the update SQL command like this:
UPDATE rss_sub_source
SET sub_url = SUBSTRING(sub_url, 1, CHAR_LENGTH(sub_url) - 1)
WHERE sub_url LIKE '%/';
When I execute the SQL, I get:
SQL Error [23505]: ERROR: duplicate key value violates unique constraint "unique_sub_url"
Detail: Key (sub_url)=(https://physicsworld.com/feed) already exists.
The error shows that some URLs without the trailing '/' already exist; when I update a URL ending with '/', it conflicts with the existing row because of the unique constraint I added. The table contains thousands of URLs, so updating them one by one is obviously impossible. I want to skip the URLs that would violate the unique constraint, update only the ones that succeed, and finally delete the records that still end with '/'.

Is it possible to ignore the update error events in PostgreSQL? If not, what should I do to make sure no RSS URL ends with '/'?
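One way to express that in plain SQL, assuming the table and column names from the question: trim the trailing '/' only where the trimmed form is not already taken, then delete the leftover rows, which are duplicates of existing URLs (a sketch; test on a copy of the table first):

```sql
-- Trim the trailing '/' where the trimmed URL does not already exist
UPDATE rss_sub_source s
SET sub_url = left(s.sub_url, -1)
WHERE s.sub_url LIKE '%/'
  AND NOT EXISTS (
      SELECT 1 FROM rss_sub_source t
      WHERE t.sub_url = left(s.sub_url, -1)
  );

-- Remaining '/'-terminated rows conflict with existing URLs; drop them
DELETE FROM rss_sub_source WHERE sub_url LIKE '%/';
```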
Another workaround: parse the dump file directly and insert each row through an ORM model one at a time, printing any row that fails instead of aborting:

import re

from .models import Url


def get_columns(line):
    # Extract the column list from the "COPY table (col1, col2, ...)" header line
    column_str = re.search(r'COPY [-_\w.]+ \(([^)]+)\)', line).group(1)
    columns = column_str.split(',')
    return [column.strip() for column in columns]


def is_start(text):
    return text.startswith('COPY ')


def is_end(text):
    # The COPY data section of a dump ends with a "\." line
    return text.startswith('\\.')


def input_data_in_db(path, db):
    start = None
    columns = None
    with open(path) as f:
        for line in f:
            if not start:
                start = is_start(line)
                if start:
                    columns = get_columns(line)
                continue
            if is_end(line):
                break
            dic = {}
            for i, column in enumerate(columns):
                dic[column] = line.split('\t')[i].strip()
                # "\N" is the COPY text-format null marker
                if dic[column] == '\\N' or dic[column] == '':
                    dic[column] = None
            print(dic)
            try:
                obj = db(**dic)
                obj.save()
            except Exception as e:
                print(e)


path = 'data.dump'
db = Url
input_data_in_db(path=path, db=db)