I am using psql with a PostgreSQL database and the following copy command:

COPY isa (np1, np2, sentence) FROM 'c:\Downloads\isa.txt' WITH DELIMITER '|'
I get:
ERROR: extra data after last expected column
How can I skip the lines with errors?
asked Apr 14, 2016 at 21:30
Superdooperhero
You cannot skip the errors without skipping the whole command up to and including Postgres 14. There is currently no more sophisticated error handling.
\copy is just a wrapper around SQL COPY that channels results through psql. The manual for COPY:
COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will already have received earlier rows in a COPY FROM. These rows will not be visible or accessible, but they still occupy disk space. This might amount to a considerable amount of wasted disk space if the failure happened well into a large copy operation. You might wish to invoke VACUUM to recover the wasted space.
Bold emphasis mine. And:
COPY FROM will raise an error if any line of the input file contains more or fewer columns than are expected.
COPY is an extremely fast way to import / export data. Sophisticated checks and error handling would slow it down.
There was an attempt to add error logging to COPY in Postgres 9.0, spearheaded by Aster Data, but it was never committed.
Solution
Fix your input file instead.
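For the specific error above ("extra data after last expected column"), one way to fix the input file is to pre-filter lines that do not have the expected number of fields before running COPY. A minimal sketch, assuming the '|' delimiter and three columns from the question (the function name and naive field counting are my own; it assumes the delimiter never appears inside a field):

```python
def split_rows(lines, expected_cols, delimiter="|"):
    """Split input lines into (good, bad) by delimited field count."""
    good, bad = [], []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # Lines with exactly the expected field count are safe to COPY
        (good if len(fields) == expected_cols else bad).append(line)
    return good, bad

# Write the clean rows to a new file and COPY from that instead;
# inspect the rejected rows separately.
rows = ["a|b|c\n", "a|b|c|d\n", "a|b\n"]
good, bad = split_rows(rows, 3)
```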
If you have one or more additional columns in your input file and the file is otherwise consistent, you might add dummy columns to your table isa and drop those afterwards. Or (cleaner with production tables) import to a temporary staging table and INSERT selected columns (or expressions) to your target table isa from there.
Related answers with detailed instructions:
- How to update selected rows with values from a CSV file in Postgres?
- COPY command: copy only specific columns from csv
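The staging-table route can be sketched like this, assuming the file consistently carries one extra '|'-delimited field (the staging table and "extra" column names are illustrative):

```sql
-- Permissive staging table with a dummy column for the extra field
CREATE TEMP TABLE isa_staging (np1 text, np2 text, sentence text, extra text);

-- Load the raw file into staging (run inside psql)
\copy isa_staging FROM 'c:\Downloads\isa.txt' WITH DELIMITER '|'

-- Move only the wanted columns into the target table
INSERT INTO isa (np1, np2, sentence)
SELECT np1, np2, sentence
FROM isa_staging;
```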
answered Apr 14, 2016 at 22:46
Erwin Brandstetter
It is too bad that in 25 years Postgres doesn't have an --ignore-errors flag or option for the COPY command. In this era of Big Data you get a lot of dirty records, and it can be very costly for the project to fix every outlier.
I had to make a workaround this way:

- Copy the original table and call it dummy_original_table
- On the original table, create a trigger like this:
CREATE OR REPLACE FUNCTION on_insert_in_original_table() RETURNS trigger AS $$
DECLARE
    v_rec RECORD;
BEGIN
    -- we use the trigger to prevent 'duplicate index' errors by returning NULL on duplicates
    SELECT * FROM original_table WHERE primary_key = NEW.primary_key INTO v_rec;
    IF v_rec IS NOT NULL THEN
        RETURN NULL;
    END IF;
    BEGIN
        INSERT INTO original_table(datum, primary_key) VALUES (NEW.datum, NEW.primary_key)
        ON CONFLICT DO NOTHING;
    EXCEPTION
        WHEN OTHERS THEN
            NULL;  -- swallow any other error for this row
    END;
    RETURN NULL;  -- returning NULL cancels the insert into the dummy table
END;
$$ LANGUAGE plpgsql;

-- attach the function to the dummy table (the trigger name is illustrative)
CREATE TRIGGER redirect_insert
BEFORE INSERT ON dummy_original_table
FOR EACH ROW EXECUTE FUNCTION on_insert_in_original_table();
- Run a copy into the dummy table. No record will be inserted there, but all of them will be inserted into original_table:

psql dbname -c "\copy dummy_original_table(datum,primary_key) FROM '/home/user/data.csv' delimiter E'\t'"
answered Dec 30, 2020 at 22:59 by Alon Barad
Workaround: remove the reported errant line using sed and run copy again

Later versions of Postgres (including Postgres 13) report the line number of the error. You can then remove that line with sed and run copy again, e.g.:
#!/bin/bash
bad_line_number=5 # assuming line 5 is the bad line
sed ${bad_line_number}d < input.csv > filtered.csv
[per the comment from @Botond_Balázs]
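The sed step can also be done in pure Python if sed is not available (a small sketch; the function name is my own, and reading/writing the files is left to the caller):

```python
def drop_line(lines, bad_line_number):
    """Return the input lines with the given 1-based line number removed."""
    return [line for i, line in enumerate(lines, start=1) if i != bad_line_number]
```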
answered Jan 12, 2021 at 0:24
Rob Bednark
Here's one solution: import the batch file one line at a time. The performance can be much slower, but it may be sufficient for your scenario:
#!/bin/bash
input_file=./my_input.csv
tmp_file=/tmp/one-line.csv
cat $input_file | while read input_line; do
    echo "$input_line" > $tmp_file
    psql my_database \
        -c "
        COPY my_table
        FROM '$tmp_file'
        DELIMITER '|'
        CSV;
        "
done
Additionally, you could modify the script to capture the psql stdout/stderr and exit status; if the exit status is non-zero, echo $input_line and the captured stdout/stderr, and/or append them to a file.
answered Jan 11, 2021 at 23:20
Rob Bednark
Assume this simple SQL query:
INSERT INTO table (col1,col2) VALUES (val1,val2),(val3,val4),(val5,val6);
Let's say val3 is an invalid value for col1. This would cause psql to abort the whole INSERT command; it would not insert (val1,val2) or (val5,val6) either.

Is it possible to make PostgreSQL ignore this error, so that it skips the (val3,val4) pair but still proceeds with (val1,val2) and (val5,val6)?
I'm getting daily database dumps in text files from my partner (can't change that), which I use to maintain my copy of the database. Sometimes his huge INSERT queries cause errors like:
ERROR: invalid byte sequence for encoding "UTF8": 0x00
… which makes the whole batch of 30,000+ values fail to insert, because one of those values is wrong.
I have a table that stores all my project's RSS channel URLs. I found that some URLs end with '/' while others do not, and in my app I have to handle both cases everywhere. So I want to store every sub URL without the trailing '/': if a URL ends with '/', I want to delete that trailing '/'. I wrote the update SQL command like this:
UPDATE rss_sub_source
SET sub_url = SUBSTRING(sub_url, 1, CHAR_LENGTH(sub_url) - 1)
WHERE sub_url LIKE '%/';
When I execute the SQL, I get:
SQL Error [23505]: ERROR: duplicate key value violates unique constraint "unique_sub_url"
Detail: Key (sub_url)=(https://physicsworld.com/feed) already exists.
The error shows that some URLs without the trailing '/' already exist; when I update a URL ending with '/', it conflicts with the existing row because of the unique constraint I added. The table contains thousands of URLs, so updating them one by one is obviously impossible. I want to skip the URLs that would violate the unique constraint, update only the ones that succeed, and finally delete the records that still end with '/'.

Is it possible to ignore the update error events in PostgreSQL? If not, what should I do to make sure no RSS URL ends with '/'?
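One way to express that in plain SQL, assuming the table and column names from the question: trim the trailing '/' only where the trimmed form is not already taken, then delete the leftover rows, which are duplicates of existing URLs (a sketch; test on a copy of the table first):

```sql
-- Trim the trailing '/' where the trimmed URL does not already exist
UPDATE rss_sub_source s
SET sub_url = left(s.sub_url, -1)
WHERE s.sub_url LIKE '%/'
  AND NOT EXISTS (
      SELECT 1 FROM rss_sub_source t
      WHERE t.sub_url = left(s.sub_url, -1)
  );

-- Remaining '/'-terminated rows conflict with existing URLs; drop them
DELETE FROM rss_sub_source WHERE sub_url LIKE '%/';
```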
Another workaround: parse the dump file directly and insert each row through an ORM model one at a time, printing any row that fails instead of aborting:

import re

from .models import Url


def get_columns(line):
    # Extract the column list from the "COPY table (col1, col2, ...)" header line
    column_str = re.search(r'COPY [-_\w.]+ \(([^)]+)\)', line).group(1)
    columns = column_str.split(',')
    return [column.strip() for column in columns]


def is_start(text):
    return text.startswith('COPY ')


def is_end(text):
    # The COPY data section of a dump ends with a "\." line
    return text.startswith('\\.')


def input_data_in_db(path, db):
    start = None
    columns = None
    with open(path) as f:
        for line in f:
            if not start:
                start = is_start(line)
                if start:
                    columns = get_columns(line)
                continue
            if is_end(line):
                break
            dic = {}
            for i, column in enumerate(columns):
                dic[column] = line.split('\t')[i].strip()
                # "\N" is the COPY text-format null marker
                if dic[column] == '\\N' or dic[column] == '':
                    dic[column] = None
            print(dic)
            try:
                obj = db(**dic)
                obj.save()
            except Exception as e:
                print(e)


path = 'data.dump'
db = Url
input_data_in_db(path=path, db=db)