Error "invalid byte sequence for encoding UTF8: 0x00"

I get the following error when inserting data from mysql into postgres.

Do I have to manually remove all null characters from my input data?
Is there a way to get postgres to do this for me?

ERROR: invalid byte sequence for encoding "UTF8": 0x00

asked Aug 28, 2009 at 15:13

ScArcher2

PostgreSQL doesn’t support storing NUL (0x00) characters in text fields (this is obviously different from the database NULL value, which is fully supported).

Source: http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE

If you need to store the NULL character, you must use a bytea field — which should store anything you want, but won’t support text operations on it.

Given that PostgreSQL doesn’t support it in text values, there’s no good way to get it to remove it. You could import your data into bytea and later convert it to text using a special function (in perl or something, maybe?), but it’s likely going to be easier to do that in preprocessing before you load it.
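A minimal preprocessing sketch in Python (the row shape here is just a placeholder for whatever your MySQL export yields):

```python
# Strip NUL (0x00) characters from each text field before sending rows
# to PostgreSQL; non-string values pass through untouched.
def sanitize(value):
    """Remove NUL characters, which PostgreSQL text fields reject."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    return value

row = ("abc\x00def", 42, None)   # made-up example row
clean = tuple(sanitize(v) for v in row)
print(clean)  # ('abcdef', 42, None)
```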

answered Aug 28, 2009 at 18:06

Magnus Hagander


If you are using Java, you can simply replace the \u0000 characters before the insert, like so:

myValue.replaceAll("\u0000", "")

The solution was provided and explained by Csaba in the following post:

https://www.postgresql.org/message-id/1171970019.3101.328.camel%40coppola.muc.ecircle.de

Namely:

in Java you can actually have a "0x0" character in your string, and
that's valid unicode. So that's translated to the character 0x0 in
UTF8, which in turn is not accepted because the server uses null-
terminated strings... so the only way is to make sure your strings
don't contain the character '\u0000'.

answered Aug 22, 2017 at 6:24

David Dal Busco


Just regex out null bytes:

s/\x00//g;

answered Jan 8, 2013 at 16:12

hicham

Only this regex worked for me:

sed 's/\0//g'

So as you get your data, do this: $ get_data | sed 's/\0//g', which will output your data without 0x00

answered Oct 5, 2018 at 15:21

techkuz

You can first insert the data into a blob field and then copy it to a text field with the following function:

CREATE OR REPLACE FUNCTION blob2text() RETURNS void AS $$
DECLARE
    ref record;
    i integer;
BEGIN
    FOR ref IN SELECT id, blob_field FROM my_table LOOP

        -- find 0x00 and replace it with a space (0x20)
        i := position(E'\\000'::bytea in ref.blob_field);
        WHILE i > 0 LOOP
            ref.blob_field := set_byte(ref.blob_field, i - 1, 32);
            i := position(E'\\000'::bytea in ref.blob_field);
        END LOOP;

        UPDATE my_table SET field = encode(ref.blob_field, 'escape') WHERE id = ref.id;
    END LOOP;
END; $$ LANGUAGE plpgsql;

SELECT blob2text();
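The byte-for-byte substitution this function performs can also be sketched outside the database; here is a minimal Python equivalent that replaces each NUL byte with a space:

```python
# Replace every 0x00 byte with a space (0x20) in a raw byte string,
# mirroring the set_byte loop in the PL/pgSQL function above.
def nul_to_space(data: bytes) -> bytes:
    return data.replace(b"\x00", b" ")

print(nul_to_space(b"foo\x00bar"))  # b'foo bar'
```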

Hambone

answered Oct 13, 2009 at 6:15

Raido

If you need to store null characters in text fields and don’t want to change your data type from text, you can follow my solution too:

Before insert:

myValue = myValue.replaceAll("\u0000", "SomeVerySpecialText")

After select:

myValue = myValue.replaceAll("SomeVerySpecialText", "\u0000")

I used "null" as my SomeVerySpecialText, since I am sure there will never be a literal "null" string in my values.
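Under the stated assumption that the sentinel never occurs in real data, the round trip can be sketched in Python (the sentinel string is arbitrary):

```python
SENTINEL = "SomeVerySpecialText"  # assumption: never appears in real data

def encode_for_db(value: str) -> str:
    """Replace NULs with the sentinel before INSERT."""
    return value.replace("\u0000", SENTINEL)

def decode_from_db(value: str) -> str:
    """Restore NULs after SELECT."""
    return value.replace(SENTINEL, "\u0000")

original = "a\u0000b"
stored = encode_for_db(original)
assert "\u0000" not in stored          # safe to store in a text column
assert decode_from_db(stored) == original  # lossless round trip
```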

answered Nov 26, 2018 at 10:04

Ismail Yavuz

This kind of error can also happen when using COPY with an escaped string containing NUL values (\x00), such as:

"H\x00\x00\x00tj\xA8\x9E#D\x98+\xCA\xF0\xA7\xBBl\xC5\x19\xD7\x8D\xB6\x18\xEDJ\x1En"

If you use COPY without specifying the format 'CSV', postgres by default will assume format 'text'. This has a different interaction with backslashes; see the documentation on the text format.

If you’re using COPY or a file_fdw, make sure to specify format 'CSV' to avoid this kind of error.
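The effect can be illustrated outside PostgreSQL: Python's unicode_escape codec applies backslash-escape rules similar to (though not identical to) COPY's text format, so a string containing no NUL bytes on disk gains them after escape processing:

```python
# The raw line holds the two-character sequences backslash-x-0-0,
# not real NUL bytes. Escape processing (as COPY's text format does)
# turns them into actual 0x00 bytes, which text columns reject.
raw = r"H\x00\x00\x00tj"
decoded = raw.encode("ascii").decode("unicode_escape")

assert "\x00" not in raw      # no literal NULs in the file
assert "\x00" in decoded      # NULs appear after escape processing
```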

answered Aug 6, 2019 at 2:36

Steve Chavez

I’ve spent the last 8 hours trying to import the output of 'mysqldump --compatible=postgresql' into PostgreSQL 8.4.9, and I’ve read at least 20 different threads here and elsewhere already about this specific problem, but found no real usable answer that works.

MySQL 5.1.52 data dumped:

mysqldump -u root -p --compatible=postgresql --no-create-info --no-create-db --default-character-set=utf8 --skip-lock-tables rt3 > foo

PostgreSQL 8.4.9 server as destination

Loading the data with ‘psql -U rt_user -f foo’ is reporting (many of these, here’s one example):

psql:foo:29: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

According to the following, there are no NULL (0x00) characters in the input file.

database-dumps:rcf-temp1# sed 's/\x0/ /g' < foo > nonulls
database-dumps:rcf-temp1# sum foo nonulls
04730 2545610 foo
04730 2545610 nonulls
database-dumps:rcf-temp1# rm nonulls

Likewise, another check with Perl shows no NULLs:

database-dumps:rcf-temp1# perl -ne '/\000/ and print;' foo
database-dumps:rcf-temp1#
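The same check can be done at the byte level in Python (a sketch; read the dump in binary mode so no decoding step can hide anything):

```python
def contains_nul(data: bytes) -> bool:
    """True if the raw bytes contain a literal NUL (0x00)."""
    return b"\x00" in data

# Typical use against a dump file:
# with open("foo", "rb") as f:
#     print(contains_nul(f.read()))
print(contains_nul(b"INSERT INTO t VALUES ('abc');"))  # False
```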

As the "HINT" in the error mentions, I have tried every possible way to set 'client_encoding' to 'UTF8'. I succeed, but it has no effect toward solving my problem.

database-dumps:rcf-temp1# psql -U rt_user --variable=client_encoding=utf-8 -c "SHOW client_encoding;" rt3
 client_encoding
-----------------
 UTF8
(1 row)

database-dumps:rcf-temp1#

Perfect, yet:

database-dumps:rcf-temp1# psql -U rt_user -f foo --variable=client_encoding=utf-8 rt3
...
psql:foo:29: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
...

Barring the «According to Hoyle» correct answer, which would be fantastic to hear, and knowing that I really don’t care about preserving any non-ASCII characters for this seldom-referenced data, what suggestions do you have?

Update: I get the same error with an ASCII-only version of the same dump file at import time. Truly mind-boggling:

database-dumps:rcf-temp1# # convert any non-ASCII character to a space
database-dumps:rcf-temp1# perl -i.bk -pe 's/[^[:ascii:]]/ /g;' mysql5-dump.sql
database-dumps:rcf-temp1# sum mysql5-dump.sql mysql5-dump.sql.bk
41053 2545611 mysql5-dump.sql
50145 2545611 mysql5-dump.sql.bk
database-dumps:rcf-temp1# cmp mysql5-dump.sql mysql5-dump.sql.bk
mysql5-dump.sql mysql5-dump.sql.bk differ: byte 1304850, line 30
database-dumps:rcf-temp1# # GOOD!
database-dumps:rcf-temp1# psql -U postgres -f mysql5-dump.sql --variable=client_encoding=utf-8 rt3
...
INSERT 0 416
psql:mysql5-dump.sql:30: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encod.
INSERT 0 455
INSERT 0 424
INSERT 0 483
INSERT 0 447
INSERT 0 503
psql:mysql5-dump.sql:36: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encod.
INSERT 0 502
INSERT 0 507
INSERT 0 318
INSERT 0 284
psql:mysql5-dump.sql:41: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encod.
INSERT 0 382
INSERT 0 419
INSERT 0 247
psql:mysql5-dump.sql:45: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encod.
INSERT 0 267
INSERT 0 348
^C

One of the tables in question is defined as:

                                        Table "public.attachments"
     Column      |            Type             |                        Modifie
-----------------+-----------------------------+--------------------------------
 id              | integer                     | not null default nextval('atta)
 transactionid   | integer                     | not null
 parent          | integer                     | not null default 0
 messageid       | character varying(160)      |
 subject         | character varying(255)      |
 filename        | character varying(255)      |
 contenttype     | character varying(80)       |
 contentencoding | character varying(80)       |
 content         | text                        |
 headers         | text                        |
 creator         | integer                     | not null default 0
 created         | timestamp without time zone |
Indexes:
    "attachments_pkey" PRIMARY KEY, btree (id)
    "attachments1" btree (parent)
    "attachments2" btree (transactionid)
    "attachments3" btree (parent, transactionid)

I do not have the liberty to change the type for any part of the DB schema. Doing so would likely break future upgrades of the software, etc.

The likely problem column is ‘content’ of type ‘text’ (perhaps others in other tables as well). As I already know from previous research, PostgreSQL will not allow NULL in ‘text’ values. However, please see above where both sed and Perl show no NULL characters, and then further down where I strip all non-ASCII characters from the entire dump file but it still barfs.

Currently, we are facing an issue while inserting a record into PostgreSQL. My data type is TEXT. It throws an error saying: Error: invalid byte sequence for encoding "UTF8": 0x00.

The data that we are trying to insert contains RTF text which contains text, image followed by again text. We also made sure there are no null values passed or inserted.

We are using PostgreSQL versions 9.6 and 12 with the encoding set to UTF-8.

Any help would be appreciated.

The RTF data with text and images (containing special characters) should insert into PostgreSQL without any issues. Also, the data type should remain TEXT.

Ken White

asked Mar 24, 2022 at 13:20

Hrushi-UMANG

You cannot store a zero byte as part of a text string in PostgreSQL. You have two options:

  • remove this character from the input string if it is not required

  • use data type bytea, which is suitable for binary data

If you want to stick with text, you should also figure out what the encoding of the file is.

answered Mar 24, 2022 at 13:43

Laurenz Albe

If you must use the text type, you could store the data base64-encoded.

There is a space overhead in the DB of about 30% over the raw data and a modest amount of CPU in your app code to encode/decode it.
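A quick Python sketch illustrates both the overhead and why the result is safe for a text column:

```python
import base64

raw = bytes(range(256)) * 4          # 1024 bytes of arbitrary binary data
encoded = base64.b64encode(raw)

# Base64 stores every 3 bytes as 4 ASCII characters: ~33% overhead.
print(len(raw), len(encoded))        # 1024 1368
assert b"\x00" not in encoded        # no NULs, so a text column accepts it
assert base64.b64decode(encoded) == raw  # lossless round trip
```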

answered Mar 24, 2022 at 14:32

Bohemian

You can use the decode function to insert binary values, including hex 0, into a text column:

postgres=# create table jos(foo text);
CREATE TABLE
postgres=# insert into jos values('foo'||decode('00', 'hex')||'bar');
INSERT 0 1
postgres=# select * from jos;
     foo
------------------
 \x666f6f00626172
(1 row)

answered Jul 28, 2022 at 8:23

geert3



In this article, we will see how you can fix the error ‘invalid byte sequence for encoding UTF8’ while restoring a PostgreSQL database. At work, I got a task to move DBs that have ASCII encoding to UTF8 encoding. Let me first confess that the ASCII DBs were not created intentionally; someone created them by accident! Having an ASCII-encoded DB is risky, and it should be moved to UTF8 encoding as soon as possible. So the initial plan was to create an archive dump of the DB with pg_dump, create a new DB with UTF8 encoding, and restore the dump to the new DB using pg_restore. The plan worked for most of the DBs, but failed for one DB with the error below.

DETAIL: Proceeding with relation creation anyway.
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 35091; 0 2527787452 TABLE DATA my_table release
pg_restore: [archiver (db)] COPY failed for table "my_table": ERROR: invalid byte sequence for encoding "UTF8": 0xa5
CONTEXT: COPY my_table, line 41653
WARNING: errors ignored on res

As the error says, there are some invalid UTF8 characters in table “my_table” which prevent pg_restore from restoring that particular table. I did a lot of research and googling to figure out what to do. I will list the steps I took.

Assume ‘my_db’ and ‘my_table’ are the database name and table name respectively.

Step 1:

Dump the database, excluding the particular table ‘my_table’. I would suggest dumping the database in archive format to save time and disk space.

pg_dump -Fc -T 'my_table' -p 1111  -f dbdump.pgd my_db

Step 2:

Create the new database with UTF8 encoding and restore the dump.

pg_restore -p 2222 -j 8 -d my_new_db dbdump.pgd

The restoration should be successful as we didn’t restore the offending table.

Step 3:

Dump the offending table ‘my_table’ in plain text format.

pg_dump -Fp -t 'my_table' -p 1111 my_db >  my_db_table_only.sql

Step 4:

Now we have the table data in plain text. Let’s find the invalid UTF8 characters in the file by running the command below (make sure the locale is set to UTF-8).

# grep -naxv '.*'   my_db_table_only.sql
102:2010-03-23 ��ԥ�	data1 data2

� represents an invalid UTF8 character, and it is present on line 102 of the file.
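The same scan can be sketched in Python, flagging lines that fail to decode as UTF-8 (the sample data here is made up):

```python
def invalid_utf8_lines(data: bytes):
    """Yield (line_number, line) for lines that are not valid UTF-8,
    mimicking `grep -naxv '.*'` under a UTF-8 locale."""
    for n, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            yield n, line

sample = b"ok\n" + b"2010-03-23 \xa5\xca data1\n" + b"ok\n"
print(list(invalid_utf8_lines(sample)))
# [(2, b'2010-03-23 \xa5\xca data1')]
```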

Step 5:

Find which charset the invalid UTF8 characters belong to.

#grep -naxv '.*' my_db_table_only.sql > test.txt 
#file -i test.txt
test.txt: text/plain; charset=iso-8859-1

As per the output, those characters belong to iso-8859-1. The charset may be different in your case.

Step 6:

Let’s convert iso-8859-1 to UTF8 using the iconv command.

#grep -naxv '.*' my_db_table_only.sql |  iconv --from-code=ISO-8859-1 --to-code=UTF-8
102:2010-03-23 ¥Êԥ¡ data1 data2

Now you have the characters in UTF8 encoding, so you can just replace ��ԥ� with ¥Êԥ¡ on line 102 of the dump file (I used the nano editor to do this; I ran into issues with Vim).

I know that replacing characters manually can be a pain in the ass if there are a lot of invalid UTF8 characters. We can run iconv on the whole file as shown below.

iconv --from-code=ISO-8859-1 --to-code=UTF-8 my_db_table_only.sql  > my_db_table_only_utf8.sql

But I won’t recommend this, as it may change valid characters (e.g. Chinese characters) to something else. If you plan to run iconv on the whole file, make sure only invalid UTF8 characters were converted by taking a diff of both files.

Step 7:

Once the characters are replaced, restore the table to the database.

psql -p 2222 -d my_new_db -f my_db_table_only.sql

No more “Invalid byte sequence for encoding UTF8” error.

If you need to store UTF8 data in your database, you need a database that accepts UTF8. You can check your database’s encoding in pgAdmin: just right-click the database and select "Properties".

But this error seems to say that there is some invalid UTF8 data in the source file: the copy utility detected, or assumed, that you are loading a UTF8 file.

If you are running some variant of Unix, you can check the encoding (more or less) with file.

$ file yourfilename
yourfilename: UTF-8 Unicode English text

(I think this will also work on a Mac in the terminal.) I’m not sure how to do this on Windows.

If you run the same utility on a file coming from a Windows system (that is, a file that is not UTF8-encoded), it will probably show something like this:

$ file yourfilename
yourfilename: ASCII text, with CRLF line terminators

If things are still strange, you can try converting your input data to a known encoding, changing your client encoding, or both. (We are really stretching the limits of my knowledge about encodings.)

You can use the iconv utility to change the encoding of the input data.

iconv -f original_charset -t utf-8 originalfile > newfile

You can change the psql (client) encoding by following the instructions in Character Set Support. On that page, search for the phrase "Enable automatic character set conversion".

Describe what is not working as expected.
While inserting data from a file, we are getting this error message.

Description of the Issue
We have many columns in a table, and many such tables, so we cannot add handling to each and every table.

Is there any other way we can do it? When we do this in MSSQL it automatically removes the character, but in Postgres we have this issue.

Workaround
We have to replace the offending character in the column.

Exception message:
Stack trace:
at Npgsql.NpgsqlConnector.<>c__DisplayClass161_0.<g__ReadMessageLong|0>d.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at Npgsql.NpgsqlConnector.<>c__DisplayClass161_0.<g__ReadMessageLong|0>d.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Threading.Tasks.ValueTask`1.get_Result()
at Npgsql.NpgsqlDataReader.d__46.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlDataReader.NextResult()
at Npgsql.NpgsqlCommand.d__100.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.FillInternal(DataSet dataset, DataTable[] datatables, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet)

C# .NET
Npgsql version: 4.0.4
PostgreSQL version: 11.1
Operating system: Windows Server 2012 R2

Symptoms

  • When migrating Stash’s datastore to a PostgreSQL database, the following error is shown in the administration web interface:

    Stash could not be migrated to the new database. PostgreSQL does not allow null characters (U+0000) in text columns. See the following knowledge base to solve the problem: https://confluence.atlassian.com/x/OwOCKQ
  • When restoring a backup to a Stash instance that uses a PostgreSQL database, the restore fails and the following error appears in the atlassian-stash.log:

    Caused by: org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2198) ~[postgresql-9.3-1102.jdbc41.jar:na]
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1927) ~[postgresql-9.3-1102.jdbc41.jar:na]
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255) ~[postgresql-9.3-1102.jdbc41.jar:na]
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:561) ~[postgresql-9.3-1102.jdbc41.jar:na]
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:419) ~[postgresql-9.3-1102.jdbc41.jar:na]
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:365) ~[postgresql-9.3-1102.jdbc41.jar:na]
        at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:203) ~[bonecp-0.7.1.RELEASE.jar:0.7.1.RELEASE]
        at com.atlassian.stash.internal.backup.liquibase.DefaultLiquibaseDao.insert(DefaultLiquibaseDao.java:272) ~[stash-dao-impl-3.6.0-SNAPSHOT.jar:na]
        ... 39 common frames omitted

Cause

This problem occurs because PostgreSQL does not allow null characters (U+0000) in its text data types. As a result, when migrating or restoring a backup to a PostgreSQL database, the operation can fail with the error above. This problem is restricted to PostgreSQL.  Other databases supported by Stash are not affected by null characters.

Resolution

Follow the steps below to sanitize the source database and then re-run the migration or restore.

  1. Stop Stash.
  2. Find and remove the null characters (U+0000) in the source database text columns. Most likely candidates are comments (sta_comment table) or plugin settings (plugin_setting table).
    To remove the null characters on those tables, run the following SQL queries on the source database.
      1. If the source database is MySQL:

        SELECT * FROM sta_comment WHERE comment_text like concat('%', 0x00, '%');
        UPDATE sta_comment SET comment_text = replace(comment_text, 0x00, '') WHERE comment_text like concat('%', 0x00, '%');
        SELECT * FROM plugin_setting WHERE key_value like concat('%', 0x00, '%');
        UPDATE plugin_setting SET key_value = replace(key_value, 0x00, '') WHERE key_value like concat('%', 0x00, '%');
      2. If the source database is Oracle:

        SELECT * FROM sta_comment WHERE instr(comment_text, unistr('\0000')) > 0;
        UPDATE sta_comment SET comment_text = replace(comment_text, unistr('\0000')) WHERE instr(comment_text, unistr('\0000')) > 0;
        SELECT * FROM plugin_setting WHERE instr(key_value, unistr('\0000')) > 0;
        UPDATE plugin_setting SET key_value = replace(key_value, unistr('\0000')) WHERE instr(key_value, unistr('\0000')) > 0;
      3. If the source database is Microsoft SQL Server, execute the following T-SQL code (note that a custom function is used because the built-in REPLACE function cannot replace null characters):

        IF OBJECT_ID (N'dbo.removeNullCharacters', N'FN') IS NOT NULL
            DROP FUNCTION removeNullCharacters;
        GO
        CREATE FUNCTION dbo.removeNullCharacters(@s nvarchar(max))
        RETURNS nvarchar(max)
        AS
        BEGIN
                DECLARE @c nchar(1)
                DECLARE @p int
                DECLARE @ret nvarchar(max)
                IF @s is NULL
                        SET @ret = @s
                ELSE
                BEGIN
                        SET @p = 1
                        SET @ret = ''
                        WHILE (@p <= LEN(@s))
                        BEGIN
                                SET @c = SUBSTRING(@s, @p, 1)
                                IF @c <> nchar(0)
                                BEGIN
                                        SET @ret = @ret + @c
                                END
                                SET @p = @p + 1
                        END
                END
                RETURN @ret
        END;
        SELECT * FROM sta_comment WHERE cast(comment_text AS varchar(max)) like '%' + char(0) + '%';
        UPDATE sta_comment SET comment_text = dbo.removeNullCharacters(comment_text) WHERE cast(comment_text AS varchar(max)) like '%' + char(0) + '%';
        SELECT * FROM plugin_setting WHERE cast(key_value AS varchar(max)) like '%' + char(0) + '%';
        UPDATE plugin_setting SET key_value = dbo.removeNullCharacters(key_value) WHERE cast(key_value AS varchar(max)) like '%' + char(0) + '%';
      4. If the source database is HSQLDB, either:

        • Migrate the database to an intermediate external database (such as MySQL), or

        • Find the problematic rows using the following queries and manually edit them to remove the null characters (U+0000):

          SELECT * FROM sta_comment WHERE comment_text like U&'%\0000%';
          SELECT * FROM plugin_setting WHERE key_value like U&'%\0000%';

          Note: Before accessing Stash’s HSQLDB (internal database) with an external tool, ensure Stash is not running.
          Note: Stash’s HSQLDB database (its internal database) can be opened by any database management tool that supports the JDBC protocol (such as DbVisualizer), using the following settings: 

          • Database driver: HSQLDB Server
          • Database driver location: STASH_INSTALL/atlassian-stash/WEB-INF/lib/hsqldb-2.2.4.jar (where STASH_INSTALL is the path to the Stash installation directory)

          • Database user: SA
          • JDBC URL: jdbc:hsqldb:file:STASH_HOME/shared/data/db;shutdown=true;hsqldb.tx=mvlocks (where STASH_HOME is the path to the Stash home directory)

  3. Re-create the PostgreSQL database (using the settings highlighted here) used in the original migration if it is not empty (for safety reasons, Stash blocks any migration to a non-empty database).
  4. Start Stash.
  5. Initiate the migration or the restoration of the backup once more.
  6. If the migration or restoration still fails, use the following instructions to diagnose the cause:
    1. Turn on PostgreSQL statement logging.
    2. Recreate the target PostgreSQL database to ensure it is empty.
    3. Restart the migration or the backup restoration to trigger the error again.
    4. Consult the PostgreSQL statement log to determine which SQL INSERT failed. This will indicate which table still contains the null characters which have to be sanitized as described above.
    5. Repeat from step 1 until the migration or restore succeeds.
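If the offending rows are being produced by an application, the same cleanup can also be done in code before the insert ever reaches PostgreSQL, as suggested in the Java answer above. A minimal sketch (the class and method names are illustrative, not part of Stash):

```java
// Strip NUL (U+0000) characters from a value before it is written to a
// PostgreSQL text column. A SQL NULL is left untouched: PostgreSQL only
// rejects the U+0000 code point *inside* a text value, not the database
// NULL itself.
public final class NullByteSanitizer {

    private NullByteSanitizer() {
    }

    public static String stripNulls(String value) {
        if (value == null) {
            return null;
        }
        return value.replace("\u0000", "");
    }
}
```

Running every text-typed bind parameter through such a helper in one place (for example a JDBC wrapper or a JPA converter) avoids having to sanitize each table by hand after the fact.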

Last modified on Feb 26, 2016


Since this is a bulk import, I assume you are building a file, or one large string, to send to Postgres? In that case the strings probably contain escape sequences, rather than being sent through, say, a prepared statement. So the 0x00 is most likely something in your string that Postgres is unescaping and interpreting as a NUL byte.

From the docs: https://www.postgresql.org/docs/8.3/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS

PostgreSQL also accepts "escape" string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote, e.g. E'foo'. (When continuing an escape string constant across lines, write E only before the first opening quote.) Within an escape string, a backslash character (\) begins a C-like backslash escape sequence, in which the combination of backslash and following character(s) represents a special byte value. \b is a backspace, \f is a form feed, \n is a newline, \r is a carriage return, \t is a tab. Also supported are \digits, where digits represents an octal byte value, and \xhexdigits, where hexdigits represents a hexadecimal byte value. (It is your responsibility that the byte sequences you create are valid characters in the server character set encoding.) Any other character following a backslash is taken literally. Thus, to include a backslash character, write two backslashes (\\). Also, a single quote can be included in an escape string by writing \', in addition to the normal way of ''.

So if your bulk statement is prefixing strings with the letter E, e.g. E'hello', don't do that.
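For illustration: assuming standard_conforming_strings is on (the default since PostgreSQL 9.1), a literal only needs its single quotes doubled — backslashes are left alone, so a sequence like \x00 in the data stays literal text instead of becoming a NUL byte. A hedged Java sketch (the helper name is made up; in real code, prefer a PreparedStatement and skip manual quoting entirely):

```java
// Build a standard-conforming SQL string literal (no E prefix): double
// the single quotes and leave backslashes untouched, so escape-like
// sequences such as \x00 or \0 are never reinterpreted by the server.
public final class SqlLiteral {

    private SqlLiteral() {
    }

    public static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }
}
```

With an E-prefixed literal the same input could be mangled, which is exactly the failure mode described above.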

I have a Java EE app configured to use PicketLink as its security framework. The app is running on JBoss EAP 7 (or Wildfly 10), and is using PostgreSQL 9.4 as its backend datastore. I’m initializing the PicketLink IDM framework to use PBKDF2 password encoding like so:

builder.named("MyApp").stores().jpa()
        .setCredentialHandlerProperty(PasswordCredentialHandler.PASSWORD_ENCODER,
                new PBKDF2PasswordEncoder("salty".getBytes(), 10000, 128))
        .addContextInitializer(new PicketLinkContextInitializer(this.entityManager));

With this configuration, however, I periodically get the following Exception when trying to set a password:

Caused by: org.picketlink.idm.IdentityManagementException: PLIDM000201: Credential update failed for account [net.odyssi.identity.security.model.ExtendedUser@ac8b3d20] and type [org.picketlink.idm.credential.Password@17321a51].
    at org.picketlink.idm.internal.ContextualIdentityManager.updateCredential(ContextualIdentityManager.java:235) [picketlink-idm-impl-2.7.0.Final.jar:]
    at org.picketlink.idm.internal.ContextualIdentityManager.updateCredential(ContextualIdentityManager.java:217) [picketlink-idm-impl-2.7.0.Final.jar:]
    at net.odyssi.identity.services.security.CredentialManager.changeAccountPassword(CredentialManager.java:75) [IdentityPlatform-ejb-0.1-SNAPSHOT.jar:]
    ... 175 more
Caused by: javax.persistence.PersistenceException: org.hibernate.exception.DataException: could not execute statement
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1692) [hibernate-entitymanager-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1602) [hibernate-entitymanager-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1608) [hibernate-entitymanager-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.flush(AbstractEntityManagerImpl.java:1303) [hibernate-entitymanager-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.jboss.as.jpa.container.AbstractEntityManager.flush(AbstractEntityManager.java:459) [wildfly-jpa-7.0.0.GA-redhat-2.jar:7.0.0.GA-redhat-2]
    at org.picketlink.idm.jpa.internal.JPAIdentityStore.storeCredential(JPAIdentityStore.java:912) [picketlink-idm-impl-2.7.0.Final.jar:]
    at org.picketlink.idm.credential.handler.AbstractCredentialHandler.update(AbstractCredentialHandler.java:225) [picketlink-idm-api-2.7.0.Final.jar:]
    at org.picketlink.idm.credential.handler.AbstractCredentialHandler.update(AbstractCredentialHandler.java:51) [picketlink-idm-api-2.7.0.Final.jar:]
    at org.picketlink.idm.internal.AbstractIdentityStore.updateCredential(AbstractIdentityStore.java:145) [picketlink-idm-impl-2.7.0.Final.jar:]
    at org.picketlink.idm.internal.ContextualIdentityManager.updateCredential(ContextualIdentityManager.java:231) [picketlink-idm-impl-2.7.0.Final.jar:]
    ... 177 more
Caused by: org.hibernate.exception.DataException: could not execute statement
    at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:118) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:109) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:95) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.executeUpdate(ResultSetReturnImpl.java:207) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.engine.jdbc.batch.internal.NonBatchingBatch.addToBatch(NonBatchingBatch.java:45) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2886) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:3386) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.action.internal.EntityInsertAction.execute(EntityInsertAction.java:89) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:560) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:434) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:337) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:39) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.internal.SessionImpl.flush(SessionImpl.java:1282) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.flush(AbstractEntityManagerImpl.java:1300) [hibernate-entitymanager-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    ... 183 more
Caused by: org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2198)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1927)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:561)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:419)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:365)
    at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.executeUpdate(WrappedPreparedStatement.java:537)
    at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.executeUpdate(ResultSetReturnImpl.java:204) [hibernate-core-5.0.9.Final-redhat-1.jar:5.0.9.Final-redhat-1]
    ... 193 more

I say periodically, because the error does not appear to occur every single time. I can set small numbers of passwords (<5) with no problem. However, if I try to create user accounts in bulk, the error happens every time. I also see it crop up occasionally, regardless of how many accounts I am creating.

Any idea why this error is occurring or how I can correct it?

If you need to store UTF8 data in your database, you need a database that accepts UTF8. You can check your database's encoding in pgAdmin: just right-click the database and select "Properties".

But this error seems to be telling you that there is some invalid UTF8 data in the source file. It means the copy utility detected, or was told to assume, that you are loading a UTF8 file.

If you are running some variant of Unix, you can check the encoding (more or less) with file.

$ file yourfilename
yourfilename: UTF-8 Unicode English text

(I think this will work on a Mac in Terminal, too.) Not sure how to do it under Windows.

If you use that same utility on a file that came from Windows systems (that is, a file that is not encoded in UTF8), it will probably show something like this:

$ file yourfilename
yourfilename: ASCII text, with CRLF line terminators

If the situation is still strange, you can try converting your input data to a known encoding, changing your client encoding, or both. (We are really stretching the limits of my knowledge about encodings here.)

You can use the iconv utility to change the encoding of the input data.

iconv -f original_charset -t utf-8 originalfile > newfile

You can change the encoding of psql (the client) by following the instructions in Character Set Support. On that page, search for the phrase "enable automatic character set conversion".
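The file/iconv checks above can also be done from code. A sketch using the JDK's CharsetDecoder — and note the subtlety this thread keeps circling: a 0x00 byte is perfectly valid UTF-8, so PostgreSQL's rejection of it in text columns is a server policy, not an encoding error:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Report whether a byte sequence is well-formed UTF-8, roughly what
// `file` guesses from the shell. Strict decoding: malformed or
// unmappable input raises instead of being silently replaced.
public final class Utf8Check {

    private Utf8Check() {
    }

    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A byte array containing only 0x00 passes this check, so even a file that `file` and iconv both consider clean UTF-8 can still trigger the "invalid byte sequence for encoding "UTF8": 0x00" error — the NUL characters themselves must be removed.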
