Error "illegal character: \u00bb" in Java - Only those who do nothing make no mistakes!

I have a program that allows a user to type java code into a rich text box and then compile it using the java compiler. Whenever I try to compile the code that I have written I get an error that says that I have an illegal character at the beginning of my code that is not there. This is the error the compiler is giving me:

C:\Users\Travis Michael>"\Program Files\Java\jdk1.6.0_17\bin\javac" Test.java
Test.java:1: illegal character: 187
∩╗┐public class Test
 ^
Test.java:1: illegal character: 191
∩╗┐public class Test
  ^
2 errors

Chad Carisch


asked Jan 2, 2010 at 21:33

The BOM is generated by, say, File.WriteAllText() or StreamWriter when you don’t specify an Encoding. The default is to use the UTF8 encoding and generate a BOM. You can tell the java compiler about this with its -encoding command line option.

The path of least resistance is to avoid generating the BOM. Do so by specifying System.Text.Encoding.Default, that will write the file with the characters in the default code page of your operating system and doesn’t write a BOM. Use the File.WriteAllText(String, String, Encoding) overload or the StreamWriter(String, Boolean, Encoding) constructor.

Just make sure that the file you create doesn’t get compiled by a machine in another corner of the world. It will produce mojibake.

answered Jan 2, 2010 at 22:11

Hans Passant


That’s a byte order mark, as everyone says.

javac does not understand the BOM, not even when you try something like

javac -encoding UTF8 Test.java

You need to strip the BOM or convert your source file to another encoding. Notepad++ can convert a single file’s encoding; I’m not aware of a batch utility on the Windows platform for this.

The java compiler will assume the file is in your platform default encoding, so if you use this, you don’t have to specify the encoding.
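For the missing batch utility mentioned above, here is a minimal sketch in Java that removes a leading UTF-8 BOM (the bytes EF BB BF) from every file passed on the command line. The class and method names (BomStripper, stripUtf8Bom, stripFile) are made up for illustration, and the sketch assumes the files are small enough to read into memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: removes a leading UTF-8 byte order mark (EF BB BF) from files, in place. */
final class BomStripper {

    /** Returns the input without a leading EF BB BF, or the same array if there is none. */
    static byte[] stripUtf8Bom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    /** Rewrites the file only when a BOM was actually found. Returns true if the file changed. */
    static boolean stripFile(Path file) throws IOException {
        byte[] original = Files.readAllBytes(file);
        byte[] stripped = stripUtf8Bom(original);
        if (stripped == original) {
            return false; // same array instance means nothing was removed
        }
        Files.write(file, stripped);
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean changed = stripFile(Paths.get(name));
            System.out.println(name + (changed ? ": BOM removed" : ": no BOM found"));
        }
    }
}
```

Run it with the source files as arguments, for example `java BomStripper Test.java`, and then compile as usual. It only touches the first three bytes, so the rest of the file keeps whatever encoding it had.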

answered Jan 2, 2010 at 22:30

zneo


If using an IDE, specify the java file encoding (via the properties panel)
If NOT using an IDE, use an advanced text-editor (I can recommend Notepad++) and set the encoding to «UTF without BOM», or «ANSI», if that suits you.

answered Jan 2, 2010 at 21:43

Bozho


In this case, follow steps 1-7:

In Android Studio

1. Menu -> Edit -> Select All
2. Menu -> Edit -> Cut

3. Open a new Notepad.exe window

In Notepad

4. Menu -> Edit -> Paste
5. Menu -> Edit -> Select All
6. Menu -> Edit -> Copy

Back In Android Studio

7. Menu -> Edit -> Paste

answered Jan 21, 2018 at 17:16

Ingo


http://en.wikipedia.org/wiki/Byte_order_mark

The byte order mark (BOM) is a Unicode character used to signal the endianness (byte order) of a text file or stream. Its code point is U+FEFF. BOM use is optional, and, if used, should appear at the start of the text stream. Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in.

The BOM is a funky-looking character that you sometimes find at the start of Unicode streams, giving a clue what the encoding is. It’s usually handled invisibly by the string-handling stuff in Java, so you must have confused it somehow, but without seeing your code, it’s hard to see where.

You might be able to fix it trivially by manually stripping the BOM from the string before feeding it to javac. It probably qualifies as whitespace, so try calling trim() on the input String, and feeding the output of that to javac.
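One caveat on the trim() idea: String.trim() only removes characters with code points up to U+0020, so it leaves U+FEFF in place. A minimal sketch of stripping the mark explicitly is below. The names SourceTextCleaner and stripLeadingBom are hypothetical, and it assumes the source text is already held in a String before it is written out for javac.

```java
/** Hypothetical helper: strips a leading U+FEFF from source text before it is handed to the compiler. */
final class SourceTextCleaner {

    private static final char BOM = '\uFEFF';

    /** Returns the text without a leading byte order mark; other text is returned unchanged. */
    static String stripLeadingBom(String source) {
        return (!source.isEmpty() && source.charAt(0) == BOM) ? source.substring(1) : source;
    }

    public static void main(String[] args) {
        String fromEditor = BOM + "public class Test {}";
        // trim() does not treat U+FEFF as whitespace, so the mark survives it.
        System.out.println(fromEditor.trim().charAt(0) == BOM);               // true
        // Removing the first character explicitly leaves clean source text.
        System.out.println(stripLeadingBom(fromEditor).startsWith("public")); // true
    }
}
```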

answered Jan 2, 2010 at 21:42

skaffman


That’s a problem related to the BOM (byte order mark) character. The BOM is a Unicode character that defines a text file’s byte order and appears at the start of the file. Eclipse doesn’t allow this character at the start of your file, so you must delete it. For this purpose, use a text editor such as Notepad++ and save the file with the encoding «UTF-8 without BOM». That should remove the problem.

I copy-pasted some content from a website into Notepad++, and it showed an "LS" marker on a black background.
After I deleted the "LS" character and copied the same content from Notepad++ into the Java file, it worked fine.
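Invisible paste artifacts like that "LS" marker (presumably U+2028, the Unicode line separator) can also be hunted down programmatically. The sketch below lists every non-ASCII character in a piece of source text with its line and column. NonAsciiScanner and findNonAscii are hypothetical names, and the scan is deliberately blunt: it also flags legitimate non-ASCII text in string literals and comments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists every non-ASCII character in source text with its line and column. */
final class NonAsciiScanner {

    static List<String> findNonAscii(String source) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int column = 1;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\n') {
                // A newline starts the next line and resets the column counter.
                line++;
                column = 1;
                continue;
            }
            if (c > 127) {
                hits.add(String.format("line %d, column %d: U+%04X", line, column, (int) c));
            }
            column++;
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample text with a line separator (U+2028) and a left smart quote (U+201C).
        String pasted = "int x = 1;\u2028\nString s = \u201Chi\";";
        findNonAscii(pasted).forEach(System.out::println);
    }
}
```

The same scan would also report a leading U+FEFF or the smart quotes that some editors substitute for straight quotes.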

answered Mar 15, 2016 at 14:10

anand krish


I solved this by right-clicking in my TextEdit file, selecting [Substitutions], and unchecking Smart Quotes.

answered Nov 11, 2016 at 18:53

Instead of getting Notepad++, you can simply open the file with WordPad and then choose Save As > Plain Text Document.

answered Sep 6, 2016 at 15:29

I faced this issue too, as I use Notepad++ to write code. It is very convenient to type code in Notepad++. However, on compiling I got the error "error: illegal character: '\u00bb'".
Solution:
Start writing the code in the plain Notepad that ships with your PC and save it. Later modifications can be made in Notepad++.
It works!

answered Jul 3, 2016 at 5:15

I had the same problem with a file I generated using the command echo echo "" > Main.java in Windows PowerShell. I searched for the problem and it seemed to have something to do with encoding. I checked the encoding of the file using file -i Main.java and the result was text/plain; charset=utf-16le.

Later I deleted the file and recreated it in Git Bash using touch Main.java, and this time the file compiled successfully. I checked the file encoding using the file -i command and the result was Main.java: text/x-c; charset=us-ascii.

Next I searched the internet and found that, to create an empty file in PowerShell, we can use the cmdlet New-Item. I created the file using New-Item Main.java and checked its encoding. The result was again Main.java: text/x-c; charset=us-ascii, and this time it compiled successfully.
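Where the file utility is not available, the same kind of check can be approximated in Java by looking at the first bytes of the file. This is only a sketch, assuming the file starts with a BOM at all. BomSniffer and detectBom are hypothetical names, and a file without a BOM simply reports "no BOM" rather than a guessed charset.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: guesses a file's Unicode encoding from a byte order mark at its start. */
final class BomSniffer {

    static String detectBom(byte[] head) {
        // Check the 4-byte UTF-32 marks first, because FF FE is a prefix of FF FE 00 00.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return "no BOM";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + ": " + detectBom(Files.readAllBytes(Paths.get(name))));
        }
    }
}
```

This is a heuristic: a UTF-16LE file whose first character after the BOM is U+0000 would be misreported as UTF-32LE.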

answered Apr 10, 2021 at 6:01

velocity



Indeed, on my system the contents of every file I fed to the standard Java compiler (javac) were interpreted as Windows-1251 by default. I established this experimentally and confirmed it several times. Interestingly, this encoding is the default in my case even though my operating system, Windows 10 Enterprise x86-64 version 10.0.18363.778 (Win10 19H2 [1909] November 2019 Update), was installed from an English distribution and has no additional language packs. In addition, the option Beta: Use Unicode UTF-8 for worldwide language support is enabled in my regional settings, which, if I understand the setting correctly, is supposed to make UTF-8 the default encoding. Nevertheless, I suspect Windows-1251 is used because Russia, with its "Russian (Russia)" setting, was chosen as the time and currency format during the initial installation of the operating system.

When I compiled a program saved as UTF-8 with a byte order mark, roughly the following happened. Since I was sure that UTF-8 would be used as the source encoding anyway, I did not pass the -encoding option to specify it explicitly. In fact the file was processed as Windows-1251, which left me puzzled for a while, because the error reported by the compiler is not obvious if you misunderstand its behavior. Because the default was Windows-1251, a single-byte code page, the compiler treated the 3-byte sequence at the very start of the file as three separate characters rather than as the single character that was intended. So the sequence EF BB BF was read by the compiler as the characters п»ї. The compiler had no particular objection to п and ї, since these characters can be part of (and even begin) a perfectly valid token: identifiers may consist of, among other things, characters from various Cyrillic alphabets. The right-pointing double angle quotation mark (», U+00BB) is a different matter. This character cannot be part of any valid lexical token and can only appear inside something like a string constant. For that reason, almost anywhere in the code it triggers error: illegal character: '\u00bb', while the compiler prefers to forget about the other characters and their unfortunate combinations until later.
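The misreading described above can be reproduced outside the compiler. The sketch below decodes the three BOM bytes with a single-byte code page and asks Java whether each resulting character may start an identifier. BomMisdecoding and decode are hypothetical names, and the sketch assumes the windows-1251 and windows-1252 charsets are available in the running JDK.

```java
import java.nio.charset.Charset;

/** Hypothetical demo: shows what the UTF-8 BOM bytes look like when read as a single-byte code page. */
final class BomMisdecoding {

    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        for (String charsetName : new String[] {"windows-1251", "windows-1252"}) {
            System.out.println(charsetName + ":");
            for (char c : decode(bom, charsetName).toCharArray()) {
                // Characters that cannot start an identifier are the ones javac rejects here.
                System.out.printf("  U+%04X identifierStart=%b%n",
                        (int) c, Character.isJavaIdentifierStart(c));
            }
        }
    }
}
```

Under windows-1251 only the middle character, U+00BB, fails the identifier check, which matches the single '\u00bb' error. Under windows-1252 both U+00BB (187) and U+00BF (191) fail, which would explain the two errors in the original question, assuming that machine used a Western European code page.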

As for the bug itself, which really does exist, it affects not only UTF-8 but other encodings as well. For example, if we use an encoding with an explicit byte order (UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE) together with the byte order mark for that encoding (the same BOM, U+FEFF), the compiler still refuses to understand it and keeps complaining. Even if we always specify the correct encoding with the -encoding option, we still get the same error (provided, of course, that the file starts with a BOM): error: illegal character: '\ufeff' (or '\ufffe' if the bytes are swapped).

And here are the conclusions I drew for myself. Do not use a BOM in Java files; the compiler clearly does not get along with it. Always pass the -encoding option, even when it could be omitted; it is safer and clearer that way. I also realized there is no point in hoping that this bug will ever be fixed. If it has not been fixed in 19 years, it most likely never will be.

I want to thank @SergeyGornostaev for pointing me in the right direction.

P.S. I would like to add a small note about the Unicode encodings UTF-16 and UTF-32. It appears that the bug above applies only to UTF-8, and the situation with UTF-16 and UTF-32 is somewhat different. I ran a small series of tests and think I have figured it out. If we use a byte order mark (U+FEFF) and tell the compiler to decode as UTF-16 or UTF-32 (without specifying the byte order), the compiler handles our BOM properly, because the encoding name does not fix the byte order and it has to be determined somehow (if there is no mark at the start of the stream, big-endian order is used by default). It is a different matter when we state explicitly that decoding must use UTF-16BE/LE or UTF-32BE/LE. In that case no BOM is needed, and regardless of how it is written or whether it is present at all, the compiler decodes the bytes in the order dictated by the specific encoding. These are the cases where the errors start to appear. The cause is the character '\ufeff' itself, which is then treated not as a byte order mark but as a zero-width non-breaking space. It can be used inside a string literal, but if it appears where it should not, it causes an error. Unlike whitespace characters, it is not "insignificant" and is not simply ignored by the compiler. So this bug relates entirely and exclusively to UTF-8.
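The difference between the generic and the byte-order-specific encodings can be observed with the decoders in the Java standard library. The sketch below only demonstrates java.nio.charset behavior; that javac relies on the same decoders is an assumption, not something verified here. BomDecoding and startsWithFeff are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: shows which standard decoders consume a leading BOM and which keep it as U+FEFF. */
final class BomDecoding {

    static boolean startsWithFeff(String text) {
        return !text.isEmpty() && text.charAt(0) == '\uFEFF';
    }

    public static void main(String[] args) {
        byte[] utf16WithBom = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};             // BOM + 'A', big-endian
        byte[] utf8WithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41};       // BOM + 'A'

        // Generic UTF-16: the BOM selects the byte order and is not part of the decoded text.
        System.out.println(startsWithFeff(new String(utf16WithBom, StandardCharsets.UTF_16)));   // false
        // Explicit UTF-16BE: the same bytes decode to U+FEFF followed by 'A'.
        System.out.println(startsWithFeff(new String(utf16WithBom, StandardCharsets.UTF_16BE))); // true
        // UTF-8: the decoder keeps the mark as an ordinary character.
        System.out.println(startsWithFeff(new String(utf8WithBom, StandardCharsets.UTF_8)));     // true
    }
}
```

In the two cases that print true, the decoded text begins with U+FEFF, which is the character reported in the error: illegal character: '\ufeff' message.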


Java reports the error \u00bb when compiling:
public class test2{
public static void main(String[]args){
System.out.println(«введите число»);
Scanner scan = new Scanner(System.in);
int num = scan.nextint();
System.out.println(«ваше число» + num);
}
}
Thanks.


IntelliJ IDEA 2019

In my Java file:

package com.java.testproject.java.TestProjectJava.leetcode;

public class TwoSum {

    public TwoSum() {

    }
}

Here is the client code:

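import java.util.Date;

public class Main {
    public static void main(String[] args) {
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("Current date: " + new Date());
        System.out.println();

        // TwoSum lives in the package shown above; import it if Main is in a different package.
        new TwoSum();
    }
}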

But when I compile, I get an error:

What is wrong with my code?

asked Oct 13, 2019 at 11:06


java
javac

22-09-2019


