Illegal opcode error on HP ProLiant servers

[Update]
As per Jason’s comment, with a new ILO4 update HP apparently has fixed an issue related to booting from SD cards. Whether this is the same issue is unclear though since the original KB article I linked to has not been updated.
[/Update]

Important note: The general symptom of such a Red Screen of Death described here is NOT specific to ESXi or booting from SD cards in general. It can happen with Windows, Linux or any other OS as well as other boot media such as normal disks/RAID arrays, if the server has a problem booting from this device (broken boot sector/partition/boot loader etc).

A couple of weeks ago I was updating a few HP ProLiant DL360p Gen8 servers running ESXi on a local SD card with ESXi patches via VUM, so business as usual. Almost, because on one of the servers I ran into the following issue:
After rebooting the host, the BIOS POST completed fine and the ProLiant DL360p Gen8 server should then have booted ESXi from its attached USB SD card, where ESXi was installed. Instead it displayed an unsightly red screen with an Illegal OpCode error, telling me something had gone very, very wrong.

I reset the server several times via iLO but the issue persisted and I had no idea what exactly went bonkers here. Then I decided to boot a Linux live image, which worked fine, narrowing down the issue to the OS installation (device) itself. I thought the updates corrupted the installation but that actually wasn’t the case.
When attempting to mount the SD card USB drive from within the live Linux I noticed it was actually completely absent from the system. The USB bus was still ok, but lsusb showed no SD card reader device in the system at all!
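A quick way to confirm from a live Linux that no USB-attached storage is visible at all is to filter the block device list by transport. The sketch below is only an illustration and assumes a live image that ships util-linux `lsblk`; the function name `usb_block_devices` is made up for this example.

```shell
# Print the names of block devices whose transport is USB.
# Expects the output of `lsblk -dn -o NAME,TRAN` on stdin
# (one device per line: name, then transport type).
usb_block_devices() {
    awk '$2 == "usb" { print $1 }'
}

# Usage on a live system (assumption: lsblk is available):
#   lsblk -dn -o NAME,TRAN | usb_block_devices
```

If this prints nothing while the RAID volume is still listed by `lsblk`, the SD card reader is missing from the system rather than merely unreadable.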

Just to make sure I wasn’t imagining things I booted an ESXi installation medium too and, likewise, it didn’t detect the local SD card but only the local RAID controller volume.

So the Illegal OpCode Red Screen of Death was probably the result of the server trying to force a boot from the local RAID array volume, which is a pure GPT VMFS5 volume without a proper boot partition.

I first thought the SD card reader or SD card was faulty but after googling around for a while I stumbled upon this article:
HP Advisory: ProLiant DL380p Gen8 Server -Server May Fail to Boot From an SD Card or USB Device After Frequent Reboots While Virtual Media Is Mounted in the HP Integrated Lights-Out 4 (iLO 4) Integrated Remote Console (IRC)

DESCRIPTION
In rare instances, a ProLiant DL380p Gen8 server may fail to boot from an SD card or a USB device after frequent reboots while Virtual Media is mounted in the HP Integrated Lights-Out 4 (iLO 4) Integrated Remote Console (IRC).
This issue can occur if the server is rebooted approximately every five minutes. If this occurs, the following message will be displayed: Non-System disk or disk error-replace and strike any key when ready
SCOPE
Any HP ProLiant DL380p Gen8 server with HP Integrated Lights-Out 4 (iLO 4).
RESOLUTION
If a ProLiant DL380p Gen8 server fails to boot from an SD card or a USB device, cold boot the server to recover from this issue.

The article only mentions DL380p Gen8 servers but I imagine the same could apply to DL360p Gen8 or other servers as well. The problem description doesn’t really fit my case all that well either, but I tried cold booting the server as instructed. And this did the trick. After leaving the server powered off for about 5 minutes and powering it on again, it detected the SD card again and booted the ESXi installation on it fine.
For good measure I rebooted the server another time, which also went without a hitch.

The key takeaways here:
1. As per the mentioned HP Advisory, the USB SD card device of a ProLiant DL380p/DL360p Gen8 server might randomly disappear during a reboot, so be aware of that and try cold booting the server in that case.
2. When dealing with an Illegal OpCode boot error on an HP ProLiant server like the one described above, make sure you have a valid boot device and the BIOS is properly configured to boot from this device.
On a physical Linux host, for example, the grub boot loader might be corrupted, which can easily be fixed by re-installing grub from a live Linux. I’ve had that happen to me with physical Linux servers before.
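For legacy BIOS boot, one basic sanity check of a candidate boot device is the two-byte boot signature (55 aa) at the end of its first sector. The following is a minimal sketch to run from a live Linux; the function names are made up for this example and the device path is whatever your system uses. A present signature does not prove the device is bootable (a GPT disk carries one in its protective MBR too), but a missing one is a strong hint that the BIOS has nothing to start from.

```shell
# Print the two bytes at offset 510-511 of a disk or image as hex, e.g. "55aa".
mbr_signature() {
    dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Succeed only if the first sector ends with the 55 aa boot signature.
has_boot_signature() {
    [ "$(mbr_signature "$1")" = "55aa" ]
}

# Usage (as root, device name is an example):
#   has_boot_signature /dev/sda && echo "signature present" || echo "no boot signature"
```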

Source

HP ProLiant DL580 G5: Illegal Opcode on boot
After a nightmarish 24 hours, I’ve successfully installed Ubuntu 10.04 x86-64 Server on an HP ProLiant DL580 G5 server with an added P800 RAID controller. I wanted to make a public record of the steps to finish the installation in case it helps anyone in the future.

The issue we were running into was the message "Illegal Opcode" given after BIOS startup before the OS could load, even after a successful OS installation. HP support confirmed that this message is given when the MBR on the boot controller does not refer to a valid bootable partition.

First, our configuration (after several troubleshooting iterations; I’ll leave out those steps):

HP ProLiant DL580 G5 with 32GB ECC RAM, storage:
* 2x 70GB SAS storage via P400 RAID controller, configured as a 140GB RAID 0 device ("/dev/cciss/c1d0"), designated in the RAID configuration utility (ORCA) as the Boot Controller
* 24x 1TB SAS storage via P800 RAID controller, configured as 2x 10TB RAID 6 devices ("/dev/cciss/c0d0" and "/dev/cciss/c0d1")

In BIOS, the boot settings were:

Boot order was set to CD, USB, Floppy, Hard Drive, Ethernet
Hard Drive order was set to p400, IDE, p800 (note that there were no IDE drives, but for some reason the BIOS wasn’t allowing us to move that device in the order.)

c1d0 was partitioned as:
* primary #1: "/boot", 2GB, ext2
* extended #5: "/", 100GB, ext4
* extended #6: swap, remainder (~38GB)

c0d0 and c0d1 were partitioned with lvm. Note that small chunks (~1MB) were left ‘free’ on either side of the 10TB lvm partitions. I assume this is a ‘parted’ or an lvm issue; it did not affect final performance.

These partitions were then linked in lvm as a single 20TB JFS partition mounted inside of the root file system. (JFS because e2fsprogs doesn’t handle creation of ext4 drives larger than 16TB… still… more than a year after listing this as a ‘top priority’.)

Installation then proceeded as expected, but note that grub-install uses the wrong drive. Specifically, grub-install (as executed by the install script) was installing grub onto /dev/cciss/c0d0, I assume because it was detecting that drive as (hd0). Because the P400 was addressed as /dev/cciss/c1d0 and was also set as the boot controller, grub ended up on a drive the BIOS never boots from, which explains the "Illegal Opcode" error on boot.

The Fix:

(First I should mention that right before fixing the issue, we also updated all the firmware on the server at HP Support’s suggestion. I cannot rule out that this contributed to the success, although I personally feel it did not make the difference.)

As the very last step, when the install script ejects the install CD and asks you to press enter to reboot, do not press enter, and instead press alt+F2 to go to the install CD console. This screen should say "press enter to use this console" or something like that. Press enter, and use the following commands:
Code:

# chroot ./target
# grub-install /dev/cciss/c1d0
A large volume of text will scroll across the screen, including a lot of what look like serious errors. Don’t worry, this is just grub polling devices that don’t exist. I think you can use something like "--no-floppy" to suppress those warnings, but don’t worry about it. The last message should be "Installation successful" or something like that; that is your indication that the grub-install succeeded.

Press alt+f1 to return to the install script and press enter to reboot the machine. Your ProLiant DL580 with p800 raid controller should now boot without an illegal opcode exception.
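Before rebooting, it may be worth checking that the boot code really landed on the intended device. The sketch below rests on an assumption that is not from the original post: that GRUB's boot sector code embeds the ASCII string "GRUB" in the first 512 bytes of the disk. The function name is made up for this example, and the `-a` option requires a grep that supports it (GNU or BSD grep do; the installer's minimal shell may not).

```shell
# Succeed if the first sector of the given disk or image contains the
# ASCII string "GRUB" (assumption: GRUB's boot code embeds this string).
mbr_mentions_grub() {
    dd if="$1" bs=512 count=1 2>/dev/null | grep -qa 'GRUB'
}

# Usage (device names taken from the configuration described above):
#   mbr_mentions_grub /dev/cciss/c1d0 && echo "GRUB found on the boot controller disk"
#   mbr_mentions_grub /dev/cciss/c0d0 && echo "GRUB also present on the P800 volume"
```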

Source

I was running ESXi 5.5, but this issue will affect most DL360p and/or DL380p servers. My ESXi instance is on an SD card. The server randomly threw an error that resulted in a purple screen; I rebooted the server and then this red error appeared.

This happened to me today, and the error came back on every reboot, even after a cold boot.

I tried:

Cold boot.
Turning it off for 5 minutes and turning it back on.
Removing each RAM stick and booting, one at a time.
Disabling iLO DHCP and TCP/IP.
Disabling iLO altogether.

The only way I got past the red illegal opcode error was to change the boot order so that USB DriveKey was at the top. The C: drive doesn’t exist, the CD-ROM is empty, and there is no floppy drive.

When I first saw this, I really thought it would try entries 1, 2 and 3 and then boot from 4, but it didn’t. I had to move it to the top.

Source

After using CDROM media to upgrade the first HP DL580G5 in an ESX cluster from ESX35u2 to ESX35u3, I get an illegal opcode on reboot.

I did not think to detach from the SAN during the upgrade (it’s an upgrade for goodness sake, why should I have to?), so perhaps the system reconfigured itself to use one of the SAN drives to load the system? After detaching from the SAN and performing the upgrade again, it clearly writes the master boot record and upgrades the system on the correct local drive, but the server still shows the Red Screen of Death on reboot.

Question: Is there an easy and safe way I can sneak in and fix this boot configuration? Is this the likely problem, or are there other possibilities?

I had used the same ESX35u3 media to upgrade a SAN-attached DL380G5 (not in an ESX cluster) with no problems, so was quite surprised and disappointed when this one failed.

I have cruised the forums for this, and there are hints of doing stuff to the QLA drivers from 2006 but no concrete pointers. If necessary, I’ll try a full reinstall, but that will be somewhat unpleasant.

Thanks!!!

Source

I did an apt-get dist-upgrade on my HP Gen8 MicroServer, and it does not boot afterwards. (It must have updated to Debian 10 Buster; I am using the stable channel.)

The onboard RAID says the disk is configured but not present. Obviously it cannot boot if it does not find the disk.

If I try to go into the RAID setup, it gives me what they call a Red Screen of Death, basically an "illegal opcode" error and a register dump.

The HDD is in the optical drive bay and configured in the RAID tool as a single disk to make it selectable as a boot drive.

What caused that?
Does the RAID controller write something to the disk that might have been overwritten by the update?
Might the update have changed some form of ID of the disk?
What’s wrong with the RAID setup utility?
Any hints for fixing the issue?

Thanks in advance

Source

In enterprise systems, reliability and minimal downtime come first. Clustered systems are great, of course, but in some cases clusters cannot be used, whether because of software limitations or company policy. For example, our DBAs gave up on Oracle clusters because of limitations in the applications we use (they do not work quite correctly with the database in cluster mode).

But what do you do if a single box (blade) suddenly fails? HP hardware is quite reliable, of course, but in my experience there was a case where a blade hung after almost two years of uptime and could not be managed until it was physically pulled out of its slot and reseated. That would be fine, except that the data center is an hour and a half by car from the nearest point. That much downtime is completely unacceptable.

Of course, you can (and should) set up a standby server. But activating it takes time: the database has to be reconfigured, and so do the applications (and there are a great many of them).

And it is not only about failures. A routine upgrade (replacing blades) turns into a lot of time spent installing the new server, repointing applications to it, verifying everything, and so on. Ideally we would just move the disks from the old server to the new one, but that is not an option for a number of reasons (an important one: the old disks have already been in service for a couple of years, and nobody knows how much longer they will last).

OK, so what can we do?
We have a SAN, which we can use instead of local disks! In my environment (more on it below) this immediately brings a lot of advantages. The server, as a logical unit, is no longer tied to a physical box!

So, a little about the hardware environment:

We have an enclosure with HP ProLiant BL465c G7 blades; the blade network infrastructure is provided by Flex-10 modules. The blades have Emulex FC adapters installed. All of this is connected to Gigabit Ethernet switches over copper and to Brocade switches over fiber. The storage is an HP EVA SAN.

Using Flex-10 makes an admin's life much easier in general and this task in particular. The low-level network settings are managed through the Flex-10 interface. What matters here is that, first, each box's settings are collected in a dedicated profile that can be quickly and easily assigned to any box in the enclosure, and second, important things such as the MAC addresses of the network cards and the WWPNs of the FC adapters can be made virtual and stored in the profile!
In other words, with a flick of the wrist we can move a server from one blade to another with its environment fully preserved!

So if the blade hardware has a problem, restarting the server only takes moving the profile to another blade and booting it! From the point of view of the OS and the applications this looks like a power reset, but nothing has to be reconfigured. Isn't that almost a cluster?

But back to our specific task. We need to configure the hardware so that RHEL (we are, of course, talking about Red Hat Linux as the enterprise solution) can boot from the SAN. That means at least the root and boot partitions must be on external storage. Another important point: there is more than one path to the storage, so we will also use multipath (we want reliability, after all!). In my case there are 8 paths (4 EVA ports times 2 ports on the blade side).

BIOS setup:

When the server boots, press F9 to enter the BIOS setup. Go to Boot Controller Order and put the Emulex cards first and second. This is a VERY important step, because otherwise it will not work! I lost a day before figuring out why the OS installed, the storage was visible, and rescue mode booted, yet the boot loader (grub) crashed.
This setting has been observed to reset itself on blades sometimes when a profile is moved. So if a box shows the Illegal opcode "red screen" or a message that it cannot boot, check the boot order first.

Creating a volume on the SAN

Allocate space (a virtual disk) on the SAN for the installation using the standard procedure. This must be done BEFORE configuring the BIOS of the Emulex adapters.
Important!
The volume the OS will boot from must have the lowest LUN ID. LUN 1 is best (LUN 0 is reserved by the SAN storage itself).

Configuring the Emulex HBA

When the server boots, at the moment the cards initialize, press Alt-E (as the prompt says). This opens the BIOS of the Emulex card. Both cards are configured identically.
Enter the menu of the first card.

Go into the first menu item and enable Boot from SAN.
Go into the last menu item, Configure Advanced Adapter Parameters.

Select Topology Selection.

Select Fabric Point to Point.
Back out of the nested menus to the Emulex card selection and repeat all the steps for the second card.
Exit the card setup and reboot.

Attention! The setup is not finished yet!
Enter the Emulex adapter BIOS again and select a card (port).
Now select Configure Boot Devices.

Eight devices are listed (because there are eight paths to the SAN).
We only need to configure the first four, because we are configuring one card for the four EVA ports. On the second card we also configure the first four, for eight in total.
This is enough to keep the server bootable even if one, two, three or more paths fail.
Go into the first entry. Do not rush: wait about a minute while the adapter scans the volumes available on the storage.

Four EVA ports are shown here. Select the required port. It makes sense to pick the first port for the first device, the second for the second, and so on.
So in this case we select the first entry.
After thinking for 20-30 seconds, the card offers a choice of LUN number.

The choice is made with the up and down cursor keys. That is a quirk of the firmware: you cannot type the number in.

Set the number to 1 (the number of our LUN) and press Enter.

Press Enter once more.

Select Boot this device via WWPN. The first boot device is done. Repeat the same for the next three.

Then do the same for the second card (port).

Yes, that took a while. But who said it would be easy?

Installing the OS

This is done the standard way, with one subtlety. When starting the installer you must pass the mpath parameter (either on the command line when installing from DVD, or in the pxeboot/tftpboot configuration for a network install).
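Because a forgotten mpath parameter only shows up later as a single-path install, it can help to confirm from a shell inside the running installer that the parameter actually reached the kernel command line. This is a minimal sketch; the helper name `cmdline_has_param` is made up for this example, and it assumes `/proc/cmdline` is readable in the installer environment.

```shell
# Succeed if the kernel command line string ($2) contains the parameter
# named $1, either bare ("mpath") or with a value ("mpath=...").
cmdline_has_param() {
    case " $2 " in
        *" $1 "*|*" $1="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage inside the installer shell:
#   cmdline_has_param mpath "$(cat /proc/cmdline)" || echo "installer was started without mpath"
```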

After that, there is plenty of room for creativity. Give it a try, and it will all work out!

Source
