VMware disk error checking - Only those who do nothing make no mistakes!

Use the vmkfstools command to check or repair a virtual disk if it gets corrupted.

-x|--fix [check|repair]

For example,

vmkfstools -x check /vmfs/volumes/my_datastore/my_disk.vmdk
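A minimal sketch, assuming you want to script this check rather than type it by hand. It only builds the argument list for the `-x|--fix [check|repair]` syntax shown above and does not run anything; the helper name `build_fix_command` is hypothetical.

```python
def build_fix_command(mode, vmdk_path):
    """Return the argv list for `vmkfstools -x <mode> <vmdk_path>`.

    Only the two modes documented above are accepted.
    """
    if mode not in ("check", "repair"):
        raise ValueError("mode must be 'check' or 'repair'")
    return ["vmkfstools", "-x", mode, vmdk_path]


# Build the same command as the example above.
cmd = build_fix_command("check", "/vmfs/volumes/my_datastore/my_disk.vmdk")
print(" ".join(cmd))
```

Running the check first and switching to `repair` only when the check reports a problem keeps the read-only step separate from the one that modifies the disk.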

You have probably come across hard disk failures on your physical servers. When that happens, you need to identify exactly which disk has failed. This is easy to check with hardware management tools such as HP System Management or HP iLO, or on the Hardware Status tab of the ESXi host in the vSphere Client. This post covers checking disk status from the ESXi command line on HP hardware. It walks step by step through verifying disk status on an ESXi host with the HPSSACLI utility, which is part of the HP ESXi Utilities Offline Bundle for VMware ESXi 5.x.

The HP ESXi Utilities Offline Bundle for VMware ESXi 5.x is included in the HP customized ESXi installer image. If your host was not installed from an HP customized image, you need to download and install the bundle yourself. The ZIP file contains three utilities for remote online configuration of servers: HPONCFG, HPBOOTCFG and HPSSACLI.

HPONCFG — Command line utility used for obtaining and setting ProLiant iLO configurations.
HPBOOTCFG — Command line utility used for configuring ProLiant server boot order.
HPSSACLI – Command line utility used for configuration and diagnostics of ProLiant server SmartArrays.

After downloading the HP ESXi Utilities Offline Bundle for ESXi 5.x, install it with the command below. A ZIP offline bundle is installed with -d (depot); -v is for a single VIB file.

esxcli software vib install -f -d /tmp/hp-esxi5.5uX-bundle-1.7-13.zip

You can also download the HPSSACLI utility on its own, upload the VIB file to your ESXi host, and run the command below to install it.

esxcli software vib install -f -v /tmp/hpssacli-1.60.17.0-5.5.0.vib

Once it is installed, browse to the directory /opt/hp/hpssacli/bin to verify the installation.

Check the Disk Failure Status:

Run the command below to check the status of the disks in your ESXi host. It displays the status of every physical drive in all arrays under the controller.

/opt/hp/hpssacli/bin/hpssacli controller slot=0 physicaldrive all show
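A minimal sketch of how the output of this command could be filtered for drives that are not healthy. The sample text is illustrative only: it assumes the usual `physicaldrive <id> (port ..., <type>, <size>, <status>)` line layout, and the controller model, drive IDs and statuses in it are made up. `failed_drives` is a hypothetical helper name.

```python
import re

# Illustrative output; the layout is an assumption, the values are invented.
SAMPLE = """\
Smart Array P420i in Slot 0 (Embedded)

   array A

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, Failed)
"""

DRIVE_RE = re.compile(r"physicaldrive\s+(\S+)\s+\((.*)\)")


def failed_drives(text):
    """Return (drive_id, status) for every drive whose status is not OK."""
    bad = []
    for line in text.splitlines():
        match = DRIVE_RE.search(line)
        if not match:
            continue
        drive_id, details = match.groups()
        # The status is taken to be the last comma-separated field.
        status = details.split(",")[-1].strip()
        if status != "OK":
            bad.append((drive_id, status))
    return bad


print(failed_drives(SAMPLE))
```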

That's it. Once you have identified the failed disk, you may need to generate an HP ADU (Array Diagnostics Utility) report to raise a support case with the hardware vendor. See my blog post “How to Generate HP ADU Disk Report in ESXi host” for a step-by-step guide to generating the ADU report from the ESXi command line. I hope this is informative for you. Thanks for reading! If you found it worth sharing, share it on social media.

17 Replies

Is this on a SAN/NAS or a local disk on the ESXi Server?

Jaguar

What’s running this ESXi Host?

This is on a local RAID array on the ESXi server.

ESXi is running on a dell power edge 2950 server.

Replicate your data and replace your array. As far as I know, VMFS does its own housekeeping and there is no way to force a disk check at the VMFS level. An NTFS chkdsk will only be so effective because, as you said, it's sitting on top of VMFS.

Whenever you suspect a bad block on disk in a production environment, it's always better to replace first and ask questions later.

And I would also advise staying away from RAID 5 if that is what you are using currently:

RAID 5 vs RAID 10

Wow….that seems very drastic…can I just replace a drive? Is there a way to tell which drive in the array has the bad block?

Is the hardware under any type of warranty? If so, you can probably get it replaced on that error by talking to a support person. I’ve done it — as far as seeing which one is bad, you will need to go into the RAID controller software.

Yes, the server is under warranty. Ok, I’ll see if Dell will replace the drive. Thanks

Wow….that seems very drastic…can I just replace a drive? Is there a way to tell which drive in the array has the bad block?

Doesn’t the server tell you where the error is when it tells you that there is an error?

Scott does make a point. Can you not see from the health status in vCenter which disk? You still may need the OMSA to rebuild your array and it’s nice to have available.

Jaguar

Scott Alan Miller wrote:

Josh@Acts360 wrote:

Wow….that seems very drastic…can I just replace a drive? Is there a way to tell which drive in the array has the bad block?

Doesn’t the server tell you where the error is when it tells you that there is an error?

ESXi should be able to tell you (though I've got the "Dell customized" version of ESXi installed; you can get it off VMware's site).

[Attachment: vcenterstorage.PNG, 112 KB]

Yeah, Jaguar nailed it. You should, at minimum, be able to see the state of your storage in vCenter. At that point you can identify the drive. Having OMSA on your host just makes it easier to perform some functions, like storage configuration and changes, without having to reboot and go through the BIOS to get to it.

No vCenter, all I have is the free stuff. I can see some information listed. It actually shows that there is no problem with the system… I also have an iSCSI device connected, so maybe that device is the one throwing the errors.

Oh, looks like that is it… I just looked at the event logs and I see the hard disk is disk 1, which points to the iSCSI disk. I updated the firmware on this device (a Synology DiskStation 1010+) and this fixed the issue with this device.

Log Name:      System
Source:        disk
Date:          9/17/2010 9:11:31 AM
Event ID:      51
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      FBLDC.fbdomain.local
Description:
An error was detected on device \Device\Harddisk1\DR4 during a paging operation.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
    <Provider Name="disk" />
    <EventID Qualifiers="32772">51</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2010-09-17T13:11:31.421Z" />
    <EventRecordID>177451</EventRecordID>
    <Channel>System</Channel>
    <Computer>FBLDC.fbdomain.local</Computer>
    <Security />
</System>
<EventData>
    <Data>\Device\Harddisk1\DR4</Data>
    <Binary>030080000100000000000000330004802D0100000E0000C0000000000000000000000000000000006262170000000000FFFFFFFF010000005800002100000000BB20101242032040001000003C0000000000000000000000789BF70C80FAFFFF0000000000000000909B010A80FAFFFF0000000000000000E807640000000000880000000000006407E8000000080000000000000000000000000000000000000000000000000000</Binary>
</EventData>
</Event>

[Attachment: VMWare.png, 9.18 KB]

Turns out that if I had just looked at the event log more closely, I would have noticed that the drive the event referred to was my iSCSI disk, which was offline…
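As a minimal sketch of reading such an event programmatically, the Python below pulls the event ID and the disk number out of a trimmed copy of the event XML posted above. The device path is written in the usual `\Device\HarddiskN\DRn` form, which is an assumption about the original string; the variable names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML, as in the event posted above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the event; only the fields used below are kept.
EVENT_XML = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
    <Provider Name="disk" />
    <EventID Qualifiers="32772">51</EventID>
    <Computer>FBLDC.fbdomain.local</Computer>
</System>
<EventData>
    <Data>\Device\Harddisk1\DR4</Data>
</EventData>
</Event>"""

root = ET.fromstring(EVENT_XML)
event_id = int(root.find("e:System/e:EventID", NS).text)
device = root.find("e:EventData/e:Data", NS).text

# The number after "Harddisk" is the disk number as Windows sees it.
disk_number = int(re.search(r"Harddisk(\d+)", device).group(1))

print(event_id, disk_number)
```

The disk number can then be matched against the disk list in the guest's Disk Management to see whether the warning refers to a local disk or an attached iSCSI target.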

Jaguar

should have said it was an iSCSI

Glad you got it fixed.

Since ESXi 5.1 it is possible to check VMFS for metadata inconsistencies with a tool called VOMA (vSphere On-disk Metadata Analyzer). VOMA can check VMFS3 and VMFS5 datastores.

Please note that the tool runs in read-only mode: it can only identify problems, not fix them.

Reasons to use VOMA:

occurrence of metadata errors in the vmkernel log
if you experience SAN outages
after rebuilding a RAID
if you cannot modify, erase or access files on a VMFS datastore, that is not in use by another host

Before you start VOMA from the CLI of your ESXi host, observe the following guidelines:

Shut down all virtual machines running on the VMFS datastore (or migrate them)
Make sure that the VMFS volume is not in use by other hosts (best practice: unmount the datastore on the other hosts)
Make sure that the datastore is not in use by vSphere HA for heartbeating
Make sure that the datastore is not in use by other features such as Storage I/O Control
Make sure that the volume is not a multi-extent volume

Now log on to your ESXi host and let’s take a look at the available parameters of VOMA (voma -h)

First, you need to know the path to the partition (naa.xxxxxx:1). Run the following command to display a list with Volume Name, VMFS UUID and Device Name:

esxcli storage vmfs extent list

The output should look similar to this:

If we want to scan VMLUN_01 we have to combine the Device Name (naa.60a98000646e6c…) and the partition number (1) with a “:”.

voma -m vmfs -f check -d /vmfs/devices/disks/naa.60a98000646e6c50566f6a6c6a683164:1
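The step of joining the device name and the partition number can be sketched in Python. This is an illustration only: the sample table is an assumed rendering of the `esxcli storage vmfs extent list` columns (Volume Name, VMFS UUID, Extent Number, Device Name, Partition) filled with the values from this article, and `parse_extents` and `voma_check_command` are hypothetical helper names. The script builds the command line but does not run VOMA.

```python
# Assumed layout of `esxcli storage vmfs extent list` output.
SAMPLE = """\
Volume Name  VMFS UUID                            Extent Number  Device Name                           Partition
-----------  -----------------------------------  -------------  ------------------------------------  ---------
VMLUN_01     4fa227b8-8d16cdf3-4816-984be103b9a0              0  naa.60a98000646e6c50566f6a6c6a683164          1
"""


def parse_extents(text):
    """Map volume name -> (device name, partition number)."""
    extents = {}
    for line in text.splitlines()[2:]:  # skip header and separator rows
        if not line.strip():
            continue
        # Split from the right so a volume name containing spaces survives.
        name, _uuid, _extent, device, partition = line.rsplit(None, 4)
        extents[name.strip()] = (device, int(partition))
    return extents


def voma_check_command(device, partition):
    """Build the VOMA check command for <device>:<partition>."""
    path = "/vmfs/devices/disks/{}:{}".format(device, partition)
    return ["voma", "-m", "vmfs", "-f", "check", "-d", path]


extents = parse_extents(SAMPLE)
command = voma_check_command(*extents["VMLUN_01"])
print(" ".join(command))
```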

If VOMA runs successfully, you should see something like this:

Checking if device is actively used by other hosts

Running VMFS Checker version 0.9 in check mode

Initializing LVM metadata, Basic Checks will be done

Phase 1: Checking VMFS header and resource files

Detected file system (labeled:'VMLUN_01') with UUID:4fa227b8-8d16cdf3-4816-984be103b9a0, Version 5:54

Phase 2: Checking VMFS heartbeat region

Phase 3: Checking all file descriptors.

Phase 4: Checking pathname and connectivity.

Phase 5: Checking resource reference counts.

Total Errors Found: 0
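If the check is run from a script, the summary line is the part worth extracting. A minimal sketch, assuming the output ends with a `Total Errors Found: N` line as in the run above; `total_errors` is a hypothetical helper name.

```python
import re

# Tail of the VOMA output shown above.
VOMA_OUTPUT = """\
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.
Total Errors Found: 0
"""


def total_errors(output):
    """Return the error count from the summary line, or None if it is absent."""
    match = re.search(r"Total Errors Found:\s*(\d+)", output)
    return int(match.group(1)) if match else None


print(total_errors(VOMA_OUTPUT))
```

A result of `None` means the scan did not reach its summary, for example because VOMA stopped early, and should be treated differently from a count of zero.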

What should I do if VOMA detects an error?

The tool can only find errors – but not fix them. So if VOMA detects any errors, please consult VMware support for further help.

Possible reasons/messages for stopping the VOMA scan:

If there is activity on the datastore you try to scan with VOMA, you will see the following output:

Found 1 actively heartbeating hosts on device / 1): MAC address xx:xx:xx:xx:xx:xx

VOMA stops the scan, as there is activity on the VMFS filesystem. The MAC address indicates the management interface of the ESXi host causing the activity.

Reasons for this can be:

a running VM on the scanned datastore
other hosts are accessing the datastore
vSphere HA is using the datastore for heartbeating
Storage I/O Control is turned on

Hello!
I noticed errors in the logs:

Device t10.ATA_TOSHIBA_DT01ACA200_X3FBT5LK5 performance has deteriorated. I/O latency increased from average value of 2410 microseconds to 78012 microseconds.

That got me thinking about the state of the hard drives, but the Health Status section does not show the HDDs.
As far as I know, this is because there is no driver for this controller.
I read somewhere that it has to be downloaded from the vendor and installed.
But my searches turned up nothing.
ESXi 5.5.0, 1331820
Motherboard: P8B-M with the Intel C204 controller.
Is there perhaps another way to view SMART data?

P.S. I know ESXi does not work with software RAID, but I have no intention of using it; the controller runs in SATA mode.

P.P.S.
I connected over SSH and got this with a script:

/usr/lib/vmware/vm-support/bin # ./smartinfo.sh
SMART Information for disks.

Device:  t10.ATA_____TOSHIBA_DT01ACA200_________________________________X3FBTSLKS
Parameter                     Value  Threshold  Worst
-----------------------------------------------------
Health Status                 OK     N/A        N/A
Media Wearout Indicator       N/A    N/A        N/A
Write Error Count             N/A    N/A        N/A
Read Error Count              100    16         100
Power-on Hours                99     0          99
Power Cycle Count             100    0          100
Reallocated Sector Count      100    5          100
Raw Read Error Rate           100    16         100
Drive Temperature             142    0          142
Driver Rated Max Temperature  N/A    N/A        N/A
Write Sectors TOT Count       200    0          200
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       N/A    N/A        N/A


Device:  t10.ATA_____TOSHIBA_DT01ACA200_________________________________739NE17KS
Parameter                     Value  Threshold  Worst
-----------------------------------------------------
Health Status                 OK     N/A        N/A
Media Wearout Indicator       N/A    N/A        N/A
Write Error Count             N/A    N/A        N/A
Read Error Count              100    16         100
Power-on Hours                99     0          99
Power Cycle Count             100    0          100
Reallocated Sector Count      100    5          100
Raw Read Error Rate           100    16         100
Drive Temperature             142    0          142
Driver Rated Max Temperature  N/A    N/A        N/A
Write Sectors TOT Count       200    0          200
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       N/A    N/A        N/A
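A minimal sketch of how a table like this could be screened automatically. It assumes the usual SMART convention that an attribute is failing when its normalized value is at or below its threshold, and that columns are separated by at least two spaces; the sample is a shortened copy of the output above and `failing_attributes` is a hypothetical helper name.

```python
import re

# Shortened copy of the smartinfo.sh table above.
SAMPLE = """\
Parameter                     Value  Threshold  Worst
-----------------------------------------------------
Health Status                 OK     N/A        N/A
Read Error Count              100    16         100
Reallocated Sector Count      100    5          100
Drive Temperature             142    0          142
"""


def failing_attributes(text):
    """Return names of attributes whose value is at or below the threshold."""
    failing = []
    for line in text.splitlines():
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) != 4:
            continue  # separator rows and blank lines
        name, value, threshold, _worst = parts
        # Rows with N/A or text values (including the header) are skipped.
        if value.isdigit() and threshold.isdigit() and int(value) <= int(threshold):
            failing.append(name)
    return failing


# A made-up degraded variant, only to show what a hit looks like.
DEGRADED = SAMPLE.replace("100    5", "3      5")

print(failing_attributes(SAMPLE))
print(failing_attributes(DEGRADED))
```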

date 13.04.2022

A reference list of useful VMware ESXi console commands (including ESXCLI) that are often used for troubleshooting and fine-tuning the hypervisor. The list will be extended and updated as needed.

The available ESXi console commands can be found in the /usr/sbin directory.

cd /usr/sbin; ls

Tip. Note that all ESXi commands are case-sensitive.

The full list of esxcli commands can be displayed with:

esxcli esxcli command list

First, the ESXi commands you can run over SSH.

reboot
- reboot the host
poweroff
- power off the host
esxcli system version get
- show the version (build number) of the installed VMware ESXi
uname -a
- another way to show the VMware ESXi version

vmware -vl
- one more way to show the VMware ESXi version and release

esxcli hardware pci list | more
- full information about installed PCI devices
lspci
- brief information about all installed PCI devices
esxtop
- the top-style process manager for VMware ESXi (hotkeys for switching displays: c: cpu, i: interrupt, m: memory, n: network, d: disk adapter, u: disk device, v: disk VM, p: power mgmt)
vmkerrcode -l
- decode error codes
esxcfg-nics -l
- information about network cards
esxcfg-vswitch -l
- information about virtual switches
find . -name libstorelib.so
- find the file libstorelib.so

dcui
- work with the server console over an SSH session
chkconfig -l
- daemon status
esxcli hardware memory get
- amount of installed memory
esxcli software vib list
- list of installed VIB packages
esxcli network ip connection list
- state of active connections (similar to netstat)
esxcli storage vmfs extent list
- information about mounted/attached VMFS volumes
esxcli hardware clock (get/set)
- show/set the ESXi host time
cd
- change the current directory;
cp
- copy a file. cp [file1] [file2];
find
- search for files by criteria;
ls
- list files and directories in the current or an explicitly specified directory. ls /vmfs/volumes/ Options: -l detailed information, -a show hidden files;
mkdir
- create a directory;
mv
- move or rename a file. mv [file path and name] [destination path];
ps
- information about running processes. ps -ef;
rm
- delete files;
shutdown
- shut down or reboot the server. shutdown now; shutdown -r now;
vi
- text editor;
nano
- a beginner-friendly text editor; not available on ESXi;
cat
- print the contents of a file. cat /etc/hosts;
more
- print the contents of a file one page at a time. more /etc/hosts;
man
- help for commands. man <command you have a question about>; for some commands, help is shown when the command is run without parameters;
useradd
- create a user. useradd <user name>;
passwd
- set a user's password. passwd <user name>;
esxcli storage nfs list
- list of NFS datastores mounted on the host
esxcli software vib list
- list of installed VIB packages
esxcli hardware memory get
- information about memory on the ESXi host, including total RAM
esxcli hardware cpu list
- information about the CPUs on the ESXi host
esxcli iscsi adapter list
- list of iSCSI adapters and their names
esxcli network nic list
- list of network adapters
esxcli network ip interface list
- information about the host's IP interfaces
esxcli network ip dns search list
- information about DNS settings
esxcli network ip connection list
- state of active connections (similar to netstat)
esxcli network ip neighbor list
- show the ARP table
esxcli network firewall get
esxcli network firewall ruleset list
- state of the ESXi firewall and the active rules for ports and services;
esxcli storage vmfs extent list
- information about VMFS partitions attached to the host
esxcli storage filesystem list
- mapping of VMFS volumes to devices
esxcli storage core path list
esxcli storage core device list
- information about Fibre Channel (FC) paths and devices
esxcli storage core plugin list
- list of NMP plugins loaded in the system
esxcli storage core adapter rescan
- rescan HBA adapters
esxcli vm process list
- get the ID of a virtual machine
esxcli vm process kill --type=[soft,hard,force] --world-id=WorldID
- kill the process of the virtual machine with the given ID (helps with VMs that are hung and not responding in the vSphere Client)
esxcli system welcomemsg get
esxcli system welcomemsg set
- get and change the ESXi welcome message
esxcli system settings advanced list | grep smth
- search for something in the host's Advanced Settings
esxcli hardware clock get
- current hardware time of the host
esxcli hardware bootdevice list
- device boot order
esxcli hardware pci list
- list of PCI devices
esxcli iscsi adapter discovery rediscover
- rescan iSCSI adapters
esxcli storage core adapter rescan [-A | --all]
- rescan iSCSI

Commands for working with virtual machines:

vim-cmd vmsvc/getallvms
- list information about all VMs
vim-cmd vmsvc/power.getstate 1
- whether the VM with Vmid 1 is powered on or off
vim-cmd vmsvc/power.on 1
- power on the VM with Vmid 1
vim-cmd vmsvc/power.off 1
- power off (hard) the VM with Vmid 1
vim-cmd vmsvc/power.reset 1
- reset the VM with Vmid 1 (like pressing the RESET button on a physical server)
vim-cmd vmsvc/power.shutdown 1
- gracefully shut down the VM with Vmid 1. Works only if VMware Tools is installed!
vim-cmd vmsvc/power.reboot 1
- reboot the VM with Vmid 1. Works only if VMware Tools is installed!
vim-cmd vmsvc/get.summary 1
- get full information about the VM with Vmid 1.
vim-cmd vmsvc/get.summary 1 | egrep '(name|power|ip)'
- get filtered information about the VM with Vmid 1: name, power state, IP address
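Because `power.shutdown` depends on VMware Tools while `power.off` does not, a script that stops VMs has to choose between them. A minimal sketch of that choice in Python; `stop_vm_command` is a hypothetical helper, and how you learn whether Tools is installed (for example from the `get.summary` output) is left out. The function only builds the command and runs nothing.

```python
def stop_vm_command(vmid, tools_installed):
    """Pick the vim-cmd call used to stop the VM with the given Vmid.

    A graceful guest shutdown is preferred; a hard power-off is the
    fallback when VMware Tools is not available in the guest.
    """
    action = "power.shutdown" if tools_installed else "power.off"
    return ["vim-cmd", "vmsvc/{}".format(action), str(vmid)]


print(" ".join(stop_vm_command(1, tools_installed=True)))
print(" ".join(stop_vm_command(1, tools_installed=False)))
```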

vim-cmd vmsvc

Running this command shows all of its available subcommands. Below is a list of the ones I found useful:

vim-cmd vmsvc/power.getstate <vmid>
- power state of the virtual machine with the given ID. You can list the VMs and their IDs with vim-cmd vmsvc/getallvms;
vim-cmd vmsvc/power.off <vmid>
- power off the virtual machine;
vim-cmd vmsvc/power.on <vmid>
- power on the virtual machine;
vim-cmd vmsvc/power.reboot <vmid>
- reboot the virtual machine;
vim-cmd vmsvc/destroy <vmid>
- delete the virtual machine's files;
vim-cmd vmsvc/power.shutdown <vmid>
- shut down the virtual machine (shutdown guest);
vim-cmd vmsvc/power.reset <vmid>
- reset the virtual machine;
vim-cmd vmsvc/get.summary <vmid>
- general information about the virtual machine;
vim-cmd solo/registervm /vmfs/vol/datastore/dir/vm.vmx
- register the virtual machine;
vim-cmd vmsvc/unregister <vmid>
- remove the virtual machine from the hypervisor inventory;
vim-cmd vmsvc/tools.install <vmid>
- install VMware Tools;
vim-cmd hostsvc/net/info
- hypervisor network information;
vim-cmd hostsvc/maintenance_mode_enter
- put the host into maintenance mode;
vim-cmd hostsvc/maintenance_mode_exit
- exit maintenance mode;
chkconfig -l
- show the services running on the hypervisor;
esxtop
- list of processes;
vmkerrcode -l
- list vmkernel error codes;
esxcfg-info
- view information about the host;
esxcfg-nics -l
- view information about network adapters;
esxcfg-vswitch -l
- view information about virtual switches;
dcui
- the ESXi start console over SSH;
vsish
- VMware interactive shell;
cat /etc/chkconfig.db
- view the state of services on the host;
/sbin/services.sh restart
- restart all services on the host;
vmkload_mod --list
- show loaded drivers;
vmkload_mod -s /mod/your_driver
- show driver parameters;
vmkfstools -i /vmfs/volumes/san_vmfs/my_vm/large_disk.vmdk -d thin /vmfs/volumes/san_vmfs/my_vm/new_thin_disk.vmdk
- convert an existing disk to thin format;

The procedure below documents the commands necessary to run a check of the system partitions of ESXi. The below image shows the output of fdisk -l and the partitions which will be checked are circled. The 2 partitions consisting of 49136 blocks are the Hypervisor1 and Hypervisor2 partitions. These are mounted by ESXi as /bootbank and /altbootbank and store the firmware which ESXi boots with. A system backup file state.tgz (local.tgz for ESXi Embedded) is also stored on these partitions.

ESXi will read /bootbank when booting and will then back up its configuration once per hour. The last partition consisting of 552944 blocks is Hypervisor3 and is mounted as /store by ESXi. This partition is used to store items like download files for the VI client, VMware Tools ISOs for VMs, and configuration and system files for the vCenter Server agent and the HA agent.

The last partition, circled first in the image, exists only with ESXi Installable. This partition is mounted as /scratch and is where ESXi places the userworld swap file. It corresponds to the location set by the Advanced Setting ScratchConfig.ConfiguredScratchLocation.

While not necessary for this procedure, you can use the commands esxcfg-vmhbadevs and ls to link the partitions shown by fdisk to the mounts ESXi has made to determine which partition is /altbootbank and which is /bootbank.

~ # esxcfg-vmhbadevs -f

[2009-03-19 01:19:03 'StorageInfo' warning] Skipping dir: /vmfs/volumes/0451af74-f19fbb7e-e274-97e1e6858ec4. Cannot open volume: /vmfs/volumes/0451af74-f19fbb7e-e274-97e1e6858ec4
vmhba1:0:0:8 /vmfs/devices/disks/vmhba1:0:0:8 e0a264ee-3bc421b8-cdd5-3a5cb7c2a09f
vmhba1:0:0:2 /vmfs/devices/disks/vmhba1:0:0:2 488fb202-34873070-edd2-00096b63ac0a
vmhba1:0:0:5 /vmfs/devices/disks/vmhba1:0:0:5 9820ef76-fed75a33-f596-a0e3aa642c3a
~ # ls -l | grep vmfs

l--------- 0 root root 1984 Jan 1 1970 altbootbank -> /vmfs/volumes/0451af74-f19fbb7e-e274-97e1e6858ec4
l--------- 0 root root 1984 Jan 1 1970 bootbank -> /vmfs/volumes/9820ef76-fed75a33-f596-a0e3aa642c3a
l--------- 0 root root 1984 Jan 1 1970 scratch -> /vmfs/volumes/488fb202-34873070-edd2-00096b63ac0a
l--------- 0 root root 1984 Jan 1 1970 store -> /vmfs/volumes/e0a264ee-3bc421b8-cdd5-3a5cb7c2a09f
drwxr-xr-x 1 root root 512 Jan 9 02:35 vmfs
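Linking the two listings is a join on the volume UUID. A minimal sketch in Python using the values from the output above (the `ls` lines are trimmed to the `name -> target` part); `uuid_to_device` and `mount_to_device` are hypothetical helper names.

```python
# Device lines from `esxcfg-vmhbadevs -f` above.
VMHBADEVS = """\
vmhba1:0:0:8 /vmfs/devices/disks/vmhba1:0:0:8 e0a264ee-3bc421b8-cdd5-3a5cb7c2a09f
vmhba1:0:0:2 /vmfs/devices/disks/vmhba1:0:0:2 488fb202-34873070-edd2-00096b63ac0a
vmhba1:0:0:5 /vmfs/devices/disks/vmhba1:0:0:5 9820ef76-fed75a33-f596-a0e3aa642c3a
"""

# Symlink lines from `ls -l | grep vmfs` above, trimmed.
LS_OUTPUT = """\
altbootbank -> /vmfs/volumes/0451af74-f19fbb7e-e274-97e1e6858ec4
bootbank -> /vmfs/volumes/9820ef76-fed75a33-f596-a0e3aa642c3a
scratch -> /vmfs/volumes/488fb202-34873070-edd2-00096b63ac0a
store -> /vmfs/volumes/e0a264ee-3bc421b8-cdd5-3a5cb7c2a09f
"""


def uuid_to_device(text):
    """Map volume UUID -> partition name from the esxcfg-vmhbadevs lines."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1].startswith("/vmfs/devices/disks/"):
            mapping[fields[2]] = fields[0]
    return mapping


def mount_to_device(ls_text, devs_text):
    """Map mount name -> partition, or None if the volume was not listed."""
    by_uuid = uuid_to_device(devs_text)
    result = {}
    for line in ls_text.splitlines():
        if " -> /vmfs/volumes/" not in line:
            continue
        left, target = line.split(" -> ", 1)
        name = left.split()[-1]
        uuid = target.strip().rsplit("/", 1)[-1]
        result[name] = by_uuid.get(uuid)
    return result


mapping = mount_to_device(LS_OUTPUT, VMHBADEVS)
for name, device in sorted(mapping.items()):
    print(name, device)
```

In this listing altbootbank has no match, because its volume is the one the warning line reports as "Cannot open volume".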

Once you have identified the partitions to check, you can use the dosfsck command to check a partition. The command has a number of options, but you must at least specify the disk to check. The first example also includes the -v option, which provides verbose output. The -a option will automatically try to correct any issues.

dosfsck -v /dev/disks/vmhba1:0:0:5

dosfsck 2.11 (12 Mar 2005)
dosfsck 2.11, 12 Mar 2005, FAT32, LFN
Checking we can access the last sector of the filesystem
Boot sector contents:
System ID "mkdosfs"
Media byte 0xf8 (hard disk)
512 bytes per logical sector
1024 bytes per cluster
2 reserved sectors
First FAT starts at byte 1024 (sector 2)
2 FATs, 16 bit entries
98304 bytes per FAT (= 192 sectors)
Root directory starts at byte 197632 (sector 386)
512 root directory entries
Data area starts at byte 214016 (sector 418)
48927 data clusters (50101248 bytes)
32 sectors/track, 64 heads
0 hidden sectors
98272 sectors total
Checking for unused clusters.
/dev/disks/vmhba1:0:0:5: 10 files, 37485/48927 clusters
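The figures in this report can be cross-checked with a little arithmetic. A minimal sketch in Python that parses the summary line and confirms that 48927 clusters of 1024 bytes give the 50101248 bytes reported above; `cluster_usage` is a hypothetical helper name.

```python
import re


def cluster_usage(summary_line):
    """Parse 'N files, USED/TOTAL clusters' into (files, used, total, percent)."""
    match = re.search(r"(\d+) files, (\d+)/(\d+) clusters", summary_line)
    if not match:
        return None
    files, used, total = (int(group) for group in match.groups())
    return files, used, total, round(100.0 * used / total, 1)


SUMMARY = "/dev/disks/vmhba1:0:0:5: 10 files, 37485/48927 clusters"
files, used, total, percent = cluster_usage(SUMMARY)

# Data area size: cluster count times bytes per cluster, as listed above.
data_bytes = total * 1024

print(files, used, total, percent, data_bytes)
```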

You can also use the -V option to run a verification pass of a partition or the -t option to test for bad sectors (this also requires the -a (automatically repair) or -r (interactively repair) options).

dosfsck -t -r /dev/disks/vmhba1:0:0:2

dosfsck 2.11, 12 Mar 2005, FAT32, LFN
Seek to 2147491840:Success

dosfsck -V /dev/disks/vmhba1:0:0:2

dosfsck 2.11, 12 Mar 2005, FAT32, LFN
Starting check/repair pass.
Starting verification pass.
/dev/disks/vmhba1:0:0:2: 8 files, 16390/65515 clusters

All the options for the command dosfsck are shown below.

dosfsck
usage: dosfsck [-aAflrtvVwy] [-d path -d ...] [-u path -u ...] device
-a automatically repair the file system
-A toggle Atari file system format
-d path drop that file
-f salvage unused chains to files
-l list path names
-n no-op, check non-interactively without changing
-r interactively repair the file system
-t test for bad clusters
-u path try to undelete that (non-directory) file
-v verbose mode
-V perform a verification pass
-w write changes to disk immediately
-y same as -a, for compat with other *fsck

ESXi chkdsk equivalent?

Thread starter: Zarathustra[H]
Start date: Oct 4, 2012

Hey all,

Had an odd issue on my ESXi 5.1 server yesterday.

I had 4 guests running (Ubuntu 12.04 headless server, pfSense, Windows XP and Windows Vista.)

Suddenly performance became terrible. The client became mostly unresponsive, but would respond in fits and starts. I was able to reboot my Ubuntu guest, and upon boot it was complaining about read errors from /sda, which is just a standard vmware image file on my datastore. (an SSD where all my guest images are stored)

Uh oh, I thought, I must have a failing drive. At this point I had lost the ability to control the server via the vSphere Client, so I decided to reboot the server by either plugging in a keyboard locally and forcing a restart or just cutting power to it and starting it back up again.

When the server started back up, everything worked like normal again. There were no signs of the drive read issues I had had prior to the reboot. All of my guests, except my Vista guest, booted back up normally.

My Vista guest on the other hand was borked.

The Vista boot loader could not find an OS to boot, and booting from the Vista CD to do a repair did not help either. It's like the Vista partition inside the VMware image file just decided to empty itself.

So, not sure what caused all this, but I suspect a drive problem.

Is there a good way to run some sort of disk diagnostic on datastores within ESXi? How do I go about doing that?

Thanks,
Matt

Hagrid

Can you boot a linux cd and run a disk utility from it?

Can you boot a linux cd and run a disk utility from it?

I could.

What file system does ESXi format its local datastores in?

dasaint

The filesystem is VMFS, and as far as I am aware there are no scandisk-type tools for it…

As a precaution I would consider migrating the VM data ASAP and then running HDD tests with the vendor's software. You could use esxtop to see if the drive is acting abnormally, but I would also caution that if the drive is going bad and is causing data corruption, the first priority is to get those VMs safe ASAP…

Are you certain you were not over-committed in resources (i.e. have 10GB allocated to VMs and being used, where as host only has 8GB as an example)? What do your log files show you?

5.1?

Run:
voma -m vmfs -f check -d /vmfs/devices/disks/NAA_OF_YOUR_DATASTORE:YOUR_PARTITION_NUMBER

Paste output here.

esxcfg-scsidevs -m will identify the naa -> datastorename mappings.

5.1?

Run:
voma -m vmfs -f check -d /vmfs/devices/disks/NAA_OF_YOUR_DATASTORE:YOUR_PARTITION_NUMBER

Paste output here.

esxcfg-scsidevs -m will identify the naa -> datastorename mappings.

Thank you.

I’ll try this.

While I am rather experienced in unix-like consoles, ESXi seems to have changed a lot, and I haven’t had the opportunity to play around in there yet.

dasaint

Oh LOP go enjoy Barcelona!!! TRADE???

5.1?

Run:
voma -m vmfs -f check -d /vmfs/devices/disks/NAA_OF_YOUR_DATASTORE:YOUR_PARTITION_NUMBER

Paste output here.

esxcfg-scsidevs -m will identify the naa -> datastorename mappings.

Question,

Do I have to enter Maintenance mode, or shut down my guests on the drive in order to do this, or can I run it with everything on?

dasaint

VOMA – vSphere On-disk Metadata Analyzer

VOMA is a new customer facing metadata consistency checker tool, which is run from the CLI of ESXi 5.1 hosts. It checks both the Logical Volume Manager (LVM) and VMFS for issues. It works on both VMFS-3 & VMFS-5 datastores. It runs in a check-only (read-only) mode and will not change any of the metadata. There are a number of very important guidelines around using the tool. For instance, VMFS volumes must not have any running VMs if you want to run VOMA. VOMA will check for this and will report back if there are any local and/or remote running VMs. The VMFS volumes can be mounted or unmounted when you run VOMA, but you should not analyze the VMFS volume if it is in use by other hosts.

If you find yourself in the unfortunate position of suspecting data corruption on your VMFS volume, prepare to do a restore from backup, or look to engage with a 3rd party data recovery organization if you do not have backups. VMware support will be able to help in diagnosing the severity of any suspected corruption issues, but they are under no obligation to recover your data.

I’m sure you will agree that this is indeed a very nice tool to have at your disposal.

See http://cormachogan.com/2012/09/04/vsphere-5-1-storage-enhancements-part-1-vmfs-5/

Joined: Oct 11, 2001

Messages: 33,250

Zarathustra[H] said:

Question,

Do I have to enter Maintenance mode, or shut down my guests on the drive in order to do this, or can I run it with everything on?

Having all guests shut down is better than not. You’ll get a LOT of spurious messages if they’re up and running, but I know to discount those.

Joined: Oct 11, 2001

Messages: 33,250

Forgot — we added the flag to only allow if quiesced.
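For anyone who wants to confirm nothing is running from a datastore before trying VOMA, here is a rough sketch of a pre-check. It assumes `esxcli vm process list` is available on the host and that each running VM's entry includes a `Config File:` line with the full path to its .vmx file. The function name `running_vms_on` is made up for illustration, and this is not something taken from the posts above.

```shell
# running_vms_on DATASTORE_DIR
# Reads `esxcli vm process list`-style text on stdin and prints the .vmx path
# of every running VM whose config file lives under /vmfs/volumes/DATASTORE_DIR/.
# DATASTORE_DIR is the directory name under /vmfs/volumes (assumed here to be
# the UUID form, since that is what the Config File path is assumed to use).
# Prints nothing when no running VM matches.
running_vms_on() {
    awk -v prefix="/vmfs/volumes/$1/" '
        /Config File:/ {
            sub(/^ *Config File: */, "")        # keep only the path
            if (index($0, prefix) == 1) print   # path starts with the prefix
        }'
}
```

On a host this could be used as `esxcli vm process list | running_vms_on 505d81bc-9505cd49-2010-6805ca018ab0`. Empty output suggests this host is not running any VM from that datastore; it says nothing about VMs run by other hosts that share the volume.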

Joined: Oct 29, 2000

Messages: 35,405

Awesome.

Didn’t even need to look up NAA’s.

ESXi was kind enough to add a symlink with the configured datastore name in /vmfs/volumes pointing to it.

I am getting an error message when running this command though:

 # voma -m vmfs -f check -d /vmfs/volumes/505d81bc-9505cd49-2010-6805ca018ab0
Checking if device is actively used by other hosts
         ERROR: Failed to check for heartbeating hosts on device'/vmfs/volumes/505d81bc-9505cd49-2010-6805ca018ab0'

Advice?

I have two drives in the system, and get the same error on both…

Thanks,
Matt

Joined: Oct 29, 2000

Messages: 35,405

FWIW, rebooting the server or entering maintenance mode does not appear to have any effect.

I get the same error message no matter what.

Joined: Oct 11, 2001

Messages: 33,250

Any other hosts talking to that LUN?

Edit:
Oh, you’re running it on the UUID. You have to run it on the device path, /vmfs/devices/disks/naa:1 (that is, the datastore’s NAA device name followed by the partition number).
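To tie the two hints together (the `esxcfg-scsidevs -m` mapping and the device path under /vmfs/devices/disks), here is a small sketch. It assumes `esxcfg-scsidevs -m` prints one line per VMFS extent, with the device path in the second column and the datastore name in the last column. The helper names `device_for_datastore` and `voma_check_cmd` are hypothetical. The second helper only prints the command, so it can be reviewed before being run.

```shell
# device_for_datastore NAME
# Reads `esxcfg-scsidevs -m`-style lines on stdin and prints the device path
# (second column) of every line whose last column equals NAME.
# Assumes the datastore name contains no spaces.
device_for_datastore() {
    awk -v name="$1" '$NF == name { print $2 }'
}

# voma_check_cmd DEVICE_PATH
# Prints (does not run) the read-only VOMA check command for that device.
voma_check_cmd() {
    printf 'voma -m vmfs -f check -d %s\n' "$1"
}
```

On the host, something like `voma_check_cmd "$(esxcfg-scsidevs -m | device_for_datastore datastore1)"` would print the line to run, with `datastore1` replaced by the real datastore name.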


Source

ESXi chkdsk equivalent? (17 replies)

Is this on a SAN/NAS or a local disk on the ESXi Server?

Jaguar wrote:

What’s running this ESXi Host?

Josh@Acts360 wrote:

This is on a local RAID array on the ESXi server.

ESXi is running on a Dell PowerEdge 2950 server.

Replicate your data and replace your array. As far as I know, VMFS does its own housekeeping and there is no way to force a disk check at the VMFS level. An NTFS chkdsk will only be so effective because, as you said, it’s sitting on top of VMFS.

Whenever you suspect a bad block on disk in a production environment, it’s always better to replace first and ask questions later.

I would also advise staying away from RAID 5, if that is what you are currently using:

RAID 5 vs RAID 10

Josh@Acts360 wrote:

Wow….that seems very drastic…can I just replace a drive? Is there a way to tell which drive in the array has the bad block?

Is the hardware under any type of warranty? If so, you can probably get it replaced on that error by talking to a support person. I’ve done it — as far as seeing which one is bad, you will need to go into the RAID controller software.

Josh@Acts360 wrote:

Yes, the server is under warranty. Ok, I’ll see if Dell will replace the drive. Thanks

Scott Alan Miller wrote:

Doesn’t the server tell you where the error is when it tells you that there is an error?

Scott does make a point. Can you not see from the health status in vCenter which disk? You still may need the OMSA to rebuild your array and it’s nice to have available.

Jaguar wrote:

ESXi should be able to tell you (though I’ve got the "Dell customized" version of ESXi installed; you can get it off VMware’s site).

Attachment: vcenterstorage.PNG (112 KB)

Yeah, Jaguar nailed it. You should, at minimum, be able to see the state of your storage in vCenter. At that point you can identify the drive. Having OMSA on your host just makes it easier to perform some functions, like storage configuration and changes, without having to reboot and go through the BIOS to get to it.

Josh@Acts360 wrote:

No vCenter; all I have is the free stuff. I can see some information listed. It actually shows that there is no problem with the system… I also have an iSCSI device connected, so maybe that device is the one throwing the errors.

Oh, looks like that is it… I just looked at the event logs and I see the hard disk is disk 1, which points to the iSCSI disk. I updated the firmware on this device (a Synology DiskStation 1010+) and this fixed the issue with this device.

Log Name:      System
Source:        disk
Date:          9/17/2010 9:11:31 AM
Event ID:      51
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      FBLDC.fbdomain.local
Description:
An error was detected on device \Device\Harddisk1\DR4 during a paging operation.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
    <Provider Name="disk" />
    <EventID Qualifiers="32772">51</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2010-09-17T13:11:31.421Z" />
    <EventRecordID>177451</EventRecordID>
    <Channel>System</Channel>
    <Computer>FBLDC.fbdomain.local</Computer>
    <Security />
</System>
<EventData>
    <Data>\Device\Harddisk1\DR4</Data>
    <Binary>030080000100000000000000330004802D0100000E0000C0000000000000000000000000000000006262170000000000FFFFFFFF010000005800002100000000BB20101242032040001000003C0000000000000000000000789BF70C80FAFFFF0000000000000000909B010A80FAFFFF0000000000000000E807640000000000880000000000006407E8000000080000000000000000000000000000000000000000000000000000</Binary>
</EventData>
</Event>

Attachment: VMWare.png (9.18 KB)

Turns out that if I had just looked at the event log more closely, I would have noticed that the drive the event referred to was pointing to my iSCSI device, which was offline…

Jaguar wrote:

Should have said it was iSCSI.

Glad you got it fixed.

Source
