SATA PHY error - Only those who do nothing make no mistakes!

Posted: 05.09.2005 22:35

WHEN ASKING FOR HELP, POST THE S.M.A.R.T. DATA OF THE PROBLEM DRIVE!

You can view it with Everest, AIDA 64, Victoria 4.x, Dtemp, HDDScan, HD Tune, Crystal Disk Info, SpeedFan… Pay attention to the DATA/RAW values; they are the main indicators of drive health.

>>>When using Crystal Disk Info, go to Function > Advanced Feature > Raw Values and choose "10 [DEC]"; this makes the utility's output somewhat easier for forum members to read.<<<

<<Screenshots>>

When posting screenshots, remember the restrictions imposed by item 3.12 of the forum rules, which forbids: "Placing images totalling more than 500 kB per message inside «Img» tags. Images up to 2 MB are allowed under the «spoiler=» tag, as are direct links to images of any size. Links to pages where the image is displayed among advertisements are prohibited; sites that use them are blocked by the auto-censor."
A few words on how to create screenshots PROPERLY for posting on the forum: http://forums.overclockers.ru/viewtopic … 4&t=373001

For a better understanding of the subject, see the information on the first page of the topic, compiled by comrade Ing-Syst.

A very detailed article on ixbt.com can also help you make sense of SMART readings: "Оцениваем состояние дисков при помощи S.M.A.R.T." (Assessing drive health with S.M.A.R.T.)

Solving your problem may require running a series of procedures with the Victoria and MHDD utilities. Links to instructions for these programs can be found on the first page of the topic.

Related topics

[FAQ] All about Western Digital hard drives
[FAQ] Seagate hard drives
[FAQ] All about Hitachi hard drives
[FAQ] All about Samsung hard drives

Data recovery
HDD repair

Seagate has officially acknowledged the problem with the 7200.11

Useful posts by participants of this topic:

Resetting some SMART attributes on Samsung hard drives
How to find the files that occupy sectors pending a remap.
Disabling head parking on Seagate 7200.14 drives without batch files or autostart entries (needs verification)
Remap and Advanced remap, erase and erase delays: what these Victoria commands do (1, 2)
Switching the controller mode from IDE to AHCI with an operating system already installed: Win 7, Win XP

ShutUp is a program by comrade CoolCMD that prevents frequent HDD head parking.

https://disk.yandex.ru/d/x3UITAgo3EGqub

The program reads one sector at a user-defined interval.
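The idea behind such a tool can be sketched in a few lines. This is not ShutUp's actual code, which is not shown in this topic; it is only a rough illustration that assumes a 512-byte sector and a readable file or device path on the drive in question. The names `read_one_sector` and `keep_awake` are made up for the example. Note that the operating system cache may answer repeated reads without touching the disk, so a real tool would need to bypass it.

```python
import os
import time

SECTOR_SIZE = 512  # assumption: classic 512-byte logical sector


def read_one_sector(path, offset=0, sector_size=SECTOR_SIZE):
    """Read a single sector-sized chunk from a file or device path."""
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, sector_size)
    finally:
        os.close(fd)


def keep_awake(path, interval_s, rounds, sleep=time.sleep):
    """Touch the drive `rounds` times, waiting `interval_s` seconds between reads.

    Returns the total number of bytes read, which is handy for testing.
    """
    total = 0
    for _ in range(rounds):
        total += len(read_one_sector(path))
        sleep(interval_s)
    return total
```

For example, `keep_awake("D:/keepalive.bin", 5, 1000)` would read the (hypothetical) file every 5 seconds; the interval should be shorter than the drive's idle timer before parking.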

R.baza: inventory and search for hard drive spare parts.

Last edited by KT on 29.11.2021 18:36; edited 15 times in total.


vensant_jarden

Sania., I see. So I should just install it and, if no obvious problems come up, keep an eye on the situation over time.
A silly question: could installing a new SATA driver accidentally cause problems that lead to data loss on the drives?

Sania.

Yes, it is a silly question. If you install a video card driver, does that stop the card from working? If a driver could cause that, don't you think you would have been warned, and that everyone else would have sued Intel over such a broken driver?

Sinestery

Tomset wrote:

It died, and its SMART has clearly been wiped.

What exactly is wrong?
Here, for example, is a fully working drive of the same model

Attachment: 2d.PNG [ 112.89 KB ]

Sania.

Sinestery wrote:

What exactly is wrong?

Half of the SMART readout is nonsense; use an up-to-date SMART reading program.

kolyan1980-08-11

userID, I'm not claiming anything, but it did help me once.

_________________
i5 9600k/PrimeZ390-A/4*8GB@3GHz/AsusTurboRTX3070/Samsung830(256GB)/M9PeGN(512GB)/6TB+6TB+4TBhdd/HAF932/RM650x(CP-9020091)/SonyKDL-42W705B/Win10insider

RuckusDJ

Hello!
Not long ago I posted here about a problem with a 2 TB WD Black. I was advised to copy off the data I needed and then run a remap in Victoria or do a complete format. I did a full format with HDD Low Level Format. The system no longer asks me to fix errors, but the SMART data scares me. Is it safe to throw the drive away?

Attachment: Снимок.JPG [ 129.13 KB ]

fixit

RuckusDJ wrote:

Is it safe to throw the drive away?

Now run a remap under DOS

Sania.

Arrange some cooling for it; it is currently running at 52 °C, which is very bad.

RuckusDJ

Sania.
I took the screenshot right after formatting.
fixit
The remap in Victoria did not help.

Sania.

Do not let the drive overheat under any circumstances.
According to SMART, there is nothing to remap yet.
What else don't you like about it?

7Gluk7

Good day, everyone!
My KINGSTON SHSS37A240G SSD has started freezing during writes. The drive is used as the system disk.
Here are the SMART data and the write graph:

Code:

ID  Attribute                        Threshold  Value  Worst  Data        Status
01  Raw Read Error Rate              50         100    100    0           OK: Value is normal
02  Throughput Performance           50         100    100    0           OK: Value is normal
03  Spinup Time                      50         100    100    0           OK: Value is normal
05  Reallocated Sector Count         50         100    100    0           OK: Value is normal
07  Seek Error Rate                  50         100    100    0           OK: Value is normal
08  Seek Time Performance            50         100    100    0           OK: Value is normal
09  Power-On Hours Count             0          100    100    14582       OK: Always passes
0C  Power Cycle Count                0          100    100    1559        OK: Always passes
A8  SATA PHY Error Count             0          100    100    108         OK: Always passes
AA  Bad Block Count (Early / Later)  10         100    100    315 / 0     OK: Value is normal
AD  Erase Count (Average / Max)      0          100    100    59 / 144    OK: Always passes
AF  Bad Cluster Table Count          50         100    100    0           OK: Value is normal
BB  Reported Uncorrectable Errors    0          100    100    0           OK: Always passes
C0  Unsafe Shutdown Count            0          100    100    354         OK: Always passes
C2  Temperature                      30         64     56     44, 17, 36  OK: Value is normal
C4  Later Bad Block Count            10         100    100    0           OK: Value is normal
C5  Current Pending Sector Count     0          100    100    0           OK: Always passes
C7  CRC Error Count                  50         100    100    47          OK: Value is normal
DA  CRC Error Count                  50         100    100    47          OK: Value is normal
E7  SSD Life Left                    0          100    100    99%         OK: Always passes
E9  Lifetime Writes to Flash         0          100    100    11.47 TB    OK: Always passes
F0  Write Head                       0          100    100    0           OK: Always passes
F1  Host Writes (Sector Unit)        0          100    100    6.27 TB     OK: Always passes
F2  Host Reads (Sector Unit)         0          100    100    12.94 TB    OK: Always passes
F4  Average Erase Count              0          100    100    59          OK: Always passes
F5  Max Erase Count                  0          100    100    144         OK: Always passes
F6  Total Erase Count                0          100    100    4122880     OK: Always passes

Attachment: ssd.png [ 37.62 KB ]

I did not save the read graph, but it was flat at 540 MB/s.
Ignore SATA PHY Error Count: there was a bad cable at the very beginning, and this value has not changed over the last month.
The warranty expired in September.
What can you advise?

Last edited by 7Gluk7 on 20.01.2020 15:24; edited 1 time in total.
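For anyone who wants to track the link-related counters from a listing like the one above (and check whether they really stay unchanged), here is a rough parsing sketch. It assumes one attribute per line in the order ID, name, threshold, value, worst, raw data, status, with the status starting with "OK:"; other utilities format their output differently. The function names and the choice of watched IDs (A8, C7, DA) are illustrative only.

```python
import re

# One row: hex ID, attribute name, threshold, value, worst, raw data, status.
ROW = re.compile(
    r"^(?P<id>[0-9A-F]{2})\s+(?P<name>.+?)\s+(?P<threshold>\d+)\s+(?P<value>\d+)"
    r"\s+(?P<worst>\d+)\s+(?P<raw>.+?)\s+(?P<status>OK:.*)$"
)

# A few rows copied from the post above.
SAMPLE = """\
05 Reallocated Sector Count 50 100 100 0 OK: Value is normal
A8 SATA PHY Error Count 0 100 100 108 OK: Always passes
AA Bad Block Count (Early / Later) 10 100 100 315 / 0 OK: Value is normal
C2 Temperature 30 64 56 44, 17, 36 OK: Value is normal
C7 CRC Error Count 50 100 100 47 OK: Value is normal
"""


def parse_smart(text):
    """Return {attribute ID: dict of fields} for every row that matches ROW."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows[match.group("id")] = match.groupdict()
    return rows


def first_number(raw):
    """Take the first integer found in a raw-data field such as '315 / 0'."""
    match = re.search(r"\d+", raw)
    return int(match.group()) if match else 0


def nonzero_link_errors(rows, watch=("A8", "C7", "DA")):
    """Report watched interface counters whose raw value is above zero."""
    found = {}
    for attr_id in watch:
        if attr_id in rows:
            count = first_number(rows[attr_id]["raw"])
            if count > 0:
                found[attr_id] = count
    return found


print(nonzero_link_errors(parse_smart(SAMPLE)))
```

Saving the result once a week and comparing it with the previous one shows whether the counters are still growing, which matters more than their absolute values.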

Sania.

7Gluk7 wrote:

What can you advise?

Wipe the drive down to zero and run this test from another drive.

O Smirnoff

Sania. wrote:

Wipe the drive down to zero

Secure Erase I understand; but "down to zero": where to, what for, and for whom?..

_________________
Best regards,
Oleg R. Smirnov

Sania.

Deleting the MBR is probably enough there, but the author can do whatever else he comes up with; the main thing is that the drive ends up empty.

O Smirnoff

Sania. wrote:

deleting the MBR is enough

Ah, so that is what

Sania. wrote:

Wipe the drive down to zero

actually means?
And here I was starting to wonder…

7Gluk7

Sania. wrote:

Wipe the drive down to zero and run this test from another drive.

Doesn't AIDA64 fill the drive with zeros during its write test anyway?
But I'll try again with another program.

O Smirnoff

Sania. wrote:

Not " ", but write in clear terms already; your verbiage only leads people astray… 8-)

Added 46 seconds later:

7Gluk7 wrote:

Doesn't AIDA64 fill the drive with zeros during its write test anyway?

Secure Erase is still the better option.

Sania.

7Gluk7 wrote:

Doesn't AIDA64 fill the drive with zeros during its write test anyway?

Not on a non-empty drive, one that is filled not with zeros and ones but with actual files, which Windows will not let AIDA overwrite, so that you don't end up complaining that half of Windows and your photos have vanished somewhere.

Added 2 minutes 7 seconds later:

O Smirnoff wrote:

Not " ", but write in clear terms already; your verbiage only leads people astray…

It means less typing that way, and I do need to find out how knowledgeable the person asking is.

7Gluk7

O Smirnoff wrote:

Secure Erase is still the better option.

I'll try it.

Sania. wrote:

I'm running from a LiveUSB, and I've moved Windows to a VHD for now

Last edited by 7Gluk7 on 20.01.2020 15:49; edited 1 time in total.


pinkzebra

What is this error? Phy is bad on enclosure.

HARDWARE—
Controller: Controller0: LSI MegaRAID SAS 9280-8e(Bus 5,Dev 0,Domain 0)
Status: Optimal
Firmware Package Version:12.15.0-0239
Firmware Version: 2.130.403-4660
BBU: NO
Enclosure(s): 1
Drive(s): 13
Virtual Drive(s): 3

Enclosures—
PRODUCT NAME TYPE STATUS
SAS2X28 Ses OK

Drives—
CONNECTOR          PRODUCT ID       VENDOR ID  STATE                DISK TYPE  CAPACITY    POWER STATE
null x0 & null x0  ST1000NM0001     SEAGATE    Online               SAS        931.000 GB  On
null x0 & null x0  ST1000NM0001     SEAGATE    Online               SAS        931.000 GB  On
null x0 & null x0  ST1000NM0001     SEAGATE    Online               SAS        931.000 GB  On
null x0 & null x0  ST1000NM0001     SEAGATE    Online               SAS        931.000 GB  On
null x0 & null x0  ST1000NM0001     SEAGATE    Online               SAS        931.000 GB  On
null x0 & null x0  ST1000NM0001     SEAGATE    Online               SAS        931.000 GB  On
null x0 & null x0  ST1000NM0001     SEAGATE    Online               SAS        931.000 GB  On
null x0 & null x0  MG03SCA200       TOSHIBA    Online               SAS        1.819 TB    On
null x0 & null x0  MG03SCA200       TOSHIBA    Online               SAS        1.819 TB    On
null x0 & null x0  ST1000NM00339ZM  ATA        Dedicated Hot Spare  SATA       931.000 GB  Powersave
null x0 & null x0  ST1000NM00339ZM  ATA        Dedicated Hot Spare  SATA       931.000 GB  Powersave
null x0 & null x0  MAXTORSTM316081  ATA        Online               SATA       148.531 GB  On
null x0 & null x0  MAXTORSTM316081  ATA        Online               SATA       148.531 GB  On

Stranger03 (Trinity staff) » 25 Apr 2017, 09:08

pinkzebra
I can't see a log or the error in it. Is the error on one particular drive or on all of them?

pinkzebra » 25 Apr 2017, 11:46

Stranger03 wrote: pinkzebra
I can't see a log or the error in it. Is the error on one particular drive or on all of them?

On the last 4 drives, the ones in ATA mode.
Everything works fine; the message only appears right after boot, together with a red indicator on the tray of each of the 4.
The 9 SAS drives behave properly.

gs (Trinity staff) » 25 Apr 2017, 11:48

Actually, the controller is complaining about four ports. Are they all SATA drives? Are they on the compatibility list? There are also messages there saying the drives were put into sleep mode, which does not always work smoothly either.

pinkzebra » 25 Apr 2017, 13:46

Fine, not 4 drives but 4 ports.
Yes, these 4 drives are SATA; two of them are on the compatibility list.

Stranger03 » 25 Apr 2017, 13:47

pinkzebra wrote: Fine, not 4 drives but 4 ports.
Yes, these 4 drives are SATA; two of them are on the compatibility list.

Try plugging in any SAS drive to rule out a faulty port or backplane.

pinkzebra » 27 Apr 2017, 09:08

I inserted a SAS drive into an empty tray and into a tray in place of a SATA drive; in both cases it started normally, without this error.
Conclusion: the error is caused specifically by the drives with a SATA connector…
As I understand it, the problem is in the expander? Does it need to be reflashed?

Code:

ID = 114
SEQUENCE NUMBER = 1365
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:12  Previous   =   Unconfigured Bad      Current   =   Unconfigured Good

ID = 247
SEQUENCE NUMBER = 1364
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   31

ID = 91
SEQUENCE NUMBER = 1363
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0   PD inserted:   Port B:1:12

ID = 247
SEQUENCE NUMBER = 1362
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   29

ID = 91
SEQUENCE NUMBER = 1361
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0   PD inserted:   Port B:1:10

ID = 185
SEQUENCE NUMBER = 1360
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0   Phy is bad on enclosure:   1  PHY   10

ID = 331
SEQUENCE NUMBER = 1359
TIME = 27-04-2017 10:47:42
LOCALIZED MESSAGE = Controller ID:  0  Power state change on PD   =   Port B:1:10  Previous   =   Powersave  Current   =   On

ID = 114
SEQUENCE NUMBER = 1358
TIME = 27-04-2017 10:47:25
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:10  Previous   =   Unconfigured Good      Current   =   Unconfigured Bad

ID = 248
SEQUENCE NUMBER = 1357
TIME = 27-04-2017 10:47:25
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   31

ID = 112
SEQUENCE NUMBER = 1356
TIME = 27-04-2017 10:47:25
LOCALIZED MESSAGE = Controller ID:  0   PD removed:   Port B:1:10

ID = 289
SEQUENCE NUMBER = 1355
TIME = 27-04-2017 10:47:25
LOCALIZED MESSAGE = Controller ID:  0  Redundant path broken   PD :   Port A:1:10  Path :   1  SAS Address :   0x500000E01714D813

ID = 114
SEQUENCE NUMBER = 1354
TIME = 27-04-2017 10:47:12
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:10  Previous   =   Unconfigured Bad      Current   =   Unconfigured Good

ID = 113
SEQUENCE NUMBER = 1353
TIME = 27-04-2017 10:47:12
LOCALIZED MESSAGE = Controller ID:  0   Unexpected sense:   PD       =   Port B:1:10Power on occurred,   CDB   =    0x28 0x00 0x08 0x8f 0xc1 0xcf 0x00 0x00 0x01 0x00    ,   Sense   =    0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x28 0x00 0x00 0x00 0x00 0x29 0x01 0x00 0x00 0x00 0x00 0x00 0x28 0x00 0x01 0x04 0x03 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x22 0x13 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

ID = 247
SEQUENCE NUMBER = 1352
TIME = 27-04-2017 10:47:12
LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   31

ID = 91
SEQUENCE NUMBER = 1351
TIME = 27-04-2017 10:47:12
LOCALIZED MESSAGE = Controller ID:  0   PD inserted:   Port B:1:10

ID = 331
SEQUENCE NUMBER = 1350
TIME = 27-04-2017 10:46:59
LOCALIZED MESSAGE = Controller ID:  0  Power state change on PD   =   Port B:1:9  Previous   =   Transition  Current   =   On

ID = 331
SEQUENCE NUMBER = 1349
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0  Power state change on PD   =   Port B:1:9  Previous   =   Powersave  Current   =   Transition

ID = 113
SEQUENCE NUMBER = 1348
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0   Unexpected sense:   PD       =   Port B:1:7Power on occurred,   CDB   =    0x2e 0x00 0xe8 0xe0 0x62 0x6b 0x00 0x00 0x01 0x00    ,   Sense   =    0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x28 0x00 0x00 0x00 0x00 0x29 0x01 0x00 0x00 0x00 0x00 0x00 0x2e 0x01 0x08 0x17 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x19 0x19 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 1347
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0   Unexpected sense:   PD       =   Port B:1:6Mode parameters changed,   CDB   =    0x2e 0x00 0x74 0x70 0x47 0x6b 0x00 0x00 0x01 0x00    ,   Sense   =    0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x2a 0x01 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 1346
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0   Unexpected sense:   PD       =   Port B:1:5Mode parameters changed,   CDB   =    0x2e 0x00 0x74 0x70 0x47 0x6b 0x00 0x00 0x01 0x00    ,   Sense   =    0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x2a 0x01 0x00 0x00 0x00 0x00

ID = 114
SEQUENCE NUMBER = 1345
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:10  Previous   =   Hot Spare      Current   =   Unconfigured Bad

ID = 248
SEQUENCE NUMBER = 1344
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   29

ID = 112
SEQUENCE NUMBER = 1343
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0   PD removed:   Port B:1:10

ID = 114
SEQUENCE NUMBER = 1342
TIME = 27-04-2017 10:46:03
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:11  Previous   =   Unconfigured Good      Current   =   Unconfigured Bad

ID = 248
SEQUENCE NUMBER = 1341
TIME = 27-04-2017 10:46:03
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   31

ID = 112
SEQUENCE NUMBER = 1340
TIME = 27-04-2017 10:46:03
LOCALIZED MESSAGE = Controller ID:  0   PD removed:   Port B:1:11

ID = 289
SEQUENCE NUMBER = 1339
TIME = 27-04-2017 10:46:03
LOCALIZED MESSAGE = Controller ID:  0  Redundant path broken   PD :   Port A:1:11  Path :   1  SAS Address :   0x500000E01714D813

ID = 114
SEQUENCE NUMBER = 1338
TIME = 27-04-2017 10:45:51
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:11  Previous   =   Unconfigured Bad      Current   =   Unconfigured Good

ID = 247
SEQUENCE NUMBER = 1337
TIME = 27-04-2017 10:45:51
LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   31

ID = 91
SEQUENCE NUMBER = 1336
TIME = 27-04-2017 10:45:51
LOCALIZED MESSAGE = Controller ID:  0   PD inserted:   Port B:1:11

ID = 114
SEQUENCE NUMBER = 1335
TIME = 27-04-2017 10:45:21
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:12  Previous   =   Unconfigured Good      Current   =   Unconfigured Bad

ID = 248
SEQUENCE NUMBER = 1334
TIME = 27-04-2017 10:45:21
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   31

ID = 112
SEQUENCE NUMBER = 1333
TIME = 27-04-2017 10:45:21
LOCALIZED MESSAGE = Controller ID:  0   PD removed:   Port B:1:12

ID = 289
SEQUENCE NUMBER = 1332
TIME = 27-04-2017 10:45:20
LOCALIZED MESSAGE = Controller ID:  0  Redundant path broken   PD :   Port B:1:12  Path :   0  SAS Address :   0x500000E01714D812

ID = 247
SEQUENCE NUMBER = 1331
TIME = 27-04-2017 10:45:01
LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   31

ID = 91
SEQUENCE NUMBER = 1330
TIME = 27-04-2017 10:45:01
LOCALIZED MESSAGE = Controller ID:  0   PD inserted:   Port B:1:12
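A log like this is easier to read once it is grouped by port. The sketch below is only an illustration: it assumes the exact record layout shown above (four `KEY = value` lines per event, events separated by blank lines), and the helper names are invented for the example rather than taken from any LSI tool.

```python
import re
from collections import Counter

# Three events copied from the log above.
SAMPLE = """\
ID = 185
SEQUENCE NUMBER = 1360
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0   Phy is bad on enclosure:   1  PHY   10

ID = 114
SEQUENCE NUMBER = 1358
TIME = 27-04-2017 10:47:25
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:10  Previous   =   Unconfigured Good      Current   =   Unconfigured Bad

ID = 112
SEQUENCE NUMBER = 1356
TIME = 27-04-2017 10:47:25
LOCALIZED MESSAGE = Controller ID:  0   PD removed:   Port B:1:10
"""


def parse_events(text):
    """Split the log into events; each event is a dict of its KEY = value lines."""
    events = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            # Split on the first " = " only; the message itself may contain more.
            key, sep, value = line.partition(" = ")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            events.append(fields)
    return events


def events_per_port(events):
    """Count how many events mention each 'Port X:enclosure:slot'."""
    counts = Counter()
    for event in events:
        match = re.search(r"Port [A-Z]:\d+:\d+", event.get("LOCALIZED MESSAGE", ""))
        if match:
            counts[match.group()] += 1
    return counts


def bad_phys(events):
    """Return (enclosure, phy) pairs from 'Phy is bad on enclosure' messages."""
    pairs = []
    for event in events:
        match = re.search(
            r"Phy is bad on enclosure:\s*(\d+)\s+PHY\s+(\d+)",
            event.get("LOCALIZED MESSAGE", ""),
        )
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs
```

Running `bad_phys(parse_events(log_text))` over the full log lists the PHYs the controller complained about, and `events_per_port` shows which slots generate the most insert/remove and state-change traffic.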

Stranger03 » 27 Apr 2017, 09:16

pinkzebra wrote: As I understand it, the problem is in the expander? Does it need to be reflashed?

More likely it needs to be replaced, or you should use NL SAS drives instead.

gs » 27 Apr 2017, 10:54

I rather doubt that the interface type itself is the cause. More likely it is an incompatibility between these particular SATA drives and the controller.

Stranger03 » 27 Apr 2017, 11:25

gs wrote: I rather doubt that the interface type itself is the cause. More likely it is an incompatibility between these particular SATA drives and the controller.

You can check by connecting the drives directly, without the backplane.


DISK

DISK Displays information about the disks in the system.

This command is used to change the configuration settings for the disks
in the system and monitor the status of the disk channels. The command
will display the current disk configuration settings and the status of
each disk channel. The INFO= parameter can be used to display all of
the information about a disk in the system. The LIST parameter will
display a list of the disks installed in the system and indicate how
many were found.

AGINGLIMIT=x|OFF

Sets the maximum time a command should wait in the disk command
queue.
This parameter is for Hitachi SAS drives only.
Each unit of this timer is 50 ms, where 0 is 50 ms.
Range: 0 to 4 (50 to 250 milliseconds) or OFF.
Default is 3 (200 milliseconds).
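The unit rule above (0 is 50 ms, each further step adds 50 ms) can be checked with a small helper. This is only an illustrative sketch; the function name is hypothetical and not part of the command set.

```python
def aging_limit_ms(setting):
    """Convert an AGINGLIMIT setting (0..4 or "OFF") to milliseconds.

    Returns None for OFF. The setting counts 50 ms units starting at 0 = 50 ms.
    """
    if isinstance(setting, str) and setting.strip().upper() == "OFF":
        return None
    value = int(setting)
    if not 0 <= value <= 4:
        raise ValueError("AGINGLIMIT must be 0 to 4 or OFF")
    return (value + 1) * 50
```

With the default setting of 3 this gives 200 ms, matching the text.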

AUTOREASSIGN=ON|OFF

Allows the user to turn on or off whether bad blocks will be
reassigned when a medium error occurs on a healthy tier.
Default is ON.

CMD_TIMEOUT=x

This parameter sets the retry disk timeout (in seconds) for an I/O
request. The retry timeout value indicates the maximum amount of
time that is allotted to receive a reply for each retry of an I/O
request. If the I/O request does not complete within this time, it is
aborted and potentially retried: if there is still time remaining in
the overall disk timeout to allow for another retry, it is retried;
if not, it completes with an error status.
This parameter must be smaller than or equal to TIMEOUT.
Valid range is 1 to 512 seconds.
Recommended value for SAS drives is 11 seconds.
Recommended value for SATA drives is 31 seconds. Setting the timeout
below the recommended values can cause disk failures.
Default is 31 seconds.
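The relationship between the per-retry timeout (CMD_TIMEOUT) and the overall disk timeout (TIMEOUT) can be sketched as a simple time budget. This is a simplified model, not the controller's implementation: it assumes each aborted attempt consumes a full CMD_TIMEOUT slot and that another retry is issued only while a full slot still fits in the remaining budget. The function and the `attempt` callback are hypothetical names.

```python
def issue_with_retries(attempt, cmd_timeout, timeout):
    """Model of retrying an I/O request inside an overall timeout budget.

    attempt(limit_s) must return (completed, elapsed_s); in a real system the
    request would be aborted once limit_s has passed.
    Returns ("ok" or "error", number of attempts made).
    """
    if not 1 <= cmd_timeout <= 512:
        raise ValueError("CMD_TIMEOUT must be 1 to 512 seconds")
    if cmd_timeout > timeout:
        raise ValueError("CMD_TIMEOUT must be smaller than or equal to TIMEOUT")

    remaining = timeout
    tries = 0
    # Assumption: retry only while a full retry slot still fits in the budget.
    while remaining >= cmd_timeout:
        tries += 1
        completed, elapsed = attempt(cmd_timeout)
        if completed and elapsed <= cmd_timeout:
            return "ok", tries
        remaining -= cmd_timeout  # an aborted attempt uses up its whole slot
    return "error", tries
```

Under these assumptions, a CMD_TIMEOUT of 31 seconds with an overall timeout of 93 seconds allows at most three attempts before the request completes with an error.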

DEFECTLIST[=tc]

Allows the user to display the number of defects in the defect list
for the specified disk. The defect list contains all the physical
sectors on the disk that the drive has identified as bad, and to
which the disk’s hardware prevents access. The list is classified
into two types: the permanent list and the grown list. The permanent
list consists of the bad sectors that are identified by the disk
manufacturer; the grown list consists of the bad sectors that are
found after the disk has left the factory (and which can be added to
at any time).
The disk is specified by its tier and channel locations, ‘tc’, where:
‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.
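Several commands in this section take the same ‘tc’ locator. A script that prepares such arguments might validate them first; the sketch below assumes the locator is written as the tier number immediately followed by the channel letter (for example `12A`), which the text implies but does not spell out. The name `parse_tc` is made up for the example.

```python
import re

# Tier digits followed by one channel letter from the documented set.
TC_PATTERN = re.compile(r"^(\d{1,3})([ABCDEFGHPS])$")


def parse_tc(text):
    """Split a 'tc' locator into (tier, channel) and validate both parts."""
    match = TC_PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a tier/channel locator: {text!r}")
    tier = int(match.group(1))
    if not 1 <= tier <= 128:
        raise ValueError("tier must be in the range 1..128")
    return tier, match.group(2)
```

For instance, `parse_tc("12a")` yields `(12, "A")`, while `"0A"`, `"129A"` and `"5Z"` are rejected.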

DIAG[=tc]

Performs a series of diagnostics tests on the specified disk.
The disk is specified by its physical tier and channel locations,
‘tc’, where:
‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.

FAIL[=tc]

This parameter tells the system to fail the specified disk at the
physical tier and channel locations indicated by ‘tc’, where:
‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.
When a non-SPARE disk is specified:
If failing the disk won’t cause a multi-channel failure, the disk
is marked as failed, and an attempt is made to replace it with a
spare disk.
When a SPARE disk is specified:
If the spare disk is currently in use as a replacement for a
failed disk, then the disk that the spare is replacing is put back to
a failed status, and the spare is released, but it is marked as
unhealthy and unavailable.

FAST_FAIL=[ON|OFF]

This parameter turns on/off the fast fail mode for disks that are
slow to respond to data access commands. The fast fail parameters can
be customized to a particular need. Default is OFF.

FAST_FAIL_THRESHOLD=‘num cmds’

This parameter indicates how many consecutive commands in the fast
fail algorithm must occur before failing the drive for this reason.
The default value is 5.
Valid range = 2 to 20.

FAST_FAIL_WINDOW_END=‘t’

This parameter indicates the timeout in seconds for when a disk
response is received outside of a window in the future. If the
command finishes outside of this time value, it is not aggregated in
the slow disk algorithm as it is considered a separate instance of
the event and the counter will restart. The default value is 90
seconds.
Valid range = 3 to 180.

FAST_FAIL_WINDOW_START=‘t’

This parameter indicates the timeout in seconds for when a disk
response is considered slow and will count against the drive in the
slow disk fail algorithm. The default value is 5 seconds.
Valid range = 2 to 179.
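One possible reading of how the three fast fail parameters interact is sketched below. The manual does not give the algorithm itself, so this is an interpretation under stated assumptions: a response time at or above WINDOW_START counts as slow, a response time above WINDOW_END restarts the counter, and a response faster than WINDOW_START also resets it (taking "consecutive" literally). The function name is hypothetical.

```python
def should_fast_fail(response_times, threshold=5, window_start=5, window_end=90):
    """Decide whether a drive would be failed for slow responses.

    response_times: per-command response times in seconds, in arrival order.
    """
    if not 2 <= threshold <= 20:
        raise ValueError("FAST_FAIL_THRESHOLD must be 2 to 20")
    if not 2 <= window_start <= 179:
        raise ValueError("FAST_FAIL_WINDOW_START must be 2 to 179")
    if not 3 <= window_end <= 180:
        raise ValueError("FAST_FAIL_WINDOW_END must be 3 to 180")

    slow = 0
    for seconds in response_times:
        if seconds > window_end:
            slow = 0      # outside the window: separate event, counter restarts
        elif seconds >= window_start:
            slow += 1     # slow response counts against the drive
            if slow >= threshold:
                return True
        else:
            slow = 0      # assumption: a prompt response breaks the streak
    return False
```

With the defaults, five responses of 6 seconds in a row would fail the drive, while four slow responses followed by a prompt one would not.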

INFO[=tc]

This parameter displays the information and status about a specific
disk in the system.
The disk is specified by its physical tier and channel locations,
‘tc’, where:
‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.

LIST[=SAS_ID|SPEED]

This parameter displays a list of all the disks installed in the
system and indicates how many were found of each type.
The optional SAS_ID parameter will display the SAS ID of the device
instead of the serial number.
The optional SPEED parameter will display the link speed of the
device instead of the RPM.

LLFORMAT[=tc]

Allows the user to perform a low level format of a disk drive.
The disk is specified by its tier and channel locations, ‘tc’, where:
‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.

MAXCMDS=x

Sets the maximum command queue depth to a tier of disks.
Range: 1 to 32 commands per tier.
Default: 16 commands.
Setting should be as follows:
- 16 if any SATA drives are used.
- 32 for everything else.
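The MAXCMDS recommendation above is a one-line rule; as a sketch, it could be encoded like this when generating configuration from a drive inventory. The function name and the input format (a list of interface names) are assumptions for the example.

```python
def recommended_maxcmds(drive_types):
    """Return the suggested MAXCMDS value for a collection of drive interfaces.

    drive_types: iterable of strings such as "SAS" or "SATA".
    """
    interfaces = {str(kind).strip().upper() for kind in drive_types}
    # Per the text: 16 if any SATA drives are used, 32 for everything else.
    return 16 if "SATA" in interfaces else 32
```

A mixed SAS and SATA system therefore stays at 16, which is also the default.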

MAXREADLEN=x

Sets the maximum read command length for SATA drives in KiB.
This parameter is used to increase throughput on systems with a large
number of SATA tiers by reducing the contention for the SAS lanes.
128K is the recommended setting for systems with 16 tiers or more of
SATA disks.
2048K is the recommended setting for systems with SAS disks.
Range is 128 to 2048.
Default is 128.

MAXWRITELEN=x

Sets the maximum write command length to the drives in KiB.
This parameter is provided for testing only and should normally not
be changed.
Range is 128 to 2048.
Default is 2048.

PLS[=[t][c]]

Requests/displays the PHY Link Error Status Block information for the
specified drive. Note that SATA and SAS drives report PHY errors
differently. The PHY information consists of the following items:

ERROR	SATA AAMUX PHY ERRORS Explanation
H-RX	Number of SATA FIS CRC errors received on the host port of the AAMUX
H-TX	Number of SATA R_ERR primitives received on the host port indicating a problem with the transmitter of the AAMUX
H-Link	Number of times the PHY has lost link on the host port
H-Disp	Number of frame errors for the host port of the AAMUX. These include: code error, disparity error, or realignment
O-RX	Number of SATA FIS CRC errors received on the other host port of the AAMUX
O-TX	Number of SATA R_ERR primitives received on the other host port indicating a problem with the transmitter of the AAMUX
O-Link	Number of times the PHY has lost link on the other host port
O-Disp	Number of frame errors for the other host port of the AAMUX. These include: code error, disparity error, or realignment
D-RX	Number of SATA FIS CRC errors received on the device port of the AAMUX
D-TX	Number of SATA R_ERR primitives received on the device port indicating a problem with the transmitter of the AAMUX
D-Link	Number of times the PHY has lost link on the device port
D-Disp	Number of frame errors for the device port of the AAMUX. These include: code error, disparity error, or realignment

Error	SAS PHY ERRORS Explanation
InvDW	Invalid DWORD Count — The number of invalid dwords received outside of the PHY reset sequence.
RunDis	Running disparity Count — The number of dwords containing running disparity errors received outside of the PHY reset sequence.
LDWSYN	Loss of DWORD synchronization count — The number of times the PHY has lost DWORD synchronization and restarted the link reset sequence.
PHYRES	PHY Reset Problem count — The number of times the PHY reset sequence has failed.

The disk is specified by its physical tier and channel locations,
‘tc’, where:

‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.

If neither the tier nor the channel are specified, the PLS
information is requested from all drives.
If only the tier is specified, the PLS information is requested from
all the drives on the specified tier.
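Since the counters are cumulative, what usually matters is whether they grow between two PLS readings. The sketch below lists the counter names from the tables above, interprets the optional `[t][c]` argument, and diffs two snapshots. The snapshot format (a plain dict of counter name to value) and all function names are assumptions; the actual PLS output layout is not shown in this section.

```python
import re

SAS_COUNTERS = ("InvDW", "RunDis", "LDWSYN", "PHYRES")
# H = host port, O = other host port, D = device port of the AAMUX.
SATA_COUNTERS = tuple(
    f"{port}-{kind}" for port in "HOD" for kind in ("RX", "TX", "Link", "Disp")
)


def pls_scope(arg=""):
    """Interpret the optional [t][c] argument as (tier or None, channel or None)."""
    match = re.fullmatch(r"(\d{1,3})?([ABCDEFGHPS])?", arg.strip().upper())
    if match is None:
        raise ValueError(f"not a valid PLS argument: {arg!r}")
    tier, channel = match.groups()
    if tier is not None and not 1 <= int(tier) <= 128:
        raise ValueError("tier must be in the range 1..128")
    return (int(tier) if tier is not None else None, channel)


def growing(before, after):
    """Return counters that increased between two snapshots, with the increase."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }
```

An empty argument maps to `(None, None)`, meaning all drives; a tier alone, such as `"7"`, maps to `(7, None)`, meaning every drive on that tier.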

PMBIT=ON|OFF

When ON this parameter sets the PM (performance mode) bit in Seagate
SAS drives mode pages. When OFF the Seagate drive uses its default
performance mode settings.
Default is OFF.

QUARANTINE

Displays the number of quarantine events on this controller for each
disk in the system. Only tiers with quarantine counts will be
displayed.
Use QUARANTINECLEAR to reset the quarantine counts.

QUARANTINE=[ON|OFF]

Enables/disables the disk quarantine feature for all of the disks. A
disk cannot be quarantined unless FASTAV is enabled for the LUN.
Default is OFF.

QUARANTINECLEAR

Resets the quarantine counts for all of the disks.

QUARANTINECMDLIMIT=x

Sets the maximum number of outstanding disk commands after a good
response before a quarantined disk can be put back into service.
Range 0 to 32 where 0 indicates no delay before putting the disk back
into service.
Default is 0.

QUARANTINETIMEOUT=x

Sets the minimum timeout before a disk can be quarantined in 16.6
millisecond increments. A disk cannot be quarantined unless FASTAV is
enabled and has timed out on the command.
Range 6 to 65535.
Default is 12 (200 milliseconds)
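As an illustration of the arithmetic only, the following Python sketch converts a QUARANTINETIMEOUT setting into milliseconds, assuming each increment is exactly 16.6 ms as stated above. The helper name `quarantine_timeout_ms` is hypothetical and is not a controller command.

```python
TICK_MS = 16.6  # one QUARANTINETIMEOUT increment, per the description above


def quarantine_timeout_ms(ticks: int) -> float:
    """Convert a QUARANTINETIMEOUT setting to milliseconds."""
    if not 6 <= ticks <= 65535:
        raise ValueError("QUARANTINETIMEOUT must be in the range 6..65535")
    return ticks * TICK_MS


# The default setting of 12 works out to roughly 200 milliseconds.
print(round(quarantine_timeout_ms(12), 1))
```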

REASSIGN[=tc] [0xh]

Allows for the reassigning of defective logical blocks on a disk to
an area of the disk reserved for this purpose.
The disk is specified by its tier and channel locations, ‘tc’, where:
‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.
0xh is the hexadecimal value of the LBA (Logical Block Address) to be
reassigned.

REBUILD[=tc]|ALL

This parameter tells the system to start a rebuild operation on a
(presumably) already failed disk. A rebuild operation restores a
failed disk to a healthy status once it completes. Note that this
operation can take several hours to complete depending on the size of
the disk and the speed of the rebuild operation. The speed of the
rebuild operation can be adjusted with the DELAY and EXTENT
parameters of the TIER command.
In addition, the rebuild operation can be stopped, or paused and
resumed with the TIER STOP, TIER PAUSE, and TIER RESUME commands.
The TIER AUTOREBUILD command can be used to automate the rebuild
process.
Note that SPARE disks are handled slightly differently from other
disks, in that SPARES that are not in use as an active replacement
for a failed disk elsewhere in the system are simply returned to a
normal healthy status by this command; SPAREs that are in use are
already considered healthy and are not rebuilt.
The failed disk to be rebuilt is specified by its physical tier and
channel locations, ‘tc’, where:

‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.

All failed and replaced disks can be rebuilt using the ALL parameter.

REBUILDNOJOURNAL[=tc]|ALL

This parameter tells the system to start a rebuild operation on a
(presumably) already failed disk without using the journal. A
rebuild operation restores a failed disk to a healthy status once it
completes. Note that this operation can take several hours to
complete depending on the size of the disk and the speed of the
rebuild operation. The speed of the rebuild operation can be
adjusted with the DELAY and EXTENT parameters of the TIER command.
In addition, the rebuild operation can be stopped, or paused and
resumed with the TIER STOP, TIER PAUSE, and TIER RESUME commands.
The TIER AUTOREBUILD command can be used to automate the rebuild
process.
Note that SPARE disks are handled slightly differently from other
disks, in that SPARES that are not in use as an active replacement
for a failed disk elsewhere in the system are simply returned to a
normal healthy status by this command; SPAREs that are in use are
already considered healthy and are not rebuilt.
The failed disk to be rebuilt is specified by its physical tier and
channel locations, ‘tc’, where:

‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.

All failed and replaced disks can be rebuilt using the ALL parameter.

REBUILDVERIFY=ON|OFF

This parameter determines if the system will send SCSI Write with
Verify commands to the disks when rebuilding failed disks. This
feature is used to guarantee that the data on the disks is rebuilt
correctly.
Note: This feature will increase the time it takes for rebuilds to
finish.

Default is OFF.

REPLACE[=tc]

This parameter tells the system to replace the specified failed disk
with a spare disk or replace a healthy disk that is believed to be on
the verge of failing. The healthy disk replacement is referred to in
the system as a proactive replacement operation. A replace operation
is used to temporarily replace a disk with a healthy spare disk.
This operation can take several hours to complete depending on the
size of the disk and speed of the replace operation. The speed of
the replace operation can be adjusted with the DELAY and EXTENT
parameters of the TIER command.
The disk to be replaced is specified by its physical tier and channel
locations, ‘tc’, where:

‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHP>.

(Note that spare disks themselves cannot be replaced with this
command).

RESTART[=tc]

This parameter tells the system to start a restart operation on a
(presumably) already failed disk. The failed disk to be restarted is
specified by its physical tier and channel locations, ‘tc’, where:

‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.

All failed and replaced disks can be restarted using the ALL parameter.

SCAN

This parameter checks each disk channel in the system for any new
disks and verifies that the existing disks are in the correct
location. It also starts a rebuild operation on any failed disks
which pass the disk diagnostics.

STATUS

Displays the loop status of each disk channel and a count of the
fibre channel errors encountered on each channel.

STATUSCLEAR

Resets the fibre channel error counts on each disk channel.

TIMEOUT=x

This parameter sets the total disk timeout (in seconds) for an I/O
request. The total disk timeout value indicates the total overall
length of time allotted to each I/O request to complete; if an I/O
request has not completed within this time frame, then an error
status is reported for it.
This parameter must be greater than or equal to CMD_TIMEOUT.
Valid range is 1 to 512 seconds.

Recommended value for SAS drives is 27 seconds.
Recommended value for SATA drives is 60 seconds.

Default is 60 seconds.

WRITESAME=ON|OFF

Enables and disables use of the SCSI Write Same command when
formatting LUNs. The SCSI Write Same command is used by the system to
format LUNs faster. This parameter is provided for backwards
compatibility with disks or enclosures that do not support the SCSI
Write Same command.

Default is OFF.

The steps I took to fix it:

Updated the BIOS.
In the BIOS, disabled the SATA IDE Combined Mode.
Read the kernel documentation on kernel parameters, since every solution online involved adding parameters there.
Found out that my SSD actually only supports a SATA speed of 3.0 Gbps, using a handy shell script:

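    # List every SATA link that reports a negotiated speed, then decode the
    # identify data of the attached device to show which drive sits on it.
    for i in `grep -l Gbps /sys/class/ata_link/*/sata_spd`; do
     echo Link "${i%/*}" Speed `cat $i`
     cat "${i%/*}"/device/dev*/ata_device/dev*/id | perl -nE 's/([0-9a-f]{2})/print chr hex $1/gie' | echo "    " Device `strings` | cut -f 1-3
    done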

In the GRUB configuration, limited the SATA port of the SSD to a maximum speed of 3.0 Gbps:

    vi /etc/default/grub

Changed the parameter in this line to allow only 3 Gbps on SATA port 7 (my SSD):

    GRUB_CMDLINE_LINUX_DEFAULT="libata.force=7:3.0G quiet"

Updated GRUB and rebooted:

    update-grub
    reboot
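For readers who prefer Python to the shell loop, here is a minimal sketch of the same idea. It is not the original script: `sata_link_speeds` and `libata_force` are hypothetical helper names, and it assumes that each `sata_spd` file holds a short string such as "3.0 Gbps" for links with a negotiated speed. The `3.0G` value simply mirrors the GRUB line above.

```python
from pathlib import Path


def sata_link_speeds(root="/sys/class/ata_link"):
    """Map each ATA link directory name to its negotiated speed string."""
    speeds = {}
    base = Path(root)
    if not base.is_dir():
        return speeds
    for spd_file in sorted(base.glob("*/sata_spd")):
        speed = spd_file.read_text().strip()
        if "Gbps" in speed:  # skip links that report no negotiated speed
            speeds[spd_file.parent.name] = speed
    return speeds


def libata_force(port, limit="3.0G"):
    """Build the kernel parameter used in the GRUB line above."""
    return f"libata.force={port}:{limit}"
```

Calling `sata_link_speeds()` on the affected machine shows which links negotiated which speed, and `libata_force(7)` reproduces the parameter string for port 7.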

Getting to this solution took me a long time. I basically started over on the whole problem every other day.

The problems I found along the way were:

I checked my SMART stats every day and compared them. The error count didn’t increase even though the exceptions kept being thrown.
My SSD was actually the one causing the kernel exceptions. This script helped me a lot to understand which ATA device was which physical drive in the case.
My SSD and two other drives were on a completely wrong speed setting (UDMA):

root@msa-nas1:~# sudo hdparm -I /dev/sd{a,b,c,d,e,f,g} | grep -i udma
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6
DMA: mdma0 mdma1 mdma2 udma0 *udma1 udma2 udma3 udma4 udma5 udma6
DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6

The dmesg log showed some strange messages about 40-wire cables, even though those don’t really exist anymore. I bought two different NEW cables; nothing helped.

[    1.193091] ata5.01: ATA-8: SanDisk SD6SF1M128G1022I, X231200, max UDMA/133
[    1.193095] ata5.01: 250069680 sectors, multi 1: LBA48 NCQ (depth 0/32)
[    1.193743] ata5.00: limited to UDMA/33 due to 40-wire cable
[    1.193746] ata5.01: limited to UDMA/33 due to 40-wire cable

An unexpected kernel driver was loaded for the last two drives: pata_atiixp. I was expecting the AHCI driver.

[    1.022724] scsi4 : pata_atiixp
[    1.022834] scsi5 : pata_atiixp
[    1.022887] ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xf100 irq 14
[    1.022888] ata6: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xf108 irq 15

I checked the power consumption to see whether it exceeded what the power supply can deliver. It did not, not even close.
I replaced the SSD with exactly the same model from another machine. Still the same errors.
The SSD was in fact incredibly slow, so the UDMA output from hdparm was actually correct.

    root@msa-nas1:~# hdparm -t -T /dev/sdf

    /dev/sdf:
     Timing cached reads:   2144 MB in  2.00 seconds = 1072.18 MB/sec
     Timing buffered disk reads:   8 MB in  3.60 seconds =   2.22 MB/sec
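The hdparm and dmesg output quoted above can also be checked mechanically. The following is a minimal Python sketch, not part of the original troubleshooting: `dma_modes` and `cable_limited_devices` are hypothetical helper names, and the parsing assumes that hdparm lists the modes in ascending order with `*` marking the active one, as in the output shown.

```python
import re


def dma_modes(line):
    """Return (active, best) DMA modes from one hdparm 'DMA:' line."""
    modes = line.split(":", 1)[1].split()
    active = next((m.lstrip("*") for m in modes if m.startswith("*")), None)
    best = modes[-1].lstrip("*")  # assumes ascending order, as in the output above
    return active, best


CABLE_LIMIT = re.compile(r"(ata\d+\.\d+): limited to (UDMA/\d+) due to 40-wire cable")


def cable_limited_devices(dmesg_text):
    """Map each ATA device to the UDMA mode the kernel capped it at."""
    return dict(CABLE_LIMIT.findall(dmesg_text))


line = "DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6"
active, best = dma_modes(line)
if active != best:
    print(f"running at {active}, drive supports up to {best}")
```

A drive whose active mode is below its best mode, or that shows up in `cable_limited_devices`, is a candidate for the kind of controller or driver misconfiguration described here.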

I tried reaching out to SanDisk, since it was their drive giving me the exceptions, without any success. I could not find anyone with exactly the same problem, but many people with similar ones. In the end I tried a few of the suggested solutions, and the fix turned out to be a mix of a few things. Now it all makes perfect sense to me; everyone is wiser in hindsight, I guess.


Moderators: Trinity admin`s, Free-lance moderator`s

pinkzebra: Junior member; Posts: 9; Registered: 18 Apr 2017, 12:58

What is this error? Phy is bad on enclosure.

Enclosures—
PRODUCT NAME TYPE STATUS
SAS2X28 Ses OK

Stranger03: Trinity employee; Posts: 12979; Registered: 14 Nov 2003, 16:25; Location: St. Petersburg, Yekaterinburg

Re: What is this error? Phy is bad on enclosure.

Stranger03 » 25 Apr 2017, 09:08

pinkzebra
I don't see the log or the error in it. Is the error on one particular disk or on all of them?

pinkzebra » 25 Apr 2017, 11:46

Stranger03 wrote: pinkzebra
I don't see the log or the error in it. Is the error on one particular disk or on all of them?

gs: Trinity employee; Posts: 16650; Registered: 23 Aug 2002, 17:34; Location: Moscow

gs » 25 Apr 2017, 11:48

pinkzebra » 25 Apr 2017, 13:46

OK, not 4 disks but 4 ports.
Yes, these 4 disks are SATA; two of them are on the compatibility list.

Stranger03 » 25 Apr 2017, 13:47

pinkzebra wrote: OK, not 4 disks but 4 ports.
Yes, these 4 disks are SATA; two of them are on the compatibility list.

Try plugging in any SAS disk to rule out a faulty port or backplane.

pinkzebra » 27 Apr 2017, 09:08

Code:

ID = 114
SEQUENCE NUMBER = 1365
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:12  Previous   =   Unconfigured Bad      Current   =   Unconfigured Good

ID = 247
SEQUENCE NUMBER = 1364
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   31

ID = 91
SEQUENCE NUMBER = 1363
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0   PD inserted:   Port B:1:12

ID = 247
SEQUENCE NUMBER = 1362
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   29

ID = 91
SEQUENCE NUMBER = 1361
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0   PD inserted:   Port B:1:10

ID = 185
SEQUENCE NUMBER = 1360
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0   Phy is bad on enclosure:   1  PHY   10

ID = 331
SEQUENCE NUMBER = 1359
TIME = 27-04-2017 10:47:42
LOCALIZED MESSAGE = Controller ID:  0  Power state change on PD   =   Port B:1:10  Previous   =   Powersave  Current   =   On

ID = 114
SEQUENCE NUMBER = 1358
TIME = 27-04-2017 10:47:25
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:10  Previous   =   Unconfigured Good      Current   =   Unconfigured Bad

ID = 248
SEQUENCE NUMBER = 1357
TIME = 27-04-2017 10:47:25
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   31

ID = 112
SEQUENCE NUMBER = 1356
TIME = 27-04-2017 10:47:25
LOCALIZED MESSAGE = Controller ID:  0   PD removed:   Port B:1:10

ID = 289
SEQUENCE NUMBER = 1355
TIME = 27-04-2017 10:47:25
LOCALIZED MESSAGE = Controller ID:  0  Redundant path broken   PD :   Port A:1:10  Path :   1  SAS Address :   0x500000E01714D813

ID = 114
SEQUENCE NUMBER = 1354
TIME = 27-04-2017 10:47:12
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:10  Previous   =   Unconfigured Bad      Current   =   Unconfigured Good

ID = 113
SEQUENCE NUMBER = 1353
TIME = 27-04-2017 10:47:12
LOCALIZED MESSAGE = Controller ID:  0   Unexpected sense:   PD       =   Port B:1:10Power on occurred,   CDB   =    0x28 0x00 0x08 0x8f 0xc1 0xcf 0x00 0x00 0x01 0x00    ,   Sense   =    0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x28 0x00 0x00 0x00 0x00 0x29 0x01 0x00 0x00 0x00 0x00 0x00 0x28 0x00 0x01 0x04 0x03 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x22 0x13 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

ID = 247
SEQUENCE NUMBER = 1352
TIME = 27-04-2017 10:47:12
LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   31

ID = 91
SEQUENCE NUMBER = 1351
TIME = 27-04-2017 10:47:12
LOCALIZED MESSAGE = Controller ID:  0   PD inserted:   Port B:1:10

ID = 331
SEQUENCE NUMBER = 1350
TIME = 27-04-2017 10:46:59
LOCALIZED MESSAGE = Controller ID:  0  Power state change on PD   =   Port B:1:9  Previous   =   Transition  Current   =   On

ID = 331
SEQUENCE NUMBER = 1349
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0  Power state change on PD   =   Port B:1:9  Previous   =   Powersave  Current   =   Transition

ID = 113
SEQUENCE NUMBER = 1348
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0   Unexpected sense:   PD       =   Port B:1:7Power on occurred,   CDB   =    0x2e 0x00 0xe8 0xe0 0x62 0x6b 0x00 0x00 0x01 0x00    ,   Sense   =    0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x28 0x00 0x00 0x00 0x00 0x29 0x01 0x00 0x00 0x00 0x00 0x00 0x2e 0x01 0x08 0x17 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x19 0x19 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 1347
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0   Unexpected sense:   PD       =   Port B:1:6Mode parameters changed,   CDB   =    0x2e 0x00 0x74 0x70 0x47 0x6b 0x00 0x00 0x01 0x00    ,   Sense   =    0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x2a 0x01 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 1346
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0   Unexpected sense:   PD       =   Port B:1:5Mode parameters changed,   CDB   =    0x2e 0x00 0x74 0x70 0x47 0x6b 0x00 0x00 0x01 0x00    ,   Sense   =    0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x2a 0x01 0x00 0x00 0x00 0x00

ID = 114
SEQUENCE NUMBER = 1345
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:10  Previous   =   Hot Spare      Current   =   Unconfigured Bad

ID = 248
SEQUENCE NUMBER = 1344
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   29

ID = 112
SEQUENCE NUMBER = 1343
TIME = 27-04-2017 10:46:49
LOCALIZED MESSAGE = Controller ID:  0   PD removed:   Port B:1:10

ID = 114
SEQUENCE NUMBER = 1342
TIME = 27-04-2017 10:46:03
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:11  Previous   =   Unconfigured Good      Current   =   Unconfigured Bad

ID = 248
SEQUENCE NUMBER = 1341
TIME = 27-04-2017 10:46:03
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   31

ID = 112
SEQUENCE NUMBER = 1340
TIME = 27-04-2017 10:46:03
LOCALIZED MESSAGE = Controller ID:  0   PD removed:   Port B:1:11

ID = 289
SEQUENCE NUMBER = 1339
TIME = 27-04-2017 10:46:03
LOCALIZED MESSAGE = Controller ID:  0  Redundant path broken   PD :   Port A:1:11  Path :   1  SAS Address :   0x500000E01714D813

ID = 114
SEQUENCE NUMBER = 1338
TIME = 27-04-2017 10:45:51
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:11  Previous   =   Unconfigured Bad      Current   =   Unconfigured Good

ID = 247
SEQUENCE NUMBER = 1337
TIME = 27-04-2017 10:45:51
LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   31

ID = 91
SEQUENCE NUMBER = 1336
TIME = 27-04-2017 10:45:51
LOCALIZED MESSAGE = Controller ID:  0   PD inserted:   Port B:1:11

ID = 114
SEQUENCE NUMBER = 1335
TIME = 27-04-2017 10:45:21
LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   Port B:1:12  Previous   =   Unconfigured Good      Current   =   Unconfigured Bad

ID = 248
SEQUENCE NUMBER = 1334
TIME = 27-04-2017 10:45:21
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   31

ID = 112
SEQUENCE NUMBER = 1333
TIME = 27-04-2017 10:45:21
LOCALIZED MESSAGE = Controller ID:  0   PD removed:   Port B:1:12

ID = 289
SEQUENCE NUMBER = 1332
TIME = 27-04-2017 10:45:20
LOCALIZED MESSAGE = Controller ID:  0  Redundant path broken   PD :   Port B:1:12  Path :   0  SAS Address :   0x500000E01714D812

ID = 247
SEQUENCE NUMBER = 1331
TIME = 27-04-2017 10:45:01
LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   31

ID = 91
SEQUENCE NUMBER = 1330
TIME = 27-04-2017 10:45:01
LOCALIZED MESSAGE = Controller ID:  0   PD inserted:   Port B:1:12
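A long event dump like this is easier to read once it is split into records. The following minimal Python sketch assumes the layout shown above (blank-line-separated blocks of `KEY = value` lines); `parse_events` and `bad_phys` are hypothetical helper names, not part of any controller utility.

```python
import re


def parse_events(log_text):
    """Split the log into events; each event is a dict of its 'KEY = value' lines."""
    events = []
    for block in re.split(r"\n\s*\n", log_text.strip()):
        event = {}
        for line in block.splitlines():
            key, sep, value = line.partition(" = ")
            if sep:
                event[key.strip()] = value.strip()
        if event:
            events.append(event)
    return events


BAD_PHY = re.compile(r"Phy is bad on enclosure:\s*(\d+)\s+PHY\s+(\d+)")


def bad_phys(events):
    """Return (enclosure, phy) pairs that the controller reported as bad."""
    found = []
    for event in events:
        match = BAD_PHY.search(event.get("LOCALIZED MESSAGE", ""))
        if match:
            found.append((int(match.group(1)), int(match.group(2))))
    return found


sample = """ID = 185
SEQUENCE NUMBER = 1360
TIME = 27-04-2017 10:48:47
LOCALIZED MESSAGE = Controller ID:  0   Phy is bad on enclosure:   1  PHY   10

ID = 112
SEQUENCE NUMBER = 1356
TIME = 27-04-2017 10:47:25
LOCALIZED MESSAGE = Controller ID:  0   PD removed:   Port B:1:10
"""
print(bad_phys(parse_events(sample)))
```

Grouping the parsed events by port in the same way makes it easy to see whether the insert/remove cycles are confined to a few slots.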

Stranger03 » 27 Apr 2017, 09:16

pinkzebra wrote: As I understand it, the problem is in the expander? Does it need to be reflashed?

More likely it needs to be replaced, or you should use NL SAS disks.

gs » 27 Apr 2017, 10:54

Stranger03 » 27 Apr 2017, 11:25

gs wrote: I somewhat doubt that the interface type itself is the issue. More likely these particular SATA drives are incompatible with the controller.

You can check by connecting them directly, without the backplane,



Useful posts by participants of this thread:

ShutUp is a program by forum member CoolCMD that prevents frequent HDD head parking.

https://disk.yandex.ru/d/x3UITAgo3EGqub

The program reads one sector at a user-defined interval.

R.baza: inventory and search for hard-disk spare parts.

Last edited by KT on 29.11.2021 18:36; edited 15 times in total.

vensant_jarden

Member

Status: Offline
Registered: 28.01.2016
Photos: 1

Sania.

Member

Status: Offline
Registered: 22.12.2012
Photos: 1

Sinestery

Junior

Status: Offline
Registered: 09.06.2018

Tomset wrote:

It's dead, and its SMART data has clearly been corrupted.

What exactly is wrong with it?
Here, for example, is a fully working disk of the same model

Attachment:

2d.PNG [ 112.89 KB ]

Sania.

Sinestery wrote:

What exactly is wrong with it?

Half of the SMART readout is nonsense; use a modern program for reading SMART.

kolyan1980-08-11

Member

Status: Offline
Registered: 17.07.2011
Location: Nizhny Novgorod
Photos: 155

userID I'm not claiming anything, but it did help me once.

RuckusDJ

Junior

Status: Offline
Registered: 26.10.2019
Location: Angarsk

Attachments:

Снимок.JPG [ 129.13 KB ]

fixit

Member

Status: Offline
Registered: 15.10.2014

RuckusDJ wrote:

Can the disk safely be thrown away?

Now do a remap under DOS

Sania.

Give it some cooling; it is running at 52 degrees right now, which is very bad.

RuckusDJ

Sania.
I took the screenshot after formatting.
fixit
The remap in Victoria didn't help.

Sania.

Don't let the disk overheat in any case.
According to SMART, there is nothing to remap yet.
What else don't you like about it?

7Gluk7

Junior

Status: Offline
Registered: 25.12.2018
Location: St. Petersburg

Attachment:

ssd.png [ 37.62 KB ]

Last edited by 7Gluk7 on 20.01.2020 15:24; edited 1 time in total.

Sania.

7Gluk7 wrote:

What can you advise?

Wipe the disk down to nothing and run this test from another disk.

O Smirnoff

Member

Status: Offline
Registered: 11.11.2010
Location: Novosibirsk

Sania. wrote:

Wipe the disk down to nothing

Secure Erase I understand; but «down to nothing»: where is that, what for, and for whom?..

_________________
Best regards,
Oleg R. Smirnov

Sania.

Deleting the MBR is probably enough there, but the author can do whatever else he comes up with; the main thing is that the disk ends up empty.

O Smirnoff

Sania. wrote:

deleting the MBR is enough

Ah, so that is what

Sania. wrote:

Wipe the disk down to nothing

means?
And here I was starting to wonder…

Sania.

7Gluk7

Sania. wrote:

Wipe the disk down to nothing and run this test from another disk.

Doesn't AIDA64 fill the disk with zeros during its write test anyway?
But I'll try again with a different program.

O Smirnoff

Sania. wrote:

Added after 46 seconds:

7Gluk7 wrote:

Doesn't AIDA64 fill the disk with zeros during its write test anyway?

Secure Erase is still the better choice.

Sania.

7Gluk7 wrote:

Doesn't AIDA64 fill the disk with zeros during its write test anyway?

Added after 2 minutes 7 seconds:

O Smirnoff wrote:

Then don't write » «, use clear terms; your verbiage only leads people astray…

This way I have to write less, and I need to find out how knowledgeable the person asking is.

7Gluk7

O Smirnoff wrote:

Secure Erase is still the better choice.

I'll try that.

Sania. wrote:

I'm running from a LiveUSB, and I've moved Windows to a VHD for now

Last edited by 7Gluk7 on 20.01.2020 15:49; edited 1 time in total.


I have a CRON job running on my TrueNAS that watches a few of the key SMART
parameters on my boot drives. The count on each of the following parameters:

168|SATA_Phy_Error_Count
218|CRC_Error_Count

incremented by 1 on each of July 13, 14, 15, 19 and 21.

The counts are not super high:
168|SATA_Phy_Error_Count|32
218|CRC_Error_Count|32

but I’m pretty sure some sort of preemptive maintenance is in order.
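To make the day-to-day comparison concrete, here is a minimal Python sketch of how two snapshots in the `ID|NAME|RAW` form shown above could be diffed. It is not the actual cron job: `parse_snapshot` and `increased` are hypothetical names, and the "yesterday" values are made up for illustration.

```python
def parse_snapshot(text):
    """Parse 'ID|NAME|RAW' lines into a {name: raw_value} dict."""
    counts = {}
    for line in text.splitlines():
        parts = line.strip().split("|")
        if len(parts) == 3 and parts[2].isdigit():
            counts[parts[1]] = int(parts[2])
    return counts


def increased(previous, current):
    """Return the attributes whose raw value grew, with the size of the increase."""
    return {
        name: current[name] - previous[name]
        for name in current
        if name in previous and current[name] > previous[name]
    }


# Hypothetical previous-day values; only today's counts come from the post.
yesterday = parse_snapshot("168|SATA_Phy_Error_Count|31\n218|CRC_Error_Count|31")
today = parse_snapshot("168|SATA_Phy_Error_Count|32\n218|CRC_Error_Count|32")
print(increased(yesterday, today))
```

Logging the result with a timestamp each day would show whether the two counters always move together, which points towards the link (cable, port, power) more than towards the flash itself.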

My boot pool is a mirror of two budget 120GB SSDs running off of
SATA ports on the Motherboard. I have the system database on the
boot pool since I want the system to be functional without the data
pool if I want to troubleshoot the system with the data pool drives
removed.

The drive showing the errors is a KINGSTON Model# SA400S37120G
(Smart Info at the end of this post.)

The other drive is older and is an HP S700 120GB SSD that seems to be
fine.

IIUC this could be a drive problem, a cable problem, a (Motherboard) SATA
Port problem or a power supply problem.

My question is how to troubleshoot given the intermittent nature of the
problem. Any suggestions would be much appreciated.

DMESG entries pertaining to the fault.

Code:

(ada3:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 d0 20 c3 7c 40 04 00 00 00 00 00
(ada3:ahcich5:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada3:ahcich5:0:0:0): Retrying command, 3 more tries remain

(ada3:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 30 08 cb b3 40 04 00 00 00 00 00
(ada3:ahcich5:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada3:ahcich5:0:0:0): Retrying command, 3 more tries remain

(ada3:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 20 9b ae 40 04 00 00 00 00 00
(ada3:ahcich5:0:0:0): CAM status: Uncorrectable parity/CRC error
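Because the fault is intermittent, counting these messages per device over time may help. Below is a minimal Python sketch that tallies CRC errors in saved dmesg text; `crc_errors_by_device` is a hypothetical helper, and the pattern only assumes the line format shown above.

```python
import re
from collections import Counter

CAM_CRC = re.compile(
    r"^\((\w+):\w+:[\d:]+\): CAM status: Uncorrectable parity/CRC error",
    re.MULTILINE,
)


def crc_errors_by_device(dmesg_text):
    """Count 'Uncorrectable parity/CRC error' lines per device name."""
    return Counter(match.group(1) for match in CAM_CRC.finditer(dmesg_text))


sample = """(ada3:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 d0 20 c3 7c 40 04 00 00 00 00 00
(ada3:ahcich5:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada3:ahcich5:0:0:0): Retrying command, 3 more tries remain
(ada3:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 30 08 cb b3 40 04 00 00 00 00 00
(ada3:ahcich5:0:0:0): CAM status: Uncorrectable parity/CRC error
"""
print(crc_errors_by_device(sample))
```

If the count keeps following the same `ahcich` channel after the drive is moved to another port, the port or cable is the more likely suspect; if it follows the drive, the drive is.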

SMART Output for drive:

smartctl -x /dev/ada3
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p14 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Phison Driven SSDs
Device Model:     KINGSTON SA400S37120G
Serial Number:    REDACTED
LU WWN Device Id: 5 0026b7 782ea1dc1
Firmware Version: S3500102
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jul 22 03:23:00 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:         (  120) seconds.
Offline data collection
capabilities:              (0x11) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    No Selective Self-test supported.
SMART capabilities:            (0x0002)    Does not save SMART data before
                    entering power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -O--CK   100   100   000    -    100
  9 Power_On_Hours          -O--CK   100   100   000    -    21839
 12 Power_Cycle_Count       -O--CK   100   100   000    -    31
148 Unknown_Attribute       ------   100   100   000    -    0
149 Unknown_Attribute       ------   100   100   000    -    0
167 Write_Protect_Mode      ------   100   100   000    -    0
168 SATA_Phy_Error_Count    -O--C-   100   100   000    -    33
169 Bad_Block_Rate          ------   100   100   000    -    0
170 Bad_Blk_Ct_Erl/Lat      ------   100   100   010    -    0/0
172 Erase_Fail_Count        -O--CK   100   100   000    -    0
173 MaxAvgErase_Ct          ------   100   100   000    -    0
181 Program_Fail_Count      -O--CK   100   100   000    -    0
182 Erase_Fail_Count        ------   100   100   000    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
192 Unsafe_Shutdown_Count   -O--C-   100   100   000    -    19
194 Temperature_Celsius     -O---K   044   062   000    -    44 (Min/Max 31/62)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
199 SATA_CRC_Error_Count    -O--CK   100   100   000    -    0
218 CRC_Error_Count         -O--CK   100   100   000    -    33
231 SSD_Life_Left           ------   090   090   000    -    90
233 Flash_Writes_GiB        -O--CK   100   100   000    -    8865
241 Lifetime_Writes_GiB     -O--CK   100   100   000    -    12051
242 Lifetime_Reads_GiB      -O--CK   100   100   000    -    2641
244 Average_Erase_Count     ------   100   100   000    -    202
245 Max_Erase_Count         ------   100   100   000    -    222
246 Total_Erase_Count       ------   100   100   000    -    40787
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xde       GPL     VS       8  Device vendor specific log

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 33 (device log contains only the most recent 4 errors)
    CR     = Command Register
    FEATR  = Features Register
    COUNT  = Count (was: Sector Count) Register
    LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    LH     = LBA High (was: Cylinder High) Register    ]   LBA
    LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    LL     = LBA Low (was: Sector Number) Register     ]
    DV     = Device (was: Device/Head) Register
    DC     = Device Control Register
    ER     = Error register
    ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 33 [0] log entry is empty
Error 32 [3] log entry is empty
Error 31 [2] occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 51 00 00 00 00 00 00 00 00 40 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  b0 00 d1 01 01 00 00 4f 00 c2 01 40 08     00:00:00.000  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  2f 00 00 01 01 00 00 00 00 00 03 40 08     00:00:00.000  READ LOG EXT
  2f 00 00 01 01 00 00 00 00 00 00 40 08     00:00:00.000  READ LOG EXT
  b0 00 d5 01 01 00 00 4f 00 c2 00 40 08     00:00:00.000  SMART READ LOG
  b0 00 da 00 00 00 00 4f 00 c2 00 40 08     00:00:00.000  SMART RETURN STATUS

Error 30 [1] occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 51 00 00 00 00 00 00 00 00 40 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  b0 00 d1 01 01 00 00 4f 00 c2 01 40 08     00:00:00.000  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  2f 00 00 01 01 00 00 00 00 00 03 40 08     00:00:00.000  READ LOG EXT
  2f 00 00 01 01 00 00 00 00 00 00 40 08     00:00:00.000  READ LOG EXT
  b0 00 d5 01 01 00 00 4f 00 c2 00 40 08     00:00:00.000  SMART READ LOG
  b0 00 da 00 00 00 00 4f 00 c2 00 40 08     00:00:00.000  SMART RETURN STATUS

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     21598         -
# 2  Extended offline    Completed without error       00%     21591         -
# 3  Extended offline    Completed without error       00%     20707         -
# 4  Extended offline    Completed without error       00%     18218         -
# 5  Extended offline    Completed without error       00%     13958         -
# 6  Extended offline    Completed without error       00%      7400         -
# 7  Extended offline    Completed without error       00%      6975         -
# 8  Extended offline    Completed without error       00%      1348         -
# 9  Extended offline    Completed without error       00%         0         -
#10  Short offline       Completed without error       00%         0         -

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4              31  ---  Lifetime Power-On Resets
0x01  0x010  4           21839  ---  Power-on Hours
0x01  0x018  6      3799441290  ---  Logical Sectors Written
0x01  0x020  6         9010952  ---  Number of Write Commands
0x01  0x028  6      1245637882  ---  Logical Sectors Read
0x01  0x030  6         1333687  ---  Number of Read Commands
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1              22  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            7  Command failed due to ICRC error
0x0002  4            7  R_ERR response for data FIS
0x0005  4            0  R_ERR response for non-data FIS
0x000a  4           18  Device-to-host register FISes sent due to a COMRESET

The count on each of the following parameters:

168|SATA_Phy_Error_Count
218|CRC_Error_Count

incremented by 1 on each of July 13, 14, 15, 19 and 21.

Are those linked to either the dates of SMART tests or scrubs?

You also didn’t mention the 100 read errors reported by SMART… those are from the drive itself, so indicate some level of failure unrelated to cabling.

The CRC errors can be the controller on the drive, the cabling or the SATA controller, so as you say, hard to narrow down unless something obvious like a loose connection or burning smell from the controller chip.

I would generally treat the drive as untrustworthy and consider living with a single boot device (keeping config backups just in case).

Are those linked to either the dates of SMART tests or scrubs?

I don’t think so, though I have no way of finding out for certain. I don’t run regularly scheduled SMART tests, so a SMART test is unlikely. As for scrubs, I know the system runs one every few days (I’m not sure what the default config is set to), but from another report I’m pretty sure the last two errors did not occur during a scrub. The one on the 21st definitely didn’t.

The report comes from a cron job that I run daily. It runs smartctl -a, compares a set of values with those from the previous day, and if they don’t match, prints a report showing the old and new values. I wrote the script to alert me to exactly this type of situation. I am not getting any alerts from TrueNAS, just the report I produce.
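
For readers who want something similar, a day-over-day comparison of this kind could look roughly like the sketch below. It is not the poster's script: the function names are mine, the parser assumes the brief attribute table layout shown in the smartctl -x output earlier in this thread (the older layout printed by smartctl -a has different columns), and the "yesterday" values are hypothetical.

```python
import re

# Brief-format flags column, e.g. "-O--CK" or "------".
FLAGS = re.compile(r"[-POSRCK]{6}")


def parse_attributes(smart_text):
    """Map attribute ID -> (name, raw value) from a smartctl attribute table.

    Assumes the brief layout: ID# NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
    """
    attrs = {}
    for line in smart_text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID, followed by a name and flags.
        if len(parts) >= 8 and parts[0].isdigit() and FLAGS.fullmatch(parts[2]):
            attrs[int(parts[0])] = (parts[1], " ".join(parts[7:]))
    return attrs


def diff_attributes(old, new, watch=None):
    """List (id, name, old_raw, new_raw) for attributes whose raw value changed."""
    changes = []
    for attr_id in sorted(new):
        if watch is not None and attr_id not in watch:
            continue
        name, new_raw = new[attr_id]
        old_raw = old.get(attr_id, (name, None))[1]
        if old_raw != new_raw:
            changes.append((attr_id, name, old_raw, new_raw))
    return changes


# Hypothetical snapshot from the previous day.
YESTERDAY = """\
168 SATA_Phy_Error_Count    -O--C-   100   100   000    -    32
194 Temperature_Celsius     -O---K   044   062   000    -    44 (Min/Max 31/62)
199 SATA_CRC_Error_Count    -O--CK   100   100   000    -    0
218 CRC_Error_Count         -O--CK   100   100   000    -    32
"""

# Rows copied from the smartctl output in this thread.
TODAY = """\
168 SATA_Phy_Error_Count    -O--C-   100   100   000    -    33
194 Temperature_Celsius     -O---K   044   062   000    -    44 (Min/Max 31/62)
199 SATA_CRC_Error_Count    -O--CK   100   100   000    -    0
218 CRC_Error_Count         -O--CK   100   100   000    -    33
"""

if __name__ == "__main__":
    old, new = parse_attributes(YESTERDAY), parse_attributes(TODAY)
    for attr_id, name, before, after in diff_attributes(old, new):
        print(f"{attr_id:3d} {name}: {before} -> {after}")
```

Keying on the attribute ID rather than the name matters on this drive, because IDs 172 and 182 are both reported as Erase_Fail_Count.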

You also didn’t mention the 100 read errors reported by SMART… those are from the drive itself, so indicate some level of failure unrelated to cabling.

Sorry, what 100 read errors? What am I missing? Are you confusing "Raw_Read_Error_Rate" with read errors?

The CRC errors can be the controller on the drive, the cabling or the SATA controller, so as you say, hard to narrow down unless something obvious like a loose connection or burning smell from the controller chip.

I would generally treat the drive as untrustworthy and consider living with a single boot device (keeping config backups just in case).

I hadn’t thought of the controller chip on the drive. I’ll keep an eye on it for now, and an eye out for a sale on a replacement drive. SSDs are pretty cheap… about what a good USB drive used to cost back in the day. When I get a moment I will likely open the box and pull all the cables and reseat them, just in case the contacts have oxidized.

Sorry, what 100 read errors? What am I missing? Are you confusing "Raw_Read_Error_Rate" with read errors?

OK, so it’s not a count of read errors… but it’s also not OK…

That should be 120 (not 100) until something is wrong.

Thanks for the reply…. Great idea, wrong data sheet…. Different drives have slightly different interpretations.

I didn’t know Kingston published this info. AFAIK Western Digital doesn’t, so I didn’t even think to look. I did some additional searching, which led me to a smartmontools page:

https://www.smartmontools.org/ticket/801

which led me to the correct datasheet.

https://media.kingston.com/support/downloads/MKP_521_Phison_SMART_attribute.pdf

Here are the descriptions for the drive in question:

001 Read Error Rate
Counts the number of uncorrectable errors that accumulate when controller
reads data from Flash and ECC events occur.

168 SATA PHY Error Count
Counts the number of SATA PHY errors. This value includes all PHY error
counts, e.g. data FIS CRC, code errors, disparity errors, command FIS CRC.
Value clears upon system power-down.

218 CRC Error Count
Counts the number of CRC errors (read/write data FIS CRC errors).

I’m not sure what to think about Read Error Rate. IIUC, as the drive wears out there will be errors, and the drive "handles" them. Since the drive has 90% life left, I would expect there to have been a few errors by now, but I may well be wrong and would welcome a correction if I am.

Other than reseating or changing the cables, or swapping the drive, is there any meaningful troubleshooting to be done?

DISK

DISK Displays information about the disks in the system.

AGINGLIMIT=x|OFF

AUTOREASSIGN=ON|OFF

Allows the user to turn on or off whether bad blocks will be
reassigned when a medium error occurs on a healthy tier.
Default is ON.

CMD_TIMEOUT=x

DEFECTLIST[=tc]

DIAG[=tc]

FAIL[=tc]

FAST_FAIL=[ON|OFF]

This parameter turns on/off the fast fail mode for disks that are
slow to respond to data access commands. The fast fail parameters can
be customized to a particular need. Default is OFF.

FAST_FAIL_THRESHOLD=’num cmds’

This parameter indicates how many consecutive commands in the fast
fail algorithm must occur before failing the drive for this reason.
The default value is 5.
Valid range = 2 to 20.

FAST_FAIL_WINDOW_END=’t’

FAST_FAIL_WINDOW_START=’t’

INFO[=tc]

LIST[=SAS_ID|SPEED]

LLFORMAT[=tc]

MAXCMDS=x

MAXREADLEN=x

MAXWRITELEN=x

Sets the maximum write command length to the drives in KiB.
This parameter is provided for testing only and should normally not
be changed.
Range is 128 to 2048.
Default is 2048.

PLS[=[t][c]]

ERROR	SATA AAMUX PHY ERRORS Explanation
H-RX	Number of SATA FIS CRC errors received on the host port of the AAMUX
H-TX	Number of SATA R_ERR primitives received on the host port indicating a problem with the transmitter of the AAMUX
H-Link	Number of times the PHY has lost link on the host port
H-Disp	Number of frame errors for the host port of the AAMUX. These include: code error, disparity error, or realignment
O-RX	Number of SATA FIS CRC errors received on the other host port of the AAMUX
O-TX	Number of SATA R_ERR primitives received on the other host port indicating a problem with the transmitter of the AAMUX
O-Link	Number of times the PHY has lost link on the other host port
O-Disp	Number of frame errors for the other host port of the AAMUX. These include: code error, disparity error, or realignment
D-RX	Number of SATA FIS CRC errors received on the device port of the AAMUX
D-TX	Number of SATA R_ERR primitives received on the device port indicating a problem with the transmitter of the AAMUX
D-Link	Number of times the PHY has lost link on the device port
D-Disp	Number of frame errors for the device port of the AAMUX. These include: code error, disparity error, or realignment

Error	SAS PHY ERRORS Explanation
InvDW	Invalid DWORD Count — The number of invalid dwords received outside of the PHY reset sequence.
RunDis	Running disparity Count — The number of dwords containing running disparity errors received outside of the PHY reset sequence.
LDWSYN	Loss of DWORD synchronization count — The number of times the PHY has lost synchronization and the link reset sequence.
PHYRES	PHY Reset Problem count — The number of times the PHY reset sequence has failed.

The disk is specified by its physical tier and channel locations,
‘tc’,where:

‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.
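
As an illustration of the tier/channel ranges above, a script that builds these commands could validate a 'tc' value before sending it. This is only a sketch: the function name `parse_tc` is hypothetical, and it assumes the value is written as the tier number immediately followed by the channel letter (for example 12A), which this reference does not spell out.

```python
import re

# Channel letters listed in this reference; spares use 'S'.
VALID_CHANNELS = "ABCDEFGHPS"

# Assumed syntax: decimal tier number followed by one channel letter.
TC_PATTERN = re.compile(r"^(\d{1,3})([A-Za-z])$")


def parse_tc(spec, channels=VALID_CHANNELS):
    """Split a 'tc' disk spec into (tier, channel), validating both ranges."""
    match = TC_PATTERN.match(spec.strip())
    if not match:
        raise ValueError(f"not a tier/channel spec: {spec!r}")
    tier = int(match.group(1))
    channel = match.group(2).upper()
    if not 1 <= tier <= 128:
        raise ValueError(f"tier {tier} is outside the range 1..128")
    if channel not in channels:
        raise ValueError(f"channel {channel!r} is not one of {channels}")
    return tier, channel


if __name__ == "__main__":
    print(parse_tc("12A"))
```

The `channels` argument is there because some commands accept a narrower set: REPLACE, for instance, lists ABCDEFGHP without the spare channel.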

PMBIT=ON|OFF

When ON this parameter sets the PM (performance mode) bit in Seagate
SAS drives mode pages. When OFF the Seagate drive uses its default
performance mode settings.
Default is OFF.

QUARANTINE

Displays the number of quarantine events on this controller for each
disk in the system. Only tiers with quarantine counts will be
displayed.
Use QUARANTINECLEAR to reset the quarantine counts.

QUARANTINE=[ON|OFF]

Enables/disables the disk quarantine feature for all of the disks. A
disk cannot be quarantined unless FASTAV is enabled for the LUN.
Default is OFF.

QUARANTINECLEAR

Resets the quarantine counts for all of the disks.

QUARANTINECMDLIMIT=x

QUARANTINETIMEOUT=x

REASSIGN[=tc] [0xh

REBUILD[=tc]|ALL

‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.

All failed and replaced disks can be rebuilt using the ALL parameter.

REBUILDNOJOURNAL[=tc]|ALL

‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.

All failed and replaced disks can be rebuilt using the ALL parameter.

REBUILDVERIFY=ON|OFF

Default is OFF.

REPLACE[=tc]

‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHP>.

(Note that spare disks themselves cannot be replaced with this
command).

RESTART[=tc]

‘t’ indicates the tier in the range <1..128>, and
‘c’ indicates the channel in the range <ABCDEFGHPS>.

All failed and replaced disks can be restarted using the ALL parameter.

SCAN

STATUS

Displays the loop status of each disk channel and a count of the
fibre channel errors encountered on each channel.

STATUSCLEAR

Resets the fibre channel error counts on each disk channel.

TIMEOUT=x

Recommended value for SAS drives is 27 seconds.
Recommended value for SATA drives is 60 seconds.

Default is 60 seconds.

WRITESAME=ON|OFF

Default is OFF.
