Firebird error 10054 - Only those who do nothing make no mistakes!

The Firebird log file (firebird.log) can contain many different messages. Below is a list of the most frequent ones, with an explanation of each error.

INET/inet_error: read errno = 10054

Short: Connection reset by peer.

This error indicates that the client disconnected from the server. If the error text contains (Client), the client application lost its connection to the server and recorded this in the log.
If the error text contains (Server), the server lost its connection to the client and reported it in firebird.log.
The usual cause of error 10054 is an unstable connection, for example, weak Wi-Fi.
This error can also appear if a client application does not explicitly close its database connection, i.e., there is no explicit command such as «MyDB.Active:=false» when the application exits.

INET/inet_error: read errno = 104

Short: Connection reset by peer.
The same as 10054, but on Linux.

WNET/wnet_error: ReadFile end-of-file errno = 109

In short: The pipe has been ended (broken pipe).

The same as 10054, but this error occurs when the client application uses the WNET connection path to the Firebird server instance on Windows, something like this:
\\server\path\database.fdb
This is not recommended. Use TCP/IP for network connections (in the format server:path\database.fdb or, on Firebird 3, inet://servername:path\database.fdb) and XNET for local connections (a local path on 2.5 and xnet://path\database.fdb).
Consider disabling WNET connections; see the instructions on how to disable connection protocols for Firebird on Windows.

INET/inet_error: send errno = 10053 (on Windows)
or INET/inet_error: send errno = 103 (on Linux)

This also means a broken connection, but the WinSock error is 10053 (software caused connection abort).

INET/inet_error: connect errno = 10060 (Windows)
or INET/inet_error: connect errno = 10061 (Windows)

In short: 10061 — Connection refused, 10060 — Connection timed out

In general, this error means that a connection between the server and the client application could not be established.

If the error is marked (Client), the client application tried to connect to Firebird through a network connection string but failed: either the Firebird server is not running, or access is blocked by a firewall.

More details about common Winsock errors are here.
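The error numbers listed above can be looked up mechanically when scanning a log. The following is a minimal sketch, not part of Firebird: the names `ERRNO_MEANINGS`, `LOG_LINE` and `explain` are made up for illustration, and the descriptions are the usual operating-system meanings of these codes.

```python
import re

# Usual OS-level meanings of the error numbers discussed above
# (Winsock codes, Linux errno values, and one Win32 pipe error).
ERRNO_MEANINGS = {
    10054: "connection reset by peer (Windows)",
    104: "connection reset by peer (Linux)",
    109: "broken pipe on a WNET (named pipe) connection (Windows)",
    10053: "software caused connection abort (Windows)",
    103: "software caused connection abort (Linux)",
    10060: "connection timed out (Windows)",
    10061: "connection refused (Windows)",
}

# Matches lines such as "INET/inet_error: read errno = 10054".
LOG_LINE = re.compile(r"^(INET|WNET)/\w+: (.+?) errno = (\d+)")


def explain(line):
    """Return (protocol, operation, meaning) for a network error line, or None."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    protocol, operation, code = match.group(1), match.group(2), int(match.group(3))
    return protocol, operation, ERRNO_MEANINGS.get(code, "unknown error number")


print(explain("INET/inet_error: read errno = 10054"))
print(explain("WNET/wnet_error: ReadFile end-of-file errno = 109"))
```

Lines that are not network errors return `None`, so the function can be applied to every line of firebird.log without pre-filtering.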

Source

Alexey Kovyazin, 31-03-2014
Below is the description of common errors and problems in InterBase/Firebird databases and their recovery chances.

To get exact recovery price and time please contact us via email.

For approximate pricing please see the «Firebird and InterBase Recovery» service description. There is no 100% guarantee that the described errors correspond exactly to the described causes.

Internal gds software consistency check (cannot find tip page (165))

Database cannot be opened using Firebird or InterBase engine, and the following message appears: Internal gds software consistency check (cannot find tip page (165))

Abnormal shutdown or physical database file corruption. A transaction inventory page (TIP) has been lost. The corrupted area can vary from several pages to the whole database, so additional investigation is needed.

The most probable reasons are abnormal server shutdown (using Reset button), wrong backup approach or backup tools. On Windows XP such corruption can be caused by «System Restore» feature for «gdb» files.

First, the database should be scanned with FirstAID Diagnostician. If FirstAID does not warn about serious corruption, the corruption can be fixed with the full version of FirstAID.
In the case of serious corruption, custom recovery is needed.

Recovery chance: 99%

Database file appears corrupt. Wrong page type. Page NNN is of wrong type (expected X, found Y)

The error message appears in standard output, firebird.log, or interbase.log:
Database file appears corrupt. Wrong page type. Page NNN is of wrong type (expected X, found Y)

Due to physical corruption or another cause, the sequence of database file pages has changed, or wrong values have appeared on pointer pages, index root pages, etc.

The most probable reasons are abnormal server shutdown (using the Reset button), a wrong backup procedure, or wrong backup tools/approach.

Recovery chance: 95%

Unknown database I/O error for file «*.gdb». Error while trying to read from file

The database cannot be opened, and the following error message appears: Unknown database I/O error for file «*.gdb». Error while trying to read from the file.

Due to the abnormal server shutdown, the most recent database pages were not written to the disk.

Custom recovery process. The database is checked with gfix and backup/restore.

Recovery chance: 95%

Decompression overran buffer

Error message appears: Internal gds software consistency check (decompression overran buffer (179))

This is serious database corruption: system tables could be damaged. Sometimes this error occurs after the database is transferred to a new server/computer. Investigation is needed.

Database structure analysis and generation of new pages; several iterations are needed.

Recovery chance: 95%

Wrong record length

An error message appears: Internal gds software consistency check. Wrong record length

Most often, «Wrong record length» errors are caused by bad RAM. We strongly recommend checking the memory (RAM) on the server.

Wrong records are located and deleted using IBSurgeon’s low-level tools. Several iterations are needed.

Recovery chance: 97%

Database file appears corrupt. Bad checksum

Database file appears corrupt. Bad checksum. Checksum error on database page XX.

Bad RAM. We strongly recommend checking memory (RAM) at the server.

Custom recovery process. Several iterations needed.

Recovery chance: 99%

Cannot find record back version

The database seems to be working, but gbak cannot complete backup.
Error text:
Internal gds software consistency check (cannot find record back version (291))
gds_$receive failed. Exiting before completion due to errors. internal gds software consistency check (can’t continue after bug check).

The most probable cause is incorrect transaction management. The performance of transactions should be investigated.

The database requires detailed analysis, and usually the solution is to find and delete the problem database objects and then recreate them. Sometimes it is necessary to transfer the data to a new database.

Recovery chance: 99%

Next transaction older than oldest active transaction

Internal gds software consistency check (next transaction older than oldest active transaction (266))

This rare error occurs in InterBase 4.x-5.x; it is a bug.

Custom recovery process.

Recovery chance: 99%

Corrupted header

The database cannot be opened, and Firebird/InterBase does not recognize it as a valid database.

Physical corruption, HDD crash.

Custom recovery process.

Recovery chance: 80%

Database file size exceeds implementation limit

It happens on InterBase 4.x-5.x servers and early Firebird (0.9.x) betas. The database cannot be opened; the database file size is 4 GB.

This is an implementation limit of InterBase 4.x-5.x-6.0.x and early Firebird 0.9.x.

Custom recovery process.

Recovery chance: usually we can save all data (i.e., 100%), but sometimes it can be less than 70%.

Conversion error from string

Error text: Conversion error from string «XXX».

Preliminary diagnosis is impossible, on-site investigation needed.

Custom recovery process.

Recovery chance: 99%

INET/inet_error: read errno = 10054 or 10038 or 10093

Multiple entries in firebird.log or interbase.log with errors 10054, 10038, 10093, etc.

These errors are caused by network problems — check your hubs, network adapters, etc. It is not a Firebird/InterBase error itself, but it may impact Firebird/InterBase.

We offer the FBScanner tool to solve the «10054 errors» problem (among other issues). See details here.

Recovery chance: not applicable.

Partner index description not found (175)

Error message text: internal gds software consistency check (partner index description not found (175))
The index for a primary or foreign key is missing.

It may be caused by physical corruption or internal server bugs.

Custom recovery process.

Recovery chance: 100%

Other errors

Below is a list of rare Firebird/InterBase errors, which can have different causes. Do not hesitate to send us a description of your problem; we can help you.

Wrong UDF may cause the following errors in interbase.log:
SCH_validate — not entered
SCH_validate — wrong thread

Index corruption may cause the following message in interbase.log:
Page 34672 is an orphan

And this error can occur during intensive inserts/updates/deletes within a single transaction:
internal gds software consistency check (Too many savepoints (287))

For the following errors, it is hard to determine the cause without investigating the database:
internal gds software consistency check (error during savepoint backout (290))
internal gds Software consistency check (size of opt block exceeded (286))
internal gds software consistency check (invalid SEND request (167))

Different causes. We need to investigate the corrupted database.

Custom recovery process.

Recovery chance: varies.

Source

Commented by: @mrotteveel

The problem can only be fixed by the client application closing the connection correctly (assuming the problem is not in a connection library not closing the connection properly).

In any case, error 10054 (connection reset by peer) is generally not a severe error, so if the only problem you have is that this error is logged, you could just ignore it, although it would be better of course if the client correctly closes the connection.

Source

Using KEEPALIVE-sockets to avoid 10054 errors

by Vasiliy Ovchinnikov

Introduction

In systems built on InterBase or Firebird databases that are intended to work in real-time or near-real-time mode, the server needs to track the status of client connections and forcibly disconnect a client that has become unreachable because the connection was dropped.

It is important to promptly release the resources held by such phantom connections, especially on servers with the Classic architecture. If some users connect to the server through an unstable modem connection, the risk of disconnection becomes rather high.

For instance, a client saves a modified record set, and the connection drops after UPDATE has been executed but before COMMIT.

As a rule, client applications in such situations reconnect to the server. However, the user, who continues working with the same data after receiving the connection failure message, will be unable to save changes and will get a lock conflict message ("lock conflict on update"). The previous connection, which opened the transaction in which UPDATE was executed but COMMIT was not, still holds these records.

Connection failures may also occur in a local network if the hardware (network cards, hubs, switches) is faulty or poorly configured, and/or if the network is congested. In InterBase and Firebird logs, TCP connection failures appear as error 10054 on Windows and 104 on Unix; NetBEUI failures appear as errors 108/109.

Methods of controlling hung connections

In InterBase and Firebird, the mechanisms of DUMMY-packets or KEEPALIVE-sockets are used to detect and drop such "dead" connections.

In InterBase 5.0 and higher, the mechanism of DUMMY-packets is implemented at the application layer between an InterBase/Firebird server and the gds32/fbclient client library. It is enabled in ibconfig/firebird.conf and is not examined in this article.

Note

As we know from previous experience, the stability of the dummy-packet mechanism (the one implemented in InterBase 5.0 and repeatedly corrected in Firebird 1.5.x) strongly depends on the server's and client's operating systems, TCP stack versions, and many other conditions. In other words, the effectiveness of this mechanism in a real network tends to zero.

KEEPALIVE-sockets are a more interesting mechanism. Implemented in InterBase 6.0 and higher, it is intended for tracking connection failures. KEEPALIVE is enabled by setting the SO_KEEPALIVE socket option when the socket is opened. There is no need to set it manually if you use Firebird 1.5 or higher, since it is implemented in the code of the Firebird server, both for Classic and for Superserver.

For InterBase and Firebird versions lower than 1.5 with the Classic architecture, an additional setting is necessary. This setting is described below.

In this case, the operating system TCP stack (instead of the Firebird server) becomes responsible for connection status. However, to enable this mechanism, one must adjust KEEPALIVE parameters.
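To illustrate what setting SO_KEEPALIVE on a socket means in practice, here is a minimal generic sketch. It is not Firebird's server code; the helper name `enable_keepalive` and the timer values are assumptions chosen for illustration, and the per-socket timer options are applied only where the platform exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive for sock and, where supported, tune its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides are not available on every platform; where they
    # are missing or rejected, the system-wide settings stay in effect.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            pass  # keep the system default for this timer


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print("SO_KEEPALIVE enabled:", keepalive_on)
sock.close()
```

When only SO_KEEPALIVE is set, the timing comes entirely from the operating system settings, which is why the system-wide parameters matter.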

KEEPALIVE description

The behavior of KEEPALIVE-sockets is controlled by the parameters presented in the following table.

Parameter	Description
KEEPALIVE_TIME	Time interval, on expiry of which KEEPALIVE-probes start
KEEPALIVE_INTERVAL	Time interval between KEEPALIVE-probes
KEEPALIVE_PROBES	Number of KEEPALIVE-probes

The TCP stack detects the moment when packets stop being transmitted between the client and the server and starts the KEEPALIVE timer. As soon as the timer reaches the KEEPALIVE_TIME value, the server TCP stack sends the first KEEPALIVE probe. A probe is an empty packet with the ACK flag sent to the client. If everything is all right on the client side, the client TCP stack sends a response packet with the ACK flag, and the server TCP stack resets the KEEPALIVE timer as soon as it receives the response.

If the client does not respond to the probe, the server continues to send probes. Their number equals the KEEPALIVE_PROBES value, and they are sent at KEEPALIVE_INTERVAL intervals. If the client does not respond to the last probe, then after another KEEPALIVE_INTERVAL expires, the operating system TCP stack closes the connection, and the server (in this case, an instance of the InterBase or Firebird server) releases all resources held by this connection.

Thus, a failed client connection will be closed after the following time interval:

KEEPALIVE_TIME + (KEEPALIVE_PROBES + 1) * KEEPALIVE_INTERVAL
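The formula is easy to evaluate for candidate settings before applying them. A small sketch follows; the function name `keepalive_total_seconds` is made up for illustration, and all arguments are in seconds.

```python
def keepalive_total_seconds(time_s, probes, interval_s):
    """Seconds until a dead connection is closed, per the article's formula."""
    return time_s + (probes + 1) * interval_s


# 1 minute idle time, 3 probes, 30 seconds between probes
print(keepalive_total_seconds(60, 3, 30))   # 180
# 15 seconds idle time, 5 probes, 10 seconds between probes
print(keepalive_total_seconds(15, 5, 10))   # 75
```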

By default, the parameter values are rather large, which makes them ineffective. For example, the default value of KEEPALIVE_TIME is 2 hours on both Linux and Windows. In practice, 1-2 minutes is enough to decide on the forced disconnection of an unreachable client. On the other hand, the default KEEPALIVE settings sometimes cause forced disconnections, in Windows networks, of connections that stay inactive for these 2 hours (one may doubt whether applications need such connections, but that is a different matter).

The adjustment of these parameters for Windows and Linux is described below.

Setting KEEPALIVE in Linux

KEEPALIVE parameters in Linux can be changed either by directly editing the /proc file system or by calling sysctl.

For the first case, the following files should be edited:

/proc/sys/net/ipv4/tcp_keepalive_time
/proc/sys/net/ipv4/tcp_keepalive_intvl
/proc/sys/net/ipv4/tcp_keepalive_probes

For the second case, the following commands should be executed:

sysctl -w net.ipv4.tcp_keepalive_time=value
sysctl -w net.ipv4.tcp_keepalive_intvl=value
sysctl -w net.ipv4.tcp_keepalive_probes=value

Time value is expressed in seconds.
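Before changing anything, it can help to check which values are currently in effect. The sketch below reads the three /proc files named above; the function name `read_keepalive_settings` and its `base` parameter are assumptions added for illustration (the parameter makes the function usable against a copy of the files).

```python
from pathlib import Path

KEEPALIVE_FILES = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive_settings(base="/proc/sys/net/ipv4"):
    """Return the keepalive settings as a dict; unreadable files map to None."""
    settings = {}
    for name in KEEPALIVE_FILES:
        try:
            settings[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None  # not Linux, or the file is missing/garbled
    return settings


if __name__ == "__main__":
    print(read_keepalive_settings())
```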

To have these parameters set automatically when the server restarts, add the following lines to /etc/sysctl.conf:

net.ipv4.tcp_keepalive_intvl = value
net.ipv4.tcp_keepalive_time = value
net.ipv4.tcp_keepalive_probes = value

Replace each value placeholder with the required number.

If you use a version of Firebird Classic lower than 1.5, add the following to /etc/xinetd.d/firebird:

FLAGS=REUSE KEEPALIVE

Adjusting KEEPALIVE in Windows 95/98/ME

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\VxD\MSTCP

Everything about adjustment of TCP can be found here:

http://support.microsoft.com/default.aspx?scid=kb;en-us;158474

Parameters:

KeepAliveTime = milliseconds

Type: DWORD (for Windows 98, STRING).

Defines the connection inactivity time in milliseconds. When it expires, KEEPALIVE-probes start being sent. The default value is 2 hours (7200000).

KeepAliveInterval = 32-bit value

Type: DWORD (for Windows 98, STRING).

Defines the time between KEEPALIVE-probes in milliseconds. Once the specified KeepAliveTime interval expires, KEEPALIVE-probes are sent every KeepAliveInterval milliseconds, up to a maximum of MaxDataRetries probes. If no response comes, the connection is closed. The default value is 1 second (1000).

MaxDataRetries = 32-bit value

Type: STRING

Defines the maximum number of KEEPALIVE-probes. The default value is 5.

Setting KEEPALIVE in Windows 2000/NT/XP

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

Everything about TCP adjustment:

2000/NT: http://support.microsoft.com/kb/120642

XP: http://support.microsoft.com/kb/314053

The MaxDataRetries parameter is replaced by TCPMaxDataRetransmissions.

All other parameters have the same names as in Windows 9x.

Setting KEEPALIVE in Windows (for clients)

This setting is optional, but it may reduce the number of connection failure messages when unreliable communication channels are used. Add to the registry branch:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

the parameter DisableDHCPMediaSense=1. See a description of this parameter here:

http://support.microsoft.com/?scid=kb%3Bru%3B239924&x=13&y=14

Example

Let’s consider the adjustment of Firebird SQL Server 1.5.2 CS under Linux.

Make sure that the DUMMY-packets mechanism is disabled in firebird.conf (the parameter is commented out):

……………..

#DummyPacketInterval=0

…………….
Make sure that the /etc/xinetd.d/firebird configuration file exists.

We kept everything unchanged, as it was written during installation. Nothing needs to be added.

Change the TCP stack parameters:

sysctl -w net.ipv4.tcp_keepalive_time=15
sysctl -w net.ipv4.tcp_keepalive_intvl=10
sysctl -w net.ipv4.tcp_keepalive_probes=5

Connect to any database on the server from any network client.
Check the traffic on the server using any packet filter.

With the parameters set as above (they are visible in /proc/sys/net/ipv4/tcp_keepalive_*), the server sends a probe 15 seconds after all traffic in the channel stops. If the client is "alive," the server receives a response packet. 15 seconds after that, the check repeats, and so on.
If a client is physically disconnected (the multiplexer or the modem unexpectedly turns off; anything is possible), the server does not receive a response and begins to send probes at 10-second intervals. If the client does not respond to the fifth probe, then 10 seconds later the server process terminates and releases its resources and locks. If the client shows signs of life and responds, even if only to the fifth probe, then after another 15-second timeout the server will begin sending probes again, and so on.
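The sequence described in this example can be laid out as a timeline for a client that never answers. This is a sketch of the article's description only; the function name `keepalive_timeline` is made up for illustration, and real TCP stacks may count probes slightly differently.

```python
def keepalive_timeline(time_s, probes, interval_s):
    """List (seconds_since_last_traffic, event) pairs for a silent client."""
    events = [(time_s, "first probe")]
    for n in range(1, probes + 1):
        events.append((time_s + n * interval_s, f"repeat probe {n} of {probes}"))
    events.append((time_s + (probes + 1) * interval_s, "connection closed"))
    return events


# Values from the example: 15 s idle time, 5 probes, 10 s between probes.
for moment, event in keepalive_timeline(15, 5, 10):
    print(f"{moment:>3} s  {event}")
```

With these values the first probe goes out at 15 seconds, the fifth repeat probe at 65 seconds, and the connection is closed at 75 seconds.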

Guidelines

In conclusion, here is some advice on how to select KEEPALIVE values.

First, determine the necessary value of KEEPALIVE_TIME. The larger the value, the later KEEPALIVE-probes start. If you constantly see 10054/104 errors in the server log and have to delete them manually, it is recommended to increase the KEEPALIVE_TIME value.

Second, the values of KEEPALIVE_INTERVAL and KEEPALIVE_PROBES should match how quickly you need hung connections to be released. If your users connect to the server through unreliable channels, you will probably want to increase the number of probes and the interval between them, to give the user a chance to detect the failure and reconnect to the server. If clients use a DSL connection to the Internet or access the SQL server through a local network, the interval between KEEPALIVE-probes can be decreased.

General recommendations: if clients report many errors when saving results due to lock conflicts for no apparent reason (i.e., there are no concurrent connections working with the same data), you need to speed up the system's reaction to hung connections. In practice, the KEEPALIVE_TIME value may be 1 minute or more. You should estimate for yourself how long the longest transaction runs, so that the network is not loaded with KEEPALIVE checks of normally working connections that have started long transactions. The KEEPALIVE_INTERVAL value should be 10 seconds or more, and the KEEPALIVE_PROBES value 5 probes or more. When many users work simultaneously, remember that checking too frequently may considerably increase network traffic.

Also remember that if your users actively change shared data, lock conflict errors will occur even in an optimal situation. In this case, the client applications need correct lock conflict error handling. At the same time, the application should be designed to minimize the occurrence of such errors.

Examples of typical configurations

Finally, here are some examples of typical configurations. Downtime is the time during which users will be unable to update data that was modified by the transaction opened by the hung connection. Total time is the time after which the hung connection will be closed.

Clients use modem connections; most transactions in the system are short; downtime is limited to 3 minutes:
```
KEEPALIVE_TIME 1 minute
KEEPALIVE_PROBES 3
KEEPALIVE_INTERVAL 30 seconds
TOTAL 3 minutes
```
Clients use a LAN connection; most transactions in the system are short; downtime is limited to 2 minutes:
```
KEEPALIVE_TIME 30 sec
KEEPALIVE_PROBES 5
KEEPALIVE_INTERVAL 10 sec
TOTAL 90 seconds
```
Clients use any connections; downtime is not regulated:
```
KEEPALIVE_TIME 12 minutes
KEEPALIVE_PROBES 7
KEEPALIVE_INTERVAL 15 sec
TOTAL 14 minutes
```

We hope these examples are enough to adjust the TCP stack KEEPALIVE mechanism correctly.

Source

Moderators: kdv, dimitr

DSKalugin: Posts: 212; Joined: 27 Oct 2004, 13:39

INET/inet_error: send errno = 10054

P4 Mon Feb 14 17:35:07 2005
SERVER/process_packet: broken port, server exiting

P4 Mon Feb 14 17:35:07 2005
INET/inet_error: send errno = 10054

P4 Mon Feb 14 17:35:07 2005
SERVER/process_packet: broken port, server exiting

The client application hangs, and the hard disk indicator on the server is lit solid red. After a while I have to kill the program and reconnect.

I recently switched the architecture from SS to Classic, version 1.5.2,
on Win2003. I made no changes to the program.
Could this be because I created a new index while clients were working?

Merlin: IB/FB Dinosaur; Posts: 1502; Joined: 27 Oct 2004, 11:44

Merlin » 14 Feb 2005, 19:54

Are events used?
Are there any other errors in the log?
Is it tied to any specific action in the application?

10054 just means the server noticed that the client died, nothing more. Someone pulled the cable, pressed reset, and so on. In combination with events there used to be all sorts of trouble, but I believe that ate CPU, not disk. It was announced several times that the problem had finally been found and defeated, but testing in the field is always right. Heavy disk activity with low CPU usage may indicate that a mountain of garbage is being collected. Creating an index while clients are connected is harmless in itself, but a new index can change the execution plans of queries that previously could not use it, sometimes to a fatally bad plan. That is regarding a link to specific actions. One more thing: if the index is terribly non-selective, it can contribute to a mountain of garbage building up during massive deletes and updates.

DSKalugin: Posts: 212; Joined: 27 Oct 2004, 13:39

DSKalugin » 14 Feb 2005, 20:18

I don't use events. As I said, I really do kill the applications with "End Task" because they hang solid.

You are spot on about the garbage:
I set the sweep interval to 0 and run
gfix -sweep every night on a schedule.
An attempt to turn automatic sweep back on with

"C:Program FilesFirebird152bingfix.exe" -housekeeping 30000 -user "SYSDBA" -password "masterkey" C:ShopDBU96.GDB
reports unavailable database.

How should I understand this?

Merlin: IB/FB Dinosaur; Posts: 1502; Joined: 27 Oct 2004, 11:44

Merlin » 14 Feb 2005, 20:39

DSKalugin wrote: I don't use events. As I said, I really do kill the applications with "End Task" because they hang solid.

That is exactly when the server logs 10054. On Classic you can kill the "hung" processes if you can identify them, but there is a risk of damaging the database, especially if that process happens to be busy with garbage collection.

DSKalugin wrote:
I set the sweep interval to 0 and run
gfix -sweep every night on a schedule.
An attempt to turn automatic sweep back on

Don't. If my suspicions are right, it will only get worse. Show us the index you created and its statistics.

DSKalugin wrote:
"C:Program FilesFirebird152bingfix.exe" -housekeeping 30000 -user "SYSDBA" -password "masterkey" C:ShopDBU96.GDB
reports unavailable database.

How should I understand this?

I don't know your Windows quirks.

kdv: Forum Admin; Posts: 6595; Joined: 25 Oct 2004, 18:07

kdv » 14 Feb 2005, 20:55

As for unavailable database: I am personally fed up with this mystical message, because I cannot reproduce it on W2000, yet people run into it on both FB and IB 7.x.

As for 10054 and the like: there is one clear case of IB 7.1 SP2 dying on Win2003 Server with similar symptoms. At first everything is OK, then the 10054 errors start, and it all ends with 10093 and IB not working. On Win2000 everything is fine with the same IB and the same application. It would be interesting to hear whether anyone has similar problems on W2003 + FB 1.5.2, because I suspect yet another incompatibility with TCP.

Merlin: IB/FB Dinosaur; Posts: 1502; Joined: 27 Oct 2004, 11:44

Merlin » 14 Feb 2005, 22:01

kdv wrote:
As for 10054 and the like: there is one clear case of IB 7.1 SP2 dying on Win2003 Server with similar symptoms.

Dima, it looks like he causes the 10054 himself when he kills the "hung" client application. As for the hanging: remember I told you how one genius of mine, hung over, created an index on the "debit/credit" field of a table with 10 million records? Delete some 200 thousand rows from such a table with such an index in place, and the sweep takes a whole day. And if you make the index unique by appending some ID at the end, it will not be so conspicuous, but queries with conditions on the first field of the index will start picking it up and rearranging their usual plans, including the table order in inner joins, with all the consequences that follow. So instead of the usual hundredths of a second you can get about 10 minutes. And then you start killing applications and making things worse.

Дмитрий: Posts: 127; Joined: 26 Oct 2004, 11:05

Дмитрий » 15 Feb 2005, 09:23

On NT 4.0 with IB 7.5 I get error 10054 all the time, and it is reported from different machines. The application does not hang, and nobody drops connections. The same happens when only IBExpert is connected. I had the same thing with IB 6.5 and IB 5.6.

dimitr: Firebird Developer; Posts: 888; Joined: 26 Oct 2004, 16:20

dimitr » 15 Feb 2005, 11:53

Unavailable database is quite understandable in this case: you need to specify the host in gfix, because win32 Classic supports the local protocol only starting with 2.0.

DSKalugin: Posts: 212; Joined: 27 Oct 2004, 13:39

DSKalugin » 15 Feb 2005, 12:14

In order now:
1. Last Thursday I switched the server from Fb SS 1.5.2 to Fb CS 1.5.2.
I explained the reason for the switch here (because of a glitch in a UDF on one database, FB restarted and the other databases were disconnected abnormally). I wanted independent processes. But as a result these separate processes started working slower than on SS, and sometimes stalled.

2. I set the sweep interval to 0. I did garbage collection daily on a schedule with gfix -sweep.

3. Yesterday evening, to speed up a one-off procedure, I created 5 indexes on the working database. The procedure did a mass update of about 3 thousand records and deleted the "extra" ones.
Right after that the long hangs began. It was impossible to work.
This morning the hangs were rock solid too. It was impossible to work. A complete mystery. IBExpert refused to connect. It says

Unsuccessful execution caused by an unavailable resource.
unavailable database.

- IBAnalyst complained in a similar way.
- Local gfix also refuses to perform any action (unavailable database).
- But remotely I managed to shut the database down and connect with IBExpert.
- FirstAID ran its test and said everything is OK; the only thing it showed was 5 deleted indexes.

In short, I solved the problem as follows. I put everything back the way it was.
I rebooted WinServ2003; it did not help.

- Remotely, using IBExpert, I dropped those ill-fated indexes.
- Made a backup.
- Uninstalled Fb CS.
- Installed it again, but this time SS.
- Checked the database with gfix.exe -v -full;
nothing on the screen, but later I found this in the log:
P4 (Client) Tue Feb 15 10:27:22 2005
Control services error 1061
- Set the sweep interval back to 30000 (it was 20000 before the zero).
- Let everyone back in; no complaints about slowness.

I never understood what it was. But my urge to experiment is gone.

DSKalugin: Posts: 212; Joined: 27 Oct 2004, 13:39

One more question

DSKalugin » 15 Feb 2005, 12:31

dimitr wrote: Unavailable database is quite understandable in this case: you need to specify the host in gfix, because win32 Classic supports the local protocol only starting with 2.0.

Whoa, and I was writing it directly, like C:… without localhost. On Super it worked like a charm. I didn't know. Thanks.

Another question comes to mind. I don't know if it's on topic, but I'll ask.
I read:

Do not log in to one database using different paths
In this case database corruption is very likely, up to its complete destruction. That is, do not use links to database files and directories on unix, and do not make the mistake on win of writing the connection path as c:dir\data.gdb instead of the correct c:\dir\data.gdb.
This advice does not apply to different names of the same server in the connection string.
http://www.ibase.ru/devinfo/dontdoit.htm

Question: does connecting
- directly through IBObject, specifying P4:C:ShopDBU96.gdb
- through FIBPlus, but using the alias P4:ShopsDB
where alias.conf clearly states
ShopsDB = C:ShopDBU96.gdb
count as different paths?

kdv: Forum Admin; Posts: 6595; Joined: 25 Oct 2004, 18:07

kdv » 15 Feb 2005, 13:38

DSKalugin wrote: In order now:
independent processes. But as a result these separate processes started working slower than on SS, and sometimes stalled.

Maybe you should first have found out why it stalls? Maybe you set a cache size in the database that is deadly for Classic.

2. I set the sweep interval to 0. I did garbage collection daily on a schedule with gfix -sweep.

Garbage accumulation can be seen by looking at the statistics. You can of course set sweep to 0 just like that, but ...

Right after that the long hangs began. It was impossible to work.

There you go. Garbage collection in the indexes.

- IBAnalyst complained in a similar way.

Forget about the local protocol. Can't you specify the connection as localhost:c:\dir\data.gdb???

- Set the sweep interval back to 30000 (it was 20000 before the zero).

Shamanism...

I never understood what it was. But my urge to experiment is gone.

And you didn't get the urge to read the IBAnalyst help and www.ibase.ru/devinfo/delmany.htm, as well as firebird.conf?

dimitr: Firebird Developer; Posts: 888; Joined: 26 Oct 2004, 16:20

dimitr » 15 Feb 2005, 14:05

The slowness is almost certainly due to cooperative garbage collection instead of the background one you are used to on Super... And the cache surely has an effect too.

DSKalugin: Posts: 212; Joined: 27 Oct 2004, 13:39

DSKalugin » 15 Feb 2005, 14:11

I have read both the help and the config. I have always had the desire to learn and figure things out. But time is the problem. It was quicker to return everything to the original stable state.
The help is easier; it is written for people. The config documentation, though, can only be grasped by professors who dig deep into the Firebird code every day. In short, you should write for people, not for cyborgs.
My config is the default; I have not touched it since installation.
About localhost, agreed. I did not pay attention to it. Now I will know.
Thanks for the support.

DSKalugin: Posts: 212; Joined: 27 Oct 2004, 13:39

DSKalugin » 15 Feb 2005, 14:17

dimitr wrote: The slowdowns are most likely caused by cooperative garbage collection instead of the background collection you are used to on SuperServer… And the cache surely plays a part too.

By the way, namesake, explain this… Can Classic end up in a situation where 2 or more processes start collecting garbage in the same database? Won't they conflict with each other, or is that impossible? And what about working while the collection is in progress?

DSKalugin: Posts: 212; Joined: 27 Oct 2004, 13:39

DSKalugin » 15 Feb 2005, 14:20

and regarding
"Don't log in to the same database using different paths" (see above)
can anyone clarify?

kdv: Forum Admin; Posts: 6595; Joined: 25 Oct 2004, 18:07

kdv » 15 Feb 2005, 14:46

If you don't know how to specify different paths for one database file, then better not to. What you don't know won't keep you up at night.

dimitr: Firebird developer; Posts: 888; Joined: 26 Oct 2004, 16:20

dimitr » 15 Feb 2005, 15:02

1) The config documentation is written quite clearly
2) Two Classic processes can collect garbage at the same time; there is no conflict here
3) An alias is not a different path to the database

DSKalugin: Posts: 212; Joined: 27 Oct 2004, 13:39

And here's the resolution!!!

DSKalugin » 15 Feb 2005, 19:07

Hooray! The cause of the slowdowns is solved!

When I reinstalled Firebird, I put it in a different directory.
And I forgot to fix the scripts, shame on me :) I caused trouble for myself, for you and for the users…
So the scheduled garbage collection from the script was not running!
And I had disabled the automatic one. Add to that new indexes and mass insert/update/delete, and you get a total hang.

I didn't account for the peculiarity of connecting locally with localhost in front of the path
So I couldn't reach the database with the utilities

Thanks for your attention. I declare the question closed
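
The failure described in this post (a scheduled gfix -sweep that silently stopped running after Firebird moved to another directory) can be guarded against with a small wrapper. Below is a minimal sketch in Python, not the poster's actual script: the helper names build_sweep_command and run_sweep are hypothetical, and it assumes that gfix accepts -sweep, -user and -password and reports failure through a non-zero exit code.

```python
import shutil
import subprocess


def build_sweep_command(gfix_path, database, user, password):
    """Return the argument list for a manual sweep via gfix (hypothetical helper)."""
    return [gfix_path, "-sweep", database, "-user", user, "-password", password]


def run_sweep(gfix_path, database, user, password):
    """Run the sweep and raise on any problem instead of failing silently."""
    # shutil.which returns None when the executable cannot be found,
    # e.g. after Firebird was reinstalled into a different directory.
    if shutil.which(gfix_path) is None:
        raise FileNotFoundError(f"gfix not found: {gfix_path!r}")
    # check=True turns a non-zero exit code into CalledProcessError
    # (assumption: gfix signals failure through its exit code).
    subprocess.run(
        build_sweep_command(gfix_path, database, user, password),
        check=True,
    )
```

A scheduled job could call something like run_sweep("gfix", "localhost:ShopsDB", "SYSDBA", "...") and alert on any exception; the localhost: prefix follows the advice given earlier in the thread, while the alias and credentials here are placeholders.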

kdv: Forum Admin; Posts: 6595; Joined: 25 Oct 2004, 18:07

kdv » 15 Feb 2005, 22:13

Well, come on. You're not new here. You should have grabbed IBAnalyst, read up on how to collect statistics properly, run it, taken a look, and been horrified…

andycat: Posts: 65; Joined: 22 Feb 2005, 12:06

andycat » 03 Mar 2005, 15:04

Continuing this topic and "IB 6.5 Server + W2KSP2 Help a newbie, please":
Question: is it mandatory or advisable to put the IB server on a separate machine?
The problem is still the same, just less frequent: errors 10053 and 10054 show up often, and the IB server needs a restart 2-5 times a day.
What I tried, following the forum and the docs: updated all client machines to the latest patches, disconnected 2 old Win98 machines from the server, installed ibconsvc on the server, removed everything unnecessary. I can't trace the error from the log; it is kind of floating: it can happen when someone opens 1C or goes on the internet, etc., and it doesn't depend on a particular client. I adjusted ibconfig: connection_timeout and dummy_packet_interval; nothing helps. Yet if I switch off all the other computers on the network, leave 3 (the main workstations, including the server) and work only in the InterBase program, everything is fine.
Hence the question: how much do the other network programs (The Bat, 1C, EServ /a WinGate analogue/ + some MS Office documents) on the server disrupt the IB server?
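
For a "floating" error like this, it may help to tally the network errors in the server log and see when they cluster. Below is a minimal Python sketch; it assumes only the message shape quoted on this page ("INET/inet_error: read errno = 10054"), and the function name count_network_errors and the sample text are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "INET/inet_error: read errno = 10054".
ERRNO_RE = re.compile(r"inet_error:\s*(\w+)\s+errno\s*=\s*(\d+)")


def count_network_errors(log_text):
    """Count (operation, errno) pairs found in the given log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERRNO_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Made-up sample in the message format quoted on this page.
sample = """\
INET/inet_error: read errno = 10054
INET/inet_error: send errno = 10053
INET/inet_error: read errno = 10054
"""
print(count_network_errors(sample))
```

Running this over the log once per hour and comparing the counts against what else was active on the network at the time is one way to check whether the disconnects follow the other programs.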

