Проблемы с listener ORA-12535: TNS:operation timed out в Oracle 9i
Одной из причин таких ошибок в часы пик может быть то, что листенер всего-навсего не успевает обработать поступающие запросы на подключение.
Например, когда одновременно запускаются очень много клиентских джобов, которые все в один момент обрушиваются на листенер, листенер успевает обработать только первые поступившие запросы, а остальные падают с ошибкой ORA-12535.
Для борьбы с ними можно попробовать следующие варианты:
1) Параметр INBOUND_CONNECT_TIMEOUT
Можно попробовать установить параметр в файле listener.ora: INBOUND_CONNECT_TIMEOUT_
= 600
Этот параметр задает кол-во секунд, в течение которых должна быть завершена обработка запроса клиента. Если за указанное время листенер не успевает обработать запрос клиента на подключение, он выдает ошибку ORA-12535 и обрывает соединение с клиентом.
Это аналог параметра CONNECT_TIMEOUT, который является устаревшим в версии 9i.
Еще можно попробовать в файле sqlnet.ora добавить параметр:
SQLNET.INBOUND_CONNECT_TIMEOUT = 700
2) Параметр QUEUESIZEС помощью этого параметра можно установить кол-во запросов, которые может обрабатывать листенер одновременно.
LISTENER_TESTDB =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(Host = myhost)(Port = 2483)(QUEUESIZE=200))
)
SID_LIST_LISTENER_TESTDB =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = TESTDB)
(ORACLE_HOME = /testdb/u01/app/oracle/product/9.2.0.8.0)
(SID_NAME = TESTDB)
)
)
3) Поднятие дополнительных листенеров
Установка параметров INBOUND_CONNECT_TIMEOUT и QUEUESIZE не всегда может помочь, так как листенер работает с той же скоростью, а те запросы, которые ожидают подключения к базе, всего-навсего будут дольше удерживаться, пока не рассосется очередь. Но если очередь все-таки не рассосется за указанное время, клиенты опять упадут с ошибкой ORA-12535.
Чтоб решить проблему с производительностью раз и навсегда можно поднять дополнительные листенеры и настроить tns на стороне клиента, чтоб, если не отвечает первый листенер, запрос шел на второй, третий листенеры.
И так, на сервере поднимаем 2 листенера на 2 разных портах 2483 и 2484.
В файле listener.ora прописываем:
LISTENER_TESTDB =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(Host = myhost)(Port = 2484))
)
SID_LIST_LISTENER_TESTDB =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = TESTDB)
(ORACLE_HOME = /testdb/u01/app/oracle/product/9.2.0.8.0)
(SID_NAME = TESTDB)
)
)
LISTENER_TESTDB2 =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(Host = myhost)(Port = 2483))
)
SID_LIST_LISTENER_TESTDB2 =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = TESTDB2)
(ORACLE_HOME = /testdb/u01/app/oracle/product/9.2.0.8.0)
(SID_NAME = TESTDB2)
)
)
Запускаем листенеры:
oracle@myhost $ lsnrctl
LSNRCTL> start LISTENER_TESTDB
LSNRCTL> start LISTENER_TESTDB2
Теперь нужно на стороне клиента прописать в tnsnames.ora:
TESTDB =
(DESCRIPTION =
(ADDRESS_LIST =
(FAILOVER = ON)
(LOAD_BALANCE = ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = myhost)(PORT = 2483))
(ADDRESS = (PROTOCOL = TCP)(HOST = myhost)(PORT = 2484))
)
(CONNECT_DATA = (SERVICE_NAME = TESTDB))
)
Теперь можно в $ORACLE_HOME/network/log посмотреть размеры лог файлов каждого листенера. Они будут расти по мере обработки запросов на подключение.
oracle@myhost $ ls -la *.log
These days everything goes to the cloud or it has been collocated somewhere in a shared infrastructure. In this post I’ll talk about sessions being disconnected from your databases, firewalls and dead connection detection.
Changes
We moved number of 11g databases from one data centre to another.
Symptoms
Now probably many of you have seen the following error in your database alertlog «TNS-12535: TNS:operation timed out» or if you haven’t you will definitely see it some day.
Consider the following error from database alert log:
Fatal NI connect error 12170.
VERSION INFORMATION:
TNS for Linux: Version 11.2.0.3.0 - Production
Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
Time: 12-MAR-2015 10:28:08
Tracing not turned on.
Tns error struct:
ns main err code: 12535
TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505
TNS-00505: Operation timed out
nt secondary err code: 110
nt OS err code: 0
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=49831))
Thu Mar 12 10:28:09 2015
Now this error indicate timing issues between the server and the client. It’s important to mention that those errors are RESULTANT, they are informational and not the actual cause of the disconnect. Although this error might happen for number of reasons it is commonly associated with firewalls or slow networks.
Troubleshooting
The best way to understand what’s happening is to build a histogram of the duration of the sessions. In particular we want to understand whether disconnects are sporadic and random or they follow a specific pattern.
To do so you need to parse the listener log and locate the following line from the above example:
(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=49831))
Since the port is random you might not get same record or if you do it might be days apart.
Here’s what I found in the listener:
12-MAR-2015 08:16:52 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=ORCL)(CID=(PROGRAM=app)(HOST=apps01)(USER=scott))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=49831)) * establish * ORCL * 0
In other words — at 8:16 the user scott established connection from host 192.168.0.10.
Now if you compare both records you’ll get the duration of the session:
Established: 12-MAR-2015 08:16:52
Disconnected: Thu Mar 12 10:28:09 2015
Here are couple of other examples:
alertlog:
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=20620))
Thu Mar 12 10:31:20 2015
listener.log:
12-MAR-2015 08:20:04 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=ORCL)(CID=(PROGRAM=app)(HOST=apps01)(USER=scott))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=20620)) * establish * ORCL * 0
alertlog:
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=48157))
Thu Mar 12 10:37:51 2015
listener.log:
12-MAR-2015 08:26:36 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=ORCL)(CID=(PROGRAM=app)(HOST=apps01)(USER=scott))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=48157)) * establish * ORCL * 0
alertlog:
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.11)(PORT=42618))
Tue Mar 10 19:09:09 2015
listener.log
10-MAR-2015 16:57:54 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=__jdbc__)(USER=root))(SERVICE_NAME=ORCL1)(SERVER=DEDICATED)) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.11)(PORT=42618)) * establish * ORCL1 * 0
As you may have noticed the errors follow very strict pattern — each one gets disconnect exactly 2hrs 11mins after it has been established.
Cause
Given the repetitive behaviour of the issue and that it happened for multiple databases and application servers we can conclude that’s definitely a firewall issue.
The firewall recognizes the TCP protocol and keeps a record of established connections and it also recognizes TCP connection closure packets (TCP FIN type packet). However sometimes the client may abruptly end communication without closing the end points properly by sending FIN packet in which case the firewall will not know that the end-points will no longer use the opened channel. To resolve this problem firewall imposes a BLACKOUT on those connections that stay idle for a predefined amount of time.
The only issues with BLACKOUT is that neither or the sides will be notified.
In our case the firewall will disconnect IDLE sessions after around 2hrs of inactivity.
Solution
The solution for database server is to use Dead Connection Detection (DCD) feature. DCD detects when a connection has terminated unexpectedly and flags the dead session so PMON can release the resources associated with it.
DCD sets a timer when a session is initiated and when the timer expires SQL*Net on the server sends a small 10 bytes probe packet to the client to make sure connection is still active. If the client has terminated unexpectedly the server will get an error and the connection will be closed and the associated resources will be released. If the connection is still active then the probe packet is discarded and the timer is reset.
To enable DCD you need to set SQLNET.EXPIRE_TIME in sqlnet.ora of you RDBMS home!
cat >> $ORACLE_HOME/network/admin/sqlnet.ora
SQLNET.EXPIRE_TIME=10
This will set the timer to 10 minutes. Remember that sessions need to reconnect for the change to take place it won’t work for existing connections.
Firewalls become smarter and they can now inspect packages even deeper. Make sure the following settings are also disabled:
- SQLNet fixup protocol
- Deep Packet Inspection (DPI)
- SQLNet packet inspection
- SQL Fixup
I had similar issue with Dataguard already, read more here:
Smart Firewalls
How to test Dead Connection Detection
You might want to test or make sure DCD really works. You’ve got multiple options here — Oracle SQL client trace, Oracle SQL Server Trace, Sniff the network with packet analyzer OR using strace to trace the server process. I used strace since I had access to the database server and it was non intrusive.
-
Establish a connection to the database through SQL*Net
-
Find the processes number for your session:
SQL> select SPID from v$process where ADDR in (select PADDR from v$session where username='SVE');
SPID
------------------------
62761
- Trace the process:
[oracle@dbsrv ~]$ strace -tt -f -p 62761
Process 62761 attached - interrupt to quit
11:36:58.158348 --- SIGALRM (Alarm clock) @ 0 (0) ---
11:36:58.158485 rt_sigprocmask(SIG_BLOCK, [], NULL, 8) = 0
....
11:46:58.240065 --- SIGALRM (Alarm clock) @ 0 (0) ---
11:46:58.240211 rt_sigprocmask(SIG_BLOCK, [], NULL, 8) = 0
...
11:46:58.331063 write(20, "n620", 10) = 10
...
What I did was to attach to the process, simulate some activity at 11:36 and then leave the session IDLE. Then 10 minutes later the server process sent an empty packet to the client to check if the connection is still alive.
Conclusion
Errors in alertlog disappeared after I enabled the DCD.
Make sure to enable DCD if you host your databases in a shared infrastructure or there are firewalls between your database and application servers.
References
How to Check if Dead Connection Detection (DCD) is Enabled in 9i ,10g and 11g (Doc ID 395505.1)
Alert Log Errors: 12170 TNS-12535/TNS-00505: Operation Timed Out (Doc ID 1628949.1)
Resolving Problems with Connection Idle Timeout With Firewall (Doc ID 257650.1)
Dead Connection Detection (DCD) Explained (Doc ID 151972.1)
ORA-12535 is the same as TNS-12535, they all indicate that the connection between the client and the listener has been timed out.
First, let’s see how we reproduce TNS-12535.
C:Usersed>tnsping compdb
TNS Ping Utility for 64-bit Windows: Version 12.1.0.1.0 - Production on 22-JUL-2014 19:03:21
Copyright (c) 1997, 2013, Oracle. All rights reserved.
Used parameter files:
C:oracleappclientedproduct12.1.0client_1networkadminsqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = primary01)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = compdb)))
TNS-12535: TNS:operation timed out
Causes
TNS-12535 means the connect identifier can be found in tnsnames.ora and the hostname is resolvable by DNS or /etc/hosts, but somehow network problems make the connection timed out.
There could be several causes of TNS-12535 that I listed below.
- The hostname in the connect descriptor of compdb points to a wrong IP address.
- The port specified in the connect descriptor of compdb points to a wrong one.
- The port specified in the connect descriptor of compdb is not opened on firewall of the database server.
- DNS or /etc/hosts points to a wrong IP address while resolving domain name.
- Any other anti-virus software on clients.
- The server has been shutdown, there’s no way to go.
Solutions
- Inspect the content of tnsnames.ora closely once again to make sure there’s no any typo or mistake.
- Check your DNS, /etc/hosts or C:WindowsSystem32driversetchosts file to make sure the IP mapping is correct. Additionally, make sure the format of IP mapping is correct.
- Open the port 1521.
- For iptables, you may refer to: Open Port 1521 on Linux 6 for Database Server.
- For firewalld, you may refer to: Open Port 1521 on Linux 7 for Database Server.
- For Windows platform, you may refer to: How to Open Port 1521 on Windows Server.
- Revert any changes of security policy on network appliances including IPS, IDS and firewall. I have ever met a case related to IPS.
- If the server is down, then recover server’s availability.
- Usually, TNS-12535 accompanies with ORA-3136 (or ORA-03136), you may consider to raise timeout period on the server.
For more connection troubleshooting, you may refer to Oracle 19c Net Services Administrator’s Guide: 15 Testing Connections.
Пользователи не могут подключиться к базе. Обычно при этом они получают ошибки: ORA-12547: TNS:lost contact или ORA-12637: Packet receive failed.
В sqlnet.log на сервере сообщения об ошибке ORA-12170: TNS:Connect timeout. Еще для версий 10g и выше, в alert.log могут быть сообщения WARNING: inbound connection timed out (ORA-3136).
***********************************************************************
Fatal NI connect error 12170.
VERSION INFORMATION:
TNS for 64-bit Windows: Version 10.2.0.4.0 - Production
Oracle Bequeath NT Protocol Adapter for 64-bit Windows: Version 10.2.0.4.0 - Production
Windows NT TCP/IP NT Protocol Adapter for 64-bit Windows: Version 10.2.0.4.0 - Production
Time: 13-FEB-2013 13:47:12
Tracing not turned on.
Tns error struct:
ns main err code: 12535
TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505
TNS-00505: Operation timed out
nt secondary err code: 60
nt OS err code: 0
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.77.116.47)(PORT=2971))
***********************************************************************
Для разных ОС, параметр ‘nt secondary err code‘ может быть разным
For the Solaris system: nt secondary err code: 145:
ETIMEDOUT 145 /* Connection timed out */
For the Linux operating system: nt secondary err code: 110
ETIMEDOUT 110 Connection timed out
For the HP-UX system: nt secondary err code: 238:
ETIMEDOUT 238 /* Connection timed out */
For AIX: nt secondary err code: 78:
ETIMEDOUT 78 /* Connection timed out */
For Windows based platforms: nt secondary err code: 60 (which translates to Winsock Error: 10060)
Выдержка из документации
ORA-12170: TNS:Connect timeout occurred
Cause: The server shut down because connection establishment or communication with a client failed to complete within the allotted time interval. This may be a result of network or system delays; or this may indicate that a malicious client is trying to cause a Denial of Service attack on the server.
Action: If the error occurred because of a slow network or system, reconfigure one or all of the parameters SQLNET.INBOUND_CONNECT_TIMEOUT, SQLNET.SEND_TIMEOUT, SQLNET.RECV_TIMEOUT in sqlnet.ora to larger values. If a malicious client is suspected, use the address in sqlnet.log to identify the source and restrict access. Note that logged addresses may not be reliable as they can be forged (e.g. in TCP/IP).
———————————————————————————
Смысл этой ошибки в том что соединение не может быть установлено в течение отведенного интервала времени. А вот причин по которым это происходит может быть великое множество.
Как видно, основная рекомендация — увеличить параметры SQLNET.INBOUND_CONNECT_TIMEOUT, SQLNET.SEND_TIMEOUT и SQLNET.RECV_TIMEOUT. Можно попробовать сделать это, но это может не помочь. Поэтому лучше попробовать разобраться в корне проблемы.
Несколько основных причин ошибки и способы их решения
1) Серверные ресурсы перегружены.
Проверить насколько загружен сервер (процессор, диски, сеть). Выявить причину утечки ресурсов и устранить её. Большая загрузка сети может косвенно указывать на DoS. Если вы обнаружили высокую нагрузку сервера, но она оказалась полезной — то это указывает на нехватку мощности сервера и пора задуматься об его обновлении или замене.
©Bobrovsky Dmitry
2) Проблемы сети.
©Bobrovsky Dmitry
Если есть возможность проверить сетевой канал от клиента до сервера с помощью специализированых аппаратных или программных средств. Если нет, то можно запуститесь ping на некоторое время и посмотреть, какое время отклика и нет ли потерь пакетов.
ping server1 -t
3) База данных и Listener не функционируют.
Проверить что сама база данных и Listener запущены и работают нормально, что к ним можно подключиться локально или с других компьютеров сети.
Dmitry Bobrovsky
4) Проблемы из-за антивируса или firewall.
Dmitry Bobrovsky
Отключить или даже полностью деинсталировать антивирус или firewall.
5) Проблемы с DNS.
Либо прописать соответствующие записи в файл host либо во всех конфигурационных файлах oracle net использовать вместо имен — ip-адреса.
Запись fatal ni connect error 12170 ns main err code 12535 впервые появилась Dmitry Bobrovsky Blog
These days everything goes to the cloud or it has been collocated somewhere in a shared infrastructure. In this post I’ll talk about sessions being disconnected from your databases, firewalls and dead connection detection.
Changes
We moved number of 11g databases from one data centre to another.
Symptoms
Now probably many of you have seen the following error in your database alertlog «TNS-12535: TNS:operation timed out» or if you haven’t you will definitely see it some day.
Consider the following error from database alert log:
Fatal NI connect error 12170.
VERSION INFORMATION:
TNS for Linux: Version 11.2.0.3.0 - Production
Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
Time: 12-MAR-2015 10:28:08
Tracing not turned on.
Tns error struct:
ns main err code: 12535
TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505
TNS-00505: Operation timed out
nt secondary err code: 110
nt OS err code: 0
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=49831))
Thu Mar 12 10:28:09 2015
Now this error indicate timing issues between the server and the client. It’s important to mention that those errors are RESULTANT, they are informational and not the actual cause of the disconnect. Although this error might happen for number of reasons it is commonly associated with firewalls or slow networks.
Troubleshooting
The best way to understand what’s happening is to build a histogram of the duration of the sessions. In particular we want to understand whether disconnects are sporadic and random or they follow a specific pattern.
To do so you need to parse the listener log and locate the following line from the above example:
(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=49831))
Since the port is random you might not get same record or if you do it might be days apart.
Here’s what I found in the listener:
12-MAR-2015 08:16:52 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=ORCL)(CID=(PROGRAM=app)(HOST=apps01)(USER=scott))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=49831)) * establish * ORCL * 0
In other words — at 8:16 the user scott established connection from host 192.168.0.10.
Now if you compare both records you’ll get the duration of the session:
Established: 12-MAR-2015 08:16:52
Disconnected: Thu Mar 12 10:28:09 2015
Here are couple of other examples:
alertlog:
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=20620))
Thu Mar 12 10:31:20 2015
listener.log:
12-MAR-2015 08:20:04 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=ORCL)(CID=(PROGRAM=app)(HOST=apps01)(USER=scott))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=20620)) * establish * ORCL * 0
alertlog:
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=48157))
Thu Mar 12 10:37:51 2015
listener.log:
12-MAR-2015 08:26:36 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=ORCL)(CID=(PROGRAM=app)(HOST=apps01)(USER=scott))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.10)(PORT=48157)) * establish * ORCL * 0
alertlog:
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.11)(PORT=42618))
Tue Mar 10 19:09:09 2015
listener.log
10-MAR-2015 16:57:54 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=__jdbc__)(USER=root))(SERVICE_NAME=ORCL1)(SERVER=DEDICATED)) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.11)(PORT=42618)) * establish * ORCL1 * 0
As you may have noticed the errors follow very strict pattern — each one gets disconnect exactly 2hrs 11mins after it has been established.
Cause
Given the repetitive behaviour of the issue and that it happened for multiple databases and application servers we can conclude that’s definitely a firewall issue.
The firewall recognizes the TCP protocol and keeps a record of established connections and it also recognizes TCP connection closure packets (TCP FIN type packet). However sometimes the client may abruptly end communication without closing the end points properly by sending FIN packet in which case the firewall will not know that the end-points will no longer use the opened channel. To resolve this problem firewall imposes a BLACKOUT on those connections that stay idle for a predefined amount of time.
The only issues with BLACKOUT is that neither or the sides will be notified.
In our case the firewall will disconnect IDLE sessions after around 2hrs of inactivity.
Solution
The solution for database server is to use Dead Connection Detection (DCD) feature. DCD detects when a connection has terminated unexpectedly and flags the dead session so PMON can release the resources associated with it.
DCD sets a timer when a session is initiated and when the timer expires SQL*Net on the server sends a small 10 bytes probe packet to the client to make sure connection is still active. If the client has terminated unexpectedly the server will get an error and the connection will be closed and the associated resources will be released. If the connection is still active then the probe packet is discarded and the timer is reset.
To enable DCD you need to set SQLNET.EXPIRE_TIME in sqlnet.ora of you RDBMS home!
cat >> $ORACLE_HOME/network/admin/sqlnet.ora
SQLNET.EXPIRE_TIME=10
This will set the timer to 10 minutes. Remember that sessions need to reconnect for the change to take place it won’t work for existing connections.
Firewalls become smarter and they can now inspect packages even deeper. Make sure the following settings are also disabled:
- SQLNet fixup protocol
- Deep Packet Inspection (DPI)
- SQLNet packet inspection
- SQL Fixup
I had similar issue with Dataguard already, read more here:
Smart Firewalls
How to test Dead Connection Detection
You might want to test or make sure DCD really works. You’ve got multiple options here — Oracle SQL client trace, Oracle SQL Server Trace, Sniff the network with packet analyzer OR using strace to trace the server process. I used strace since I had access to the database server and it was non intrusive.
-
Establish a connection to the database through SQL*Net
-
Find the processes number for your session:
SQL> select SPID from v$process where ADDR in (select PADDR from v$session where username='SVE');
SPID
------------------------
62761
- Trace the process:
[oracle@dbsrv ~]$ strace -tt -f -p 62761
Process 62761 attached - interrupt to quit
11:36:58.158348 --- SIGALRM (Alarm clock) @ 0 (0) ---
11:36:58.158485 rt_sigprocmask(SIG_BLOCK, [], NULL, 8) = 0
....
11:46:58.240065 --- SIGALRM (Alarm clock) @ 0 (0) ---
11:46:58.240211 rt_sigprocmask(SIG_BLOCK, [], NULL, 8) = 0
...
11:46:58.331063 write(20, "n620", 10) = 10
...
What I did was to attach to the process, simulate some activity at 11:36 and then leave the session IDLE. Then 10 minutes later the server process sent an empty packet to the client to check if the connection is still alive.
Conclusion
Errors in alertlog disappeared after I enabled the DCD.
Make sure to enable DCD if you host your databases in a shared infrastructure or there are firewalls between your database and application servers.
References
How to Check if Dead Connection Detection (DCD) is Enabled in 9i ,10g and 11g (Doc ID 395505.1)
Alert Log Errors: 12170 TNS-12535/TNS-00505: Operation Timed Out (Doc ID 1628949.1)
Resolving Problems with Connection Idle Timeout With Firewall (Doc ID 257650.1)
Dead Connection Detection (DCD) Explained (Doc ID 151972.1)
My goal is to connect to an Oracle 9i instance from my OS X machine. I’ve followed the setup instructions here and got through them with no errors (eventually). However, I’m finding that sqlplus is unable to connect:
[ ethan@gir ~ ]$ sqlplus xxx/yyy@zzz
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Apr 17 10:13:08 2009
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Looooong wait…
ERROR:
ORA-12170: TNS:Connect timeout occurred
Enter user-name: xxx
Enter password:
ERROR:
ORA-12162: TNS:net service name is incorrectly specified
Enter user-name:
My tnsnames.ora
file…
zzz =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS =
(PROTOCOL = TCP)
(HOST = dbhost)
(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = zzz)
)
)
Maybe there’s an env variable that needs to be set?
UPDATE
Able to ping DB host machine no problem.
Tried…
sqlplus xxx/yyy@//dbhost/zzz
Got…
ERROR:
ORA-12170: TNS:Connect timeout occurred
Tried using SID
instead of SERVICE_NAME
in tnsnames.ora. Did not seem to change the result. Reverted back to SERVICE_NAME
.
Last couple entries in sqlnet.log…
***********************************************************************
Fatal NI connect error 12170.
VERSION INFORMATION:
TNS for MacOS X Server: Version 10.2.0.4.0 - Production
TCP/IP NT Protocol Adapter for MacOS X Server: Version 10.2.0.4.0 - Production
Time: 17-APR-2009 10:33:06
Tracing not turned on.
Tns error struct:
ns main err code: 12535
TNS-12535: Message 12535 not found; No message file for product=network, facility=TNS
ns secondary err code: 12560
nt main err code: 505
TNS-00505: Message 505 not found; No message file for product=network, facility=TNS
nt secondary err code: 60
nt OS err code: 0
***********************************************************************
Fatal NI connect error 12170.
VERSION INFORMATION:
TNS for MacOS X Server: Version 10.2.0.4.0 - Production
TCP/IP NT Protocol Adapter for MacOS X Server: Version 10.2.0.4.0 - Production
Time: 17-APR-2009 11:24:08
Tracing not turned on.
Tns error struct:
ns main err code: 12535
TNS-12535: Message 12535 not found; No message file for product=network, facility=TNS
ns secondary err code: 12560
nt main err code: 505
TNS-00505: Message 505 not found; No message file for product=network, facility=TNS
nt secondary err code: 60
nt OS err code: 0
PARTIAL ANSWER
Thanks everyone for your answers. They were helpful. I found that there was a DNS issue. I was able to ping by hostname, so thought that should work fine. I also tried I.P. address. Turned out that I needed the internal «10.1.x.x» I.P. address for it to work on this OS X machine (but hostname is fine on Windows).
At this point, I can connect with…
sqlplus xxx/yyy@//INTERNAL_IP/zzz
However, with those values entered into tnsnames.ora, this still doesn’t work…
sqlplus xxx/yyy@zzz
…
ORA-12154: TNS:could not resolve the connect identifier specified
I searched for a sample tnsnames.ora file that was close to what I needed and copied the contents into my file. Changed the params and now everything works. Not sure why mine wasn’t working.