学习:DB Hang AWR High event 'block change tracking buffer space'

1.1现象

学习一个朋友遇到的一个DB hang的情况,创建问题时段AWR报告后,发现非常高的block change tracking buffer space 等待事件;

1.2问题分析

High Waits On 'block change tracking buffer space' - Checkpoint Contention With BLOCK CHANGE TRACKING or RMAN Incremental Backup (Doc ID 2094946.1

1)情况一,符合如下预期的情况下,建议就是加大块改变跟踪buffer的大小

1) Checkpoint gets stuck with the CKPT process waiting for CTWR. CKPT waits for "rdbms ipc reply" from CTWR (see p1 value)
Other possible symptoms:
Checkpoint not completing.
"block change tracking buffer space" wait events seen in the instance where CKPT is stuck.
CTWR is mostly showing normal I/O wait events.
GV$ views hang waiting for 'DFS lock handle' with id1=0xa.
"reliable message" waits on the "obj broadcast channel"

Cause
Using block change tracking and the number of blocks changed has increased.
Changes
CTWR is not keeping up with the number of buffers changed.

Solution
There are several workarounds to improve the throughput of CTWR. Increasing the 'buffer space' should be tried first.
Namely increasing parameters:
_bct_public_dba_buffer_size
_bct_buffer_allocation_max

# In one case we set these as high
as _bct_public_dba_buffer_size=100000000
and _bct_buffer_allocation_max=1G

Capture the output from the following sql:
select event, state, count(*) from v$session_wait group by event, state order by 3 desc;
Check if the event column contains something like “rdbms ipc reply” which has a STATE like “WAITING”. If so, check the BCT status:
select status FROM v$block_change_tracking;
You might encounter the following wait event “rdbms ipc reply”.

a. To avoid the "block change tracking buffer space" wait event, we suggest to set the following hidden parameters, as shown here.

_bct_public_dba_buffer_size = total size of all public change tracking dba buffers, in bytes
_bct_buffer_allocation_max = maximum size of all change tracking buffer allocations, in bytes


b. To determine the block change tracking memory buffer, issue the following query.

SQL> select dba_buffer_count_public*dba_entry_count_public*dba_entry_size from X$KRCSTAT;


c. We recommend to set the following hidden parameters, as shown here.

set _bct_public_dba_buffer_size with 2 times the value in item# a
set _bct_buffer_allocation_max with four times the value in item#a above.

SQL> Alter system set "_bct_buffer_allocation_max"=1073741824 scope=both sid='*';
System altered.
To check the value after the above command, execute:


set lines 132
column Parameter format a60
column "Session Value" format a30
column "Instance Value" format a30

select a.ksppinm "Parameter",
b.ksppstvl "Session Value",
c.ksppstvl "Instance Value"
from x$ksppi a, x$ksppcv b, x$ksppsv c
where a.indx = b.indx and a.indx = c.indx and a.indx != 0
and a.ksppinm like '%&name%';

To impromve CTWR performance,
to improve CTWR performance.
If you increase these parameters, make sure you have large_pool_size configured large enough to accommodate these entries.

The following query will indicate the amount of memory that is currently allocated to the change tracking buffers (_bct_public_dba_buffer_size):
select dba_buffer_count_public*dba_entry_count_public*dba_entry_size from X$KRCSTAT;


建议就是设置三个参数:1.large_pool_size 要足够大,change tracking buffers是从large_pool分配的空间;
2.# In one case we set these as high
as _bct_public_dba_buffer_size=100000000
3.and _bct_buffer_allocation_max=1G
NOTE:  1G is the maximum value which can be allocated to "_bct_public_dba_buffer_size".

2)情况二,RMAN增量备份时或IO性能不好时出现大量event

2) RMAN Incremental Backup High Waits On "block change tracking buffer space"
When executing RMAN incremental backup we can see high waits on event "block change tracking buffer space", which are affecting database performance.
Changes
Block Change Tracking enabled
Cause
This wait event indicates that there is contention for space in the CTWR dba buffer.
If contention for space in the CTWR dba buffer occurs, the wait event called, 'block change tracking buffer space' is recorded.
The two possible causes for this wait event are:
1. Poor I/O performance on the disk where the change-tracking file resides.
2. The CTWR dba buffer is too small to record the number of concurrent block changes.
If this happens too often the performance of Backups and/or the entire database can suffer.

 
Solution
1. Review the location of the change-tracking file to ensure that it's not located on disks with other heavily used or "HOT" files. For example,
do not locate the change-tracking file on the same disk with your redo or archive files.
2. Consider increasing the value of Large_pool_size 3. Increase the the hidden parameter "_bct_public_dba_buffer_size" to a larger value. Capture the output from the following sql: select event, state, count(*) from v$session_wait group by event, state order by 3 desc; Check if the event column contains something like “Block Change Tracking Buffer” which have a STATE like “WAITING”. If so, check the BCT status: select status FROM v$block_change_tracking; If the value is ‘enabled’ you might encounter the following wait event “Block Change Tracking Buffer”. a. Review the location of the change-tracking file to ensure that it's not located on disks with other heavily used or "HOT" files. For example,
do not locate the change-tracking file on the same disk with your redo or archive files.
b. Consider increasing the value of Large_pool_size c. Increase the the hidden parameter "_bct_public_dba_buffer_size" to a larger value. The following query will indicate the amount of memory that is currently allocated to the change tracking buffers: select dba_buffer_count_public*dba_entry_count_public*dba_entry_size from X$KRCSTAT; Take that value, multiply it by 2, and use the new value for the parameter "_bct_public_dba_buffer_size". This should reduce the amount
of waits on the change tracking buffer event. NOTE: 1G is the maximum value which can be allocated to "_bct_public_dba_buffer_size". Reference BUG:11888729 - BLOCK CHANGE TRACKING BUFFER WAIT EVENT - HIGH CTWR DBA BUFFER UTILIZATION

1.3问题分析

注意:DBA朋友遇到的就是在业务高峰期,备份软件执行增量备份,消耗大量的IO带宽,后续导致IO性能效率严重下降

  出现大量log file parallel write 写入慢=>引发大量log file sync等待事件=>引发大量业务IO效率 buffer busy waits 等;  

                                                              =>引发大量block change tracking buffer space,加剧了数据库资源的争用,DB性能受到比较严重的影响。

解决方法:1.尽量在夜间执行增量备份操作;

                   2.有条件配置DG,在备库执行备份操作;

       注意DG备份,如果可以脚本执行,DG备份的控制文件恢复时使用如下方式

                       RMAN> restore primary controlfile from '/data/rman/ctl_bak_20190527.bak';

 --不支持语法BACKUP CURRENT CONTROLFILE FOR PRIMARY,因此DG备份的控制文件是属于standby 角色,作为恢复需要转换为主库角色,否则咋恢复DB让业务使用!

参考http://blog.itpub.net/30126024/viewspace-2128413/

                  3.上述建议均无法执行,增加内存区域,减少备份导致的等待时机增加,最终影响数据库性能。