我有一个批量脚本,共计4条sql,每天执行一次。
症状:有30%的发生死锁,且99%的几率被锁在第二条sql中,一直等待,最长超过40个小时,人工force application以后重跑正常执行,请高手帮忙诊断是什么原因,不胜感激。
正常情况下该脚本的执行时间如下:
2011-09-17 186秒
2011-09-18 172秒
2011-09-19 189秒
2011-09-20 182秒
经常被锁的sql:
我有一个批量脚本,共计4条sql,每天执行一次。
症状:有30%的发生死锁,且99%的几率被锁在第二条sql中,一直等待,最长超过40个小时,人工force application以后重跑正常执行,请高手帮忙诊断是什么原因,不胜感激。
正常情况下该脚本的执行时间如下:
2011-09-17 186秒
2011-09-18 172秒
2011-09-19 189秒
2011-09-20 182秒
经常被锁的sql:
insert into gzcrm.kpi_grid_d_102_usys_values_ds
select '2011-09-21' as op_time,
coalesce(c.reg_city_id, 0) as cityid,
coalesce(c.reg_county_id, 0) as countyid,
coalesce(c.reg_region_id, '0') as reg_region_id,
coalesce(c.reg_grid_id, c.reg_region_id ) as reg_grid_id,
coalesce(d.bass1_value, 0),
410012 as kpi_id,
count(a.user_id)
from (select user_id
from gzcrm.brk_d_psn_td_ds
where test_mark <> 1
and op_time = '2011-09-21') a
left join gzcrm.brk_d_psn_base_ds b on a.user_id = b.user_id
inner join gzcrm.brk_d_psn_asc_dt c on a.user_id = c.user_id
left join (select dw_value, bass1_value
from gzcrm.map_pub_brand
group by dw_value, bass1_value) d on b.brand_id =
d.dw_value
group by coalesce(c.reg_city_id, 0),
coalesce(c.reg_county_id, 0),
coalesce(c.reg_region_id, '0'),
coalesce(c.reg_grid_id, c.reg_region_id ),
coalesce(d.bass1_value, 0) with ur
被锁的表brk_d_psn_td_ds,brk_d_psn_base_ds,brk_d_psn_asc_dt 都建对字段user_id建立了索引。
限于字数的原因,用db2 get snapshot for locks on dbname
获得锁的信息描述在图片中
被锁的表brk_d_psn_td_ds,brk_d_psn_base_ds,brk_d_psn_asc_dt 都建对字段user_id建立了索引。
用db2 get snapshot for locks on dbname查看到的锁信息如下:
Application handle = 39427
Application ID = 135.10.44.138.43758.11092201183
Sequence number = 00004
Application name = tclsh
CONNECT Authorization ID = GZCRM
Application status = Pending remote request
Status change time = Not Collected
Application code page = 1386
Locks held = 5
Total wait time (ms) = 0
List Of Locks
Database partition = 0
Lock Name = 0x0000D2C5000000010001F40056
Lock Attributes = 0x00000000
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 0
Object Type = Internal Variation Lock
Mode = S
Database partition = 0
Lock Name = 0x535953534832303028EFECDC41
Lock Attributes = 0x00000000
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 0
Object Type = Internal Plan Lock
Mode = S
Database partition = 0
Lock Name = 0x000A02BE000000000000000054
Lock Attributes = 0x00000000
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 702
Object Type = Table
Tablespace Name = TBS_DATA_01
Table Schema = GZCRM
Table Name = BRK_D_PSN_BASE_DS
Mode = IN
Database partition = 0
Lock Name = 0x000A040F000000000000000054
Lock Attributes = 0x00000000
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 1039
Object Type = Table
Tablespace Name = TBS_DATA_01
Table Schema = GZCRM
Table Name = BRK_D_PSN_ASC_DT
Mode = IN
Database partition = 0
Lock Name = 0x000A020D000000000000000054
Lock Attributes = 0x00000000
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 525
Object Type = Table
Tablespace Name = TBS_DATA_01
Table Schema = GZCRM
Table Name = BRK_D_PSN_TD_DS
Mode = IN
貌似解决了,给brk_d_psn_base_ds和brk_d_psn_asc_dt对user_id单独建立了索引
一连7天了都没有出现问题,继续监控中...
貌似PARTITIONING KEY和索引还不是一回事。
原来是的建表语句这样的
CREATE TABLE BRK_D_PSN_BASE_DS
(
OP_TIME DATE,
USER_ID VARCHAR(12),
CUST_ID VARCHAR(12)
....
)
IN TBS_DATA_01
INDEX IN TBS_INDX_01
PARTITIONING KEY(USER_ID) USING HASHING