SQL Server 中主键和唯一聚集索引的性能差异
我们有 2 个表,大约有 4000 万行.数据库的大小约为 20GB,大部分是针对这 2 个表的.每天,我们需要删除一些数据,即大约 10M 行.因此,我们使用批量删除将日志文件保持在一定大小.
We have 2 tables that have about 40M rows. the size of database is about 20GB, most are for these 2 tables. Everyday, We need delete some data, i.e. about 10M rows. So, we are using batch delete to keep the log file within certain size.
本来,表没有主键.但是每个表都有唯一的聚集索引.删除需要永远.即在虚拟机上删除 500K 行大约需要 2-3 小时.* 删除前,重建索引.
originally, there is no primary key for the table. But has unique, clustered index for each table. the delete takes for ever. i.e. it takes about 2-3 hours to delete 500K rows on a virtual machine. * before delete, the index was rebuilt.
现在,我们将唯一的聚集索引转换为主键.删除 2M 行大约需要 20-30 分钟.
now, we converted the unique, clustered index to primary key. it takes about 20-30 minutes to delete 2M rows.
我知道主键和唯一聚集索引之间存在差异,但为什么性能如此不同?
I understand there is difference between primary key and unique clustered index, but why the performance is so different?
有人有一些见解吗?
谢谢
Rolling my 8-Ball:如果您声明了一个 非聚集 主键(正如您的帖子中所暗示的那样)然后在每个批次中,您很可能会达到指数临界点.因此,每个批次将执行 40M 行的完整扫描以删除批次大小.然后,在下一批中,再次进行全面扫描.依此类推,直到您的 10M 将被删除.使用聚集键,批次应该只扫描被删除的实际行(当然我假设你的批量删除标准实际上会使用聚集键......).如您所见,当人们开始猜测...
Rolling my 8-Ball: if you declared a non-clustered primary key (as it seems to suggest from your post) then on each batch you would very likely hit the index tipping point. Thus each batch would do a full scan of 40M rows to delete the batch size. Then, on the next batch, again a full scan. And so on until your 10M would be deleted. With a clustered key the batches should scan only the actual rows being deleted (of course I assume your batch delete criteria would actually use the clustered key...). As you see, there are many unknowns when one starts guessing...
但最终...您有一个性能问题,您应该使用性能故障排除技术进行调查.捕获执行计划,等待统计数据,统计数据.遵循等待和队列之类的方法.措施.不要听从互联网上某人的猜测刚刚推出了一个8-Ball...
But ultimately... you have a performance question and you should investigate it using the performance troubleshooting techniques. Capture the execution plans, the wait stats, the statistics io. Follow a methodology like Waits and Queues. Measure. Don't listen to guesses from someone on the internet that just rolled an 8-Ball...