SQL Server 中主键和唯一聚集索引的性能差异

SQL Server 中主键和唯一聚集索引的性能差异

问题描述:

我们有 2 个表,大约有 4000 万行.数据库的大小约为 20GB,大部分是针对这 2 个表的.每天,我们需要删除一些数据,即大约 10M 行.因此,我们使用批量删除将日志文件保持在一定大小.

We have 2 tables that have about 40M rows. the size of database is about 20GB, most are for these 2 tables. Everyday, We need delete some data, i.e. about 10M rows. So, we are using batch delete to keep the log file within certain size.

本来,表没有主键.但是每个表都有唯一的聚集索引.删除需要永远.即在虚拟机上删除 500K 行大约需要 2-3 小时.* 删除前,重建索引.

originally, there is no primary key for the table. But has unique, clustered index for each table. the delete takes for ever. i.e. it takes about 2-3 hours to delete 500K rows on a virtual machine. * before delete, the index was rebuilt.

现在,我们将唯一的聚集索引转换为主键.删除 2M 行大约需要 20-30 分钟.

now, we converted the unique, clustered index to primary key. it takes about 20-30 minutes to delete 2M rows.

我知道主键和唯一聚集索引之间存在差异,但为什么性能如此不同?

I understand there is difference between primary key and unique clustered index, but why the performance is so different?

有人有一些见解吗?

谢谢

Rolling my 8-Ball:如果您声明了一个 非聚集 主键(正如您的帖子中所暗示的那样)然后在每个批次中,您很可能会达到指数临界点.因此,每个批次将执行 40M 行的完整扫描以删除批次大小.然后,在下一批中,再次进行全面扫描.依此类推,直到您的 10M 将被删除.使用聚集键,批次应该只扫描被删除的实际行(当然我假设你的批量删除标准实际上会使用聚集键......).如您所见,当人们开始猜测...

Rolling my 8-Ball: if you declared a non-clustered primary key (as it seems to suggest from your post) then on each batch you would very likely hit the index tipping point. Thus each batch would do a full scan of 40M rows to delete the batch size. Then, on the next batch, again a full scan. And so on until your 10M would be deleted. With a clustered key the batches should scan only the actual rows being deleted (of course I assume your batch delete criteria would actually use the clustered key...). As you see, there are many unknowns when one starts guessing...

但最终...您有一个性能问题,您应该使用性能故障排除技术进行调查.捕获执行计划,等待统计数据统计数据.遵循等待和队列之类的方法.措施.不要听从互联网上某人的猜测刚刚推出了一个8-Ball...

But ultimately... you have a performance question and you should investigate it using the performance troubleshooting techniques. Capture the execution plans, the wait stats, the statistics io. Follow a methodology like Waits and Queues. Measure. Don't listen to guesses from someone on the internet that just rolled an 8-Ball...