Postgres performance problems

Problem description:

We are running Postgres 9.1.3, and we have recently started to run into major performance problems on one of our servers.

Our queries ran fine for a while, but as of August 1st they have slowed down dramatically. Most of the problematic queries appear to be SELECT queries (queries with count(*) are especially bad), but in general the database is just running very slowly.

We ran this query on the server, and these are the changes we have made to the default config file (note: the server ran fine with these changes before, so they likely don't matter much):

       name            |                                                current_setting
---------------------------+---------------------------------------------------------------------------------------------------------------
version                   | PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by  gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit
autovacuum                | off
bgwriter_delay            | 20ms
checkpoint_segments       | 6
checkpoint_warning        | 0
client_encoding           | UTF8
default_statistics_target | 1000
effective_cache_size      | 4778MB
effective_io_concurrency  | 2
fsync                     | off
full_page_writes          | off
lc_collate                | en_US.UTF-8
lc_ctype                  | en_US.UTF-8
listen_addresses          | *
maintenance_work_mem      | 1GB
max_connections           | 100
max_stack_depth           | 2MB
port                      | 5432
random_page_cost          | 2
server_encoding           | UTF8
shared_buffers            | 1792MB
synchronous_commit        | off
temp_buffers              | 16MB
TimeZone                  | US/Eastern
wal_buffers               | 16MB
wal_level                 | minimal
wal_writer_delay          | 10ms
work_mem                  | 16MB
(28 rows)

Time: 210.231 ms

Normally, when problems like this arise, the first thing people recommend is vacuuming, and we have tried that. We ran VACUUM ANALYZE on most of the database, but it didn't help.

We used EXPLAIN on some of our queries and noticed that Postgres was resorting to sequential scans even though the tables had indexes.

We turned sequential scans off to force the query planner to use indexes, but that did not help either.

We then tried out this query to see whether Postgres was wading through a lot of unused disk space to find what it was looking for. Unfortunately, while some of our tables did have a bit of bulk, it did not seem significant enough to slow down overall system performance.
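A rougher check than the query mentioned above is to compare dead and live row counts per table. The sketch below is only an illustration: it assumes the rows come from the `pg_stat_user_tables` statistics view, and the helper names and the 20% threshold are made up for the example, not a recommended cutoff.

```python
# Rough dead-row check. The SQL reads the standard statistics view
# pg_stat_user_tables; the helpers and threshold are hypothetical.

DEAD_TUPLE_SQL = """
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
"""


def dead_fraction(n_live, n_dead):
    """Fraction of a table's tracked rows that are dead (0.0 for an empty table)."""
    total = n_live + n_dead
    return n_dead / total if total else 0.0


def bloated_tables(rows, threshold=0.2):
    """Return (relname, dead fraction) for tables at or above the threshold.

    rows: iterable of (relname, n_live_tup, n_dead_tup), e.g. the result
    of running DEAD_TUPLE_SQL through any DB-API cursor.
    """
    flagged = []
    for relname, n_live, n_dead in rows:
        fraction = dead_fraction(n_live, n_dead)
        if fraction >= threshold:
            flagged.append((relname, fraction))
    return flagged
```

These counters are estimates kept by the statistics collector, so they only indicate where to look more closely.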

We think the slowdown might be I/O related, but we can't figure out the specifics. Is Postgres just being silly, and if so, which part of it? Is there something wrong with the VM, or perhaps with the physical hardware itself?

Do you guys have any other suggestions for things that we can try or check out?

Edit:

I am so sorry for not updating this sooner. I got caught up in other things.

On this particular machine, our performance greatly improved after we made one small modification to the virtual machine's settings.

There is a setting that deals with I/O caching. It was originally set to ON. We figured that constantly caching things was slowing things down, and we were right. We turned it OFF, and things improved drastically.

Interestingly enough, most of our other servers already had this setting turned off.

There are other issues, and I am sure we will take a lot of your suggestions, so thanks a lot for helping.

It's difficult to be sure, but I think you are right to be suspicious of I/O issues. What can happen is that as tables get larger or the number of connections increases, cache hits start to fall. That increases I/O demand and slows everything down. Meanwhile, more queries arrive, making the problem worse. The situation is complicated for you because virtual disks don't necessarily behave the same as physical ones.
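One way to see whether cache hits are falling is to track the buffer hit ratio over time. This is a minimal sketch, assuming a DB-API connection (for example one from psycopg2) and the `blks_hit` and `blks_read` counters in `pg_stat_database`; the function names are made up for illustration.

```python
# Buffer cache hit ratio from pg_stat_database. The helper names are
# hypothetical; `conn` is assumed to be any DB-API connection.

HIT_RATIO_SQL = """
SELECT blks_hit, blks_read
FROM pg_stat_database
WHERE datname = current_database()
"""


def cache_hit_ratio(blks_hit, blks_read):
    """Share of block requests served from shared_buffers (0.0 if no activity)."""
    total = blks_hit + blks_read
    return blks_hit / total if total else 0.0


def fetch_hit_ratio(conn):
    """Run HIT_RATIO_SQL on the given connection and return the ratio."""
    cur = conn.cursor()
    try:
        cur.execute(HIT_RATIO_SQL)
        blks_hit, blks_read = cur.fetchone()
        return cache_hit_ratio(blks_hit, blks_read)
    finally:
        cur.close()
```

Bear in mind that `blks_read` counts blocks requested from the operating system, some of which may still come from the OS page cache rather than the disk, so treat the ratio as a trend indicator and not an exact measure of physical I/O.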

Firstly, you will need to measure actual activity on the VM (through vmstat or iostat, perhaps). Secondly, do the same on the real hardware. Finally, run some standard disk bandwidth tools on both (in particular random read/write mixes). Now you'll be able to say how much of your available I/O is being used.
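As a rough sketch of the first two steps, something like the following could collect vmstat samples on the VM and on the host so the two can be compared. It assumes the usual Linux vmstat column names (`bi`, `bo`, `wa`); the function names and the default interval and count are made up for illustration.

```python
import subprocess


def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # The column header line is the one naming the io and cpu fields.
        if "bi" in tokens and "wa" in tokens:
            header = tokens
            continue
        # Data lines are all integers and match the header width.
        if header and len(tokens) == len(header) and all(t.isdigit() for t in tokens):
            rows.append(dict(zip(header, map(int, tokens))))
    return rows


def summarize_io(rows):
    """Average blocks in (bi), blocks out (bo) and CPU I/O wait % (wa)."""
    if not rows:
        return {}
    return {key: sum(r[key] for r in rows) / len(rows) for key in ("bi", "bo", "wa")}


def sample_vmstat(interval=5, count=12):
    """Run vmstat and return parsed samples, skipping the first row.

    The first row vmstat prints is an average since boot, not a live sample.
    """
    result = subprocess.run(
        ["vmstat", str(interval), str(count)],
        capture_output=True, text=True, check=True,
    )
    return parse_vmstat(result.stdout)[1:]
```

Running `summarize_io(sample_vmstat())` during a slow period on both the VM and the physical host gives comparable numbers; a consistently high `wa` would support the I/O theory.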

As for query plans, without the schema details and EXPLAIN ANALYZE output, no one can say.
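If it helps with gathering that output, here is a minimal sketch for capturing a plan and picking out its sequential scans. It assumes a DB-API connection (for example from psycopg2) and a trusted query string; both helper names are hypothetical.

```python
# Capture EXPLAIN ANALYZE output through any DB-API connection.
# Note: EXPLAIN ANALYZE really executes the statement, so only use it
# on queries that are safe to run.


def explain_analyze(conn, query):
    """Return the plan for `query` as a list of text lines.

    `query` is concatenated into the statement, so it must come from a
    trusted source, never from user input.
    """
    cur = conn.cursor()
    try:
        cur.execute("EXPLAIN ANALYZE " + query)
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.close()


def seq_scan_lines(plan_lines):
    """Pick out the plan nodes that are sequential scans."""
    return [line.strip() for line in plan_lines if "Seq Scan on" in line]
```

Posting the full plan text together with the table and index definitions gives people something concrete to work from.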

You will find the postgresql.org mailing lists useful, even if just for the archives. Also, the book linked below is excellent.

http://www.packtpub.com/postgresql-90-high-performance/book