PostgreSQL9.1Warm and Hot Standby Using Point-In-Time Recovery (PITR)配备

PostgreSQL9.1Warm and Hot Standby Using Point-In-Time Recovery (PITR)配置

配了快一个星期吧。看的英文文档。不停的google。英文真是我的硬伤。

参考资料:http://wiki.postgresql.org/wiki/Streaming_Replication

http://michael.otacoo.com/postgresql-2/postgres-9-1-setup-a-synchronous-stand-by-server-in-5-minutes/

http://hi.baidu.com/magicgrass/blog/item/69fcdb1e297248e9e0fe0bef.html

http://brandonkonkle.com/blog/2010/oct/20/postgres-9-streaming-replication-and-django-balanc/

http://www.postgresql.org/docs/devel/static/#STANDBY-SERVER-OPERATION

 

 

---------------------------------------------------------------

我的主机做master/primary,ip:172.29.0.43.

slave/的ip是:172.29.0.44 ,端口都是5432.

现在我的主机上创建个角色用于Replication时slave的登录role,假设角色名为slave1

 

 

CREATE ROLE name REPLICATION LOGIN

 

然后配置主机postgresql.conf和pg_hba.conf

 

往pg_hba.conf添加如下:

 

 

host    all             all             172.29.0.44/32          trust



host     replication     slave1          172.29.0.44/32         trust

 

 

修改postgresql.conf如下:

 

wal_level = hot_standby

archive_mode = on   

archive_command = 'cp -i %p /opt/PostgreSQL/9.1/archive/%f'

archive_timeout = 600 

max_wal_senders = 10 

wal_keep_segments = 32

 

 

然后创建在archive_command中出现的目录。并给予777权限。

 

接着开始在主机上创建备份

 

 

[root@ShadowFiend data]# su postgres
bash-4.1$ pg_ctl start
bash-4.1$ psql postgres -c "select pg_start_backup('backup',true)"

 

 

然后用scp  传送到 172.29.0.44  或者 用 tar -czvf  ****.tar.gz   data;/打包 传到172.29.0.44 上的 PostgreSQL 下 的 data 下 并且覆盖 除了 postgresql.conf ,pg_hba.conf 的所有文件。删除 data/postmaster.pid。

 

然后主机执行

 

 

bash-4.1 $ psql postgres -c "select pg_stop_backup"

bash-4.1 $ pg_ctl restart

 

 

接着在172.29.0.44上 用root 用户给覆盖过的 data/目录 更改属主

 

 

chown -R postgres data

 

并且修改data下的postgresql.conf :

 

 

wal_level=hot_standby

 hot_standby=on

 

从share/postgresql/复制一个recovery.conf的sample到data/下

 

 

cp share/postgresql/recovery.conf.sample  data/recovery.conf

 

并添加

 

 

restore_command=‘cp  /opt/archive/%f%p’

archive_cleanup_command=‘pg_archivecleanup/archive%r’

standby_mode=‘on’

primary_conninfo=‘host=172.29.0.43 port=5432 user=slave1 password=’
 
trigger_file= '/opt/archive/trigger_file.172.29.0.43'

 

接着启动172.29.0.44的服务吧。

 

ok。

 

 

---------------------------------------------------------------------

有错误看data/pg_log/下的日志。

中间有次不小心吧primary上的/data/pg_xlog/删光了,数据库就启动不起来,重新拷贝了其他机器上的/data 下来覆盖。。。。。

 

 

-----------------------------------------------------------------------

 

:-),出错了,大概是primary 和 slave 都关机后,再启动起来报错

 

 

DEBUG:  received replication command: IDENTIFY_SYSTEM
DEBUG:  received replication command: START_REPLICATION 0/18000000
2011-12-09 11:07:52 CST LOG:  streaming replication successfully connected to primary
2011-12-09 11:07:52 CST FATAL:  could not receive data from WAL stream: FATAL:  requested WAL segment 000000010000000000000018 has already been removed

 

 

 google了一下,解答如下、

 

The cause is very simple - the standby needs all WAL segments created
since the backup started, but the master removes some of them. There are
two ways to fix this:

1) increase the wal_keep_segments so that enough segments is kept

It seems that the slave asked for B110000000D when master already created
B110000000D. That's almost 4000 segments if I'm counting correctly. That
means your database is either quite busy or the backup takes very long
time.

This stores all the data on master, so you'll have to keep that in mind
when planning the capacity. For example the 4000 segments are almost 64GB.

2) Set up a WAL archive - a separate instance where the WAL segments are
kept. See how the archive_command works. And there's pg_archivecleanup for
maintenance of the archive.

Tomas