GBase 8c V5.0.0 Primary-Standby Deployment Guide

Deployment Environment

  • SSH client: MobaXterm Home Edition v25.4 (Portable edition)
  • Deployment mode: one primary, one standby (with CM)

Installation Preparation

Hardware and Software Environment

Hardware Plan (VirtualBox VMs)

|            | Primary node                   | Standby node                   |
|------------|--------------------------------|--------------------------------|
| Hostname   | gbase01                        | gbase02                        |
| CPU        | 2 cores                        | 2 cores                        |
| Memory     | 2 GB                           | 2 GB                           |
| Disk       | 20 GB                          | 20 GB                          |
| OS         | openEuler-22.03-LTS-SP4-x86_64 | openEuler-22.03-LTS-SP4-x86_64 |
| IP address | 172.31.100.231/24              | 172.31.100.232/24              |
| Network    | Bridged adapter                | Bridged adapter                |

Increase the memory if you can (the configuration above is for experimentation only; a real production environment needs far more than this).

Software Dependencies

| Required software | Recommended version                                   |
|-------------------|-------------------------------------------------------|
| libaio-devel      | 0.3.109-13                                            |
| flex              | 2.5.31 or later                                       |
| bison             | 2.7-4                                                 |
| ncurses-devel     | 5.9-13.20130511                                       |
| glibc-devel       | 2.17-111                                              |
| patch             | 2.7.1-10                                              |
| redhat-lsb-core   | 4.1 (use openeuler-lsb on openEuler)                  |
| readline-devel    | 7.0-13                                                |
| libnsl            | 2.28-36 (required on openEuler + x86)                 |
| expect            | none specified (required unless deploying a single node) |
| patchelf          | (not specified)                                       |

Verify that the required packages meet the requirements:

rpm -q libaio-devel flex bison ncurses-devel glibc-devel patch openeuler-lsb  readline-devel libnsl expect patchelf

[root@gbase8c ~]# rpm -q libaio-devel flex bison ncurses-devel glibc-devel patch openeuler-lsb  readline-devel libnsl expect patchelf
package libaio-devel is not installed
flex-2.6.4-5.oe2203sp4.x86_64
bison-3.8.2-2.oe2203sp4.x86_64
package ncurses-devel is not installed
glibc-devel-2.34-152.oe2203sp4.x86_64
patch-2.7.6-14.oe2203sp4.x86_64
package openeuler-lsb is not installed
package readline-devel is not installed
package libnsl is not installed
package expect is not installed
package patchelf is not installed
[root@gbase8c ~]#

Notes:

  • Most of the recommended versions above target RedHat/CentOS 7.x and similar systems; the official openEuler repositories do not carry those exact versions.
  • openEuler-22.03-LTS-SP4 already ships four of the dependencies: flex, bison, glibc-devel, and patch.
  • redhat-lsb-core is, as the name suggests, replaced by openeuler-lsb on openEuler.
  • We will not go down the road of compiling specific versions from source (too risky; it can break system compatibility).
  • Since the vendor officially lists openEuler-22.03-LTS as supported, we use the versions from the official repositories, even when they are newer than the recommended ones.

Install the Missing Packages

dnf install -y libaio-devel ncurses-devel openeuler-lsb readline-devel libnsl expect patchelf

Check again after the packages are installed:

[root@gbase01 ~]# rpm -q libaio-devel flex bison ncurses-devel glibc-devel patch openeuler-lsb  readline-devel libnsl expect patchelf
libaio-devel-0.3.113-9.oe2203sp4.x86_64
flex-2.6.4-5.oe2203sp4.x86_64
bison-3.8.2-2.oe2203sp4.x86_64
ncurses-devel-6.3-16.oe2203sp4.x86_64
glibc-devel-2.34-170.oe2203sp4.x86_64
patch-2.7.6-14.oe2203sp4.x86_64
openeuler-lsb-5.0-1.oe2203sp4.x86_64
readline-devel-8.1-3.oe2203sp4.x86_64
libnsl-2.34-170.oe2203sp4.x86_64
expect-5.45.4-8.oe2203sp4.x86_64
patchelf-0.16.0-1.oe2203sp4.x86_64

Environment Configuration

Disable the Firewall

Run the following on both the primary and standby nodes.

# Check the firewalld status
systemctl is-active firewalld && systemctl is-enabled firewalld
# Stop the firewall service and disable it at boot
systemctl stop firewalld.service && systemctl disable firewalld.service

Disable SELinux

Run the following on both the primary and standby nodes.

# Check the current SELinux status and configuration
sestatus
grep -E "^SELINUX=" /etc/selinux/config

# Edit the config file directly to disable SELinux permanently
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

# Reboot the system for the change to take effect
reboot
# Verify that SELinux is permanently disabled
sestatus

Set Up Time Synchronization

  • In a lab environment you can calibrate the primary and standby clocks manually (keep the drift ≤ 3 seconds); for production, configure an NTP service for automatic synchronization (see the sketch below).
  • The time difference between the primary and standby nodes must stay within 3 seconds, or the cluster deployment will fail.
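A minimal sketch for checking and correcting clock drift; the chrony commands assume the chrony package from the openEuler repositories and a reachable NTP server:

# Compare the clocks on both nodes (run side by side)
date
# Lab only: set the time by hand if the drift exceeds 3 seconds (the timestamp is illustrative)
date -s "2025-01-01 12:00:00"

# Production: let chronyd keep the clocks in sync instead
dnf install -y chrony
systemctl enable --now chronyd
chronyc tracking    # the "System time" line shows the current offset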

Establish SSH Trust Between Nodes

When the gs_preinstall script initializes the installation environment, answering yes creates root SSH trust automatically, so this step can be skipped. To set it up manually, run the following on both the primary and standby nodes.

# On both nodes: generate an RSA key pair (no passphrase, just press Enter)
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# On the primary node: copy the public key to the standby (root password required)
ssh-copy-id -o StrictHostKeyChecking=no 172.31.100.232

# On the standby node: copy the public key to the primary (root password required)
ssh-copy-id -o StrictHostKeyChecking=no 172.31.100.231

Verify the trust (this step can also be skipped):

# From the standby, test logging in to the primary
ssh 172.31.100.231

Disable Transparent Huge Pages (THP)

Run the following on both the primary and standby nodes.

# Disable temporarily (uncomment to apply right away; the rc.local script below makes it persistent)
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Verify (should show [never])
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

Make the Setting Persistent Across Reboots

# Make the rc.local boot script executable (it has no x permission by default)
chmod +x /etc/rc.d/rc.local

# Enable it at boot (the service is stopped and disabled by default)
systemctl enable rc-local.service

# Append the commands to the end of rc.local; the EOF heredoc appends in bulk without touching existing content
cat >> /etc/rc.d/rc.local <<EOF
# Permanently disable swap at boot
# We could really use the memory, but……
# swapoff -a

# If the THP "enabled" file exists, write never into it to disable THP
if test -f /sys/kernel/mm/transparent_hugepage/enabled;
then
   echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi

# If the THP "defrag" file exists, write never into it to disable defrag
if test -f /sys/kernel/mm/transparent_hugepage/defrag;
then
  echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
EOF
# Run rc.local once by hand so the settings take effect immediately
sh /etc/rc.d/rc.local

Disable RemoveIPC

openEuler-22.03-LTS-SP4-x86_64 disables RemoveIPC by default, which you can verify with:

[root@gbase01 ~]# cat /etc/systemd/logind.conf | grep -i RemoveIPC
#RemoveIPC=no
[root@gbase01 ~]# systemctl show systemd-logind | grep RemoveIPC
RemoveIPC=no
[root@gbase01 ~]#

On other systems, run the same checks first, then disable it with the following if needed:

# Set the "RemoveIPC" value to "no" in /etc/systemd/logind.conf:
vim /etc/systemd/logind.conf

# Set the "RemoveIPC" value to "no" in /usr/lib/systemd/system/systemd-logind.service:
vim /usr/lib/systemd/system/systemd-logind.service

# Restart the service:
systemctl daemon-reload
systemctl restart systemd-logind.service
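If you prefer a non-interactive edit over vim, a sketch using sed (assuming the stock commented default #RemoveIPC=no shown above):

# Uncomment or overwrite the RemoveIPC line and pin it to no
sed -i 's/^#\?RemoveIPC=.*/RemoveIPC=no/' /etc/systemd/logind.conf
grep -i RemoveIPC /etc/systemd/logind.conf
systemctl daemon-reload
systemctl restart systemd-logind.service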

Kernel Parameter Settings

Run the following on both the primary and standby nodes.

# Check the current kernel parameter
sysctl kernel.sem

# Check whether sysctl.conf already contains the setting
grep "kernel.sem = 40960 2048000 40960 20480" /etc/sysctl.conf

# Append the parameter to the end of the config file
echo "kernel.sem = 40960 2048000 40960 20480" >> /etc/sysctl.conf

# Apply the configuration immediately
sysctl -p

# Verify that the parameter took effect
sysctl kernel.sem
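For reference, the four kernel.sem fields are, in order (per proc(5)):

# SEMMSL = 40960    max semaphores per semaphore set
# SEMMNS = 2048000  max semaphores system-wide
# SEMOPM = 40960    max operations per semop(2) call
# SEMMNI = 20480    max number of semaphore sets
cat /proc/sys/kernel/sem    # prints the same four values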

Add the User to sudoers

Run the following on both the primary and standby nodes.

# Check whether sudoers already contains the entry
grep "gbase ALL=(ALL) NOPASSWD:ALL" /etc/sudoers

# Insert the entry on the line after root ALL=(ALL) ALL
sed -i '/^root\s*ALL=(ALL)\s*ALL/a gbase ALL=(ALL) NOPASSWD:ALL' /etc/sudoers
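Since this edits /etc/sudoers directly instead of going through visudo, it is worth validating the syntax afterwards; a broken sudoers file can lock you out of sudo:

visudo -c    # reports "parsed OK" when the file is valid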

Create the gbase Group and User

  • When the gs_preinstall script initializes the installation environment, it creates the gbase user and group automatically if they do not exist; you can also create them in advance.

  • Passwordless SSH for gbase, user creation, directory creation, package distribution and extraction, and permission setup are all handled automatically by gs_preinstall.

Run the following on the primary node only; the standby needs nothing. This step can be skipped.

# You will be prompted to set the gbase password
groupadd gbase
useradd -m -d /home/gbase gbase -g gbase
passwd gbase

Installation and Deployment

Upload and Extract the Packages

Upload the installation packages to the /opt/software/gbase8c directory. Run the following on the primary node only; the standby needs nothing.

mkdir -p /opt/software/gbase8c
chmod 755 -R /opt/software
cd /opt/software/gbase8c

Run the following on the primary node only; the standby needs nothing.

# Extract the main package
tar -xvf GBase8cV5_S5.0.0B28_centos7.8_x86_64.tar.gz
# Extract the OM package
tar -xvf GBase8cV5_S5.0.0B28_CentOS_x86_64_om.tar.gz
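A quick sanity check that the OM package extracted where the later steps expect it:

# The om tarball should yield the script/ directory containing gs_preinstall
ls /opt/software/gbase8c/script/gs_preinstall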

Edit the XML Configuration File (One Primary, One Standby, with CM)

Template location: /opt/software/gbase8c/script/gspylib/etc/conf/

Template file: cluster_config_with_cm_template.xml

Create the file directly:

# Create the XML under /opt/software/gbase8c
cd /opt/software/gbase8c
nano cluster_config_with_cm.xml

Then paste the following, replacing the IPs and hostnames with your own:

<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
    <CLUSTER>
        <PARAM name="clusterName" value="Cluster_GBase" />
        <PARAM name="nodeNames" value="gbase01,gbase02"/>
        <PARAM name="gaussdbAppPath" value="/opt/database/install/app" />
        <PARAM name="gaussdbLogPath" value="/opt/database/var/log/omm" />
        <PARAM name="tmpMppdbPath" value="/opt/database/tmp"/>
        <PARAM name="gaussdbToolPath" value="/opt/database/install/om" />
        <PARAM name="corePath" value="/opt/database/corefile"/>
        <PARAM name="backIp1s" value="172.31.100.231,172.31.100.232"/>
    </CLUSTER>
    <DEVICELIST>
        <DEVICE sn="gbase01">
            <PARAM name="name" value="gbase01"/>
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <PARAM name="backIp1" value="172.31.100.231"/>
            <PARAM name="sshIp1" value="172.31.100.231"/>
            <!-- dn -->
            <PARAM name="dataNum" value="1"/>
            <PARAM name="dataPortBase" value="15400"/>
            <PARAM name="dataNode1" value="/opt/database/install/data/dn,gbase02,/opt/database/install/data/dn"/>
            <PARAM name="dataNode1_syncNum" value="1"/>
            <!-- cm -->
            <PARAM name="cmsNum" value="1"/>
            <PARAM name="cmDir" value="/opt/database/gbase/install/cm"/>
            <PARAM name="cmServerPortBase" value="15300"/>
            <PARAM name="cmServerListenIp1" value="172.31.100.231,172.31.100.232"/>
            <PARAM name="cmServerHaIp1" value="172.31.100.231,172.31.100.232"/>
            <PARAM name="cmServerlevel" value="1"/>
            <PARAM name="cmServerRelation" value="gbase01,gbase02"/>
        </DEVICE>
        <DEVICE sn="gbase02">
            <PARAM name="name" value="gbase02"/>
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <PARAM name="backIp1" value="172.31.100.232"/>
            <PARAM name="sshIp1" value="172.31.100.232"/>
            <!-- cm -->
            <PARAM name="cmDir" value="/opt/database/gbase/install/cm"/>
            <PARAM name="cmServerPortStandby" value="15300"/>
        </DEVICE>
    </DEVICELIST>
</ROOT>
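Optionally confirm the XML is well formed before running the preinstall script; a sketch using xmllint (from the libxml2 package, which may need to be installed):

xmllint --noout /opt/software/gbase8c/cluster_config_with_cm.xml && echo "XML OK"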

Run the gs_preinstall Pre-installation Check

Run the following on the primary node only; the standby needs nothing.

# Run as root; if you switched to the gbase user in an earlier step, simply exit back to root
# Enter the script directory
cd /opt/software/gbase8c/script/
# Run the preinstall script
./gs_preinstall -U gbase -G gbase -X /opt/software/gbase8c/cluster_config_with_cm.xml

Along the way you will be prompted for yes confirmations and passwords:

……
Are you sure you want to create trust for root (yes/no)?yes
Please enter password for root
Password:
……
Are you sure you want to create the user[gbase] and create trust for it (yes/no)? yes
Please enter password for cluster user.
Password:
……
Please enter password for current user[gbase].
Password:
……
Preinstallation succeeded.

Run the gs_install Installation

Run the following on the primary node only; the standby needs nothing.

# Switch to the gbase user
su - gbase

# Run the installer
gs_install -X /opt/software/gbase8c/cluster_config_with_cm.xml

You will be prompted to set the database password:

……
Please enter password for database:
Please repeat for database:
……
Configuration is completed.
Starting cluster.
======================================================================
Successfully started primary instance. Wait for standby instance.
======================================================================
.
Successfully started cluster.
======================================================================
cluster_state      : Normal
redistributing     : No
node_count         : 2
Datanode State
    primary           : 1
    standby           : 1
    secondary         : 0
    cascade_standby   : 0
    building          : 0
    abnormal          : 0
    down              : 0

Successfully installed application.
end deploy..

Verify the Installation

As the gbase user, check the cluster status:

gs_om -t status --detail

If the cluster_state field reports "Normal", the database is ready for use.

Configure the VIP

Add the VIP Resource and Bind It to the Nodes

Run the following on both the primary and standby nodes.

# Run as the gbase user
# Add a resource named CM_VIP, with float_ip set to 172.31.100.230
cm_ctl res --add --res_name="CM_VIP" --res_attr="resources_type=VIP,float_ip=172.31.100.230"

# Run as the gbase user
# Bind the primary and standby node instances
cm_ctl res --edit --res_name="CM_VIP" --add_inst="res_instance_id=6001,node_id=1" --inst_attr="base_ip=172.31.100.231"
cm_ctl res --edit --res_name="CM_VIP" --add_inst="res_instance_id=6002,node_id=2" --inst_attr="base_ip=172.31.100.232"

List the configured resources:

cm_ctl res --list --res_name="CM_VIP"
cm_ctl list --param --server | grep ip

Adjust CM Parameters

  • cms_enable_failover_on2nodes = 1 enables automatic failover in a two-node cluster

  • cms_network_isolation_timeout = 10 sets the network isolation timeout to 10 seconds

  • cms_enable_db_crash_recovery = 1 enables automatic recovery after a database crash

  • third_party_gateway_ip = 172.31.100.254 sets the gateway IP used for connectivity probes

Run the following on the primary node only; the standby needs nothing.

# Run on the primary only; the settings are synchronized across the cluster automatically
cm_ctl set --param --server -k "cms_enable_failover_on2nodes=1"
cm_ctl set --param --server -k "cms_network_isolation_timeout=10"
cm_ctl set --param --server -k "cms_enable_db_crash_recovery=1"
cm_ctl set --param --server -k "third_party_gateway_ip=172.31.100.254"

Restart the Database

gs_om -t restart

Check the float_ip Status

[gbase@gbase02 ~]$ cm_ctl show
[  FloatIp Network State  ]

node       instance base_ip        float_ip_name float_ip
----------------------------------------------------------------
1  gbase01 6001     172.31.100.231 CM_VIP        172.31.100.230
----------------------------------------------------------------

[gbase@gbase01 ~]$ cm_ctl show
[  FloatIp Network State  ]

node       instance base_ip        float_ip_name float_ip
----------------------------------------------------------------
1  gbase01 6001     172.31.100.231 CM_VIP        172.31.100.230
[gbase@gbase01 ~]$ ip a | grep 172.31.100.230
    inet 172.31.100.230/24 brd 172.31.100.255 scope global secondary enp0s3:15400
[gbase@gbase01 ~]$ gs_guc check -I all -c "listen_addresses"
The gs_guc run with the following arguments: [gs_guc -I all -c listen_addresses check ].
expected guc information: gbase01: listen_addresses=NULL: [/opt/database/install/data/dn/postgresql.conf]
gs_guc check: gbase01: listen_addresses='localhost,172.31.100.231,172.31.100.230': [/opt/database/install/data/dn/postgresql.conf]

Total GUC values: 1. Failed GUC values: 0.
The value of parameter listen_addresses is same on all instances.
    listen_addresses='localhost,172.31.100.231,172.31.100.230'

From gbase02, we can see that the node currently holding the VIP float_ip is gbase01.

From gbase01, the VIP float_ip shows up in the NIC information.

From gbase01, the gs_guc check command shows that listen_addresses includes the float_ip.

Add the Cluster CM_VIP Address to the Database Listen Addresses

In practice there is no need to add the VIP address by hand: listen_addresses picks up the VIP automatically, and after a primary/standby switchover the VIP address is automatically written into the new primary's postgresql.conf.

File path: /opt/database/install/data/dn/postgresql.conf

# Check the current database listen addresses on this node
gs_guc check -I all -c "listen_addresses"
# Check the standby's listen addresses
# ssh 172.31.100.232 "gs_guc check -I all -c 'listen_addresses'"

# Default value on the primary
listen_addresses = 'localhost,172.31.100.231,172.31.100.230'
# Default value on the standby
listen_addresses = 'localhost,172.31.100.232'

# Use cm_ctl switchover -A to swap primary and standby; the VIP address then moves into the new primary's config file automatically

# To set the database listen addresses manually instead:
# gs_guc set -I all -c "listen_addresses='localhost,172.31.100.231,172.31.100.232,172.31.100.230'"

# Restart the cluster for the listen addresses to take effect
# gs_om -t restart

Authorize Database Access via the VIP Address

File path: /opt/database/install/data/dn/pg_hba.conf

Run the following on the primary node only; the standby needs nothing.

# Reload the pg_hba.conf configuration to allow database access through the VIP address
gs_guc reload -N all -I all -h "host all all 172.31.100.230/32 sha256"

  • The VIP is the cluster's unified service entry point; the IP belongs to no single physical server and is the cluster's logical service address.
  • This authorizes the database to accept connections arriving via the VIP address.
  • Giving the VIP a legitimate identity for database access is the foundation of high availability (a connection test follows below).
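A quick connection test through the VIP, as a sketch: the port comes from dataPortBase in the XML above, while the database user is illustrative (the initial installation user cannot log in remotely, so create an ordinary database user first):

# From any host that can reach the VIP; you will be prompted for the user's password
gsql -d postgres -h 172.31.100.230 -p 15400 -U testuser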

Enable Reads on the Standby

This supports a dual-data-source setup: the primary handles inserts, updates, and deletes, while the standby serves queries.

The standby then participates in the workload instead of sitting idle.

The application code must implement dual-data-source connection logic, or middleware must provide read/write splitting.

Run the following on both the primary and standby nodes.

# Stop the primary and standby instances (from either node)
gs_om -t stop

# Check the default values
more /opt/database/install/data/dn/postgresql.conf | grep hot_standby

[gbase@gbase01 ~]$ more /opt/database/install/data/dn/postgresql.conf | grep hot_standby
wal_level = hot_standby                 # minimal, archive, hot_standby or logical
hot_standby = on                        # "on" allows queries during recovery
#hot_standby_feedback = off             # send info from standby to prevent

# Edit postgresql.conf on both nodes, setting hot_standby_feedback = on
sed -i 's/^#hot_standby_feedback = off/hot_standby_feedback = on/' /opt/database/install/data/dn/postgresql.conf

# Start the cluster again (from either node)
gs_om -t start
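To confirm the standby really serves reads, a sketch run locally on the standby (pg_is_in_recovery() should return t there):

gsql -d postgres -p 15400 -c "select pg_is_in_recovery();"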

Database Login

See: GBase 8c Database Login Guide

Primary/Standby Switchover

See: GBase 8c Primary-Standby Deployment Switchover Test

Uninstall

Uninstall the Cluster from the Primary Node (as the gbase user)

# Uninstall the cluster from the primary node
gs_uninstall --delete-data

[gbase@gbase01 ~]$ gs_uninstall --delete-data
Checking uninstallation.
Successfully checked uninstallation.
Stopping the cluster.
Successfully stopped the cluster.
Successfully deleted instances.
Uninstalling application.
Successfully uninstalled application.
Uninstallation succeeded.
[gbase@gbase01 ~]$

Clean Up the Environment from the Primary Node (as root)

# Clean up the environment (as root)
cd /opt/software/gbase8c/script
./gs_postuninstall -U gbase -X /opt/software/gbase8c/cluster_config_with_cm.xml --delete-user --delete-group

[root@gbase01 script]# ./gs_postuninstall -U gbase -X /opt/software/gbase8c/cluster_config_with_cm.xml --delete-user --delete-group
Parsing the configuration file.
Successfully parsed the configuration file.
Creating SSH trust for the root permission user.
Are you sure you want to create trust for root (yes/no)?yes
Please enter password for root.
Password:
Successfully created SSH trust for the root permission user.
Check log file path.
Successfully checked log file path.
Checking unpreinstallation.
Successfully checked unpreinstallation.
Deleting the instance's directory.
Successfully deleted the instance's directory.
Deleting the temporary directory.
Successfully deleted the temporary directory.
Deleting remote OS user.
Successfully deleted remote OS user.
Deleting software packages and environmental variables of other nodes.
Successfully deleted software packages and environmental variables of other nodes.
Deleting logs of other nodes.
Successfully deleted logs of other nodes.
Deleting software packages and environmental variables of the local node.
Successfully deleted software packages and environmental variables of the local nodes.
Deleting local OS user.
Successfully deleted local OS user.
Deleting local node's logs.
Successfully deleted local node's logs.
Successfully cleaned environment.
[root@gbase01 script]#

Delete the root SSH Trust Files (as root)

# Skip this if you already removed root's SSH trust earlier
# The gbase user was wiped out completely in the previous step, so nothing is needed for it
rm -rf ~/.ssh

What Is Not Cleaned Up

  • The files uploaded to /opt/software/gbase8c/ are not deleted
  • The GBase installation path /opt/database/ is not deleted
  • The installed dependency packages are not removed
  • The settings made during the environment configuration steps are not reverted to their defaults (apart from the gbase user and group); see the cleanup sketch below
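If you want the machines fully back to their pre-deployment state, a sketch that reverses the steps from this guide (paths and entries match the ones configured above; review before running):

# Remove the uploaded packages and the installation path
rm -rf /opt/software/gbase8c /opt/database
# Remove the dependency packages installed earlier
dnf remove -y libaio-devel ncurses-devel openeuler-lsb readline-devel libnsl expect patchelf
# Drop the kernel.sem line added to sysctl.conf and re-apply
sed -i '/kernel.sem = 40960 2048000 40960 20480/d' /etc/sysctl.conf && sysctl -p
# Drop the gbase sudoers entry
sed -i '/gbase ALL=(ALL) NOPASSWD:ALL/d' /etc/sudoers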