How to create a MySQL Read Replica
With services like AWS, creating read replicas is as simple as clicking a button. But what if you need to create a read replica from a self-hosted MySQL database? In this article, I will guide you through the entire process of creating a MySQL read replica (slave) hosted on Amazon RDS from a self-hosted MySQL master running on an Amazon EC2 instance backed by EBS block storage. This includes configuring the master and the slave, performing a non-blocking initial mysqldump on the master, importing that dump into the slave, and starting the replication process.
Environment
Master
: Amazon EC2 instance with EBS block-storage
Slave
: RDS (Amazon Relational Database Service)
Overview
- Prerequisites
- Configure the master
- Create a replication user
- Dump initial data from master
- Ensure connection between master and slave
- Import initial dump into replica
- Configure replica
- Start replication
- Conclusion
1. Prerequisites
Check DB file size
Before proceeding with the initial mysqldump, it is crucial to check the size of the database file. This step ensures that you have enough storage space available on your hard drive to accommodate the dump file.
File size per table
SELECT
TABLE_NAME AS 'Table',
ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS 'Size (MB)'
FROM
information_schema.TABLES
WHERE
TABLE_SCHEMA = '<database_name>'
ORDER BY
(DATA_LENGTH + INDEX_LENGTH)
DESC;
For the greatest possible durability and consistency in a replication setup using InnoDB with transactions, you should use innodb_flush_log_at_trx_commit=1 and sync_binlog=1 in the source's my.cnf file. Also ensure that the skip_networking system variable is not enabled on your source. If networking has been disabled, the replica cannot communicate with the source and replication fails.
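As a reference, a minimal sketch of how these settings might look in the master's my.cnf (the exact file location depends on your distribution) could be:
[mysqld]
# flush the InnoDB redo log to disk at each transaction commit for maximum durability
innodb_flush_log_at_trx_commit = 1
# sync the binary log to disk after every write so no committed transaction is lost
sync_binlog = 1
# do NOT add skip_networking here - the replica needs a TCP connection to the source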
File size of Database
SELECT
SUM(ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 / 1024)) AS 'Size (GB)'
FROM
information_schema.TABLES
WHERE
TABLE_SCHEMA = '<database_name>';
Check MySQL data directory size
Additionally, it is important to verify the size of the MySQL data directory, typically located at /var/lib/mysql. This directory serves as the storage location for all the table data, indexes, and metadata of your databases. You can check its size with the following Linux command.
du -xh /var/lib/mysql | tail -40
Check available Disk space
Now that you have determined the approximate size of your initial mysqldump based on the database size (typically around 60-80% of the DB size), the next step is to check the amount of available storage space on your storage device. This will help ensure that you have sufficient space to accommodate the mysqldump file.
df -hT
Show important variables
The following MySQL variables are crucial for replication. The values depend on your specific use case.
SHOW GLOBAL VARIABLES LIKE 'binlog_format'; -- must be set to ROW
SHOW GLOBAL VARIABLES LIKE 'sync_binlog'; -- must be set to 1
SHOW GLOBAL VARIABLES LIKE 'server_id'; -- must be != 0
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit'; -- must be set to 1
SHOW GLOBAL VARIABLES LIKE 'skip_networking'; -- must be set to OFF
SHOW GLOBAL VARIABLES LIKE 'expire_logs_days'; -- set to 3
SHOW GLOBAL VARIABLES LIKE 'binlog_do_db'; -- if you only want to binlog a specific database
SHOW GLOBAL VARIABLES LIKE 'max_binlog_size'; -- 1G
2. Configure the master
Official MySQL Documentation: Setting the Replication Source Configuration.
Check if you have binary logging enabled
SHOW BINARY LOGS;
If binary logging is enabled, this will show the list of binary logs. Otherwise, you will see the following error message:
SQL Error [1381] [HY000]: You are not using binary logging
Edit your MySQL configuration file in your MySQL configuration directory:
nano /etc/mysql/mysql.conf.d/mysql.cnf
Set the following variables:
server-id = 10
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 3
max_binlog_size = 1G
binlog_do_db = db_name
binlog_format = ROW
server_id
Each server within a replication topology must be configured with a unique server ID, which you can specify using the server_id system variable. This server ID is used to identify individual servers within the replication topology, and must be a positive integer between 1 and 2^32 − 1.
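If you prefer to verify the value at runtime rather than by reading the config file, a quick check on the master could be:
-- show the server ID currently in effect
SELECT @@server_id;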
log_bin
Binary logging is required on the source because the binary log is the basis for replicating changes from the source to its replicas. Binary logging is enabled by default (the log_bin system variable is set to ON). The --log-bin option tells the server what base name to use for binary log files.
binlog_do_db
With row-based logging, logging is restricted to the database db_name: only changes to tables belonging to db_name are written to the binary log.
Restart MySQL service
You need to restart MySQL after making the above changes to enable binary logging. On Linux, you can check and restart the MySQL service with the following commands.
service mysql status
service mysql restart
After the restart, you should verify the following to ensure that the changes have taken effect.
SHOW GLOBAL VARIABLES LIKE '%log_bin%'
Variable_name | Value |
log_bin | ON |
log_bin_basename | /var/log/mysql/mysql-bin |
SHOW GLOBAL VARIABLES LIKE 'server_id'
Variable_name | Value |
server_id | 10 |
SHOW BINARY LOGS
Log_name | File_size |
mysql-bin.000001 | 157 |
3. Create a replication user
Official MySQL Documentation: Creating a User for Replication.
It is recommended to create a separate user that has privileges only for the replication process, to minimize the possibility of compromise to other accounts.
Show current users
SELECT * FROM mysql.user;
Create a new user on master
CREATE USER 'replication-user'@'%' IDENTIFIED BY 'aSecurePassword1';
GRANT REPLICATION SLAVE ON *.* TO 'replication-user'@'%';
Flush privileges
FLUSH PRIVILEGES;
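To double-check that the new account has only the privileges it needs (and nothing more), you can inspect its grants:
-- list the privileges granted to the replication account
SHOW GRANTS FOR 'replication-user'@'%';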
4. Dump initial data from master
Note: mysqldump will flush all tables. This can take a long time, and if there are many query cache locks (MySQL 5.7) it can freeze the DB. If this happens, you can try to kill the process or do a service mysql restart. In our test, service mysql restart took 13 min for a 111 GB database. If you are working with a production database with lots of data, consider running the mysqldump on a clone of the production database.
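If the dump appears to hang, it is usually worth checking what the server is waiting on. One way, assuming you have a second session open on the master, is:
-- look for sessions stuck in states such as 'Waiting for table flush'
SHOW FULL PROCESSLIST;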
Install pv
The pv command-line tool in Linux stands for "Pipe Viewer." It is a utility that allows you to monitor the progress of data moving through a pipe, which we use here to track the mysqldump.
sudo apt-get install pv
Initial mysql dump
mysqldump -u root -p \
--databases db_name \
--master-data=2 \
--single-transaction \
--quick \
--routines \
--result-file /opt/initialDump.sql
--master-data=2
Use this option to dump a source replication server to produce a dump file that can be used to set up another server as a replica of the source. It causes the dump output to include a CHANGE MASTER TO statement that indicates the binary log coordinates (file name and position) of the dumped server. These are the source server coordinates from which the replica should start replicating after you load the dump file into the replica. If the option value is 2, the CHANGE MASTER TO statement is written as an SQL comment, and is thus informative only.
With --master-data=2 the resulting mysqldump will look like this:
--
-- Position to start replication or point-in-time recovery from
--
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=157;
--single-transaction
This option sets the transaction isolation mode to REPEATABLE READ and sends a START TRANSACTION SQL statement to the server before dumping data. It is useful only with transactional tables such as InnoDB, because then it dumps the consistent state of the database at the time when START TRANSACTION was issued without blocking any applications.
To dump large tables, combine the --single-transaction option with the --quick option.
--quick
This option is useful for dumping large tables. It forces mysqldump to retrieve rows for a table from the server a row at a time rather than retrieving the entire row set and buffering it in memory before writing it out.
--routines
By default, mysqldump will back up all triggers but NOT the stored procedures and functions. This means that if you want to include stored procedures and functions, you have to add the --routines option.
Note: If you are importing a mysqldump that contains stored procedures and functions, you first have to set log_bin_trust_function_creators=1 in the parameter group of your RDS instance; otherwise, you will receive an error while importing. Alternatively, you can import the stored procedures and functions manually afterwards.
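If you prefer the AWS CLI over the console, a sketch of how this parameter could be changed looks like the following (my-custom-parameter-group is a placeholder for the parameter group attached to your RDS instance):
# set log_bin_trust_function_creators=1 in the custom RDS parameter group
aws rds modify-db-parameter-group \
  --db-parameter-group-name my-custom-parameter-group \
  --parameters "ParameterName=log_bin_trust_function_creators,ParameterValue=1,ApplyMethod=immediate"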
Check Stored Procedures and Functions
SHOW PROCEDURE STATUS;
SHOW FUNCTION STATUS;
One-Liner with pv
mysqldump -u root -p --databases db_name --master-data=2 --single-transaction --quick --routines | pv --progress --size 65G > /opt/initialDump.sql
For our test, the mysqldump of a 100 GB database took 2 h 20 min and produced a 65 GB dump file.
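Before moving on, it is worth making sure the dump finished cleanly. mysqldump writes a completion marker as the last line of the file, so a quick check could look like this:
# the last line should read something like: -- Dump completed on 2023-01-01 12:00:00
tail -n 1 /opt/initialDump.sql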
5. Ensure connection between master and slave
You have to make sure there is a bidirectional connection between master and slave: the connection from master -> slave, which is needed for importing the initial dump, and the connection from slave -> master, which is needed for the actual replication so that the slave can connect to the master.
A) Connection from Master (EBS) to Slave (RDS)
This connection is needed for importing the initial dump into RDS from an EC2 instance.
This step is very simple because during the RDS creation process, you can choose an existing EC2 instance to connect to your RDS instance. Two Security Groups will be automatically created.
rds-ec2-1: Attached to your RDS instance. Allows inbound TCP connections via port 3306 from the source Security Group ec2-rds-1.
ec2-rds-1: Attached to your EC2 instance. Allows outbound TCP connections via port 3306 to the destination Security Group rds-ec2-1.
That means if you want another EC2 instance to be able to connect to your RDS instance, you only have to attach the Security Group ec2-rds-1 to that EC2 instance, and it will be able to connect to RDS.
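To confirm that this direction works before you start the import, you can run a trivial query from the EC2 instance against the RDS endpoint (admin and ${rds_public_dns_address} are the same placeholders used later in this article):
# should print 1 if the EC2 instance can reach the RDS instance on port 3306
mysql -u admin -h ${rds_public_dns_address} -p -e "SELECT 1;"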
B) Connection from Slave (RDS) to Master (EBS)
This connection is needed for the actual replication because the slave has to connect to the master. This is not part of the RDS creation process and needs to be done manually.
Configuration on RDS Slave
First create a new Security Group rds-slave-outbound-rule with the following properties.
Rule: Outbound
Description: Custom rule which allows RDS outbound traffic via port 3306.
VPC: Choose the VPC where the EC2 instance is located.
Type: MySQL/Aurora
Destination: Security Group ec2-rds-1. This is the same Security Group that was used in step A).
Attach this new Security Group rds-slave-outbound-rule to your RDS instance.
Configuration on EBS Master
Originally the automatically created Security Group ec2-rds-1 only had one outbound rule. Now we add an inbound rule to this Security Group with the following properties.
Rule: Inbound
Description: Allows inbound TCP connections via port 3306 from the source Security Group rds-slave-outbound-rule.
Type: MySQL/Aurora
Source: Security Group rds-slave-outbound-rule
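If you manage your Security Groups from the command line instead of the console, the same inbound rule could be added roughly like this (sg-aaaa1111 and sg-bbbb2222 are placeholders for the IDs of ec2-rds-1 and rds-slave-outbound-rule):
# allow MySQL traffic into ec2-rds-1 from the Security Group attached to the RDS slave
aws ec2 authorize-security-group-ingress \
  --group-id sg-aaaa1111 \
  --protocol tcp \
  --port 3306 \
  --source-group sg-bbbb2222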
6. Import initial dump into replica
Make sure the following variable is set in your RDS parameter group. Otherwise, you may receive the following error while importing your dump file into RDS.
Access denied; you need (at least one of) the SUPER or SET_USER_ID privilege(s) for this operation
log_bin_trust_function_creators=1
Import dump to replica
We are once again utilizing the pv command to monitor the status and progress of the import. In our test, the import of a 65 GB dump file took 3 h 35 min.
pv /opt/initialDump.sql | mysql -u admin -h ${rds_public_dns_address} -f -p ${db_name}
-f force: Don't stop at the first error but rather log it and continue the import
-p password: Ask for the password
-h host
-u user
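Once the import has finished, a quick sanity check from the EC2 instance is to confirm that the tables arrived on the replica (again using the placeholders from above):
# list the imported tables on the RDS instance
mysql -u admin -h ${rds_public_dns_address} -p -e "SHOW TABLES IN db_name;"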
7. Configure replica
Official MySQL Documentation: Setting up Replicas.
Get dump bin-log and position
In the initialDump.sql file, you can find the values for MASTER_LOG_FILE and MASTER_LOG_POS. These values are essential for setting up the external master in a replication setup.
head -n 50 /opt/initialDump.sql
--
-- Position to start replication or point-in-time recovery from
--
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=157;
MASTER_LOG_FILE: mysql-bin.000001
MASTER_LOG_POS: 157
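Instead of scrolling through the file header manually, you can also extract the relevant line directly; a simple sketch:
# print the commented CHANGE MASTER TO line that contains the binlog coordinates
grep -m 1 "CHANGE MASTER" /opt/initialDump.sql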
Set parameter group variables
Set binlog_format=ROW. This is very important, as the RDS default is binlog_format=MIXED. With MIXED, MySQL usually uses statement-based logging, which has one crucial disadvantage.
Statement-based logging: Only those statements are written to the binary log where the default database (that is, the one selected by USE) is db_name.
That means if you are using a database management tool such as DBeaver and did not specify a default database, you can make changes in the GUI and those changes are not replicated to the slave! This includes editing data in the DBeaver GUI as well as importing CSV files. You either have to make sure your specific database is set as the default database, or, the safest way, just use binlog_format=ROW. It uses more space, but every single row gets replicated.
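After changing the parameter group, it does not hurt to verify the value directly on the RDS instance:
-- should return ROW on the RDS replica
SHOW GLOBAL VARIABLES LIKE 'binlog_format';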
Connect to your RDS instance
mysql -u admin -h ${rds_public_dns_address} -p
Set external Master
The IP address is the internal private IP address of the EC2 master instance, not its public DNS address.
CALL mysql.rds_set_external_master(
'172.30.20.10',
3306,
'replication-user',
'aSecurePassword1',
'mysql-bin.000001',
157,
0
);
Verify Slave status
On your RDS instance execute the following command. You should see information about the Master_Host, Master_User, Master_Port, etc.
SHOW SLAVE STATUS;
8. Start replication
CALL mysql.rds_start_replication;
With the following commands, you can verify the Slave status.
SHOW SLAVE STATUS;
SELECT * FROM performance_schema.replication_applier_status_by_worker;
The two most important fields are Slave_IO_State, which should have the value 'Waiting for master to send event', and Seconds_Behind_Master, which tells you how far behind the Slave is compared to the Master.
Other important commands
Stop replication on an RDS Slave.
CALL mysql.rds_stop_replication;
Reset the external master in RDS.
CALL mysql.rds_reset_external_master;
Reset the external master in MySQL.
RESET SLAVE ALL;
If a replication error occurs but you know it is not a serious one, you can skip the replication error and continue the replication.
CALL mysql.rds_skip_repl_error;
Skip an error manually in MySQL.
SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
9. Conclusion
We have learned how to export an initial mysqldump from a self-hosted MySQL database and import it into RDS. We also know how to configure the Master and the Slave. Moreover, we have learned how to establish the connection between the Master and the Slave and how to effectively monitor the status of the Slave and the replication.
Written by
Dennis Schmidt
Software Engineer at BMW, Oracle Certified Professional Java SE 11 Developer