Channel: XtraDB Cluster – Percona Database Performance Blog

Changing an async slave of a PXC cluster to a new Master


Async and PXC

A common question I get about Percona XtraDB Cluster is whether you can mix it with asynchronous replication, and the answer is yes!  You can pick any node in your cluster and it can be either a slave or a master just like any other regular MySQL standalone server (just be sure to use log-slave-updates in both cases on the node in question!).  Consider this architecture:

[diagram: a two-node PXC cluster (node1, node2) with node3 as an asynchronous slave of node1]
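For reference, the binlog-related settings on the cluster node acting as the async master (node1 above) would look roughly like this; a minimal sketch with illustrative values:

[mysqld]
server_id         = 1        # must be unique across the cluster nodes and the async slave
log_bin           = node1    # any base name; enables binary logging
log_slave_updates = 1        # so writesets applied from other cluster nodes also land in the binlog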

However, there are some caveats to be aware of.  If you slave from a cluster node, there is no built in mechanism to fail that slave over automatically to another master node in your cluster.  You cannot assume that the binary log positions are the same on all nodes in your cluster (even if they start binary logging at the same time), so you can’t issue a CHANGE MASTER without knowing the proper binary log position to start at.

[diagram: node1 fails, so node3 must be re-pointed to another cluster node as its new master]

Until recently, I thought it was not possible to easily do this without scanning both the relay log on the slave and the binary log on the new master and trying to match up the (binary) replication event payloads somehow.  It turns out it’s actually quite easy to do without the old master being available and without pausing writes on your cluster.

Galera Seqnos and Binary log Xids

Given the above scenario, we have a 2 node cluster with node3 async replicating from node1.  Writes are going to node2.  If I stop writes briefly and look at node1's binary log, I can see something interesting:

[root@node1 mysql]# mysqlbinlog node1.000004  | tail
MzkzODgyMjg3NDctMjQwNzE1NDU1MjEtNTkxOTQwMzk0MTYtMjU3MTQxNDkzMzI=
'/*!*/;
# at 442923
#130620 11:07:00 server id 2  end_log_pos 442950 	Xid = 86277
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Notice the Xid field. Galera overwrites this standard binlog field with its last committed seqno:

[root@node1 ~]# mysql -e "show global status like 'wsrep_last_committed'";
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_last_committed | 86277 |
+----------------------+-------+

If I check node2, sure enough its latest binlog has the same Xid, even though it’s in a different file at a different position:

[root@node2 mysql]# mysqlbinlog node2.000002  | tail
MzkzODgyMjg3NDctMjQwNzE1NDU1MjEtNTkxOTQwMzk0MTYtMjU3MTQxNDkzMzI=
'/*!*/;
# at 162077
#130620 11:07:00 server id 2  end_log_pos 162104 	Xid = 86277
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

So, with both nodes doing log-bin (and both with log-slave-updates), even though they are on different log files, they still have the same Xid that is very easy to search for.

But, does our slave (node3) know about these Xids? If we check the latest relay log:

[root@node3 mysql]# mysqlbinlog node3-relay-bin.000002  | tail
MzkzODgyMjg3NDctMjQwNzE1NDU1MjEtNTkxOTQwMzk0MTYtMjU3MTQxNDkzMzI=
'/*!*/;
# at 442942
#130620 11:07:00 server id 2  end_log_pos 442950 	Xid = 86277
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Yes, indeed! We’re assuming the SQL thread on this slave has processed everything in the relay log and, if it has, then we know that the Galera transaction seqno 86277 was the last thing written on the slave. If it hasn’t, you’d have to find the last position applied by the SQL thread using SHOW SLAVE STATUS and find the associated Xid with that position.
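If the SQL thread were behind, a rough sketch of pulling out the matching Xid, using the Relay_Log_File and Relay_Log_Pos values reported by SHOW SLAVE STATUS (the angle-bracket values are placeholders):

# position actually executed by the SQL thread
mysql -e "SHOW SLAVE STATUS\G" | egrep 'Relay_Log_File|Relay_Log_Pos'
# last Xid at or before that position in that relay log file
mysqlbinlog --stop-position=<Relay_Log_Pos> <Relay_Log_File> | grep 'Xid =' | tail -1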

Testing it out

Now, let’s simulate a failure to see if we can use this information to our advantage. I have write load running against node2 and I kill node1:

[root@node2 mysql]# myq_status wsrep
mycluster / node2 / Galera 2.5(r150)
Wsrep    Cluster  Node     Queue   Ops     Bytes     Flow    Conflct PApply        Commit
    time P cnf  #  cmt sta  Up  Dn  Up  Dn   Up   Dn pau snt lcf bfa dst oooe oool wind
11:13:33 P  24  2 Sync T/T   0   0  2k  11 2.8M  258 0.0   0   0   0   9    0    0    1
11:13:34 P  24  2 Sync T/T   0   0  10   0  16K    0 0.0   0   0   0   9    0    0    1
11:13:35 P  24  2 Sync T/T   0   0  14   0  22K    0 0.0   0   0   0   8    0    0    1
11:13:36 P  24  2 Sync T/T   0   0  10   0  16K    0 0.0   0   0   0   8    0    0    1
11:13:37 P  24  2 Sync T/T   0   0   7   0  11K    0 0.0   0   0   0   8    0    0    1
11:13:38 P  24  2 Sync T/T   0   0   5   0 7.5K    0 0.0   0   0   0   8    0    0    1
11:13:39 P  24  2 Sync T/T   0   0   6   0 9.3K    0 0.0   0   0   0   8    0    0    1
11:13:40 P  24  2 Sync T/T   0   0  10   0  15K    0 0.0   0   0   0   8    0    0    1
11:13:41 P  24  2 Sync T/T   0   0   8   0  12K    0 0.0   0   0   0   8    0    0    1
11:13:42 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:43 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:44 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:45 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:47 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:48 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
mycluster / node2 / Galera 2.5(r150)
Wsrep    Cluster  Node     Queue   Ops     Bytes     Flow    Conflct PApply        Commit
    time P cnf  #  cmt sta  Up  Dn  Up  Dn   Up   Dn pau snt lcf bfa dst oooe oool wind
11:13:49 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:50 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:51 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:52 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:53 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:54 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:55 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:56 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:57 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:58 N 615  1 Init F/T   0   0   0   2    0  234 0.0   0   0   0   8    0    0    0
11:13:59 N 615  1 Init F/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:14:00 N 615  1 Init F/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0

The node failure takes a little while. This is because it was a 2 node cluster and we just lost quorum. Note the node goes to the Init state and it is also non-Primary. This also happened to kill our test client. If we check our slave, we can see it is indeed disconnected since node1 went away:

node3 mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Reconnecting after a failed master event read
                  Master_Host: node1
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: node1.000004
          Read_Master_Log_Pos: 3247513
               Relay_Log_File: node3-relay-bin.000002
                Relay_Log_Pos: 3247532
        Relay_Master_Log_File: node1.000004
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
...

Since our remaining node is non-Primary, we need to bootstrap it first and restart our load. (Note: this is only necessary because it was a two-node cluster and we just lost quorum; it is not strictly part of this procedure in general.)

[root@node2 mysql]# mysql -e 'set global wsrep_provider_options="pc.bootstrap=true"'
[root@node2 mysql]# myq_status wsrep
mycluster / node2 / Galera 2.5(r150)
Wsrep    Cluster  Node     Queue   Ops     Bytes     Flow    Conflct PApply        Commit
    time P cnf  #  cmt sta  Up  Dn  Up  Dn   Up   Dn pau snt lcf bfa dst oooe oool wind
11:16:46 P  25  1 Sync T/T   0   0  2k  17 3.0M 1021 0.0   0   0   0   5    0    0    1
11:16:47 P  25  1 Sync T/T   0   0  10   0  16K    0 0.0   0   0   0   5    0    0    1
11:16:48 P  25  1 Sync T/T   0   0  13   0  20K    0 0.0   0   0   0   5    0    0    1
11:16:49 P  25  1 Sync T/T   0   0  10   0  15K    0 0.0   0   0   0   5    0    0    1
11:16:50 P  25  1 Sync T/T   0   0   8   0  11K    0 0.0   0   0   0   5    0    0    1
11:16:51 P  25  1 Sync T/T   0   0  15   0  23K    0 0.0   0   0   0   5    0    0    1
11:16:52 P  25  1 Sync T/T   0   0  11   0  17K    0 0.0   0   0   0   6    0    0    1
11:16:53 P  25  1 Sync T/T   0   0  10   0  15K    0 0.0   0   0   0   6    0    0    1

Ok, so our cluster of 1 is back up and taking writes. But, how can we CHANGE MASTER so node3 replicates from node2? First we need to find node3's last received Xid:

[root@node3 mysql]# mysqlbinlog node3-relay-bin.000002 | tail
MDk5MjkxNDY3MTUtMDkwMzM3NTg2NjgtMTQ0NTI2MDQ5MDItNzgwOTA5MTcyNTc=
'/*!*/;
# at 3247505
#130620 11:13:41 server id 2  end_log_pos 3247513 	Xid = 88085
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

We can tell from SHOW SLAVE STATUS that all of the relay log was processed, so we know the last applied Xid was 88085. Checking node2, we can indeed tell the cluster has progressed further:

[root@node2 mysql]# mysql -e "show global status like 'wsrep_last_committed'"
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_last_committed | 88568 |
+----------------------+-------+

It's easy to grep for node3's last Xid in node2's binlog (if the slave has been disconnected a while, you may have to search a bit more through the binlogs):

[root@node2 mysql]# mysqlbinlog node2.000002 | grep "Xid = 88085"
#130620 11:13:41 server id 2  end_log_pos 2973899 	Xid = 88085

So, on node2, Xid 88085 is in binary log node2.000002, position 2973899.  2973899 is the “end” position, so that should be where we start from on the slave.  Back on node3 we can now issue a correct CHANGE MASTER:

node3 mysql> slave stop;
Query OK, 0 rows affected (0.01 sec)
node3 mysql> change master to master_host='node2', master_log_file='node2.000002', master_log_pos=2973899;
Query OK, 0 rows affected (0.02 sec)
node3 mysql> slave start;
Query OK, 0 rows affected (0.00 sec)
node3 mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: node2
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: node2.000002
          Read_Master_Log_Pos: 7072423
               Relay_Log_File: node3-relay-bin.000002
                Relay_Log_Pos: 4098705
        Relay_Master_Log_File: node2.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
....

Of course, we are still scanning relay logs and binary logs, but matching Xid numbers is a lot easier than trying to match RBR payloads.
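Put together, the whole procedure boils down to a handful of shell steps. This sketch reuses the file names and positions from the example above, and assumes the slave's SQL thread has fully caught up:

# 1. On the slave: find the last applied Xid (assumes the SQL thread has drained the relay log)
mysqlbinlog /var/lib/mysql/node3-relay-bin.000002 | grep 'Xid =' | tail -1
#    ... end_log_pos 3247513   Xid = 88085

# 2. On the new master: locate that Xid in its binary logs and note the end_log_pos
mysqlbinlog /var/lib/mysql/node2.000002 | grep 'Xid = 88085'
#    ... end_log_pos 2973899   Xid = 88085

# 3. Back on the slave: re-point replication at that file and position
mysql -e "STOP SLAVE;
          CHANGE MASTER TO master_host='node2',
                           master_log_file='node2.000002',
                           master_log_pos=2973899;
          START SLAVE;"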

Summary

  • Galera writes the seqno (part of its GTID) into the Xid field of each node's binary log (if enabled)
  • Xid is consistent across all nodes in a cluster, regardless of individual binary log position
  • This Xid field is propagated to slaves and can be matched up in their relay logs
  • Slaves that are themselves binary logging with log-slave-updates will overwrite the Xid, so this isn't a true GTID for async replication in 5.5.



2 new features added to Percona XtraDB Cluster (PXC) since 5.5.31


With the last Percona XtraDB Cluster (PXC) release, two major features were added:

* a new command to bootstrap the cluster was added to the init script
* SST via Xtrabackup now supports Xtrabackup 2.1 features

In this post, I want to explain the benefits of these added features and how to use them.

If you follow the mysqlperformanceblog regularly, you’ve already noticed that there are several ways to start a node and that it’s not always easy to start the node that is supposed to bootstrap the entire cluster.

See:
- http://www.mysqlperformanceblog.com/2013/01/29/how-to-start-a-percona-xtradb-cluster/
- http://www.percona.com/doc/percona-xtradb-cluster/manual/bootstrap.html
- http://www.lefred.be/node/167
- the new undocumented option --wsrep-new-cluster

With the new option, bootstrap-pxc, there is no need to modify my.cnf or add parameters (when supported) to the init script. Now you can simply bootstrap (start the first node that will initiate the cluster) using one of these commands:

/etc/init.d/mysql bootstrap-pxc

or

service mysql bootstrap-pxc

Much easier, isn’t it ?

The second added feature is support for xbstream from Xtrabackup 2.1.

This is useful because it helps speed up SST operations: the backup can be performed in parallel, compressed, and compacted (without copying the secondary indexes).

To configure this, you just need to add two sections to my.cnf: [sst] and [xtrabackup] like this:

[sst]
streamfmt=xbstream

[xtrabackup]
compress
compact
parallel=2
compress-threads=2
rebuild-threads=2

Here is an example comparing the effect of these parameters.

We have one database with 4 InnoDB tables of 480M, and we test Xtrabackup SST with the following settings:

Blue: Xtrabackup with tar (default)
Red: Xtrabackup with xbstream and 2 threads (like the configuration above)
Yellow: Xtrabackup with xbstream and 4 threads

[chart: SST duration on donor and joiner for the three configurations above]

This quick benchmark was run on a small dataset with tables that have no secondary indexes (so no benefit from compact). I will prepare a more detailed benchmark for a future blog post.

We can see that the new settings help to "free" the donor faster, but the process on the joiner takes more time.
So choose the options that best fit your constraints and your workload's requirements.

Remember: to use Xtrabackup as the SST method you need to add this to my.cnf:
wsrep_sst_method=xtrabackup
wsrep_sst_auth=user:password

Important: to use xbstream with compression, you need qpress installed on the system and in the mysql user's path (/usr/bin, for example).
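A quick sanity check before relying on this for SST; the yum line assumes an RPM-based system with the Percona repository configured:

# run on every node that can act as SST donor or joiner
which qpress || yum install -y qpress    # qpress is available from the Percona yum repository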

Also the MySQL datadir must be empty before starting the SST process (see lp:1193240)

If you want to get familiar with these new options, I’ve updated my puppet recipes available on github.

Note:
For rebuild-threads, the default is $(grep -c processor /proc/cpuinfo) and for qpress, the decompression is also done in parallel (with number of qpress instances == $(grep -c processor /proc/cpuinfo)).


Percona XtraDB Cluster/ Galera with Percona Monitoring Plugins


The Percona Monitoring Plugins (PMP) provide some free tools to make it easier to monitor PXC/Galera nodes.  Monitoring broadly falls into two categories: alerting and historical graphing, and the plugins support Nagios and Cacti, respectively, for those purposes.

Graphing

An update to the PMP this summer (thanks to our Remote DBA team for supporting this!) added a Galera-specific host template that includes a variety of Galera-related stats, including:

  • Replication traffic and transaction counts and average trx size
  • Inbound and outbound (Send and Recv) queue sizes
  • Parallelization efficiency
  • Write conflicts (Local Cert Failures and Brute Force Aborts)
  • Cluster size
  • Flow control

You can see examples and descriptions of all the graphs in the manual.

Alerting

There is no Galera-specific Nagios plugin in the PMP yet, but there is a check called pmp-check-mysql-status that can pretty universally check any status variable you like.  We can pretty easily adapt this to check some key action-worthy Galera stats, but I hadn't worked out the details until a customer requested it recently.

Checking for a Primary Cluster

Technically this is a cluster or cluster-partition state for whatever part of the cluster the queried node is a part of.  However, any single node could be disconnected from the rest of the cluster, so checking this on each node should be fine.  We can verify this with this check:

$ /usr/lib64/nagios/plugins/pmp-check-mysql-status -x wsrep_cluster_status -C == -T str -c non-Primary
OK wsrep_cluster_status (str) = Primary | wsrep_cluster_status=Primary;;non-Primary;0;

Local node state

We also want to verify the given node is ‘Synced’ into the cluster and not in some other state:

/usr/lib64/nagios/plugins/pmp-check-mysql-status -x wsrep_local_state_comment -C '!=' -T str -w Synced
OK wsrep_local_state_comment (str) = Synced | wsrep_local_state_comment=Synced;;Synced;0;

Note that we are only  warning when the state is not Synced — this is because it is perfectly valid for a node to be in the Donor/Desynced state.  This warning can alert us to a node in a less-than-ideal state without screaming about it, but you could certainly go critical instead.

Verify the Cluster Size

This is a bit of a sanity check, but we want to know how many nodes are in the cluster and either warn if we’re down a single node or go critical if we’re down more.  For a three node cluster, your check might look like this:

# /usr/lib64/nagios/plugins/pmp-check-mysql-status -x wsrep_cluster_size -C '<=' -w 2 -c 1
OK wsrep_cluster_size = 3 | wsrep_cluster_size=3;2;1;0;

This is OK when we have 3 nodes, warns at 2 nodes and goes critical at 1 node (when we have no redundancy left).   You could certainly adjust thresholds differently depending on your normative cluster size.   This check is likely meaningless unless we’re also in a Primary cluster, so you could set a service dependency on the Primary Cluster check here.

Check for Flow Control

Flow control is really something to keep an eye on in your cluster. We can monitor the recent state of flow control like this:

/usr/lib64/nagios/plugins/pmp-check-mysql-status -x wsrep_flow_control_paused -w 0.1 -c 0.9
OK wsrep_flow_control_paused = 0.000000 | wsrep_flow_control_paused=0.000000;0.1;0.9;0;

This warns when FC exceeds 10% and goes critical after 90%.  This may need some fine tuning, but I believe it’s a general principle that some small amount of FC might be normal, but you want to know when it starts to get more excessive.
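To wire these checks into Nagios you would wrap them in ordinary command and service definitions. Here is a minimal sketch; the host name and thresholds are placeholders, and it assumes the plugin can reach MySQL with credentials from the monitoring user's ~/.my.cnf (the check also accepts -l/-p options):

# commands.cfg -- hypothetical wrappers around pmp-check-mysql-status
define command {
    command_name    check_galera_cluster_status
    command_line    /usr/lib64/nagios/plugins/pmp-check-mysql-status -H $HOSTADDRESS$ -x wsrep_cluster_status -C == -T str -c non-Primary
}
define command {
    command_name    check_galera_flow_control
    command_line    /usr/lib64/nagios/plugins/pmp-check-mysql-status -H $HOSTADDRESS$ -x wsrep_flow_control_paused -w $ARG1$ -c $ARG2$
}

# services.cfg -- one service per PXC node (host name is a placeholder)
define service {
    use                     generic-service
    host_name               pxc-node1
    service_description     Galera cluster is Primary
    check_command           check_galera_cluster_status
}
define service {
    use                     generic-service
    host_name               pxc-node1
    service_description     Galera flow control
    check_command           check_galera_flow_control!0.1!0.9
}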

Conclusion

Alerting with Nagios and Graphing with Cacti tend to work best with per-host checks and graphs, but there are aspects of a PXC cluster that you may want to monitor from a cluster-wide perspective.  However, most of the things that can “go wrong” are easily detectable with per-host checks and you can get by without needing a custom script that is Galera-aware.

I’d also always recommend what I call a “service check” that connects through your VIP or load balancer to ensure that MySQL is available (regardless of underlying cluster state) and can do a query.  As long as that works (proving there is at least 1 Primary cluster node), you can likely sleep through any other cluster event.  :)


Useful MySQL 5.6 features you get for free in PXC 5.6


I get a lot of questions about Percona XtraDB Cluster 5.6 (PXC 5.6), specifically about whether such and such MySQL 5.6 Community Edition feature is in PXC 5.6.  The short answer is: yes, all features in community MySQL 5.6 are in Percona Server 5.6 and, in turn, are in PXC 5.6.  Whether or not a given new feature is useful in PXC 5.6 really depends on how useful it is in general with Galera.

I thought it would be useful to highlight a few features and try to show them working:

Innodb Fulltext Indexes

Yes, FTS works in Innodb in 5.6, so why wouldn't it work in PXC 5.6?  To test this I used the Sakila database, which contains a single table with FULLTEXT.  In the sakila-schema.sql file, it is still designated a MyISAM table:

CREATE TABLE film_text (
  film_id SMALLINT NOT NULL,
  title VARCHAR(255) NOT NULL,
  description TEXT,
  PRIMARY KEY  (film_id),
  FULLTEXT KEY idx_title_description (title,description)
)ENGINE=MyISAM DEFAULT CHARSET=utf8;

I edited that file to change MyISAM to Innodb, loaded the schema and data into my 3 node cluster:

[root@node1 sakila-db]# mysql < sakila-schema.sql
[root@node1 sakila-db]# mysql < sakila-data.sql

and it works seamlessly:

node1 mysql> select title, description, match( title, description) against ('action saga' in natural language mode) as score from sakila.film_text order by score desc limit 5;
+-----------------+-----------------------------------------------------------------------------------------------------------+--------------------+
| title           | description                                                                                               | score              |
+-----------------+-----------------------------------------------------------------------------------------------------------+--------------------+
| FACTORY DRAGON  | A Action-Packed Saga of a Teacher And a Frisbee who must Escape a Lumberjack in The Sahara Desert         | 3.0801234245300293 |
| HIGHBALL POTTER | A Action-Packed Saga of a Husband And a Dog who must Redeem a Database Administrator in The Sahara Desert | 3.0801234245300293 |
| MATRIX SNOWMAN  | A Action-Packed Saga of a Womanizer And a Woman who must Overcome a Student in California                 | 3.0801234245300293 |
| REEF SALUTE     | A Action-Packed Saga of a Teacher And a Lumberjack who must Battle a Dentist in A Baloon                  | 3.0801234245300293 |
| SHANE DARKNESS  | A Action-Packed Saga of a Moose And a Lumberjack who must Find a Woman in Berlin                          | 3.0801234245300293 |
+-----------------+-----------------------------------------------------------------------------------------------------------+--------------------+
5 rows in set (0.00 sec)

Sure enough, I can run this query on any node and it works fine:

node3 mysql> select title, description, match( title, description) against ('action saga' in natural language mode) as score from sakila.film_text order by score desc limit 5;
+-----------------+-----------------------------------------------------------------------------------------------------------+--------------------+
| title           | description                                                                                               | score              |
+-----------------+-----------------------------------------------------------------------------------------------------------+--------------------+
| FACTORY DRAGON  | A Action-Packed Saga of a Teacher And a Frisbee who must Escape a Lumberjack in The Sahara Desert         | 3.0801234245300293 |
| HIGHBALL POTTER | A Action-Packed Saga of a Husband And a Dog who must Redeem a Database Administrator in The Sahara Desert | 3.0801234245300293 |
| MATRIX SNOWMAN  | A Action-Packed Saga of a Womanizer And a Woman who must Overcome a Student in California                 | 3.0801234245300293 |
| REEF SALUTE     | A Action-Packed Saga of a Teacher And a Lumberjack who must Battle a Dentist in A Baloon                  | 3.0801234245300293 |
| SHANE DARKNESS  | A Action-Packed Saga of a Moose And a Lumberjack who must Find a Woman in Berlin                          | 3.0801234245300293 |
+-----------------+-----------------------------------------------------------------------------------------------------------+--------------------+
5 rows in set (0.05 sec)
node3 mysql> show create table sakila.film_text\G
*************************** 1. row ***************************
       Table: film_text
Create Table: CREATE TABLE `film_text` (
  `film_id` smallint(6) NOT NULL,
  `title` varchar(255) NOT NULL,
  `description` text,
  PRIMARY KEY (`film_id`),
  FULLTEXT KEY `idx_title_description` (`title`,`description`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

There might be a few caveats and differences from how FTS works in Innodb vs MyISAM, but it is there.

Minimal replication images

Galera relies heavily on RBR events, but until 5.6 those were entire row copies, even if you only changed a single column in the table. In 5.6 you can change this to send only the updated data using the variable binlog_row_image=minimal.
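A minimal sketch of enabling it; the variable is dynamic, but putting it in my.cnf and doing a rolling restart keeps all sessions and all nodes consistent:

[mysqld]
binlog_row_image = minimal    # log only changed columns plus enough to identify the row

# the variable is also dynamic, so it can be flipped per node without a restart
# (affects new sessions only):  SET GLOBAL binlog_row_image = 'minimal';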

Using a simple sysbench update test for 1 minute, I can determine the baseline size of the replicated data:

node3 mysql> show global status like 'wsrep_received%';
+----------------------+-----------+
| Variable_name        | Value     |
+----------------------+-----------+
| wsrep_received       | 703       |
| wsrep_received_bytes | 151875070 |
+----------------------+-----------+
2 rows in set (0.04 sec)
... test runs for 1 minute...
node3 mysql> show global status like 'wsrep_received%';
+----------------------+-----------+
| Variable_name        | Value     |
+----------------------+-----------+
| wsrep_received       | 38909     |
| wsrep_received_bytes | 167749809 |
+----------------------+-----------+
2 rows in set (0.17 sec)

This results in 62.3 MB of data replicated in this test.

If I set binlog_row_image=minimal on all nodes and do a rolling restart, I can see how this changes:

node3 mysql> show global status like 'wsrep_received%';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_received       | 3     |
| wsrep_received_bytes | 236   |
+----------------------+-------+
2 rows in set (0.07 sec)
... test runs for 1 minute...
node3 mysql> show global status like 'wsrep_received%';
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| wsrep_received       | 34005    |
| wsrep_received_bytes | 14122952 |
+----------------------+----------+
2 rows in set (0.13 sec)

This yields a mere 13.4MB, which is 80% smaller: quite a savings!  This benefit, of course, fully depends on the types of workloads you are doing.

Durable Memcache Cluster

It turns out this feature does not work properly with Galera; see below for an explanation:

5.6 introduces a Memcached interface for Innodb.  This means any standard memcache client can talk to our PXC nodes with the memcache protocol and the data is:

  • Replicated to all nodes
  • Durable across the cluster
  • Highly available
  • Easy to hash memcache clients across all servers for better cache coherency

To set this up, we need to simply load the innodb_memcache schema from the example and restart the daemon to get a listening memcached port:

[root@node1 ~]# mysql < /usr/share/mysql/innodb_memcached_config.sql
[root@node1 ~]# service mysql restart
Shutting down MySQL (Percona XtraDB Cluster)...... SUCCESS!
Starting MySQL (Percona XtraDB Cluster)...... SUCCESS!
[root@node1 ~]# lsof +p`pidof mysqld` | grep LISTEN
mysqld  31961 mysql   11u  IPv4             140592       0t0      TCP *:tram (LISTEN)
mysqld  31961 mysql   55u  IPv4             140639       0t0      TCP *:memcache (LISTEN)
mysqld  31961 mysql   56u  IPv6             140640       0t0      TCP *:memcache (LISTEN)
mysqld  31961 mysql   59u  IPv6             140647       0t0      TCP *:mysql (LISTEN)

This all appears to work and I can fetch the sample AA row from all the nodes with the memcached interface:

node1 mysql> select * from demo_test;
+-----+--------------+------+------+------+
| c1  | c2           | c3   | c4   | c5   |
+-----+--------------+------+------+------+
| AA  | HELLO, HELLO |    8 |    0 |    0 |
+-----+--------------+------+------+------+
[root@node3 ~]# telnet 127.0.0.1 11211
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
get AA
VALUE AA 8 12
HELLO, HELLO
END

However, if I try to update a row, it does not seem to replicate (even if I set innodb_api_enable_binlog):

[root@node3 ~]# telnet 127.0.0.1 11211
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
set DD 0 0 0
STORED
^]
telnet> quit
Connection closed.
node3 mysql> select * from demo_test;
+----+--------------+------+------+------+
| c1 | c2           | c3   | c4   | c5   |
+----+--------------+------+------+------+
| AA | HELLO, HELLO |    8 |    0 |    0 |
| DD |              |    0 |    1 |    0 |
+----+--------------+------+------+------+
2 rows in set (0.00 sec)
node1 mysql> select * from demo_test;
+-----+--------------+------+------+------+
| c1  | c2           | c3   | c4   | c5   |
+-----+--------------+------+------+------+
| AA  | HELLO, HELLO |    8 |    0 |    0 |
+-----+--------------+------+------+------+

So unfortunately the memcached plugin must use some backdoor to Innodb that Galera is unaware of. I’ve filed a bug on the issue, but it’s not clear if there will be an easy solution or if a whole lot of code will be necessary to make this work properly.

In the short-term, however, you can at least read data from all nodes with the memcached plugin as long as data is only written using the standard SQL interface.

Async replication GTID Integration

Async GTIDs were introduced in 5.6 in order to make CHANGE MASTER easier.  You have always been able to use async replication from any cluster node, but now with this new GTID support, it is much easier to failover to another node in the cluster as a new master.

We take one node out of our cluster to be a slave and enable GTID binary logging on the other two by adding these settings:

server-id = ##
log-bin = cluster_log
log-slave-updates
gtid_mode = ON
enforce-gtid-consistency

If I generate some writes on the cluster, I can see GTIDs are working:

node1 mysql> show master status\G
*************************** 1. row ***************************
File: cluster_log.000001
Position: 573556
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: e941e026-ac70-ee1c-6dc9-40f8d3b5db3f:1-1505
1 row in set (0.00 sec)
node2 mysql> show master status\G
*************************** 1. row ***************************
File: cluster_log.000001
Position: 560011
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: e941e026-ac70-ee1c-6dc9-40f8d3b5db3f:1-1505
1 row in set (0.00 sec)

Notice that we’re at GTID 1505 on both nodes, even though the binary log position happens to be different.

I set up my slave to replicate from node1 (.70.2):

node3 mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.70.2
                  Master_User: slave
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...
           Retrieved_Gtid_Set: e941e026-ac70-ee1c-6dc9-40f8d3b5db3f:1-1506
            Executed_Gtid_Set: e941e026-ac70-ee1c-6dc9-40f8d3b5db3f:1-1506
                Auto_Position: 1
1 row in set (0.00 sec)

And it's all caught up.  If I put some load on the cluster, I can easily change to node2 as my master without needing to stop writes:

node3 mysql> stop slave;
Query OK, 0 rows affected (0.09 sec)
node3 mysql> change master to master_host='192.168.70.3', master_auto_position=1;
Query OK, 0 rows affected (0.02 sec)
node3 mysql> start slave;
Query OK, 0 rows affected (0.02 sec)
node3 mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.70.3
                  Master_User: slave
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...
            Executed_Gtid_Set: e941e026-ac70-ee1c-6dc9-40f8d3b5db3f:1-3712
                Auto_Position: 1

So this seems to work pretty well. It does turn out there is a bit of a bug, but it's actually with Xtrabackup — currently the binary logs are not copied during Xtrabackup SST, and this can cause GTID inconsistencies among nodes in the cluster.  I would expect this to get fixed relatively quickly.

Conclusion

MySQL 5.6 introduces a lot of new interesting features that are even more compelling in the PXC/Galera world.  If you want to experiment for yourself, I pushed the Vagrant environment I used to Github at: https://github.com/jayjanssen/pxc_56_features


The use of Iptables ClusterIP target as a load balancer for PXC, PRM, MHA and NDB


Most technologies achieving high availability for MySQL need a load balancer to spread the client connections to a valid database host; even the Tungsten special connector can be seen as a sophisticated load balancer. People often use a hardware load balancer or a software solution like HAProxy. In both cases, in order to avoid having a single point of failure, multiple load balancers must be used. Load balancers have two drawbacks: they increase network latency and/or they add a validation-check load on the database servers. The increased network latency is obvious in the case of standalone load balancers, where you must first connect to the load balancer, which then completes the request by connecting to one of the database servers. Some workloads like reporting/ad-hoc queries are not affected by a small increase in latency, but other workloads like OLTP processing and real-time logging are. Each load balancer must also check regularly whether the database servers are in a sane state, so adding more load balancers increases the idle chatter over the network. In order to reduce these impacts, a very different type of load balancer is needed. Let me introduce the Iptables ClusterIP target.

Normally, as stated by RFC 1812, "Requirements for IP Version 4 Routers," an IP address must be unique on a network and each host must respond only for the IPs it owns. In order to achieve load balancing, the Iptables ClusterIP target doesn't strictly respect the RFC. The principle is simple: each computer in the cluster shares an IP address and MAC address with the other members, but it answers requests only for a given subset of clients, based on the modulo of a hash over network values (source IP and source port by default). The behavior is controlled by an iptables rule and by the content of the kernel file /proc/net/ipt_CLUSTERIP/VIP_ADDRESS. The /proc file just tells the kernel which portion of the traffic it should answer. I don't want to go too deep into the details here since all those things are handled by the Pacemaker resource agent, IPaddr2.
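For illustration, what ends up happening on each node is roughly equivalent to the following; the VIP, interface, MAC address and node numbers are taken from the example later in this post:

# shared VIP: every node installs the same rule, each with its own --local-node value
iptables -I INPUT -d 172.30.212.100 -i eth1 -j CLUSTERIP --new \
         --hashmode sourceip-sourceport \
         --clustermac 01:00:5E:91:18:86 \
         --total-nodes 3 --local-node 1

# the proc file lists the hash buckets this node answers for;
# writing "+N" or "-N" claims or releases responsibility for bucket N
cat /proc/net/ipt_CLUSTERIP/172.30.212.100
echo "+2" > /proc/net/ipt_CLUSTERIP/172.30.212.100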

The IPaddr2 Pacemaker resource agent is commonly used for a VIP, but what is less known is its behavior when defined as part of a clone set. When part of a clone set, the resource agent defines a VIP which uses the Iptables ClusterIP target; the iptables rules and the handling of the proc file are all done automatically. That seems very nice in theory but, until recently, I never succeeded in getting a suitable distribution behavior. When starting the clone set on, let's say, three nodes, it distributes correctly, one instance on each, but if two nodes fail, the clone instances all go to the third node and stay there even after the first two nodes recover. That bugged me for quite a while, but I finally modified the resource agent and found a way to have it work correctly. It also now correctly sets the MAC address, if none is provided, to one in the multicast MAC address range, which starts with "01:00:5E". The new agent, IPaddr3, is available here. Now, let's show what we can achieve with it.

We’ll start from the setup described in my previous post and we’ll modify it. First, download and install the IPaddr3 agent.

root@pacemaker-1:~# wget -O /usr/lib/ocf/resource.d/percona/IPaddr3 https://github.com/percona/percona-pacemaker-agents/raw/master/agents/IPaddr3
root@pacemaker-1:~# chmod u+x /usr/lib/ocf/resource.d/percona/IPaddr3

Repeat these steps on all 3 nodes. Then, we’ll modify the pacemaker configuration like this (I’ll explain below):

node pacemaker-1 \
        attributes standby="off"
node pacemaker-2 \
        attributes standby="off"
node pacemaker-3 \
        attributes standby="off"
primitive p_cluster_vip ocf:percona:IPaddr3 \
        params ip="172.30.212.100" nic="eth1" \
        meta resource-stickiness="0" \
        op monitor interval="10s"
primitive p_mysql_monit ocf:percona:mysql_monitor \
        params reader_attribute="readable_monit" writer_attribute="writable_monit" user="repl_user" password="WhatAPassword" pid="/var/lib/mysql/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" max_slave_lag="5" cluster_type="pxc" \
        op monitor interval="1s" timeout="30s" OCF_CHECK_LEVEL="1"
clone cl_cluster_vip p_cluster_vip \
        meta clone-max="3" clone-node-max="3" globally-unique="true"
clone cl_mysql_monitor p_mysql_monit \
        meta clone-max="3" clone-node-max="1"
location loc-distrib-cluster-vip cl_cluster_vip \
        rule $id="loc-distrib-cluster-vip-rule" -1: p_cluster_vip_clone_count gt 1
location loc-enable-cluster-vip cl_cluster_vip \
        rule $id="loc-enable-cluster-vip-rule" 2: writable_monit eq 1
location loc-no-cluster-vip cl_cluster_vip \
        rule $id="loc-no-cluster-vip-rule" -inf: writable_monit eq 0
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="3" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1384275025" \
        maintenance-mode="off"

First, the VIP primitive is modified to use the new agent, IPaddr3, and we set resource-stickiness="0". Next, we define the cl_cluster_vip clone set using: clone-max="3" to have three instances, clone-node-max="3" to allow up to three instances on the same node, and globally-unique="true" to tell Pacemaker it has to allocate an instance on a node even if there's already one. Finally, there are three location rules needed to get the behavior we want, one using the p_cluster_vip_clone_count attribute and the other two around the writable_monit attribute. Enabling all that gives:

root@pacemaker-1:~# crm_mon -A1
============
Last updated: Tue Jan  7 10:51:38 2014
Last change: Tue Jan  7 10:50:38 2014 via cibadmin on pacemaker-1
Stack: openais
Current DC: pacemaker-2 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
6 Resources configured.
============
Online: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
 Clone Set: cl_cluster_vip [p_cluster_vip] (unique)
     p_cluster_vip:0    (ocf::percona:IPaddr3): Started pacemaker-3
     p_cluster_vip:1    (ocf::percona:IPaddr3): Started pacemaker-1
     p_cluster_vip:2    (ocf::percona:IPaddr3): Started pacemaker-2
 Clone Set: cl_mysql_monitor [p_mysql_monit]
     Started: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
Node Attributes:
* Node pacemaker-1:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1
* Node pacemaker-2:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1
* Node pacemaker-3:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1

and the network configuration is:

root@pacemaker-1:~# iptables -L INPUT -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
CLUSTERIP  all  --  0.0.0.0/0            172.30.212.100       CLUSTERIP hashmode=sourceip-sourceport clustermac=01:00:5E:91:18:86 total_nodes=3 local_node=1 hash_init=0
root@pacemaker-1:~# cat /proc/net/ipt_CLUSTERIP/172.30.212.100
2
root@pacemaker-2:~# cat /proc/net/ipt_CLUSTERIP/172.30.212.100
3
root@pacemaker-3:~# cat /proc/net/ipt_CLUSTERIP/172.30.212.100
1

In order to test the access, you need to query the VIP from a fourth node:

root@pacemaker-4:~# while [ 1 ]; do mysql -h 172.30.212.100 -u repl_user -pWhatAPassword -BN -e "select variable_value from information_schema.global_variables where variable_name like 'hostname';"; sleep 1; done
pacemaker-1
pacemaker-1
pacemaker-2
pacemaker-2
pacemaker-2
pacemaker-3
pacemaker-2
^C

So, all good. Let's now desync pacemaker-1 and pacemaker-2.

root@pacemaker-1:~# mysql -e 'set global wsrep_desync=1;'
root@pacemaker-1:~#
root@pacemaker-2:~# mysql -e 'set global wsrep_desync=1;'
root@pacemaker-2:~#
root@pacemaker-3:~# crm_mon -A1
============
Last updated: Tue Jan  7 10:53:51 2014
Last change: Tue Jan  7 10:50:38 2014 via cibadmin on pacemaker-1
Stack: openais
Current DC: pacemaker-2 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
6 Resources configured.
============
Online: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
 Clone Set: cl_cluster_vip [p_cluster_vip] (unique)
     p_cluster_vip:0    (ocf::percona:IPaddr3): Started pacemaker-3
     p_cluster_vip:1    (ocf::percona:IPaddr3): Started pacemaker-3
     p_cluster_vip:2    (ocf::percona:IPaddr3): Started pacemaker-3
 Clone Set: cl_mysql_monitor [p_mysql_monit]
     Started: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
Node Attributes:
* Node pacemaker-1:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 0
    + writable_monit                    : 0
* Node pacemaker-2:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 0
    + writable_monit                    : 0
* Node pacemaker-3:
    + p_cluster_vip_clone_count         : 3
    + readable_monit                    : 1
    + writable_monit                    : 1
root@pacemaker-3:~# cat /proc/net/ipt_CLUSTERIP/172.30.212.100
1,2,3
root@pacemaker-4:~# while [ 1 ]; do mysql -h 172.30.212.100 -u repl_user -pWhatAPassword -BN -e "select variable_value from information_schema.global_variables where variable_name like 'hostname';"; sleep 1; done
pacemaker-3
pacemaker-3
pacemaker-3
pacemaker-3
pacemaker-3
pacemaker-3

Now, if pacemaker-1 and pacemaker-2 are back in sync, we have the desired distribution:

root@pacemaker-1:~# mysql -e 'set global wsrep_desync=0;'
root@pacemaker-1:~#
root@pacemaker-2:~# mysql -e 'set global wsrep_desync=0;'
root@pacemaker-2:~#
root@pacemaker-3:~# crm_mon -A1
============
Last updated: Tue Jan  7 10:58:40 2014
Last change: Tue Jan  7 10:50:38 2014 via cibadmin on pacemaker-1
Stack: openais
Current DC: pacemaker-2 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
6 Resources configured.
============
Online: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
 Clone Set: cl_cluster_vip [p_cluster_vip] (unique)
     p_cluster_vip:0    (ocf::percona:IPaddr3): Started pacemaker-3
     p_cluster_vip:1    (ocf::percona:IPaddr3): Started pacemaker-1
     p_cluster_vip:2    (ocf::percona:IPaddr3): Started pacemaker-2
 Clone Set: cl_mysql_monitor [p_mysql_monit]
     Started: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
Node Attributes:
* Node pacemaker-1:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1
* Node pacemaker-2:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1
* Node pacemaker-3:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1

All the clone instances redistributed on all nodes as we wanted.

As a conclusion, Pacemaker with a clone set of IPaddr3 is a very interesting kind of load balancer, especially if you already have Pacemaker deployed. It introduces almost no latency, it doesn't need any extra hardware, it doesn't increase the database validation load, and it is as highly available as your database is. The only drawback I can see is the case where the inbound traffic is very large: all nodes receive all the traffic and are equally saturated. With database and web-type traffic, the inbound traffic is usually small. This solution also doesn't redistribute connections based on server load the way a regular load balancer can, but that would be fairly easy to implement with something like a server_load attribute and an agent similar to mysql_monitor that checks the server load instead of the database status. In such a case, I suggest using much more than one VIP clone instance per node to have better granularity in load distribution. Finally, the ClusterIP target, although still fully supported, has been deprecated in favor of the Cluster-match target. It is basically the same principle, and I plan to adapt the IPaddr3 agent to Cluster-match in the near future.


ClusterControl for Percona XtraDB Cluster Improves Management and Monitoring


ClusterControl for Percona XtraDB Cluster is now available in three different versions thanks to our partnership with Severalnines. ClusterControl will make it simpler to manage and monitor Percona XtraDB Cluster, MySQL Cluster, MySQL Replication, or MySQL Galera.

I am very excited about our GA release of Percona XtraDB Cluster 5.6 last week. As Vadim described in his blog post announcing the release, we have brought together the benefits of Percona Server 5.6, Percona XtraBackup and Galera 3 to create a drop-in compatible, open source, state-of-the-art high availability MySQL clustering solution. We could not have done it without the close cooperation and help of our partners at Codership. To learn more, join us on February 5th for a webinar entitled “What’s New in Percona XtraDB Cluster 5.6” presented by Jay Janssen.

While these new features in Percona XtraDB Cluster are highly valuable, we regularly receive feedback from users that they would like more tools to make it easier to manage and monitor their MySQL clusters. In response to that feedback, I’m pleased to announce our partnership with Severalnines to make ClusterControl more available to Percona XtraDB Cluster users and, in fact, all of our Support customers who use MySQL clustering solutions.

Percona XtraDB Cluster users now have three choices:

On the last option, Percona ClusterControl is a privately branded version of ClusterControl supplied to us by Severalnines. With a Percona Support contract covering Percona XtraDB Cluster, MySQL Cluster, or MySQL Galera, we provide support for your deployment and use of Percona ClusterControl. Percona has worked directly with Severalnines to ensure full compatibility of ClusterControl Community Edition with Percona XtraDB Cluster. The Percona Support team will provide assistance with the configuration and administration of Percona ClusterControl. We will help you get setup and then we will help you make the most of your new management and monitoring tools for the high availability solution of your choice.

We will also provide support for Severalnines ClusterControl Enterprise Edition when it is purchased from Severalnines or Percona and used with a cluster deployment covered by a Percona Support subscription that includes coverage of Percona XtraDB Cluster, MySQL Galera, or MySQL Cluster. ClusterControl Enterprise Edition is the de-facto management solution for Galera-based clusters and it provides the highest levels of monitoring and management capabilities for Percona XtraDB Cluster users.

Percona ClusterControl will provide our Support customers broad flexibility to deploy, monitor, and manage Percona XtraDB Clusters, supporting on-premise and Amazon Web Services deployments, as well as deployments spread across multiple cloud vendors. Percona ClusterControl can be used to automate the deployment of Percona XtraDB Cluster in cross-region, multi-datacenter setups. Percona ClusterControl offers a full range of management functionality for clusters, including:

  • Node and cluster recovery
  • Configuration management
  • Performance health checks
  • Online backups
  • Upgrades
  • Scale out
  • Cloning of clusters
  • Deploying HAProxy load balancers

Percona ClusterControl can be used with multiple clustering solutions including MySQL Cluster, MySQL Replication, MySQL Galera, MariaDB Galera Cluster, and MongoDB. Percona Support subscribers that elect coverage for these clustering solutions receive Percona ClusterControl with their subscriptions.

Just as we made Percona XtraDB Cluster a much better solution by partnering with Codership, we now expect it will be a much easier to manage and monitor solution thanks to our partnership with Severalnines. Whether you need Percona Support or not, I hope you will take advantage of these powerful management and monitoring solutions which are now available for Percona XtraDB Cluster. To learn more about using ClusterControl to manage Percona XtraDB Cluster, please join us on February 19th for a webinar entitled “Performance Monitoring and Troubleshooting of Percona XtraDB Cluster” presented by Peter Boros.


Doing a rolling upgrade of Percona XtraDB Cluster from 5.5 to 5.6


Overview

Percona XtraDB Cluster 5.6 has been GA for several months now and people are thinking more and more about moving from 5.5 to 5.6. Most people don’t want to upgrade all at once, but would prefer a rolling upgrade to avoid downtime and ensure 5.6 is behaving in a stable fashion before putting all of production on it. The official guide to a rolling upgrade can be found in the PXC 5.6 manual. This blog post will attempt to summarize the basic process.

However, there are a few caveats to trying to do a rolling 5.6 upgrade from 5.5 (a sample 5.6-node configuration reflecting these caveats follows the list):

  1. If you mix Galera 2 and Galera 3 nodes, you must set wsrep_provider_options="socket.checksum=1" on the Galera 3 nodes for backwards compatibility between Galera versions.
  2. You must set some 5.6 settings for 5.5 compatibility with respect to replication:
    1. You can’t enable GTID async replication on the 5.6 nodes
    2. You should use --log-bin-use-v1-row-events on the 5.6 nodes
    3. You should set binlog_checksum=NONE on the 5.6 nodes
  3. You must not SST a 5.5 donor to a 5.6 joiner as the SST script does not handle mysql_upgrade
  4. You should set the 5.6 nodes read_only and not write to them!
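As a rough sketch, a 5.6 node participating in a mixed 5.5/5.6 cluster might carry settings like these in my.cnf; treat this as an illustration of the caveats above rather than a complete configuration:

[mysqld]
# Galera 3 node joining Galera 2 nodes: force the older socket checksum
wsrep_provider_options    = "socket.checksum=1"

# keep binlog events readable by the remaining 5.5 nodes
log_bin_use_v1_row_events = 1
binlog_checksum           = NONE
# do not turn on gtid_mode while 5.5 nodes are still in the cluster

# keep application writes off the 5.6 nodes until cutover
read_only                 = 1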

The basic upgrade flow

The basic upgrade flow is:

  1. For some node(s): upgrade to 5.6, do all the above stuff, put them into a read_only pool
  2. Repeat step 1 as desired
  3. Once your 5.6 pool is sufficiently large, cut over writes to the 5.6 pool (turn off read_only, etc.) and upgrade the rest.

This is, in essence, exactly like upgrading a 5.5 master/slave cluster to 5.6 — you upgrade the slaves first, promote a slave and upgrade the master; we just have more masters to think about.

Once your upgrade is fully on 5.6, you can go back through and remove all of the 5.5 backwards-compatibility settings.

Why can’t I write to the 5.6 nodes?

The heaviest caveat is probably the fact that in a mixed 5.5 / 5.6 cluster, you are not supposed to write to the 5.6 nodes.  Why is that?  Well, the reason goes back to MySQL itself.  PXC/Galera uses standard RBR binlog events from MySQL for replication.   Replication between major MySQL versions is only ever officially supported:

  • across 1 major version (i.e., 5.5 to 5.6, though multiple version hops do often work)
  • from a lower version master to a higher version slave (i.e., 5.5 is your master and 5.6 is your slave, but not the other way around)

This compatibility requirement (which has existed for a very long time in MySQL) works great when you have a single Master replication topology, but true multi-master (multi-writer) has obviously never been considered.

Some alternatives

Does writing to the 5.6 nodes REALLY break things?

The restriction on using a 5.6 master with 5.5 slaves is probably too strict in many cases.  Technically only older-to-newer replication is ever truly supported, but in practice you may be able to run a mixed cluster with writes to all nodes as long as you are careful.  This means (at least) that no columns should be changed to type formats introduced in the newer version while the old version remains active in the cluster.  There might be other issues; I'm not sure, and I cannot say I've tested every possible circumstance.

So, can I truly say I recommend this?  I cannot say that officially, but you may find it works fine.  As long as you acknowledge that something unforeseen may break your cluster and your migration plan, it may be reasonable.  If you decide to explore this option, please test this thoroughly and be willing to accept the consequences of it not working before trying it in production!

Using Async replication to upgrade

Another alternative is rather than trying to mix the clusters and keeping 5.6 nodes read_only, why not just set up the 5.6 cluster as an async slave of your 5.5 cluster and migrate your application to the new cluster when you are ready?  This is practically the same as maintaining a split 5.5/5.6 read_write/read_only cluster, but with less risk and a smaller list of don'ts.  Cutover in this case would be effectively like promoting a 5.6 slave to master, except you would promote the whole 5.6 cluster.

One caveat with this approach might be dealing with replication throughput: async may not be able to keep up replicating your 5.5 cluster writes to a separate 5.6 cluster.  Definitely check out wsrep_preordered to speed things up; it may help.  But realize some busy workloads just may not ever be able to use async into another cluster.

Just take the outage

A final alternative for this post is the idea of simply upgrading the entire cluster to 5.6 all at once during a maintenance window.  I grant that this defeats the point of a rolling upgrade, but it may offer a lot of simplicity in the longer run.

Conclusion

A rolling PXC / Galera upgrade across major MySQL versions is limited by the fact that there is no official support or reason for Oracle to support newer master to older slave.  In practice, it may work much of the time, but these situations should be considered carefully and the risk weighed against all other options.


MySQL High Availability With Percona XtraDB Cluster (Percona MySQL Training)


I've had the opportunity to train lots of people on Percona XtraDB Cluster (PXC) over the last few years the product has existed.  This has taken the form of phone calls, emails, blog posts, webinars, and consulting engagements. This doesn't count all the time I've spent grilling Codership on how things actually work.  But it has culminated in the PXC tutorial I have given annually at the Percona Live MySQL Conference and Expo in Santa Clara, California for the last two years.  Baron even attended this year and had this to say:

“Jay Janssen’s tutorial on Percona XtraDB Cluster was impressive. I can’t imagine how much time he must have spent preparing for that.”

(Baron: Thanks for this compliment, BTW)  The answer is: I haven’t kept track, but I’ve delivered it 3 times at conferences and it gets better each time.  I started working on it during the summer of 2012 for Percona Live NYC that year and have put in countless hours since making it better.  This year, despite my voice problems, I was really happy with how it came out.

We’ve had lots of requests for repeats of this tutorial and with more detail.  As such, I’ll be doing a 2-day training about MySQL HA with Percona XtraDB Cluster in sunny San Jose, California on July 16th -17th 2014.  This Percona MySQL Training session is appropriately titled, “MySQL High Availability With Percona XtraDB Cluster” and you can get more information and also register here.

This course will include an overview of how Percona XtraDB Cluster fits into the MySQL High Availability world, followed by a deep-dive into PXC.  All the content of the tutorial will be covered, but it will be spread out with more details on all the components and important bits of using PXC and Galera.

If you’re unfamiliar with my tutorial — it is intensively hands-on.  This will not be 2 days of lecture: You’ll be setting up and experimenting with a real cluster on the fly during the class. Expect plenty of command-line work and to leave with a much better sense of how to truly operate and use a Percona XtraDB Cluster.

Once again, you can check out the details on our website and register NOW to attend right here.

The post MySQL High Availability With Percona XtraDB Cluster (Percona MySQL Training) appeared first on MySQL Performance Blog.


Semi-Sync replication performance in MySQL 5.7.4 DMR

I was interested to hear about the semi-sync replication improvements in MySQL’s 5.7.4 DMR release and decided to check them out.  I previously blogged about poor semi-sync performance and was pretty disappointed with semi-sync’s performance across WAN distances back then, particularly with many client threads.

The Test

The basic environment of these tests was:

  • AWS EC2 m3.medium instances
  • Master in us-east-1, slave in us-west-1 (~78ms ping RTT)
  • CentOS 6.5
  • innodb_flush_log_at_trx_commit=1
  • sync_binlog=1
  • Semi-sync replication plugin installed and enabled.
  • GTIDs enabled (except on 5.5)
  • sysbench 0.5 update_index.lua test, 60 seconds, 250k table size.
  • MySQL 5.7 was tested with both AFTER_SYNC and AFTER_COMMIT settings for rpl_semi_sync_master_wait_point
  • I tested Percona XtraDB Cluster 5.6 / Galera 3.5 as well, by way of comparison

Without further ado, here are the results I got for a single client thread:

 

These graphs are interactive, so mouse-over for more details. I’m using log scales to better highlight the differences.

The blue bars represent transactions per second (more is better).  The red bars represent average latency per transaction per client (less is better).  Remember these transactions are synchronously being copied across the US before the client can execute another.

The first test is our control: async allows ~273 TPS on a single thread.  Once we introduce synchronicity, we can clearly see that the bulk of the time is that round trip.  Note that MySQL 5.5 and 5.6 come out a bit higher than MySQL 5.7 and Percona XtraDB Cluster, the latter two of which show pretty similar results.

Adding parallelism

This gets more interesting to see if we redo the same tests, but with 32 test threads:

In the MySQL 5.5 and 5.6 tests, we can clearly see nasty serialization.  Neither allows much more throughput than single-threaded sysbench.  I was happy to see, however, that this seems to be dramatically improved in MySQL 5.7. Nice job, Oracle!

The results for AFTER_SYNC and AFTER_COMMIT vary, but AFTER_SYNC is the default and is preferred over AFTER_COMMIT.  The reasoning here is that AFTER_SYNC forces the semi-sync wait BEFORE the transaction is committed on the master.  With AFTER_COMMIT the client still must wait for the semi-sync acknowledgement, but other transactions may see its data on the master BEFORE we confirm that the semi-sync slave has received it.  This is potentially bad because, if the master crashed at that instant, clients may have read data from the master that never made it to a failover slave.  This is a type of ‘phantom read’ and Yoshinori explains it in more detail here.
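
For reference, the wait point is controlled by a single variable on the master.  A minimal sketch of a semi-sync master setup for this kind of test (the plugin and variable names are the standard MySQL ones; timeouts and other tuning are omitted):

INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = ON;
-- Wait for the slave ACK before the commit becomes visible on the master (the 5.7 default):
SET GLOBAL rpl_semi_sync_master_wait_point = AFTER_SYNC;
-- Or reproduce the pre-5.7 behavior:
-- SET GLOBAL rpl_semi_sync_master_wait_point = AFTER_COMMIT;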

What about Percona XtraDB Cluster?

I also want to discuss the Percona XtraDB Cluster results: Galera here is somewhat slower than MySQL 5.7 semi-sync.  There may be some enhancements to Galera that can be made (competition is a good thing), but there are still some significant differences between the two:

  • Galera allows for writing on any and all nodes, semi-sync does not
  • Galera introduces the certification process to check for conflicts, Semi-sync does not
  • Galera does not use 2-phase commit, and transactions are not committed synchronously anywhere except on the node originating the transaction.  So, it is similar to Semi-sync in this way.
  • I ran the Galera tests with no log-bin (Galera does not require it)
  • I ran the Galera tests with innodb_flush_log_at_trx_commit=1
  • I set the fc_limit on the second node really high to eliminate flow control as a bottleneck (see the sketch just after this list).  In a live cluster, a sensible flow control limit would typically be needed.
  • Galera provides parallel slave threads for faster apply, but it doesn’t matter here because I set the fc_limit so high
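
For reference, a common way to raise that limit at runtime is through the Galera provider options (gcs.fc_limit); the value below is deliberately huge and only appropriate for a benchmark like this:

-- On the second (applying) node: effectively disable flow control for the test
SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=500000';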

TL;DR

Semi-sync in MySQL 5.7 looks like a great improvement.  Any form of synchronicity is always going to be expensive, particularly over 10s and 100s of milliseconds of latency.  With MySQL 5.7, I’d be much more apt to recommend semi-sync as an option than in previous releases.  Thanks to Oracle for investing here.

The post Semi-Sync replication performance in MySQL 5.7.4 DMR appeared first on MySQL Performance Blog.

Changing an async slave of a PXC cluster to a new Master

Async and PXC

A common question I get about Percona XtraDB Cluster is if you can mix it with asynchronous replication, and the answer is yes!  You can pick any node in your cluster and it can be either a slave or a master just like any other regular MySQL standalone server (Just be sure to use log-slave-updates in both cases on the node in question!).  Consider this architecture:

Canvas 1

However, there are some caveats to be aware of.  If you slave from a cluster node, there is no built in mechanism to fail that slave over automatically to another master node in your cluster.  You cannot assume that the binary log positions are the same on all nodes in your cluster (even if they start binary logging at the same time), so you can’t issue a CHANGE MASTER without knowing the proper binary log position to start at.

Canvas 2

Until recently, I thought it was not possible to easily do this without scanning both the relay log on the slave and the binary log on the new master and trying to match up the (binary) replication event payloads somehow.  It turns out it’s actually quite easy to do without the old master being available and without pausing writes on your cluster.

Galera Seqnos and Binary log Xids

Given the above scenario, we have a 2 node cluster with node3 async replicating from node1.  Writes are going to node2.  If I stop writes briefly and look at node1’s binary log, I can see something interesting:

[root@node1 mysql]# mysqlbinlog node1.000004  | tail
MzkzODgyMjg3NDctMjQwNzE1NDU1MjEtNTkxOTQwMzk0MTYtMjU3MTQxNDkzMzI=
'/*!*/;
# at 442923
#130620 11:07:00 server id 2  end_log_pos 442950 	Xid = 86277
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Notice the Xid field. Galera repurposes this standard binlog field to store its last committed seqno:

[root@node1 ~]# mysql -e "show global status like 'wsrep_last_committed'";
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_last_committed | 86277 |
+----------------------+-------+

If I check node2, sure enough its latest binlog has the same Xid, even though it’s in a different file at a different position:

[root@node2 mysql]# mysqlbinlog node2.000002  | tail
MzkzODgyMjg3NDctMjQwNzE1NDU1MjEtNTkxOTQwMzk0MTYtMjU3MTQxNDkzMzI=
'/*!*/;
# at 162077
#130620 11:07:00 server id 2  end_log_pos 162104 	Xid = 86277
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

So, with both nodes doing log-bin (and both with log-slave-updates), even though they are on different log files, they still have the same Xid that is very easy to search for.

But, does our slave (node3) know about these Xids? If we check the latest relay log:

[root@node3 mysql]# mysqlbinlog node3-relay-bin.000002  | tail
MzkzODgyMjg3NDctMjQwNzE1NDU1MjEtNTkxOTQwMzk0MTYtMjU3MTQxNDkzMzI=
'/*!*/;
# at 442942
#130620 11:07:00 server id 2  end_log_pos 442950 	Xid = 86277
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Yes, indeed! We’re assuming the SQL thread on this slave has processed everything in the relay log and, if it has, then we know that the Galera transaction seqno 86277 was the last thing written on the slave. If it hasn’t, you’d have to find the last position applied by the SQL thread using SHOW SLAVE STATUS and find the associated Xid with that position.
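
In that case, a hypothetical way to pull out that Xid is to replay the relay log only up to the position the SQL thread reached (the position below is just an illustration; use the Relay_Log_Pos value from SHOW SLAVE STATUS):

[root@node3 mysql]# mysqlbinlog --stop-position=3247532 node3-relay-bin.000002 | grep 'Xid =' | tail -1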

Testing it out

Now, let’s simulate a failure to see if we can use this information to our advantage. I have write load running against node2 and I kill node1:

root@node2 mysql]# myq_status wsrep
mycluster / node2 / Galera 2.5(r150)
Wsrep    Cluster  Node     Queue   Ops     Bytes     Flow    Conflct PApply        Commit
    time P cnf  #  cmt sta  Up  Dn  Up  Dn   Up   Dn pau snt lcf bfa dst oooe oool wind
11:13:33 P  24  2 Sync T/T   0   0  2k  11 2.8M  258 0.0   0   0   0   9    0    0    1
11:13:34 P  24  2 Sync T/T   0   0  10   0  16K    0 0.0   0   0   0   9    0    0    1
11:13:35 P  24  2 Sync T/T   0   0  14   0  22K    0 0.0   0   0   0   8    0    0    1
11:13:36 P  24  2 Sync T/T   0   0  10   0  16K    0 0.0   0   0   0   8    0    0    1
11:13:37 P  24  2 Sync T/T   0   0   7   0  11K    0 0.0   0   0   0   8    0    0    1
11:13:38 P  24  2 Sync T/T   0   0   5   0 7.5K    0 0.0   0   0   0   8    0    0    1
11:13:39 P  24  2 Sync T/T   0   0   6   0 9.3K    0 0.0   0   0   0   8    0    0    1
11:13:40 P  24  2 Sync T/T   0   0  10   0  15K    0 0.0   0   0   0   8    0    0    1
11:13:41 P  24  2 Sync T/T   0   0   8   0  12K    0 0.0   0   0   0   8    0    0    1
11:13:42 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:43 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:44 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:45 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:47 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:48 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
mycluster / node2 / Galera 2.5(r150)
Wsrep    Cluster  Node     Queue   Ops     Bytes     Flow    Conflct PApply        Commit
    time P cnf  #  cmt sta  Up  Dn  Up  Dn   Up   Dn pau snt lcf bfa dst oooe oool wind
11:13:49 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:50 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:51 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:52 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:53 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:54 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:55 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:56 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:57 P  24  2 Sync T/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:13:58 N 615  1 Init F/T   0   0   0   2    0  234 0.0   0   0   0   8    0    0    0
11:13:59 N 615  1 Init F/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0
11:14:00 N 615  1 Init F/T   0   0   0   0    0    0 0.0   0   0   0   8    0    0    0

The node failure takes a little while. This is because it was a 2 node cluster and we just lost quorum. Note the node goes to the Init state and it is also non-Primary. This also happened to kill our test client. If we check our slave, we can see it is indeed disconnected since node1 went away:

node3 mysql> show slave statusG
*************************** 1. row ***************************
               Slave_IO_State: Reconnecting after a failed master event read
                  Master_Host: node1
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: node1.000004
          Read_Master_Log_Pos: 3247513
               Relay_Log_File: node3-relay-bin.000002
                Relay_Log_Pos: 3247532
        Relay_Master_Log_File: node1.000004
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
...

Since our remaining node is non-Primary, we need to bootstrap it first and restart our load. (Note: this is only necessary because it was a 2 node cluster and we lost quorum; bootstrapping is not a required part of this procedure in general.)

[root@node2 mysql]# mysql -e 'set global wsrep_provider_options="pc.bootstrap=true"'
[root@node2 mysql]# myq_status wsrep
mycluster / node2 / Galera 2.5(r150)
Wsrep    Cluster  Node     Queue   Ops     Bytes     Flow    Conflct PApply        Commit
    time P cnf  #  cmt sta  Up  Dn  Up  Dn   Up   Dn pau snt lcf bfa dst oooe oool wind
11:16:46 P  25  1 Sync T/T   0   0  2k  17 3.0M 1021 0.0   0   0   0   5    0    0    1
11:16:47 P  25  1 Sync T/T   0   0  10   0  16K    0 0.0   0   0   0   5    0    0    1
11:16:48 P  25  1 Sync T/T   0   0  13   0  20K    0 0.0   0   0   0   5    0    0    1
11:16:49 P  25  1 Sync T/T   0   0  10   0  15K    0 0.0   0   0   0   5    0    0    1
11:16:50 P  25  1 Sync T/T   0   0   8   0  11K    0 0.0   0   0   0   5    0    0    1
11:16:51 P  25  1 Sync T/T   0   0  15   0  23K    0 0.0   0   0   0   5    0    0    1
11:16:52 P  25  1 Sync T/T   0   0  11   0  17K    0 0.0   0   0   0   6    0    0    1
11:16:53 P  25  1 Sync T/T   0   0  10   0  15K    0 0.0   0   0   0   6    0    0    1

Ok, so our cluster of one is back up and taking writes. But how can we CHANGE MASTER so that node3 replicates from node2? First we need to find node3’s last received Xid:

[root@node3 mysql]# mysqlbinlog node3-relay-bin.000002 | tail
MDk5MjkxNDY3MTUtMDkwMzM3NTg2NjgtMTQ0NTI2MDQ5MDItNzgwOTA5MTcyNTc=
'/*!*/;
# at 3247505
#130620 11:13:41 server id 2  end_log_pos 3247513 	Xid = 88085
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

We can tell from SHOW SLAVE STATUS that all of the relay log was processed, so we know the last applied Xid was 88085. Checking node2, we can indeed tell the cluster has progressed further:

[root@node2 mysql]# mysql -e "show global status like 'wsrep_last_committed'"
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_last_committed | 88568 |
+----------------------+-------+

It’s easy to grep for node3’s last Xid in node2’s binlog (if the slave has been disconnected a while, you may have to search a bit more through the binlogs):

[root@node2 mysql]# mysqlbinlog node2.000002 | grep "Xid = 88085"
#130620 11:13:41 server id 2  end_log_pos 2973899 	Xid = 88085

So, on node2, Xid 88085 is in binary log node2.000002, position 2973899.  2973899 is the “end” position, so that should be where we start from on the slave.  Back on node3 we can now issue a correct CHANGE MASTER:

node3 mysql> slave stop;
Query OK, 0 rows affected (0.01 sec)
node3 mysql> change master to master_host='node2', master_log_file='node2.000002', master_log_pos=2973899;
Query OK, 0 rows affected (0.02 sec)
node3 mysql> slave start;
Query OK, 0 rows affected (0.00 sec)
node3 mysql> show slave statusG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: node2
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: node2.000002
          Read_Master_Log_Pos: 7072423
               Relay_Log_File: node3-relay-bin.000002
                Relay_Log_Pos: 4098705
        Relay_Master_Log_File: node2.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
....

Of course, we are still scanning relay logs and binary logs, but matching Xid numbers is a lot easier than trying to match RBR payloads.

Summary

  • Galera writes the seqno (part of its GTID) to a node’s binary logs (if enabled) in the Xid field
  • Xid is consistent across all nodes in a cluster, regardless of individual binary log position
  • This Xid field is propagated to slaves and can be matched up in their relay logs
  • Slaves that are also binary logging with log-slave-updates will overwrite the Xid, so this isn’t a true GTID for async replication in 5.5.

The post Changing an async slave of a PXC cluster to a new Master appeared first on MySQL Performance Blog.

2 new features added to Percona XtraDB Cluster (PXC) since 5.5.31

With the last Percona XtraDB Cluster (PXC) release, two major features were added:

* a new command to bootstrap the cluster was added to the init script
* SST via Xtrabackup now supports Xtrabackup 2.1 features

In this post, I want to explain the benefits of these added features and how to use them.

If you follow the mysqlperformanceblog regularly, you’ve already noticed that there are several ways to start a node and that it’s not always easy to start the node that is supposed to bootstrap the entire cluster.

See:
http://www.percona.com/blog/2013/01/29/how-to-start-a-percona-xtradb-cluster/
http://www.percona.com/doc/percona-xtradb-cluster/manual/bootstrap.html
http://www.lefred.be/node/167
and the new undocumented option --wsrep-new-cluster

With the new option, bootstrap-pxc, there is no need to modify my.cnf or add parameters (when supported) to the init script. Now you can simply bootstrap (start the first node that will initiate the cluster) using one of these commands:

/etc/init.d/mysql bootstrap-pxc

or

service mysql bootstrap-pxc

Much easier, isn’t it?

The second added feature is support for xbstream from Xtrabackup 2.1.

This is useful because it helps speed up SST operations: the backup can be performed in parallel, compressed, and compacted (skipping the copy of secondary indexes).

To configure this, you just need to add two sections to my.cnf: [sst] and [xtrabackup] like this:

[sst]
streamfmt=xbstream

[xtrabackup]
compress
compact
parallel=2
compress-threads=2
rebuild-threads=2

Here is an example comparing the effect of these parameters.

We have one database with 4 InnoDB tables of 480M, and we test Xtrabackup SST with the following settings:

Blue: Xtrabackup with tar (default)
Red: Xtrabackup with xbstream and 2 threads (like the configuration above)
Yellow: Xtrabackup with xbstream and 4 threads

SST_XBSTREAM

This quick benchmark was performed on a small dataset with tables that have no secondary indexes (so no benefit from compact). I will prepare a more detailed benchmark for a future blog post.

But we can see that the new settings help to “free” the donor faster, while the process on the joiner takes more time.
So it depends on your constraints: choose the options that best fit your workload’s requirements.

Remember: to be able to use Xtrabackup as the SST method you need to add in my.cnf:
wsrep_sst_method=xtrabackup
wsrep_sst_auth=user:password

Important: to be able to use xbstream with compression, you need to have qpress installed on the system and in the mysql user’s path (like /usr/bin for example).
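
A quick sanity check, and one way to install it (qpress is the package name in the Percona yum repository; adapt to your platform):

# Verify qpress is installed and on the PATH; install it if not
which qpress || yum install qpress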

Also the MySQL datadir must be empty before starting the SST process (see lp:1193240)

If you want to get familiar with these new options, I’ve updated my puppet recipes available on github.

Note:
For rebuild-threads, the default is $(grep -c processor /proc/cpuinfo) and for qpress, the decompression is also done in parallel (with number of qpress instances == $(grep -c processor /proc/cpuinfo)).

The post 2 new features added to Percona XtraDB Cluster (PXC) since 5.5.31 appeared first on MySQL Performance Blog.

Percona XtraDB Cluster/ Galera with Percona Monitoring Plugins

The Percona Monitoring Plugins (PMP) provide some free tools to make it easier to monitor PXC/Galera nodes.  Monitoring broadly falls into two categories: alerting and historical graphing, and the plugins support Nagios and Cacti, respectively, for those purposes.

Graphing

An update to the PMP this summer (thanks to our Remote DBA team for supporting this!) added a Galera-specific host template that includes a variety of Galera-related stats, including:

  • Replication traffic and transaction counts and average trx size
  • Inbound and outbound (Send and Recv) queue sizes
  • Parallelization efficiency
  • Write conflicts (Local Cert Failures and Brute Force Aborts)
  • Cluster size
  • Flow control

You can see examples and descriptions of all the graphs in the manual.

Alerting

There is no Galera-specific Nagios plugin in the PMP yet, but there is a check called pmp-check-mysql-status that can check pretty much any status variable you like.  We can fairly easily adapt it to check some key, action-worthy Galera stats, but I hadn’t worked out the details until a customer requested it recently.

Checking for a Primary Cluster

Technically this is a cluster or cluster-partition state for whatever part of the cluster the queried node is a part of.  However, any single node could be disconnected from the rest of the cluster, so checking this on each node should be fine.  We can verify this with this check:

$ /usr/lib64/nagios/plugins/pmp-check-mysql-status -x wsrep_cluster_status -C == -T str -c non-Primary
OK wsrep_cluster_status (str) = Primary | wsrep_cluster_status=Primary;;non-Primary;0;

Local node state

We also want to verify the given node is ‘Synced’ into the cluster and not in some other state:

/usr/lib64/nagios/plugins/pmp-check-mysql-status -x wsrep_local_state_comment -C '!=' -T str -w Synced
OK wsrep_local_state_comment (str) = Synced | wsrep_local_state_comment=Synced;;Synced;0;

Note that we only warn when the state is not Synced; this is because it is perfectly valid for a node to be in the Donor/Desynced state.  This warning can alert us to a node in a less-than-ideal state without screaming about it, but you could certainly go critical instead.

Verify the Cluster Size

This is a bit of a sanity check, but we want to know how many nodes are in the cluster and either warn if we’re down a single node or go critical if we’re down more.  For a three node cluster, your check might look like this:

# /usr/lib64/nagios/plugins/pmp-check-mysql-status -x wsrep_cluster_size -C '<=' -w 2 -c 1
OK wsrep_cluster_size = 3 | wsrep_cluster_size=3;2;1;0;

This is OK when we have 3 nodes, warns at 2 nodes and goes critical at 1 node (when we have no redundancy left).  You could certainly adjust the thresholds depending on your normal cluster size.  This check is likely meaningless unless we’re also in a Primary cluster, so you could set a service dependency on the Primary Cluster check here.
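
If you do that in Nagios, a hypothetical service dependency might look like this (the host and service names are placeholders for whatever you called the checks above):

define servicedependency{
    host_name                       node1
    service_description             PXC Primary Cluster
    dependent_host_name             node1
    dependent_service_description   PXC Cluster Size
    execution_failure_criteria      w,u,c
    notification_failure_criteria   w,u,c
}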

Check for Flow Control

Flow control is really something to keep an eye on in your cluster. We can monitor the recent state of flow control like this:

/usr/lib64/nagios/plugins/pmp-check-mysql-status -x wsrep_flow_control_paused -w 0.1 -c 0.9
OK wsrep_flow_control_paused = 0.000000 | wsrep_flow_control_paused=0.000000;0.1;0.9;0;

This warns when FC exceeds 10% and goes critical after 90%.  This may need some fine-tuning, but as a general principle some small amount of FC might be normal; you want to know when it starts to become excessive.

Conclusion

Alerting with Nagios and graphing with Cacti tend to work best with per-host checks and graphs, but there are aspects of a PXC cluster that you may want to monitor from a cluster-wide perspective.  However, most of the things that can “go wrong” are easily detectable with per-host checks, and you can get by without needing a custom script that is Galera-aware.

I’d also always recommend what I call a “service check” that connects through your VIP or load balancer to ensure that MySQL is available (regardless of underlying cluster state) and can do a query.  As long as that works (proving there is at least 1 Primary cluster node), you can likely sleep through any other cluster event.  :)
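
For that service check, the standard Nagios check_mysql plugin (or even a scripted SELECT) pointed at the VIP is usually enough.  A minimal sketch, where the host, user, password and database are placeholders for your own environment:

/usr/lib64/nagios/plugins/check_mysql -H cluster-vip.example.com -u nagios -pNagiosPassword -d test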

The post Percona XtraDB Cluster/ Galera with Percona Monitoring Plugins appeared first on MySQL Performance Blog.

Useful MySQL 5.6 features you get for free in PXC 5.6

I get a lot of questions about Percona XtraDB Cluster 5.6 (PXC 5.6), specifically about whether such and such MySQL 5.6 Community Edition feature is in PXC 5.6.  The short answer is: yes, all features in community MySQL 5.6 are in Percona Server 5.6 and, in turn, are in PXC 5.6.  Whether a given feature is actually useful in PXC 5.6 really depends on how useful it is in general with Galera.

I thought it would be useful to highlight a few features and try to show them working:

Innodb Fulltext Indexes

Yes, FTS works in Innodb in 5.6, so why wouldn’t it work in PXC 5.6?  To test this I used the Sakila database, which contains a single table with a FULLTEXT index.  In the sakila-schema.sql file, it is still designated a MyISAM table:

CREATE TABLE film_text (
  film_id SMALLINT NOT NULL,
  title VARCHAR(255) NOT NULL,
  description TEXT,
  PRIMARY KEY  (film_id),
  FULLTEXT KEY idx_title_description (title,description)
)ENGINE=MyISAM DEFAULT CHARSET=utf8;

I edited that file to change MyISAM to Innodb, loaded the schema and data into my 3 node cluster:

[root@node1 sakila-db]# mysql < sakila-schema.sql
[root@node1 sakila-db]# mysql < sakila-data.sql

and it works seamlessly:

node1 mysql> select title, description, match( title, description) against ('action saga' in natural language mode) as score from sakila.film_text order by score desc limit 5;
+-----------------+------------------------------------------------------------------------------------------------------------+--------------------+
| title           | description                                                                                                | score              |
+-----------------+------------------------------------------------------------------------------------------------------------+--------------------+
| FACTORY DRAGON  | A Action-Packed Saga of a Teacher And a Frisbee who must Escape a Lumberjack in The Sahara Desert         | 3.0801234245300293 |
| HIGHBALL POTTER | A Action-Packed Saga of a Husband And a Dog who must Redeem a Database Administrator in The Sahara Desert | 3.0801234245300293 |
| MATRIX SNOWMAN  | A Action-Packed Saga of a Womanizer And a Woman who must Overcome a Student in California                 | 3.0801234245300293 |
| REEF SALUTE     | A Action-Packed Saga of a Teacher And a Lumberjack who must Battle a Dentist in A Baloon                  | 3.0801234245300293 |
| SHANE DARKNESS  | A Action-Packed Saga of a Moose And a Lumberjack who must Find a Woman in Berlin                          | 3.0801234245300293 |
+-----------------+------------------------------------------------------------------------------------------------------------+--------------------+
5 rows in set (0.00 sec)

Sure enough, I can run this query on any node and it works fine:

node3 mysql> select title, description, match( title, description) against ('action saga' in natural language mode) as score from sakila.film_text order by score desc limit 5;
+-----------------+-----------------------------------------------------------------------------------------------------------+--------------------+
| title           | description                                                                                               | score              |
+-----------------+-----------------------------------------------------------------------------------------------------------+--------------------+
| FACTORY DRAGON  | A Action-Packed Saga of a Teacher And a Frisbee who must Escape a Lumberjack in The Sahara Desert         | 3.0801234245300293 |
| HIGHBALL POTTER | A Action-Packed Saga of a Husband And a Dog who must Redeem a Database Administrator in The Sahara Desert | 3.0801234245300293 |
| MATRIX SNOWMAN  | A Action-Packed Saga of a Womanizer And a Woman who must Overcome a Student in California                 | 3.0801234245300293 |
| REEF SALUTE     | A Action-Packed Saga of a Teacher And a Lumberjack who must Battle a Dentist in A Baloon                  | 3.0801234245300293 |
| SHANE DARKNESS  | A Action-Packed Saga of a Moose And a Lumberjack who must Find a Woman in Berlin                          | 3.0801234245300293 |
+-----------------+-----------------------------------------------------------------------------------------------------------+--------------------+
5 rows in set (0.05 sec)
node3 mysql> show create table sakila.film_textG
*************************** 1. row ***************************
       Table: film_text
Create Table: CREATE TABLE `film_text` (
  `film_id` smallint(6) NOT NULL,
  `title` varchar(255) NOT NULL,
  `description` text,
  PRIMARY KEY (`film_id`),
  FULLTEXT KEY `idx_title_description` (`title`,`description`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

There might be a few caveats and differences in how FTS works in Innodb vs MyISAM, but it is there.

Minimal replication images

Galera relies heavily on RBR events, but until 5.6 those were entire row copies, even if you only changed a single column in the table. In 5.6 you can change this to send only the updated data using the variable binlog_row_image=minimal.
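
A minimal sketch of enabling it; for a cluster you would normally put this in my.cnf on every node and do a rolling restart, as described below (binlog_row_image is the standard 5.6 variable; the rest is just context):

[mysqld]
binlog_format    = ROW
binlog_row_image = minimal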

Using a simple sysbench update test for 1 minute, I can determine the baseline size of the replicated data:

node3 mysql> show global status like 'wsrep_received%';
+----------------------+-----------+
| Variable_name        | Value     |
+----------------------+-----------+
| wsrep_received       | 703       |
| wsrep_received_bytes | 151875070 |
+----------------------+-----------+
2 rows in set (0.04 sec)
... test runs for 1 minute...
node3 mysql> show global status like 'wsrep_received%';
+----------------------+-----------+
| Variable_name        | Value     |
+----------------------+-----------+
| wsrep_received       | 38909     |
| wsrep_received_bytes | 167749809 |
+----------------------+-----------+
2 rows in set (0.17 sec)

This results in 62.3 MB of data replicated in this test.

If I set binlog_row_image=minimal on all nodes and do a rolling restart, I can see how this changes:

node3 mysql> show global status like 'wsrep_received%';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_received       | 3     |
| wsrep_received_bytes | 236   |
+----------------------+-------+
2 rows in set (0.07 sec)
... test runs for 1 minute...
node3 mysql> show global status like 'wsrep_received%';
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| wsrep_received       | 34005    |
| wsrep_received_bytes | 14122952 |
+----------------------+----------+
2 rows in set (0.13 sec)

This yields a mere 13.4MB; that’s about 80% smaller, quite a savings!  This benefit, of course, fully depends on the types of workloads you are doing.

Durable Memcache Cluster

It turns out this feature does not work properly with Galera; see below for an explanation:

5.6 introduces a memcached interface for Innodb.  This means any standard memcache client can talk to our PXC nodes with the memcache protocol and the data is:

  • Replicated to all nodes
  • Durable across the cluster
  • Highly available
  • Easy to hash memcache clients across all servers for better cache coherency

To set this up, we need to simply load the innodb_memcache schema from the example and restart the daemon to get a listening memcached port:

[root@node1 ~]# mysql < /usr/share/mysql/innodb_memcached_config.sql
[root@node1 ~]# service mysql restart
Shutting down MySQL (Percona XtraDB Cluster)...... SUCCESS!
Starting MySQL (Percona XtraDB Cluster)...... SUCCESS!
[root@node1 ~]# lsof +p`pidof mysqld` | grep LISTEN
mysqld  31961 mysql   11u  IPv4             140592       0t0      TCP *:tram (LISTEN)
mysqld  31961 mysql   55u  IPv4             140639       0t0      TCP *:memcache (LISTEN)
mysqld  31961 mysql   56u  IPv6             140640       0t0      TCP *:memcache (LISTEN)
mysqld  31961 mysql   59u  IPv6             140647       0t0      TCP *:mysql (LISTEN)

This all appears to work and I can fetch the sample AA row from all the nodes with the memcached interface:

node1 mysql> select * from demo_test;
+-----+--------------+------+------+------+
| c1  | c2           | c3   | c4   | c5   |
+-----+--------------+------+------+------+
| AA  | HELLO, HELLO |    8 |    0 |    0 |
+-----+--------------+------+------+------+
[root@node3 ~]# telnet 127.0.0.1 11211
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
get AA
VALUE AA 8 12
HELLO, HELLO
END

However, if I try to update a row, it does not seem to replicate (even if I set innodb_api_enable_binlog):

[root@node3 ~]# telnet 127.0.0.1 11211
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
set DD 0 0 0
STORED
^]
telnet> quit
Connection closed.
node3 mysql> select * from demo_test;
+----+--------------+------+------+------+
| c1 | c2           | c3   | c4   | c5   |
+----+--------------+------+------+------+
| AA | HELLO, HELLO |    8 |    0 |    0 |
| DD |              |    0 |    1 |    0 |
+----+--------------+------+------+------+
2 rows in set (0.00 sec)
node1 mysql> select * from demo_test;
+-----+--------------+------+------+------+
| c1  | c2           | c3   | c4   | c5   |
+-----+--------------+------+------+------+
| AA  | HELLO, HELLO |    8 |    0 |    0 |
+-----+--------------+------+------+------+

So unfortunately the memcached plugin must use some backdoor to Innodb that Galera is unaware of. I’ve filed a bug on the issue, but it’s not clear if there will be an easy solution or if a whole lot of code will be necessary to make this work properly.

In the short-term, however, you can at least read data from all nodes with the memcached plugin as long as data is only written using the standard SQL interface.

Async replication GTID Integration

Async GTIDs were introduced in 5.6 in order to make CHANGE MASTER easier.  You have always been able to use async replication from any cluster node, but now with this new GTID support, it is much easier to fail over to another node in the cluster as a new master.

If we take one node out of our cluster to be a slave and enable GTID binary logging on the other two by adding these settings:

server-id = ##
log-bin = cluster_log
log-slave-updates
gtid_mode = ON
enforce-gtid-consistency

If I generate some writes on the cluster, I can see GTIDs are working:

node1 mysql> show master statusG
*************************** 1. row ***************************
File: cluster_log.000001
Position: 573556
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: e941e026-ac70-ee1c-6dc9-40f8d3b5db3f:1-1505
1 row in set (0.00 sec)
node2 mysql> show master statusG
*************************** 1. row ***************************
File: cluster_log.000001
Position: 560011
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: e941e026-ac70-ee1c-6dc9-40f8d3b5db3f:1-1505
1 row in set (0.00 sec)

Notice that we’re at GTID 1505 on both nodes, even though the binary log position happens to be different.

I set up my slave to replicate from node1 (.70.2):

node3 mysql> show slave statusG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.70.2
                  Master_User: slave
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...
           Retrieved_Gtid_Set: e941e026-ac70-ee1c-6dc9-40f8d3b5db3f:1-1506
            Executed_Gtid_Set: e941e026-ac70-ee1c-6dc9-40f8d3b5db3f:1-1506
                Auto_Position: 1
1 row in set (0.00 sec)

And it’s all caught up.  If I put some load on the cluster, I can easily change to node2 as my master without needing to stop writes:

node3 mysql> stop slave;
Query OK, 0 rows affected (0.09 sec)
node3 mysql> change master to master_host='192.168.70.3', master_auto_position=1;
Query OK, 0 rows affected (0.02 sec)
node3 mysql> start slave;
Query OK, 0 rows affected (0.02 sec)
node3 mysql> show slave statusG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.70.3
                  Master_User: slave
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...
            Executed_Gtid_Set: e941e026-ac70-ee1c-6dc9-40f8d3b5db3f:1-3712
                Auto_Position: 1

So this seems to work pretty well. It does turn out there is a bit of a bug, but it’s actually with Xtrabackup: currently the binary logs are not copied in an Xtrabackup SST, and this can cause GTID inconsistencies among nodes in the cluster.  I would expect this to get fixed relatively quickly.

Conclusion

MySQL 5.6 introduces a lot of new interesting features that are even more compelling in the PXC/Galera world.  If you want to experiment for yourself, I pushed the Vagrant environment I used to Github at: https://github.com/jayjanssen/pxc_56_features

The post Useful MySQL 5.6 features you get for free in PXC 5.6 appeared first on MySQL Performance Blog.

The use of Iptables ClusterIP target as a load balancer for PXC, PRM, MHA and NDB

Most technologies achieving high availability for MySQL need a load balancer to spread the client connections to a valid database host; even the Tungsten special connector can be seen as a sophisticated load balancer. People often use a hardware load balancer or a software solution like haproxy. In both cases, in order to avoid having a single point of failure, multiple load balancers must be used. Load balancers have two drawbacks: they increase network latency and/or they add a validation-check load on the database servers. The increased network latency is obvious in the case of standalone load balancers, where you must first connect to the load balancer, which then completes the request by connecting to one of the database servers. Some workloads, like reporting/ad-hoc queries, are not affected by a small increase in latency, but other workloads, like OLTP processing and real-time logging, are. Each load balancer must also check regularly whether the database servers are in a sane state, so adding more load balancers increases the idle chatter over the network. In order to reduce these impacts, a very different type of load balancer is needed: let me introduce the Iptables ClusterIP target.

Normally, as stated by RFC 1812 (Requirements for IP Version 4 Routers), an IP address must be unique on a network and each host must respond only for the IPs it owns. In order to achieve load-balancing behavior, the Iptables ClusterIP target doesn’t strictly respect the RFC. The principle is simple: each computer in the cluster shares an IP address and MAC address with the other members, but it answers only a given subset of the requests, based on the modulo of a network value (sourceIP-sourcePort by default). The behavior is controlled by an iptables rule and by the content of the kernel file /proc/net/ipt_CLUSTERIP/VIP_ADDRESS. The /proc file just tells the kernel which portion of the traffic it should answer. I don’t want to go too deep into the details here since all those things are handled by the Pacemaker resource agent, IPaddr2.
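
For the curious, the rule the agent ends up managing looks roughly like this (a hand-written illustration only, using the VIP, interface and multicast MAC from the setup shown later in this post; the resource agent normally creates all of this for you):

# Shared VIP rule: hash on source IP and port, 3 nodes total, this host is node 1
iptables -I INPUT -i eth1 -d 172.30.212.100 -j CLUSTERIP --new \
         --hashmode sourceip-sourceport \
         --clustermac 01:00:5E:91:18:86 \
         --total-nodes 3 --local-node 1

# Adjust at runtime which hash buckets this host answers for:
echo "+2" > /proc/net/ipt_CLUSTERIP/172.30.212.100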

The IPaddr2 Pacemaker resource agent is commonly used for a VIP, but what is less known is its behavior when defined as part of a clone set. When part of a clone set, the resource agent defines a VIP that uses the Iptables ClusterIP target; the iptables rules and the handling of the proc file are all done automatically. That seems very nice in theory but, until recently, I never succeeded in getting a suitable distribution behavior. When starting the clone set on, let’s say, three nodes, it distributes correctly, one instance on each, but if two nodes fail and then recover, the clone instances all go to the third node and stay there even after the first two nodes are back. That bugged me for quite a while, but I finally modified the resource agent and found a way to have it work correctly. It also now correctly sets the MAC address, if none is provided, to the multicast MAC address range which starts with "01:00:5E". The new agent, IPaddr3, is available here. Now, let’s show what we can achieve with it.

We’ll start from the setup described in my previous post and we’ll modify it. First, download and install the IPaddr3 agent.

root@pacemaker-1:~# wget -O /usr/lib/ocf/resource.d/percona/IPaddr3 https://github.com/percona/percona-pacemaker-agents/raw/master/agents/IPaddr3
root@pacemaker-1:~# chmod u+x /usr/lib/ocf/resource.d/percona/IPaddr3

Repeat these steps on all 3 nodes. Then, we’ll modify the pacemaker configuration like this (I’ll explain below):

node pacemaker-1
        attributes standby="off"
node pacemaker-2
        attributes standby="off"
node pacemaker-3
        attributes standby="off"
primitive p_cluster_vip ocf:percona:IPaddr3
        params ip="172.30.212.100" nic="eth1"
        meta resource-stickiness="0"
        op monitor interval="10s"
primitive p_mysql_monit ocf:percona:mysql_monitor
        params reader_attribute="readable_monit" writer_attribute="writable_monit" user="repl_user" password="WhatAPassword" pid="/var/lib/mysql/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" max_slave_lag="5" cluster_type="pxc"
        op monitor interval="1s" timeout="30s" OCF_CHECK_LEVEL="1"
clone cl_cluster_vip p_cluster_vip
        meta clone-max="3" clone-node-max="3" globally-unique="true"
clone cl_mysql_monitor p_mysql_monit
        meta clone-max="3" clone-node-max="1"
location loc-distrib-cluster-vip cl_cluster_vip
        rule $id="loc-distrib-cluster-vip-rule" -1: p_cluster_vip_clone_count gt 1
location loc-enable-cluster-vip cl_cluster_vip
        rule $id="loc-enable-cluster-vip-rule" 2: writable_monit eq 1
location loc-no-cluster-vip cl_cluster_vip
        rule $id="loc-no-cluster-vip-rule" -inf: writable_monit eq 0
property $id="cib-bootstrap-options"
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff"
        cluster-infrastructure="openais"
        expected-quorum-votes="3"
        stonith-enabled="false"
        no-quorum-policy="ignore"
        last-lrm-refresh="1384275025"
        maintenance-mode="off"

First, the VIP primitive is modified to use the new agent, IPaddr3, and we set resource-stickiness="0". Next, we define the cl_cluster_vip clone set using: clone-max="3" to have three instances, clone-node-max="3" to allow up to three instances on the same node, and globally-unique="true" to tell Pacemaker it has to allocate an instance on a node even if there’s already one. Finally, there are three location rules needed to get the behavior we want: one using the p_cluster_vip_clone_count attribute and the other two around the writable_monit attribute. Enabling all that gives:

root@pacemaker-1:~# crm_mon -A1
============
Last updated: Tue Jan  7 10:51:38 2014
Last change: Tue Jan  7 10:50:38 2014 via cibadmin on pacemaker-1
Stack: openais
Current DC: pacemaker-2 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
6 Resources configured.
============
Online: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
 Clone Set: cl_cluster_vip [p_cluster_vip] (unique)
     p_cluster_vip:0    (ocf::percona:IPaddr3): Started pacemaker-3
     p_cluster_vip:1    (ocf::percona:IPaddr3): Started pacemaker-1
     p_cluster_vip:2    (ocf::percona:IPaddr3): Started pacemaker-2
 Clone Set: cl_mysql_monitor [p_mysql_monit]
     Started: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
Node Attributes:
* Node pacemaker-1:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1
* Node pacemaker-2:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1
* Node pacemaker-3:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1

and the network configuration is:

root@pacemaker-1:~# iptables -L INPUT -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
CLUSTERIP  all  --  0.0.0.0/0            172.30.212.100       CLUSTERIP hashmode=sourceip-sourceport clustermac=01:00:5E:91:18:86 total_nodes=3 local_node=1 hash_init=0
root@pacemaker-1:~# cat /proc/net/ipt_CLUSTERIP/172.30.212.100
2
root@pacemaker-2:~# cat /proc/net/ipt_CLUSTERIP/172.30.212.100
3
root@pacemaker-3:~# cat /proc/net/ipt_CLUSTERIP/172.30.212.100
1

In order to test the access, you need to query the VIP from a fourth node:

root@pacemaker-4:~# while [ 1 ]; do mysql -h 172.30.212.100 -u repl_user -pWhatAPassword -BN -e "select variable_value from information_schema.global_variables where variable_name like 'hostname';"; sleep 1; done
pacemaker-1
pacemaker-1
pacemaker-2
pacemaker-2
pacemaker-2
pacemaker-3
pacemaker-2
^C

So, all good… Let’s now desync pacemaker-1 and pacemaker-2.

root@pacemaker-1:~# mysql -e 'set global wsrep_desync=1;'
root@pacemaker-1:~#
root@pacemaker-2:~# mysql -e 'set global wsrep_desync=1;'
root@pacemaker-2:~#
root@pacemaker-3:~# crm_mon -A1
============
Last updated: Tue Jan  7 10:53:51 2014
Last change: Tue Jan  7 10:50:38 2014 via cibadmin on pacemaker-1
Stack: openais
Current DC: pacemaker-2 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
6 Resources configured.
============
Online: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
 Clone Set: cl_cluster_vip [p_cluster_vip] (unique)
     p_cluster_vip:0    (ocf::percona:IPaddr3): Started pacemaker-3
     p_cluster_vip:1    (ocf::percona:IPaddr3): Started pacemaker-3
     p_cluster_vip:2    (ocf::percona:IPaddr3): Started pacemaker-3
 Clone Set: cl_mysql_monitor [p_mysql_monit]
     Started: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
Node Attributes:
* Node pacemaker-1:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 0
    + writable_monit                    : 0
* Node pacemaker-2:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 0
    + writable_monit                    : 0
* Node pacemaker-3:
    + p_cluster_vip_clone_count         : 3
    + readable_monit                    : 1
    + writable_monit                    : 1
root@pacemaker-3:~# cat /proc/net/ipt_CLUSTERIP/172.30.212.100
1,2,3
root@pacemaker-4:~# while [ 1 ]; do mysql -h 172.30.212.100 -u repl_user -pWhatAPassword -BN -e "select variable_value from information_schema.global_variables where variable_name like 'hostname';"; sleep 1; done
pacemaker-3
pacemaker-3
pacemaker-3
pacemaker-3
pacemaker-3
pacemaker-3

Now, if pacemaker-1 and pacemaker-2 are back in sync, we have the desired distribution:

root@pacemaker-1:~# mysql -e 'set global wsrep_desync=0;'
root@pacemaker-1:~#
root@pacemaker-2:~# mysql -e 'set global wsrep_desync=0;'
root@pacemaker-2:~#
root@pacemaker-3:~# crm_mon -A1
============
Last updated: Tue Jan  7 10:58:40 2014
Last change: Tue Jan  7 10:50:38 2014 via cibadmin on pacemaker-1
Stack: openais
Current DC: pacemaker-2 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
6 Resources configured.
============
Online: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
 Clone Set: cl_cluster_vip [p_cluster_vip] (unique)
     p_cluster_vip:0    (ocf::percona:IPaddr3): Started pacemaker-3
     p_cluster_vip:1    (ocf::percona:IPaddr3): Started pacemaker-1
     p_cluster_vip:2    (ocf::percona:IPaddr3): Started pacemaker-2
 Clone Set: cl_mysql_monitor [p_mysql_monit]
     Started: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
Node Attributes:
* Node pacemaker-1:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1
* Node pacemaker-2:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1
* Node pacemaker-3:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1

All the clone instances are redistributed across the nodes, as we wanted.

As a conclusion, Pacemaker with a clone set of IPaddr3 is a very interesting kind of load balancer, especially if you already have Pacemaker deployed. It introduces almost no latency, it doesn’t need any extra hardware, it doesn’t increase the database validation load, and it is as highly available as your database is. The only drawback I can see is when the inbound traffic is very heavy: in that case, all nodes receive all the traffic and are equally saturated. With database and web-type traffic, the inbound traffic is usually small. This solution also doesn’t redistribute connections based on server load the way a load balancer can, but that would be fairly easy to implement with something like a server_load attribute and an agent similar to mysql_monitor that checks the server load instead of the database status. In such a case, I suggest using much more than one VIP clone instance per node to get better granularity in load distribution. Finally, the ClusterIP target, although still fully supported, has been deprecated in favor of the cluster match. It is basically the same principle, and I plan to adapt the IPaddr3 agent to it in the near future.

The post The use of Iptables ClusterIP target as a load balancer for PXC, PRM, MHA and NDB appeared first on MySQL Performance Blog.

ClusterControl for Percona XtraDB Cluster Improves Management and Monitoring

ClusterControl for Percona XtraDB Cluster is now available in three different versions thanks to our partnership with Severalnines. ClusterControl will make it simpler to manage and monitor Percona XtraDB Cluster, MySQL Cluster, MySQL Replication, or MySQL Galera.

I am very excited about our GA release of Percona XtraDB Cluster 5.6 last week. As Vadim described in his blog post announcing the release, we have brought together the benefits of Percona Server 5.6, Percona XtraBackup and Galera 3 to create a drop-in compatible, open source, state-of-the-art high availability MySQL clustering solution. We could not have done it without the close cooperation and help of our partners at Codership. To learn more, join us on February 5th for a webinar entitled “What’s New in Percona XtraDB Cluster 5.6” presented by Jay Janssen.

While these new features in Percona XtraDB Cluster are highly valuable, we regularly receive feedback from users that they would like more tools to make it easier to manage and monitor their MySQL clusters. In response to that feedback, I’m pleased to announce our partnership with Severalnines to make ClusterControl more available to Percona XtraDB Cluster users and, in fact, all of our Support customers who use MySQL clustering solutions.

Percona XtraDB Cluster users now have three choices:

On the last option, Percona ClusterControl is a privately branded version of ClusterControl supplied to us by Severalnines. With a Percona Support contract covering Percona XtraDB Cluster, MySQL Cluster, or MySQL Galera, we provide support for your deployment and use of Percona ClusterControl. Percona has worked directly with Severalnines to ensure full compatibility of ClusterControl Community Edition with Percona XtraDB Cluster. The Percona Support team will provide assistance with the configuration and administration of Percona ClusterControl. We will help you get setup and then we will help you make the most of your new management and monitoring tools for the high availability solution of your choice.

We will also provide support for Severalnines ClusterControl Enterprise Edition when it is purchased from Severalnines or Percona and used with a cluster deployment covered by a Percona Support subscription that includes coverage of Percona XtraDB Cluster, MySQL Galera, or MySQL Cluster. ClusterControl Enterprise Edition is the de-facto management solution for Galera-based clusters and it provides the highest levels of monitoring and management capabilities for Percona XtraDB Cluster users.

Percona ClusterControl will provide our Support customers broad flexibility to deploy, monitor, and manage Percona XtraDB Clusters, supporting on-premise and Amazon Web Services deployments, as well as deployments spread across multiple cloud vendors. Percona ClusterControl can be used to automate the deployment of Percona XtraDB Cluster in cross-region, multi-datacenter setups. Percona ClusterControl offers a full range of management functionality for clusters, including:

  • Node and cluster recovery
  • Configuration management
  • Performance health checks
  • Online backups
  • Upgrades
  • Scale out
  • Cloning of clusters
  • Deploying HAProxy load balancers

Percona ClusterControl can be used with multiple clustering solutions including MySQL Cluster, MySQL Replication, MySQL Galera, MariaDB Galera Cluster, and MongoDB. Percona Support subscribers that elect coverage for these clustering solutions receive Percona ClusterControl with their subscriptions.

Just as we made Percona XtraDB Cluster a much better solution by partnering with Codership, we now expect it will be a much easier to manage and monitor solution thanks to our partnership with Severalnines. Whether you need Percona Support or not, I hope you will take advantage of these powerful management and monitoring solutions which are now available for Percona XtraDB Cluster. To learn more about using ClusterControl to manage Percona XtraDB Cluster, please join us on February 19th for a webinar entitled “Performance Monitoring and Troubleshooting of Percona XtraDB Cluster” presented by Peter Boros.

The post ClusterControl for Percona XtraDB Cluster Improves Management and Monitoring appeared first on MySQL Performance Blog.


Doing a rolling upgrade of Percona XtraDB Cluster from 5.5 to 5.6

$
0
0

Overview

Percona XtraDB Cluster 5.6 has been GA for several months now and people are thinking more and more about moving from 5.5 to 5.6. Most people don’t want to upgrade all at once, but would prefer a rolling upgrade to avoid downtime and ensure 5.6 is behaving in a stable fashion before putting all of production on it. The official guide to a rolling upgrade can be found in the PXC 5.6 manual. This blog post will attempt to summarize the basic process.

However, there are a few caveats to trying to do a rolling 5.6 upgrade from 5.5:

  1. If you mix Galera 2 and Galera 3 nodes, you must set wsrep_provider_options=”socket.checksum=1″ on the Galera 3 nodes for backwards compatibility between Galera versions.
  2. You must set some 5.6 settings for 5.5 compatibility with respect to replication:
    1. You can’t enable GTID async replication on the 5.6 nodes
    2. You should use –log-bin-use-v1-row-events on the 5.6 nodes
    3. You should set binlog_checksum=NONE  on the 5.6 nodes
  3. You must not SST a 5.5 donor to a 5.6 joiner as the SST script does not handle mysql_upgrade
  4. You should set the 5.6 nodes read_only and not write to them!

The basic upgrade flow

The basic upgrade flow is:

  1. For some node(s): upgrade to 5.6, do all the above stuff, put them into a read_only pool
  2. Repeat step 1 as desired
  3. Once your 5.6 pool is sufficiently large, cut over writes to the 5.6 pool (turn off read_only, etc.) and upgrade the rest.

This is, in essence, exactly like upgrading a 5.5 master/slave cluster to 5.6 — you upgrade the slaves first, promote a slave and upgrade the master; we just have more masters to think about.

Once your upgrade is fully to 5.6, then you can go back through and remove all the 5.5 backwards compatibility

Why can’t I write to the 5.6 nodes?

The heaviest caveat is probably the fact that in a mixed 5.5 / 5.6 cluster, you are not supposed to write to the 5.6 nodes.  Why is that?  Well, the reason goes back to MySQL itself.  PXC/Galera uses standard RBR binlog events from MySQL for replication.   Replication between major MySQL versions is only ever officially supported:

  • across 1 major version (i.e., 5.5 to 5.6, though multiple version hops do often work)
  • from a lower version master to a higher version slave (i.e., 5.5 is your master and 5.6 is your slave, but not the other way around)

This compatibility requirement (which has existed for a very long time in MySQL) works great when you have a single Master replication topology, but true multi-master (multi-writer) has obviously never been considered.

Some alternatives

Does writing to the 5.6 nodes REALLY break things?

The restriction on 5.6 masters of 5.5 slaves is probably too strict in many cases.  Technically only older to newer replication is ever truly supported, but in practice you may be able to run a mixed cluster with writes to all nodes as long as you are careful.  This means (at least) that any modifications to column type formats in the newer version NOT be upgraded while the old version remains active in the cluster.  There might be other issues, I’m not sure, I cannot say I’ve tested every possible circumstance.

So, can I truly say I recommend this?  I cannot say that officially, but you may find it works fine.  As long as you acknowledge that something unforeseen may break your cluster and your migration plan, it may be reasonable.  If you decide to explore this option, please test this thoroughly and be willing to accept the consequences of it not working before trying it in production!

Using Async replication to upgrade

Another alternative: rather than mixing the clusters and keeping the 5.6 nodes read_only, why not just set up the 5.6 cluster as an async slave of your 5.5 cluster and migrate your application to the new cluster when you are ready?  This is practically the same as maintaining a split 5.5/5.6 read-write/read-only cluster, but with less risk and a shorter list of don'ts.  Cutover in this case would effectively be like promoting a 5.6 slave to master, except you would promote the whole 5.6 cluster.

One caveat with this approach might be dealing with replication throughput:  async may not be able to keep up replicating your 5.5 cluster writes to a separate 5.6 cluster.  Definitely check out wsrep_preordered to speed things up; it may help.  But realize that some busy workloads may never be able to use async replication into another cluster.
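
As a sketch of what that could look like on one node of the new 5.6 cluster, with binary logging and log-slave-updates already enabled on that node (the host name, credentials, and binlog coordinates below are placeholders, not values from any real setup):

# In my.cnf on the 5.6 node acting as the async slave (sketch):
#   wsrep_preordered  = ON
#   log_slave_updates = 1

mysql> CHANGE MASTER TO
    ->   MASTER_HOST='node1-55.example.com',
    ->   MASTER_USER='repl',
    ->   MASTER_PASSWORD='replpass',
    ->   MASTER_LOG_FILE='mysql-bin.000123',
    ->   MASTER_LOG_POS=4;
mysql> START SLAVE;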

Just take the outage

A final alternative for this post is the idea of simply upgrading the entire cluster to 5.6 all at once during a maintenance window.  I grant that this defeats the point of a rolling upgrade, but it may offer a lot of simplicity in the longer run.

Conclusion

A rolling PXC/Galera upgrade across major MySQL versions is limited by the fact that Oracle does not officially support (and has no reason to support) replication from a newer master to an older slave.  In practice it may work much of the time, but such a setup should be considered carefully and its risk weighed against all the other options.


MySQL High Availability With Percona XtraDB Cluster (Percona MySQL Training)


I've had the opportunity to train lots of people on Percona XtraDB Cluster (PXC) over the last few years the product has existed.  This has taken the form of phone calls, emails, blog posts, webinars, and consulting engagements. This doesn't count all the time I've spent grilling Codership on how things actually work.  But it has culminated in the PXC tutorial I have given annually at Percona Live MySQL Conference and Expo in Santa Clara, California for the last two years.  Baron even attended this year and had this to say:

“Jay Janssen’s tutorial on Percona XtraDB Cluster was impressive. I can’t imagine how much time he must have spent preparing for that.”

(Baron: Thanks for this compliment, BTW)  The answer is: I haven’t kept track, but I’ve delivered it 3 times at conferences and it gets better each time.  I started working on it during the summer of 2012 for Percona Live NYC that year and have put in countless hours since making it better.  This year, despite my voice problems, I was really happy with how it came out.

We've had lots of requests for repeats of this tutorial, with more detail.  As such, I'll be doing a 2-day training about MySQL HA with Percona XtraDB Cluster in sunny San Jose, California on July 16th-17th, 2014.  This Percona MySQL Training session is appropriately titled, "MySQL High Availability With Percona XtraDB Cluster" and you can get more information and also register here.

This course will include an overview of how Percona XtraDB Cluster fits into the MySQL High Availability world, followed by a deep-dive into PXC.  All the content of the tutorial will be covered, but it will be spread out with more details on all the components and important bits of using PXC and Galera.

If you’re unfamiliar with my tutorial — it is intensively hands-on.  This will not be 2 days of lecture: You’ll be setting up and experimenting with a real cluster on the fly during the class. Expect plenty of command-line work and to leave with a much better sense of how to truly operate and use a Percona XtraDB Cluster.

Once again, you can check out the details on our website and register NOW to attend right here.


Semi-Sync replication performance in MySQL 5.7.4 DMR


I was interested to hear about semi-sync replication improvements in MySQL's 5.7.4 DMR release and decided to check it out.  I previously blogged about poor semi-sync performance and was pretty disappointed by semi-sync's performance across WAN distances back then, particularly with many client threads.

The Test

The basic environment of these tests was:

  • AWS EC2 m3.medium instances
  • Master in us-east-1, slave in us-west-1 (~78ms ping RTT)
  • CentOS 6.5
  • innodb_flush_log_at_trx_commit=1
  • sync_binlog=1
  • Semi-sync replication plugin installed and enabled.
  • GTIDs enabled (except on 5.5)
  • sysbench 0.5 update_index.lua test, 60 seconds, 250k table size.
  • MySQL 5.7 was tested with both AFTER_SYNC and AFTER_COMMIT settings for rpl_semi_sync_master_wait_point (see the example configuration after this list)
  • I tested Percona XtraDB Cluster 5.6 / Galera 3.5 as well by means of comparison
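
For reference, the semi-sync side of such a setup looks roughly like the sketch below; these are the stock MySQL semi-sync plugin and variable names, not a transcript of the exact commands used for these runs.

-- On the master (sketch)
mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
mysql> SET GLOBAL rpl_semi_sync_master_enabled = 1;
-- MySQL 5.7 only: pick the wait point (AFTER_SYNC is the default)
mysql> SET GLOBAL rpl_semi_sync_master_wait_point = AFTER_SYNC;

-- On the slave (sketch)
mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;
mysql> STOP SLAVE IO_THREAD;
mysql> START SLAVE IO_THREAD;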

Without further ado, here are the TPS results I got for a single client thread:

 

These graphs are interactive, so mouse-over for more details. I’m using log scales to better highlight the differences.

The blue bars represent transactions per second (more is better).  The red bars represent average latency per transaction per client (less is better).  Remember these transactions are synchronously being copied across the US before the client can execute another.

The first test is our control:  Async allows ~273 TPS on a single thread.  Once we introduce synchronicity, we clearly see the bulk of the time is that round trip.  Note that MySQL 5.5 and 5.6 are a bit higher than MySQL 5.7 and Percona XtraDB Cluster, the latter of which show pretty similar results.

Adding parallelism

It gets more interesting if we redo the same tests with 32 test threads:

In the MySQL 5.5 and 5.6 tests, we can clearly see nasty serialization.  Both really don’t allow more performance than single threaded sysbench.  I was happy to see, however, that this seems to be dramatically improved in MySQL 5.7, nice job Oracle!

AFTER_SYNC and AFTER_COMMIT vary, but AFTER_SYNC is the default and is preferred over AFTER_COMMIT.  The reasoning here is AFTER_SYNC forces the semi-sync wait BEFORE the transaction is committed on the master.  The client still must wait for the semi-sync in AFTER_COMMIT, but other transactions may see its data on the master BEFORE we confirm the semi-sync slave has received it.  This is potentially bad because if the master crashed at that instant, clients may have read data from the master that did not make it to a failover slave.    This is a type of ‘phantom read’ and Yoshinori explains it in more detail here.

What about Percona XtraDB Cluster?

I also want to discuss the Percona XtraDB Cluster results: Galera is somewhat slower here than MySQL 5.7 semi-sync.  There may be some enhancements that can be made to Galera (competition is a good thing), but there are still some significant differences:

  • Galera allows for writing on any and all nodes, semi-sync does not
  • Galera introduces the certification process to check for conflicts, Semi-sync does not
  • Galera does not use two-phase commit, and transactions are not committed synchronously anywhere except on the node originating the transaction.  So, it is similar to semi-sync in this way.
  • I ran the Galera tests with no log-bin (Galera does not require it)
  • I ran the Galera tests with innodb_flush_log_at_trx_commit=1
  • I set the fc_limit on the second node really high to eliminate flow control as a bottleneck (see the example after this list).  In a live cluster, it would typically be needed.
  • Galera provides parallel slave threads for faster apply, but it doesn’t matter here because I set the fc_limit so high
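
For illustration, raising the flow-control limit on the applier node is a one-liner; the value 10000 here is an arbitrary benchmarking-only number, not a production recommendation.

mysql> SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=10000';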

TL;DR

Semi-sync in MySQL 5.7 looks like a great improvement.  Any form of synchronicity is always going to be expensive, particularly over 10s and 100s of milliseconds of latency.  With MySQL 5.7, I’d be much more apt to recommend semi-sync as an option than in previous releases.  Thanks to Oracle for investing here.


Galera data on Percona Cloud Tools (and other MySQL monitoring tools)


I was talking with a Percona Support customer earlier this week who was looking for Galera data on Percona Cloud Tools. (Percona Cloud Tools, now in free beta, is a hosted service providing access to query performance insights for all MySQL users.)

The customer mentioned they were already keeping track of some Galera stats on Cacti, and given they were inclined to use Percona Cloud Tools more and more, they wanted to know if it already supported Percona XtraDB Cluster. My answer was: "No, not yet: you can install agents in each node (the regular way on the first node, then manually on the other nodes… when prompted, say "No" to creating a MySQL user and provide the one you're already using) and monitor them as autonomous MySQL servers – but the concept of cluster, and especially the 'Galera bits', has yet to be implemented there."

Except I was wrong.

By “concept of cluster” I mean supporting the notion of group instances, which should allow a single cluster-wide view for metrics and query reports, such as the slow queries (which are recorded locally on the node where the query was run and thus observed in a detached way). This still needs to be implemented indeed, but it’s on the roadmap.

The "Galera bits" I mentioned above are the various "wsrep_" status variables. In fact, those refer to the Write Set REPlication patches (based on the wsrep API), a set of hooks applied to the InnoDB/XtraDB storage engine and other components of MySQL that modify the way replication works (to put it in a very simplified way), which in turn are used by the Galera library to provide a "generic Synchronous Multi-Master replication plugin for transactional applications." A patched version of Percona Server together with the Galera library composes the base of Percona XtraDB Cluster.

As I found out only now, Percona Cloud Tools does collect data from the various wsrep_ variables and it is available for use – it’s just not shown by default. A user only needs to add a dashboard/chart manually on PCT to see these metrics:

[Screenshot: adding a chart of wsrep_ metrics in Percona Cloud Tools]

Now, I need to call that customer …

Monitoring the cluster

Since I'm touching this topic, I thought it would be useful to include some additional information on monitoring a Galera cluster (Percona XtraDB Cluster in particular), even though most of what I mention below has already been published in different posts here on the MySQL Performance Blog. There's a succinct documentation page bearing the same title as this section that indicates the main wsrep_ variables you should monitor to check the health status of the cluster and how well it's coping with load over time (performance). Remember you can get a grasp of the full set of variables at any time by issuing the following command from one (or each) of the nodes:

mysql> SHOW GLOBAL STATUS LIKE "wsrep_%";
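
If you only want a handful of the health-related variables that page discusses, you can narrow the output; the particular list below is just one reasonable selection, not an official checklist.

mysql> SHOW GLOBAL STATUS WHERE Variable_name IN
    -> ('wsrep_cluster_status', 'wsrep_cluster_size', 'wsrep_ready', 'wsrep_connected',
    ->  'wsrep_local_state_comment', 'wsrep_local_recv_queue_avg', 'wsrep_flow_control_paused');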

And for a broader and real time view of the wsrep_ status variables you can use Jay Janssen’s myq_gadgets toolkit, which he modified a couple of years ago to account for Galera.

There's also a specific Galera template available in our Percona Monitoring Plugins (PMP) package that you can use in your Cacti server. That would cover "how well the cluster copes with load over time," providing historical graphing. And while there isn't a Galera-specific plugin for Nagios in PMP, Jay explains in another post how you can customize pmp-check-mysql-status to "check any status variable you like," describing his approach to keep a cluster's "health status" in check by setting alerts on specific stats, on a per-node basis.
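
As a minimal do-it-yourself illustration of the same idea (this is not the pmp-check-mysql-status syntax; check the Percona Monitoring Plugins documentation for its actual options), a per-node shell check that alerts when the node leaves the Primary component could look like this:

#!/bin/bash
# Hypothetical Nagios-style check: exit 2 (CRITICAL) if the node is not in the Primary component
STATUS=$(mysql -N -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'" | awk '{print $2}')
if [ "$STATUS" != "Primary" ]; then
    echo "CRITICAL: wsrep_cluster_status is '$STATUS'"
    exit 2
fi
echo "OK: node is in the Primary component"
exit 0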

VividCortex recently added a set of templates for Galera in their product and you can also rely on Severalnines’ ClusterControl monitoring features to get that “global view” of your cluster that Percona Cloud Tools doesn’t yet provide. Even though ClusterControl is a complete solution for cluster deployment and management, focusing on the automation of the whole process, the monitoring part alone is particularly interesting as it encompasses cluster-wide information in a customized way, including the “Galera bits”. You may want to give it a try as the monitoring features are available in the Community version of the product (and if you’re a Percona customer with a Support contract covering Percona XtraDB Cluster, then you’re entitled to get support for it from us).

One thing I should note that differentiates the monitoring solutions above: while you can install Cacti, Nagios and ClusterControl as servers in your own infrastructure, both Percona Cloud Tools and VividCortex are hosted, cloud-based services. Having said that, neither of them uploads sensitive data to the cloud and both provide options for query obfuscation.

Summary

Contrary to what I believed, Percona Cloud Tools does provide support for the "Galera bits" (the wsrep_ status variables), even though it has yet to implement support for the notion of group instances, which will allow for a cluster-wide view of metrics and query reports. You can also rely on the Galera template for Cacti provided by Percona Monitoring Plugins for historical graphing and some clever use of Nagios' pmp-check-mysql-status for customized cluster alerts. VividCortex and ClusterControl also provide monitoring for Galera.

Percona Cloud Tools, now in free beta, is a hosted service providing access to query performance insights for all MySQL users. After a brief setup, unlock new information about your database and how to improve your applications. Sign up to request access to the beta today.


Q&A: Percona XtraDB Cluster as a MySQL HA solution for OpenStack


Thanks to all who attended my Nov. 12 webinar titled, “Percona XtraDB Cluster as a MySQL HA Solution for OpenStack.” I had several questions which I covered at the end and a few that I didn’t. You can view the entire webinar and download the slides here.

Q: Is the read/write speed reduced in Galera compared to normal MySQL?

For reads, it's the same (unless you use the sync_wait feature, which used to be called causal reads).
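
For reference, enabling that behavior for reads on a PXC 5.6 / Galera 3 node looks like this (on older versions the variable was wsrep_causal_reads):

-- Make reads wait until the node has applied everything the cluster has committed
mysql> SET SESSION wsrep_sync_wait = 1;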

For writes, the cost of replication (~1 RTT to the worst node), plus the cost of certification will be added to each transaction commit.  This will have a non-trivial impact on your write performance, for sure.  However, I believe most OpenStack meta store use cases should not suffer overly by this performance penalty.

Q: Do state transfers affect a continuing transaction within the nodes?

Mostly, no.  The joining node will queue ongoing cluster replication while receiving its state transfer and use that to do its final ‘catchup’.  The node donating the state transfer may get locked briefly during a full SST, which could temporarily block writes on that node.  If you use the built-in clustercheck (or check the same things it checks), you can avoid this by diverting traffic away from a donor.
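
A quick way to see the sort of thing clustercheck looks at on each node (the variable is standard; the exact logic of the bundled script may differ slightly):

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
-- 'Synced'          -> safe to send traffic to this node
-- 'Donor/Desynced'  -> the node is donating an SST; divert traffic elsewhere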

Q: Perhaps not the correct webinar for this question, but I was also expecting to hear something about using PXC in combination with OpenStack Trove. If you've got time, could you tell us something about that?

So far, Trove has not supported the concept of more than a single SQL target.  My understanding is that a recent improvement here for MongoDB setups may pave the way for more robust Trove instances backed by Percona XtraDB Cluster.

Q: For load balancing using the Java MySQL driver, would you suggest HAProxy or the load-balancing connection in the Java driver? Also, how do things work in the case of persistent connections and connection pools like DBCP?

Each node in a PXC cluster is just a MySQL server that can handle normal MySQL connections.  Obviously if a node fails, it's easy to detect that you should probably not keep using that node, but in Percona XtraDB Cluster you need to watch out for things like cluster partitions where nodes lose quorum and stop taking queries (but still allow connections).  I believe it's possible to configure advanced connection pooling setups to properly health check the nodes (unless the necessary features are not in the pool implementation), but I don't have a reference architecture to point you to.

Q: Are there any manual solutions to avoid deadlocks within a high write context to force a query to execute on all nodes?

Yes, write to a single node, at least for the dataset that has high write volume.

Remember what I said in the webinar:  you can only update a given row once per RTT, so there’s an upper cap of throughput on the smallest locking granularity in InnoDB (i.e., a single row).

This manifests itself in two possible ways:

1) In a multi-node writing cluster by triggering deadlocks on conflicts.  Only approximately one transaction modifying a given row per RTT would NOT receive a deadlock.

2) In a single-node writing cluster by experiencing longer lock waits.  Transaction times (and subsequently lock times) are extended by the replication and certification time, so other competing transactions will lock wait until the blocking transaction commits. There are no replication conflict deadlocks in this case, but the net effect is exactly the same:  only 1 transaction per row per RTT.

Galera offers us high data redundancy at the cost of some throughput.  If that doesn’t work for your workload, then asynchronous replication (or perhaps semi-sync in 5.7) will work better for you.

Note that there is wsrep_retry_autocommit.  However, this only works for autocommitted transactions.  If your write volume was so high that you need to increase this a lot to get the conflict rate down, you are likely sacrificing a lot of CPU power (rollbacks are expensive in InnoDB!) if a single transaction needs multiple retries to commit.  This still doesn't get around the law:  1 trx per row per RTT at best.
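
For completeness, bumping that retry setting is a one-liner; the value 4 below is arbitrary and, as noted above, large values mostly trade CPU (rollbacks) for a lower visible conflict rate.

mysql> SET GLOBAL wsrep_retry_autocommit = 4;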

That was all of the questions. Be sure to check out our next OpenStack webinar on December 10 by my colleague Peter Boros. It’s titled “MySQL and OpenStack Deep Dive” and you can register here (and like all of our webinars, it’s free). I also invite you to visit our OpenStack Live 2015 page and learn more about that conference in Santa Clara, California this coming April 13-14. The Call for Papers ends Nov. 16 and speakers get a full-access conference pass. I hope to see you there!

