A closer look at Percona XtraDB Cluster for MySQL

December 4, 2012, 9:19 am

≫ Next: Percona XtraDB Cluster (PXC): what about GRA_*.log files ?

The Web And PHP Magazine just published an article describing Percona XtraDB Cluster written by Liz van Dijk and Kenny Gryp. It is a pretty high-level introduction to the technology, so definitely give it a read if you’ve been wanting to give it a closer look.

Go to their website http://www.webandphp.com and download Issue number 9 for free!

The post A closer look at Percona XtraDB Cluster for MySQL appeared first on MySQL Performance Blog.

↧

Percona XtraDB Cluster (PXC): what about GRA_*.log files ?

December 19, 2012, 8:55 am

≫ Next: Percona XtraDB Cluster: SElinux is not always the culprit !

≪ Previous: A closer look at Percona XtraDB Cluster for MySQL

How easy is it to identify and debug Percona XtraDB Cluster replication problem ?

If you are using PXC, you may have already seen in your datadirectory several log files starting with GRA_

Those files correspond to a replication failure. That means the slave thread was not able to apply one transaction. For each of those file, a corresponding warning or error message is present in the mysql error log file.

Those error can also be false positive like a bad DDL statement (DROP a table that doesn’t exists for example) and therefore nothing to worry about. However it’s always recommended to understand what’s is happening.

As the GRA files contain binlog events in ROW format representing the failed transaction this post explains how to proceed.

The first step to be able to analyze your GRA files is to add a binlog header to the file.
You can download one here :GRA-header

We can verify it easily:

file /tmp/GRA-header
/tmp/GRA-header: MySQL replication log

Now we need to select one GRA log file:

[root@node2 mysql]# ls GRA_*.log
GRA_3_3.log
[root@node2 mysql]# file GRA_3_3.log
GRA_3_3.log: data
[root@node2 mysql]# ls -l GRA_3_3.log
-rw-rw----. 1 mysql mysql 106 Nov 29 23:28 GRA_3_3.log

We add the header and we can then use mysqlbinlog to see its content:

[root@node2 mysql]# cat GRA-header > GRA_3_3-bin.log
[root@node2 mysql]# cat GRA_3_3.log >> GRA_3_3-bin.log
[root@node2 mysql]# file GRA_3_3.log
GRA_3_3.log: data
[root@node2 mysql]# file GRA_3_3-bin.log
GRA_3_3-bin.log: MySQL replication log
[root@node2 mysql]# mysqlbinlog -vvv GRA_3_3-bin.log
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#120715  9:45:56 server id 1  end_log_pos 107     Start: binlog v 4, server v 5.5.25-debug-log created 120715  9:45:56 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
BINLOG '
NHUCUA8BAAAAZwAAAGsAAAABAAQANS41LjI1LWRlYnVnLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAA0dQJQEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
# at 160
#121129 23:28:54 server id 0  end_log_pos 53     Table_map: `sakila`.`actor` mapped to number 33
#121129 23:28:54 server id 0  end_log_pos 106     Write_rows: table id 33 flags: STMT_END_F
BINLOG '
puG3UBMAAAAANQAAADUAAAAAACEAAAAAAAEABnNha2lsYQAFYWN0b3IABAIPDwcEhwCHAAA=
puG3UBcAAAAANQAAAGoAAAAAACEAAAAAAAEABP/wvwEJR3Vkb3FmdW5lBk5pa25ldqbht1A=
'/*!*/;
### INSERT INTO sakila.actor
### SET
###   @1=447 /* SHORTINT meta=0 nullable=0 is_null=0 */
###   @2='Gudoqfune' /* VARSTRING(135) meta=135 nullable=0 is_null=0 */
###   @3='Niknev' /* VARSTRING(135) meta=135 nullable=0 is_null=0 */
###   @4=1354228134 /* TIMESTAMP meta=0 nullable=0 is_null=0 */
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

So it’s clear that the problem occurred when inserting a record in sakila.actor table.
And if we check in the error log for the corresponding error message (we know at what time to check):

121129 23:28:54 [ERROR] Slave SQL: Could not execute Write_rows event on table sakila.actor; Duplicate entry '447' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 106, Error_code: 1062
121129 23:28:54 [Warning] WSREP: RBR event 2 Write_rows apply warning: 121, 3
121129 23:28:55 [ERROR] WSREP: Failed to apply trx: source: 7af1ab8e-3a70-11e2-0800-66155499f3af version: 2 local: 0 state: APPLYING flags: 1 conn_id: 8 trx_id: 2582 seqnos (l: 9, g: 3, s: 2, d: 0, ts: 1354228134888418369)
121129 23:28:55 [ERROR] WSREP: Failed to apply app buffer: ¦á·P^S, seqno: 3, status: WSREP_FATAL
at galera/src/replicator_smm.cpp:apply_wscoll():49
at galera/src/replicator_smm.cpp:apply_trx_ws():120
121129 23:28:55 [ERROR] WSREP: Node consistency compromized, aborting...

In this case it’s obvious why it failed but it’s not always the case. Now you know how to find the cause of these replication problems.

Also those files (GRA_*.log) doesn’t clean up automatically and are present only for troubleshooting purpose, so after having identified if they really represent a problem or not, you can manually delete them.

This was also discussed in galera-codership mailing list.

The post Percona XtraDB Cluster (PXC): what about GRA_*.log files ? appeared first on MySQL Performance Blog.

↧

Percona XtraDB Cluster: SElinux is not always the culprit !

December 20, 2012, 6:02 am

≫ Next: How to start a Percona XtraDB Cluster

≪ Previous: Percona XtraDB Cluster (PXC): what about GRA_*.log files ?

If you are using SElinux, you should know that it’s advised to disable it to avoid issue with PXC. Generally the communication between your nodes doesn’t work properly and a node having SElinux enabled won’t be able to join the cluster.

So when a node doesn’t join the cluster where it should, my first reflex is to have a look at audit.log. But recently I faced another problem: the node joined the cluster but SST failed (whatever which method was used, discarding skip).

I checked SElinux and it was of course disabled, then I add some debug information in the SST script but it seemed that the script was never launched. And this time the culprit is called : AppArmor !

Percona doesn’t provide any AppArmor profile for PXC, but it seems that on this server (Ubuntu TLS), a previous version of MySQL was installed and then removed but the AppArmor profile was still present.

So if you use apparmor (or if you don’t know) and you want to check is there is a profile for mysql, you can run the following command :

root@testmachine:~# apparmor_status
apparmor module is loaded.
7 profiles are loaded.
7 profiles are in enforce mode.
/sbin/dhclient
/usr/lib/NetworkManager/nm-dhcp-client.action
/usr/lib/connman/scripts/dhclient-script
/usr/sbin/mysqld
/usr/sbin/named
/usr/sbin/ntpd
/usr/sbin/tcpdump
0 profiles are in complain mode.
2 processes have profiles defined.
2 processes are in enforce mode.
/usr/sbin/named (1205)
/usr/sbin/ntpd (1347)
0 processes are in complain mode.

You can disable a profile easily by running

sudo ln -s /etc/apparmor.d/usr.sbin.mysqld /etc/apparmor.d/disable/
sudo apparmor_parser -R /etc/apparmor.d/usr.sbin.mysqld

For more information related to AppArmor, you can refer to Ubuntu’s wiki

So now if you run ubuntu, you have two things to check first : SElinux and AppArmor !

Note: We often advise to disable SElinux and AppArmor on dedicated MySQL servers to avoid the performance overhead

The post Percona XtraDB Cluster: SElinux is not always the culprit ! appeared first on MySQL Performance Blog.

↧

How to start a Percona XtraDB Cluster

January 29, 2013, 1:49 pm

≫ Next: Announcing Percona XtraDB Cluster 5.5.29-23.7.1

≪ Previous: Percona XtraDB Cluster: SElinux is not always the culprit !

Before version 5.5.28 of Percona XtraDB Cluster, the easiest way was to join the cluster using wsrep_urls in [mysqld_safe] section of my.cnf.

So with a cluster of 3 nodes like this :

node1 = 192.168.1.1 node2 = 192.168.1.2 node3 = 192.168.1.3

we defined the setting like this :

wsrep_urls=gcomm://192.168.1.1:4567,gcomm://192.168.1.2:4567,gcomm://192.168.1.3:4567

With that line above in my.cnf on each node, when PXC (mysqld) was started, the node tried to join the cluster on the first IP, if no node was running on that IP, the next IP was tried and so on…. until the node could join the cluster or after it tried and didn’t find any node running the cluster, in that case mysqld failed to start.
To avoid this, when all nodes where down and you wanted to start the cluster, it was possible to have wsrep_urls defined like this :

wsrep_urls=gcomm://192.168.1.1:4567,gcomm://192.168.1.2:4567,gcomm://192.168.1.3:4567,gcomm://

That was a nice feature, especially for people that didn’t want to modify my.cnf after starting the first node initializing the cluster or people automating their deployment with a configuration management system.

Now, since wsrep_urls is deprecated since version 5.5.28 what is the better option to start the cluster ?

In my.cnf, [mysqld] section this time, you can use wsrep_cluster_address with the following syntax:

wsrep_cluster_address=gcomm://192.168.1.1,192.168.1.2,192.168.1.3

As you can see the port is not needed and gcomm:// is specified only once.

Note:In Debian and Ubuntu, the ip of the node cannot be present in that variable due to a glibc error:
130129 17:03:45 [Note] WSREP: gcomm: connecting to group 'testPXC', peer '192.168.80.1:,192.168.80.2:,192.168.80.3:' 17:03:45 UTC - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. [...] /usr/sbin/mysqld(_Z23wsrep_start_replicationv+0x111)[0x664c21] /usr/sbin/mysqld(_Z18wsrep_init_startupb+0x65)[0x664da5] /usr/sbin/mysqld[0x5329af] /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x8bd)[0x534fad] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f3bb24b676d] /usr/sbin/mysqld[0x529d1d]

So what can be done to initialize the cluster when all nodes are down ? There are two options:

modify my.cnf and set wsrep_cluster_address=gcomm:// then when the node is started change it again, this is not my favourite option.
start mysql using the following syntax (it works only on RedHat and CentOS out of the box):
/etc/init.d/myslqd start --wsrep-cluster-address="gcomm://"
As there is no need to modify my.cnf, this is how I recommend to do it.

The post How to start a Percona XtraDB Cluster appeared first on MySQL Performance Blog.

↧

Announcing Percona XtraDB Cluster 5.5.29-23.7.1

January 30, 2013, 6:32 pm

≫ Next: Announcing Percona XtraDB Cluster 5.5.29-23.7.2

≪ Previous: How to start a Percona XtraDB Cluster

Percona is glad to announce the release of Percona XtraDB Cluster on January 30th, 2013. Binaries are available from downloads area or from our software repositories.

Bugs fixed:

In some cases when node is recovered variable threads_running would become huge. Bug fixed #1040108 (Teemu Ollakka).
Variable wsrep_defaults_file would be set up to the value in the last configuration file read. Bug fixed by keeping the value found in the top configuration file. Bug fixed #1079892 (Alex Yurchenko).
Variable wsrep_node_name was initialized before the glob_hostname, which lead to empty value for wsrep_node_name if it wasn’t set explicitly. Bug fixed #1081389 (Alex Yurchenko).
Running FLUSH TABLES WITH READ LOCK when slave applier needed to abort transaction that is on the way, caused the deadlock situation. Resolved by grabbing Global Read Lock before pausing wsrep provider. Bug fixed #1083162 (Teemu Ollakka).
Percona XtraDB Cluster would crash when processing a delete for a table with foreign key constraint. Bug fixed #1078346 (Seppo Jaakola).
When variable innodb_support_xa was set to 0, wsrep position wasn’t stored into the InnoDB tablespace. Bug fixed #1084199 (Teemu Ollakka).
Using XtraBackup for State Snapshot Transfer would fail due to mktemp error. Bug fixed #1080829 (Alex Yurchenko).
XtraBackup donor would run XtraBackup indefinitely if the xtrabackup --tmpdir was on tmpfs. Bug fixed #1086978 (Alex Yurchenko).
In some cases non-uniform foreign key reference could cause a slave crash. Fixed by using primary key of the child table when appending exclusive key for cascading delete operation. Bug fixed #1089490 (Seppo Jaakola).
Percona XtraDB Cluster would crash when binlog_format was set to STATEMENT. This was fixed by introducing the warning message. Bug fixed #1088400 (Seppo Jaakola).
An explicitly set wsrep_node_incoming_address might make “SHOW STATUS LIKE 'wsrep_incoming_addresses';” return the address without the port number. Bug fixed #1082406 (Alex Yurchenko).
Percona XtraDB Cluster would crash if the node’s own address would be specified in the wsrep_cluster_address variable. Bug fixed #1099413 (Alexey Yurchenko).
When installing from yum repository, Percona-XtraDB-Cluster-server and Percona-XtraDB-Cluster-client would conflict with mysql and mysql-server packages. Bug fixed #1087506 (Ignacio Nin).

Other bug fixes: bug fixed #1037165, bug fixed #812059.

Based on Percona Server 5.5.29-29.4 including all the bug fixes in it, Percona XtraDB Cluster 5.5.29-23.7.1 is now the current stable release. All of Percona‘s software is open-source and free.
We did our best to eliminate bugs and problems, but this is a software, so bugs are expected. If you encounter them, please report them to our bug tracking system.

The post Announcing Percona XtraDB Cluster 5.5.29-23.7.1 appeared first on MySQL Performance Blog.

↧

Announcing Percona XtraDB Cluster 5.5.29-23.7.2

February 13, 2013, 12:34 pm

≫ Next: Investigating MySQL Replication Latency in Percona XtraDB Cluster

≪ Previous: Announcing Percona XtraDB Cluster 5.5.29-23.7.1

Percona XtraDB Cluster

Percona is glad to announce the release of Percona XtraDB Cluster 5.5.29-23.7.2 on February 13th, 2013. Binaries are available from downloads area or from our software repositories.

Bugs fixed:

DML operations on temporary tables would try to append the key for provider library, which could cause a memory leak. Bug fixed for MyISAM temporary tables, InnoDB temporary tables still have this issue, probably caused by upstream bug #67259. Bug fixed #1112514 (Seppo Jaakola).
Bug fix for bug #1078346 introduced a regression. Foreign key checks were skipped in the parent table, which would cause the foreign key constraint errors. Bug fixed #1117175 (Seppo Jaakola).

Based on Percona Server 5.5.29-29.4 including all the bug fixes in it, Percona XtraDB Cluster 5.5.29-23.7.2 is now the current stable release. All of Percona‘s software is open-source and free.

We did our best to eliminate bugs and problems, but this is a software, so bugs are expected. If you encounter them, please report them to our bug tracking system.

The post Announcing Percona XtraDB Cluster 5.5.29-23.7.2 appeared first on MySQL Performance Blog.

↧

Investigating MySQL Replication Latency in Percona XtraDB Cluster

March 3, 2013, 6:10 pm

≫ Next: Accessing Percona XtraDB Cluster nodes in parallel from PHP using MySQL asynchronous queries

≪ Previous: Announcing Percona XtraDB Cluster 5.5.29-23.7.2

Investigating MySQL Replication Latency in Percona XtraDB Cluster

I was curious to check how Percona XtraDB Cluster behaves when it comes to MySQL replication latency — or better yet, call it data propagation latency. It was interesting to see whenever I can get stale data reads from other cluster nodes after write performed to some specific node. To test it I wrote quite a simple script (you can find it in the end of the post) which connects to one node in the cluster, performs an update and then immediately does the read from second node. If the data has been already propagated — good, if not we’ll continue to retry reads until it finally propagates, and then measure the latency. This is used to see whenever application can see any stale reads.

My setup is 3 Percona XtraDB Cluster nodes talking through dedicated 1Gbit cluster network (DPE1, DPE2, DPE3) and I’m running the test from 4th server (SMT2) so it is pretty realistic setup from typical data center latency point of view though the server hardware is not the most recent.

First lets look at baseline when cluster has no load but running the script doing writes to DPE1 and immediately reading from DPE2

Summary: 94 out of 10000 rounds (0.94%)  Delay distribution: Min: 0.71 ms;  Max: 2.16 ms Avg: 0.89 ms

These results tell me 2 things. First Replication by default in Percona XtraDB Cluster is Asynchronous from Data Propagation Standpoint – it takes time (though short one in this case) for changes committed on the one node to become visible to the other. Second – it is actually doing quite well with less than 1% of tests able to see any inconsistency and the delay being less than 1ms in average with rather stable results.

But we do not setup clusters to be idle right ? So lets to another test, now running the Sysbench load on DPE1. With concurrency of 32 this corresponds to pretty significant load.

sysbench --test=oltp   --mysql-user=root --mysql-password="" --oltp-table-size=1000000 --num-threads=32  --init-rng=on --max-requests=0 --oltp-auto-inc=off --max-time=3000 run

Results become as follows:

Summary: 3901 out of 10000 rounds (39.01%)  Delay distribution: Min: 0.66 ms;  Max: 201.36 ms Avg: 3.81 ms
Summary: 3893 out of 10000 rounds (38.93%)  Delay distribution: Min: 0.66 ms;  Max: 42.9 ms Avg: 3.76 ms

As expected we can observe inconsistency much more frequently in almost 40% though the average observed delay remains just few milliseconds, which is something most applications would not even notice.

Now if we run sysbench on DPE2 (the load on the node which we’re reading from)

Summary: 3747 out of 10000 rounds (37.47%)  Delay distribution: Min: 0.86 ms;  Max: 108.15 ms Avg: 8.62 ms
Summary: 3721 out of 10000 rounds (37.21%)  Delay distribution: Min: 0.81 ms;  Max: 291.81 ms Avg: 8.54 ms

We can observe the effect in similar amount of cases but delay is higher in this case both in average and the Maximum one. This tells me from data propagation standpoint the cluster is more sensitive to the load on the nodes which receive the data, not the ones where writes are done.

Lets remember though what Sysbench OLTP has only rather small portion of writes. What if we look at workloads which consists 100% of Writes. We can do it with Sysbench, for example:

sysbench --test=oltp --oltp-test-mode=nontrx --oltp-nontrx-mode=update_key  --mysql-user=root --mysql-password="" --oltp-table-size=1000000 --num-threads=32  --init-rng=on --max-requests=0 --max-time=3000 run

Running this load on DPE1 I’m getting:

Summary: 1062 out of 10000 rounds (10.62%)  Delay distribution: Min: 0.71 ms;  Max: 285.07 ms Avg: 3.21 ms
Summary: 1113 out of 10000 rounds (11.13%)  Delay distribution: Min: 0.81 ms;  Max: 275.94 ms Avg: 5.06 ms

Surprise! results are actually better than if we put mixed load as we can observe any delay only in about 11%.

However if we run the same side load on DPE2 we get:

Summary: 5349 out of 10000 rounds (53.49%)  Delay distribution: Min: 0.81 ms;  Max: 519.61 ms Avg: 5.02 ms
Summary: 5355 out of 10000 rounds (53.55%)  Delay distribution: Min: 0.81 ms;  Max: 526.95 ms Avg: 5.06 ms

Which is the worse result out there with over 50% samples produced inconsistent data and average delay for those over 5ms and outliers going to half a second.

From these results I read the side load on the node TO which updates are being propagated causes largest delay.

At this point I remembered there is one more test I can run. What is if I put side load on DPE3 server, from which I’m not touching from the test at all ?

Summary: 833 out of 10000 rounds (8.33%)  Delay distribution: Min: 0.66 ms;  Max: 353.61 ms Avg: 2.76 ms

No surprise here as DPE3 is not being directly read or written to the load on it should cause minimal delays to data propagation from DPE1 to DPE2.

The propagation latency we’ve observed in the test so far is quite good but it is not synchronous replication behavior – we still can’t treat the cluster as if it were single server from generic application. Right. Default configuration for Percona XtraDB Cluster at this point is to replicate data asynchronously, but still guaranty there is no conflicts and data inconsistency then updates are done on multiple nodes. There is an option you can enable to get fully synchronous replication behavior:

mysql> set global wsrep_causal_reads=1;
Query OK, 0 rows affected (0.00 sec)

When this option is enabled the cluster will wait for the data to be actually replicated (committed) before serving the read. The great thing is wsrep_causal_reads is session variables so you can mix different applications on the same cluster – some requiring better data consistency guarantees, other being OK with a little bit stale data but looking for the best performance possible.

So far so good. We can make cluster to handle significant load with small transactions and still have very respectful data propagation delay or we can enable wsrep_causal_reads=1 option and get full data consistency. But what happens if we have some larger transactions ? To test this I have created the copy of sbtest table and will run a long update while running my test to see how the latency is impacted:

mysql> update sbtest2 set k=k+1;
Query OK, 1000000 rows affected (1 min 14.12 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0

Running this query on the DPE1 box I’m getting following result:

...
Result Mismatch for Value 48;  Retries: 1   Delay: 0.76 ms
Result Mismatch for Value 173;  Retries: 1   Delay: 1.21 ms
Result Mismatch for Value 409;  Retries: 1   Delay: 0.86 ms
Result Mismatch for Value 460;  Retries: 142459   Delay: 46526.7 ms
Result Mismatch for Value 461;  Retries: 65   Delay: 22.92 ms
Result Mismatch for Value 464;  Retries: 1   Delay: 0.71 ms
Result Mismatch for Value 465;  Retries: 1   Delay: 0.76 ms
...
Summary: 452 out of 10000 rounds (4.52%)  Delay distribution: Min: 0.66 ms;  Max: 46526.7 ms Avg: 104.28 ms

So the propagation delay was pretty good until this given query had to be replicated, in which case we could observe the replication delay for over 45 seconds which is quite nasty.
Note though delay was for less period than it takes to execute the query on the master. This is because application of the changes on the master in parallel and updates to the sbtest table and sbtest2 table can be done in parallel (even changes to the same table can) but the certification process is serial as well as sending write set to the other nodes, and it must be taking some 45 seconds to send the write set and perform certification.

If we run the same query on DPE2 the interesting thing happens. The script does not show any data propagation delays but it visibly stalls, as I guess because the UPDATE statement issued to DPE1 is blocked for some time. To check this idea I decided to use the sysbench script with very simple point update queries to see if we get any significant stalls. My base run on DPE1 is as follows:

root@dpe01:/etc/mysql# sysbench --test=oltp --oltp-auto-inc=off --oltp-test-mode=nontrx --oltp-nontrx-mode=update_key  --mysql-user=root --mysql-password="" --oltp-table-size=1000000 --num-threads=1  --init-rng=on --max-requests=0 --max-time=300 run
....
    per-request statistics:
         min:                                  0.68ms
         avg:                                  0.88ms
         max:                                306.80ms
         approx.  95 percentile:               0.94ms
....

We can see quite respectful performance with longest request taking some 300ms – so no stalls. Lets do the run again now running the same update statement on the different cluster node:

per-request statistics:
         min:                                  0.69ms
         avg:                                  1.12ms
         max:                              52334.76ms
         approx.  95 percentile:               0.97ms

As we see there is a stall in update for 50+ second, again while certification is happening. So certification does not only delay data propagation but can stall updates done to the different tables on the different nodes.

Summary:

Percona XtraDB Cluster performs very well when it comes to small transactions offering very small propagation delay and an option of synchronous replication all together. However when it comes to large transactions you can get in a lot of trouble with major stalls both in terms of data propagation and in terms of writes. The system I did test on is pretty old and I would expect modern systems can run certification several times faster still taking tens of seconds for what I would consider medium size transaction modifying 1 million of rows is rather long time. So make sure to have a good understanding how large transactions your application has and how longs stalls it can handle.

Appendix:
As promised the script I was using for testing.

# The idea with this script is as follows. We have 2 nodes. We write to one node and when read from second node
# To see whenever we get the same data or different
$writer_host="dpe01";
$reader_host="dpe02";
$user="test";
$password="test";
$table="test.sbtest";
$increment=2;
$offset=1;
$max_id=1000;
$rounds=10000;
$writer=new mysqli($writer_host,$user,$password);
$reader=new mysqli($reader_host,$user,$password);
$total_delay=0;
$min_delay=100000000;
$max_delay=0;
$delays=0;
$sum_delay=0;
for($val=0; $val<$rounds;$val++)
{
$id=rand(1,$max_id);
$id=floor($id/$increment)*$increment+$offset;
$writer->query(“UPDATE $table set k=$val where id=$id”);
$tw=microtime(true);
/* Loop while we get the right result */
$retries=0;
while(true)
{
$result=$reader->query(“SELECT k from $table where id=$id”);
$row=$result->fetch_row();
if ($row[0]!=$val)
$retries++;
else
{
$tr=microtime(true);
break;
}
$result->close();
}
if ($retries!=0) /* If we had to retry compute stats */
{
$delay=round(($tr-$tw)*1000,2);
$delays++;
$sum_delay+=$delay;
$min_delay=min($min_delay,$delay);
$max_delay=max($max_delay,$delay);
echo(“Result Mismatch for Value $val; Retries: $retries Delay: $delay ms\n”);
}
}
if ($delays>0)
$avg_delay=round($sum_delay/$delays,2);
else
$avg_delay=0;
$delay_pct=round($delays/$val*100,3);
echo(“Summary: $delays out of $val rounds ($delay_pct%) Delay distribution: Min: $min_delay ms; Max: $max_delay ms Avg: $avg_delay ms\n”);
?>

Appendix2: Percona XtraDB Cluster related configuration

# PXC Settings for  Version: '5.5.29-55-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  Percona XtraDB Cluster (GPL), wsrep_23.7.2.r3843
wsrep_node_address=10.9.9.1
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_cluster_address=gcomm://10.9.9.1,10.9.9.2,10.9.9.3
#wsrep_cluster_address=gcomm://
wsrep_slave_threads=8
wsrep_sst_method=xtrabackup
wsrep_cluster_name=DPE
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
wsrep_sst_auth=root:

The post Investigating MySQL Replication Latency in Percona XtraDB Cluster appeared first on MySQL Performance Blog.

↧

Accessing Percona XtraDB Cluster nodes in parallel from PHP using MySQL asynchronous queries

March 6, 2013, 6:47 am

≫ Next: Oracle Technical Experts at the Percona Live MySQL Conference and Expo

≪ Previous: Investigating MySQL Replication Latency in Percona XtraDB Cluster

Accessing Percona XtraDB Cluster nodes in parallel with MySQL asynchronous calls

This post is followup to Peter’s recent post, “Investigating MySQL Replication Latency in Percona XtraDB Cluster,” in which a question was raised as to whether we can measure latency to all nodes at the same time. It is an interesting question: If we have N nodes, can we send queries to nodes to be executed in parallel?

To answer it, I decided to try a new asynchronous call to send a query to MySQL using a new MySQLnd driver for PHP. In this post I’ll only show how to make these calls, and in following posts how to measure latency to all nodes.

PHP does not provide a way for parallel execution, so this is where an asynchronous call helps. What does this call do? Basically we send a query to MySQL, and we do not wait for result but rather get response later.

The MySQLnd driver has been available since PHP 5.3, and in most part it mimics the standard MySQLi driver and functions it provides. But in addition to that, it also provides a function, mysqli_poll, which unfortunately is marked as “not documented,” however we still can use it — using an example from PHP docs.

So there is my example on how to access Percona XtraDB Cluster nodes in parallel:

$reader_hosts = array( "192.88.225.243", "192.88.225.242", "192.88.225.240", "192.88.225.160", "192.88.225.159" );
$all_links=array();
# Establish connections
foreach ($reader_hosts as $i) {
        $mysqli = new mysqli($i, 'root', '', 'test');
        if ($mysqli->connect_error) {
                echo 'Connect Error (' . $mysqli->connect_errno . ') '
                        . $mysqli->connect_error;
        }else{
                $all_links[]=$mysqli;
                $mysqli->query("SET wsrep_causal_reads=1");
        }
}
# Run queries in parallel:
foreach ($all_links as $linkid => $link) {
 $link->query("SELECT something FROM tableN WHERE ", MYSQLI_ASYNC);
}
$processed = 0;
do {
        $links = $errors = $reject = array();
        foreach ($all_links as $link) {
                $links[] = $errors[] = $reject[] = $link;
        }
    # loop to wait on results
    if (!mysqli_poll($links, $errors, $reject, 60)) {
        continue;
    }
    foreach ($links as $k=>$link) {
        if ($result = $link->reap_async_query()) {
            $res = $result->fetch_row();
            # Handle returned result
            mysqli_free_result($result);
        } else die(sprintf("MySQLi Error: %s", mysqli_error($link)));
        $processed++;
    }
} while ($processed < count($all_links));

As conclusion, we see that using PHP with MySQLnd drivers we can execute 5 MySQL asynchronous queries in parallel against 5 different nodes of Percona XtraDB Cluster.

The post Accessing Percona XtraDB Cluster nodes in parallel from PHP using MySQL asynchronous queries appeared first on MySQL Performance Blog.

↧

Oracle Technical Experts at the Percona Live MySQL Conference and Expo

March 12, 2013, 5:00 am

≫ Next: Percona MySQL University coming to Toronto this Friday!

≪ Previous: Accessing Percona XtraDB Cluster nodes in parallel from PHP using MySQL asynchronous queries

I’m pleased to announce that Oracle is sending some of their top technical people to speak at the Percona Live MySQL Conference and Expo. The conference takes place April 22-25, 2013 at the Santa Clara Convention Center and Hyatt Santa Clara.

Tomas Ulin, VP, MySQL Engineering for Oracle, will present an invited keynote talk on “Driving MySQL Innovation” during the Tuesday morning opening keynotes. With the recent release of MySQL 5.6, conference attendees will hear about the latest developments of this major MySQL release.

In addition to Tomas, Oracle MySQL technologists will also lead three breakout sessions at the Percona Live MySQL Conference:

• “MySQL 5.6: Performance Benchmarks, Tuning, and ‘Best’ Practices” by Dmitri Kravtchuk, MySQL Performance Architect

• “MySQL 5.6: What’s New in InnoDB” by Sunny Bains, Senior Principal Software Engineer

• “MySQL 5.6: Redefining Replication” by Luís Soares, Senior Software Engineer

Percona has always encouraged participation by all MySQL-related technologies in the Percona Live MySQL Conference and Expo. Over the past two years we have extended invitations to all major ecosystem participants to take part. We are very pleased to welcome Oracle speakers to the conference where they can join the MySQL community conversations and provide more enriching content around MySQL 5.6 and the future of Oracle MySQL.

Percona Live MySQL Conference and Expo

Percona Live MySQL Conference and Expo Logo

The 2013 Percona Live MySQL Conference and Expo promises to be bigger, better and more informative than 2012. The conference has more great speakers, more great sponsors, and a third conference day added to the schedule. Outstanding contributions from MySQL users and vendors in the form of keynotes, tutorials, breakout sessions, and more, ensure that this will be the liveliest, most engaging Percona Live event yet.

I invite you to join us in Santa Clara, April 22-25, by taking advantage of Advanced Rate Pricing now. As a special bonus, use discount code “Percona15” to receive 15% off the current rates (not valid for Expo Only passes).

See you in Santa Clara!

The post Oracle Technical Experts at the Percona Live MySQL Conference and Expo appeared first on MySQL Performance Blog.

↧

Percona MySQL University coming to Toronto this Friday!

March 20, 2013, 3:00 am

≫ Next: My Sessions at Percona Live MySQL Conference and Expo 2013

≪ Previous: Oracle Technical Experts at the Percona Live MySQL Conference and Expo

Percona CEO Peter Zaitsev leads a track at the inaugural Percona MySQL University event in Raleigh, N.C. on Jan. 29, 2013.

Percona CEO Peter Zaitsev leads a track at Percona MySQL University in Raleigh, N.C. on Jan. 29, 2013.

Percona MySQL University, Toronto is taking place this Friday and I’m very excited about this event because it is a special opportunity to fit a phenomenal number of specific and focused MySQL technical talks all into one day, for free.

Over the course of the day we will cover some of the hottest topics in the MySQL space. There will be talks covering topics like MySQL 5.6, MySQL in the Cloud and High Availability for MySQL, as well as Percona XtraDB Cluster for MySQL. We have talks planned for nearly every MySQL developer and DBA, from anyone just starting with MySQL all the way to those of you who are seasoned MySQL Experts.

In addition to the conference presentations, we are providing you the opportunity to spend 45 (complementary) minutes one-on-one with a Percona MySQL Consulting expert. Do not delay in reserving your time with a Percona Expert, as the number of available slots are very limited. Reserve your slot from Percona MySQL University event registration page.

Finally we’re going to have a raffle at the end of the day. The prizes include a ticket next month’s Percona Live MySQL Conference and Expo 2013 – the largest event of the year in MySQL ecosystem, a full week of Percona Training, a Percona MySQL Support subscription, and more valuable prices.

Percona MySQL University will be held at the FreshBooks offices at 35 Golden Avenue, Suite 105, Toronto, Ontario M6R 2J5. Directions along with the day’s complete agenda and registration details are on our Eventbright page.

We hope you’ll join us and invite your friends and colleagues !

The post Percona MySQL University coming to Toronto this Friday! appeared first on MySQL Performance Blog.

↧

My Sessions at Percona Live MySQL Conference and Expo 2013

April 1, 2013, 9:05 am

≫ Next: Keynotes, BOFs, and the Community Networking Reception at Percona Live MySQL Conference and Expo

≪ Previous: Percona MySQL University coming to Toronto this Friday!

As is typical at the beginning of every April, many of us who submitted talks to the Percona Live MySQL Conference and Expo are wondering why we submitted so many. I had 3 official talks selected, including one that is a 6-hour tutorial, as well as a BoF. Here’s the highlights:

Percona XtraDB Cluster / Galera in Practice (6 hour tutorial):

I’ve been working on this tutorial since last summer. The first incarnation was at Percona Live NY last year, but I’ve altered it quite a bit and expanded to fit (hopefully) a 6 hour format. Expect a lot of down and dirty hands-on work with setting up, managing and monitoring PXC. MariaDB Galera Cluster users should fit right in:

http://www.percona.com/live/mysql-conference-2013/sessions/percona-xtradb-cluster-galera-practice-part-1

http://www.percona.com/live/mysql-conference-2013/sessions/percona-xtradb-cluster-galera-practice-part-2

Migrating to Percona XtraDB Cluster for MySQL

This is a 1-hour talk that covers the ins and outs of what it takes (and means) to migrate to PXC. Expect high level over view of PXC architectures with use cases (and anti-use cases), as well as a practical look at what configuring a PXC setup looks like. Again, Galera / MariaDB Galera Cluster users should fit right in here.

http://www.percona.com/live/mysql-conference-2013/sessions/migrating-percona-xtradb-cluster

The Hazards of Multi-writing in a Dual-Master setup

This talks covers the basics of why multi-writing multi-master architectures without a replication technology that somehow prevents or handles replication conflicts is a usually bad idea. If you’ve considered an architecture that uses standard MySQL async replication with any kind of circular replication and multi-node simultaneous writing, you should check this out:

http://www.percona.com/live/mysql-conference-2013/sessions/hazards-multi-writing-dual-master-setup

BOF: Galera / Percona XtraDB Cluster for MySQL

I also submitted a Birds of Feather talk for Galera and PXC. I know Percona PXC experts like myself and the Codership folks should be in attendance. I can’t speak for any representatives from the MariaDB Galera Cluster team, but they are certainly most welcome. I’d really like to focus this BoF on real-life experiences and “hard” questions about Galera that we can ask of Codership instead of yet another “Galera 101″ session if we can help it.

http://www.percona.com/live/mysql-conference-2013/sessions/bof-galera-percona-xtradb-cluster

Hope to see you at the conference! With all of the Galera talks at the conference, this is really the premier Galera conference of the year!

The post My Sessions at Percona Live MySQL Conference and Expo 2013 appeared first on MySQL Performance Blog.

↧

Keynotes, BOFs, and the Community Networking Reception at Percona Live MySQL Conference and Expo

April 15, 2013, 6:00 am

≫ Next: Percona XtraDB Cluster 5.5.30-23.7.4 for MySQL now available

≪ Previous: My Sessions at Percona Live MySQL Conference and Expo 2013

The Percona Live MySQL Conference and Expo begins next Monday and runs April 22-25, 2013. Attendees will see great keynotes from leaders in the industry including representatives from Oracle, Amazon Web Services, HP, Continuent, and Percona. They can also participate in thought provoking Birds of a Feather sessions on Tuesday night and the Wednesday night Community Networking Reception will be fun and entertaining with the presentation of the Community Awards and the Lightning Talks.

If you cannot attend the entire Percona Live MySQL Conference but want to take advantage of the keynotes, BOFs, and Community Networking Reception, I’m pleased to offer a limited number of $5 Expo Only passes. Use discount code “KEY” when registering for the Percona Live MySQL Conference. Hurry, though, as only 100 passes are available at this price! This discount is only available for new ticket purchases. The regular price for Expo Only passes is $50 prior to the conference and $100 onsite.

Percona Live MySQL Conference and Expo Logo

I’m personally looking forward to Matt Aslett’s keynote on Thursday morning of the Percona Live MySQL Conference. Matt is the Research Director, Data Management and Analytics for 451 Research. The description for his talk, “The State of the MySQL Ecosystem“, summarizes what we’re all coming to understand:

“It is now over three years since Oracle acquired MySQL along with Sun Microsystems. Fears for the open source database’s survival appear to have been misplaced as Oracle has increased investment in MySQL development. At the same time, a thriving ecosystem of potential alternatives and complementary products has emerged to provide MySQL users with greater choice in terms of both functionality and support. As a result of that choice, we are seeing the increasing independence of the ecosystem of MySQL-related products and services from MySQL itself – both in terms of a commercial product, and also a development project. The continued maturity of vendors such as Percona and SkySQL, as well as the formation of the MariaDB Foundation, has the potential to accelerate that trend. The MySQL ecosystem is far from fragmenting, but 451 Research’s updated survey of database users indicates that the center of gravity has begun to shift towards an increased state of independence.”

Mirroring the growing diversity in the MySQL ecosystem, Percona Live MySQL Conference attendees have an opportunity to hear from a variety of server projects in both the keynotes and breakout sessions including presentations on Oracle MySQL, Percona Server, and MariaDB during the Percona Live MySQL Conference.

Percona Live MySQL Conference Keynotes

Tuesday

Peter Zaitsev, Co-founder and CEO, Percona
“Our Ever-Evolving MySQL Ecosystem”
Simone Brunozzi, Senior Technology Evangelist, Amazon Web Services
“Databases in the Cloud: Present and Future”
Tomas Ulin, Vice President of MySQL Engineering, Oracle
Keynote Topic: “Driving MySQL Innovation”

Wednesday

Robert Hodges, CEO, Continuent
“How MySQL Can Thrive in the World of Massive Data Hype”
Keynote Panel from Amazon Web Services, Continuent, HP and Percona
“The Impact of MySQL 5.6 and Its Future in the Cloud”

Thursday

Matt Aslett, Research Manager, Data Management and Analytics, 451 Research
“The State of the MySQL Ecosystem”
Brian Aker, Fellow, HP Cloud Division
“The MySQL Ecosystem and Cloud Computing”

Breakout Sessions on Oracle MySQL

Sunny Bains, Senior Principal Software Engineer, Oracle
Topic: “MySQL 5.6: What’s New in InnoDB”
Dmitri Kravtchuk, MySQL Performance Architect, Oracle
Topic: “MySQL 5.6: Performance Benchmarks, Tuning, and ‘Best’ Practices”
Luís Soares, Senior Software Engineer, Oracle
Topic: “MySQL 5.6: Redefining Replication”

Breakout Sessions on Percona Server and Related Projects

Stewart Smith, Director of Server Development, Percona
Topic: “Percona Server 5.6”
Jay Janssen, Consulting Lead, Percona
Topic: “Migrating to Percona XtraDB Cluster”
Mark Atwood, Vipul Sabhaya, and Jim Cooley, HP
Topic: “Using Percona Server as Database-as-a-Service on OpenStack”

Breakout Sessions on MariaDB

Monty Widenius and Colin Charles, Chief Evangelist for MariaDB for Monty Program
Topic: “MariaDB 10.0 & What’s New With the Project”
Sergei Petrunia, Optimizer Developer, and Colin Charles, Chief Evangelist, MariaDB, for Monty Program Ab
Topic: “MariaDB Cassandra Interoperability”
Monty Widenius and Sergei Golubchik, VP Architecture, Monty Program Ab
Topic: “Replication Changes in MariaDB”
Sergei Golubchik, VP Architecture, Monty Program Ab
Topic: “Storage Engines and Other Plugins: What’s New?”

The Percona Live MySQL Conference includes a Diamond Sponsor Keynote panel on Wednesday morning on the “Impact of MySQL 5.6 and its Future in the Cloud”. Moderated by me, the panel will include MySQL industry leaders Simone Brunozzi, senior technology evangelist at Amazon Web Services; Robert Hodges, CEO of Continuent; Brian Aker, fellow, HP Cloud Division; and Peter Zaitsev, co-founder and CEO of Percona. The discussion will focus on MySQL 5.6 and how MySQL must evolve if it is to remain competitive in the new world order of the cloud and big data.

If you can join us in Santa Clara for the Percona Live MySQL Conference and Expo, use discount code “Percona15″ to receive 15% off your full conference pass. If you can only make it to the keynotes, BOFs, or Community Networking Reception, use discount code “KEY” for a $5 Expo Only pass. And if you cannot make it this year, watch this blog following the conference and we’ll announce when and where the keynote recordings and breakout session slide decks can be found.

The post Keynotes, BOFs, and the Community Networking Reception at Percona Live MySQL Conference and Expo appeared first on MySQL Performance Blog.

↧

Percona XtraDB Cluster 5.5.30-23.7.4 for MySQL now available

April 16, 2013, 11:33 pm

≫ Next: Follow these basics when migrating to Percona XtraDB Cluster for MySQL

≪ Previous: Keynotes, BOFs, and the Community Networking Reception at Percona Live MySQL Conference and Expo

Percona is glad to announce the release of Percona XtraDB Cluster 5.5.30-23.7.4 for MySQL on April 17, 2013. Binaries are available from the downloads area or from our software repositories.

New Features:

Percona XtraDB Cluster has implemented initial implementation of weighted quorum. Weight for node can be assigned via pc.weight option in the wsrep_provider_options variable. Accepted values are in the range [0, 255] (inclusive). Quorum is computed using weighted sum over group members.
Percona XtraDB Cluster binary will now be bundled with the libjemalloc library. For RPM/deb packages, this library will be available for download from our repositories. Benchmark showing the impact of memory allocators on MySQL performance can be found in this blogpost.
This release of Percona XtraDB Cluster has fixed number of foreign key and packaging bugs.

Bug Fixes:

Fixed yum dependencies that were causing conflicts in CentOS 6.3 during installation. Bug fixed #1031427 (Ignacio Nin).
In case the Percona XtraDB Cluster was built from the source rpm, wsrep revision information would be missing. Bug fixed #1128906 (Alexey Bychko).
The method of generating md5 digest over tuples in a table with no primary key was not deterministic which could lead to a node failure. Bug fixed #1019473 (Seppo Jaakola).
Percona XtraDB Cluster was built with YaSSL which could cause some of the programs that use it to crash. Fixed by building packages with OpenSSL support rather than the bundled YaSSL library. Bug fixed #1104977 (Raghavendra D Prabhu).
Clustercheck script would hang in case the MySQL server on a node is hung. As a consequence clustercheck script would never fail-over that server. Bug fixed #1035927 (Raghavendra D Prabhu).
High values in variables evs.send_window and evs.user_send_window could trigger cluster crash under high load. Bug fixed #1080539 (Teemu Ollakka).
Standard MySQL port would be used when port number wasn’t explicitly defined in the wsrep_node_incoming_address. Bug fixed #1082406 (Alex Yurchenko).
Dropping a non-existing temporary table would be replicated when TOI was used in wsrep_OSU_method variable. This bug was fixed for the case when DROP TEMPORARY TABLE statement was used, but it will still replicate in case DROP TABLE statement is used on a temporary table. Bug fixed #1084702 (Seppo Jaakola).
In case two nodes in a 3-node cluster had to abort due to inconsistency, one wouldn’t correctly notify the surviving node which would lead to surviving node to loose the primary component and cause subsequent downtime. Bug fixed #1108165 (Alex Yurchenko).
In some cases non-uniform foreign key reference could cause a slave crash. Fixed by using primary key of the child table when appending exclusive key for cascading delete operation. Bug fixed #1089490 (Seppo Jaakola).
Parallel applying would fail in case mixed CHAR and VARCHAR columns would be used in foreign key definitions. Bug fixed #1100496 (Seppo Jaakola).
Debian packages included the old version of innotop. Fixed by removing innotop and its InnoDBParser Perl package from source and Debian installation. Bug fixed #1032139 (Alexey Bychko).
The mysqld_safe script would fail to retrieve the Galera replication position on Ubuntu 10.04, because the different shell was used. Bug fixed #1108431 (Alex Yurchenko).
Cascading foreign key constraint could lead to unresolved replication conflict and leave a slave hanging. Bug fixed #1130888 (Seppo Jaakola).
If MySQL replication threads were started before running wsrep recovery, this would lead to memory corruption and server crash. Bug fixed #1132974 (Seppo Jaakola).
Conflicting prepared statements in multi-master use case could cause node to hang. This was happening due to prepared statement execution loop, which does not honor wsrep status codes correctly. Bug fixed #1144911 (Seppo Jaakola).
State Snapshot Transfer with Xtrabackup would fail if the tmpdir was specified more than once in the MySQL configuration file (my.cnf). Bugs fixed #1160047 and #1086978 (Raghavendra D Prabhu).
Donor node would run XtraBackup indefinitely when xtrabackup tmpdir was set up on tmpfs. Bug fixed #1086978 (Alex Yurchenko).
Issues with compiling Galera on the ARM architecture has been fixed. Bug fixed #1133047 (Alex Yurchenko).
Upstream bugfix for bug #59354 triggered a regression that could cause transaction conflicts. Bug fixed #1158221 (Seppo Jaakola).
Galera builds would fail when they were built with the new boost library. Bug fixed #1131736 (Alex Yurchenko).
Folder lost+found wasn’t included in the rsync SST filter, this caused the SST failure due to insufficient privileges. Fixed by excluding lost+found folder if found. Bug fixed #1154095 (Alex Yurchenko).
If variable innodb_thread_concurrency has been defined to throttle InnoDB access, and work load contained DDL statements, a cluster node could remain hanging for unresolved MDL conflict. Fixed by adding a new method to cancel a thread waiting for InnoDB concurrency. Bug fixed #1155183 (Seppo Jaakola).
Handling of the network issues in Galera has been improved. Bug fixed #1153727 (Teemu Ollakka).
Fixed the wrong path in the /etc/xinetd.d/mysqlchk script. Bugs fixed #1000761 and #1132934 (Raghavendra D Prabhu).
When upgrading the Percona-XtraDB-Cluster-server package, /usr/bin/clustercheck script would get overwritten, and any changes (such as username and password) would be lost. Bug fixed #1158443 (Raghavendra D Prabhu).
In case CREATE TABLE AS SELECT statement was running in parallel with the DDL statement on the selected table, in some cases first statement could be left hanging. Bug fixed #1164893 (Seppo Jaakola).
Galera builds would fail when gcc 4.8 was used. Bug fixed #1164992 (Alex Yurchenko).
Percona-XtraDB-Cluster-galera package version number didn’t match the wsrep_provider_version one. Bug fixed #1111672 (Alexey Bychko).
Only rpm debug build was available for Percona XtraDB Cluster, fixed by providing the deb debug build as well. Bug fixed #1096123 (Ignacio Nin).

Other bug fixes: bug fixed #1162421 (Seppo Jaakola), bug fixed #1093054 (Alex Yurchenko), bug fixed #1166060 (Teemu Ollakka), bug fixed #1166065 (Teemu Ollakka).

Based on Percona Server 5.5.30-30.2 including all the bug fixes in it and on Codership wsrep API 5.5.30-23.7.4, Percona XtraDB Cluster 5.5.30-23.7.4 is now the current stable release. All of Percona’s software is open-source and free. Release notes for Percona XtraDB Cluster 5.5.30-23.7.4 are available in our online documentation.

We did our best to eliminate bugs and problems, but this is a software, so bugs are expected. If you encounter them, please report them to our bug tracking system.

UPDATE[18-04-2013]: There was a RPM packaging regression introduced with the fix for bug #710799. This regression only affected clean RPM installations and not upgrades. We have pushed the fixed packages to the repositories. Bug fixed #1170024.

The post Percona XtraDB Cluster 5.5.30-23.7.4 for MySQL now available appeared first on MySQL Performance Blog.

↧

Follow these basics when migrating to Percona XtraDB Cluster for MySQL

May 1, 2013, 3:00 am

≫ Next: Galera Flow Control in Percona XtraDB Cluster for MySQL

≪ Previous: Percona XtraDB Cluster 5.5.30-23.7.4 for MySQL now available

Galera/Percona XtraDB Cluster (PXC) for MySQL is a hot thing right now and some users jump right in without enough testing. Consequently, they’re more likely to either suffer failure or issues that prevent them from moving forward. If you are thinking of migrating your workload to Percona XtraDB Cluster, make sure to go through these basics.

log_slave_updates is REQUIRED

You need to have log_slave_updates enabled on the cluster node acting as async slave for replicated events from the async master to be applied to the other nodes, that is if you have more than one PXC node. This is because before Galera can create writesets for the replicated events, binlog events must be generated for the transactions first. Under normal async replication, an event will not be written to the slave’s binary unless log_slave_updates is enabled, this is similar to Percona XtraDB Cluster in that if you want an async event replicated to the whole cluster you have to have the same enabled.

MyISAM in PXC May Lead to Inconsistencies and May Not Even Work!

MyISAM tables are supported within Percona XtraDB Cluster, however, MyISAM has only basic support, primarily because the storage engine is non-transactional and so PXC cannot guarantee the data will remain consistent within the cluster. Also, at the time of this writing, from async stream, MyISAM is not being replicated at all which I reported on this bug. This would be a showstopper for anyone who wants to, but still have MyISAM tables. You can still try by filtering MyISAM tables though if you can leave them behind. Lastly, once that bug above is fixed, and you still have MyISAM tables you wish to keep running under PXC, wsrep_replicate_myisam allows you to do so. However, if you can, you should consider moving to InnoDB altogether. There are very few reasons to stay with MyISAM nowadays i.e. if you have FULLTEXT you simply cannot replace in short term.

Control Your Auto-Incrementing Columns

PXC/Galera controls auto-incrementing values internally within the cluster, this is to avoid collisions when INSERTs are happening on not only a single node. However, this may work differently when replicating from an async master, for example like the one described on these two bugs. Galera use writesets to replicate cluster events to the other nodes, in essence these are RBR events, plus a few additional structures used for certification. Having said that, it would be good if your async master can use ROW based binlog format as well to achieve better consistency, if you have an async master <= 5.0 though, you can workaround this by turning off wsrep_auto_increment_control from the Percona XtraDB Cluster nodes as workaround. Note that with the latter, make sure to not forget turning the feature back on when you switch to the new cluster especially if you are planning to write on multiple nodes.

Have PRIMARY KEYS

If you still have tables without PRIMARY KEYs, then its time to make one for them. Galera does not work well with those and even if there is basic support when wsrep_certify_nonPK is enabled, you can still hit issues like when automatic creation of primary keys for use during certification becomes non-deterministic. Although the previous bug has been fixed on latest release (5.5.30-23.7.4), table without PK imposes an additional overhead, and because cluster performance is somewhat dependent on the slowest node – this overhead can easily become visible on the whole cluster and your async replication being affected.

Be Prepared for some Latency

PXC can take workloads, however not just any workload – it shines with small transactions but not with big ones. If you are consistently running overnight reporting jobs and getting them through the replication stream expect some replication lag. This is because because synchronous replication inside PXC has an additional overhead, this means the SQL_THREAD will not be able to execute events as fast, on top of that, other factors affecting async replication like if your workload is CPU or IO bound. Peter wrote some good details about it here.

If you have encountered any other issues replicating to Percona XtraDB Cluster, I’d like to hear your thoughts and experience on the comments

The post Follow these basics when migrating to Percona XtraDB Cluster for MySQL appeared first on MySQL Performance Blog.

↧

Galera Flow Control in Percona XtraDB Cluster for MySQL

May 2, 2013, 3:00 am

≫ Next: Percona XtraDB Cluster for MySQL and encrypted Galera replication

≪ Previous: Follow these basics when migrating to Percona XtraDB Cluster for MySQL

Last week at Percona Live, I delivered a six-hour tutorial about Percona XtraDB Cluster (PXC) for MySQL. I actually had more material than I covered (by design), but one thing I regret we didn’t cover was Flow control. So, I thought I’d write a post covering flow control because it is important to understand.

What is flow control?

One of the things that people don’t often expect when switching to Galera is existence of a replication feedback mechanism, unlike anything you find in standard async MySQL replication. It is my belief that the lack of understanding of this system, or even that it exists, leads to unnecessary frustration with Galera and cluster “stalls” that are preventable.

This feedback, called flow control, allows any node in the cluster to instruct the group when it needs replication to pause and when it is ready for replication to continue. This prevents any node in the synchronous replication group from getting too far behind the others in applying replication.

This may sound counter-intuitive at first: how would synchronous replication get behind? As I’ve mentioned before, Galera’s replication is synchronous to the point of ensuring transactions are copied to all nodes and global ordering is established, but apply and commit is asynchronous on all but the node the transaction is run on.

It’s important to realize that Galera prevents conflicts to such transactions that have been certified but not yet applied, so multi-node writing will not lead to inconsistencies, but that is beyond the scope of this post.

Tuning flow control

Flow control is triggered when a Synced node exceeds a specific threshold relative to the size of the receive queue (visible via the wsrep_local_recv_queue global status variable). Donor/Desynced nodes do not apply flow control, though they may enter states where the recv_queue grows substantially. Therefore care should be taken for applications to avoid using Donor/Desynced nodes, particularly when using a blocking SST method like rsync or mysqldump.

So, flow control kicks in when the recv queue gets too big, but how big is that? And when is flow control relaxed? There are a few settings that are relevant here, and they are all configured via the wsrep_provider_options global variable.

gcs.fc_limit

This setting controls when flow control engages. Simply speaking, if the wsrep_local_recv_queue exceeds this size on a given node, a pausing flow control message will be sent. However, it’s a bit trickier than that, because of fc_master_slave (see below).

The fc_limit defaults to 16 transactions. This effectively means that this is as far as a given node can be behind committing transactions from the cluster.

gcs.fc_master_slave

The fc_limit is modified dynamically if you have fc_master_slave disabled (which it is by default). This mode actually adjusts the fc_limit dynamically based on the number of nodes in the cluster. The more nodes in the cluster, the larger the calculated fc_limit becomes. The theory behind this is that the larger the cluster gets (and presumably busier with more writes coming from more nodes), the more leeway each node will get to be a bit further behind applying.

If you only write to a single node in PXC, then it is recommended you disable this feature by setting fc_master_slave=YES. Despite its name, this setting really does no more than to change if the fc_limit is dynamically resized or not. It contains no other magic that helps single node writing in PXC to perform better.

gcs.fc_factor

If fc_limit controls when flow control is enabled, then fc_factor addresses when it is released. The factor is a number between 0.0 and 1.0, which is multiplied by the current fc_limit (adjusted by the above calculation if fc_master_slave=NO). This yields the number of transactions the recv queue must fall BELOW before another flow control message is sent by the node giving the cluster permission to continue replication.

This setting traditionally defaulted to 0.5, meaning the queue had to fall below 50% of the fc_limit before replication was resumed. A large fc_limit in this case might mean a long wait before flow control gets relaxed again. However, this was recently modified to a default of 1.0 to allow replication to resume as soon as possible.

An example configuration tuning flow control in a master/slave cluster might be:

mysql> set global wsrep_provider_options="gcs.fc_limit=500; gcs.fc_master_slave=YES; gcs.fc_factor=1.0";

Working with flow control

What happens during flow control

Simply speaking: flow control makes replication stop, and therefore makes writes (which are synchronous) stop, on all nodes until flow control is relaxed.

In normal operation we would expect that a large receive queue might be the result of some brief performance issue on a given node, or perhaps the effect of some large transaction briefly stalling an applier thread.

However, it is possible to halt queue applying on any node by simply by running “FLUSH TABLES WITH READ LOCK”, or perhaps by “LOCK TABLE”, in which case flow control will kick in just as soon as the fc_limit is exceeded. Therefore, care must be taken that your application or some other maintenance operation (like a backup) doesn’t inadvertently cause flow control on your cluster.

The cost of increasing the fc_limit

Keeping the fc_limit small has ~~two~~ three purposes:

It limits the amount of delay any node in the cluster might have applying cluster transactions. Therefore, it keeps reads more up to date without needing to use wsrep_causal_reads.
It minimizes the expense of certification by keeping the window between new transactions being committed and the oldest unapplied transaction small. The larger the queue is, the more costly certification gets. EDIT: actually the cost of certification depends only the size of the transactions, which translates into number of unique key lookups into the certification index, which is a hash table. A small fc_limit does however keep the certification index smaller in memory.
It keeps the certification interval small, which minimizes replication conflicts on a cluster where writes happen on all nodes.

On a master/slave cluster, therefore, it’s reasonable to increase the fc_limit because the only lagging nodes will be the slaves with no writes coming from them. However, with multi-node writing, larger queues will make ~~certification more expensive~~ replication conflicts more likely and therefore time-consuming to the application.

How to tell if flow control is happening and where it is coming from

There are two global status variables you can check to see what flow control is happening:

wsrep_flow_control_paused – the fraction of time (out of 1.0) since the last SHOW GLOBAL STATUS that flow control is effect, regardless of which node caused it. Generally speaking, anything above 0.0 is to be avoided.
wsrep_flow_control_sent – the number of flow control messages sent by the local node to the cluster. This can be used to discover which node is causing flow control.

I would strongly recommend monitoring and graphing wsrep_flow_control_sent so you can tell if and when flow control is happening and what node (or nodes) are causing it.

Using myq_gadgets, I can easily see flow control if I execute a FLUSH TABLES WITH READ LOCK on node3:

[root@node3 ~]# myq_status wsrep
Wsrep    Cluster        Node           Queue   Ops     Bytes     Flow        Conflct
    time  name P cnf  #  name  cmt sta  Up  Dn  Up  Dn   Up   Dn pau snt dst lcf bfa
09:22:17 myclu P   3  3 node3 Sync T/T   0   0   0   9    0  13K 0.0   0 101   0   0
09:22:18 myclu P   3  3 node3 Sync T/T   0   0   0  18    0  28K 0.0   0 108   0   0
09:22:19 myclu P   3  3 node3 Sync T/T   0   4   0   3    0 4.3K 0.0   0 109   0   0
09:22:20 myclu P   3  3 node3 Sync T/T   0  18   0   0    0    0 0.0   0 109   0   0
09:22:21 myclu P   3  3 node3 Sync T/T   0  27   0   0    0    0 0.0   0 109   0   0
09:22:22 myclu P   3  3 node3 Sync T/T   0  29   0   0    0    0 0.9   1 109   0   0
09:22:23 myclu P   3  3 node3 Sync T/T   0  29   0   0    0    0 1.0   0 109   0   0
09:22:24 myclu P   3  3 node3 Sync T/T   0  29   0   0    0    0 1.0   0 109   0   0
09:22:25 myclu P   3  3 node3 Sync T/T   0  29   0   0    0    0 1.0   0 109   0   0
09:22:26 myclu P   3  3 node3 Sync T/T   0  29   0   0    0    0 1.0   0 109   0   0
09:22:27 myclu P   3  3 node3 Sync T/T   0  29   0   0    0    0 1.0   0 109   0   0
09:22:20 myclu P   3  3 node3 Sync T/T   0  29   0   0    0    0 1.0   0 109   0   0
09:22:21 myclu P   3  3 node3 Sync T/T   0  29   0   0    0    0 1.0   0 109   0   0
09:22:22 myclu P   3  3 node3 Sync T/T   0  29   0   0    0    0 1.0   0 109   0   0

Notice node3′s queue fills up, it sends 1 flow control message (to pause) and then Flow control is in a pause state 100% of the time. We can tell flow control came from this node because ‘Flow snt’ shows a message sent as soon as flow control is engaged.

Flow control and State transfer donation

Donor nodes should not cause flow control because they are moved from the Synced to the Donor/Desynced state. Donors in that state will continue to apply replication as they are permitted, but will build up a large replication queue without flow control if they are blocked by the underlying SST method, i.e., by FLUSH TABLES WITH READ LOCK.

The post Galera Flow Control in Percona XtraDB Cluster for MySQL appeared first on MySQL Performance Blog.

↧

Percona XtraDB Cluster for MySQL and encrypted Galera replication

May 3, 2013, 3:00 am

≫ Next: Is Synchronous Replication right for your app?

≪ Previous: Galera Flow Control in Percona XtraDB Cluster for MySQL

Few people realize that Galera/Percona XtraDB (PXC) replication can be encrypted via SSL for secure transfer of your replicated data. Setting this up is actually quite easy to do and probably will look familiar to a lot of people.

Setting up SSL and Galera

Create and propagate a single key/cert pair

First, we create a private key/cert pair:

[root@node1 ssl]# openssl req -new -x509 -days 365000 -nodes -keyout key.pem -out cert.pem
Generating a 2048 bit RSA private key
..............+++
................................................................+++
writing new private key to 'key.pem'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:
State or Province Name (full name) []:
Locality Name (eg, city) [Default City]:
Organization Name (eg, company) [Default Company Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (eg, your name or your server's hostname) []:
[root@node1 ssl]# ls -lah
total 16K
drwxr-xr-x. 2 root root 4.0K Apr  1 12:08 .
dr-xr-x---. 4 root root 4.0K Apr  1 12:03 ..
-rw-r--r--. 1 root root 1.2K Apr  1 12:08 cert.pem
-rw-r--r--. 1 root root 1.7K Apr  1 12:08 key.pem

Note that we are creating a certificate with a very long expiration time. If you use the default expiration of 1 year, your cluster will fail on the first state change after the expiration date has past.

Note also that we currently need to use the same cert and key on every node, so our next step is to copy these files to all our other nodes. Technically you should probably do this over a secure channel between the nodes via ssh or similar:

[root@node1 ssl]# scp  *.pem root@node2:.

Once we have the files on all nodes, let’s put them into /etc/mysql so they are in a common place with correct permissions:

[root@node1 ssl]# mkdir /etc/mysql
[root@node1 ssl]# mv *.pem /etc/mysql
[root@node1 ssl]# cd /etc/mysql
[root@node1 mysql]# chown -R mysql.mysql /etc/mysql/
[root@node1 mysql]# chmod -R o-rwx /etc/mysql/
[root@node1 mysql]# ls -lah
total 16K
drwxr-x---.  2 mysql mysql 4.0K Apr  1 12:12 .
drwxr-xr-x. 60 root  root  4.0K Apr  1 12:12 ..
-rw-r-----.  1 mysql mysql 1.2K Apr  1 12:08 cert.pem
-rw-r-----.  1 mysql mysql 1.7K Apr  1 12:08 key.pem

These are just examples of how you might do it. Just take care to not expose your private key and keep it secure as possible while still getting it copied amongst your nodes.

Configuring Galera

The configuration here is quite easy:

wsrep_provider_options          = "socket.ssl_cert=/etc/mysql/cert.pem; socket.ssl_key=/etc/mysql/key.pem"

We simply configure the wsrep provider with our certificate and key files on all our nodes.

However, it’s not possible to have a mixed cluster where some have SSL and some do not. This is best configured when you are setting up a new cluster, but if you need to add this on a production system, you’ll unfortunately need to rebootstrap the cluster and take a [brief] outage.

In my case, I have an existing non-SSL cluster, so to re-bootstrap, I simply:

[root@node3 mysql]# service mysql stop
[root@node2 mysql]# service mysql stop
[root@node1 mysql]# service mysql stop
[root@node1 mysql]# service mysql start --wsrep_cluster_address=gcomm://
[root@node2 mysql]# service mysql start
[root@node3 mysql]# service mysql start

There should be no need for SST in this case: each node was shutdown cleanly, and brought back up cleanly. As soon as the first node is restarted with SSL enabled, all future nodes must also have it enabled.

Other SSL options

It is also possible to set the following options (though they seem to have sane defaults to me):

socket.ssl_cipher = AES128-SHA by default
socket.ssl_compression = yes by default

Is Synchronous Replication right for your app?

May 14, 2013, 3:00 am

≫ Next: Multicast replication in Percona XtraDB Cluster (PXC) and Galera

≪ Previous: Percona XtraDB Cluster for MySQL and encrypted Galera replication

I talk with lot of people who are really interested in Percona XtraDB Cluster (PXC) and mostly they are interested in PXC as a high-availability solution. But, what they tend not to think too much about is if moving from async to synchronous replication is right for their application or not.

Facts about Galera replication

There’s a lot of different facts about Galera that come into play here, and it isn’t always obvious how they will affect your database workload. For example:

Transaction commit takes approximately the worst packet round trip time (RTT) between any two nodes in your cluster.
Transaction apply on slave nodes is still asynchronous from client commit (except on the original node where the transaction is committed)
Galera prevents writing conflicts to these pending transactions while they are inflight in the form of deadlock errors. (This is actually a form of Eventual Consistency where the client is forced to correct the problem before it can commit. It is NOT the typical form of Eventual Consistency, known as asynchronous repair, that most people think of).

Callaghan’s Law

But what does that all actually mean? Well, at the Percona Live conference a few weeks ago I heard a great maxim that really helps encapsulate a lot of this information and puts it into context with your application workload:

[In a Galera cluster] a given row can’t be modified more than once per RTT

This was attributed to Mark Callaghan from Facebook by Alexey Yurchenko from Codership at his conference talk. Henceforth this will be known as “Callaghan’s law” in Galera circles forever, though Mark didn’t immediately recall saying it.

Applied to a standalone Innodb instance

Let’s break it down a bit. Our unit of locking in Innodb is a single row (well, the PRIMARY KEY index entry for that row). This means typically on a single Innodb node we can have all sorts modifications floating around as long as they don’t touch the same row. Row locks are held for modifications until the transaction commits and that takes an fsync to the redo log by default, so applying Callaghan’s law to single-server Innodb, we’d get:

[On a single node Innodb server] a given row can’t be modified more than the time to fsync

You can obviously relax that by simply not fsyncing every transaction (innodb_flush_log_at_trx_commit != 1), or work around it with by fsyncing to memory (Battery or capacitor-backed write cache), etc., but the principle is basically the same. If we want this transaction to persist after a crash, it has to get to disk.

This has no effect on standard MySQL replication from this instance, since MySQL replication is asynchronous.

What about semi-sync MySQL replication?

It’s actually much worse than Galera. As I illustrated in a blog post last year, semi-sync must serialize all transactions and wait for them one at a time. So, Callaghan’s law applied to semi-sync is:

[On a semi-sync replication master] you can’t commit (at all) more than once per RTT.

Applied to a Galera cluster

In the cluster we’re protecting the data as well, though not by ensuring it goes to disk (though you can do that). We protect the data by ensuring it gets to every node in the cluster.

But why every node and not just a quorum? Well, it turns out transaction ordering really, really matters (really!). By enforcing replication to all nodes, we can (simultaneously) establish global ordering for the transaction, so by the time the original node gets acknowledgement of the transaction back from all the other nodes, a GTID will also (by design) be established. We’ll never end up with non-deterministic ordering of transactions as a result.

So this brings us back to Callaghan’s law for Galera. We must have group communication to replicate and establish global ordering for every transaction, and the expense of doing that for Galera is approximately one RTT between the two nodes in the cluster that are furthest apart (regardless of where the commit comes from!). The least amount of data we can change in Innodb at a time is a single row, so the most any single row can be modified cluster-wide is once per RTT.

What about WAN clusters?

Callaghan’s law applies to WAN clusters as well. LANs usually have sub-millisecond RTTs. WANs usually have anywhere from a few ms up to several hundred. This really will open a large window where rows won’t be able to be updated more than just a few times a second at best.

Some things the rule does not mean on Galera

It does NOT mean you can’t modify different rows simultaneously. You can.
It does NOT mean you can’t modify data on multiple cluster nodes simultaneously. You can.
It does NOT set an lower bound on performance, only a upper bound. The best performance you can expect is modifying a given row once per RTT, it could get slower if apply times start to lag.

So what about my application?

Think about your workload. How frequently do you update any given row? We call rows that are updated heavily “hotspots“.

Examples of hotspots

Example 1: Your application is an online game and you keep track of global achievement statistics in a single table with a row for each stat; there are just a few hundred rows. When a player makes an achievement, your application updates this table with a statement like this:

UPDATE achievements SET count = count + 1 where achievement = 'killed_troll';

How many players might accomplish this achievement at the same time?

Example 2: You have users and groups in your application. These are maintained in separate tables and there also exists a users_groups table to define the relationship between them. When someone joins a group, you run a transaction that adds the relationship row to users_groups, but also updates groups with some metadata:

BEGIN;
INSERT INTO users_groups (user_id, group_id) VALUES (100, 1);
UPDATE groups SET last_joined=NOW(), last_user_id=100 WHERE id=1;
COMMIT;

How often might multiple users join the same group?

Results

In both of the above examples you can imagine plenty of concurrent clients attempting to modify the same record at once. But what will actually happen to the clients who try to update the same row within the same RTT? This depends on which node in the cluster the writes are coming from:

From the same node: This will behave just like standard Innodb. The first transaction will acquire the necessary row locks while it commits (which will take the 1 RTT). The other transactions will lock wait until the lock(s) they need are available. The application just waits in those cases.

From other nodes: First to commit wins. The others that try to commit AFTER the first and while the first is still in the local apply queue on their nodes will get a deadlock error.

So, the best case (which may not be best for your application database throughput) will be more write latency into the cluster. The worst case is that your transactions won’t even commit and you have to take some action you normally wouldn’t have had to do.

Workarounds

If your hotspots were really bad in standalone Innodb, you might consider relaxing the fsync: set innodb_flush_log_at_trx_commit to something besides 1 and suddenly you can update much faster. I see this tuning very frequently for “performance” reasons when data durability isn’t as crucial. This is fine as long as you weigh both options carefully.

But in Galera you cannot relax synchronous replication. You can’t change the law, you can only adapt around it, but how might you do that ?

Write to one node

If your issue is really the deadlock errors and not so much the waiting, you could simply send all your writes to one node. This should prevent the deadlock errors, but will not change the lock waiting that your application will need to do for hotspots.

wsrep_retry_autocommit

If your hotspots are all updates with autocommits, you can rely on wsrep_retry_autocommit to auto-retry the transactions for you. However, each autocommit is retried only the number of times specified by this variable (default is 1 retry). This means more waiting, and after the limit is exceeded you will still get the deadlock error.

This is not implemented for full BEGIN … COMMIT multi-statement transactions since it cannot be assumed that those are not applying application logic in between the statements that is not safe to retry after the database state changes.

retry deadlocks

Now we start to get into (*gasp*) territory where your application needs to be modified. Generally if you use Innodb, you should be able to handle deadlock errors in your application. Raise your hands if your application has that logic (I usually get less than 5 people who do out of 100).

But, what to do? Retrying automatically, or giving your end user a chance to retry manually are typical answers. However, this means more latency waiting for a write to go through, and possibly some poor user experience.

batch writes

Instead of updating global counters one at a time (from Example 1, above), how about maintaining the counter in memcache or redis and only flushing to the database periodically?

if( $last_count % 100 == 0 ) {
  $db->do( "UPDATE achievements SET count = $last_count where achievement = 'killed_troll'";
}

change your schema

In Example 2, above, how above moving the ‘joined’ column to the users_groups table so we don’t need to update the parent group row so often?

INSERT INTO users_groups (user_id, group_id, joined) VALUES (100, 1, NOW());

Conclusion

Choosing a system to replicate your data to a distributed system requires tradeoffs. Most of us are used to the tradeoffs we take when deploying conventional stand-alone MySQL Innodb with asynchronous slaves. We may not think about the tradeoffs, but we’re making them (anyone obsessively testing slave position to ensure it’s caught up with the master?).

Synchronous replication with PXC and Galera is no different in that there are trade-offs, they just aren’t what we commonly expect.

If Callaghan’s law is going to cause you trouble and you are not prepared to adapt to work with it, PXC/Galera Synchronous replication is probably not right for you.

The post Is Synchronous Replication right for your app? appeared first on MySQL Performance Blog.

↧

Multicast replication in Percona XtraDB Cluster (PXC) and Galera

June 5, 2013, 3:00 am

≫ Next: Percona XtraDB Cluster (PXC) in the real world: Share your use cases!

≪ Previous: Is Synchronous Replication right for your app?

Bandwidth multiplication and synchronous clusters

I’ve seen a lot of people setting up clusters with 3-6+ nodes on 1 Gbps networks. 1 Gbps seems like a lot, doesn’t it? Actually, maybe not as much as you think. While the theoretical limit of 1Gbps is actually 120MBps, I start to get nervous around 100MBps. By default Galera uses unicast TCP for replication. Because synchronous replication needs to replicate to all nodes at once, this means 1 copy of your replication message is sent to other node in the cluster. The more nodes in your cluster, the more the bandwidth required for replication multiplies. Now, this isn’t really much different from standard mysql replication. 1 master with 5 async slaves is going to send a separate replication stream to each, so your bandwidth requirements will be similar. However, with async replication you have the luxury of not blocking the master from taking writes if bandwidth is constrained and the slaves lag for a bit, not so in Galera. So, let’s see this effect in action. I have a simple script that outputs the network throughput on an interface every second. I’m running a sysbench test on one node and measuring the outbound (UP) bandwidth on that same node:

# 2 nodes in the cluster
eth1 DOWN:24 KB/s UP:174 KB/s
eth1 DOWN:25 KB/s UP:172 KB/s
eth1 DOWN:27 KB/s UP:196 KB/s
eth1 DOWN:27 KB/s UP:195 KB/s
eth1 DOWN:27 KB/s UP:197 KB/s
eth1 DOWN:27 KB/s UP:200 KB/s
# 3 nodes in the cluster
eth1 DOWN:74 KB/s UP:346 KB/s
eth1 DOWN:79 KB/s UP:357 KB/s
eth1 DOWN:77 KB/s UP:342 KB/s
eth1 DOWN:79 KB/s UP:368 KB/s
eth1 DOWN:81 KB/s UP:368 KB/s
eth1 DOWN:78 KB/s UP:363 KB/s

This isn’t much traffic in my puny local VMs, but you get the idea. We can clearly see some factor in play adding the extra nodes.

Multicast to the rescue!

One way to address this bandwidth constraint is to switch to multicast UDP replication in Galera. This is actually really easy to do. First, we need to make sure our environment will support multicast. This is a question for your network guys and beyond the scope of this post, but in my trivial VM environment, I just need to make sure that the multicast address space routes to my Galera replication interface, eth1:

[all nodes]# ip ro add dev eth1 224.0.0.0/4
[all nodes]# ip ro show | grep 224
224.0.0.0/4 dev eth1  scope link

In that space, we pick an unused mcast address (again, talk to your network guys). I’m using 239.192.0.11, so we’ll add this to our my.cnf:

wsrep_provider_options          = "gmcast.mcast_addr=239.192.0.11"

If you already have wsrep_provider_options set, add it to the semicolon separated list instead of a separate line in your config. If we already have a running cluster, we need to shut it down, configure our mcast address and re-bootstrap it:

[root@node3 mysql]# service mysql stop
[root@node2 mysql]# service mysql stop
[root@node1 mysql]# service mysql stop

[root@node1 mysql]# service mysql start --wsrep_cluster_address=gcomm://
[root@node2 mysql]# service mysql start
[root@node3 mysql]# service mysql start

We can see that a multicast node still needs to bind to the Galera replication port, and of course that needs to be bound to the interface that the multicast will be received on.

[root@node3 mysql]# lsof -P +p 17493 | grep LISTEN
mysqld 17493 mysql 11u IPv4 39669 0t0 TCP *:4567 (LISTEN)
mysqld 17493 mysql 20u IPv4 39685 0t0 TCP *:3306 (LISTEN)

Now, let’s re-do our above test:

# 2 nodes in the cluster
eth1 DOWN:15 KB/s UP:199 KB/s
eth1 DOWN:14 KB/s UP:195 KB/s
eth1 DOWN:15 KB/s UP:212 KB/s
eth1 DOWN:14 KB/s UP:204 KB/s
eth1 DOWN:13 KB/s UP:173 KB/s
# 3 nodes in the cluster
eth1 DOWN:62 KB/s UP:185 KB/s
eth1 DOWN:61 KB/s UP:187 KB/s
eth1 DOWN:52 KB/s UP:164 KB/s
eth1 DOWN:62 KB/s UP:187 KB/s
eth1 DOWN:64 KB/s UP:186 KB/s
eth1 DOWN:62 KB/s UP:193 KB/s

So, we can see our outbound bandwidth on our master node doesn’t change as we add more nodes when we are using multicast.

Other multicast tips

We can also also bootstrap nodes using the mcast address:

#wsrep_cluster_address = gcomm://192.168.70.2,192.168.70.3,192.168.70.4
wsrep_cluster_address = gcomm://239.192.0.11

And this works fine. Pretty slick! Note that IST and SST will still use TCP unicast, so we still want to make sure those are configured to use the regular IP of the node. Typically I just set the wsrep_node_address setting on each node if this IP is not the default IP of the server. I could not find a way to migrate an existing unicast cluster to multicast with a rolling update. I believe (but could be proven wrong) that you must re-bootstrap your entire cluster to enable multicast.

The post Multicast replication in Percona XtraDB Cluster (PXC) and Galera appeared first on MySQL Performance Blog.

↧

Percona XtraDB Cluster (PXC) in the real world: Share your use cases!

June 17, 2013, 3:00 am

≫ Next: MySQL Webinar: Percona XtraDB Cluster Operations, June 26

≪ Previous: Multicast replication in Percona XtraDB Cluster (PXC) and Galera

The aim of this post is to enumerate real-world usage of Percona XtraDB Cluster (PXC), and also to solicit use cases from the readers. One of the prominent usages in the production environment that we have come across (and our Percona consultants have assisted) is that of HP Cloud. There is a post about it here by Patrick Galbraith of HP. The post focuses on their deployment of PXC for HP Cloud DNS. The post focuses on the key aspects of synchronous replication setup with high-availability guarantees like split-brain immunity.

Nobody likes to debug async replication while its broken or do the master-master/master-slave switchover when master is dying/dead. Yes, there are wrappers/scripts around this to make life easier, however, wouldn’t it be nice if this was built into the system itself? PXC based on Galera strives to provide that. Scaling makes sense only when addition/removal of hosts from a cluster or a HA setup is simple and uncomplicated.

Their post focuses on following aspects:

Initial setup
Setup of other nodes with SST (Xtrabackup SST)
Integration of chef with PXC
Finally, integration of HAProxy as a loadbalancer.

To elucidate, their initial setup goes into bootstrapping the first node. Note that in the cloud environment other nodes are not known until they are brought up, hence bootstrapping with an empty gcomm:// is done for the first node by the chef. The second node is then added which SSTs with node1 (based on gcomm://node1 of node2) through Xtrabackup SST (state snapshot transfer). Node3 subsequently joins the cluster with node1 and node2 in its gcomm:// (since by this time node1, node2 are up). After this, a subsequent run of chef-client is done to update the cnf files with IP address of members (excluding itself). The rationale behind this is that when a node is restarted (and there are others when it comes up) it joins the cluster seamlessly. I would like to note here that we are adding a bootstrap parameter to PXC so that any latter modifications like these to cnf files are not required and preset it during cluster startup itself. The only caveat is that the node information – IP address or hostname – should be known in advance (the node itself needn’t be up), which may not be feasible in a cloud environment.

Next, the SST. Xtrabackup SST is used there. SST matters a lot because not only is it used during initial node setup but also it is required when a node has been down for a while and IST (incremental state transfer) is not feasible. It also helps when node data integrity is compromised. So, naturally duration of SST is paramount. We recommend Xtrabackup SST for its reduced locking period from its use (which means the donor is blocked for a shorter while). By using Xtrabackup for SST, you also get its benefits like compression, parallel streaming, encryption, compact backups which can be used for SST (Note, the wsrep_sst_xtrabackup in 5.5.30 can’t do those except parallel, the one in 5.5.31 will handle them all, also XB 2.1 is required for most).

Finally, the HAProxy. HAProxy is one of the loadbalancers recommended for use with PXC. The other one is glb. HAProxy is used with xinetd on the node along with a script which checks PXC for its sync status. As referenced in that post, you can refer a post by Peter Boros (“Percona XtraDB Cluster reference architecture with HaProxy“) for details. In their setup they have automated this with a HAProxy in each AZ (Availability Zone) for the API server. To add, we are looking at reducing the overhead here, through steps like replacing xinetd and clustercheck with a single serving process (we are adding one in 5.5.31), looking for optimizations with HAProxy to account for high connection rates, and using pacemaker with PXC. The goal is to reduce the overhead of status checks, mainly on the node. You can also look this PLMCE talk for HAProxy deployment strategies with PXC.

To conclude, it is interesting to note that they have been able to manage this with a small team. That strongly implies scalability of resources – you scale more with less, and that is how it should be. We would like to hear from you about your architectural setup around PXC – any challenges you faced (and horror stories if any), any special deployment methodologies you employed (Puppet, Chef, Salt, Ansible etc. ), and finally any suggestions.

The post Percona XtraDB Cluster (PXC) in the real world: Share your use cases! appeared first on MySQL Performance Blog.

↧

MySQL Webinar: Percona XtraDB Cluster Operations, June 26

June 18, 2013, 1:13 pm

≫ Next: Changing an async slave of a PXC cluster to a new Master

≪ Previous: Percona XtraDB Cluster (PXC) in the real world: Share your use cases!

Percona XtraDB Cluster (PXC) was released over a year ago and since then there has been tremendous interest and adoption. There’s plenty of talks that explain the fundamentals of PXC, but we’re starting to reach a threshold where it’s easier to find folks with PXC in production and such the need for more advanced talks has arisen.

As such, I wanted to shift gears from the standard introductory talk and focus instead more on some key questions/issues/pain-points for those with PXC in production already. As such, I’m giving a webinar entitled Percona XtraDB Cluster Operations on June 26th 2013 from 10-11AM PDT. Topics will include:

Backups from the cluster
Avoiding SST
Flow Control
What and How to Monitor
Tuning best practices

This webinar is not meant to necessarily be exhaustive, but to cover key topics that administrators of PXC will commonly ask about. You can register for the webinar here.

The post MySQL Webinar: Percona XtraDB Cluster Operations, June 26 appeared first on MySQL Performance Blog.

↧