
Percona XtraDB Cluster 5.6.35-26.20 is now available

Percona announces the release of Percona XtraDB Cluster 5.6.35-26.20 on March 10, 2017. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.35-26.20 is now the current release, based on the following:

All Percona software is open-source and free. Details of this release can be found in the 5.6.35-26.20 milestone on Launchpad.

There are no new features or bug fixes to the main components, besides upstream changes and the following fixes related to packaging:

  • BLD-593: Limited the use of rm and chown by mysqld_safe to avoid exploits of the CVE-2016-5617 vulnerability. For more information, see 1660265.
    Credit to Dawid Golunski (https://legalhackers.com).
  • BLD-610: Added version number to the dependency requirements of the full RPM package.
  • BLD-645: Fixed mysqld_safe to support options with a forward slash (/). For more information, see 1652838.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!


Percona XtraDB Cluster 5.7.17-27.20 is now available

Percona announces the release of Percona XtraDB Cluster 5.7.17-27.20 on March 16, 2017. Binaries are available from the downloads section or our software repositories.

NOTE: You can also run Docker containers from the images in the Docker Hub repository.

Percona XtraDB Cluster 5.7.17-27.20 is now the current release, based on the following:

All Percona software is open-source and free. Details of this release can be found in the 5.7.17-27.20 milestone on Launchpad.

There are no new features or bug fixes to the main components, besides upstream changes and the following fixes related to packaging:

  • BLD-512: Fixed startup of garbd on Ubuntu 16.04.2 LTS (Xenial Xerus).
  • BLD-519: Added the garbd debug package to the repository.
  • BLD-569: Fixed garbd script to return non-zero if it fails to start.
  • BLD-570: Fixed service script for garbd on Ubuntu 16.04.2 LTS (Xenial Xerus) and Ubuntu 16.10 (Yakkety Yak).
  • BLD-593: Limited the use of rm and chown by mysqld_safe to avoid exploits of the CVE-2016-5617 vulnerability. For more information, see 1660265.
    Credit to Dawid Golunski (https://legalhackers.com).
  • BLD-610: Added version number to the dependency requirements of the full RPM package.
  • BLD-643: Fixed systemctl to mark mysql process as inactive after it fails to start and not attempt to start it again. For more information, see 1662292.
  • BLD-644: Added the which package to PXC dependencies on CentOS 7. For more information, see 1661398.
  • BLD-645: Fixed mysqld_safe to support options with a forward slash (/). For more information, see 1652838.
  • BLD-647: Fixed systemctl to show correct status for mysql on CentOS 7. For more information, see 1644382.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

How We Made Percona XtraDB Cluster Scale

In this blog post, we’ll look at the actions and efforts Percona experts took to scale Percona XtraDB Cluster.

Introduction

When we first started analyzing Percona XtraDB Cluster performance, it was pretty bad. We would see contention even with 16 threads. Performance was even worse with sync_binlog=1, although the same pattern was observed even with the binary log disabled. The effect was not limited to OLTP workloads; other workloads (like update-key/non-key) were affected as well.

That’s when we started analyzing the contention issues and found multiple problems. We will discuss all these problems and the solutions we adopted. But before that, let’s look at the current performance level.

Check this blog post for more details.

The good news is Percona XtraDB Cluster is now optimized to scale well for all scenarios, and the gain is in the range of 3x-10x.

Understanding How MySQL Commits a Transaction

Percona XtraDB Cluster contention is associated mainly with Commit Monitor contention, which comes into the picture during commit time. It is important to understand the context around it.

When a commit is invoked, it proceeds in two phases:

  • Prepare phase: mark the transaction as PREPARE, updating the undo segment to capture the updated state.
    • If bin-log is enabled, redo changes are not persisted immediately. Instead, a batch flush is done during Group Commit Flush stage.
    • If bin-log is disabled, then redo changes are persisted immediately.
  • Commit phase: Mark the transaction commit in memory.
    • If bin-log is enabled, Group Commit optimization kicks in, causing a flush of the redo logs (which persists changes done to db-objects plus the PREPARE state of the transaction), followed by a flush of the binary logs. Since the binary logs are flushed, the redo log record of the transaction commit doesn’t need to be flushed immediately (saving an fsync).
    • If bin-log is disabled, redo logs are flushed on completion of the transaction to persist the updated commit state of the transaction.
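For reference, here is a minimal, illustrative my.cnf fragment showing the two modes discussed above (the paths are placeholders):

[mysqld]
# Binary log enabled: redo flush is batched during the Group Commit flush stage
log-bin = /var/lib/mysql/binlog
sync_binlog = 1
# To exercise the bin-log disabled flow instead, omit log-bin above;
# redo changes are then persisted immediately at prepare/commit.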

What is a Monitor in Percona XtraDB Cluster World?

Monitors help maintain transaction ordering. For example, the Commit Monitor ensures that no transaction with a global-seqno greater than the current commit-processing transaction’s global seqno is allowed to proceed.

How Percona XtraDB Cluster Commits a Transaction

Percona XtraDB Cluster follows the existing MySQL semantics, of course, but has its own steps for committing the transaction in the replication world. There are two important themes:

  1. Apply/Execution of transaction can proceed in parallel
  2. Commit is serialized based on cluster-wide global seqno.

Let’s understand the commit flow with Percona XtraDB Cluster involved (Percona XtraDB Cluster registers wsrep as an additional storage engine for replication).

  • Prepare phase:
    • wsrep prepare: executes two main actions:
      • Replicate the transaction (adding the write-set to group-channel)
      • Entering the CommitMonitor, thereby enforcing transaction ordering.
    • binlog prepare: nothing significant (for this flow).
    • innobase prepare: mark the transaction in PREPARE state.
      • As discussed above, the persistence of the REDO log depends on if the binlog is enabled/disabled.
  • Commit phase
    • If bin-log is enabled
      • MySQL Group Commit Logic kicks in. The semantics ensure that the order of transaction commit is the same as the order of them getting added to the flush-queue of the group-commit.
    • If bin-log is disabled
      • Normal commit action for all registered storage engines is called with immediate persistence of redo log.
    • Percona XtraDB Cluster then invokes the post_commit hook, thereby releasing the Commit Monitor so that the next transaction can make progress.

With that understanding, let’s look at the problems and solutions:

PROBLEM-1:

Commit Monitor is exercised such that the complete commit operation is serialized. This limits the parallelism associated with the prepare-stage. With log-bin enabled, this is still ok since redo logs are flushed at group-commit flush-stage (starting with 5.7). But if log-bin is disabled, then each commit causes an independent redo-log-flush (in turn probable fsync).

OPTIMIZATION-1:

Split the replication pre-commit hook into two explicit actions: replicate (add write-set to group-channel) + pre-commit (enter commit-monitor).

The replicate action is performed just like before (as part of storage engine prepare). That will help complete the InnoDB prepare action in parallel (exploring much-needed parallelism in REDO flush with log-bin disabled).

On completion of replication, the pre-commit hook is called. That leads to entering the Commit Monitor for enforcing the commit ordering of the transactions. (Note: Replication action assigns the global seqno. So even if a transaction with a higher global seqno finishes the replication action earlier (due to CPU scheduling) than the transaction with a lower global seqno, it will wait in the pre-commit hook.)

Improved parallelism in the innodb-prepare stage helps accelerate log-bin enabled flow, and the same improved parallelism significantly helps in the log-bin disabled case by reducing redo-flush contention, thereby reducing fsyncs.


PROBLEM-2:

MySQL Group Commit already has a concept of ordering transactions based on the order of their addition to the GROUP COMMIT queue (FLUSH STAGE queue to be specific). Commit Monitor enforces the same, making the action redundant but limiting parallelism in MySQL Group Commit Logic (including redo-log flush that is now delayed to the flush stage).

With the existing flow (due to the involvement of Commit Monitor), only one transaction can enter the GROUP COMMIT Queue, thereby limiting optimal use of Group Commit Logic.

OPTIMIZATION-2:

Release the Commit Monitor once the transaction is successfully added to flush-stage of group-commit. MySQL will take it from there to maintain the commit ordering. (We call this interim-commit.)

Releasing the Commit Monitor early helps other transactions to make progress and real MySQL Group Commit Leader-Follower Optimization (batch flushing/sync/commit) comes into play.

This also helps ensure batch REDO log flushing.


PROBLEM-3:

This problem is specific to when the log-bin is disabled. Percona XtraDB Cluster still generates the log-bin, as it needs it for forming a replication write-set (it just doesn’t persist this log-bin information). If disk space is not a constraint, then I would suggest operating Percona XtraDB Cluster with log-bin enabled.

With log-bin disabled, OPTIMIZATION-1 is still relevant, but OPTIMIZATION-2 isn’t, as there is no group-commit protocol involved. Instead, MySQL ensures that the redo-log (capturing state change of transaction) is persisted before reporting COMMIT as a success. As per the original flow, the Commit Monitor is not released till the commit action is complete.

OPTIMIZATION-3:

The transaction is already committed to memory and the state change is captured. This is about persisting the REDO log only (REDO log modification is already captured by mtr_commit). This means we can release the Commit Monitor just before the REDO flush stage kicks in. Correctness is still ensured as the REDO log flush always persists the data sequentially. So even if trx-1 loses its slots before the flush kicks in, and trx-2 is allowed to make progress, trx-2’s REDO log flush ensures that trx-1’s REDO log is also flushed.


Conclusion

With these three main optimizations, and some small tweaks, we have tuned Percona XtraDB Cluster to scale better and made it fast enough for the growing demands of your applications. All of this is available with the recently released Percona XtraDB Cluster 5.7.17-29.20. Give it a try and watch your application scale in a multi-master environment, making Percona XtraDB Cluster the best fit for your HA workloads.

Tracking IST Progress in Percona XtraDB Cluster

In this blog post, we’ll look at how Percona XtraDB Cluster uses IST.

Introduction

Percona XtraDB Cluster uses the concept of an Incremental State Transfer (IST). When a node of the cluster leaves the cluster for a short period of time, it can rejoin the cluster by getting the delta set of missing changes from any active node in the cluster.

This process of getting the delta set of changes is called IST in Percona XtraDB Cluster.

Tracking IST Progress

The number of write-sets/changes that the joining node needs to catch up on when rejoining the cluster is dictated by:

  1. The duration the node was not present in the cluster
  2. The workload of the cluster during that time frame

This catch-up process can be time-consuming. Until this process is complete, the rejoining node is not ready to process any active workloads.

We believe that any process that is time-consuming should have a progress monitor attached to it. This is exactly what we have done.

In the latest release of Percona XtraDB Cluster 5.7.17-29.20, we added an IST progress monitor that is exposed through SHOW STATUS. This helps you to monitor the percentage of write-sets that have been applied by the rejoining node.

Let’s see this in a working example:

  • Start a two-node cluster
  • Process some basic workloads, allow cluster replication
  • Shutdown node-2
  • Node-1 then continues to process more workloads (the workload fits the allocated gcache)
  • Restart Node-2, causing it to trigger an IST

mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+--------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------+
| wsrep_ist_receive_status | 3% complete, received seqno 1421771 of 1415410-1589676 |
+--------------------------+--------------------------------------------------------+
1 row in set (0.00 sec)
....
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| wsrep_ist_receive_status | 52% complete, received seqno 1506799 of 1415410-1589676 |
+--------------------------+---------------------------------------------------------+
1 row in set (0.00 sec)
....
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| wsrep_ist_receive_status | 97% complete, received seqno 1585923 of 1415410-1589676 |
+--------------------------+---------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| wsrep_ist_receive_status | |
+--------------------------+-------+
1 row in set (0.00 sec)

As you can see, the wsrep_ist_receive_status monitoring string indicates the percentage completed, the currently received write-set, and the range of write-sets applicable to the IST.

Once the IST activity is complete, the variable shows an empty string.

Closing Comments

I hope you enjoy this newly added feature. Percona Engineering would be happy to hear from you about more such features that can help you make effective use of Percona XtraDB Cluster. We will try our best to include them in our future plans (based on feasibility).

Note: Special thanks to Kenn Takara and Roel Van de Paar for helping me edit this post.

Enabling Percona XtraDB Cluster SST Traffic Encryption

In this blog post, we’ll look at enabling Percona XtraDB Cluster SST Traffic Encryption, and some of the changes to the SSL-based encryption of SST traffic in Percona XtraDB Cluster 5.7.16.

Some background

Percona XtraDB Cluster versions prior to 5.7 support encryption methods 0, 1, 2 and 3:

  • encrypt = 0 : (default) No encryption
  • encrypt = 1 : Symmetric encryption using AES-128, user-supplied key
  • encrypt = 2 : SSL-based encryption with a CA and cert files (via socat)
  • encrypt = 3 : SSL-based encryption with cert and key files (via socat)

We are deprecating modes encrypt=1,2,3 in favor of the new mode, encrypt=4. “encrypt=3” is not recommended, since it does not verify the cert being used (it cannot verify since no Certificate Authority (CA) file is provided). “encrypt=2” and “encrypt=3” use a slightly different way of building the SSL files than MySQL does. In order to remove confusion, we’ve deprecated these modes in favor of “encrypt=4”, which can use the MySQL generated SSL files.

New feature: encrypt=4

The previous SSL methods (encrypt=2 and encrypt=3), are based on socat usage, http://www.dest-unreach.org/socat/doc/socat-openssltunnel.html. The certs are not built the same way as the certs created by MySQL (for encryption of client communication with MySQL). To simplify SSL configuration and usage, we added a new encryption method (encrypt=4) so that the SSL files generated by MySQL can now be used for SSL encryption of SST traffic.

For instructions on how to create these files, see https://www.percona.com/doc/percona-xtradb-cluster/LATEST/howtos/encrypt-traffic.html.
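If you are on MySQL/Percona Server 5.7, one convenient way to generate a compatible set of files is the stock mysql_ssl_rsa_setup utility (the datadir path below is an assumption; adjust it to your installation):

# Creates ca.pem, server-cert.pem, server-key.pem (plus client and RSA files) in the datadir
mysql_ssl_rsa_setup --datadir=/var/lib/mysql --uid=mysql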

Configuration

In general, Galera views the cluster as homogeneous, so it expects that all nodes are identically configured. This extends to the SSL configuration, so the preferred configuration is that all machines share the same certs/keys. The security implication is that possession of these certs/keys allows a machine to join the cluster and receive all of the data. So proper care must be taken to secure the SSL files.

The mode “encrypt=4” uses the same option names as MySQL, so it reads the SSL information from “ssl-ca”, “ssl-cert”, and “ssl-key” in the “[sst]” section of the configuration file.

Example my.cnf:

[sst]
 encrypt=4
 ssl-ca=/path/to/ca.pem
 ssl-cert=/path/to/server-cert.pem
 ssl-key=/path/to/server-key.pem

All three options (ssl-ca, ssl-cert, and ssl-key) must be specified, otherwise the SST will return an error.

ssl-ca

This is the location of the Certificate Authority (CA) file. Only servers that have certificates generated from this CA file will be allowed to connect if SSL is enabled.

ssl-cert

This is the location of the Certificate file. This is the digital certificate that will be sent to the other side of the SSL connection. The remote server will then verify that this certificate was generated from the Certificate Authority file in use by the remote server.

ssl-key

This is the location of the private key for the certificate specified in ssl-cert.

Simplified Percona XtraDB Cluster SSL Configuration

In this blog post, we’ll look at a feature recently added to Percona XtraDB Cluster 5.7.16 that makes it easier to configure Percona XtraDB Cluster SSL for all related communications. It uses mode “encrypt=4”, and configures SSL for both IST/Galera communications and SST communications using the same SSL files. “encrypt=4” is a new encryption mode added in Percona XtraDB Cluster 5.7.16 (we’ll cover it in a later blog post).

If this option is used, this will override all other Galera/SST SSL-related file options. This is to ensure that a consistent configuration is applied.
Using this option also means that the Galera/SST communications are using the same keys as client connections.

Example

This example shows how to start up a cluster using this option. We will use the default SSL files created by the bootstrap node. Basically, there are two steps:

  1. Set pxc_encrypt_cluster_traffic=ON on all nodes
  2. Ensure that all nodes share the same SSL files

Step 1: Configuration (on all nodes)

We enable the pxc_encrypt_cluster_traffic option in the configuration files on all nodes. The default value of this option is “OFF”, so we enable it here.

[mysqld]
 pxc_encrypt_cluster_traffic=ON

Step 2: Startup the bootstrap node

After initializing and starting up the bootstrap node, the datadir will contain the necessary data files. Here is some SSL-related log output:

[Note] Auto generated SSL certificates are placed in data directory.
 [Warning] CA certificate ca.pem is self signed.
 [Note] Auto generated RSA key files are placed in data directory.

The required files are ca.pem, server-cert.pem and server-key.pem, which are the Certificate Authority (CA) file, the server certificate and the server private key, respectively.

Step 3: Copy the SSL files to all other nodes

Galera views the cluster as a set of homogeneous nodes, so the same configuration is expected on all nodes. Therefore, we have to copy the CA file, the server’s certificate and the server’s private key. By default, MySQL names these: ca.pem, server-cert.pem, and server-key.pem, respectively.
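As a sketch (hostnames and paths here are assumptions), copying the files from the bootstrap node could look like this:

# Run on the bootstrap node; repeat for each remaining node
scp /var/lib/mysql/ca.pem /var/lib/mysql/server-cert.pem /var/lib/mysql/server-key.pem \
    root@node2:/var/lib/mysql/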

Step 4: Startup the other nodes

This is some log output showing that the SSL certificate files have been found. The other nodes should be using the files that were created on the bootstrap node.

[Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
[Note] Skipping generation of SSL certificates as certificate files are present in data directory.
[Warning] CA certificate ca.pem is self signed.
[Note] Skipping generation of RSA key pair as key files are present in data directory.

This is some log output (with log_error_verbosity=3), showing the SST reporting on the configuration used.
WSREP_SST: [DEBUG] pxc_encrypt_cluster_traffic is enabled, using PXC auto-ssl configuration
WSREP_SST: [DEBUG] with encrypt=4 ssl_ca=/my/data//ca.pem ssl_cert=/my/data//server-cert.pem ssl_key=/my/data//server-key.pem

Customization

The “ssl-ca”, “ssl-cert”, and “ssl-key” options in the “[mysqld]” section can be used to specify the location of the SSL files. If these are not specified, then the datadir is searched (using the default names of “ca.pem”, “server-cert.pem” and “server-key.pem”).

[mysqld]
 pxc_encrypt_cluster_traffic=ON
 ssl-ca=/path/to/ca.pem
 ssl-cert=/path/to/server-cert.pem
 ssl-key=/path/to/server-key.pem

If you want to implement this yourself, the equivalent configuration file options are:

[mysqld]
wsrep_provider_options="socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem"
[sst]
encrypt=4
ssl-ca=ca.pem
ssl-cert=server-cert.pem
ssl-key=server-key.pem

How it works

  1. Determine the location of the SSL files
    1. Uses the values if explicitly specified (via the “ssl-ca”, “ssl-cert” and “ssl-key” options in the “[mysqld]” section)
    2. If the SSL file options are not specified, we look in the data directory for files named “ca.pem”, “server-cert.pem” and “server-key.pem” for the CA file, the server certificate, and the server key, respectively.
  2. Modify the configuration
    1. Overrides the values for socket.ssl_ca, socket.ssl_cert, and socket.ssl_key in wsrep_provider_options in the “[mysqld]” section.
    2. Sets “encrypt=4” in the “[sst]” section.
    3. Overrides the values for ssl-ca, ssl-cert and ssl-key in the “[sst]” section.

This is not a dynamic setting, and is only available on startup.

BEWARE: Increasing fc_limit can affect SELECT latency

In this blog post, we’ll look at how increasing the fc_limit can affect SELECT latency.

Introduction

Recent Percona XtraDB Cluster optimizations have exposed fc_limit contention. It was always there, but was never exposed as the Commit Monitor contention was more significant. As it happens with any optimization, once we solve the bigger contention issues, smaller contention issues start popping up. We have seen this pattern in InnoDB, and Percona XtraDB Cluster is no exception. In fact, it is good because it tells us that we are on the right track.

If you haven’t yet checked the performance blogs, then please visit here and here.

What is FC_LIMIT?

Percona XtraDB Cluster has the concept of Flow Control. If any member of the cluster (not garbd) is unable to match the apply speed with the replicated write-set speed, then a queue builds up. If this queue crosses some threshold (dictated by gcs.fc_limit), then flow control kicks in. Flow control causes the members of the cluster to temporarily halt/slow down so that the slower node can catch up.

The user can, of course, disable this by setting wsrep_desync=1 on the slower node, but make sure you understand the effect of doing so. Unless you have a good reason, you should avoid setting it.
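To judge whether a node is falling behind, and whether flow control has been pausing the cluster, you can check the receive queue and the flow-control counters; desyncing is a runtime setting:

mysql> show status like 'wsrep_local_recv_queue';
mysql> show status like 'wsrep_flow_control_paused';
mysql> set global wsrep_desync=ON; -- only after weighing the caveats above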

mysql> show status like 'wsrep_flow_control_interval';
+-----------------------------+------------+
| Variable_name | Value |
+-----------------------------+------------+
| wsrep_flow_control_interval | [ 16, 16 ] |
+-----------------------------+------------+
1 row in set (0.01 sec)

Increasing fc_limit

Until recently, the default fc_limit was 16 (starting with Percona XtraDB Cluster 5.7.17-29.20, the default is 100). This worked until now, since Percona XtraDB Cluster failed to scale and rarely hit the limit of 16. With new optimizations, Percona XtraDB Cluster nodes can process more write-sets in a given time period, and thereby can replicate more write-sets (anywhere in the range of three to ten times). Of course, the replicating/slave nodes are also performing at a higher speed. But depending on the slave threads, it is easy to start hitting this limit.

So what is the solution?

  • Increase fc_limit from 16 to something really big. Say 1600.

Is this correct?

YES and NO.

Why YES?

  • If you don’t care about the freshness of data on the replicated nodes, then increasing the limit to a higher value is not an issue. Say setting it to 10K means that the replicating node is holding 10K write-sets to replicate, and a SELECT fired during this time will not view changes from these 10K write-sets.
  • But if you insist on having fresh data, then Percona XtraDB Cluster has a solution for this (set wsrep_sync_wait=7), as shown in the sketch after this list.
  • Setting wsrep_sync_wait places the SELECT request in a queue that is serviced only after the write-sets replicated up to that point (when the SELECT was fired) have been applied. If the queue has 8K write-sets, then the SELECT is placed at position 8K+1. As the queue progresses, the SELECT gets serviced only when all those 8K write-sets are done. This drastically increases SELECT latency and can cause monitoring alarms to fire.
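For illustration, enforcing freshness for a single session looks like this (wsrep_sync_wait is a bitmask; the value 7 covers reads, updates/deletes and inserts):

mysql> set session wsrep_sync_wait=7;
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
mysql> set session wsrep_sync_wait=0; -- revert once freshness is no longer required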

Why NO?

  • For the reason mentioned above, we feel it is not a good idea to increase the fc_limit beyond some value unless you don’t care about data freshness and in turn don’t care to set wsrep_sync_wait.
  • We did a small experiment with the latest Percona XtraDB Cluster release to understand the effects.

- Started 2 node cluster.
- Fired 64-threads workload on node-1 of the cluster.
- node-2 is acting as replicating slave without any active workload.
- Set wsrep_sync_wait=7 on node-2 to ensure data-freshness.
Using default fc_limit (= 16)
-----------------------------
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.03 sec)
Increasing it from 16 -> 1600
-----------------------------
mysql> set global wsrep_provider_options="gcs.fc_limit=1600";
Query OK, 0 rows affected (0.00 sec)
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.46 sec)
That is a whopping 15x increase in SELECT latency.
Increasing it even further (1600 -> 25000)
-------------------------------------------
mysql> set global wsrep_provider_options="gcs.fc_limit=25000";
Query OK, 0 rows affected (0.00 sec)
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (7.07 sec)

Note: wsrep_sync_wait=7 will enforce the check for all DMLs (INSERT/UPDATE/DELETE/SELECT). We highlighted the SELECT example, as that is the most concerning at first glance. But latency for other DMLs also increases for the same reasons as mentioned above.

Conclusion

Let’s conclude with the following observation:

  • Avoid increasing fc_limit to an insanely high value as it can affect SELECT latency (if you are running a SELECT session with wsrep_sync_wait=7 for data freshness).

Percona XtraDB Cluster Transaction Replay Anomaly

In this blog post, we’ll look at a transaction replay anomaly in Percona XtraDB Cluster.

Introduction

Percona XtraDB Cluster/Galera replays a transaction if the data is non-conflicting but the transaction happens to have conflicting locks.

Anomaly

Let’s understand this with an example:

  • Let’s assume a two-node cluster (node-1 and node-2)
  • Base table “t” is created as follows:

create database test;
use test;
create table t (i int, c char(20), primary key pk(i)) engine=innodb;
insert into t values (1, 'abc'), (2, 'abc'), (4, 'abc');
select * from t;
mysql> select * from t;
+---+------+
| i | c |
+---+------+
| 1 | abc |
| 2 | abc |
| 4 | abc |
+---+------+

  • node-2 starts running a transaction (trx-2):

trx-2: update t set c = 'pqr';

  • node-2 creates a write-set and is just about to replicate it. At the same time, node-1 executes the following transaction (trx-1), and is first to add it to the group-channel (before node-2 adds transaction (trx-2))

trx-1: insert into t values (3, 'a');

  • trx-1 is replicated on node-2, and it proceeds with the apply action. Since there is a lock conflict (no certification conflict), node-2 local transaction (trx-2) is aborted and scheduled for replay.
  • trx-1 causes addition of (3, ‘a’) and then node-2 transaction is REPLAYed.
  • REPLAY is done using the pre-created write-set that only modifies existing entries (1,2,4).

End-result:

mysql> select * from t;
+---+------+
| i | c |
+---+------+
| 1 | pqr |
| 2 | pqr |
| 3 | a |
| 4 | pqr |
+---+------+

  • At first, nothing looks wrong. If you look closely, however, the REPLAYed transaction “UPDATE t set c= ‘pqr'” is last to commit. But the effect of it is not seen as there is still a row (3, ‘a’) that has ‘a’ instead of ‘pqr’.

| mysql-bin.000003 | 792 | Gtid | 2 | 857 | SET @@SESSION.GTID_NEXT= '6706fa1f-e3df-ee18-6621-c4e0bae533bd:4' |
| mysql-bin.000003 | 857 | Query | 2 | 925 | BEGIN |
| mysql-bin.000003 | 925 | Table_map | 2 | 972 | table_id: 219 (test.t) |
| mysql-bin.000003 | 972 | Write_rows | 2 | 1014 | table_id: 219 flags: STMT_END_F |
| mysql-bin.000003 | 1014 | Xid | 2 | 1045 | COMMIT /* xid=4 */ |
| mysql-bin.000003 | 1045 | Gtid | 3 | 1110 | SET @@SESSION.GTID_NEXT= '6706fa1f-e3df-ee18-6621-c4e0bae533bd:5' |
| mysql-bin.000003 | 1110 | Query | 3 | 1187 | BEGIN |
| mysql-bin.000003 | 1187 | Table_map | 3 | 1234 | table_id: 219 (test.t) |
| mysql-bin.000003 | 1234 | Update_rows | 3 | 1324 | table_id: 219 flags: STMT_END_F |
| mysql-bin.000003 | 1324 | Xid | 3 | 1355 | COMMIT /* xid=5 */ |
+------------------+------+----------------+-----------+-------------+---------------------------------------------------------------------------------+
21 rows in set (0.00 sec)

  • We have used a simple char string, but if there were a constraint here (say, c must have value X after the UPDATE completes), then the CONSTRAINT would be violated even though the application reports the UPDATE as a success.
  • It is interesting to note what happens on node-1:
    • node-1 applies the local transaction (trx-1) and then gets the replicated write-set from node-2 (trx-2) that has changes only for (1,2,4). Thereby data consistency is not compromised.

Percona XtraDB Cluster: “dh key too small” error during an SST using SSL

If you’ve tried to use SSL in Percona XtraDB Cluster and saw an error in the logs like SSL3_CHECK_CERT_AND_ALGORITHM:dh key too small, we’ve implemented some changes in Percona XtraDB Cluster 5.6.34 and 5.7.16 that get rid of these errors.

Some background

dh key too small refers to the Diffie-Hellman parameters used by the SSL code that are shorter than recommended.

Due to the Logjam vulnerability (https://weakdh.org/), the required key-lengths for the Diffie-Hellman parameters were changed from 512 bits to 2048 bits. Unfortunately, older versions of OpenSSL/socat still use 512 bits (and thus caused the error to appear).

Changes made to Percona XtraDB Cluster

Since versions of socat greater than 1.7.3 now use 2048 bits for the Diffie-Hellman parameters, we only do extra work for the older versions of socat (less than 1.7.3). The SST code now:

  1. Looks for a file with the DH params
    1. Uses the “ssl_dhparams” option in the [sst] section if it exists
    2. Looks for a “dhparams.pem” file in the datadir
  2. If the file is specified and exists, uses that file as a source for the DH parameters
  3. If the file does not exist, creates a dhparams.pem file in the datadir
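For example, to point the SST at a pre-created parameters file explicitly (the path below is illustrative):

[sst]
ssl_dhparams=/var/lib/mysql/dhparams.pem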

Generating the dhparams yourself

Unfortunately, it can take several minutes to create the dhparams file. We recommend that the dhparams.pem be created prior to starting the SST.

openssl dhparam -out path/to/datadir/dhparams.pem 2048

Percona XtraDB Cluster: SST tmpdir option

In this blog post, I’ll discuss some changes to the behavior of the Percona XtraDB Cluster SST tmpdir option in the latest versions of Percona XtraDB Cluster 5.6.35-26.20-3 and 5.7.17-29.20.

Previously, we did not use the path specified by the tmpdir option. Starting with Percona XtraDB Cluster 5.6.35-26.20-3 and 5.7.17-29.20, we use the tmpdir option to specify the location of temporary files used by the SST (on the DONOR this may be very large, since the backup logs may be stored here).

Specifying the tmpdir works as expected. You supply the directory path, as in the following example:

[sst]
tmpdir=/path/to/tmp/dir/

We look for the tmpdir in the [sst], [xtrabackup], and [mysqld] sections, in that order. If left unspecified, “mktemp” follows the default behavior (on Linux distributions this will typically create the directory in $TMPDIR, followed by /tmp).

Normal security settings and permissions apply here. The directory must already exist and be readable and writable by MySQL, otherwise a fatal error will be generated by the SST script.
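A minimal preparation sketch, assuming mysqld runs as the mysql user:

# Create the SST temporary directory and make it writable by MySQL
mkdir -p /path/to/tmp/dir/
chown mysql:mysql /path/to/tmp/dir/
chmod 750 /path/to/tmp/dir/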

Improved wsrep-stages and related instrumentation in Percona XtraDB Cluster

In this blog post, we’ll look at how we’ve improved wsrep-stages and related instrumentation in Percona XtraDB Cluster.

Introduction

When you execute a workload and need to find out what the given thread is working on, “SHOW PROCESSLIST” comes to the top of your mind. It is an effective way to track the thread status. We decided to improve the stages in Percona XtraDB Cluster to make “SHOW PROCESSLIST” more meaningful.

Below, we will check out the different wsrep-stages and the significance associated with each.
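One note before the examples: the stage timings in the instrumentation section further down are read from performance_schema, so the stage consumers and the wsrep stage instruments must be enabled there. A hedged setup sketch:

mysql> update performance_schema.setup_consumers set enabled='YES' where name like 'events_stages%';
mysql> update performance_schema.setup_instruments set enabled='YES', timed='YES' where name like 'stage/wsrep%';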

Loading of data

Running a simple insert/sysbench prepare workload. The state is stable as it mainly captures MySQL stages indicating that the table is being updated:

| 9 | root | localhost | test | Query | 0 | update | INSERT INTO sbtest3(id, k, c, pad) VALUES(893929,515608,'28459951974-62599818307-78562787160-7859397 | 0 | 0 |

Running UPDATE workload

Running simple sysbench update-key workload. Let’s look at the different states that the user sees and their significance. (MASTER and SLAVE states are different and are marked accordingly.)

MASTER view:

  • This stage indicates that the write-set is trying to replicate. Global sequence numbers are assigned after the write-set is replicated and so the global-seqno is currently -1:

| 80 | root | localhost | test | Query | 0 | wsrep: initiating replication for write set (-1) | UPDATE sbtest4 SET k=k+1 WHERE id=502338 | 0 | 1 |

  • This stage indicates successful replication of the write-set. This means the write-set is now added to the group-channel. Global-seqno is updated in the message too:

| 79 | root | localhost | test | Query | 0 | wsrep: write set replicated (196575) | UPDATE sbtest3 SET k=k+1 WHERE id=502723 | 0 | 1 |

  • This stage indicates the write-set has successfully passed the certification stage (making its path clear for commit):

| 85 | root | localhost | test | Query | 0 | wsrep: pre-commit/certification passed (196574) | UPDATE sbtest7 SET k=k+1 WHERE id=495551 | 0 | 1 |

  • This stage indicates that InnoDB commit has been triggered for the write-set:

| 138 | root | localhost | test | Query | 0 | innobase_commit_low (585721) | UPDATE sbtest6 SET k=k+1 WHERE id=500551 | 0 | 1 |

SLAVE/Replicating node view:

  • This stage indicates that the slave thread is trying to commit the replicated write-set with the given seqno. It is likely waiting for its turn of the CommitMonitor:

|  6 | system user |           | NULL | Sleep   |    0 | wsrep: committing write set (224905) | NULL             |         0 |             0 |

  • This stage indicates a successful commit of the replicated write-set with the given seqno:

| 2 | system user | | NULL | Sleep | 0 | wsrep: committed write set (224896) | NULL | 0 | 0 |

  • This stage indicates that updating the rows is in progress. (Often it was difficult to know what the workload is trying to do: UPDATE/INSERT/DELETE.) Now there is an easy way to find out:

| 13 | system user |           | NULL | Sleep   |    0 | wsrep: updating row for write-set (178711) | NULL             |         0 |             0 |

| 18 | system user | | NULL | Sleep | 0 | wsrep: writing row for write-set (793454) | NULL | 0 | 0 |

| 11 | system user | | NULL | Sleep | 0 | wsrep: deleting row for write-set (842123) | NULL | 0 | 0 |

  • This stage indicates that the given write-set is being applied:

| 10 | system user | | NULL | Sleep | 0 | wsrep: applying write-set (899370) | NULL | 0 | 0 |

Improved Instrumentation

Let’s answer some simple questions that most profiling experts will face:

  • How long did replication take (adding write-set to channel)?

mysql> select event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%in replicate%' order by time_in_mics desc limit 5;
+---------------------------------------+--------------+
| event_name | time_in_mics |
+---------------------------------------+--------------+
| stage/wsrep/wsrep: in replicate stage | 1.2020 |
| stage/wsrep/wsrep: in replicate stage | 0.7880 |
| stage/wsrep/wsrep: in replicate stage | 0.7740 |
| stage/wsrep/wsrep: in replicate stage | 0.7550 |
| stage/wsrep/wsrep: in replicate stage | 0.7480 |
+---------------------------------------+--------------+
5 rows in set (0.01 sec)

  • How long did it take for pre-commit/certification checks?

mysql> select event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%in pre-commit%' order by time_in_mics desc limit 5;
+----------------------------------------+--------------+
| event_name | time_in_mics |
+----------------------------------------+--------------+
| stage/wsrep/wsrep: in pre-commit stage | 1.3450 |
| stage/wsrep/wsrep: in pre-commit stage | 1.0000 |
| stage/wsrep/wsrep: in pre-commit stage | 0.9480 |
| stage/wsrep/wsrep: in pre-commit stage | 0.9180 |
| stage/wsrep/wsrep: in pre-commit stage | 0.9030 |
+----------------------------------------+--------------+
5 rows in set (0.01 sec)

  • How long did it take to commit a transaction on the slave (slave_thread=16 threads)?

mysql> select thread_id, event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%committing%' order by time_in_mics desc limit 5;
+-----------+-------------------------------+--------------+
| thread_id | event_name | time_in_mics |
+-----------+-------------------------------+--------------+
| 56 | stage/wsrep/wsrep: committing | 0.5870 |
| 58 | stage/wsrep/wsrep: committing | 0.5860 |
| 47 | stage/wsrep/wsrep: committing | 0.5810 |
| 59 | stage/wsrep/wsrep: committing | 0.5740 |
| 60 | stage/wsrep/wsrep: committing | 0.5220 |
+-----------+-------------------------------+--------------+
5 rows in set (0.00 sec)

  • Increasing the number of slave threads creates more contention (slave_thread=64):

mysql> select thread_id, event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%committing%' order by time_in_mics desc limit 5;
+-----------+-------------------------------+--------------+
| thread_id | event_name | time_in_mics |
+-----------+-------------------------------+--------------+
| 90 | stage/wsrep/wsrep: committing | 1.6930 |
| 97 | stage/wsrep/wsrep: committing | 1.5870 |
| 103 | stage/wsrep/wsrep: committing | 1.5140 |
| 87 | stage/wsrep/wsrep: committing | 1.2560 |
| 102 | stage/wsrep/wsrep: committing | 1.1040 |
+-----------+-------------------------------+--------------+
5 rows in set (0.00 sec)

  • The amount of time taken to apply a write-set:

mysql> select thread_id, event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%applying%' order by time_in_mics desc limit 5;
+-----------+---------------------------------------+--------------+
| thread_id | event_name | time_in_mics |
+-----------+---------------------------------------+--------------+
| 166 | stage/wsrep/wsrep: applying write set | 1.6880 |
| 168 | stage/wsrep/wsrep: applying write set | 1.5820 |
| 146 | stage/wsrep/wsrep: applying write set | 1.5270 |
| 124 | stage/wsrep/wsrep: applying write set | 1.4760 |
| 120 | stage/wsrep/wsrep: applying write set | 1.4440 |
+-----------+---------------------------------------+--------------+
5 rows in set (0.00 sec)

Conclusion

The improved wsrep-stage framework makes it more effective for a user to find out the state of a given thread. Using the derived instrumentation through wsrep-stage is a good way to understand where the time is being spent.

Percona Live Europe Session Interview: Building Multi-Petabyte Data Warehouses with ClickHouse

Percona Live Europe provides open source professionals with an opportunity to discuss how various technologies are used to solve database problems. Alexander Zaitsev of LifeStreet/Altinity gave one such session: Building Multi-Petabyte Data Warehouses with ClickHouse.

LifeStreet needed to scale their real-time ad analytics platform to multiple petabytes. They evaluated and used a number of open source and commercial solutions, but most were either not efficient enough or too expensive. When Yandex released ClickHouse to open source, LifeStreet quickly realized its potential and started an implementation project. It took a bit of time and effort, but it finally worked out and became an excellent way to address scale in LifeStreet’s database environment.

In this presentation, LifeStreet/Altinity Director of Engineering Alexander Zaitsev talked about their experiences from an application developer’s viewpoint: what worked well and not so well, what challenges they had to overcome as well as share the best practices for building large-scale platforms based on ClickHouse.

I got a chance to talk with Alexander in the video below. Check it out!

This Week in Data with Colin Charles 17: AWS Re:Invent, a New Book on MySQL Cluster and Another Call Out for Percona Live 2018

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

The CFP for Percona Live Santa Clara 2018 closes December 22, 2017: please consider submitting as soon as possible. We want to make an early announcement of talks, so we’ll definitely do a first pass even before the CFP date closes. Keep in mind the expanded view of what we are after: it’s more than just MySQL and MongoDB. And don’t forget that with one day less, there will be intense competition to fit all the content in.

A new book on MySQL Cluster is out: Pro MySQL NDB Cluster by Jesper Wisborg Krogh and Mikiya Okuno. At 690 pages, it is a weighty tome, and something I fully plan on reading, considering I haven’t played with NDBCLUSTER for quite some time.

Did you know that since MySQL 5.7.17, connection control plugins are included? They help DBAs introduce an increasing delay in server response to clients after a certain number of consecutive failed connection attempts. Read more in the connection control plugin documentation.
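If you want to try them, here is a quick sketch of enabling and tuning the plugins (standard plugin and variable names from the MySQL distribution):

mysql> INSTALL PLUGIN CONNECTION_CONTROL SONAME 'connection_control.so';
mysql> INSTALL PLUGIN CONNECTION_CONTROL_FAILED_LOGIN_ATTEMPTS SONAME 'connection_control.so';
mysql> SET GLOBAL connection_control_failed_connections_threshold = 3;
mysql> SET GLOBAL connection_control_min_connection_delay = 1000; -- milliseconds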

While there are a tonne of announcements coming out from the Amazon re:Invent 2017 event, I highly recommend also reading Some data of interest as AWS reinvent 2017 ramps up by James Governor. Telemetry data from sumologic’s 1,500 largest customers suggest that NoSQL database usage has overtaken relational database workloads! Read The State of Modern Applications in the Cloud. Page 8 tells us that MySQL is the #1 database on AWS (I don’t see MariaDB Server being mentioned which is odd; did they lump it in together?), and MySQL, Redis & MongoDB account for 40% of database adoption on AWS. In other news, Andy Jassy also mentions that less than 1.5 months after hitting 40,000 database migrations, they’ve gone past 45,000 over the Thanksgiving holiday last week. Have you started using AWS Database Migration Service?

Releases

Link List

Upcoming appearances

  • ACMUG 2017 gathering – Beijing, China, December 9-10 2017 – it was very exciting being there in 2016, I can only imagine it’s going to be bigger and better in 2017, since it is now two days long!

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

This Week in Data with Colin Charles 25: Meltdown/Spectre still dominate, FOSDEM approaches and Timescale gets funding

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Still on Meltdown/Spectre, this time MariaDB Corporation has published Meltdown Vulnerability Impact On MariaDB Server – the comparison between glibc/tcmalloc is interesting. There is a worthy Facebook thread about this too, with a bit of chat about MongoDB performance. Officially, MongoDB reports a degradation of 10-15%. ScaleGrid has a good post, in which they test MongoDB against AWS/Azure/DigitalOcean. It’s interesting that on AWS they see a 4-5% hit on insert workloads, but on Azure it’s 10-20% and on DigitalOcean it’s 30%! (The comparison for the balanced workload is 2-3% on AWS, 20-25% on Azure, and 30% on DigitalOcean.)

We’ve chosen tutorials for Percona Live Santa Clara and will be announcing them soon; by the first week of February, most talks as well. Don’t forget that you can still get the best price by registering now.

Don’t forget that in the next week, we have FOSDEM — many Perconians will be there and look forward to meeting with you. We have a community table. Peter Zaitsev has a talk in the main track. Don’t forget the fringe events: on 2nd February there is a pre-FOSDEM MySQL Day, the MySQL & Friends Community Dinner, and also a CentOS Dojo that I will be speaking at. Looking forward to a packed & fun-filled Friday, Saturday and Sunday!

It’s worth reading this: Timescale raises $16M from Benchmark, NEA, Two Sigma Ventures for the database to help us understand our machines and our world. Timescale has been an active participant in the Percona Live events, and this round of funding bodes very well for the PostgreSQL database world.

Releases

Link List

Upcoming appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

This Week in Data with Colin Charles 30: Schedule for Percona Live, and Tracking Those Missing Features

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Have you registered for Percona Live already? The tutorial grid, the schedules for day 1 and day 2 are pretty amazing, and there is even an extra track being added, for a total of 10 concurrent/parallel tracks during day 1 & day 2. If you submitted a talk and it didn’t get accepted (competition was high), you should have received a discount code to register for the event.

I plan to write more dedicated blog posts around M|18 and the MariaDB Developer’s Unconference. Hang on till next week? I gave my presentation, targeted at developers, titled MySQL features missing in MariaDB Server. It was a room of maybe 40-45 people, and had some debate, but not much more; there were some other excellent presentations as well.

Next week, Percona will be at SCALE16x. We are sponsors, so there will also be a booth (where you can get some interesting schwag), and don’t forget that both Peter Zaitsev and I have a talk (one on Friday, the other on Saturday). Looking forward to seeing you all there.

Releases

Link List

Upcoming appearances

  • SCALE16x – Pasadena, California, USA – March 8-11 2018
  • FOSSASIA 2018 – Singapore – March 22-25 2018

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 



Webinar Wednesday, April 18, 2018: Percona XtraDB Cluster 5.7 Tutorial

Please join Percona’s Architect, Tibi Köröcz as he presents Percona XtraDB Cluster 5.7 Tutorial on Wednesday, April 18, 2018, at 7:00 am PDT (UTC-7) / 10:00 am EDT (UTC-4).

Never used Percona XtraDB Cluster before? Come join this 45-minute tutorial where we will introduce you to the concepts of a fully functional Percona XtraDB Cluster.

In this tutorial, we will show you how you can install Percona XtraDB Cluster with ProxySQL, and monitor it with Percona Monitoring and Management (PMM).

We will also cover topics like bootstrap, IST, SST, Certification, common-failure situations and online schema changes.

Register for the webinar now.

Tibor Köröcz, Senior Consultant

Tibi joined Percona in 2015 as a Consultant. Before joining Percona, among many other things, he worked at the world’s largest car hire booking service as a Senior Database Engineer. He enjoys trying and working with the latest technologies and applications that can help or work with MySQL. In his spare time, he likes to spend time with his friends, travel around the world and play ultimate frisbee.


Webinar Tues, 5/29: MySQL, Percona XtraDB Cluster, ProxySQL, Kubernetes: How they work together

Please join Percona’s Principal Architect Alex Rubin as he presents MySQL, Percona XtraDB Cluster, ProxySQL, Kubernetes: How they work together to give you a highly available cluster database environment on Tuesday, May 29th at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

 

In this webinar, Alex will discuss how to deploy a highly available MySQL database environment on Kubernetes/Openshift using Percona XtraDB Cluster (PXC) together with ProxySQL to implement read/write splitting.

If you have never used Kubernetes and Openshift, or never used PXC / ProxySQL, Alex will do a quick introduction to these technologies. There will also be a demo where Alex sets up a PXC cluster with ProxySQL in Openshift Origin and tries to break it.

By the end of this webinar you will have a better understanding of:

  • How to deploy Percona XtraDB Cluster with ProxySQL for HA solutions
  • How to leverage Kubernetes/Openshift in your environments
  • How to troubleshoot performance issues

Register for the webinar

Alexander Rubin, Principal Consultant

Alexander joined Percona in 2013. He has worked with MySQL since 2000 as a DBA and application developer. Before joining Percona, he did MySQL consulting as a principal consultant for over seven years (starting with MySQL AB in 2006, then Sun Microsystems and then Oracle). He has helped many customers design large, scalable and highly available MySQL systems and optimize MySQL performance. Alexander has also helped customers design Big Data stores with Apache Hadoop and related technologies.


Linux OS Tuning for MySQL Database Performance

In this post we will review the most important Linux settings to adjust for performance tuning and optimization of a MySQL database server. We’ll note how some of the Linux parameter settings used in OS tuning may vary according to different system types: physical, virtual or cloud. Other posts have addressed MySQL parameters, like Alexander’s blog MySQL 5.7 Performance Tuning Immediately After Installation. That post remains highly relevant for the latest versions of MySQL, 5.7 and 8.0. Here we will focus more on the Linux operating system parameters that can affect database performance.

Server and Operating System

Here are some Linux parameters that you should check and consider modifying if you need to improve database performance.

Kernel – vm.swappiness

The value represents the tendency of the kernel to swap out memory pages. On a database server with ample amounts of RAM, we should keep this value as low as possible. The extra I/O from swapping can slow down or even render the service unresponsive. A value of 0 disables swapping completely, while 1 causes the kernel to perform the minimum amount of swapping. In most cases the latter setting should be OK:

# Set the swappiness value as root
echo 1 > /proc/sys/vm/swappiness
# Alternatively, using sysctl
sysctl -w vm.swappiness=1
# Verify the change
cat /proc/sys/vm/swappiness
1
# Alternatively, using sysctl
sysctl vm.swappiness
vm.swappiness = 1

The change should also be persisted in /etc/sysctl.conf:

vm.swappiness = 1
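The persisted value can then be loaded without a reboot:

# Re-read /etc/sysctl.conf
sudo sysctl -p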

Filesystems – XFS/ext4/ZFS
XFS

XFS is a high-performance, journaling file system designed for high scalability. It provides near-native I/O performance even when the file system spans multiple storage devices. XFS has features that make it suitable for very large file systems: support for files up to 8EiB in size, fast recovery, fast transactions, delayed allocation for reduced fragmentation, and near-raw I/O performance with direct I/O.

The default options for mkfs.xfs are good for optimal speed, so the simple command:

# Use default mkfs options
mkfs.xfs /dev/target_volume

will provide the best performance while ensuring data safety. Regarding mount options, the defaults should fit most cases. On some filesystems you can see a performance increase by adding the noatime mount option to /etc/fstab. For XFS filesystems the default atime behaviour is relatime, which has almost no overhead compared to noatime and still maintains sane atime values. If you create an XFS file system on a LUN that has a battery-backed, non-volatile cache, you can further increase the performance of the filesystem by disabling the write barrier with the mount option nobarrier. This helps you avoid flushing data more often than necessary. If a BBU (backup battery unit) is not present, however, or you are unsure about it, leave barriers on, otherwise you may jeopardize data consistency. With these options on, an /etc/fstab file should look like the one below:

/dev/sda2              /datastore              xfs     defaults,nobarrier
/dev/sdb2              /binlog                 xfs     defaults,nobarrier

ext4

ext4 has been developed as the successor to ext3, with added performance improvements. It is a solid option that will fit most workloads. We should note here that it supports files up to 16TB in size, a smaller limit than XFS. This is something you should consider if extreme tablespace size/growth is a requirement. Regarding mount options, the same considerations apply. We recommend the defaults for a robust filesystem without risks to data consistency. However, if an enterprise storage controller with a BBU cache is present, the following mount options will provide the best performance:

/dev/sda2              /datastore              ext4     noatime,data=writeback,barrier=0,nobh,errors=remount-ro
/dev/sdb2              /binlog                 ext4     noatime,data=writeback,barrier=0,nobh,errors=remount-ro

Note: The data=writeback option results in only metadata being journaled, not actual file data. This carries the risk of corrupting recently modified files in the event of a sudden power loss, a risk which is minimised by the presence of a BBU-enabled controller. nobh only works with the data=writeback option enabled.

ZFS

ZFS is a filesystem and LVM combined into an enterprise storage solution with extended protection against data corruption. There are certainly cases where the rich feature set of ZFS makes it an essential option to consider, most notably when advanced volume management is a requirement. ZFS tuning for MySQL can be a complex topic and falls outside the scope of this blog. For further reference, there is a dedicated blog post on the subject by Yves Trudeau.

Disk Subsystem – I/O scheduler 

Most modern Linux distributions come with the noop or deadline I/O scheduler by default, both providing better performance than the cfq and anticipatory ones. However, it is always good practice to check the scheduler for each device; if the value shown is different from noop or deadline, you can change the policy without rebooting the server:

# View the I/O scheduler setting. The value in square brackets shows the running scheduler
cat /sys/block/sdb/queue/scheduler
noop deadline [cfq]
# Change the setting (plain "sudo echo" does not work here, as the redirection happens outside sudo)
echo noop | sudo tee /sys/block/sdb/queue/scheduler

To make the change persistent, you must modify the GRUB configuration file:

# Change the line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
# to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=noop"
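After editing the file, regenerate the GRUB configuration so that the new kernel parameter is applied on the next boot. The exact command depends on the distribution (the paths below are the usual defaults):

# Debian/Ubuntu
sudo update-grub
# RHEL/CentOS 7
sudo grub2-mkconfig -o /boot/grub2/grub.cfg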

AWS Note: There are cases where the I/O scheduler has a value of none, most notably in AWS VM instance types where EBS volumes are exposed as NVMe block devices. The setting has no use with modern PCIe/NVMe devices, because they have a very large internal queue and bypass the I/O scheduler altogether. In this case, none is the optimal setting for such disks.
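On such an instance, the check looks like this (the device name nvme0n1 is an assumption; adjust it to match your instance):

# NVMe block devices typically report none as the scheduler
cat /sys/block/nvme0n1/queue/scheduler
[none]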

Disk Subsystem – Volume optimization

Ideally different disk volumes should be used for the OS installation, binlog, data and the redo log, if this is possible. The separation of OS and data partitions, not just logically but physically, will improve database performance. The RAID level can also have an impact: RAID-5 should be avoided as the checksum needed to ensure integrity is costly. The best performance without making compromises to redundancy is achieved by the use of an advanced controller with a battery-backed cache unit and preferably RAID-10 volumes spanned across multiple disks.
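As an illustrative sketch of how that separation might be reflected in the MySQL configuration (the mount points below are assumptions, reusing the volumes from the fstab examples above plus a hypothetical dedicated redo volume):

[mysqld]
# Data files on the dedicated data volume
datadir                   = /datastore/mysql
# Redo logs on their own volume
innodb_log_group_home_dir = /datastore/redo
# Binary logs on a separate volume
log-bin                   = /binlog/mysql-bin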

AWS Note: For further information about EBS volumes and AWS storage optimisation, Amazon has documentation at the following links:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage-optimized-instances.html

Database settings

System Architecture – NUMA settings

Non-uniform memory access (NUMA) is a memory design in which a processor in an SMP system can access its own local memory faster than non-local memory (memory local to other CPUs). This may result in suboptimal database performance and potentially swapping: when the buffer pool memory allocation is larger than the RAM available local to the node, and the default memory allocation policy is selected, swapping occurs. A NUMA-enabled server will report different node distances between CPU nodes, while a uniform one will report a single distance:

# NUMA system
numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 65525 MB
node 0 free: 296 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 65536 MB
node 1 free: 9538 MB
node 2 cpus: 16 17 18 19 20 21 22 23
node 2 size: 65536 MB
node 2 free: 12701 MB
node 3 cpus: 24 25 26 27 28 29 30 31
node 3 size: 65535 MB
node 3 free: 7166 MB
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10
# Uniformed system
numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 64509 MB
node 0 free: 4870 MB
node distances:
node   0
  0:  10

In the case of a NUMA system, where numactl shows different distances across nodes, the MySQL variable innodb_numa_interleave should be enabled to ensure memory interleaving. Percona Server provides improved NUMA support by introducing the flush_caches variable. When enabled, it will help with allocation fairness across nodes. To determine whether or not allocation is equal across nodes, you can examine numa_maps for the mysqld process with this script:

# The perl script numa_maps.pl will report memory allocation per CPU node:
# 3595 is the pid of the mysqld process
perl numa_maps.pl < /proc/3595/numa_maps
N0        :     16010293 ( 61.07 GB)
N1        :     10465257 ( 39.92 GB)
N2        :     13036896 ( 49.73 GB)
N3        :     14508505 ( 55.35 GB)
active    :          438 (  0.00 GB)
anon      :     54018275 (206.06 GB)
dirty     :     54018275 (206.06 GB)
kernelpagesize_kB:         4680 (  0.02 GB)
mapmax    :          787 (  0.00 GB)
mapped    :         2731 (  0.01 GB)
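If the numa_maps.pl helper is not at hand, a rough equivalent can be sketched in awk (a sketch under stated assumptions: it sums the N<node>=<pages> fields and converts to GB assuming a 4KB page size and a single mysqld process):

# Sum allocated pages per NUMA node from numa_maps (4KB pages assumed)
awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^N[0-9]+=/) { split($i, a, "="); p[a[1]] += a[2] } }
END { for (n in p) printf "%-10s: %12d (%6.2f GB)\n", n, p[n], p[n] * 4096 / (1024 ^ 3) }' /proc/$(pidof mysqld)/numa_maps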

Conclusion

In this blog post, we examined a few important OS-related settings and explained how they can be tuned for better database performance.

While you are here …

You might also find value in the recorded webinar Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance.

 

The post Linux OS Tuning for MySQL Database Performance appeared first on Percona Database Performance Blog.

Percona Software News and Roadmap Update with CEO Peter Zaitsev: Q1 2017


This blog post is a summary of the webinar Percona Software News and Roadmap Update – Q1 2017 given by Peter Zaitsev on January 12, 2017.

Over the last few months, I’ve had the opportunity to meet and talk with many of Percona’s customers. I love these meetings, and I always get a bunch of questions about what we’re doing, what our plans are and what releases are coming.

I’m pleased to say there is a great deal going on at Percona, and I thought giving a quick talk about our current software and services, along with our plans, would provide a simple reference for many of these questions.

A full recording of this webinar, along with the presentation slide deck, can be found here.

Percona Solutions and Services

Let me start by laying out Percona’s company purpose:

To champion unbiased open source database solutions.

What does this mean? It means that we write software to offer you better solutions, and we use the best of the software and technology that exists in the open source community.

Percona stands by a set of principles that we feel define us as a company, and are a promise to our customers:

  • 100% free and open source software
  • Focused on finding solutions to maximize your success
  • Open source database strategy consulting and implementation
  • No vendor lock-in required

We offer trusted and unbiased expert solutions, support and resources in a broad software ecosystem, including:

We also have specialization options for PaaS, IaaS, and SaaS solutions like Amazon Web Services, OpenStack, Google Cloud Platform, OpenShift, Ceph, Docker and Kubernetes.

Percona’s immediate business focus includes building long-term partnerships through support and managed services.

The next few sections detail our current service offerings, with some outlook on our plans.

98% Customer Satisfaction Rating

Over the last six months, Percona has consistently maintained a 98% Customer Satisfaction Rating!

Customer Success Team

Our expanded Customer Success Team is here to ensure you’re getting the most out of your Percona Services Subscription.

Managed Services Best Practices

  • Unification based on best practices
  • Organization changes to offer more personal service
  • Increased automation

Ongoing Services


Consulting and Training. Our consulting and training services are available to assist you with whatever project or staff needs you have.

  • Onsite and remote
  • 4 hours to full time (weeks or months)
  • Project and staff augmentation

Advanced HA Included with Enterprise and Premier Support. Starting this past Spring, we included advanced high availability (HA) support as part of our Enterprise and Premier support tiers. This advanced support includes coverage for:

  • Percona XtraDB Cluster
  • MariaDB Galera Cluster
  • Galera Cluster for MySQL
  • Upcoming MySQL Group Replication
  • Upcoming MySQL InnoDB Cluster

Enterprise Wide Support Agreements. Our new Enterprise Wide Support option allows you to buy support coverage for your entire environment, covering all of the servers in it, rather than paying on a per-server basis. This method of support can save you money, because it:

  • Covers both “MySQL” and “MongoDB”
  • Means you don’t have to count servers
  • Provides highly customized coverage

 

Simplified Support Pricing. Get easy to understand support pricing quickly.

To discuss how Percona Support can help your business, please call us at +1-888-316-9775 (USA),
+44 203 608 6727 (Europe), or have us contact you.

New Percona Online Store – Easy to Buy, Pay Monthly

Percona Software

Below are the latest and upcoming features in Percona’s software. All of Percona’s software adheres to the following principles:

  • 100% free and open source
  • No restricted “Enterprise” version
  • No “open core”
  • No BS-licensing (BSL)

Percona Server for MySQL 5.7

Overview

  • 100% Compatible with MySQL 5.7 Community Edition
  • 100% Free and Open Source
  • Includes Alternatives to Many MySQL Enterprise Features
  • Includes TokuDB Storage Engine
  • Focus on Performance and Operational Visibility

Latest Improvements

Features about to be Released 

  • Integration of TokuDB and Performance Schema
  • MyRocks integration in Percona Server
  • MySQL Group Replication
  • Starting to look towards MySQL 8

Percona XtraBackup 2.4

Overview

  • #1 open source binary hot backup solution for MySQL
  • Alternative to MySQL Enterprise backup
  • Parallel backups, incremental backups, streaming, encryption
  • Supports MySQL, MariaDB, Percona Server

New Features

  • Support SHA256 passwords and secure connection to server
  • Improved Security (CVE-2016-6225)
  • Wrong passphrase detection

Percona Toolkit

Overview

  • “Swiss Army Knife” of tools
  • Helps DBAs be more efficient
  • Helps DBAs make fewer mistakes
  • Supports MySQL, MariaDB, Percona Server, Amazon RDS MySQL

New Features

  • Improved fingerprinting in pt-query-digest
  • Pause file for pt-online-schema-change
  • Provide information about transparent huge pages

Coming Soon

  • Working towards Percona Toolkit 3.0 release
  • Comprehensive support for MongoDB
  • New tools are now implemented in Go

Percona Server for MongoDB 3.2

Overview

  • 100% compatible with MongoDB 3.2 Community Edition
  • 100% open source
  • Alternatives for many MongoDB Enterprise features
  • MongoRocks (RocksDB) storage engine
  • Percona Memory Engine

New

  • Percona Server for MongoDB 3.2 – GA
  • Support for MongoRocks storage engine
  • PerconaFT storage engine deprecated
  • Implemented Percona Memory Engine

Coming Soon

  • Percona Server for MongoDB 3.4
  • Fully compatible with MongoDB 3.4 Community Edition
  • Updated RocksDB storage engine
  • Universal hot backup for WiredTiger and MongoRocks
  • Profiling rate limiting (query sampling)

Percona Memory Engine for MongoDB

Benchmarks

Percona Memory Engine for MongoDB® is a 100 percent open source in-memory storage engine for Percona Server for MongoDB.

Based on WiredTiger, the in-memory storage engine used in MongoDB Enterprise Edition, Percona Memory Engine for MongoDB delivers extremely high performance and reduced costs for a variety of use cases, including application cache, sophisticated data manipulation, session management and more.

Below are some benchmarks that we ran to demonstrate Percona Memory Engine’s performance.


WiredTiger vs MongoRocks – write intensive

Percona XtraDB Cluster 5.7

Overview

  • Based on Percona Server 5.7
  • Easiest way to bring HA to your MySQL environment
  • Designed to work well in the cloud
  • Multi-master replication with no conflicts
  • Automatic node provisioning for auto-scaling and self-healing

Goals

  • Brought PXC development in-house to serve our customers better
  • Provide a complete clustering solution, not a set of LEGO pieces
  • Improve usability and ease of use
  • Focus on quality

Highlights

  • Integrated cluster-aware load balancer with ProxySQL
  • Instrumentation with Performance Schema
  • Support for data at rest encryption (InnoDB tablespace encryption)
  • Your data is safe by default with “strict mode” – prevents using features that do not work correctly
  • Integration with Percona Monitoring and Management (PMM)

New in Percona XtraDB Cluster 5.7

  • One option to secure all network communication: pxc-encrypt-cluster-traffic
  • Zero downtime maintenance with ProxySQL and Maintenance Mode

Percona Monitoring and Management

Overview

  • Comprehensive database-focused monitoring
  • 100% open source, roll-your-own solution
  • Easy to install and use
  • Supports MySQL and MongoDB
  • Version 1.0 focuses on trending and query analysis
  • Management features to come

Examples of PMM Screens

What queries are causing the load?

Why are they causing this load?

How to fix them:

System information:

What happens on OS and hardware level:

As well as the database level:

New in Percona Monitoring and Management

  • Continuing to improve and expand dashboards with every release
  • Includes Grafana 4.0 (with basic Alerting)
  • SSL support for server-agent communications
  • Supports authentication for server-agent communication
  • Added support for Amazon RDS
  • Reduced space consumption by using advanced compression

Coming Soon 

  • PMM server available as AMI and Virtual Appliance image
  • Better MongoDB dashboards
  • Continuing work on dashboard improvements
  • Query analytics application refinements
  • Short term continuing focus on monitoring functionality

Check out the Demo

Percona Live Open Source Database Conference 2017 is right around the corner!

The Percona Live Open Source Database Conference is the premier event for the diverse and active open source database community, as well as businesses that develop and use open source database software. The conference has a technical focus, with an emphasis on the core topics of MySQL, MongoDB, PostgreSQL and other open source databases. Tackling subjects such as analytics, architecture and design, security, operations, scalability and performance, Percona Live provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals, bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience, all to help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Register now and get the early bird rate, but hurry: prices go up January 31st.

Sponsorship opportunities are available as well. Become a Percona Live sponsor and download the prospectus here.

 

The post Percona Software News and Roadmap Update with CEO Peter Zaitsev: Q1 2017 appeared first on Percona Database Performance Blog.

Installing Percona Monitoring and Management (PMM) for the First Time


This blog post is another in the series on the Percona Server for MongoDB 3.4 bundle release. This post is meant to walk a prospective user through the benefits of Percona Monitoring and Management (PMM), how it’s architected and the simple install process. By the end of this post, you should have a good idea of what PMM is, where it can add value in your environment and how you can get PMM going quickly.

Percona Monitoring and Management (PMM) is Percona’s open-source tool for monitoring and alerting on database performance and the components that contribute to it. PMM monitors MySQL (Percona Server and MySQL CE), Amazon RDS/Aurora, MongoDB (Percona Server and MongoDB CE), Percona XtraDB/Galera Cluster, ProxySQL, and Linux.

What is it?

Percona Monitoring and Management is an amalgamation of exciting, best-in-class open-source tools and Percona “engineering wizardry,” designed to make it easier to monitor and manage your environment. The real value to our users is the amount of time we’ve spent integrating the tools, plus the pre-built dashboards we’ve constructed that leverage our ten years of performance optimization experience. What you get is a tool that is ready to go out of the box, and installs in minutes. If you’re still not convinced: like ALL Percona software, it’s completely FREE!

Sound good? I can hear you nodding your head. Let’s take a quick look at the architecture.

What’s it made of?

PMM, at a high-level, is made up of two basic components: the client and the server. The PMM Client is installed on the database servers themselves and is used to collect metrics. The client contains technology specific exporters (which collect and export data), and an “admin interface” (which makes the management of the PMM platform very simple). The PMM server is a “pre-integrated unit” (Docker, VM or AWS AMI) that contains four components that gather the metrics from the exporters on the PMM client(s). The PMM server contains Consul, Grafana, Prometheus and a Query Analytics Engine that Percona has developed. Here is a graphic from the architecture section of our documentation. In order to keep this post to a manageable length, please refer to that page if you’d like a more “in-depth” explanation.

How do I use it?

PMM is very easy to access once it has been installed (more on the install process below). You simply open up the web browser of your choice and connect to the PMM landing page by typing http://<ip_address_of_PMM_server>. That takes you to the landing page, where you can access all of PMM’s tools. If you’d like to get a look at the user experience, we’ve set up a great demo site so you can easily test it out.

Where should I use it?

There’s a good chance that you already have a monitoring/alerting platform for your production workloads. If not, you should set one up immediately and start analyzing trends in your environment. If you’re confident in your production monitoring solution, there is still a use for PMM in an often overlooked area: development and testing.

When speaking with users, we often hear that their development and test environments run their most demanding workloads. This is often due to stress testing and benchmarking. The goal of these workloads is usually to break something. This allows you to set expectations for normal, and thus abnormal, behavior in your production environment. Once you have a good idea of what’s “normal” and the critical factors involved, you can alert around those parameters to identify “abnormal” patterns before they cause user issues in production. The reason that monitoring is critical in your dev/test environment(s) is that you want to easily spot inflection points in your workload, which signal impending disaster. Dashboards are the easiest way for humans to consume and analyze this data.

Are you sold? Let’s get to the easiest part: installation.

How do you install it?

PMM is very easy to install and configure for two main reasons. The first is that the components (mentioned above) take some time to install, so we spent the time to integrate everything and ship it as a unit: one server install and a client install per host. The second is that we’re targeting customers looking to monitor MySQL and MongoDB installations for high-availability and performance. The fact that it’s a targeted solution makes pre-configuring it to monitor for best practices much easier. I believe we’ve all seen a particular solution that tries to do a little of everything, and thus actually does no particular thing well. This is the type of tool that we DO NOT want PMM to be. Now, onto the installation procedure.

There are four basic steps to get PMM monitoring your infrastructure. I do not want to recreate the Deployment Guide in order to maintain the future relevancy of this post. However, I’ll link to the relevant sections of the documentation so you can cut to the chase. Also, underneath each step, I’ll list some key takeaways that will save you time now and in the future.

  1. Install the integrated PMM server in the flavor of your choice (Docker, VM or AWS AMI)
    1. Percona recommends Docker to deploy PMM server as of v1.1
      1. As of right now, using Docker will make the PMM server upgrade experience seamless.
      2. Using the default version of Docker from your package manager may cause unexpected behavior. We recommend using the latest stable version from Docker’s repositories (instructions from Docker).
    2. PMM server AMI and VM are “experimental” in PMM v1.1
    3. When you open the “Metrics Monitor” for the first time, it will ask for credentials (user: admin pwd: admin).
  2. Install the PMM client on every database instance that you want to monitor.
    1. Install with your package manager for easier upgrades when a new version of PMM is released.
  3. Connect the PMM client to the PMM Server.
    1. Think of this step as sending configuration information from the client to the server. This means you are telling the client the address of the PMM server, not the other way around.
  4. Start data collection services on the PMM client.
    1. Collection services are enabled per database technology (MySQL, MongoDB, ProxySQL, etc.) on each database host.
    2. Make sure to set permissions for the PMM client to monitor the database, or you will see errors like: Cannot connect to MySQL: Error 1045: Access denied for user ‘jon’@’localhost’ (using password: NO)
      1. Setting proper credentials uses this syntax: sudo pmm-admin add <service_type> --user xxxx --password xxxx
    3. There’s good information about PMM client options in the “Managing PMM Client” section of the documentation for advanced configurations/troubleshooting.
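For reference, here is a condensed sketch of steps 1 through 4 using the Docker flavor. The image tag, server address and MySQL credentials below are placeholders, so check the Deployment Guide for the current values:

# Step 1: create the data volume container and start the PMM server (image tag assumed)
docker create -v /opt/prometheus/data -v /opt/consul-data -v /var/lib/mysql -v /var/lib/grafana --name pmm-data percona/pmm-server:1.1.1 /bin/true
docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server --restart always percona/pmm-server:1.1.1
# Step 2: install the pmm-client package on each database host with your package manager, then:
# Step 3: tell the client where the PMM server is (address assumed)
sudo pmm-admin config --server 192.168.1.10
# Step 4: start collection services with proper database credentials (placeholders)
sudo pmm-admin add mysql --user pmm --password secret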

What’s next?

That’s really up to you, and what makes sense for your needs. However, here are a few suggestions to get the most out of PMM.

  1. Set up alerting in Grafana on the PMM server. This is still an experimental function in Grafana, but it works. I’d start with Barrett Chambers’ post on setting up email alerting, and refine it with Peter Zaitsev’s post.
  2. Set up more hosts to test the full functionality of PMM. We have completely free, high-performance versions of MySQL, MongoDB, Percona XtraDB Cluster (PXC) and ProxySQL (for MySQL proxy/load balancing).
  3. Start load testing the database with benchmarking tools to build your troubleshooting skills. Try to break something to learn what troubling trends look like. When you find them, set up alerts to give you enough time to fix them.

The post Installing Percona Monitoring and Management (PMM) for the First Time appeared first on Percona Database Performance Blog.
