Great Strategies for Using Memcached and MySQL Better Together

The first recommendation for speeding up a website is almost always to add a cache, then more cache, and then a little more just in case. Memcached is almost always the cache that gets recommended.

What is Memcached?

Memcached is a general-purpose distributed memory caching system that was originally developed by Danga Interactive for LiveJournal, but is now used by many other sites. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source (such as a database or API) must be read. Memcached runs on Unix, Windows and MacOS and is distributed under a permissive free software license.

Memcached’s APIs provide a giant hash table distributed across multiple machines. When the table is full, subsequent inserts cause older data to be purged in least recently used (LRU) order. Applications using Memcached typically layer requests and additions into RAM before falling back on a slower backing store, such as a database.
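To make the LRU purge order concrete, here is a minimal single-process sketch in Python. The `LRUCache` class and its method names are made up for illustration; this shows only the eviction idea and is not how the Memcached server itself is implemented.

```python
from collections import OrderedDict


class LRUCache:
    """Toy in-process cache that evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")         # "a" is now more recently used than "b"
cache.set("c", 3)      # the table is full, so "b" is purged
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```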

Architecture

The system uses a client–server architecture. The servers maintain a key–value associative array; the clients populate this array and query it. Keys are up to 250 bytes long and values can be at most 1 megabyte in size by default.
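A client can check these limits before it ever talks to a server. The sketch below is only an illustration of the two numbers quoted above; `check_item` and the constant names are hypothetical and not part of any Memcached client library.

```python
MAX_KEY_BYTES = 250            # key length limit quoted above
MAX_VALUE_BYTES = 1024 * 1024  # 1 megabyte value limit quoted above


def check_item(key, value):
    """Return the encoded key, or raise ValueError if a limit is exceeded.

    `key` is a str, `value` is the already-serialized bytes payload.
    """
    key_bytes = key.encode("utf-8")
    if len(key_bytes) > MAX_KEY_BYTES:
        raise ValueError("key is longer than 250 bytes")
    if len(value) > MAX_VALUE_BYTES:
        raise ValueError("value is larger than 1 megabyte")
    return key_bytes


check_item("userrow:42", b"some serialized row")  # passes silently
```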

Clients use client-side libraries to contact the servers, which by default expose their service on port 11211. Each client knows all servers; the servers do not communicate with each other. If a client wishes to set or read the value corresponding to a certain key, the client’s library first computes a hash of the key to determine the server that will be used. Then it contacts that server. The server will compute a second hash of the key to determine where to store or read the corresponding value.
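The first hash, the one computed by the client library, can be sketched as follows. This is a simplified illustration that assumes plain modulo hashing over a hypothetical server list; real client libraries often use consistent hashing instead, so that adding or removing a server remaps fewer keys.

```python
import hashlib

# Hypothetical server list; every client is configured with the same one.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def pick_server(key, servers=SERVERS):
    """Map a key to one server by hashing the key on the client side."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


# The same key always maps to the same server, so any client that uses
# this function and this server list will look in the same place.
print(pick_server("userrow:42"))
```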

The servers keep the values in RAM; if a server runs out of RAM, it discards the oldest values. Therefore, clients must treat Memcached as a transitory cache; they cannot assume that data stored in Memcached is still there when they need it. A Memcached-protocol compatible product known as MemcacheDB provides persistent storage. There is also a solution called Membase from NorthScale that provides persistence, replication and clustering.

If all client libraries use the same hashing algorithm to determine servers, then clients can read each other’s cached data; this is obviously desirable.

A typical deployment will have several servers and many clients. However, it is possible to use Memcached on a single computer, acting simultaneously as client and server.

Security

Most deployments of Memcached exist within trusted networks where clients may freely connect to any server. There are cases, however, where Memcached is deployed in untrusted networks or where administrators would like to exercise control over the clients that are connecting. For this purpose Memcached can be compiled with optional SASL authentication support. The SASL support requires the binary protocol.

Example code

Note that all functions described on this page are pseudocode only. Memcached calls and programming languages may vary based on the API used.

Converting database or object creation queries to use Memcached is simple. Typically, when using straight database queries, example code would be as follows:

function get_foo(int userid) {
    result = db_select("SELECT * FROM users WHERE userid = ?", userid);
    return result;
}

After conversion to Memcached, the same call might look like the following:

function get_foo(int userid) {
    /* first try the cache */
    data = memcached_fetch("userrow:" + userid);
    if (!data) {
        /* not found : request database */
        data = db_select("SELECT * FROM users WHERE userid = ?", userid);
        /* then store in cache until next get */
        memcached_add("userrow:" + userid, data);
    }
    return data;
}

The server would first check whether a Memcached value with the unique key “userrow:userid” exists, where userid is some number. If the result does not exist, it would select from the database as usual, and set the unique key using the Memcached API add function call.
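For readers who want something they can actually run, here is a rough Python version of the same read path. It is a sketch under stated assumptions: `FakeCache` is a dict-backed stand-in invented for this example (not a real Memcached client), the database is an in-memory SQLite table instead of MySQL, and `get_user` is a hypothetical name.

```python
import sqlite3


class FakeCache:
    """Dict-backed stand-in for a Memcached client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def add(self, key, value):
        # Mirrors the "add" semantics: store only if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True


cache = FakeCache()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice')")


def get_user(userid):
    key = f"userrow:{userid}"
    row = cache.get(key)  # first try the cache
    if row is None:
        # not found: ask the database
        row = db.execute(
            "SELECT userid, name FROM users WHERE userid = ?", (userid,)
        ).fetchone()
        if row is not None:
            cache.add(key, row)  # store in the cache for the next get
    return row


print(get_user(1))  # first call misses the cache and reads the database
print(get_user(1))  # second call is served from the cache
```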

However, if only this API call were modified, the server would end up fetching incorrect data following any database update actions: the Memcached “view” of the data would become out of date. Therefore, in addition to the “add” call, an update call would also be needed, using the Memcached set function.

function update_foo(int userid, string dbUpdateString) {
    /* first update database */
    result = db_execute(dbUpdateString);
    if (result) {
        /* database update successful : fetch data to be stored in cache */
        data = db_select("SELECT * FROM users WHERE userid = ?", userid);
        /* last line could also look like   data = createDataFromDBString(dbUpdateString);   */
        /* then store in cache until next get */
        memcached_set("userrow:" + userid, data);
    }
}

This call would update the currently cached data to match the new data in the database, assuming the database query succeeds. An alternative approach would be to invalidate the cache with the Memcached delete function, so that subsequent fetches result in a cache miss. Similar action would need to be taken when database records were deleted, to maintain either a correct or incomplete cache.
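The two write strategies described above, refreshing the cached value with a set or invalidating it with a delete, can be compared side by side in a small runnable sketch. Everything here is an assumption made for illustration: a plain dict stands in for the Memcached client, an in-memory SQLite table stands in for MySQL, and the function names are hypothetical.

```python
import sqlite3

cache = {}  # plain dict standing in for a Memcached client (assumption)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice')")


def fetch_row(userid):
    return db.execute(
        "SELECT userid, name FROM users WHERE userid = ?", (userid,)
    ).fetchone()


def get_user(userid):
    key = f"userrow:{userid}"
    if key not in cache:
        row = fetch_row(userid)
        if row is None:
            return None
        cache[key] = row
    return cache[key]


def rename_user_refresh(userid, name):
    """Update the database, then overwrite the cached copy (the set approach)."""
    cur = db.execute("UPDATE users SET name = ? WHERE userid = ?", (name, userid))
    if cur.rowcount:
        cache[f"userrow:{userid}"] = fetch_row(userid)


def rename_user_invalidate(userid, name):
    """Update the database, then drop the cached copy (the delete approach)."""
    cur = db.execute("UPDATE users SET name = ? WHERE userid = ?", (name, userid))
    if cur.rowcount:
        cache.pop(f"userrow:{userid}", None)


def delete_user(userid):
    """Deleting a record must also remove it from the cache."""
    db.execute("DELETE FROM users WHERE userid = ?", (userid,))
    cache.pop(f"userrow:{userid}", None)


get_user(1)                   # warm the cache
rename_user_refresh(1, "bob")
print(get_user(1))            # served from the refreshed cache entry
```

The refresh variant keeps the next read fast at the cost of an extra query on every write; the invalidate variant does less work on write and lets the next read repopulate the cache.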

Memcached and MySQL Go Better Together

There’s a little embrace and extend in the webinar, as MySQL Cluster is presented several times as doing much the same job as memcached, but more reliably. However, the recommended approach for using memcached and MySQL is:

1. Scale database writes by sharding. Partition data across multiple servers so more data can be written in parallel. This avoids a single server becoming the bottleneck.

2. Front MySQL with a memcached farm to scale reads. Applications access memcached first for data, and if the data is not in memcached the application then tries the database. This removes a great deal of the load on the database so it can continue to perform its transactional duties for writes. In this architecture the database is still the system of record for the true value of data.

3. Use MySQL replication for reliability and read query scaling. There’s an effective limit to the number of slaves that can be supported, so just adding slaves won’t work as a scaling strategy for larger sites.

Using this approach you get scalable reads and writes along with high availability.
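The first two points of the list above can be sketched together in a few lines of Python. This is a toy model, not a deployment recipe: each dict in `shards` stands in for one MySQL server, `cache` stands in for the memcached farm, the shard count and the modulo routing rule are assumptions, and replication (the third point) is left out entirely.

```python
NUM_SHARDS = 4  # hypothetical shard count
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one MySQL server
cache = {}  # stands in for the memcached farm


def shard_for(userid):
    """Route a user to one shard so writes spread across servers."""
    return shards[userid % NUM_SHARDS]


def write_user(userid, row):
    shard_for(userid)[userid] = row     # the database stays the system of record
    cache.pop(f"user:{userid}", None)   # invalidate any stale cached copy


def read_user(userid):
    key = f"user:{userid}"
    if key in cache:                    # reads try the cache first
        return cache[key]
    row = shard_for(userid).get(userid) # fall back to the owning shard
    if row is not None:
        cache[key] = row
    return row


write_user(5, {"name": "alice"})
print(read_user(5))  # read from the shard, then cached
print(read_user(5))  # served from the cache
```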

Given that MySQL has a cache, why is memcached needed at all?

1. The MySQL cache is associated with just one instance. This limits the cache to the maximum addressable memory of one server. If your system is larger than the memory of one server then using the MySQL cache won’t work. And if the same object is read from another instance, it’s not cached there.
2. The query cache invalidates on writes. You build up all that cache and it goes away when someone writes to the underlying table. Your cache may not be much of a cache at all, depending on usage patterns.
3. The query cache is row based. Memcached can cache any type of data you want and it isn’t limited to caching database rows. Memcached can cache complex objects that are directly usable without a join.
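The third point is easier to see with an example. The sketch below builds a profile object from a join once, then caches the assembled object so later reads skip the join. It is an illustration under assumptions: a dict stands in for memcached, SQLite stands in for MySQL, and the table layout, key format, and `get_profile` name are invented for this example.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (postid INTEGER PRIMARY KEY, userid INTEGER, title TEXT);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO posts VALUES (10, 1, 'Hello');
INSERT INTO posts VALUES (11, 1, 'Caching');
""")
cache = {}  # dict standing in for memcached (assumption)


def get_profile(userid):
    key = f"profile:{userid}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # ready-to-use object, no join needed
    rows = db.execute(
        "SELECT u.name, p.title FROM users u "
        "LEFT JOIN posts p ON p.userid = u.userid "
        "WHERE u.userid = ? ORDER BY p.postid",
        (userid,),
    ).fetchall()
    if not rows:
        return None
    profile = {
        "userid": userid,
        "name": rows[0][0],
        "posts": [title for _, title in rows if title is not None],
    }
    cache[key] = json.dumps(profile)  # cache the assembled object, not the rows
    return profile


print(get_profile(1))  # built from the join
print(get_profile(1))  # served from the cached object
```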

Thanks
Manoj Chauhan

This entry was posted on Monday, June 28th, 2010 at 1:20 pm and is filed under Apache, CentOS, HTTP accelerators, Linux, Memcache.
