Repcache & PHP = ‘Sessions on steroids’

When I learnt PHP, I always thought sessions were a great thing that the PHP guys had given to us developers. But working with high-traffic websites (mainly when we have to handle multiple front-ends), I always perceived them as a huge limitation.

What do I mean? Simply that sessions are saved by default on the machine that serves PHP. This means we need to make sure each incoming user always goes to the same web server. Until now we solved that problem with the expertise of our Systems team: they ‘stick’ our sessions to one server. A great solution, but with great problems. If a server goes down (and we know that never happens, hehehe), every single user ‘stuck’ to that server goes down with it.

Our great systems guys can’t solve that problem for us, because if a server goes down and the sessions are stored on that server… well, you know.

Our first attempt to solve this problem came from our systems guys. They suggested mounting a shared disk resource on our network, to provide a single place to store sessions. Unluckily, the network resource was not as stable or as fast as we thought.

I searched the net for a solution, and many people talk about storing sessions on SQL servers but, honestly, we didn’t want to put more trash on those hard-to-scale servers.

That issue had been there for a long time, but a few weeks ago, while looking for info about memcached (maybe some day I’ll talk about it), I found what seems to be the perfect solution for our old issue. The (unfortunately undocumented) solution was repcache. The guys at KLab have made this simple but beautiful patch to memcached that gives us replication between memcached servers. Obviously (I didn’t mention it before) we can’t use a single server to store such sensitive data, because if it goes down every single user on our site will lose their session (we always need replication to provide a safe environment).

Repcache is nice and simple: when somebody performs a ‘SET’ on one of the cache servers, the patch automagically puts that value on the other server. If you turn off one of those servers, the other one has all the values, and if you turn that server on again, the patch fetches all the data from the survivor and everything is replicated, safe and nice again.

But the most wonderful thing is that under this patch there’s a magnificent piece of software called memcached. That means probably the fastest distributed storage system you can use to save your data.

Without repcache we can’t use memcached directly because (as they say on their site) memcached isn’t designed to have any data redundancy. That means we can’t use it to store data that we can’t afford to lose, and repcache turns that around.

Now I’ll describe our test environment and the process to make those pretty things happen. We have a set of 3 front-ends (Apache & PHP servers, to simplify the example) and we’re going to use 2 servers to run repcache (in fact, those servers are not exclusively dedicated to repcache, but that’s not important for this example, so we’ll remove those outsiders from the equation).

step 1: Build repcache

We could download memcached and apply the patch manually, but the repcached guys give us a fully patched tar.gz where they do everything for us:

wget http://freefr.dl.sourceforge.net/project/repcached/repcached/2.2-1.2.8/memcached-1.2.8-repcached-2.2.tar.gz

tar xfvz memcached-1.2.8-repcached-2.2.tar.gz

cd memcached-1.2.8-repcached-2.2/

./configure --enable-replication

make

The key is “--enable-replication” (two plain hyphens) on configure: this tells the build that it should add the repcache steroids to the package.

Obviously we should build the package on both servers that we dedicate to repcache (note: you may need to install the libevent development packages to build repcached if you don’t already have them installed; that’s “libevent-dev” on Debian/Ubuntu systems).

step 2: Run Repcache

By default repcache runs on the same port as memcached (11211), but maybe you’ll need that port for a real memcached later, so we’re going to use 11311 (there’s no esoteric reason for choosing that number; that’s a notice for the paranoid). We should also know that repcache is going to use 11212 to perform replication from one server to the other. That’s a notice for netstat lovers!

./memcached -d -m 128 -p 11311 -x <IP ADDRESS OF THE OTHER SERVER>

We should run that command on both servers, the only difference being the IP address of the other server (that’s obvious). Let’s see what each param means:

  • d : Daemonize (run in the background)
  • m : Amount of memory, in megabytes (remember that repcache will allocate that amount of memory)
  • p : The main port of repcache, where clients connect
  • x : IP of the other repcache server (the replication peer that also acts as failover)

That’s all folks! Yes, it’s as easy as it seems. Now we’re going to tweak PHP on each front-end to use those servers.
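Before touching PHP, it can be worth checking by hand that a value written to one server shows up on the other. The following is only a minimal sketch: it builds requests in the plain memcached text protocol, and the two IP addresses are placeholders I made up, so replace them with your own. The `nc` lines are left commented out because netcat options differ between versions (`-q 1` works on the Debian/Ubuntu one; others may need `-w 1` instead).

```shell
#!/bin/sh
# Build a memcached text-protocol "set" request (flags 0, no expiry).
# ${#value} counts characters, so this sketch assumes plain ASCII values.
mc_set_request() {
    key=$1; value=$2
    printf 'set %s 0 0 %s\r\n%s\r\n' "$key" "${#value}" "$value"
}

# Build a "get" request for a single key.
mc_get_request() {
    printf 'get %s\r\n' "$1"
}

# Placeholder addresses of the two repcache servers (assumptions, not real hosts).
# REPCACHE_1=192.0.2.10
# REPCACHE_2=192.0.2.11
# mc_set_request greeting hello | nc -q 1 "$REPCACHE_1" 11311   # write to server 1
# mc_get_request greeting       | nc -q 1 "$REPCACHE_2" 11311   # read it back from server 2
```

If replication is working, the first command should answer STORED and the second should return the value even though it was written to the other server.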

step 3: Tweak PHP

Obviously this step depends on our distro and where we keep php.ini and the configuration files, but we will assume that we have an Ubuntu Server (only because that’s what we have). On each front-end let’s execute:

sudo apt-get install php5-memcache

sudo vi /etc/php5/apache2/php.ini (diff format)
=; Handler used to store/retrieve data.
-session.save_handler = files
+; session.save_handler = files
+session.save_handler = memcache
+session.save_path = "tcp://<IP OF REPCACHE 1>:11311, tcp://<IP OF REPCACHE 2>:11311"

sudo vi /etc/php5/apache2/conf.d/memcache.ini (diff format)
=memcache.maxratio=0
+memcache.allow_failover=1

sudo apache2ctl graceful

And that’s all. It seems like magic, but from this point on PHP sessions will be saved on the repcache servers. One of the most important params that makes it possible is “memcache.allow_failover=1”. Understanding what it means is like understanding everything. Now I’ll try to explain what happens with sessions.

It works, but what happens behind the curtains?

The main thing to understand is how PHP is going to save our sessions, and why that way of saving them makes the process as simple as it is.
First, PHP generates a session (we don’t care how) and chooses a server (from a pool of servers) to save that session. It’s not really a random choice, because PHP (and the memcache magic) relies on the way that server is chosen. Imagine (obviously that’s not the real method for choosing the server, but it’s a clarifying example) that we take the last bit of our session id, and that bit indicates which server to use. It’s simple but wonderful at the same time, because we can guarantee that for the same id the choice is always going to be the same. Maybe you’re thinking that’s not so important, because now we have the data replicated on each repcache server, but remember that we need to recover the same value when we get it again, and what’s the best technique to achieve that? Look in the same place where we saved it (this way we avoid possible problems with repcache replication; repcache is very fast, but it’s an extra safety feature that we can’t refuse).
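To make the “same id, same server” idea concrete, here is a small sketch in shell. Like the last-bit example, it is not the method the memcache extension really uses: it simply takes a checksum of the session id (with `cksum`) modulo the number of servers. The server addresses and the function name `pick_server` are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical pool of repcache servers; the addresses are placeholders.
SERVERS="192.0.2.10:11311 192.0.2.11:11311"

# Print the server a session id maps to: checksum of the id modulo pool size.
pick_server() {
    id=$1
    set -- $SERVERS                  # the pool becomes $1, $2, ...
    sum=$(printf '%s' "$id" | cksum | cut -d ' ' -f 1)
    shift $(( sum % $# ))            # skip ahead to the chosen server
    printf '%s\n' "$1"
}

# The same id always lands on the same server, whichever front-end asks.
pick_server "4f2a9c0d1e"
pick_server "4f2a9c0d1e"
```

Both calls print the same address, and that is the whole point: any front-end can work out where a session lives from nothing but its id.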

OK, OK, but what happens if one of those repcache servers goes down? Nothing, thanks to the allow_failover param. Setting that param to 1 tells PHP that if it’s not able to reach the server that matches that specific session id, it should try the other one and remember that new position for the next attempts (it will retry the original server later, but not on the very next access to that data). That closes the circle.
Now it’s time to stop and start the repcache servers to test that sessions are shared between front-ends and survive the failure of 1 of the 2 repcache servers.
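The failover idea can be sketched the same way. This is only my simplified picture of the behaviour, not the extension’s real code: start at the server the id maps to and, if it doesn’t answer, walk on to the next one. The health check `is_up` is a stand-in that treats any server listed in the `DOWN` variable as unreachable; both names and the addresses are invented for the example.

```shell
#!/bin/sh
SERVERS="192.0.2.10:11311 192.0.2.11:11311"   # hypothetical pool
DOWN=""                                        # servers we pretend are unreachable

# Stand-in health check: a server is "down" if it is listed in $DOWN.
is_up() {
    case " $DOWN " in *" $1 "*) return 1 ;; *) return 0 ;; esac
}

# Start at the server the id maps to, then try the rest of the pool in order.
pick_with_failover() {
    id=$1
    set -- $SERVERS
    n=$#
    sum=$(printf '%s' "$id" | cksum | cut -d ' ' -f 1)
    start=$(( sum % n ))
    k=0
    # Walking the pool twice lets us wrap around without any array indexing.
    for candidate in $SERVERS $SERVERS; do
        if [ "$k" -ge "$start" ] && [ "$k" -lt $(( start + n )) ] && is_up "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
        k=$(( k + 1 ))
    done
    return 1                                   # every server is down
}

preferred=$(pick_with_failover "4f2a9c0d1e")
DOWN=$preferred                                # simulate that server failing
pick_with_failover "4f2a9c0d1e"                # prints the other server
```

Because repcache keeps both servers in sync, the session found on the fallback server is the same one that was written to the preferred server.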

That makes our system robust, and opens up a lot of new possibilities that maybe we never imagined before. What about using cloud computing for our front-ends? Now that’s possible, because we can start and stop front-ends without thinking about user sessions: users will be redirected to another front-end (in periods of low demand) painlessly. We don’t need sticky sessions for our front-ends anymore. And our users (and we) will be happy!


3 Comments

  1. brokkol says:

    nice document and smart/clean solution.
    i once considered using couchdb/mongodb for this same shared-sessions-storage situation but that won’t be as transparent as this.

  2. Albert Horta says:

    Thanks, I really need this kind of support on my first post XD!
    Yeah, I like your consideration. We took a look at mongodb too, but one of the things that made us decide was that we needed to modify zero lines of code (of our app) to make everything work.

  3. well written blog. Im glad that I could find more info on this. thanks
