After months of hard work, the new phillymetal.com is online. Go check it out. Lauren and I are extremely proud of it and feedback so far has been great. We haven’t made an official announcement about it through the email list or even on our personal Facebook pages because we want to make sure all the bugs are ironed out and the server is stable, but it’s not like it’s a secret so there’s no harm in writing about the experience.

As I detailed in my last post, the site used to be a giant mess of PHP code drawing from MySQL. It was, to put it professionally, a total clusterfuck and every time I go through the code or the database, I think it’s a marvel that it was so solid and actually worked. Of course, a big part of that was the simplicity of the backend and a lack of anything even remotely complicated as far as storing data was concerned. On the Shows side, almost nothing was truly relational other than an individual show and the bands booked to play it; however, each show’s bands were independent from one another, so a band booked on 50 shows had their information in the database 50 times. This made for some fast queries — the site was blazing — but that’s about all I can say about it.

We decided to move to Rails and Neo4j.rb because that’s what we’re using for a bigger project. In a sense, Phillymetal.com is our exhibition round. I was especially curious about the hosting. Would I be able to get away with modest hardware or would I need a beast of a machine (or machines?) just to function? Would backup and restore be tricky? Would it be stable? Would updates be a hassle? All of the material I could find on Neo4j.rb and Rails dealt with usage focused on development. I still haven’t found anything indicating that anyone else is even using this combination in a production website!

I learned all these things and more. In fact, I learned so much that I want to share this experience in case anyone else is interested in doing something similar. To be clear, this wasn’t my first Rails project, just my first app that uses Neo4j.

First, some basics. We are running Rails 4.0.3, JRuby 1.7.10, and Neo4j.rb 2.3. It’s very important that you use the latest commit to the Rails 4 branch from the Neo4j.rb GitHub repository in your Gemfile, not the latest official release. Also make sure to explicitly include the latest commit to the Neo4j wrapper from its GitHub page for a critical bugfix; without it, certain Rails form submissions break. (I fixed it, you’re welcome.)

Our database is very small, only about 50MB and a couple hundred thousand nodes. As a result, we were able to get away with an absolutely tiny server from DigitalOcean with 2GB of RAM. Performance tuning was a huge part of that, though, and we’ll get to that in a minute, but your mileage may vary when it comes to what kind of hardware you’ll be able to get away with.

Now, without further ado, my lessons learned from migrating from PHP/MySQL to Rails/Neo4j.

1: Test everything

All Rails articles you read tell you how important testing is. Of course, coming from PHP, I had a lot of bad habits that came from having no standards, no guidelines… no rails, basically. I was also able to just pop into my server and fix code on the fly whenever something went wrong. With Rails, especially using JRuby, pushing updates is a bit of a pain because of the compilation and Java app deployment processes, so testing is not just good for the site’s stability but also for your own time. I don’t have the time to be unsure, so I need tests.

2: Just because some queries are easy, it doesn’t mean those queries are always cheap

If you’re on a small (or nonexistent, in my case) budget, you need to be very concerned with squeezing performance out of cheap hardware. Neo4j makes it very easy to go wild with relational queries, since its whole mantra focuses on blazing fast retrieval and logical organization of data and all that stuff, but that doesn’t mean you should always try to use those abilities.

Case in point: the “last post” and “replies” columns on this dumb page. When I first put this together, I was calculating those fields dynamically for each post. Rails and Neo4j make it easy to do that: last post was a matter of topic.posts.to_a.last.poster.username and replies was just topic.posts.count - 1. Did it work? Of course. Was it fast? Yes… sort of… in small doses. Counting posts wasn’t much of a problem, Neo4j and Rails do that easily, but figuring out the last poster’s username meant loading every post in the topic into an array just to read the final one, and that got expensive on budget hardware. That is really the key here: budget hardware. In my tests, the more power I gave it, the better it worked, but I didn’t want to spend any more money on this than I had to.

More importantly, why did I think it was necessary to calculate those things on the fly? Just because I could? Those things changed so infrequently that, in the end, I decided to store them as properties on the Topic model itself and just update them when they changed. It may be a bit less high tech but it’s better for performance, and at the end of the day, I decided to prioritize user experience over exploiting every possible capability of my technology.
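To make that concrete, here is a minimal plain-Ruby sketch of the idea. The class, attribute, and method names are made up for illustration and are not the site's actual Neo4j.rb models; the point is only that the two values get written once when a post is added rather than recomputed on every page view.

```ruby
# Plain-Ruby stand-ins for the Topic and Post models. All names here are
# hypothetical; the real models are Neo4j.rb classes backed by the graph.
Post = Struct.new(:username)

class Topic
  attr_reader :posts, :reply_count, :last_poster

  def initialize
    @posts = []
    @reply_count = 0
    @last_poster = nil
  end

  # Update the stored values at write time, instead of walking
  # topic.posts every time the topic list is rendered.
  def add_post(post)
    @posts << post
    @reply_count = @posts.size - 1 # the first post is not a reply
    @last_poster = post.username
    self
  end
end

topic = Topic.new
topic.add_post(Post.new("alice")).add_post(Post.new("bob"))
puts "#{topic.reply_count} replies, last post by #{topic.last_poster}"
```

The tradeoff is the usual one with denormalized data: every code path that adds or removes a post has to keep the stored values in sync.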

3: Cache to save cash

When I started building this, I was using Rails 3.2. Rails 4 seemed kind of interesting but nothing really made me feel like I had to upgrade immediately. It also didn’t help that I wasn’t the biggest fan of moving mass-assignment security out of models and into controllers. The protected_attributes gem didn’t work with Neo4j.rb’s Rails 4 branch so I’d have been forced to use strong_params, which I wasn’t ready to do.

What changed all that? Rails 4’s cache improvements. Even though I could have included the gem that provided that functionality to Rails 3, I figured it was a good enough reason to make the jump to Rails 4. After a failed attempt at making protected_attributes work with Neo4j.rb’s Rails 4 branch, I even upgraded that side of my code, too. (I still don’t love it.)

One of the problems that took me far too long to actually diagnose was that Neo4j.rb 2.3 models didn’t have the cache_key method required for cache digests to work properly. The reason I said to use the latest commit to the Rails 4 branch at the beginning of this post is that it includes my fix for this, which is based off the Mongoid implementation. With that in place, cache digests work great with Rails 4 and Neo4j.rb!
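For anyone wondering what a cache_key method looks like, here is a rough plain-Ruby sketch in the same spirit as the Mongoid one. It is not the code that went into Neo4j.rb; the class name, the model_key helper, and the timestamp format are assumptions for illustration only.

```ruby
# Hypothetical stand-in for a persisted model; not the actual Neo4j.rb code.
class CacheableNode
  attr_reader :id, :updated_at

  def initialize(id: nil, updated_at: nil)
    @id = id
    @updated_at = updated_at
  end

  def new_record?
    id.nil?
  end

  # Assumed helper: the pluralized model name used as the key prefix.
  def model_key
    "cacheable_nodes"
  end

  # The key changes whenever updated_at changes, so a cached fragment
  # built from an older version of the record is simply never read again.
  def cache_key
    return "#{model_key}/new" if new_record?
    return "#{model_key}/#{id}" if updated_at.nil?

    "#{model_key}/#{id}-#{updated_at.getutc.strftime('%Y%m%d%H%M%S')}"
  end
end

node = CacheableNode.new(id: 7, updated_at: Time.utc(2014, 3, 1, 12, 0, 0))
puts node.cache_key
```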

I cache all of the show information at https://phillymetal.com/shows. It is critical to the performance of the page. In development, I found that it wasn’t the queries themselves that were terrible — Neo4j really does move quickly — it was drawing the partial. I store a lot of show information in the relationships themselves; in particular, the band descriptions and links go in the relationship if the show promoter wants information different from the bands’ defaults in the database. Because of that, the server has to look at each band, compare its relationship description to default description, and present whichever is appropriate. This is fast when you’re looking at a single show, less fast when you’re looking at dozens of shows, and even worse when it’s on a public page that has multiple concurrent sessions. More importantly, it is information that is the same literally every time it is pulled up, so it belongs in a cache.
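Here is a simplified sketch of that comparison and why it caches so well. Band, Booking, and the in-memory hash are stand-ins I made up for this example; the real site keeps the override on a Neo4j relationship and uses the Rails fragment cache, not a Ruby hash.

```ruby
# Hypothetical shapes: a Band with a default description, and a Booking
# standing in for the show-to-band relationship, which may carry an override.
Band = Struct.new(:name, :description)
Booking = Struct.new(:band, :description)

# Use the promoter's text when it exists, otherwise the band's default.
def description_for(booking)
  override = booking.description
  if override.nil? || override.strip.empty?
    booking.band.description
  else
    override
  end
end

# Stand-in for the fragment cache: rendered output keyed by a cache key.
RENDER_CACHE = {}

def render_show(cache_key, bookings)
  RENDER_CACHE.fetch(cache_key) do
    lines = bookings.map { |booking| "#{booking.band.name}: #{description_for(booking)}" }
    RENDER_CACHE[cache_key] = lines.join("\n")
  end
end

band_a = Band.new("Band A", "Default bio A")
band_b = Band.new("Band B", "Default bio B")
bookings = [Booking.new(band_a, nil), Booking.new(band_b, "One-off reunion set")]

first = render_show("shows/42-20140301120000", bookings)
puts first
```

The second request for the same key skips the per-band comparison entirely, which is the whole win on a public page with concurrent visitors.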

If you are using Torquebox, as I recommend through the rest of this post, you can enable the Torquebox cache in production.rb by simply setting **config.cache_store = :torquebox_store** and calling it a day.

4: Build solid admin tools

I was spoiled by PHPMyAdmin. Because Phillymetal.com is a small site that doesn’t do very much and isn’t very needy, I got used to performing certain tasks directly from the database. For instance, on the rare occasion that a discussion topic needed to be deleted, I’d do it from there. User needed to be banned? Database. IP lookup for a problem user? Database. Owner of a show? Database. At the worst, there was actually no password reset function built into the site. I would change a dummy user’s password, copy the password hash to the user requesting the reset, inform the user, and then change my dummy password back. Wow.

Neo4j makes that impossible. Not only is there a bug that prevents the Neo4j admin from working with my combination of Neo4j.rb, JRuby, and Rails, I wouldn’t be able to make changes as easily as I had in the past even if I wanted to because of the way data is organized. This is fine by me, though, since the site really did need admin tools (and a freaking password reset… holy shit, man! It’s 2014, come on!) and Rails made it easy enough to build them. Still, if you’re a solo admin running a small site, set aside some time to build admin tools for your management tasks that used to be handled directly in the database. You will not have the easy access to the database that you are used to.

5: Bone up on Linux, you’re going to be doing everything yourself

There is no Heroku, there is only you. Neo4j.rb 2.3 uses Neo4j embedded and is therefore incompatible with the most popular PaaS out there. Torquebox and JBoss are supported by OpenShift but if you’re going to take the time to learn that and you have budget concerns, you might as well save some money and get smarter by learning to do it yourself.

There is an excellent, quick walkthrough on AmberBit that takes you through installing Torquebox and deploying with Capistrano. Some changes you should make:

Use the latest version of Torquebox. As of the writing of this post, it was 3.1.0. I had some issues with that version on my first deployment, email me if your site doesn’t load — you may need to make a custom .knob file.

The upstart task needs to be modified for Ubuntu 12.04. Open /etc/init/torquebox and do this:

#start on started network-services

#stop on stopped network-services

start on runlevel [2345]

stop on runlevel [016]

Also do this for Neo4j:

#limit nofile 4096 4096

limit nofile 40000 40000

Make a folder called db in /home/torquebox/shared and in deploy.rb, modify your :finalize_update task with this:

run "rm -r #{release_path}/db"

run "ln -nfs #{shared_path}/db #{release_path}/"

I also modified it so the log file would use the shared path instead of the release path.

5b: Setup your backup

This is part of knowing Linux but it’s so important that I want to highlight it separately.

You need to configure a backup script for Neo4j since it can’t be copied while the server is running. This is actually extremely easy as long as you’re running the Enterprise version, which you can legally do as long as you have a license. If you’re a solo or small team of developers that meet the criteria, you can get a license for free, just register.

Include the neo4j-advanced and neo4j-enterprise gems in Gemfile.

Add the following lines to application.rb: 

config.neo4j['online_backup_enabled']=true

config.neo4j['online_backup_server']='127.0.0.1:6362'

The first one is self-explanatory. The second one is necessary because if you don’t explicitly tell it what IP to listen on, it will bind to 0.0.0.0 and allow literally anyone to run backups of your database over the internet. This feels like a terrible default, I hope it gets cleared up in the future!

When you register, Neo will send you a link for the latest version, but we don’t want that; we want the version that matches the Neo4j embedded in our app. Change the version number in the filename in their link to 1.9.5, the same version Neo4j.rb is running, and you can download that version. Save it to your server and inside of /bin/ you’ll find neo4j-backup.sh. All you need to do now is write a script that will perform your backup. I find that there’s some sort of bug that prevents incremental backups from working correctly so for now, you’ll need to clear the directory every time the script runs. Here’s my very barebones script:

rm -r pm-backup

/root/neo4j-enterprise-1.9.5/bin/neo4j-backup -from single://127.0.0.1 -to /root/pm-backup

rm pm-backup.zip

zip -r pm-backup.zip /root/pm-backup

Cron runs it nightly, DigitalOcean takes a snapshot of my server nightly, I sleep soundly.

6: You must take your site down to update code

This is easily my least favorite part of this entire setup. Torquebox supports no-downtime updates of Java apps but since all releases share the same database and only one app can have the embedded DB open at a time, you have no choice but to stop your entire site every time you want to update. The way around this is with a Neo4j cluster, but if our goal here is to minimize hosting cost, this isn’t an option. Plan your maintenance windows carefully and get your management tools in place to minimize reboots.

If you do need to reboot, you can do it quickly by managing it carefully. I do mine in stages.

Every update starts by deploying with Capistrano. I modified my deploy.rb so it doesn’t try to restart the server, meaning Capistrano is basically staging the update files. This is useful if you want to automate this process, maybe by having a cron job that restarts your application server nightly. You can deploy at any time and know your updates will be processed later. (But that’s not what I do, so let’s keep going.)

Next, I have an Nginx site defined in /etc/nginx/sites-available that just loads a basic Offline message. When I’m ready to restart Torquebox, I run a script that unlinks the live site and links the maintenance site conf files in Nginx, reloads the Nginx conf, and stops Torquebox. After that, I manually run service torquebox start and then tail /var/log/torquebox/torquebox.log -f until I see that it’s fully started. I run the pm-online.sh script to unlink my maintenance site, link the production site, reload Nginx, and I’m back up! The whole process, excluding the deployment, takes less than a minute. I had two occasions in the past week where Torquebox gave an error at load, so following the log file is something I always recommend just in case.

Of course, I recognize that this is far from ideal. In a busier site, a Neo4j cluster would be crucial to prevent downtime. Thankfully, for small sites like this, 60 seconds of downtime in the middle of the night or for emergency maintenance (I discovered the public Neo4j backup port issue while writing this article on a Sunday afternoon and had to patch it immediately!) is acceptable.

7: Familiarize yourself with Java application deployment and management concepts

You may think of yourself as a Rails developer or a Linux administrator but as soon as you start using Torquebox, you are also a Java app server admin. Aside from the server restarting process, dealing with this was my least favorite part of this entire project, mostly because there are not many resources out there for people who are getting started with it. Spend some time reading through all of the Torquebox documentation very carefully. You should also buy Deploying with JRuby by Joe Kutner, available here. Even though it’s a little out of date at this point, Joe’s book makes a clear case for why Torquebox is the way to go. Be careful when reading through his sections on Torquebox jobs, since newer versions handle them quite a bit differently than his instructions describe. The basics remain the same and as an introduction to the benefits of Torquebox, it’s a great start.

One crucial part of tuning your environment is giving Torquebox enough RAM to work with. Out of the box, its RAM defaults are very low, so tune them for your system by opening /opt/torquebox/jboss/bin/standalone.conf and finding this line:

# Specify options to pass to the Java VM.

A few lines below that, you’ll see JAVA_OPTS="-Xms[SOMETHING]m -Xmx[SOMETHING]m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true". I don’t have the default one sitting around so I can’t tell you exactly what it says but you’re going to want to set those numbers to values appropriate for your environment. There is A LOT of information out there on setting these values for Java apps but in my testing, I found that setting Xms and Xmx to the same value gives me the best performance. In my environment, that line looks like this:

JAVA_OPTS="-Xms1792m -Xmx1792m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true"

You want to make sure that Xmx + MaxPermSize does not exceed the maximum amount of RAM in your system. I have 2GB of RAM, so I’m probably cutting it a little close here. Neo4j will consume more RAM the longer it has been running and Xmx will define the upper limit for Torquebox and the processes it spawns. The larger your database is, the more RAM it will require. Despite my DB only consuming 50MB on disk, I find that Torquebox consistently uses at least 1.2GB of RAM. Testing also revealed that giving it much more than that is a waste, it just doesn’t grow large enough.
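If you want to sanity-check your own numbers, the arithmetic is easy to script. This is just a quick sketch: the JAVA_OPTS string is the one from my standalone.conf above, and the 2048 figure assumes a 2GB machine, so swap in your own values.

```ruby
# Quick check that heap + PermGen fits in physical RAM.
# SYSTEM_RAM_MB is an assumption for a 2GB server; change it for yours.
JAVA_OPTS = "-Xms1792m -Xmx1792m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true"
SYSTEM_RAM_MB = 2048

# Pull a megabyte value out of the options string; 0 if the flag is absent.
def megabytes(opts, pattern)
  match = opts.match(pattern)
  match ? match[1].to_i : 0
end

heap_mb = megabytes(JAVA_OPTS, /-Xmx(\d+)m/)
permgen_mb = megabytes(JAVA_OPTS, /MaxPermSize=(\d+)m/)
total_mb = heap_mb + permgen_mb
headroom_mb = SYSTEM_RAM_MB - total_mb

puts "Xmx #{heap_mb}MB + MaxPermSize #{permgen_mb}MB = #{total_mb}MB"
puts "Headroom on a #{SYSTEM_RAM_MB}MB machine: #{headroom_mb}MB"
```

With my settings that works out to 1792 + 256 = 2048MB, which is exactly the nominal 2GB and leaves nothing for the OS if the JVM ever actually hit both ceilings. That is what I mean by cutting it close.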

If you want to know more about how Java uses RAM, read up on garbage collection… but don’t go crazy with it. I spent days trying to make garbage collection faster and less frequent when the real solution was caching and smarter use of my database, as described above; still, it’s good to know. My favorite resources were here, here, and here.

8: All jobs must be run from within the app

What exactly do I mean? I mean that any cron or Torquebox job must interact with your webserver to perform work, not access the database or Rails directly, because you can only have one running instance of Neo4j embedded without a cluster.

In my case, I have a few different jobs, but one of them syncs Facebook stats for listed events every hour and saves the retrieved information to the Show nodes. If I was using PostgreSQL, I would just have a Torquebox job that executed Show.fb_sync. That is not an option here, though, so I POST a particular form to a particular path that, when received by the controller, executes Show.fb_sync from within the app. I do this sort of thing for any job that requires interaction with the database. Not terrible or truly shocking but it’s the sort of workaround you have to be prepared to implement.
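A rough sketch of what the calling side of that can look like, using only Ruby's standard library. The path, the port, and the shared-secret token field are all placeholders I invented for this example, not the site's real route or form; whatever you use, make sure the endpoint can't be triggered by random visitors.

```ruby
require "net/http"
require "uri"

# Build the POST that asks the running app to do the work itself.
# "/jobs/fb_sync" and the "token" field are hypothetical placeholders.
def build_sync_request(base_url, token)
  uri = URI.join(base_url, "/jobs/fb_sync")
  request = Net::HTTP::Post.new(uri.path)
  request.set_form_data("token" => token)
  [uri, request]
end

# Send it. Call this from cron or a scheduled job; the controller on the
# other end is what actually runs Show.fb_sync inside the app.
def trigger_sync(base_url, token)
  uri, request = build_sync_request(base_url, token)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
end

# Building the request does not touch the network, so it is safe to inspect.
uri, request = build_sync_request("http://127.0.0.1:8080", "not-a-real-secret")
puts "#{request.method} #{uri.host}:#{uri.port}#{request.path}"
```

Calling trigger_sync with the same arguments is the part that actually hits the server, so that line belongs in the scheduled script rather than in the app.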

9: Neo4j.rb and Rails are a perfectly reasonable pairing to consider instead of PostgreSQL and Rails for small projects

This really was the most important thing I learned and, really, what I hoped to discover at the end of this journey. My fear was that the lack of small projects discussing their experiences with Neo4j.rb was due to the fact that the hosting requirements in particular were too complicated or costly. In reality, anyone looking for the benefits of NoSQL without losing relational flexibility should consider the Neo4j.rb/Rails combination without fear.

So there you have it! You may have noticed that not much of this was really specific to Neo4j.rb; in fact, most of these lessons were really best-practices web development: don’t overuse queries, cache wherever you can, build proper administrative tools, test everything. Neo4j.rb + Rails + JRuby don’t require you to reinvent the wheel, they just force you to kinda… use a different kind of wheel, I guess. My experience made me feel a lot more confident in the technology going forward. Neo4j.rb 3.0 will allow developers to use the Neo4j REST API, removing the requirement of JRuby and opening the door for deployment to Heroku. I’m looking forward to it reaching the point where I can use it for my projects, but I’m curious to see if there is a noticeable performance tradeoff as a result. Time will tell. Until then, I’ve found that doing things this way isn’t terrible and I’m looking forward to my next project.