Redis

April 1, 2009

http://code.google.com/p/redis/

I just came across this really cool persistent alternative to memcached.  I haven’t spent too much time looking at the internals, but it accepts write operations and writes them to disk asynchronously. This introduces the possibility of some data loss if the machine goes down while writes are still pending, but overall, the performance boost and persistence more than make up for it in most use cases.

I’ve played around with it on my MacBook Pro (very simple to build and get running), and it seems pretty cool.  It supports more features than memcached, like list and set operations (which are atomic) and push/pop operations.  This means it could be a good candidate for distributed queueing and messaging systems.  Not to mention, it also supports master/slave replication!

Also, the Ruby client currently supports consistent hashing (which I haven’t used yet), and that adds a lot to the scalability.  Given the speed (reported at 110,000 SETs/second, 81,000 GETs/second), I can see Redis coming into use in a lot of situations where you don’t need all the overhead and guarantees that a ‘real’ database gives you.
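For anyone who hasn’t run into consistent hashing before, here is a rough Python sketch of the idea. To be clear, this is not how the Ruby client does it (I haven’t read that code); the HashRing class, the replicas parameter and the server names are all made up for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring. Hypothetical sketch, not the Ruby client's code."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per server, smooths the key spread
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> server name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # Any stable hash works here; md5 is used only because it is in the stdlib.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def node_for(self, key):
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if we fall off the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.remove("redis-c:6379")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the exercise is the last few lines: when a server drops out, only the keys that lived on that server get reassigned, and everything else stays put. With a plain hash-mod-N scheme, most keys would move.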

The only downside that I can see at the moment is that it has to read the entire dataset into memory.  That limits the size of your datasets, so estimate your data size before going this route.  Also, another glaring omission is a Java client.  There are some in the works (and I’ve thrown together a simple one), but nothing that is polished and ready to use.


Premature Optimization

March 12, 2009

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

-Knuth

This post can be summed up in the following sentence: ‘Make sure you can justify each and every optimization of your code before you make the change’.  Stated another way: ‘If you haven’t profiled your code, then you don’t know what to optimize’.

Now for the justification…

In my career I’ve done everything from QA, to Development, to Developer Support at Borland.  In that time I’ve seen a lot of funky code.  By funky, I don’t mean strange indentation or variable naming, but code that jumps through convoluted hoops in the name of ‘efficiency’.  I may be overly sensitive to this, as I saw a lot of it when I was working in support (people would contact us for help after they couldn’t fix the bugs in their own code, or sometimes ours).

I will say that this type of problem seems to be much more prevalent with C/C++ programmers than Java programmers.  I think this stems from the fact that most Java programmers don’t really think about memory management and performance when coding, but in their defense, they don’t normally have to.  But I digress…

Most of the hackery that I’ve seen in regards to performance isn’t really effective.  Sure, optimizing that long running startup code brings up the program faster, but since that only happens once every N number of hours, it’s not making a big difference overall.  Programmers will see some area in their code where they can use a new and sexy data structure or technique to speed things up, and will do so, justifying it as optimizing.

In almost every case that I’ve seen, programs spend the vast majority of their time in a very small portion of their code.  Optimizing that portion of code, no matter how un-sexy, is where the big gains will come from.  Also, that portion of the code base should have the best set of unit tests, so you have that much more assurance that you haven’t inadvertently introduced a bug.
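Finding that small portion of the code is what a profiler is for. Here is a minimal sketch using Python’s built-in cProfile module; the functions being profiled (quick_startup, slow_inner_loop, main) are toy stand-ins I made up, not code from any real project.

```python
import cProfile
import io
import pstats


def quick_startup():
    # Stand-in for one-time setup work that barely registers in the profile.
    return sum(range(100))


def slow_inner_loop(n):
    # Stand-in for the hot spot where the program actually spends its time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    quick_startup()
    return slow_inner_loop(200_000)


def profile(func):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


result, report = profile(main)
print(report)
```

The report lists the functions by cumulative time, so the hot spot shows up near the top and the startup code shows up as noise. Every mainstream language has an equivalent tool; the habit matters more than the tool.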

Also, a lot of times the bottleneck isn’t in your code directly, but in a library or downstream dependency.  If it takes 2 seconds for your query to return results, then you need to make sure your tables have the correct indexes, add caching, etc.  *Do Not* fork the Hibernate code you’re using to try and eke out a little more performance.

So please, justify with an estimate of the performance gain of your proposed optimization before you start modifying the code. If you have no idea what the gain will be, then you probably don’t understand how/why the code is slow well enough to start making changes effectively.
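One common back-of-the-envelope tool for this kind of estimate is Amdahl’s law: the overall speedup depends on how much of the total runtime the code you are optimizing accounts for. A small sketch, with a function name and example percentages that are made up for illustration:

```python
def overall_speedup(fraction_of_runtime, local_speedup):
    """Amdahl's law: whole-program speedup from speeding up one part.

    fraction_of_runtime: share of total runtime spent in the part (0.0 to 1.0)
    local_speedup: how many times faster that part becomes (> 0)
    """
    if not 0.0 <= fraction_of_runtime <= 1.0:
        raise ValueError("fraction_of_runtime must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_of_runtime) + fraction_of_runtime / local_speedup)


# Made-up numbers: startup code is 1% of the runtime and gets 10x faster,
# versus a hot loop that is 80% of the runtime and gets only 2x faster.
startup = overall_speedup(0.01, 10.0)
hot_loop = overall_speedup(0.80, 2.0)

print(f"startup tweak:  {startup:.3f}x overall")
print(f"hot loop tweak: {hot_loop:.3f}x overall")
```

With these numbers the heroic 10x startup fix buys you under 1% overall, while the boring 2x fix in the hot loop makes the whole program about 1.67 times faster. The fraction-of-runtime input has to come from a profiler, not a guess.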

Mat


Java vs C++ Services

February 27, 2009

There are any number of technology stacks that a company can base its service oriented architecture (SOA) on.  The main ones are Java and C++, with Java having the definite advantage in terms of market share.  In fact, Java left C++ in the dust for many years (most notably during the .com boom of the ’90s), and C++ has only now started to catch back up with options like RogueWave’s Hydra offering and some interesting newcomers like http://www.pocomatic.com/ .  Having said that, there is still a host of CORBA frameworks that are very much alive and kicking.

While the common refrain of ‘no one ever got fired for buying IBM’ may be misappropriated to say ‘no one ever got fired for designing in Java’, I am a firm believer in using the right tool for the job.  Unfortunately, in my experience that is more and more often Java instead of C++ for a couple of reasons. I’ll talk about cost here.

Cost: Java is just cheaper to develop systems in.  Not only in man-hours, but very often in hardware as well (more on this later).  The simpler design of the Java language makes writing tools much, much easier (I’ve worked on both Java and C++ IDEs and toolchains, and I can tell you that C++ is horrible to write tools for).  While I’m sure many will scoff at the cost difference in man-hours with a refrain of ‘just hire better programmers’, the cost difference is real regardless of how good your software engineers are.

A simple example of this is when I had to change a type name in a piece of C++ code.  Refactoring meant doing a ‘find . | xargs grep myTypeName’, then editing each of those files with a search and replace in emacs, then rebuilding the code (which took quite a while because the type was used in header files as well as sources), then realizing that I’d forgotten one reference in a header, then recompiling, etc.  That whole process took about 30 minutes to change the code, compile, and test that things still worked.

When doing essentially the same thing in Java, I right-clicked on the type name, chose Refactor from the menu, and changed the type name.  All references to the type were correctly updated.  This cost me all of about 30 seconds of my time.

The time difference outlined above has a very real effect on an organization’s bottom line.

The other major cost difference is related to hardware.  Multi-threading in Java is pretty easy.  Why am I talking about multi-threading when I started the paragraph talking about hardware cost?  Let me digress a bit, then I’ll come back and explain the relevance.

The language support for locking in Java is really handy.  C++, on the other hand, doesn’t have any language support for threading, so you are stuck writing your own locking mechanisms (which is quite error prone), or using some of the prebuilt locking components available to you.  The problem with these is that native code running on a multiprocessor machine borders on non-deterministic.  That’s why you see idioms like double-checked locking; however, even that isn’t quite thread safe (see this for more gory details on why).

So again, how does this affect cost?  Simple: at Amazon, it was decided that C++ services would be single threaded due to problems like this. This means that once your C++ service is running at capacity, you have to start up another instance of it.  That means you have to have two (or more) completely separate processes running to handle your requests. That adds a lot of overhead, as you are loading system libraries unnecessarily and using more RAM (you can also run into problems with multiple processes accessing shared resources on the disk and creating contention).

Conversely, for our Java services, you just add more RAM to your JVM and spawn some more threads to handle the requests (note, I am talking about the service itself running out of steam, not any dependencies like databases or downstream services).

The yearly cost of hardware for our Java services is about half that of our C++ services.  While that isn’t too much for a single service, when you look at deploying hundreds of services across an organization, that cost savings really does add up (it could easily pay for another engineer or a raise for you, the frugal programmer).

