Maven tips and tricks

November 5, 2009

Here are a couple of Maven commands that are handy when it comes to debugging build problems.

If you want to see what deps Maven is pulling in (or hand tweak them), you can use:

mvn dependency:tree -Dverbose

I usually pipe that to less. Be aware, however, that Maven will print out all deps regardless of whether they are going to get used or not. You will need to look at the end of the output, which shows the resolved versions that will be used.

If the issue is one of a class not getting loaded correctly, then you can try this:

mvn dependency:copy-dependencies  -DoutputDirectory=

Then you can add the copied jars to your classpath manually and run the application, tweaking as needed to get your intended class to load.

The following will load a custom settings file for your Maven build (the -s flag points Maven at an alternate user settings file):

mvn -s ./profile.xml compile


Rails 3 features

November 4, 2009

Yehuda Katz over at Engine Yard has a great post on some of the upcoming changes in Rails 3. I’m really excited to see how it ends up. I’m especially excited to see what performance gains can be achieved with it running on a Ruby 1.9 stack. Also of interest is the class-level responder addition. That makes the code so much more compact and easy to read.


Setting cookies in Net::HTTP Requests

October 28, 2009

I had a hard time figuring out how to set cookies on a POST request with Ruby. The docs are strangely silent on the matter (in fact, the docs for Net::HTTP::Post are actually empty). The examples in the Net::HTTP docs don’t use this form of post. Google was also failing me in my searches, so I’m posting this example here to hopefully help the next poor soul that needs to do this.

Here is the example:

  require 'net/http'

  http = Net::HTTP.new(my_hostname, port_number)
  path = '/myform/action'

  # POST body; double quotes are needed for #{} interpolation to happen
  data = "field1=#{value1}&field2=#{value2}"

  headers = {
    'Cookie' => 'cookie1=cookie_value1; cookie2=cookie_value2',
    'Referer' => 'referring_host.mydomain.com',
    'Host' => 'referring_host.mydomain.com'
  }

  # On Ruby 1.9 and later, post returns only the response; the body is resp.body
  resp = http.post(path, data, headers)
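
For comparison, here is a minimal sketch of the same idea using a request object instead of http.post. It only builds the request and does not send anything. The helper name build_cookie_post and the sample field values are my own placeholders, not part of Net::HTTP.

```ruby
require 'net/http'

# Builds (but does not send) a form POST carrying the given cookies.
# build_cookie_post is a hypothetical helper name, not a Net::HTTP method.
def build_cookie_post(path, fields, cookies)
  request = Net::HTTP::Post.new(path)
  # set_form_data URL-encodes the fields and sets the form Content-Type.
  request.set_form_data(fields)
  # A Cookie header is a list of name=value pairs separated by "; ".
  request['Cookie'] = cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
  request
end

request = build_cookie_post(
  '/myform/action',
  { 'field1' => 'a b', 'field2' => 'c&d' },
  { 'cookie1' => 'cookie_value1', 'cookie2' => 'cookie_value2' }
)
puts request.body
puts request['Cookie']
```

To actually send it, something like Net::HTTP.start(my_hostname, port_number) { |http| http.request(request) } should do, with the same placeholder host and port as above.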

The rule of ‘And’

July 23, 2009

I don’t know if someone else has already come up with this, and I’m too lazy to actually use google to look for it, but I’ve come up with a new rule for writing code.

Here’s my thesis. When describing the purpose of a class, you shouldn’t be able to use the word ‘and’.

Here’s the explanation. Any class that you write should have a very small and clearly defined purpose. For example, I was writing a class to wrap the incoming parameters of a REST endpoint. The problem was, I was trying to do too much with that one class. It was holding the pagination information, the sorting information, and the filtering information. That’s way too much for one poor class to do. It started out simple, but as I added the rest of the features from the spec, it got way too bloated. I knew it, but tried to force it to work anyway.

In code review, my coworker pointed out that it was bloated and that I should refactor it. I knew it already but didn’t want to admit it. With his not so gentle push in the right direction, I spent the next three hours refactoring the code into three separate classes. Each class was around 100 lines and served a very defined purpose. I was able to write unit tests for all three easily to verify that they worked correctly, and as a bonus, I was able to wrap my head around what they were doing much more easily.
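
As a rough illustration of that kind of split (not the actual classes from my project), here is a minimal Ruby sketch. The class names, the parameter keys (page, per_page, sort, order, filter_*) and the defaults are all assumptions made up for the example.

```ruby
# Each class describes its purpose without needing the word 'and'.
# All parameter names and defaults here are hypothetical.

# Knows which slice of the results was requested.
class Pagination
  attr_reader :page, :per_page

  def initialize(params)
    @page = [params.fetch('page', 1).to_i, 1].max
    @per_page = params.fetch('per_page', 20).to_i.clamp(1, 100)
  end

  def offset
    (page - 1) * per_page
  end
end

# Knows how the results should be ordered.
class Sorting
  ALLOWED_DIRECTIONS = %w[asc desc].freeze
  attr_reader :field, :direction

  def initialize(params)
    @field = params.fetch('sort', 'id')
    requested = params.fetch('order', 'asc').to_s.downcase
    @direction = ALLOWED_DIRECTIONS.include?(requested) ? requested : 'asc'
  end
end

# Knows which results should be kept.
class Filtering
  PREFIX = 'filter_'
  attr_reader :conditions

  def initialize(params)
    @conditions = params.select { |key, _| key.start_with?(PREFIX) }
                        .transform_keys { |key| key.delete_prefix(PREFIX) }
  end
end

params = { 'page' => '3', 'per_page' => '10', 'sort' => 'name',
           'order' => 'DESC', 'filter_status' => 'open' }
puts Pagination.new(params).offset
puts Sorting.new(params).direction
puts Filtering.new(params).conditions.inspect
```

Each piece can be unit tested with nothing but a small hash of input, which is where most of the testing win comes from.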

With the previous bloated object, if someone had asked me how confident I was that it did its job across all the possible inputs, I’d have said somewhere around 80%. I knew it did the right thing in my tests, but my tests are obviously biased. With the new design, I’d be able to say with a 99% confidence level that each piece does the right thing. That’s quite an improvement.

Another nice change with the smaller classes was that it took me all of an hour and a half to get around 90% unit test coverage. If I’d tried writing unit tests for the heavy version of the object, I’d probably have had to write way too many tests with a lot of overlap and wasted time.

So, to recap: refactoring my object into three smaller objects resulted in cleaner, more manageable code and in better and faster unit test coverage, and the code looked much more pleasing to the eye. Win, win, win. OK, the last one isn’t really a win, but it is a nice side effect.


soundex algorithm in C++

July 19, 2009

I couldn’t actually find a C++ version of the soundex algorithm (all the ones I found were C code with a .cpp extension), so I threw one together.
Here it is if anyone is interested.

#include <cctype>
#include <string>

//Soundex digit for each letter; '0' means the letter is not encoded
static const char lookup[] = {
'0',    /* A */
'1',    /* B */
'2',    /* C */
'3',    /* D */
'0',    /* E */
'1',    /* F */
'2',    /* G */
'0',    /* H */
'0',    /* I */
'2',    /* J */
'2',    /* K */
'4',    /* L */
'5',    /* M */
'5',    /* N */
'0',    /* O */
'1',    /* P */
'2',    /* Q */
'6',    /* R */
'2',    /* S */
'3',    /* T */
'0',    /* U */
'1',    /* V */
'0',    /* W */
'2',    /* X */
'0',    /* Y */
'2',    /* Z */
};

std::string computeSoundex(const std::string &input, const int resultLength){

    std::string result;
    char lastCode = '0';

    for(std::string::size_type i=0; i<input.length(); i++){

        //skip non-alpha characters (the cast avoids passing negative chars to isalpha)
        const unsigned char raw = static_cast<unsigned char>(input[i]);
        if(!std::isalpha(raw)){
            continue;
        }
        //uppercase the input value
        const char lookupInput = static_cast<char>(std::toupper(raw));
        //skip letters outside A-Z that some locales report as alphabetic
        if(lookupInput < 'A' || lookupInput > 'Z'){
            continue;
        }
        //look up its value
        const char lookupVal = lookup[lookupInput-'A'];

        if(result.empty()){
            //keep the first letter intact
            result += lookupInput;
        }else if(lookupVal != '0' && lookupVal != lastCode){
            //make sure this isn't a dupe of the previous code
            result += lookupVal;
        }
        //H and W do not separate letters that share a code
        if(lookupInput != 'H' && lookupInput != 'W'){
            lastCode = lookupVal;
        }
    }

    //In cases of empty strings (or strings with no encodable
    //characters), return Z000
    if(result.empty() || resultLength <= 0){
        return "Z000";
    }

    //pad with zeros or truncate to the requested length
    result.resize(static_cast<std::string::size_type>(resultLength), '0');
    return result;
}


Redis

April 1, 2009

http://code.google.com/p/redis/

I just came across this really cool persistent alternative to memcached. I haven’t spent too much time looking at the internals, but it accepts write operations and writes them to disk asynchronously. This introduces the possibility of some data loss if the machine goes down while writes are pending, but overall, the performance boost and persistence more than make up for it in most use cases.

I’ve played around with it on my MacBook Pro (very simple to build and get running), and it seems pretty cool. It supports more features than memcached, like list and set operations (which are atomic) and push/pop operations. This means that it could be a good candidate for distributed queueing and messaging systems. Not to mention, it also supports master/slave replication!

Also, the Ruby client currently supports consistent hashing (which I haven’t used yet), and that adds a lot to the scalability. Given the speed (reported at 110,000 SETs/second, 81,000 GETs/second), I can see Redis coming into use in a lot of situations where you don’t need all the overhead and guarantees that a ‘real’ database gives you.
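
For anyone unfamiliar with consistent hashing, here is a minimal, generic sketch of the idea in plain Ruby. This is not how the Redis Ruby client implements it (I haven’t read that code); the HashRing class, the node names and the replica count are all assumptions for illustration.

```ruby
require 'digest'

# A generic consistent-hash ring; NOT the Redis Ruby client's implementation.
class HashRing
  def initialize(nodes, replicas: 100)
    @ring = {}
    # Place several points per node on the ring to even out the distribution.
    nodes.each do |node|
      replicas.times { |i| @ring[hash_of("#{node}:#{i}")] = node }
    end
    @sorted_points = @ring.keys.sort
  end

  # Returns the node owning the first ring point at or after the key's hash,
  # wrapping around to the first point when the key hashes past the last one.
  def node_for(key)
    target = hash_of(key)
    point = @sorted_points.bsearch { |p| p >= target } || @sorted_points.first
    @ring[point]
  end

  private

  # Uses the first 32 bits of an MD5 digest as the position on the ring.
  def hash_of(value)
    Digest::MD5.hexdigest(value)[0, 8].to_i(16)
  end
end

ring = HashRing.new(%w[cache-a cache-b cache-c])
puts ring.node_for('user:42')
```

The point of the ring is that adding a server only moves the keys that now belong to the new server, instead of reshuffling nearly everything the way a plain hash-modulo-N scheme does.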

The only downside that I can see at the moment is that it has to read the entire dataset into memory.  That limits the size of your datasets, so estimate your data size before going this route.  Also, another glaring omission is a Java client.  There are some in the works (and I’ve thrown together a simple one), but nothing that is polished and ready to use.


Unicode Characters

March 17, 2009

Someone at work just sent this handy site out on a mailing list. Since I tend to forget this stuff as soon as I’m done troubleshooting a problem, I’m quite happy to have it. I’m posting it here for those poor souls that are trying to troubleshoot character encoding issues:
http://www.fileformat.info/info/unicode/char/00df/index.htm

Mat


Premature Optimization

March 12, 2009

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

-Knuth

This post can be summed up in the following sentence: ‘Make sure you can justify each and every optimization of your code before you make the change’. Stated another way: ‘If you haven’t profiled your code, then you don’t know what to optimize’.

Now for the justification…

In my career I’ve done everything from QA, to Development, to Developer Support at Borland. In that time I’ve seen a lot of funky code. By funky, I don’t mean strange indentation or variable naming, but code that jumps through convoluted hoops in the name of ‘efficiency’. I may be overly sensitive to this, as I saw a lot of it when I was working in support (people would contact us for help after they couldn’t fix the bugs in their own code, or sometimes ours).

I will say that this type of problem seems to be much more prevalent with C/C++ programmers than Java programmers.  I think this stems from the fact that most Java programmers don’t really think about memory management and performance when coding, but in their defense, they don’t normally have to.  But I digress…

Most of the hackery that I’ve seen with regard to performance isn’t really effective. Sure, optimizing that long-running startup code brings up the program faster, but since that only happens once every N hours, it’s not making a big difference overall. Programmers will see some area in their code where they can use a new and sexy data structure or technique to speed things up, and will do so, justifying it as optimizing.

In almost every case that I’ve seen, programs spend the vast majority of their time in a very small portion of their code.  Optimizing that portion of code, no matter how un-sexy, is where the big gains will come from.  Also, that portion of the code base should have the best set of unit tests, so you have that much more assurance that you haven’t inadvertently introduced a bug.
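
As a minimal sketch of measuring before optimizing, here is a crude section timer in Ruby. It is no substitute for a real profiler, and the helper name time_sections and the two sample workloads are made up for illustration.

```ruby
# Times each labeled block and returns the elapsed seconds per label.
# time_sections is a hypothetical helper, not a standard library method.
def time_sections(sections)
  sections.transform_values do |work|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    work.call
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

# Stand-in workloads; replace these with the real sections of your program.
timings = time_sections(
  'startup' => -> { 1_000.times { |i| i * i } },
  'hot loop' => -> { 200_000.times { |i| Math.sqrt(i) } }
)

# Report the sections from slowest to fastest with their share of the total.
total = timings.values.sum
timings.sort_by { |_, seconds| -seconds }.each do |label, seconds|
  puts format('%-10s %8.4fs %5.1f%%', label, seconds, 100 * seconds / total)
end
```

Even a rough breakdown like this tells you which section deserves attention and gives you a baseline to compare against after the change.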

Also, a lot of times the bottleneck isn’t in your code directly, but in a library or downstream dependency. If it takes 2 seconds for your query to return results, then you need to make sure your tables have the correct indexes, add caching, etc. *Do Not* fork the Hibernate code you’re using to try and eke out a little more performance.

So please, justify with an estimate of the performance gain of your proposed optimization before you start modifying the code. If you have no idea what the gain will be, then you probably don’t understand how/why the code is slow well enough to start making changes effectively.

Mat


Rails request details

March 6, 2009

I was recently looking around for some details on the way Rails handles requests. Detailed info was actually pretty hard to come by (I didn’t really want to start reading the source code), but I did manage to find this by the guy who started Engineyard.com:

Rails Request Handling

Definitely worth a read.


search results tag clouds

March 6, 2009

I just came across this and thought it was really really cool. (Why doesn’t Google do this already?)

search clouds example

This would make the search experience way easier.

Here’s a newcomer to search that’s doing something similar. I really don’t like the interface, though.

Quintura.com

The search space has been dominated by Google for quite some time, and innovation has been lacking.  I’d really like to see some fresh ideas being presented.