[this review was written for my buddy John, a BigData speed enthusiast who couldn't make the Aparapi presentation]


Your graphics card might have 1,200 or even 3,600 SIMD processors. Using a JNI layer to OpenCL and lots of nasty boilerplate, or alternatively using Aparapi and straight Java, you may be able to speed up high-compute, low-data operations by sending them to these SIMD processors. The gains can be insanely disproportionate to the effort required to do this.
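To make that concrete, here is a minimal sketch of the Aparapi pattern (my example, not Gary's code; the package name and the Range API are as I remember them, so check the project site before copying). You override Kernel.run(), and Aparapi translates the bytecode to OpenCL and fans it out across the SIMD processors, falling back to a Java thread pool if no GPU is available:

```java
// Minimal Aparapi sketch -- package name (com.amd.aparapi) and API assumed
// from the AMD-era releases of the project.
import com.amd.aparapi.Kernel;
import com.amd.aparapi.Range;

public class AddArrays {
    public static void main(String[] args) {
        final int n = 1024;
        final float[] a = new float[n];
        final float[] b = new float[n];
        final float[] result = new float[n];
        for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

        // The run() body is what gets translated to OpenCL. Each of the n
        // work items executes this body with a different getGlobalId().
        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int i = getGlobalId();
                result[i] = a[i] + b[i];
            }
        };
        kernel.execute(Range.create(n));
        kernel.dispose();
    }
}
```

Notice that the kernel body touches only primitive arrays and getGlobalId() -- exactly the "heavy computation on big primitive arrays" profile that fits this model.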

It's not a slam dunk - not all code can be written for parallel execution, and your code WILL require a rewrite.

But if you've ever watched Conway's Game of Life sped up 36x in real time, it's a fun thing to consider.

Aparapi Presentation by Gary Frost: A Review

Easily in the top ten of the 500 or so presenters I have seen at my local Java User Group over the past 10+ years. If you're looking for someone to fly in for a superb technical presentation, this is the one.

Aparapi is an AMD-funded, open-source project (linked here), and Gary Frost, an employee of AMD, is its creator.

The Aparapi website gets technical before you even understand the bottom line - but Gary's presentation brings it all into clear focus. (I had to ask Gary what a GPU was.)

Misc Aparapi Notes:

  • This is for special functions only, not modules or entire classes.
  • Not mutually exclusive with Hadoop, MapReduce etc. If you're a BigData guy, think of the gains from each as additive, and the effort as synergistic.
  • Lots of compensating costs and issues to consider and design around. Too many to list here, but if your code fits a certain profile and is written correctly, you are golden.
    • The best fit is code that does heavy computation on large arrays of 32-bit and 64-bit primitives in an unordered fashion.
  • Branching is generally problematic.
    • Not that it can't be handled, you just have to design accordingly.
  • Two-dimensional arrays are inherently problematic in this paradigm because of the way Java lays them out (as arrays of separate row objects rather than one contiguous block).
  • Thinking in Parallel Execution is not OO-natural.
    • This has nothing to do with Java per se, it's a different way to think
  • There are some serious optimizations you need to consider, especially if you are using something like MapReduce
    • For example, you might even fork and run throwaway computations, then pick the winner at the join.
      • Hardly sensible practice in normal OO.
      • For example:
        • an XML parser
        • a regular expression parser
  • Gary has asked me to clarify that 300x may be towards the outer reaches of possible speed gains. I've seen the examples running, so for me the distinction is less than meaningful - this stuff is pretty impressive, compared to other alternatives one might have, such as throwing Hadoop and/or hardware at the problem.
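The two-dimensional-array point above deserves a concrete illustration. A Java float[][] is really an array of separate row objects scattered around the heap, which a GPU kernel can't traverse; the usual workaround (my sketch, not anything Aparapi-specific) is to flatten the grid into one 1D array and index it as row * width + col:

```java
public class FlattenDemo {
    // Row-major index into a flattened 2D grid.
    static int idx(int row, int col, int width) {
        return row * width + col;
    }

    public static void main(String[] args) {
        final int rows = 3, cols = 4;
        // Instead of new float[rows][cols] (an array of row objects),
        // use one contiguous block a kernel can index directly.
        float[] grid = new float[rows * cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                grid[idx(r, c, cols)] = r * 10 + c;

        System.out.println(grid[idx(2, 3, cols)]); // prints 23.0
    }
}
```

The same idx() arithmetic works unchanged inside a kernel's run() body, since it only uses ints and a flat primitive array.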

Follow Up from Gary Frost:

Why take my word for it? Here is the link to the svn with sample code:


The Austin Java User's Group put up the presentation slides, if you want to go over them.

Note to reader: This blog may be revised over time if I am advised about incorrectly made statements or other information.