Last time we saw how to run some simple code on the GPU. Now let’s look at some aspects of parallel programming we should be aware of. Since GPUs are massively parallel processors, you’d expect that you could write your kernel code for a single piece of data and, by running enough copies of the kernel, max out your device’s performance. Well, you’d be wrong! I’m going to focus on the three most obvious issues which could hamper your parallel code’s performance:
- Each of the individual cores is actually a vector processor, which means it can perform an operation on multiple numbers at a time.
- At some point the individual threads might need to write to the same position in memory (e.g. to accumulate a value). To make sure the result is correct, they need to take turns doing it, which means they spend time sitting idle, waiting for each other.
- Most code is limited by memory bandwidth, not compute performance. This means that the GPU can’t get the data to the processing cores as fast as they can perform the computation required.
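
To put a number on that last point, here is a rough back-of-the-envelope sketch in plain Python. It is only an estimate under stated assumptions: the device figures (`PEAK_GFLOPS`, `BANDWIDTH_GB_S`) are made-up round numbers, not measurements of any real card, and `attainable_gflops` is a hypothetical helper, not part of any OpenCL library.

```python
def attainable_gflops(flops_per_item, bytes_per_item, peak_gflops, bandwidth_gb_s):
    """Upper bound on throughput for a kernel, given how much data it moves.

    The kernel can go no faster than the compute peak, and no faster than
    the rate at which memory can feed it (flops per byte * bytes per second).
    """
    intensity = flops_per_item / bytes_per_item      # flops per byte moved
    memory_limit = intensity * bandwidth_gb_s        # GFLOP/s the memory can sustain
    return min(peak_gflops, memory_limit)


# ASSUMPTION: illustrative round numbers for an imaginary device.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GB_S = 100.0

# c[i] = a[i] + b[i] on float32: one addition per item,
# two 4-byte reads plus one 4-byte write = 12 bytes per item.
vec_add = attainable_gflops(1, 12, PEAK_GFLOPS, BANDWIDTH_GB_S)

# A kernel doing 200 operations on the same 12 bytes is limited by compute instead.
heavy = attainable_gflops(200, 12, PEAK_GFLOPS, BANDWIDTH_GB_S)

print("vector add: %.1f GFLOP/s of a possible %.1f" % (vec_add, PEAK_GFLOPS))
print("heavy kernel: %.1f GFLOP/s of a possible %.1f" % (heavy, PEAK_GFLOPS))
```

With these assumed numbers, the simple vector addition can only use a small fraction of the compute peak, because every addition has to wait for 12 bytes to travel to and from memory.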
I developed my cell segmentation project bigcellbrother on Linux, but it seems all the experimental collaborators use Macs. So now that I have something which kind of half works, I decided it was time to compile the application for the Mac and create a distributable bundle. I used the lovely MacPorts to install all the dependencies the project needs (essentially OpenCV and its dependencies).
The first step was to compile the core part of the application to a shared library, as I’d done on Linux. That was easier said than done. Supposedly you only need to add a -dynamiclib flag to the compilation command, but since I am compiling on Snow Leopard there are compiler/architecture issues cropping up at seemingly every step. Apple doesn’t want you to develop with C++, this much is clear. Then I compiled the Qt GUI part of the application with Qt Creator, and it automagically created an application bundle with my app. That was the easy bit.
Of course, all the shared libraries the app requires are scattered throughout the drive. If only I could put them all in the application bundle. This tutorial was helpful: essentially, the idea is to toss all the dependencies into a Frameworks folder inside the app bundle. You can use otool -L to check which libraries a binary links against, and then install_name_tool to change the paths it looks for them in. Of course, the shared libraries themselves have dependencies too, so this would be a few hours’ worth of trouble if not for macdylibbundler, which does this stuff automatically. Yay! To bundle the Qt frameworks there’s another program called macdeployqt, which comes with Qt and takes care of everything.
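
For the curious, the manual otool / install_name_tool procedure can be sketched in Python roughly as follows. This is a minimal illustration, not what macdylibbundler actually does internally: the helper names, the `SAMPLE_OTOOL_OUTPUT` text, and the library paths in it are all made up for the example, and the script only builds the commands rather than running them. The one real piece of CLI usage it relies on is `install_name_tool -change old new file`.

```python
import posixpath

# Libraries under these prefixes ship with the OS and should not be bundled.
SYSTEM_PREFIXES = ("/usr/lib/", "/System/")


def parse_otool_output(text):
    """Extract the library paths from the text printed by `otool -L <binary>`."""
    paths = []
    for line in text.splitlines()[1:]:   # the first line just names the binary
        line = line.strip()
        if not line:
            continue
        # Each entry looks like "<path> (compatibility version X, current version Y)".
        paths.append(line.split(" (compatibility", 1)[0])
    return paths


def relocation_commands(binary, deps, prefix="@executable_path/../Frameworks"):
    """Build one install_name_tool call per non-system dependency."""
    commands = []
    for dep in deps:
        if dep.startswith(SYSTEM_PREFIXES) or dep.startswith("@"):
            continue                     # system library, or already relative
        new_path = prefix + "/" + posixpath.basename(dep)
        commands.append(["install_name_tool", "-change", dep, new_path, binary])
    return commands


# ASSUMPTION: hand-written sample, not real output from any machine.
SAMPLE_OTOOL_OUTPUT = """MyApp.app/Contents/MacOS/MyApp:
\t/opt/local/lib/libopencv_core.dylib (compatibility version 2.1.0, current version 2.1.0)
\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.0.0)
"""

deps = parse_otool_output(SAMPLE_OTOOL_OUTPUT)
for cmd in relocation_commands("MyApp.app/Contents/MacOS/MyApp", deps):
    print(" ".join(cmd))
```

On a real Mac you would feed each command list to `subprocess.check_call`, copy the libraries into the Frameworks folder, and then repeat the whole thing for every copied library, which is exactly the tedium the bundler tools save you from.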
Update: some libraries don’t have enough space for the dependencies’ paths to be changed, and dylibbundler fails on them. If you run across this, you’ll need to apply a patch to MacPorts and reinstall the affected libraries, compiling them from source: sudo port -v -s install libawesome.