Do You Really Mind What's Inside Your Computer? (An Introduction to High Performance Computing)

As a programmer, have you ever thought about what is inside your computer, or inside your client's computer? I know the answer is "Yes", because any programmer who wants to deploy a product has to consider the platform to make sure the product runs smoothly. But I also know that many programmers cannot promise that their software makes optimal use of the available hardware resources.

I'm going to remind you of a few things a programmer should know when execution time is critical, in other words when a program is computationally expensive and has to run in real time. More precisely, I'm going to explain how we can get into high performance computing using a few important things that we already have inside our computers but may never have used.

What Is High Performance Computing (HPC)?

According to the common web definition, HPC means solving advanced computational problems using supercomputers and computer clusters. As we know, supercomputers are not within reach of a typical programmer or of many clients. But we can still experience HPC on our personal computers. Isn't it surprising that we can bring some of the power of a supercomputer or a computer cluster to our personal computers?

Idea Behind HPC

When we need to process a large data set, or run a computationally expensive process over a small or large data set, we can write a solution that executes in parallel. If we have a large data set, we can divide it into smaller pieces and process them in parallel. If we have a computationally expensive process, we can divide it into several sub-processes and execute them in parallel, sharing the data among them. Executing in parallel can greatly reduce the execution time of certain applications.
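
The "divide the data and process the pieces in parallel" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`split_into_chunks`, `sum_of_squares`, `parallel_sum_of_squares`) are made up for this example, and the per-chunk work is a stand-in for whatever expensive computation you really have. It uses a thread pool because that runs everywhere without extra setup; in standard CPython, threads do not speed up pure-Python number crunching, so for real CPU-bound work you would probably swap in `ProcessPoolExecutor` from the same module.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_chunks):
    """Split a list into at most num_chunks contiguous pieces of near-equal size."""
    chunk_size = max(1, -(-len(data) // num_chunks))  # ceiling division
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def sum_of_squares(chunk):
    """Per-chunk work (hypothetical); stands in for any expensive computation."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Hand each chunk to a worker, then combine the partial results."""
    chunks = split_into_chunks(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    numbers = list(range(1000))
    print(parallel_sum_of_squares(numbers))
```

The pattern is always the same three steps: split, work on the pieces independently, then merge the partial results.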

When Do We Need HPC?

People use HPC in many cases. The following are the major ones.
  1. Having big data (a few gigabytes or more) to process (e.g. scientists regularly analyze large data sets in many areas, including meteorology and genomics).
  2. Having a computationally expensive process that must finish within a limited time (e.g. real-time object detection in a video requires many filtering tasks, feature detection tasks and other calculations that have to be done over every pixel of every frame).
  3. Having a task that consists of many independent processes that must finish within a limited time (e.g. Google Search extracts a lot of information from our keywords, and from the way we enter them, before it returns the search results).
At a glance it looks like HPC is not required for the software we use in day-to-day life. But we already use small-scale HPC in modern computer games, and we may need it in many everyday tasks such as video processing, compression, video format conversion, image processing, business data analysis, simulation, 3D rendering and photo editing.

What Do We Need to Do HPC?

As I described before, typical HPC needs a supercomputer or a computer cluster. But we can do HPC in several ways without either of them:
  • Using GPGPU (General-Purpose Computing on Graphics Processing Units)
  • Using co-processor devices
  • Using a multicore CPU
  • Using the SIMD (Single Instruction, Multiple Data) instructions of the CPU
Beyond the hardware listed above, we need to train our minds to think about programming in a new way. It is not surprising that most of us are used to sequential programming. But it is time to start thinking in terms of parallel programming. It is not easy to write a program that executes as a combination of a large number of parallel processes. Let's see the difference between the two thinking patterns.
The image above shows two ways of calculating the sum of all the elements in an array. It shows how we can reduce the depth of the process by executing in parallel: adding n elements one after another takes n - 1 dependent steps, while adding them pairwise in parallel takes only about log2(n) levels. Given enough processors, the execution time is proportional to the depth, so reducing the depth decreases the execution time.
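
To make the difference in depth concrete, here is a small Python sketch of the two ways of summing an array. It is a simulation only: the function names are made up for this example, the input is assumed to be non-empty, and the additions inside one level of `pairwise_sum` are computed one after another here, although they are independent and could all run at the same time on parallel hardware.

```python
def sequential_sum(values):
    """Add the elements one after another; returns (total, depth).

    Assumes a non-empty list. Every addition depends on the previous one,
    so the depth equals the number of additions.
    """
    total = values[0]
    depth = 0
    for v in values[1:]:
        total += v
        depth += 1
    return total, depth


def pairwise_sum(values):
    """Add neighbouring pairs level by level; returns (total, depth).

    Assumes a non-empty list. All additions within one level are independent
    of each other, so the depth is the number of levels, not of additions.
    """
    level = list(values)
    depth = 0
    while len(level) > 1:
        next_level = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd element is carried to the next level
        level = next_level
        depth += 1
    return level[0], depth


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    print(sequential_sum(data))  # (36, 7)
    print(pairwise_sum(data))    # (36, 3)
```

For 8 elements the sequential version needs 7 dependent steps while the pairwise version needs 3 levels, and the gap widens quickly as the array grows.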

However, most real-world problems do not fit this simple model. They may have multiple dimensions and multiple steps, and most of the time those steps have to be executed one after another in order. Therefore the parallel implementation depends on the nature of the problem.

There are known techniques for optimizing parallel implementations of certain problems, and a lot of ongoing research addresses the optimization of more general types of problems.

The main barrier is not knowledge, since there are several places where we can learn about HPC. What we need is one or more of the hardware configurations listed above. I will explain those hardware options in the following sections.

GPGPU (General-Purpose Computing on Graphics Processing Unit)

If you are a gamer or a graphics designer, you may already know a lot about modern graphics cards and their capabilities, and this may be the best option if you are concerned about cost. Many modern graphics cards can handle not only graphics but also general-purpose computing. Popular brands such as ATI (now part of AMD) and NVIDIA have introduced their own technologies for high performance computing.

A few programming languages and frameworks are popular for general-purpose computing on GPUs. OpenCL is currently the dominant open general-purpose GPU computing language. The dominant proprietary framework is NVIDIA's CUDA, which works only with NVIDIA hardware.

I use a few programs that can take advantage of NVIDIA CUDA. "Any Video Converter" is a video format converter that can run highly complex conversion algorithms on the GPU. The following image shows the software advertising its support for NVIDIA CUDA technology. I have seen a good speed gain for x264/H.264 (part of the MPEG-4 standard) video conversion even with my NVIDIA GT210.

With NVIDIA CUDA, the conversion can run a few times faster than it does in the conventional, CPU-only way.

CyberLink PowerDirector is video editing software that can also use NVIDIA CUDA. You can see the NVIDIA CUDA logo displayed with PowerDirector.

Even Adobe Photoshop can utilize your GPU.

These are not the only examples: almost all professional-level video editing software can use the GPU available in your computer. If you need to speed up this kind of software, it might be a good idea to look for a good GPGPU-capable card instead of more RAM or a CPU with more cores.
Whenever you use this kind of software, check whether it has these capabilities; if it does, you can get the maximum benefit out of your HPC-capable graphics card.

Check here to find out whether your graphics card is CUDA capable.

Co-processor Devices

Our computers normally have built-in co-processors. But here I am concerned with another kind of co-processor: an additional device that can be attached to our computers. These devices normally contain thousands of cores, which can handle thousands of processes in parallel. They are a bit expensive compared to GPGPU-capable graphics cards, but with suitable software they can turn our PC into a supercomputer. AMD Stream Processor (aka ATI FireStream) and NVIDIA Tesla are popular co-processors available to date.

At a glance, the device in the above image looks like an NVIDIA graphics card. But if you look at it carefully, you can't see a video output port. That is because this device is made for general-purpose processing. A single device may contain more than a thousand cores. If we need more processing power, we can scale up by combining several devices. The image below shows an array of Tesla cards fitted in a single machine.

The secret of these devices is the principle that "thousands of lightweight processors are better than a few heavyweight processors for most applications". The best real-world example I have heard for explaining this principle is this: if 100 people have to travel 100 km, they can use either 2-3 buses (analogous to costly, heavy-duty processors) or 50 scooters (analogous to low-cost, light-duty processors), and the 50 scooters will definitely get the job done faster than the buses.

To be continued...

