Sunday, 15 December 2013

WebCL Parallel processing for web

Posted by Mahesh Doijade 4 Comments
    Nowadays, everything from desktops to smartphones ships with multicore CPUs and Graphics Processing Units (GPUs), which offer immense computational capability provided the work is carried out in parallel. GPUs were mostly used for graphics-related processing, but with the advent of programming models like CUDA and OpenCL, carrying out general purpose computation on them became possible as well. While CUDA is restricted to NVIDIA GPUs, OpenCL can be used to write parallel programs for multicore CPUs, GPUs and several other modern processors. The web world, however, with its complex web applications that in many cases require a considerable amount of computation, was still unable to leverage this computational capability until WebCL arrived. WebCL is a framework for carrying out heterogeneous parallel computing in web applications and websites.


 

    WebCL, that is, Web Computing Language, is a JavaScript binding for OpenCL that brings heterogeneous parallel computing to the client side. It was initiated by the Khronos Group, which announced the first working draft of WebCL around April 2012. WebCL makes it feasible to run computationally intensive applications, such as physics engines for games or photo and video editing, in the web browser, so web developers can tap into the massive parallel computation capabilities offered by multicore CPUs, GPUs and future manycore accelerators. Currently, WebCL is not natively supported by web browsers, but one can try the implementations developed by Nokia, Motorola, Mozilla and Samsung to get a feel for it. Its API closely resembles the OpenCL C API, so it is easy for developers familiar with OpenCL to adopt.
    Below is a sample JavaScript WebCL detection function, which can be called like any other JavaScript function. One needs to install OpenCL drivers/SDK as well as a WebCL browser extension from one of the vendors mentioned above; for instance, one can download the OpenCL SDK/drivers from here and the WebCL extension from here.

function detectWebCL() {
    if (window.WebCL == undefined) {
       alert("Your system does not support WebCL. Please install both the OpenCL driver " +
             "and the WebCL browser extension.");
       return false;
    }

    // Get a list of available WebCL platforms, and another list of the
    // available devices on each platform.
    try {
          var devices = [];
          var platforms = WebCL.getPlatformIDs();
          for (var i in platforms)
          {
              var platform = platforms[i];
              devices[i] = platform.getDeviceIDs(WebCL.CL_DEVICE_TYPE_ALL);
          }
          alert("Great! Your system supports WebCL.");
    }
    catch (e)
    {
         alert("Your platform or device inquiry failed.");
    }
}


Thursday, 25 July 2013

Android 4.3 released by Google

Posted by Mahesh Doijade

   Google today announced an incremental update to Android by releasing Android 4.3, an updated version of Jelly Bean that they call an even sweeter Jelly Bean, perhaps because of the new features that come with it. It includes several performance optimizations and other exciting features for users as well as developers. Android 4.3 comes with restricted profiles, which can be used to restrict app usage as well as content usage; for instance, a parent can use them to set up profiles for specific family members.
   Games will also look much better, as Android 4.3 comes with OpenGL ES 3.0 support, resulting in realistic, high performance 3D graphics. It includes dial pad autocomplete: just start typing numbers or letters and the dial pad will suggest phone numbers or names. To use this feature, open your Phone app settings and enable "Dial pad autocomplete".
   It also comes with Bluetooth Smart support, so your smartphone becomes Bluetooth Smart Ready, and Bluetooth AVRCP 1.3 support is included as well. There is also a new Wi-Fi based location feature, so Wi-Fi can be used to detect location without being turned on all the time.
   Text input is improved and made easier thanks to an optimized algorithm for tap typing recognition. User switching is faster too, so switching users from the lock screen now takes less time. There is also support for the Afrikaans, Swahili, Zulu, Amharic and Hindi languages.
   And here is good news for Nexus device owners: Google is rolling out the Android 4.3 update immediately for the Nexus 4, Nexus 7 and Nexus 10.

 


OpenMP Getting Started

Posted by Mahesh Doijade

 

What is OpenMP ?

OpenMP is an API for writing portable multi-threaded parallel programs. It consists of a specification of a set of compiler directives, environment variables and library routines designed to enable shared memory programming for C/C++ as well as Fortran programs. OpenMP stands for Open Multi Processing.

• OpenMP is managed by the OpenMP Architecture Review Board (ARB).
• In most cases it is much easier to program in OpenMP than with POSIX threads (pthreads).
• It needs explicit compiler support; it is supported by compilers such as GCC 4.2 onwards, Portland Group's pgcc, Solaris cc, Intel's icc, SGI cc, IBM xlc, etc.

The beauty of OpenMP is that it enables one to carry out incremental parallelization of the sequential code. 


OpenMP Getting Started :

OpenMP is based on the fork-join model, as shown in the figure. Initially there is a single master thread executing the sequential part; it is forked into multiple threads whenever a parallel region is encountered. Once the work in the parallel region is accomplished, the threads join and the master thread continues with the work following the parallel region.
[Figure: the OpenMP fork-join model]
  • To start with, a parallel region is specified with the following compiler directive:
          #pragma omp parallel
          {
                 //code for work to be done by each thread to be placed here.
          } 
  • The desired number of threads can be set through the library routine omp_set_num_threads(); another way is to set the environment variable OMP_NUM_THREADS to the desired number of threads (for example, export OMP_NUM_THREADS=4 in a bash shell).
  • To find out how many threads are running in a given parallel region, one can use the library routine omp_get_num_threads(), whose return value is the number of threads.
  • Each thread can get its thread ID within the team of threads in the parallel region by using the library routine omp_get_thread_num().
  •  To synchronize the threads to carry out some work which needs mutual exclusion, one can use, 
        #pragma omp critical

        {
            // Code which requires mutual exclusion
        }
         Another mechanism is to use the lock routines omp_set_lock()/omp_unset_lock(); and if mutual exclusion is needed for a single instruction, it is better to use #pragma omp atomic. A short sketch combining these constructs is given below.
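The following is a minimal sketch (not from the original post) showing critical, atomic and explicit locks side by side; the counter variables and the lock name lck are made up purely for illustration.

#include <stdio.h>
#include <omp.h>

int main()
{
    int count_critical = 0, count_atomic = 0, count_locked = 0;
    omp_lock_t lck;
    omp_init_lock(&lck);

    #pragma omp parallel num_threads(4)
    {
        #pragma omp critical
        {
            // Arbitrary block of code requiring mutual exclusion.
            count_critical++;
        }

        #pragma omp atomic
        count_atomic++;              // a single update, so atomic is sufficient

        omp_set_lock(&lck);          // explicit lock as an alternative
        count_locked++;
        omp_unset_lock(&lck);
    }

    omp_destroy_lock(&lck);
    printf("critical=%d atomic=%d locked=%d\n",
           count_critical, count_atomic, count_locked);
    return 0;
}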
Following is a Hello World OpenMP program in C that uses 4 threads.
#include<stdio.h>
#include<omp.h>

int main()
{
    omp_set_num_threads(4);

    #pragma omp parallel
    {
        printf("\nHello World from thread ID - %d from total_threads - %d\n", omp_get_thread_num(), omp_get_num_threads());
    }
}
To compile the above code using GCC 4.2 or a higher version, the following command needs to be used.

gcc "your_prog_name.c" -fopenmp
The output of this program would be something like the following (the order of the lines varies from run to run):
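Hello World from thread ID - 0 from total_threads - 4
Hello World from thread ID - 2 from total_threads - 4
Hello World from thread ID - 3 from total_threads - 4
Hello World from thread ID - 1 from total_threads - 4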
 


Saturday, 20 July 2013

What is Parallel Programming? Why do you need to care about it ?

Posted by Mahesh Doijade

    
        Parallel programming involves solving a problem by dividing it into multiple independent sub-problems, creating multiple execution entities for them, solving them independently through these execution entities on different processors, and establishing communication amongst them for whatever coordination is required to accomplish the overall problem. In other words, it is the mechanism for creating programs that effectively utilize the available computational resources by executing code simultaneously on several computational nodes. Although there are numerous parallel programming models available, the dominant ones are MPI for distributed memory programming and OpenMP for shared memory programming.
           Parallel programming is historically considered one of the more difficult areas a programmer can tackle, as it comes with issues such as race conditions, deadlock, non-determinism and scaling limits due to inherently sequential parts of the code, to name a few. You only need to care about parallel programming if you really require good performance in your application, because performance is the crucial reason for going parallel. If you are not concerned with performance, then just write your sequential code and be happy, as it will likely be much easier and get done relatively faster.

              Some of the reasons why one needs to go for parallel programming are:
  • Firstly, commodity processors have completely adopted parallel architectures and practically all processors now come with multiple computing cores, which means that writing sequential code and waiting a couple of years for CPUs to get faster is no longer an option. Parallel programming has come to the mainstream, so making optimal use of the available resources requires it.
  • Efficient parallel programs implicitly take less time to execute and are therefore also cost effective.
  • Sequential computing faces several limitations, such as the power wall and the limits to miniaturization of transistors, to name just a few.
  • For very large problems such as weather forecasting, medical imaging, financial and economic modelling and bio-informatics, to name a few, parallel computing is the only way.

Sunday, 14 July 2013

Nested Parallelism OpenMP example

Posted by Mahesh Doijade

       Nested parallelism has been supported in OpenMP since OpenMP 2.5. It enables the programmer to create a parallel region within a parallel region itself, so each thread in the outer parallel region can spawn additional threads when it encounters an inner parallel region. Nested parallelism can be put into effect at runtime either by setting the environment variable OMP_NESTED to TRUE prior to execution, or by calling the function omp_set_nested() with the argument 1 to enable it (0 disables it). A basic nested parallelism example with OpenMP is given below; compile this code using g++ with the -fopenmp flag.

#include <iostream>
#include <stdio.h>
#include <omp.h>

using namespace std;

int main()
{
    int number;
    int i;
    omp_set_num_threads(4);
    omp_set_nested(1); // 1 - enables nested parallelism; 0 - disables nested parallelism.

    #pragma omp parallel // parallel region begins
    {       
        printf("outer parallel region Thread ID == %d\n", omp_get_thread_num());
        /*
            Code for work to be done by outer parallel region threads over here.
        */
        #pragma omp parallel num_threads(2) // nested parallel region
        {   
            /*
                Code for work to be done by inner parallel region threads over here.
            */       
            printf("inner parallel region thread id %d\n", omp_get_thread_num());
           
            #pragma omp for
            for(i=0;i<20;i++)
            {
                // Some independent iterative computation to be done.
            }
        }
    }
    return 0;
}


Friday, 28 June 2013

GnuPG - GPG Encryption and Signing Tutorial

Posted by Mahesh Doijade 2 Comments
Introduction :
            GnuPG is an open-source implementation of the OpenPGP standard. GnuPG enables one to encrypt and sign data using various cryptographic algorithms so that data exchange and communication can be carried out in a secure manner, and it features an all-round key management system as well as access modules for all kinds of public key directories. GnuPG, also known as GPG, is a command line utility that integrates easily with other applications, and several libraries and frontend applications are available for it. GPG also provides support for S/MIME, so one can use it to carry out secure email communication.

Using gpg :
                    Suppose we have two users, userA and userB. userA wants to send an important file "crucial_file.txt" to userB. To carry out this communication securely, userA needs to encrypt the file using some cryptographic technique (in this example we use public key cryptography) and then send it to userB. This example demonstrates the gpg command line tool on Linux.

 Initially both userA and userB need to generate public and private key pairs for themselves, so both of them execute the following command in their respective terminals.
$ gpg --gen-key

Then follow the procedure as prompted during the command execution, and please remember your passphrase. You can select which public key cryptographic algorithm you want; as regards symmetric cryptography, the default algorithm used by gpg is CAST5.

To check whether the keys were generated by gpg and which keys are present in a given user's keyring, a user can execute:
$ gpg --list-keys

Now, in order to encrypt the file, userA needs userB's public key, so userB needs to export and send its public key to userA; likewise, userA sends his public key to userB.
$ gpg --armor --export userB@example.com > userB_pk

The public key of userB is now in userB_pk; userA can export his public key to a text file in the same way, and they can then exchange their respective public keys.

To import another user's public key:
$ gpg --import userB_pk

For encryption, userA executes the following command, specifying the receiver (userB in our case) so that userB's public key is used:
$ gpg --output plain_file_enc --encrypt --recipient userB@example.com plain_file.txt

For decryption at the receiver's end:
$ gpg --output decrypted_plain_file.txt --decrypt plain_file_enc

For signing a file with the sender's private key:
$ gpg --output plain_file_signed --sign plain_file.txt

For verifying the signature on a signed file:
$ gpg --verify plain_file_signed

To do signing and then encryption, that is, to implement digital signatures:
1.) signing : $ gpg --output plain_file_signed --sign plain_file.txt
2.) encryption : $ gpg --output plain_file_enc --encrypt --recipient userB@example.com plain_file_signed

Decrypting the data encrypted above:
1.) decryption : $ gpg --output plain_file_decrypted --decrypt plain_file_enc
2.) decrypt sign : $ gpg --output plain_file_original.txt --decrypt plain_file_decrypted





Wednesday, 26 June 2013

What is Android ?

Posted by Mahesh Doijade

      Android is the most popular mobile operating system in the world, used by a large number of smartphones, tablets and other handheld devices. It is based on the Linux kernel. Android is open source and is released by Google under the Apache License. The Android open source code and its licensing allow the software to be modified and distributed freely by mobile device manufacturers, wireless carriers and enthusiast developers.
      The above description gives a brief idea of what Android is. To dive into its brief history: it was initially developed by Android Inc., founded in 2003 in Palo Alto, USA. Google acquired Android Inc. in 2005, making it a wholly owned subsidiary of Google. Since the initial release there have been a large number of updates to Android. Its current popularity is evident from the fact that, as of May 2013, around 900 million Android devices had been activated and 48 billion apps had been installed from the Google Play store. The evolution of Android, version by version, is detailed below:
[Figure: the evolution of Android versions]
      As Google maintains and develops Android, it ships with a number of Google services installed right out of the box. Google Web Search, Gmail, Google Maps and Google Calendar are all pre-installed, and the default home page of the web browser is Google. But since Android is open source, carriers can modify this as they see fit. Android ships with the Google Play store, an online software repository for obtaining Android apps; using the Play store, Android users can select, download and use applications developed by third party developers. As of April 2013, there were more than 850,000 applications, games and widgets available on the Play store.
      Finally, we can conclude that Android is an incredible platform for both consumers and developers. It is in principle contrary to the iPhone in many ways: whereas the iPhone aims to create the best user experience by restricting hardware and software standards, Android aims to open up as much of the platform as possible.


  



Sunday, 23 June 2013

GIT : distributed version control, getting started

Posted by Mahesh Doijade


Git is a distributed version control and source code management (SCM) system. Git was initially developed by Linus Torvalds for maintaining Linux kernel development. It is largely inspired by the BitKeeper and Monotone distributed version control systems and is entirely developed in C. Though Git was developed on Linux, it is also supported on other operating systems such as Microsoft Windows, Solaris and Mac OS X.

One can install Git on Redhat/CentOS through yum using the following command:
$ yum install git

On Ubuntu and Debian based systems:
$ sudo apt-get install git




Creating Git repository :

To create a Git repository, suppose your source code is in a folder named "all_src".
Go into that directory ("all_src" in our case) and type the following command:

$ git init

This will create a Git repository in that directory, and you can then start using other Git commands on it.

Basic Branching in Git 

A single git repository can maintain several branches of development. To create a new branch named "experimentation", use
$ git branch experimentation
If you now run
$ git branch
you'll get a list of all existing branches:
  experimentation
* master
The "experimentation" branch is the one you just created, and the "master" branch is a default branch that was created for you automatically. The asterisk marks the branch you are currently on; type
$ git checkout experimentation
to switch to the experimentation branch. Now edit a file, commit the change, and switch back to the master branch:
(edit file)
$ git commit -a
$ git checkout master
Check that the change you made is no longer visible, since it was made on the experimentation branch and you're back on the master branch.
You can make a different change on the master branch:
(edit file)
$ git commit -a
At this point the two branches have diverged, with different changes made in each. To merge the changes made in experimentation into master, run
$ git merge experimentation
If the changes do not conflict, the merge is done. If there are conflicts, markers will be left in the problematic files showing the conflict;
$ git diff
will show this. Once you've edited the files to resolve the conflicts,
$ git commit -a
will commit the result of the merge. Finally,
$ gitk
will display a good graphical representation of the resulting history.
At this point you could delete the experimentation branch with
$ git branch -d experimentation
This command ensures that the changes in the experimentation branch are already in the current branch.
If you develop on a branch test-idea and then regret it, you can always delete the branch with
$ git branch -D test-idea
Branches are cheap and easy to create, so this is a good way to try something out.

These are the basics to get started with Git; more Git features will be covered in forthcoming posts. 
 




     

Thursday, 20 June 2013

MPI

Posted by Mahesh Doijade

         

MPI (Message Passing Interface) is an open, standardized message passing library specification for carrying out portable message passing on a wide variety of parallel computers, designed by a broad group of researchers from academia and industry. The first MPI standard draft was prepared in 1992, and after community reviews the MPI 1.0 standard was released in 1994. Since then a lot of improvements have been made to the standard; the latest is MPI 3.0, released in September 2012.

What MPI is not ?

          MPI is neither a programming language nor a compiler specification. It is also not any specific implementation or software product. MPI is just a message passing library specification that any implementer can use to develop a message passing library, that is, a message passing implementation.

          Various message passing implementations currently exist, both open source and commercial vendor provided ones. Some popular open source MPI implementations are MPICH, Open MPI and MVAPICH. MPICH is maintained by Argonne National Laboratory; Open MPI is developed and maintained by the open source community; MVAPICH is developed and maintained by Ohio State University and is a highly optimized implementation for InfiniBand, 10 Gigabit Ethernet/iWARP and RoCE networks. All of the mentioned open source implementations are available for C, C++ and Fortran. There are also vendor specific implementations such as Intel MPI, HP MPI, SGI MPI, etc.
           Following is a minimal MPI program :

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    MPI_Init( &argc, &argv );
    printf( "Hello, world!\n" );
    MPI_Finalize();
    return 0;
}
          Only what happens between MPI_Init() and MPI_Finalize() is defined by the MPI standard. The program below is a better MPI program: it gives an idea of which processes are actually in action by printing each process's rank obtained through MPI_Comm_rank(), and how many MPI processes there are overall through MPI_Comm_size().
    
#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
     int rank, size;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    printf( "Hello I am %d of %d\n", rank, size );

    MPI_Finalize();

    return 0;
 }           
         In order to run an MPI program like the ones written above, one needs to install any of the previously mentioned MPI implementations. On a Linux machine the program can then be compiled with "mpicc your_source_code.c" and run through the command "mpiexec -np <no_of_processes_required> <path_to_your_executable>". For example, if the compiled MPI executable a.out is in my working directory and I want to launch 4 processes, the launch command would be:
mpiexec -np 4 ./a.out
         


 

Monday, 17 June 2013

CUDA

Posted by Mahesh Doijade 2 Comments

      CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model from NVIDIA that enabled high level programming for GPUs (Graphics Processing Units) and thereby made them available for general purpose computing. Previously, if one had to use GPUs for general purpose computation, one had to go through the graphics APIs provided by OpenGL, DirectX and the like and map the computation onto them. CUDA removed these obstacles, and GPUs are now also called GPGPUs, for General Purpose computing on Graphics Processing Units. By writing code in CUDA, programmers can speed up their algorithms to a massive extent thanks to the amount of parallelism offered by GPUs.
      One can start programming in CUDA using the extensions provided for C, C++ and Fortran. NVIDIA's basic CUDA setup consists of 3 parts: the NVIDIA graphics card drivers, the CUDA SDK, and the CUDA Toolkit. The drivers are available for most Linux distributions as well as for Windows. The development environment, consisting of the compiler and tools for debugging and performance optimization, comes with the CUDA Toolkit. Other basic requisites such as sample examples and CUDA-based libraries for image processing and linear algebra primitives can be obtained from the CUDA SDK (Software Development Kit). The CUDA platform allows the creation of a very large number of concurrently executing threads at very low system resource cost.
[Figure: architecture of a CUDA capable GPU]
      
     
    The figure above shows the architecture of a CUDA capable GPU. It consists of multiple Streaming Multiprocessors (SMs), each containing several Streaming Processors (SPs), also called CUDA cores; the number of SMs and CUDA cores depends on the type and generation of the GPU device. The memory hierarchy, from slowest to fastest, is: global memory (device memory), texture memory, constant memory, shared memory and registers. So access to registers is fastest and access to global memory is slowest, and CUDA programmers need to write their code with this hierarchy in mind in order to gain the maximum performance benefit. To give a feel for the programming model described above, a minimal hedged sketch of a CUDA C program follows.
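The following sketch (not from the original post) cubes an array on the GPU; the kernel name cubeKernel, the array size N and the block size of 256 threads are assumptions made purely for illustration. It would be compiled with nvcc, the compiler shipped with the CUDA Toolkit.

#include <stdio.h>
#include <cuda_runtime.h>

#define N 1024

// Kernel: each thread computes the cube of one array element.
__global__ void cubeKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * in[i] * in[i];
}

int main(void)
{
    float h_in[N], h_out[N];
    for (int i = 0; i < N; i++)
        h_in[i] = (float)i;

    // Allocate device (global) memory and copy the input to it.
    float *d_in, *d_out;
    cudaMalloc((void **)&d_in, N * sizeof(float));
    cudaMalloc((void **)&d_out, N * sizeof(float));
    cudaMemcpy(d_in, h_in, N * sizeof(float), cudaMemcpyHostToDevice);

    // Launch enough blocks of 256 threads each to cover all N elements.
    cubeKernel<<<(N + 255) / 256, 256>>>(d_in, d_out, N);

    // Copy the results back to the host and check one value.
    cudaMemcpy(h_out, d_out, N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h_out[3] = %f (expected 27.0)\n", h_out[3]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}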
     

Sunday, 16 June 2013

OpenCL example

Posted by Mahesh Doijade




     The code below illustrates cube calculation using OpenCL. This OpenCL example is structured as follows :
1.) Initially the OpenCL kernel is written in const char *KernelSource.

2.) At the start of main we define all the requisite OpenCL related and other normal variables.

3.) Next, we set up the OpenCL environment required for running the kernel using functions such as clGetDeviceIDs(), clCreateContext() and clCreateCommandQueue().

4.) Then we create a program from the source in KernelSource using the function clCreateProgramWithSource(), build it with clBuildProgram() and create our kernel object using clCreateKernel().

5.) Then we allocate memory for input and output on the selected OpenCL device using the function clCreateBuffer().

6.) Next we write our input data into the allocated memory using the function clEnqueueWriteBuffer() and set the arguments for the compute kernel with clSetKernelArg(), as shown in the OpenCL example below.

7.) We get the maximum work group size for executing the kernel on the device through clGetKernelWorkGroupInfo(), and then execute the kernel over the entire range of our 1D input data set, using the maximum number of work group items for this device, through clEnqueueNDRangeKernel().

8.) Finally, we wait for the commands to get serviced using clFinish(), read the output buffer back from the OpenCL device using clEnqueueReadBuffer() and print the results.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <OpenCL/opencl.h>

#define INP_SIZE (1024)

// Simple compute kernel which computes the cube of an input array 

const char *KernelSource = "\n" \
"__kernel void square( __global float* input, __global float* output, \n" \
" const unsigned int count) {            \n" \
" int i = get_global_id(0);              \n" \
" if(i < count) \n" \
" output[i] = input[i] * input[i] * input[i]; \n" \
"}                     \n" ;

int main(int argc, char** argv)
{

 int err; // error code
 float data[INP_SIZE]; // original input data set to device
 float results[INP_SIZE]; // results returned from device
 unsigned int correct; // number of correct results returned

 size_t global; // global domain size 
 size_t local; // local domain size 

 cl_device_id device_id; // compute device id 
 cl_context context; // compute context
 cl_command_queue commands; // compute command queue
 cl_program program; // compute program
 cl_kernel kernel; // compute kernel
 cl_mem input; // device memory used for the input array
 cl_mem output; // device memory used for the output array

 // Fill our data set with random values
 int i = 0;
 unsigned int count = INP_SIZE;

 for(i = 0; i < count; i++)
 data[i] = rand() / 50.00;

 
 // Connect to a compute device
 // If want to run your kernel on CPU then replace the parameter CL_DEVICE_TYPE_GPU 
 // with CL_DEVICE_TYPE_CPU

 err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);

 if (err != CL_SUCCESS)
 {
     printf("Error: Failed to create a device group!\n");
     return EXIT_FAILURE;
 }


 // Create a compute context
 //Contexts are responsible for managing objects such as command-queues, memory, program and kernel objects and for executing kernels on one or more devices specified in the context.

 context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);

 if (!context)
 {
     printf("Error: Failed to create a compute context!\n");
     return EXIT_FAILURE;
 }

 // Create a command commands
 commands = clCreateCommandQueue(context, device_id, 0, &err);
 if (!commands)
 {
     printf("Error: Failed to create a command commands!\n");
     return EXIT_FAILURE;
 }

 // Create the compute program from the source buffer
 program = clCreateProgramWithSource(context, 1, (const char **) & KernelSource, NULL, &err);
 if (!program)
 {
     printf("Error: Failed to create compute program!\n");
     return EXIT_FAILURE;
 }

 // Build the program executable
 err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
 if (err != CL_SUCCESS)
 {
    size_t len;
    char buffer[2048];
    printf("Error: Failed to build program executable!\n");
    clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, sizeof(buffer), buffer, &len);
    printf("%s\n", buffer);
    exit(1);
 }

 // Create the compute kernel in the program we wish to run
 kernel = clCreateKernel(program, "square", &err);
 if (!kernel || err != CL_SUCCESS)
 {
    printf("Error: Failed to create compute kernel!\n");
    exit(1);
 }

 // Create the input and output arrays in device memory for our calculation
 input = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * count, NULL, NULL);
 output = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(float) * count, NULL, NULL);

 if (!input || !output)
 {
    printf("Error: Failed to allocate device memory!\n");
    exit(1);
 } 

 // Write our data set into the input array in device memory 
 err = clEnqueueWriteBuffer(commands, input, CL_TRUE, 0, sizeof(float) * count, data, 0, NULL, NULL);
 if (err != CL_SUCCESS)
 {
    printf("Error: Failed to write to source array!\n");
    exit(1);
 }

 // Set the arguments to our compute kernel
 err = 0;
 err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &input);
 err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &output);
 err |= clSetKernelArg(kernel, 2, sizeof(unsigned int), &count);

 if (err != CL_SUCCESS)
 {
    printf("Error: Failed to set kernel arguments! %d\n", err);
    exit(1);
 }

 // Get the maximum work group size for executing the kernel on the device
 err = clGetKernelWorkGroupInfo(kernel, device_id, CL_KERNEL_WORK_GROUP_SIZE, sizeof(local), &local, NULL);
 if (err != CL_SUCCESS)
 {
    printf("Error: Failed to retrieve kernel work group info! %d\n", err);
    exit(1);
 }

 // Execute the kernel over the entire range of our 1d input data set
 // using the maximum number of work group items for this device
 global = count;
 err = clEnqueueNDRangeKernel(commands, kernel, 1, NULL, &global, &local, 0, NULL, NULL);
 if (err)
 {
    printf("Error: Failed to execute kernel!\n");
    return EXIT_FAILURE;
 }

 // Wait for the command commands to get serviced before reading back results
 clFinish(commands);

 // Read back the results from the device to verify the output
 err = clEnqueueReadBuffer( commands, output, CL_TRUE, 0, sizeof(float) * count, results, 0, NULL, NULL ); 
 if (err != CL_SUCCESS)
 {
    printf("Error: Failed to read output array! %d\n", err);
    exit(1);
 }

  // Print obtained results from OpenCL kernel
 for(i = 0; i < count; i++)
 {
    printf("results[%d] = %f\n", i, results[i]);
 }

 // Cleaning up
 clReleaseMemObject(input);
 clReleaseMemObject(output);
 clReleaseProgram(program);
 clReleaseKernel(kernel);
 clReleaseCommandQueue(commands);
 clReleaseContext(context);

 return 0;
}



What is OpenCL

Posted by Mahesh Doijade




            OpenCL is an open specification of a low level API and C-like language for writing portable parallel programs for heterogeneous systems comprising varied modern processors such as Central Processing Units (CPUs), Graphics Processing Units (GPUs), Accelerated Processing Units (APUs) and other processors. It is maintained by the non-profit technology consortium Khronos Group. OpenCL is supported by Apple, Intel, Advanced Micro Devices (AMD), IBM, NVIDIA, Altera, Samsung and ARM Holdings, to name a few.
            OpenCL was initially proposed by Apple and submitted to the Khronos Group. The OpenCL 1.0 specification was released at the end of 2008, and by October 2009 IBM had released its first OpenCL implementation. The latest specification is OpenCL 1.2, released in November 2011. OpenCL implementations already exist for AMD and NVIDIA GPUs and x86 CPUs; in principle, OpenCL could also target DSPs, Cell, and perhaps FPGAs (Field Programmable Gate Arrays). Currently, OpenCL Software Development Kits (SDKs) are provided by Intel, NVIDIA and AMD, so one can download any of these SDKs and start writing parallel programs in OpenCL. For the past couple of years NVIDIA and AMD have also been releasing their GPUs as OpenCL compliant, so one can write a single OpenCL program and run it across a wide range of modern processors. The OpenCL example over here provides more insight into the essence of programming in OpenCL.
            
   

Saturday, 15 June 2013

OpenMP min max reduction code

Posted by Mahesh Doijade 8 Comments


       One can use the reduction clause in OpenMP to carry out various operations such as addition, multiplication, logical OR and logical AND, to name a few.
      
       reduction(operation : var)
 where
 operation is the operator for the operation to perform on the variable(s) var at the end of the parallel region, and
 var is one or more variables on which to perform a scalar reduction; if more than one variable is specified, separate the variable names with commas.

       This clause specifies that one or more variables that are private to each thread are the subject of a reduction operation at the end of the parallel region. One particularly desirable reduction operation is finding the maximum or minimum element of a given list in parallel. This feature was missing from OpenMP for a long while; to achieve it one had to use critical sections, locks, etc., which incurred performance penalties that in several cases made the parallel version slower than the sequential one. The OpenMP 3.1 specification came to the rescue by providing this crucial functionality in a performance efficient manner, and obviously with minimal burden on the programmer.
       OpenMP 3.1 added predefined min and max reduction operators for C and C++, along with extensions to the atomic construct that allow the value of the shared variable the construct updates to be captured, or written without being read. OpenMP 3.1 is available in GCC 4.7 and later versions; in case you don't have it, you can download it from here.
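As a brief, hedged illustration of the atomic capture extension mentioned above (the variable names counter and my_ticket are made up for this sketch):

#include <stdio.h>
#include <omp.h>

int main()
{
    int counter = 0;

    #pragma omp parallel num_threads(4)
    {
        int my_ticket;

        // Atomically read the old value of counter and increment it in one step.
        #pragma omp atomic capture
        my_ticket = counter++;

        printf("thread %d got ticket %d\n", omp_get_thread_num(), my_ticket);
    }
    return 0;
}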
Following is a sample code which illustrates max operator usage in OpenMP:


#include <stdio.h>
#include <omp.h>


int main()
{
    double arr[10];
    omp_set_num_threads(4);
    double max_val=0.0;
    int i;
    for( i=0; i<10; i++)
        arr[i] = 2.0 + i;

    #pragma omp parallel for reduction(max : max_val)
    for( i=0;i<10; i++)
    {
        printf("thread id = %d and i = %d\n", omp_get_thread_num(), i);
        if(arr[i] > max_val)
        {
            max_val = arr[i];  
        }
    }
  
    printf("\nmax_val = %f", max_val);
}

      Similarly, one can find the minimum by replacing the max operator with min and flipping the comparison in the if condition inside the for loop, as sketched below.
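A minimal sketch of the min version (assuming the same arr and i as in the program above, with min_val initialized to the first element):

    double min_val = arr[0];
    #pragma omp parallel for reduction(min : min_val)
    for( i = 0; i < 10; i++)
    {
        if(arr[i] < min_val)
        {
            min_val = arr[i];
        }
    }
    printf("\nmin_val = %f", min_val);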



