Friday, 28 June 2013

GnuPG - GPG Encryption and Signing Tutorial

Posted by Mahesh Doijade 2 Comments
Introduction :
GnuPG is an open-source implementation of the OpenPGP standard. It lets you encrypt and sign your data and communication, and it features a versatile key management system as well as access modules for all kinds of public key directories. GnuPG, also known as GPG, is a command line utility that integrates easily with other applications, and several libraries and frontend applications are available for it. GnuPG also provides support for S/MIME, so it can be used for secure email communication.

Using gpg :
Suppose we have two users, userA and userB. userA wants to send an important file, "plain_file.txt", to userB. To carry out this communication securely, userA needs to encrypt "plain_file.txt" using some cryptographic technique (in this example we use public key cryptography) and then send it to userB. This example demonstrates the gpg command line tool on Linux.

Initially, both userA and userB need to generate a public and private key pair for themselves, so each of them executes the following command in their own terminal.
$ gpg --gen-key

Then follow the prompts shown by the command, and please remember your passphrase. You can select which public key algorithm you want. As regards symmetric cryptography, the default algorithm used by gpg at the time of writing is CAST5; newer GnuPG releases default to AES.

To check whether the keys were generated and which keys are present in a user's keyring, execute:
$ gpg --list-keys

Now, in order to encrypt the file, userA needs the public key of userB, so userB has to send his public key to userA; likewise userA sends his public key to userB. userB exports his public key with the following command, where "userB" is the name or email address given during key generation:
$ gpg --armor --export userB > userB_pk

The public key of userB is now in userB_pk. userA can export his public key to a text file in the same way, and the two users can then send their public keys to each other.

To import the other user's public key:
$ gpg --import userB_pk

For encryption at, say, userA's end, he needs to execute the following command, naming the receiver whose public key is to be used, in our case userB:
$ gpg --output plain_file_enc --recipient userB --encrypt plain_file.txt

For decryption at the receiver's end:
$ gpg --output decrypted_plain_file.txt --decrypt plain_file_enc

For signing a file with the sender's private key:
$ gpg --output plain_file_signed --sign plain_file.txt

For just verifying the sender of a signed file:
$ gpg --verify plain_file_signed

To sign and then encrypt, that is, to implement digital signatures:
1.) signing : $ gpg --output plain_file_signed --sign plain_file.txt
2.) encryption : $ gpg --output plain_file_enc --recipient userB --encrypt plain_file_signed

To decrypt the encrypted data generated above:
1.) decryption : $ gpg --output plain_file_decrypted --decrypt plain_file_enc
2.) verifying the signature and extracting the original file : $ gpg --output plain_file_original.txt --decrypt plain_file_decrypted


Wednesday, 26 June 2013

What is Android ?

Posted by Mahesh Doijade

Android is the most popular mobile operating system in the world, used by a large number of smartphones, tablets and many other handheld devices. It is based on the Linux kernel. Android is open source and is released by Google under the Apache License. The open source code and its licensing allow Android software to be freely modified and distributed by mobile device manufacturers, wireless carriers and enthusiastic developers.
That gives a brief idea of what Android is. As for its history, Android was initially developed by Android Inc., founded in 2003 in Palo Alto, USA. Google acquired Android Inc. in 2005, making it a wholly owned subsidiary. Since the initial release there have been a large number of updates to Android. Its current popularity is evident from the fact that, as of May 2013, around 900 million Android devices had been activated and 48 billion apps had been installed from the Google Play store. The evolution of Android version by version is detailed below:
As Google maintains and develops Android, it ships with a number of Google services installed right out of the box. Google Web search, Gmail, Google Maps and Google Calendar are all pre-installed, and the default page of the web browser is Google. But as Android is open source, carriers can modify this as they see fit. Android also ships with the Google Play store, an online software repository for Android apps. Using the Play store, Android users can select, download and use applications developed by third party developers. As of April 2013, more than 850,000 (8.5 lakh) applications, games and widgets were available on the Play store.
Finally, we can conclude that Android is an incredible platform for consumers and developers. In principle it is the opposite of the iPhone in many ways: whereas the iPhone aims to create the best user experience by restricting hardware and software standards, Android aims to open up as much of the platform as possible.



Sunday, 23 June 2013

GIT : distributed version control, getting started

Posted by Mahesh Doijade
git tutorial, git distributed version control, git getting started

Git is a distributed version control and Source Code Management (SCM) system. Git was initially developed by Linus Torvalds for maintaining Linux kernel development, and it is largely inspired by the BitKeeper and Monotone distributed version control systems. Git is written primarily in C. Though Git was developed on Linux, it is also supported on other operating systems such as Microsoft Windows, Solaris and Mac OS X.

One can install Git on Redhat/CentOS through yum using the following command:
$ sudo yum install git

On Ubuntu and other Debian based systems:
$ sudo apt-get install git

Creating Git repository :

To create a Git repository, suppose your source code is in a folder named "all_src". Go into that directory ("all_src" in our case) and type the following command:

$ git init

This creates a Git repository in that directory, and you can start using other git commands on it. To record the current contents as the first commit, run "git add ." followed by "git commit"; branches can only be created once the repository has at least one commit.

Basic Branching in Git 

A single git repository can maintain several branches of development. To create a new branch named "experimentation", use
$ git branch experimentation
If you now run
$ git branch
you'll get a list of all existing branches:
  experimentation
* master
The "experimentation" branch is the one you just created, and the "master" branch is a default branch that was created for you automatically. The asterisk marks the branch you are currently on; type
$ git checkout experimentation
to switch to the experimentation branch. Now edit a file, commit the change, and switch back to the master branch:
(edit file)
$ git commit -a
$ git checkout master
Check that the change you made is no longer visible, since it was made on the experimentation branch and you're back on the master branch.
You can make a different change on the master branch:
(edit file)
$ git commit -a
At this point the two branches have diverged, with different changes made in each. To merge the changes made in experimentation into master, run
$ git merge experimentation
If the changes do not conflict, you are done. If there are conflicts, markers will be left in the problematic files showing the conflict;
$ git diff
will show this. Once you've edited the files to resolve the conflicts,
$ git commit -a
will commit the result of the merge. Finally,
$ gitk
will display a good graphical representation of the resulting history.
At this point you could delete the experimentation branch with
$ git branch -d experimentation
This command ensures that the changes in the experimentation branch are already in the current branch.
If you develop on a branch test-idea, then regret it, you can always delete the branch with
$ git branch -D test-idea
Branches are cheap and easy, so this is a good way to try something out.

These are the basics to get started with Git; other features of Git will be covered in forthcoming posts.


Thursday, 20 June 2013


Posted by Mahesh Doijade
Message Passing Interface, MPI, What is MPI



MPI (Message Passing Interface) is an open, standardized message passing library specification for portable message passing on a wide variety of parallel computers, designed by a broad group of researchers from academia and industry. The first MPI standard draft was prepared in 1992, and after community review the MPI 1.0 standard was released in 1994. Since then many improvements have been made to the standard; the latest version is MPI 3.0, released in September 2012.

What MPI is not ?

MPI is not a programming language, nor a compiler specification. MPI is also not a specific implementation or software product. It is just a message passing library specification, which any implementer can use to develop a message passing library, in other words a message passing implementation.

Various message passing implementations currently exist, both open source and commercial, vendor provided ones. Some popular open source MPI implementations are MPICH, Open MPI and MVAPICH. MPICH is maintained by Argonne National Laboratory, Open MPI is developed and maintained by the open source community, and MVAPICH is developed and maintained by Ohio State University and is highly optimized for InfiniBand, 10 Gigabit Ethernet/iWARP and RoCE networks. All the mentioned open source implementations are available for C, C++ and Fortran. Vendor specific implementations include Intel MPI, HP MPI, SGI MPI etc.
Following is a minimal MPI program :

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    MPI_Init( &argc, &argv );      /* start the MPI environment */
    printf( "Hello, world!\n" );
    MPI_Finalize();                /* shut the MPI environment down */
    return 0;
}
Only what happens between MPI_Init() and MPI_Finalize() is defined by the MPI standard. The program below is a more useful MPI program: each process reports its own rank, obtained through MPI_Comm_rank(), and the total number of MPI processes, obtained through MPI_Comm_size().
#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, size;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );   /* rank of this process */
    MPI_Comm_size( MPI_COMM_WORLD, &size );   /* total number of processes */

    printf( "Hello I am %d of %d\n", rank, size );

    MPI_Finalize();
    return 0;
}
In order to run an MPI program like the ones above, one needs to install any of the previously mentioned MPI implementations. On any Linux machine the program can then be compiled with "mpicc 'your_source_code.c'" and run with "mpiexec -np 'no_of_processes_required' 'path_to_your_executable'". Suppose my compiled MPI executable is 'a.out' in my working directory and I want to launch 4 processes; then my launch command would be:
mpiexec -np 4 ./a.out  


Monday, 17 June 2013


Posted by Mahesh Doijade 2 Comments
cuda, CUDA, what is cuda

CUDA (Compute Unified Device Architecture) is a parallel computing platform and the first programming model that enabled high level programming for GPUs (Graphics Processing Units), thereby making them available for general purpose computing. Previously, anyone who wanted to use GPUs for general purpose computation had to go through graphics APIs such as OpenGL and DirectX and map their computation onto them. CUDA overcame these issues, and GPUs used this way are now often called GPGPUs, that is, General Purpose computing on Graphics Processing Units. By writing code in CUDA, programmers can speed up their algorithms massively thanks to the amount of parallelism offered by GPUs.
One can start programming in CUDA using the extensions provided for C, C++ and Fortran. NVIDIA's basic CUDA setup consists of 3 parts: the NVIDIA graphics card drivers, the CUDA SDK, and the CUDA Toolkit. The drivers are available for most Linux distributions as well as for Windows. For the development environment, consisting of the compiler and tools for debugging and performance optimization, one needs to download the CUDA Toolkit. Other basic requisites, such as sample examples and CUDA based libraries for basic image processing and linear algebra primitives, come with the CUDA SDK (Software Development Kit). The CUDA platform allows the creation of a very large number of concurrently executing threads at very low system resource cost.
[Figure: architecture of a CUDA capable GPU]
The figure above shows the architecture of a CUDA capable GPU. It consists of multiple Streaming Multiprocessors (SMs), each of which contains several Streaming Processors (SPs), also called CUDA cores; the number of SMs and CUDA cores depends on the type and generation of the GPU device. The memory hierarchy, from low speed to high speed, is in this order: Global memory (Device memory), Texture memory, Constant memory, Shared memory, Registers. So access is fastest from registers and slowest from global memory. Hence, CUDA programmers need to write their code with this hierarchy in mind in order to gain the maximum performance benefit.

Sunday, 16 June 2013

OpenCL example

Posted by Mahesh Doijade

what is opencl, opencl

The code below illustrates cube calculation using OpenCL. This OpenCL example is structured as follows :
1.) Initially the OpenCL kernel is written in const char *KernelSource.

2.) At the start of main we define all the requisite OpenCL related variables and other ordinary variables.

3.) Next, we set up the environment OpenCL requires for running the kernel, using functions like clGetDeviceIDs(), clCreateContext(), clCreateCommandQueue().

4.) Then we create the program from the source held in const char *KernelSource using the function clCreateProgramWithSource(), build it with clBuildProgram(), and create our kernel object using clCreateKernel().

5.) Then we allocate memory for the input and output on the selected OpenCL device using the function clCreateBuffer().

6.) Next we write our input data into the allocated memory using the function clEnqueueWriteBuffer() and set the arguments for the compute kernel with clSetKernelArg(), as shown in the OpenCL example below.

7.) We get the maximum work group size for executing the kernel on the device through clGetKernelWorkGroupInfo(), and then execute the kernel over the entire range of our 1d input data set, using the maximum number of work group items for this device, through clEnqueueNDRangeKernel().

8.) Finally we wait for the commands to get serviced using clFinish(), read the output buffer back from the OpenCL device using clEnqueueReadBuffer(), and print the results.

#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>   // Mac OS X
#else
#include <CL/cl.h>           // Linux and Windows
#endif

#define INP_SIZE (1024)

// Simple compute kernel which computes the cube of an input array

const char *KernelSource = "\n" \
"__kernel void cube( __global float* input, __global float* output, \n" \
" const unsigned int count) {            \n" \
" int i = get_global_id(0);              \n" \
" if(i < count) \n" \
" output[i] = input[i] * input[i] * input[i]; \n" \
"}                     \n" ;

int main(int argc, char** argv)
{
    int err; // error code
    float data[INP_SIZE]; // original input data set to device
    float results[INP_SIZE]; // results returned from device

    size_t global; // global domain size
    size_t local; // local domain size

    cl_platform_id platform; // compute platform
    cl_device_id device_id; // compute device id
    cl_context context; // compute context
    cl_command_queue commands; // compute command queue
    cl_program program; // compute program
    cl_kernel kernel; // compute kernel
    cl_mem input; // device memory used for the input array
    cl_mem output; // device memory used for the output array

    // Fill our data set with random values
    unsigned int i = 0;
    unsigned int count = INP_SIZE;

    for(i = 0; i < count; i++)
        data[i] = rand() / 50.00;

    // Connect to a compute device on the first available platform
    // If you want to run your kernel on the CPU, replace CL_DEVICE_TYPE_GPU with CL_DEVICE_TYPE_CPU
    err = clGetPlatformIDs(1, &platform, NULL);
    if (err == CL_SUCCESS)
        err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);

    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to create a device group!\n");
        return EXIT_FAILURE;
    }

    // Create a compute context
    // Contexts are responsible for managing objects such as command-queues, memory, program and kernel objects
    // and for executing kernels on one or more devices specified in the context.
    context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);
    if (!context)
    {
        printf("Error: Failed to create a compute context!\n");
        return EXIT_FAILURE;
    }

    // Create a command queue
    commands = clCreateCommandQueue(context, device_id, 0, &err);
    if (!commands)
    {
        printf("Error: Failed to create a command queue!\n");
        return EXIT_FAILURE;
    }

    // Create the compute program from the source buffer
    program = clCreateProgramWithSource(context, 1, (const char **) & KernelSource, NULL, &err);
    if (!program)
    {
        printf("Error: Failed to create compute program!\n");
        return EXIT_FAILURE;
    }

    // Build the program executable; on failure print the build log
    err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
    if (err != CL_SUCCESS)
    {
        size_t len;
        char buffer[2048];
        printf("Error: Failed to build program executable!\n");
        clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, sizeof(buffer), buffer, &len);
        printf("%s\n", buffer);
        return EXIT_FAILURE;
    }

    // Create the compute kernel in the program we wish to run
    kernel = clCreateKernel(program, "cube", &err);
    if (!kernel || err != CL_SUCCESS)
    {
        printf("Error: Failed to create compute kernel!\n");
        return EXIT_FAILURE;
    }

    // Create the input and output arrays in device memory for our calculation
    input = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * count, NULL, NULL);
    output = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(float) * count, NULL, NULL);
    if (!input || !output)
    {
        printf("Error: Failed to allocate device memory!\n");
        return EXIT_FAILURE;
    }

    // Write our data set into the input array in device memory
    err = clEnqueueWriteBuffer(commands, input, CL_TRUE, 0, sizeof(float) * count, data, 0, NULL, NULL);
    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to write to source array!\n");
        return EXIT_FAILURE;
    }

    // Set the arguments to our compute kernel
    err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &input);
    err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &output);
    err |= clSetKernelArg(kernel, 2, sizeof(unsigned int), &count);
    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to set kernel arguments! %d\n", err);
        return EXIT_FAILURE;
    }

    // Get the maximum work group size for executing the kernel on the device
    err = clGetKernelWorkGroupInfo(kernel, device_id, CL_KERNEL_WORK_GROUP_SIZE, sizeof(local), &local, NULL);
    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to retrieve kernel work group info! %d\n", err);
        return EXIT_FAILURE;
    }

    // Execute the kernel over the entire range of our 1d input data set
    // using the maximum number of work group items for this device
    global = count;
    if (local > global)
        local = global; // the work group cannot be larger than the data set
    err = clEnqueueNDRangeKernel(commands, kernel, 1, NULL, &global, &local, 0, NULL, NULL);
    if (err)
    {
        printf("Error: Failed to execute kernel!\n");
        return EXIT_FAILURE;
    }

    // Wait for the commands to get serviced before reading back results
    clFinish(commands);

    // Read back the results from the device to verify the output
    err = clEnqueueReadBuffer( commands, output, CL_TRUE, 0, sizeof(float) * count, results, 0, NULL, NULL );
    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to read output array! %d\n", err);
        return EXIT_FAILURE;
    }

    // Print obtained results from OpenCL kernel
    for(i = 0; i < count; i++)
        printf("result[%u] = %f\n", i, results[i]);

    // Cleaning up
    clReleaseMemObject(input);
    clReleaseMemObject(output);
    clReleaseProgram(program);
    clReleaseKernel(kernel);
    clReleaseCommandQueue(commands);
    clReleaseContext(context);

    return 0;
}


What is OpenCL

Posted by Mahesh Doijade

OpenCL is an open specification of a low level API and a C-like language for writing portable parallel programs for heterogeneous systems comprising varied modern processors such as Central Processing Units (CPUs), Graphics Processing Units (GPUs), Accelerated Processing Units (APUs) and other processors. It is maintained by the non-profit technology consortium Khronos Group. OpenCL is supported by Apple, Intel, Advanced Micro Devices (AMD), IBM, Nvidia, Altera, Samsung and ARM Holdings, to name a few.
OpenCL was initially proposed by Apple and submitted to the Khronos Group. The OpenCL 1.0 specification was released at the end of 2008, and by October 2009 IBM had released its first OpenCL implementation. The latest specification at the time of writing is OpenCL 1.2, released in November 2011. OpenCL implementations already exist for AMD and NVIDIA GPUs and for x86 CPUs. In principle, OpenCL could also target DSPs, Cell, and perhaps also FPGAs (Field Programmable Gate Arrays). Currently, OpenCL Software Development Kits (SDKs) are provided by Intel, NVIDIA and AMD, so one can download any of these SDKs and start writing parallel programs in OpenCL with its help. For a couple of years now, NVIDIA and AMD have also been releasing all their GPUs compliant with OpenCL. So one can write a single OpenCL program and run it across a wide range of modern processors. The OpenCL example post provides more insight into the essence of programming in OpenCL.

Saturday, 15 June 2013

OpenMP min max reduction code

Posted by Mahesh Doijade 8 Comments


One can use the reduction clause in OpenMP to carry out operations such as addition, multiplication, logical OR and logical AND, to name some. The clause has the form reduction(operation : var), where:
operation : The operator for the operation to perform on the variables (var) at the end of the parallel region.
var : One or more variables on which to perform scalar reduction. If more than one variable is specified, separate variable names with a comma.

This clause specifies that one or more variables that are private to each thread are the subject of a reduction operation at the end of the parallel region. One very desirable reduction operation is finding the maximum or minimum element of a given list in parallel. This feature was missing from OpenMP for a long while; to achieve it one had to use critical sections, locks etc., which incurred heavy performance penalties and in several cases resulted in worse performance than the sequential version. The OpenMP 3.1 specification came to the rescue, providing this crucial functionality in a performance efficient manner and with minimal burden on the programmer.
OpenMP 3.1 added predefined min and max reduction operators for C and C++, and extensions to the atomic construct that allow the value of the shared variable that the construct updates to be captured or written without being read. OpenMP 3.1 is supported in GCC 4.7 and later versions, so an older compiler needs to be upgraded first.
Following is the sample code which illustrates max operator usage in OpenMP :

#include <stdio.h>
#include <omp.h>

int main()
{
    double arr[10];
    double max_val = 0.0;
    int i;

    /* fill the array with 2.0, 3.0, ..., 11.0 */
    for( i = 0; i < 10; i++ )
        arr[i] = 2.0 + i;

    /* each thread keeps a private max_val; the largest one wins at the end */
    #pragma omp parallel for reduction(max : max_val)
    for( i = 0; i < 10; i++ )
    {
        printf("thread id = %d and i = %d\n", omp_get_thread_num(), i);
        if( arr[i] > max_val )
            max_val = arr[i];
    }

    printf("\nmax_val = %f\n", max_val);
    return 0;
}

Similarly, one can find the minimum by replacing the max operator with min and reversing the comparison in the if condition inside the for loop.
