Thursday, 4 June 2015

PBS Pro Tutorial

Posted by Krishna Arutwar
What is PBS Pro?
Portable Batch System (PBS) is a software which is used in cluster computing to schedule jobs on multiple nodes. PBS was started as contract project by NASA.
            PBS is available in three different versions as below
1) Torque: Terascale Open-source Resource and QUEue Manager (Torque) is developed from OpenPBS. It is developed and maintain by Adaptive Computing Enterprises. It is used as a distributed resource manager can perform well when integrated with Maui cluster scheduler to improve performance.
2) PBS Professional (PBS Pro): It is commercial version of PBS offered by Altair Engineering.
3) OpenPBS: It is open source version released in 1998 developed by NASA. It is not actively developed.
In this article we are going to concentrate on tutorial of PBS Pro it is similar to some extent with Torque.
Fig. 1.1 PBS complex cluster with eight execution host
PBS contain three basic units server, MoM (execution host), scheduler.
1)      Server: It is heart of the PBS, with executable named “pbs_server”. It uses IP network to communicate with the MoMs. PBS server create a batch job, modify the job requested from different MoMs. It keeps track of all resources available, assigned in the PBS complex from different MoMs. It will also monitor the PBS license for jobs. If your license expires it will throw an error.
2)      Scheduler: PBS scheduler uses various algorithms to decide when job should get executed on which node or vnode by using detail of resources available from server. It has executable as “pbs_sched”.
3)      MoM: MoM is the mother of all execution job with executable “pbs_mom”. When MoM gets job from server it will actually execute that job on the host. Each node must have MoM running to get participate in execution.

Installation and Setting up of environment (cluster with multiple nodes)
Extract compressed software of PBS Pro and go the path of extracted folder it contain “INSTALL” file, make that file executable you may use command like “chmod +x ./INSTALL”. As shown in the image below run this executable. It will ask for the “execution directory” where you want to store the executable (such as qsub, pbsnodes, qdel etc.) used for different PBS operations and “home directory” which contain different configuration files. Keep both as default for simplicity.
            There are three kind of installation available as shown in figure:
Fig. 1.2 PBS installation
1) Server node: PBS server, scheduler, MoM and commands are installed on this node. PBS server will keep track of all execution MoMs present in the cluster. It will schedule jobs on this execution nodes. As MoM and commands are also installed on server node it can be used to submit and execute the jobs.
2) Execution node: This type installs MoM and commands. This nodes are added as available nodes for execution in a cluster. They are also allowed to submit the jobs at server side with specific permission by server as we are going to see below. They are not involved in scheduling. This kind of installation ask for PBS server which is used to submit jobs, get status of jobs etc.
3) Client node: This are the nodes which are only allowed to submit a PBS job at server with specific permission by the server and allowed to see the status of the jobs. They are not involved in execution or scheduling.

Creating vnode in PBS Pro:
We can create multiple vnodes in a single node which contain some part of resources in a node. We can execute job on this vnodes with specified allocated resources. We can create vnode using qmgr command which is command line interface to PBS server. We can use command given below to create vnode using qmgr.

Qmgr: create node Vnode1,Vnode2 resources_available.ncpus=8, resources_available.mem=10gb, 
resources_available.ngpus=1, sharing=default_excl 

The command above will create two vnodes named Vnode1 and Vnode2 with 8 cpus cores, 10gb of memory and 1 GPU with sharing mode as default_excl which means this vnode can execute exclusively only one job at a time independent of number of resources free. This sharing mode can be default_shared which means any number of jobs can run on that vnode until all resources are busy. To know more about all attributes which can be used with vnode creation are available in PBS Pro reference guide. 

You can also create a file in "/var/spool/PBS/mom_priv/config.d/" this folder with any name you want I prefer hostname-vnode with sample given below. It will select all files even temporary files with (~) and replace configuration for same vnode so delete unnecessary files to get proper configuration of vnodes.

$configversion 2
Here in this example we assigned default node configuration to resource available as 0 because by default it will detect and allocate all available resources to default node with sharing attribute as is default_shared. Which cause problem as all the jobs will by default get scheduled on that default vnode because its sharing type is default_shared. If you want to schedule jobs on your customized vnodes you should allocate resources available as 0 on default vnode. Every time whenever you restart the PBS server this vnodes get create unlike vnodes created manually using command line.

PBS get status:
get status of Jobs:
qstat will give details about jobs there states etc.
useful options:
To print detail about all jobs which are running or in hold state: qstat -a 
To print detail about subjobs in JobArray which are running or in hold state: qstat -ta
To print all finished jobs: qstat -x

get status of PBS nodes and vnodes:
"pbsnode -a" command will provide list of all nodes present in PBS complex with there resources available, assigned, status etc.
To get details of all nodes and vnodes you created use "pbsnodes -av" command.
You can also specify node or vnode name to get detail information of that specific node or vnode.
pbsnodes wolverine (here wolverine is hostname of the node in PBS complex which is mapped with IP address in /etc/hosts file)
Job submission (qsub):
PBS MoM will submit jobs to the PBS server. Server maintain queue of jobs by default all jobs are submitted to default queue named “workq”. You may create multiple queues by using “qmgr” command which is administrator interface mainly used to create, delete & modify queues and vnodes. PBS server will decide which job to be scheduled on which node or vnode based on scheduling policy and privileges set by user. To schedule jobs server will continuously ping to all MoMs in the PBS complex to get detail of resources available and assigned. PBS assigns unique job identifier to each and every job called JobID.
For job submission PBS uses “qsub” command. It has syntax as shown below
qsub script
Here script may be a shell (sh, csh, tchs, ksh, bash) script. PBS by default uses /bin/sh. You may refer simple script given below
echo “This is PBS job”
sleep 100

When PBS completes execution of job it will store errors in file name with JobName.e{JobID} e.g. Job1.e1492
Output with file name
JobName.o{JobID} e.g. Job1.o1492
By default it will store this files in the current working directory (can be seen by pwd command). You can change this location by giving path with -o option.

you may specify job name with -N option while submitting the job
qsub -N firstJob ./

If you don't specify job name it will store files by replacing JobName with script name.
e.g. qsub ./ this command will store results in file with and in current working directory.
qsub -N firstJob -o /home/user1/ ./ this command will store results in file with and in /home/user1/ directory.
If submitted job terminate abnormally (errors in job is not abnormal, this errors get stored in JobName.e{JobID} file) it will store its error and output files in "/var/spool/PBS/undelivered/" folder.

Useful Options:
Select resources:
qsub -l select="chunks":ncpus=3:ngpus=1:mem=2gb script 

qsub -l select=2:ncpus=3:ngpus=1:mem=2gb /home/titan/PBS/scripts/

This Job will selects 2 copies with 3 cpus, 1 gpu and 2gb memory which mean it will select 6 cpus, 2 gpus and 4 gb ram.

qsub -l nodes=megamind:ncpus=3 /home/titan/PBS/input/

This job will select one node specified with hostname.
To select multiple nodes you may use command given below
qsub -l nodes=megamind+titan:ncpus=3 /home/titan/PBS/input/
Submit multiple jobs with same script (JobArray):
qsub -J 1-20 script

If you specify resources to Job Array each subjob will require resources specified. JobId in Job Array start with JobArrayID[0], JobArrayID[1], .....,, JobArrayID[n-1]

Submit dependant jobs:
In some cases you may require job which should run after successful or unsuccessful completion of some specified jobs for that PBS provide some options such as

qsub -W depend=afterok:316.megamind /home/titan/PBS/input/

This specified job will start only after successful completion of job with job ID "316.megamind". Like afterok PBS has other options such as beforeok, beforenotok to , afternotok. You may find all this details in the man page of qsub.

Submit Job with priority:
There are two ways using which we can set priority to jobs which are going to execute.
1) Using single queue with different jobs with different priority:
       To change sequence of jobs queued in a execution queue open "$PBS_HOME/sched_priv/sched_config" file, normally $PBS_HOME is present in "/var/spool/PBS/" folder. Open this file and uncomment the line below if present otherwise add it.
job_sort_key : "job_priority HIGH"
After saving this file you will need to restart  the pbs_sched daemon on head node you may use command below
service pbs restart
After completing this task you have to submit the job with -p option to specify priority of job within queue. This value may range between (-1024) to 1023, where -1024 is the lowest priority and 1023 is the highest priority in the queue.
qsub -p 100 ./
qsub -p 101 ./
qsub -p 102 ./ 
 In this case PBS will execute jobs as explain in the diagram given below

2) Using different queues with specified priority: We are going to discuss this point in PBS Queue section.

In this example all jobs in queue 2 will complete first then queue 3 then queue 1, since priority of queue 2 > queue 3 > queue 1.
Because of this job execution flow is as shown below

 J4=> J5=> J6=>J7=> J8=> J9=> J1=> J2=> J3 
PBS Queue:
PBS Pro can manage multiple queue as per users requirement. By default every job is queued in "workq" for execution. There are two types of queue are available execution and routing queue. Jobs in execution queue are used by pbs server for execution. Jobs in routing queue can not be executed they can be redirected to execution queue or another routing queue by using command qmove command. By default queue "workq" is an execution queue. The sequence of job in queue may change by using priority defined while job submission as specified above in job submission section.

Useful qmgr commands:
 First type qmgr which is Manager interface of PBS Pro.
To create queue: 
Qmgr: create queue test2

To set type of queue you created: 
Qmgr: set queue test2 queue_type=execution

Qmgr: set queue test2 queue_type=route

To enable queue: 
Qmgr: set queue test2 enabled=True

To set priority of queue: 
Qmgr: set queue test2 priority=50

Jobs in queue with higher priority will get first preference. After completion of all jobs in the queue with higher priority jobs in lower priority queue are scheduled. There is huge probability of job starvation in queue with lower priority.

To start queue: 
Qmgr: set queue test2 started = True

To activate all queue (present at particular node): 
Qmgr: active queue @default

To set queue for specified users: You require to set acl_user_enable attribute to true which indicate PBS to only allow user present in acl_users list to submit the job.
 Qmgr: set queue test2 acl_user_enable=True

To set users permitted (to submit job in a queue): 
Qmgr: set queue test2 acl_users="user1@..,user2@..,user3@.."

(in place of .. you have to specify hostname of compute node in PBS complex. Only user name without hostname will allow users (with same name) to submit job from all nodes (permitted to submit job) in a PBS Complex).

To delete queues we created: 
Qmgr: delete queue test2

To see details of all queue status: 
qstat -Q 

You may specify specific queue name: qstat -Q test2
To see full details of all queue: qstat -Q -f 
You may specify specific queue name: qstat -Q -f test2

Read More

Tuesday, 5 May 2015

Linux commands for Text Manipulating

Posted by Mahesh Doijade
Unix commands for text filtering and manipulating; Shell commands for text filtering and manipulating
    Text Filtering/Manipulating is usually building block for most of the problems we solve. Linux and its shell turns out to be very rich in providing tools/utilities for this consideration. It enriches the user with a numerous handy commands/tools for text filtering and manipulation. Here, I try to cover most of the linux manipulation tools which should certainly apply for most of your needs related to text manipulation. They are listed and detailed in alphabetical order below:

awk [ -F fs ] [ -v var=value ] [ 'prog' | -f progfile ] [ file ... ] : Awk is an interpreted programming language in itself, which executes complex pattern matching on streams of textual data. It heavily uses associative arrays, strings and regular expressions.Its usefulness for parsing system data and generation of automated reports is commendable.

Few essential arguments:
-F fs Sets the input field separator to the regular expression fs.
-v var=value Assigns the value value to the variable var before executing the awk program.
'prog' An awk program.
-f progfile Specify a file, progfile, which contains the awk program to be executed.
file ... A file to be processed by the specified awk program.

comm [options] [FILE1 FILE2] : comm compares two sorted files FILE1, FILE2 line by line.
1 : Suppress lines unique to the left file
2 : Suppress lines unique to the right file
3 : Suppress lines that appear in both the files

For example:
comm -12 file1 file2
                 Print only lines present in both file1 and file2.
comm -3   file1 file2
                 Print lines in file1 not in file2, and vice versa.

csplit [options] [file] [pattern] : Splits a file into sections depending upon the context/pattern

Few essential arguments:
-f, prefix=PREFIX   Use PREFIX instead of 'xx'
-z   Remove empty output files
-n, digits=DIGITS   Use specified number of digits instead of 2

cut [options] [file pattern] : This command can be used for extracting a portion of text from a file by selecting columns.
-c range :  Outputs only the characters in the range

For example:
This is your text file "file.txt"

$ cat file.txt
This is a test for cut.
Linux text filtering commands.
Linux text manipulating commands. 

The following example displays 4th character from each line of a file "file.txt".

$ cut -c4 file.txt

diff [options] [file1] [file2] : This commands differentiates between the given two files. A handy utility for having a quick check to see for difference between two files.
Few essential arguments:
-a   Treat all files as text and compare them line-by-line, even if they do not seem to be text.
-b     Ignore changes in amount of white space.
-c   Use the context output format.

echo [options] [string] : Prints the given input string.
Few essential arguments:
-n    do not output the trailing newline
-e    enable interpretation of the backslash-escaped characters listed below

For example: echo is useful in checking what values your environment variables holds.
$ echo $PATH 

fold [options] [files] : Wraps each line / text file to fit in a specified width. By default the output is on stdout one can redirect to a file if needed.

fold -sw [SIZE] [input.txt] > [output.txt]

-s   break at spaces
-w   {SIZE} use SIZE as WIDTH columns instead of default 80.

grep  [options and pattern to find] [file] : This is a crucial utility for finding the lines having existence of a pattern in a given plain-text data set. The name grep comes from globally search  a regular expression and print. Some of the more often used options are:

-m <num>  Stops reading a file after <num> of matching lines.
-c <num>   Prints <num> lines of output context.
-x                Selects only those matches which exactly match the whole line.
-i                 Do case insensitive matching.
-l                 Just print the files that match the pattern.
-R , -r          Read all files within a directory recursively.
-w               Select only those lines containing matches that form whole words.             

head [options] [file] : Prints the first 10 lines of the given file.
Essential arguments:
-n N         print the first N lines instead of the first 10

nl [options] [file] : Numbers the lines of the given file. Adds line number to the lines of the given file displaying it on standard output.

$ cat fruits.txt
  Jack fruit
$ nl fruits.txt
  1    apples
  2    bananas
  3    Orange
  4    Jack fruit

sed [expression] [file] : sed (stream editor) is a marvelous utility, IMHO you can do almost any kind of text filtering and transformation if one learns its intricacies. In certain ways it is similar to an editor which allows scripted edits, sed does only one pass over the input(s), and is consequently more efficient. sed’s power to filter text in a pipeline makes it to stand out from other types of editors.
Few essential arguments:
-n suppress automatic printing of pattern space
-e script add the script to the commands to be executed
-f script-fileadd the contents of script-file to the commands to be executed
-l N specify the desired line-wrap length for the ‘l’ command
-r use extended regular expressions in the script.
Essential command: s - substitution
There are many things which can be covered to learn sed which out of scope of this article But s - substitution the mostly used and known command by many of them. The example shown below will replace the passed input old to sed to new.
$ echo old | sed s/old/new/  

sort [options] [file] : Sort is easy to use useful command which sorts the lines in the given file in alphabetically and numerically.

Few essential arguments:
-b   Ignore leading blanks.
-d   Consider only blanks and alphanumeric characters.
-g   Compare according to general numerical value.
-i   Consider only printable characters.
-R   Sort by random hash of keys.
-r   Reverse the result of comparisons.

tail [options] [file] : Prints last 10 lines of the file on the standard output.
Essential arguments:
-f   Output appended data as the file grows.
-n <N>   Print last N lines of the file instead of the default last 10 lines.

tee [options] [file] : Sends the current output stream to the file. It does this at the same time, that is, displaying it on standard output and sending the stream to the file as well.
For example : The command below prints the output of ls command to both the standard output as well as file.txt
$ ls | tee file.txt
One can also let tee to send output to multiple files through command show below:
$ ls | tee file1.txt file2.txt file3.txt

uniq [options] [input] [output] : Removes/filter outs duplicate lines in the input file. If [input] is not specified it reads from stdin and if [output] is not specified it writes to stdout.
Essential arguments:
-c  Prefix lines with a number representing how many times they occurred.
-d  Only print duplicate lines
-i  By default comparisons are case-sensitive, this option enable case-insensitive comparisons.
-u  Only print unique lines
-w <N>  Compare not more than N characters in the lines.

wc [options] [file] : Prints the count of newlines, words and bytes for each input file.
Essential arguments:
-c   Print the byte counts.
-l   Print the newline counts.
-m   Print the character counts.
-w   Print the word counts.
Read More

Sunday, 30 November 2014

Sudoku Solver without recursion in C

Posted by Mahesh Doijade
Sudoku Solver without recursion; Sudoku Solver; Sudoku solving
     I am sure you know basics about the Sudoku problem, in case you don't this link will provide you with comprehensive details about it. The code posted below, is a non-recursive stack-based backtracking implementation of Sudoku Solving. The aim is to provide you a working solution for automated Sudoku Solving. All the backtracking based search for sudoku solving happens in the function sudokuSolver(sudoku_elem **grid)
    Also, some sample extremely difficult sudoku problems are provided in the commented lines below char *sudoku , you can use them for further testing. In case you intend to give any other sudoku problem it should be fed to the implementation in the similar format the way been currently given to char *sudoku .
    Due to iterative nature of the implementation it can work for any extreme problems involving all the kind of depths of the sudoku solving search tree. Along with this, it also has reasonable performance, almost all the tested the sudoku problem where crunched within few seconds. For any questions about the code below, please post your questions in the comments section, it will be a pleasure answering them.

Sudoku Solver without recursion

 #include <stdio.h>  
 #include <stdlib.h>  
 #include <assert.h>  
 #include <omp.h>  
 #define GRID_SIZE 9  
 struct sudoku_elem_t  
   int val;  
   int fixed;  
 typedef struct sudoku_elem_t sudoku_elem;  
 struct triedValue_t  
   int rowPos;  
   int colPos;  
   int value;  
 typedef struct triedValue_t triedValue;  
 struct allTriedValues_t  
   int count;  
   int size;  
   triedValue *allValues;  
 typedef struct allTriedValues_t allTriedValues;  
 sudoku_elem** allocInitSudoku()  
   sudoku_elem **grid;  
   grid = (sudoku_elem **) malloc(sizeof(sudoku_elem *)*GRID_SIZE);  
   if (grid == NULL)  
     return NULL; // Failed to allocate memory returning  
   int i, j;  
   for (i = 0; i < GRID_SIZE; i++)  
     grid[i] = (sudoku_elem *) calloc(9, sizeof(sudoku_elem));  
     if (grid[i] == NULL)  
       return NULL;  
     for (j=0; j < GRID_SIZE; j++)  
       grid[i][j].fixed = 0;  
       grid[i][j].val = 0;  
   return grid;  
 int initAllTriedValues(allTriedValues *a)  
   a->count = 0;  
   a->size = 30;  
   a->allValues = (triedValue *) malloc(sizeof(triedValue) * a->size);  
   if (a->allValues != NULL)  
     return 1;  
   return -1;  
 int insertIntoTriedValues(allTriedValues *a, int row, int col, int value)  
   if (a->count == a->size)  
     a->size *= 2;  
     a->allValues = realloc(a->allValues, sizeof(triedValue)*a->size);  
     if (a->allValues == NULL)  
       return -1; // Reallocation failed no memory  
   a->allValues[a->count].rowPos = row;  
   a->allValues[a->count].colPos = col;  
   a->allValues[a->count].value = value;  
   return 1;  
 /* Returns 1: if no duplicates found in the sudoku constraints */  
 int verifyRules(sudoku_elem** grid, int rowPos, int colPos, int valToCheck)  
   int i, j;  
   // Check if the given number exists in the row or the column  
   for (i = 0; i < GRID_SIZE; i++)  
     if (grid[rowPos][i].val == valToCheck)  
       return -1;  
     if (grid[i][colPos].val == valToCheck)  
       return -1;  
   // Check if the same number exists in the 3x3 tile of rowPos and colPos  
   int bound3x3TopLeft_x = (rowPos / 3) * 3;  
   int bound3x3TopLeft_y = (colPos / 3) * 3;  
   int bound3x3BottomRight_x = ((rowPos / 3) + 1) * 3;  
   int bound3x3BottomRight_y = ((colPos / 3) + 1) * 3;  
   for (i = bound3x3TopLeft_x; i < bound3x3BottomRight_x; i++)  
     for (j = bound3x3TopLeft_y; j < bound3x3BottomRight_y; j++)  
       if (grid[i][j].val == valToCheck)  
         return -1;  
   return 1;  
 int insertValuesInSudokuGrid(sudoku_elem **grid)  
   int rowPos = 0;  
   int colPos = 0;  
   // Randomly select the row and column  
   while (1)  
     rowPos = rand() % 9;  
     colPos = rand() % 9;  
     if (grid[rowPos][colPos].val == 0)  
   int i;  
   for (i = 1; i <=9; i++)  
     int retVal = verifyRules(grid, rowPos, colPos, i);  
     if (retVal == 1)  
       grid[rowPos][colPos].val = i;  
       grid[rowPos][colPos].fixed = 1;  
       return 1;  
   return 1;  
 int getUnfilledPosition(sudoku_elem **grid, int *row, int *col)  
   int i, j;  
   for (i = 0; i < 9; i++)  
     for (j = 0; j < 9; j++)  
       if (grid[i][j].val == 0 && grid[i][j].fixed != 1)  
         *row = i;  
         *col = j;  
         return 0;  
   return 1; // All the positions are filled appropriately  
 // All the magic happens here.  
 void sudokuSolver(sudoku_elem **grid)  
   allTriedValues allValuesStack;  
   int ret = initAllTriedValues(&allValuesStack);  
   int row = 0, col = 0;  
   if (ret == -1)  
     printf("Failed to allocate memory returning\n");  
   int temp_cnt = 0;  
   while (getUnfilledPosition(grid, &row, &col) == 0)  
     int startVal = 1;  
     for (startVal = 1; startVal <= 9; startVal++)  
       int retVal = verifyRules(grid, row, col, startVal);  
       if (retVal == 1)  
         grid[row][col].val = startVal;  
         insertIntoTriedValues(&allValuesStack, row, col, startVal);  
     if (grid[row][col].val == 0)  
       int shouldBacktrack = 1;  
       int backtrack_level;  
       int backtrack_row = row;  
       int backtrack_col = col;  
       int temprow = row;  
       int tempcol = col;  
       while (grid[backtrack_row][backtrack_col].val == 0)  
         if (shouldBacktrack && (allValuesStack.count > 0))  
           backtrack_level = allValuesStack.count;  
           temprow = allValuesStack.allValues[backtrack_level].rowPos;  
           tempcol = allValuesStack.allValues[backtrack_level].colPos;  
           int inc_val = grid[temprow][tempcol].val + 1;  
           grid[temprow][tempcol].val = 0;  
           for (inc_val = inc_val; inc_val <= 9; inc_val++)  
             int retVal = verifyRules(grid, temprow, tempcol, inc_val);  
             if (retVal == 1)  
               grid[temprow][tempcol].val = inc_val;  
               shouldBacktrack = 0;  
               insertIntoTriedValues(&allValuesStack, temprow, tempcol, inc_val);  
           if (grid[temprow][tempcol].val == 0)  
             shouldBacktrack = 1;  
         int start_row, start_col;  
         if (tempcol == 8)  
           start_col = 0;  
           start_row = temprow + 1;  
           start_col = tempcol + 1;  
           start_row = temprow;  
         int backtrackPos = backtrack_row*9 + backtrack_col;  
         int startPos = start_row*9 + start_col;  
         if (startPos > backtrackPos)  
           shouldBacktrack = 1;  
         while (startPos <= backtrackPos)  
           start_row = startPos / 9;  
           start_col = startPos % 9;  
           if (grid[start_row][start_col].fixed == 0)  
             grid[start_row][start_col].val = 0;  
             for (startVal = 1; startVal <= 9; startVal++)  
               int retVal = verifyRules(grid, start_row, start_col, startVal);  
               if (retVal == 1)  
                 grid[start_row][start_col].val = startVal;  
                 insertIntoTriedValues(&allValuesStack, start_row, start_col, startVal);  
             if (grid[start_row][start_col].val == 0)  
               shouldBacktrack = 1;  
               shouldBacktrack = 0;  
 void printSudokuGrid(sudoku_elem **grid)  
   int i, j;  
   for (i = 0; i < GRID_SIZE; i++)  
     for (j = 0; j < GRID_SIZE; j++ )  
       printf("%d-%d ", grid[i][j].val, grid[i][j].fixed);  
 int isValidSudoku(sudoku_elem **grid)  
   int i, j;  
   for (i=0; i < GRID_SIZE; i++)  
     for (j=0; j < GRID_SIZE; j++)  
       if (grid[i][j].val != 0)  
         int temp = grid[i][j].val;  
         grid[i][j].val = 0;  
         int retval = verifyRules(grid, i, j, temp);  
         grid[i][j].val = temp;  
         if (retval == -1)  
           return 0;  
   return 1;  
 int main()  
   int i, j;  
   sudoku_elem **newGrid = allocInitSudoku();  
   //sudoku_elem newGrid[GRID_SIZE][GRID_SIZE];  
   char *sudoku = "........3..1..56...9..4..7......9.5.7.......";  
   // Some of the other extreme Sudoku puzzles, you can try this out.  
   // ".......12........3..23..4....18....5.6..7.8.......9.....85.....9...4.5..47...6...";  
   // "..............3.85..1.2.......5.7.....4...1...9.......5......73..2.1........4...9";  
   for (i=0; i < GRID_SIZE; i++)  
     for (j=0; j < GRID_SIZE; j++)  
       if (sudoku[i*GRID_SIZE + j] != '.')  
         newGrid[i][j].fixed = 1;  
         newGrid[i][j].val = sudoku[i*GRID_SIZE + j] - '0';  
   int ret = isValidSudoku(newGrid);  
   if (ret == 0)  
     printf("Invalid sudoku exiting.....\n");  
   printf("\n -0 ==> initially empty cell\n -1 ==> fixed cell, i.e., initially filled cell\n");  
   double startime = omp_get_wtime();  
   double endtime = omp_get_wtime();  
   printf("\n\n Solved Sudoku ");  
   printf("\n\nSolving time = %lf\n", endtime - startime);  
Read More

Thursday, 26 June 2014

Parallel Reduction in CUDA

Posted by Mahesh Doijade 2 Comments
Parallel Reduction, Parallel Reduction using shared memory CUDA

    Operations which are associative and commutative can be reduction operations some of them are addition, multiplication, bitwise AND/OR/XOR, logical AND/OR/XOR, finding maximum/minimum amongst a given set of numbers. Sequential computation complexity of these operation is over 'n' number of elements is O(n), whereas best theoretical parallel computation complexity can be O(log n). The code below implements parallel reduction in CUDA, here in this case we do addition of floating point array, providing near optimal implementation for arbitrary data sizes and thread block sizes taking transparent scalability into account using shared memory for facilitating faster access and reusing intermediate data.
    In case of any questions/clarifications related to this program, please post them in comment, I will be pleased to answer any queries pertaining to it.

 Parallel Reduction in CUDA

#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SIZE 512 // You can change this
#define NUM_OF_ELEMS 4096 // You can change this

#define funcCheck(stmt) {                                            \
    cudaError_t err = stmt;                                          \
    if (err != cudaSuccess)                                          \
    {                                                                \
        printf( "Failed to run stmt %d ", __LINE__);                 \
        printf( "Got CUDA error ...  %s ", cudaGetErrorString(err)); \
        return -1;                                                   \
    }                                                                \

__global__  void total(float * input, float * output, int len) 
    // Load a segment of the input vector into shared memory
    __shared__ float partialSum[2*BLOCK_SIZE];
    int globalThreadId = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int t = threadIdx.x;
    unsigned int start = 2*blockIdx.x*blockDim.x;

    if ((start + t) < len)
        partialSum[t] = input[start + t];      
        partialSum[t] = 0.0;
    if ((start + blockDim.x + t) < len)
        partialSum[blockDim.x + t] = input[start + blockDim.x + t];
        partialSum[blockDim.x + t] = 0.0;

    // Traverse reduction tree
    for (unsigned int stride = blockDim.x; stride > 0; stride /= 2)
        if (t < stride)
            partialSum[t] += partialSum[t + stride];

    // Write the computed sum of the block to the output vector at correct index
    if (t == 0 && (globalThreadId*2) < len)
        output[blockIdx.x] = partialSum[t];

int main(int argc, char ** argv) 
    int ii;

    float * hostInput; // The input 1D vector
    float * hostOutput; // The output vector
    float * deviceInput;
    float * deviceOutput;

    int numInputElements = NUM_OF_ELEMS; // number of elements in the input list
    int numOutputElements; // number of elements in the output list
    hostInput = (float *) malloc(sizeof(float) * numInputElements);

    for (int i=0; i < NUM_OF_ELEMS; i++)
        hostInput[i] = 1.0;     // Add your input values

    numOutputElements = numInputElements / (BLOCK_SIZE<<1);
    if (numInputElements % (BLOCK_SIZE<<1)) 
    hostOutput = (float*) malloc(numOutputElements * sizeof(float));

    //@@ Allocate GPU memory here
    funcCheck(cudaMalloc((void **)&deviceInput, numInputElements * sizeof(float)));
    funcCheck(cudaMalloc((void **)&deviceOutput, numOutputElements * sizeof(float)));

    // Copy memory to the GPU here
    cudaMemcpy(deviceInput, hostInput, numInputElements * sizeof(float), cudaMemcpyHostToDevice);

    // Initialize the grid and block dimensions here
    dim3 DimGrid( numOutputElements, 1, 1);
    dim3 DimBlock(BLOCK_SIZE, 1, 1);

    // Launch the GPU Kernel here
    total<<<DimGrid, DimBlock>>>(deviceInput, deviceOutput, numInputElements);

    // Copy the GPU memory back to the CPU here
    cudaMemcpy(hostOutput, deviceOutput, numOutputElements * sizeof(float), cudaMemcpyDeviceToHost);

     * Reduce output vector on the host
    for (ii = 1; ii < numOutputElements; ii++) 
        hostOutput[0] += hostOutput[ii];

    printf("Reduced Sum from GPU = %f\n", hostOutput[0]);   

    // Free the GPU memory here

    return 0;
Read More

Saturday, 19 April 2014

Python is compiled language or interpreted language ?

Posted by Amit Khomane

So you can code into python with ease (kudos to its simplicity). So next is to go deep and learn more about decorators, __add__ method, __get__ method, and more exciting stuff, that's great. But when somebody (most probably in interviews) ask you what exactly is python, compiled or interpreted language? You stuck for some time and explain them (or try to explain) with some heck of technical terms. With my personal experience, Yes! people do get stuck in this question even if they can code smoothly in python (I personally did).
Therefore, I decided to sit and explore about it. With some understanding, I prefer simpler way to explain it to you.

Which Python?
Confused with question? Then you should read this article. We are going to consider CPython for this time (or it doesn't matter to understand concept here).

What is Python?
First thing you need to understand is python is not language (What the heck). Yes TRUE, its merely an interface for language developers. You may be aware of interface in Java or C#. As with interfaces, it gives you an abstraction about how particular functionality should work leaving implementation work for you (or for your class). In simple words it contains behavior that class will implement.
Being interpreted or compiled is not relative terms to interface rather it is related to implementation. So considering Python, which is merely an interface cannot be compiled or interpreted.
Rather, if you you ask, is Cpython compiled or interpreted ? We can say that its interpreted with some compilation (I know bit confusing but have patience for few more lines please).

You need to understand….

How your python code gets executed?
The python code you write is compiled into python bytecode, which creates file with extension .pyc. If compiles, again question is, why not compiled language.

Note that this isn't compilation in the traditional sense of the word. Typically, we’d say that compilation is taking a high-level language and converting it to machine code. But it is a compilation of sorts. Compiled in to intermediate code not into machine code (Hope you got it Now).

Back to the execution process, your bytecode, present in pyc file, created in compilation step, is then executed by appropriate virtual machines, in our case, the CPython VM (actually we call it interpreter, right?).

Execution of Python Code
So for Cpython, we can say that its interpreted language. Aha, So that made to confuse you as Python is an "interpreted language"(which in term True for Cpython, a most famous implementation of python).

So my pyc file contains cross platform code right?. Yes, your bytecode is cross platform but its version dependent ( python 2.x or 3.x).

Is .pyc created every time I run code?
Answer is No. Actually it depends on your modification in py file. The time-stamp (called as magic number) is used to validate whether .py file is changed or not, depending on that new pyc file is created. If pyc is of current code then it simply skips compilation step.

Basically the way the programs are run is always the same. The compiled code is interpreted. The way the programs are loaded differs. If there is a current pyc file, this is taken as the compiled version, so no compile step has to be taken before running the command. Otherwise the py file is read, the compiler has to compile it (which takes a little time) but then the compiled version in memory is interpreted just the same way as always.

Again Its all same for them also, all of them have a typical implementation strategy of producing bytecode first, then executing it via a VM/interpreter. Difference is only in VM they use. Jython use JVM where as Ironpython use CLR.

Is there any python compiler?
Now we are talking about compilers (according to above definition).You may have heard about PyPy it is JIT compiler for python code. Nuitka, is one of the notable compilers. Nuitka attempts to translate pure Python not into bytecode, but into machine code (via C++ compiler), while using libpython at run time. Another one is ShedSkin. It compiles implicitly statically typed Python to C++, stand-alone programs or extension modules.

    Hence, perfect answer to the question, Python is compiled language or interpreted language ?, It totally depends upon which python is in consideration?, After that you can proceed with further explanation as per python in consideration (mostly answer is interpreter), and you can explain the process to clarify. I hope this article helped you.

Read More

Sunday, 6 April 2014

Linux Filesystem Commands

Posted by Mahesh Doijade 2 Comments
Linux Filesystem simplified, Linux Filesystem Commands
df [options] [device name] : It displays the filesystem usage related information
-a : Displays all the filesystems
-i : Gives inode usage information
-h : Displays in human readable format. Shows quantified byte information

Example -:
$ df -a
Filesystem       1K-blocks    Used Available Use% Mounted on
/dev/sda1          7608792 2469924   4729320  35% /

proc                     0       0         0    - /proc
sysfs                    0       0         0    - /sys
none                     0       0         0    - /sys/fs/fuse/connections
none                     0       0         0    - /sys/kernel/debug
none                     0       0         0    - /sys/kernel/security
udev                244408       4    244404   1% /dev
devpts                   0       0         0    - /dev/pts
tmpfs               101288     772    100516   1% /run
none                  5120       0      5120   0% /run/lock
none                253216     124    253092   1% /run/shm
gvfs-fuse-daemon         0       0         0    - /home/mahesh/.gvfs
/dev/sr0             62658   62658         0 100% /media/VBOXADDITIONS_4.3.6_
du [options] [pattern]  :  Displays space usage on files and directories
-c : Displays grand total for all the arguments
-h : Displays in human readable format. Shows quantified byte information
Example -:
$ du -ahc
0 ./centos/6
8.0K ./centos
4.0K ./samplefile.txt
0 ./ubuntu/ub10
4.0K ./ubuntu
16K .
16K total

ls [options] [filepattern] : Lists out directories and file entries from the given pattern
-a : Displays all the files including . &  ..
-r : Lists all the files in directories recursively
-l : Displays long list consisting of permissions on each file and other details
-d : Lists directories and not their content
-x : Displays sorted list by extension of the file
-s : Sorts output according to file size
-u : Displays sorted list by the access time
Example -:
$ ls -la
total 160
drwxr-xr-x 20 mahesh mahesh  4096 Mar 24 00:00 .
drwxr-xr-x  3 root   root    4096 Mar  9 01:23 ..
-rw-r--r--  1 mahesh mahesh   220 Mar  9 01:23 .bash_logout
-rw-r--r--  1 mahesh mahesh  3486 Mar  9 01:23 .bashrc
drwx------ 14 mahesh mahesh  4096 Mar 24 00:00 .cache
drwx------ 10 mahesh mahesh  4096 Mar 24 00:11 .config
drwx------  3 mahesh mahesh  4096 Mar  9 01:36 .dbus
drwxr-xr-x  2 mahesh mahesh  4096 Mar 24 00:11 Desktop
-rw-r--r--  1 mahesh mahesh    25 Mar 23 19:50 .dmrc
drwxr-xr-x  2 mahesh mahesh  4096 Mar  9 01:36 Documents
drwxr-xr-x  2 mahesh mahesh  4096 Mar  9 01:36 Downloads
-rw-r--r--  1 mahesh mahesh  8445 Mar  9 01:23 examples.desktop
drwx------  3 mahesh mahesh  4096 Mar 23 19:51 .gconf
drwx------  4 mahesh mahesh  4096 Mar  9 01:37 .gnome2
-rw-rw-r--  1 mahesh mahesh   142 Mar 23 19:51 .gtk-bookmarks
dr-x------  2 mahesh mahesh     0 Mar 23 19:51 .gvfs
-rw-------  1 mahesh mahesh   724 Mar 23 19:51 .ICEauthority
drwxr-xr-x  3 mahesh mahesh  4096 Mar  9 01:36 .local
drwx------  3 mahesh mahesh  4096 Mar  9 01:37 .mission-control
drwx------  4 mahesh mahesh  4096 Mar 24 00:00 .mozilla
drwxr-xr-x  2 mahesh mahesh  4096 Mar  9 01:36 Music
drwxr-xr-x  2 mahesh mahesh  4096 Mar  9 01:36 Pictures
-rw-r--r--  1 mahesh mahesh   675 Mar  9 01:23 .profile
drwxr-xr-x  2 mahesh mahesh  4096 Mar  9 01:36 Public
drwx------  2 mahesh mahesh  4096 Mar 23 19:51 .pulse
-rw-------  1 mahesh mahesh   256 Mar  9 01:36 .pulse-cookie
drwxr-xr-x  2 mahesh mahesh  4096 Mar  9 01:36 Templates
drwxr-xr-x  2 mahesh mahesh  4096 Mar  9 01:36 Videos
-rw-------  1 mahesh mahesh    62 Mar 23 19:50 .Xauthority
-rw-------  1 mahesh mahesh 23406 Mar 24 02:44 .xsession-errors
-rw-------  1 mahesh mahesh 12494 Mar 16 15:11 .xsession-errors.old 

mkdir [options] [directory-name] : Creates a new directory by the given directory-name
-m : Sets file mode
-p  : Creates parent directories if it does not exists.
Example -:
$ mkdir testdir
The above command will create a directory named "testdir"

 touch [options] [pattern] : Updates the timestamp of a file, If the file does not exists by default options creates it.
-a : Changes the access time only
-c : Do not create the file
-t  : Give a timestamp to use instead of current time
Example -:
$ touch testfile.txt
Creates "testfile.txt" if it does not exists, else it only updates timestamp of the "testfile.txt".

shred [options] [file_pattern] : Enables to overwrite a file to hide its content, also allows to delete file data in secured manner.
-s : This followed by number of bytes, will shred those many bytes
-n : Number of pattern iterations to run
-z : Add a final overwrite with zero to hide shredding
-u : Truncate and remove the file after overwriting
 Example -:
$ shred file1.txt file2.txt
The above command will destroy file1.txt and file2.txt completely, so not be able to recover even using any recovery utilities

rm [options] [file_pattern] : Remove the file
-f : Force removal
-i : Prompt before removal of each file 
-r : Deletes directories and their contents recursively.
Example -:
$ rm -rf testdir
Removes forcefully and recursively all the files in "testdir" as well as removes "testdir" directory.

mv [options] [from_pattern] [to_file] : Move or Rename a file
-f  : Do not prompt before overwriting
-i  : Interactive, prompts before moving
-n : Do not overwrite an existing file
Example -:
$ mv old_name new_name
Above command renames the "old_name" file/directory to "new_name".
$ mv alphonso basket/mango
Moves a file to another directory and rename it:

tar [options] [tar_file] [pattern] : Creates, Extracts an archive.
-c  : Creates a new tar archive
-f  : Specify an tar archive file to use
-t  : List the contents
-v : Verbose mode
-x : Extracts the archive contents
-z : Compress/Decompress through gzip
-j : Compress/Decompress through bzip2
Example -:
$ tar -cvf techdarting-04-14.tar /home/techdarting/

cd [directory] : Change directory

chown [options]  [mode] [file_name] : Enables to change the ownership of a directory, file etc.
-R : Recursively change ownership.

chmod [options] [user] [file_name] :  Enables to change the permission of a file, directory, etc.
-R : Recursively change permissions

Read More