UMBC High Performance Computing Facility
How to run Intel CnC programs on tara

Introduction

On this page we'll see how to use Intel's Concurrent Collections (CnC) on the cluster. CnC abstracts the low-level details of parallel programming away from the user, so that they can focus on implementing their algorithm. Before proceeding, make sure you've read the How To Run tutorial.

This page is experimental, so please contact us if you run into problems or have corrections.

Before proceeding with the examples, we must ensure CnC is loaded in our current session:

[araim1@tara-fe1 hello_cnc_multithread]$ export CNC_INSTALL_DIR=/usr/cluster/intel-cnc/0.6
[araim1@tara-fe1 hello_cnc_multithread]$ module load tbb
[araim1@tara-fe1 hello_cnc_multithread]$ module load intel-cnc
[araim1@tara-fe1 hello_cnc_multithread]$ source $CNC_INSTALL_DIR/bin/intel64/cncvars.sh
The cnc executable should now be accessible:
[araim1@tara-fe1 hello_cnc_multithread]$ cnc
Intel(R) Concurrent Collections for C++ Translator, Prototype Edition version 0.6.0
Copyright(C) 2007-2010 Intel Corporation All Rights Reserved.
(0): error: No input file specified

[araim1@tara-fe1 hello_cnc_multithread]$ 
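The setup above can be checked from a script before building. The following is a minimal sketch, assuming only that CNC_INSTALL_DIR should be set and that cnc should be on the PATH, as in the session above. The function name check_cnc_env is hypothetical and not part of the tara installation.

```shell
# Hypothetical helper (not provided by tara or Intel CnC): report whether
# the environment prepared by the commands above appears to be in place.
check_cnc_env() {
    local status=0

    # CNC_INSTALL_DIR is exported by hand in the session above.
    if [ -z "${CNC_INSTALL_DIR:-}" ]; then
        echo "CNC_INSTALL_DIR is not set"
        status=1
    elif [ ! -d "$CNC_INSTALL_DIR" ]; then
        echo "CNC_INSTALL_DIR does not name a directory: $CNC_INSTALL_DIR"
        status=1
    fi

    # The cnc translator should be reachable once the modules are loaded.
    if ! command -v cnc >/dev/null 2>&1; then
        echo "cnc translator not found on PATH"
        status=1
    fi

    if [ "$status" -eq 0 ]; then
        echo "CnC environment looks usable"
    fi
    return "$status"
}
```

Calling check_cnc_env at the top of a build script makes a missing module load show up as a clear message rather than a failed make.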

Single node multi-threaded example

Consider a simple multi-threaded "hello world" example based on the following CnC file.
// Declarations
// The tag values
<int tagvalue>;

// The return values
[unsigned threadid <int>];
[unsigned hostid <int>];

// Step prescription 
<tagvalue> ::  (compute);

// Step execution
(compute) -> [threadid];
(compute) -> [hostid];

// Input from the environment: initialize all tags
env -> <tagvalue>; 

// Output to the environment: the thread IDs, host IDs, and tags
[threadid] -> env;
[hostid] -> env;
<tagvalue> -> env;

Download: ../code/hello_cnc_multithread/hello.cnc
This file declares a tag collection of integers and two item collections, threadid and hostid, indexed by those integers. Every tag placed into the tag collection prescribes one compute step, which produces a "thread ID" item and a "host ID" item identifying the thread and host that handled the tag. In this single-node example, the host ID will always be the same.
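The relationship between tags, steps, and items can be emulated serially in plain shell. This is only a sketch of the semantics, not CnC code and not the program used on this page: the function names compute and emulate_graph are hypothetical, the shell's process ID stands in for a thread ID, and the output of uname -n stands in for a host ID.

```shell
# Stand-in for the (compute) step: given one tag, emit the tag together
# with a "thread ID" (here the shell's PID) and a "host ID" (node name).
compute() {
    local tag=$1
    printf '%s %s %s\n' "$tag" "$$" "$(uname -n)"
}

# Stand-in for the environment: put tags 0..n-1 into the tag collection.
# Each tag prescribes exactly one compute step, so n lines are printed.
emulate_graph() {
    local n=$1
    local tag=0
    while [ "$tag" -lt "$n" ]; do
        compute "$tag"
        tag=$((tag + 1))
    done
}
```

Running emulate_graph 16 prints one line per tag. Unlike this serial loop, CnC is free to run the prescribed steps in any order and on any available thread.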

The C++ file main.cpp implements the logic of this program.

To compile the code, we'll use the following Makefile:
ARCH := intel64
M_UNAME := $(shell uname -m)
ifeq ($(M_UNAME), i686)
ARCH := ia32
endif

SOURCES := main.cpp
TARGETS := hello
CNCFILE := hello.cnc
DEST_OBJS=$(SOURCES:.cpp=.o)
GEN_HEADER=$(CNCFILE:.cnc=.h)
HINTSFILE := hello_codinghints.txt

OPT := -O2

all: main

main: $(DEST_OBJS)
	$(CXX) -o $(TARGETS) $(DEST_OBJS) -L$(CNC_INSTALL_DIR)/lib/$(ARCH) -lcnc -ltbb -ltbbmalloc

%.o: %.cpp $(GEN_HEADER)
	$(CXX) -c -I$(CNC_INSTALL_DIR)/include $(OPT) -o $@ $<

$(GEN_HEADER): $(CNCFILE)
	$(CNC_INSTALL_DIR)/bin/$(ARCH)/cnc $(CNCFILE)

clean:
	rm -f $(TARGETS) $(DEST_OBJS) $(GEN_HEADER) $(HINTSFILE)

Download: ../code/hello_cnc_multithread/Makefile
Finally, to run the program, we'll use the following batch script:
#!/bin/bash
#SBATCH --job-name=cnc_multithread
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop

./hello 16 8

Download: ../code/hello_cnc_multithread/run.slurm
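The batch script above hard-codes 16 jobs and 8 threads. To try other values without editing the file by hand, a script can be generated. The sketch below reuses only the #SBATCH lines shown above. The function name make_run_script and the default of 8 threads are assumptions made for illustration.

```shell
# Hypothetical generator for variants of run.slurm.
# Usage: make_run_script <numJobs> [numThreads]
make_run_script() {
    local jobs=$1
    local threads=${2:-8}   # assumed default, matching the script above
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=cnc_multithread
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop

./hello $jobs $threads
EOF
}
```

For example, `make_run_script 64 8 > run64.slurm` writes a script that can then be submitted with sbatch in the usual way.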
[araim1@tara-fe1 hello_cnc_multithread]$ ls
hello.cnc  main.cpp  Makefile  run.slurm
[araim1@tara-fe1 hello_cnc_multithread]$ make
/usr/cluster/intel-cnc/0.6/bin/intel64/cnc hello.cnc
Intel(R) Concurrent Collections for C++ Translator, Prototype Edition version 0.6.0
Copyright(C) 2007-2010 Intel Corporation All Rights Reserved.
g++ -c -I/usr/cluster/intel-cnc/0.6/include -O2 -o main.o main.cpp
g++ -o hello main.o -L/usr/cluster/intel-cnc/0.6/lib/intel64 -lcnc -ltbb -ltbbmalloc
[araim1@tara-fe1 hello_cnc_multithread]$ ls
hello  hello.cnc  hello_codinghints.txt  hello.h  main.cpp  main.o  Makefile  run.slurm
[araim1@tara-fe1 hello_cnc_multithread]$ 
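Let us now run the executable. Invoking it without arguments prints the usage message; the actual run is submitted through the batch script: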
[araim1@tara-fe1 hello_cnc_multithread]$ ./hello 
Usage ./hello <numJobs> [numThreads]
[araim1@tara-fe1 hello_cnc_multithread]$ sbatch run.slurm 
Submitted batch job 102935
[araim1@tara-fe1 hello_cnc_multithread]$ cat slurm.err 
[araim1@tara-fe1 hello_cnc_multithread]$ cat slurm.out 
CnC process started on host n1 with N = 16
Setting number of threads per host to 8
Size of ctx.tagvalue tag collection is 16
Tag        j   threadid hostid
001        0  357384752 000
002        3  357384752 000
003        6  357384752 000
004        9  357384752 000
005       12  357384752 000
006       15  357384752 000
007        2  357384752 000
008        5  357384752 000
009        8  357384752 000
010       11  357384752 000
011       14  357384752 000
012        1  357384752 000
013        4  357384752 000
014        7  357384752 000
015       10  357384752 000
016       13  357384752 000
total elapsed sec = 0.00
[araim1@tara-fe1 hello_cnc_multithread]$ 
Notice that a single thread handled all 16 tags. Increasing the number of tags to 64 or so will yield several thread IDs in the output.
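Rather than scanning the table by eye, the number of distinct thread IDs can be counted with awk. This is a minimal sketch, assuming the four-column layout (Tag, j, threadid, hostid) shown in slurm.out above. The function name distinct_threads is hypothetical.

```shell
# Count the distinct values in the threadid column of a result table.
# Only rows with four fields and a numeric first field are data rows;
# the header, status messages, and the timing line are skipped.
distinct_threads() {
    awk 'NF == 4 && $1 ~ /^[0-9]+$/ { seen[$3] = 1 }
         END { n = 0; for (id in seen) n++; print n }' "$1"
}
```

For the slurm.out shown above, `distinct_threads slurm.out` prints 1, since every row carries thread ID 357384752.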

Distributed example

Now we will demonstrate running our Hello World example across several nodes of the cluster. Nodes communicate with each other over sockets, and each node runs multiple threads. This section is based on Intel's instructions, with some additional customization for tara. Running the program gives the following results:
[araim1@tara-fe1 hello_cnc_dist]$ make
g++ -c -I/usr/cluster/intel-cnc/0.6/include -O2 -o main.o main.cpp
g++ -o hello main.o -L/usr/cluster/intel-cnc/0.6/lib/intel64 -lcnc -ltbb -ltbbmalloc
[araim1@tara-fe1 hello_cnc_dist]$ sbatch run.slurm 
Submitted batch job 102937
[araim1@tara-fe1 hello_cnc_dist]$ cat slurm.err 
[araim1@tara-fe1 hello_cnc_dist]$ cat slurm.out 
start clients manually with contact string: 0:1025_111@172.20.101.3
Starting client 001 with SOCKETs: 0:1025_111@172.20.101.3
Starting client 002 with SOCKETs: 0:1025_111@172.20.101.3
Starting client 003 with SOCKETs: 0:1025_111@172.20.101.3
Starting client 004 with SOCKETs: 0:1025_111@172.20.101.3
--> established socket connection 1, 3 still missing ...
--> established socket connection 2, 2 still missing ...
--> established socket connection 4, 1 still missing ...
--> established all socket connections to the host.
--> establishing client connections to client 1 ... done
--> establishing client connections to client 2 ... done
--> establishing client connections to client 3 ... done
CnC process started on host n3 with N = 64
Setting number of threads per host to 8
Size of ctx.tagvalue tag collection is 64
Tag        j   threadid hostid
001        0 1110051136 003
002       55 1110051136 003
003       42 1097333056 006
004       29 1768096304 003
005       16 1093925184 004
006        3 1093617984 005
...
063       26 1093925184 004
064       13 1119131968 005
total elapsed sec = 0.00
[araim1@tara-fe1 hello_cnc_dist]$
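To see how the tags were spread across hosts, the hostid column of the output can be tallied. The sketch below assumes the same four-column table layout as in slurm.out above. The function name tags_per_host is hypothetical.

```shell
# Print "hostid count" pairs, one per host, sorted by host ID.
# Connection messages, the header, and elided ("...") lines are skipped
# because they do not have four fields with a numeric first field.
tags_per_host() {
    awk 'NF == 4 && $1 ~ /^[0-9]+$/ { count[$4]++ }
         END { for (h in count) print h, count[h] }' "$1" | sort
}
```

Running `tags_per_host slurm.out` on the complete output gives one line per host, which makes an uneven distribution of tags easy to spot.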