MICRO_MEM(1) USER COMMANDS MICRO_MEM(1)
NAME
micro_mem - Multiprocessor Memory Hierarchy and Network Interconnect Micro Benchmark - Version 0.3, Oct 1994
SYNOPSIS
micro_mem procs [ -A arraysize ] [ -L lower ] [ -U upper ] [ -I iterations ] [ -P exec_procs ] [ -H init_procs ] [ -d ] [ -a ] [ -f filename ]
DESCRIPTION
micro_mem is used to obtain the physical and performance profiles of the memory hierarchy and network interconnect of shared-memory multiprocessors.
The only argument required by micro_mem is the number of threads, procs. Consider the example:
micro_mem 4
Here four threads are used to measure the average time required to
read, modify, and write a single element of a double-precision
floating-point array over different values of size and stride. The
variable size ranges from lower (default 1) to upper (default 2^20)
elements. For each value of size, stride is varied from 1 to
size / 2. Each thread starts each experiment at position first_elem =
(arraysize * thread_id / procs), where thread_id is the logical id
number of the thread. If (first_elem + size) is larger than arraysize,
the thread wraps around to element 0. The following figure illustrates
the accesses made by each thread when size = (arraysize / 2), stride
= 1:
         0000000000111111111122222222223333333333
         0123456789012345678901234567890123456789
         <------------- arraysize -------------->
thread-0 <------ size ------>
thread-1           <------------------>
thread-2                     <------------------>
thread-3 <-------->                    <-------->
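The starting position and wrap-around rule can be checked with a short sketch. This is not code from micro_mem.c; the function name thread_ranges is hypothetical, integer division is assumed for first_elem, and the dimensions are the ones drawn in the figure (40 elements, four threads, 20-element windows, stride 1).

```python
def thread_ranges(arraysize, procs, size):
    """Return, for each thread, the element indices it touches at stride 1."""
    ranges = []
    for thread_id in range(procs):
        # Starting position as described in the text (integer division assumed).
        first_elem = arraysize * thread_id // procs
        # Wrap around to element 0 when the window runs past the end of the array.
        ranges.append([(first_elem + k) % arraysize for k in range(size)])
    return ranges


if __name__ == "__main__":
    for tid, elems in enumerate(thread_ranges(arraysize=40, procs=4, size=20)):
        print(f"thread-{tid}: first element {elems[0]}, last element {elems[-1]}")
```

With these values thread-3 starts at element 30 and finishes at element 9, which is the split window shown on the last line of the figure.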
When size is less than arraysize / procs, there is no data contention
among the threads. The maximum amount of contention occurs when
size = arraysize and stride = (size / 2). The file Super_93_1.ps,
which is included in the distribution, explains in more detail how
the micro benchmark is used to explore the performance space.

Several of the following options take as an argument a restricted-number (a multiple of a power of two), which is given as n m and represents the number n*2^m.
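As a rough illustration of the restricted-number notation, the sketch below decodes a pair n m into n*2^m. The helper name restricted_number is hypothetical and is not part of the benchmark sources; the 8 bytes per element figure is the assumption the examples in this page use.

```python
def restricted_number(n, m):
    """Decode a restricted-number written as the pair 'n m' into n * 2**m."""
    return n * 2 ** m


if __name__ == "__main__":
    elements = restricted_number(1, 19)   # the pair "1 19"
    print(elements)                       # number of array elements
    print(elements * 8)                   # bytes, assuming 8 bytes per element
```

Under that assumption the pair 1 19 corresponds to 524288 elements, or 4 MBytes.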
size        stride
---------   --------
512 elems   1 elem
1 elems     1 elem
1 elems     2 elems
The following examples illustrate some interesting experiments:
1) To measure the performance characteristics of the memory hierarchy as seen from a particular processor (in this case 7):
micro_mem 1 -P 7
2) To measure the effect of data contention on 8 processors
(23 45 65 9 23 1 4 10) on an array of 4 MBytes (assuming 8
bytes per element):
micro_mem 8 -A 1 19 -L 1 17 -P 23 45 65 9 23 1 4 10
3) To measure the effects of contention in the interconnect
as a function of the number of threads, but under constant
data contention (in this case only 2 procs contend for the
same element), we can run the following configurations:
micro_mem 2 -A 1 20 -L 1 20 -U 1 20
micro_mem 4 -A 2 20 -L 1 20 -U 1 20
micro_mem 8 -A 4 20 -L 1 20 -U 1 20
micro_mem 10 -A 5 20 -L 1 20 -U 1 20
micro_mem 12 -A 6 20 -L 1 20 -U 1 20
micro_mem 14 -A 7 20 -L 1 20 -U 1 20
micro_mem 16 -A 8 20 -L 1 20 -U 1 20
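The claim that these configurations keep data contention constant can be checked with a small sketch. It is an illustration only, not code from the benchmark: the function name sharers_per_element is hypothetical, stride 1 and integer division for first_elem are assumed, and the sizes are scaled down from 2^20 to 2^10 elements so the check runs quickly (the ratio of arraysize to size is unchanged).

```python
from collections import Counter


def sharers_per_element(procs, arraysize, size):
    """Count how many threads touch each array element at stride 1."""
    counts = Counter()
    for thread_id in range(procs):
        first_elem = arraysize * thread_id // procs
        for k in range(size):
            # Same wrap-around rule as in the description of first_elem.
            counts[(first_elem + k) % arraysize] += 1
    return counts


if __name__ == "__main__":
    unit = 2 ** 10  # scaled-down stand-in for the 2^20 used in the commands
    for procs in (2, 4, 8, 10, 12, 14, 16):
        counts = sharers_per_element(procs, (procs // 2) * unit, unit)
        print(f"procs={procs}: sharers per element = {sorted(set(counts.values()))}")
```

Every configuration should report a single sharer count of 2, since each thread's window overlaps half of its neighbour's window on either side.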
FILES
The micro benchmark consists of three modules: micro_mem.c, utils.c, and machine.c. The first two are machine-independent and do not require changes when ported to a new system. The third, machine.c, is machine-dependent and contains the critical functions for thread/process creation, synchronization, and memory allocation; details about these functions are given in the file itself. Currently, three versions of machine.c are supported: machine_ksr.c (KSR1 using POSIX primitives), machine_dash.C (Dash multiprocessor using ANL macros), and machine_uni.c (vanilla uniprocessor). For each version machine_mach.c, where mach is a mnemonic identifying a particular machine, there is a file Makefile_mach that contains specific instructions on how to compile the sources.
Argument procs has to be the first argument and is always required. If it is missing or in the wrong place, expect problems.
SEE ALSO
mm_filter(1), build_gnuplot(1) (man pages not available).