Mitbbs topic archive - All topics - Topic: openmp
x*****u
Posts: 3419
1
From topic: Computation board - Using OpenMP 3 zz
http://www.linux-mag.com/2004-03/extreme_01.html
Linux Magazine / March 2004 / EXTREME LINUX
Using OpenMP, Part 3
by Forrest Hoffman
This is the third and final column in a series on shared memory parallelization using OpenMP. Often used to improve performance of scientific models on symmetric multi-processor (SMP) machines or SMP nodes in a Linux cluster, OpenMP consists of a portable set of compiler directives, library calls, and environment variables.
l******9
Posts: 579
2
Hi,
I am trying to parallelize a compute-intensive problem.
I am working on a Linux cluster where each node is a multicore machine, e.g. 2 or 4 quad-core processors per node.
I want to reduce latency and improve performance as much as possible.
I plan to use multiprocessing and multithreading at the same time: each process runs on a distinct node, and each process spawns many threads on its node. This is two-level parallelism.
For multiprocessing, I would like to choose MPI.
For multithre... (read full post)
y**b
Posts: 10166
6
From topic: Programming board - Questions about OpenMP parallel computation
I parallelized a simulation program with OpenMP and found that, once the computation runs for enough time steps, each OpenMP run gives a different result, while the serial run always gives the same result. Is this normal?
My intuition is that because each OpenMP run executes in a different order (the relative ordering of the threads is random), the way rounding error accumulates can change. How is this kind of problem usually handled? Thanks.
x*****u
Posts: 3419
9
From topic: Computation board - OpenMP Multi-Processing 2 zz
http://www.linux-mag.com/2004-02/extreme_01.html
Linux Magazine / February 2004 / EXTREME LINUX
OpenMP Multi-Processing, Part 2
by Forrest Hoffman
This month, we continue our focus on shared-memory parallelism using OpenMP. As a quick review, remember that OpenMP consists of a set of compiler directives, a handful of library calls, and a set of environment variables that can be used to specify run-time parameters. Available for both FORTRAN an...
b*****l
Posts: 9499
10
From topic: Linux board - OpenMP help needed... (forwarded)
[Forwarded from the Thoughts board]
Sender: bigsail (河马·旋木), Board: Thoughts
Title: OpenMP help needed...
Posted: BBS Mitbbs (Sat Apr 30 02:18:47 2011, EDT)
I'm learning OpenMP and am stuck at the very first step: setting multiple threads fails...
The code in TestOMP.cpp is simple: start 5 threads, have each thread introduce itself, done.
#include <iostream>
#include <omp.h>
using namespace std;
int main() {
    omp_set_num_threads(5);
    cout << "Fork! " << endl;
#pragma omp parallel
    {
        // Obtain and print thread id
        cout << "Hello World from thread = " << omp_get_thread_num()
             << " of " << omp_get_num_threads() << endl;
... (read full post)
O*******d
Posts: 20343
11
From topic: Programming board - A helloworld OpenMP question?
You have to activate OpenMP in your compiler. For Visual Studio 2008:
Project -> "your project" -> C/C++ -> Language -> OpenMP Support
O*******d
Posts: 20343
12
From topic: Programming board - A helloworld OpenMP question?
The default number of threads in OpenMP is the number of CPUs on your computer if you do not call omp_set_num_threads(). Of course, you have to activate OpenMP support in your compiler.
O*******d
Posts: 20343
13
From topic: Programming board - A helloworld OpenMP question?
Relatively new compilers generally support OpenMP, but it may need to be activated; at least Visual Studio 2008 works that way. Activating it just turns on the compiler's OpenMP support. If it is not activated, you get only one thread no matter how many CPUs you have. Calling omp_set_num_threads() under a compiler without OpenMP activated has no effect, but it does not raise an error either; this is for backward compatibility.
y****n
Posts: 15
14
From topic: Programming board - Seeking help with an OpenMP question
I have an OpenMP question for the experts here. The original program (A) needs to allocate a temporary array and later free it. After parallelizing it with OpenMP (B), the threads cannot share this array, so each thread must allocate that memory independently.
If the memory is allocated inside the loop body, it gets allocated nk = 121 times, which is inefficient. In fact, with 4 threads, allocating once per thread would be enough. I don't know how to implement this; any pointers would be appreciated. Many thanks.
-------------------------------------
Program A:
-------------------------------------
float* pfSdx = (float *) calloc( N, sizeof(float) );
for (int k = 0; k < nk; k++)
{
    ...
}
free( pfSdx );
-------------------------------------
Program B:
-------------------------------------
#pragma omp parallel for
for (int k = 0; k < ... (read full post)
x*****u
Posts: 3419
15
From topic: Computation board - 1 Multi-Processing with OpenMP zz
http://www.linux-mag.com/2004-01/extreme_01.html
Linux Magazine / January 2004 / EXTREME LINUX
Multi-Processing with OpenMP
by Forrest Hoffman
In this column's previous discussions of parallel programming, the focus has been on distributed memory parallelism, since most Linux clusters are best suited to this programming model. Nevertheless, today's clusters often contain two or four (or more) processors per node. While one could simply start mult...
b*****l
Posts: 9499
16
From topic: Thoughts board - OpenMP help needed...
I'm learning OpenMP and am stuck at the very first step: setting multiple threads fails...
The code in TestOMP.cpp is simple: start 5 threads, have each thread introduce itself, done.
#include <iostream>
#include <omp.h>
using namespace std;
int main() {
    omp_set_num_threads(5);
    cout << "Fork! " << endl;
#pragma omp parallel
    {
        // Obtain and print thread id
        cout << "Hello World from thread = " << omp_get_thread_num()
             << " of " << omp_get_num_threads() << endl;
        // Only master thread does this
        if (omp_get_thread_num() == 0)
            cout << "Master thread: number of threads = " <<
omp... (read full post)
x******n
Posts: 9057
17
From topic: Thoughts board - OpenMP help needed...
Wow, this is the first time I've ever heard of OpenMP.
x*z
Posts: 1010
18
Most MPI libraries have shared memory implemented, which actually has less overhead than OpenMP or threading.
l******9
Posts: 579
19
In MPI libraries with shared memory implemented, do we have inter-process communication or inter-thread communication?
If it is the former, why does a process have less overhead than a thread?
If it is the latter, why does it have less overhead than OpenMP and threading?
Does MPI have some built-in advantages over them?
Any help is really appreciated. Thanks.
Q*T
Posts: 263
20
From topic: Linux board - OpenMP help needed... (forwarded)
Enable OpenMP support when compiling:
g++ -fopenmp -c -o TestOMP.o TestOMP.cpp
y****e
Posts: 23939
22
From topic: Programming board - A helloworld OpenMP question?
Has nobody here used OpenMP?
y****e
Posts: 23939
23
From topic: Programming board - A helloworld OpenMP question?
Thanks for your reply, but I'm still a bit confused. I'm compiling with g++ on Linux, and compilation succeeds. What do you mean by "activate OpenMP"?
My system is an Intel dual core, which should count as two processors. And I did call omp_set_num_threads(), but only one thread came up.
p******m
Posts: 353
24
From topic: Programming board - A helloworld OpenMP question?
I tried compiling OpenMP code with the Intel 9 compiler in the VC 6.0 environment, but one of the threads keeps executing repeatedly. Does anyone know why? Has anyone run into a similar problem?
p******m
Posts: 353
25
From topic: Programming board - Can OpenMP code be compiled into a DLL?
Has anyone here used OpenMP?
Can it be compiled into a DLL? Does the called DLL still run in parallel?
p******m
Posts: 353
26
From topic: Programming board - Can OpenMP code be compiled into a DLL?
I tried compiling OpenMP code with the Intel 9 compiler in the VC 6.0 environment, but one of the threads keeps executing repeatedly. Does anyone know why? Has anyone run into a similar problem?
s*******e
Posts: 664
27
☆─────────────────────────────────────☆
petersam (google) wrote on (Fri Oct 2 16:06:00 2009, EDT):
I tried compiling OpenMP code with the Intel 9 compiler in the VC 6.0 environment, but one of the threads keeps executing repeatedly. Does anyone know why? Has anyone run into a similar problem?
☆─────────────────────────────────────☆
petersam (google) wrote on (Fri Oct 2 16:36:24 2009, EDT):
Here is my test code:
#include <stdio.h>
#include <omp.h>
int main() {
    int i;
    omp_set_num_threads(2);
#pragma omp parallel for
    for (i = 0; i < 6; i++)
        printf("i = %d\n", i);
    return 0;
}
☆─────────────────────────────────────☆
O*******d
Posts: 20343
28
I personally like OpenMP. You don't need to add much code; in the simplest case a single line lets the compiler parallelize a for loop automatically. The number of threads automatically matches the number of CPU cores, and each core executes a different range of the loop's indices. All of this is automatic; you don't have to worry about it. You can do both data parallelism and task parallelism.
m***x
Posts: 492
29
For data parallelism, use OpenMP.
y**b
Posts: 10166
30
From topic: Programming board - Questions about OpenMP parallel computation
An update: after switching to the GCC quad-precision math library (libquadmath), preliminary results show that every OpenMP run now gives exactly the same result (at the original double output precision), whereas double or long double showed clear deviations under the same computation. So the effort was not wasted.
It's striking that while 64-bit computation is not yet universal, 128-bit computation is already in real demand; the many high-precision libraries are probably evidence of that. Unfortunately, the quadmath library is currently very slow; in my computation it is roughly 30x slower.
t****t
Posts: 6806
31
I don't know Fortran, but first, there's no need for anything as complicated as OpenMP for such a small job. You just want to launch seventeen or eighteen processes at once, right? A shell script can handle that. Your program is already just a shell wrapper, so what is the wrapper for?
Second, when running seventeen or eighteen processes at once, the input can be the same file (but be careful not to open it exclusively); using the same file for output is asking for trouble. Looking at your program, the command line that invokes mymodel.exe never changes, which is most likely the source of the trouble.
O*******d
Posts: 20343
32
Why use OpenMP for reading input files? The bottleneck for file input isn't the CPU, it's hardware I/O.
y****n
Posts: 15
33
The program below uses OpenMP to run an algorithm similar to linear image interpolation.
The input is Z (image), X (coordinates), Y (coordinates); the output is F (image).
To avoid concurrent writes to the same element of F, it uses #pragma omp atomic.
The problem I'm running into: with the thread count set to 1 versus 2, the program produces different results. I can't figure out where the problem is. Could the experts here please take a look?
#pragma omp parallel for
for (int n = 0; n < MN; n++)
{
    double y = Y[n];
    double x = X[n];
    int fx = (int)floor(x);
    int fy = (int)floor(y);

    if (fx < 1 || x > nw || fy < 1 || y > nh) // image index is [1...nw]
    {
        for (int i = 0; i < ndim; i++)
        {
            #pragma omp atomic
            F[n+i*MN] += Z... (read full post)
t****t
Posts: 6806
34
I don't know OpenMP, but does atomic support floating-point numbers? I actually don't think so...
p***o
Posts: 1252
35
Rather than agonizing over this, just use TBB. Besides, would OpenMP really be dumb enough to create new threads every time instead of using a thread pool?
g****n
Posts: 13
36
From topic: Computation board - Beginner-level OpenMP question
Hi,
I am new to OpenMP and have some questions about it. I wrote a very simple program in C++:
#include <stdio.h>
#include <omp.h>
int main()
{
    int nthreads, tid;
    int i;
    omp_set_num_threads(2);
    printf("Number of CPUs: %d\n", omp_get_num_procs());
    /* Fork a team of threads giving them their own copies of variables */
#pragma omp parallel private(tid)
    {
        tid = omp_get_thread_num();
        if (tid == 0)
        {
            printf("tid=%d thread = %d\n", 0, tid);
            printf("there are %d threads\n", omp_get_num_threads
t*******t
Posts: 1067
37
From topic: Computation board - An OpenMP question
Is anyone here using OpenMP? I have a basic question: in the directive below, if I have many private variables, more than one line's worth, how do I continue the directive onto the next line? Thanks.
!$OMP PARALLEL DO SHARED(n,a), PRIVATE(i,j,k,su,....)
t******0
Posts: 629
38
I found this tutorial online: http://openmp.org/mp-documents/omp-hands-on-SC08.pdf
and wrote the following Hello World program, running under VC2012:
#include <stdio.h>
#include <omp.h>
#include <stdlib.h> // for system("pause")
int main()
{
    omp_set_num_threads(4);
#pragma omp parallel
    {
        int ID = omp_get_thread_num();
        printf("Hello(%d)", ID);
        printf("World(%d)\n", ID);
    }
    system("pause"); // not in the course slides
    return 0;        // not in the course slides
}
The output is just:
Hello(0)World(0)
Press any key to continue...
The promised threads 1, 2, and 3 are nowhere to be seen... where did I go wr... (read full post)
U***g
Posts: 330
39
You didn't add the OpenMP flag when compiling.
y**b
Posts: 10166
40
MPI has long been able to do shared-memory computation, communicating within a single machine's memory, so how could its performance be bad?
Pure MPI often outperforms MPI+OpenMP; that was also the case in my work, though there are surely cases where it isn't.
The key point is that MPI is far more complex to design and implement than OpenMP. For a given project, the schedule often doesn't allow an MPI implementation (design, development, debugging, and large-scale testing can easily take half a year), whereas OpenMP is simple and can usually be done in days or weeks.
But once the MPI version is done, OpenMP is no match for it. OpenMP can only run on a single node or workstation; MPI has no such limit, and the power of running across hundreds or thousands of nodes in parallel is incomparable.
s******u
Posts: 501
41
From topic: Programming board - intel knights landing 72core CPU, who has used it?
Bad. OpenMP scaling clearly has problems: with 72 cores and 280 threads, reaching 50-60x scaling is already very good. In short, OpenMP's optimization for massive thread counts still falls short; its sweet spot remains 8-32 parallel threads. Perhaps the kernel-thread model dictates that OpenMP threads have too much overhead; they are not lightweight like GPU threads. MPI can do well here, but with that many processes there isn't enough memory. The biggest advantage is that existing x86 code can be used directly (most of it already supports MPI+OpenMP), without forking the code to write CUDA as on a GPU and then maintaining two codebases.
y**b
Posts: 10166
42
From topic: Programming board - intel knights landing 72core CPU, who has used it?
Is there an explanation for that?
Is it related, overall, to the following factors: with MPI, manual partitioning (domain decomposition) determines the computation granularity, which is itself often an optimization; with OpenMP the machine decides the granularity, which is usually too fine, so the overhead is too large? Or does it have more to do with the compiler and the underlying hardware?
I work on a dense particle-collision simulation, and there too MPI was clearly better than OpenMP. The original plan was a hybrid MPI/OpenMP mode on a few thousand nodes, but in the end pure MPI turned out to be much faster, and simulations spanning five orders of magnitude all gave the same conclusion. Of course my simulation differs from the dedicated benchmarks, since other factors play a role: for example, a small amount of code is not amenable to OpenMP, there is locking in some places, and the algorithm could be improved further.
l******9
Posts: 579
43
I am also thinking about OpenMP.
But how do I make sure that OpenMP makes full use of the available cores?
Suppose that I have 24 CPUs, each with 6 cores (each core supports hyperthreading).
I have 10,000 computing tasks, and each of them needs 0.001 second.
Some of the tasks need to exchange data, which is very small.
Which task needs to send/receive data to/from which task is pre-defined; it is known before the program is run. But the exchange frequency may be very high.
I want to schedule task... (read full post)
y**b
Posts: 10166
44
[Forwarded from the Linux board]
Sender: yanb (大象,多移动一点点), Board: Linux
Title: How do I see which CPUs a program/process is using?
Posted: BBS Mitbbs (Tue Sep 25 01:10:18 2007)
The program uses MPI or OpenMP and runs on a Linux server with 8 Intel quad-core CPUs (that is, 32 cores). Is there a command that shows which CPUs the program is using, and at what utilization?
The main goal is to see directly whether the program is actually making use of MPI or OpenMP. For example with OpenMP, setting OMP_NUM_THREADS=4 or 8 or 16... all run, but given the processor layout only 4 seems meaningful, so what is actually happening with 8, 16, 32? Likewise for MPI: running
mpirun -np 8 (or 16, or 32...), are the processes really distributed across different CPUs?
c******n
Posts: 16666
45
This is a bit embarrassing: I'm not a CS major, and I wrote a small program to run simulations. It was fine with small data, but with large data the memory blew up first. Then I rented a large-memory server on EC2 and found it still ran very slowly. On closer inspection, one function is especially slow because it is O(n^2), so as the data size grows the computation time jumps by orders of magnitude. I only half understand this myself; could someone help improve the algorithm and give hints on how to apply OpenMP?
In short, it is a hydrology simulation that computes drainage (watershed) area, so the basic unit/object of the data is the node. There are two linked lists (please don't flame me for using these instead of vector; the code base is too sprawling to change easily, or could I add a vector now and copy the existing lists into it?), both storing pointers to nodes.
The first linked list stores pointers to all nodes, ordered by node ID, for convenient traversal of all nodes.
The second linked list (actually more than one) stores pointers to all nodes downstream of the current node; traversing it walks from the current node to the boundary of the mesh.
The drainage-area computation itself is the current node's own area plus the total area of all its upstream nodes.
For example, in the figure below,
a b c d e
... (read full post)
W***o
Posts: 6519
46
From topic: Programming board - Help needed
Try:
gcc -fopenmp -lpthread xxx.cpp
OpenMP work is easier in a Linux environment, though some Linux installs don't ship the OpenMP and MPI libraries. Last time I used OpenMP and MPI for multithreaded synch/barrier locks, I found Ubuntu didn't have these two libraries out of the box.
k**********g
Posts: 989
47
From topic: Programming board - Help with a DLL call problem
Step into the disassembly. Or use a CPU instruction profiler like AMD
CodeAnalyst or Intel VTune.
If this 0.5 second delay only occurs on the first call after application
launch, I think this is an inevitable cost for using OpenMP. If it happens
on every call then there is a need to investigate.
With the debugger attached, check how many OpenMP threads are created. Also
make sure the EXE and DLL are linking against the correct OpenMP library.
w***g
Posts: 5958
48
From topic: Programming board - intel knights landing 72core CPU, who has used it?
Do you have a benchmark? What you said is eye-opening for me. Of the projects I've seen, OpenBLAS has OpenMP and thread versions, OpenCV uses TBB, and FFTW uses OpenMP; I haven't seen a single-machine library that uses MPI. That you didn't run 32 MPI ranks is, I think, evidence that MPI can't be pushed all the way down. But even beating OpenMP with 4x8 or 8x4 would be impressive.
t*****z
Posts: 812
49
Assume the sparse matrix is stored in CRS format. Why does my OpenMP version not parallelize well?
#pragma omp parallel for private(i,j,t)
for (i = 0; i < n; i++) {
    t = 0.0;
    for (j = A.ptr[i]; j < A.ptr[i+1]; j++)
        t += A.value[j] * x[A.index[j]];
    y[i] = t;
}
n = 400,000. With 2, 4, or 8 threads the run time is about the same: faster than 1 thread with OpenMP, but about the same as 1 thread without OpenMP.
Any ideas for building an iterative solver?
z*******h
Posts: 346
50
Maybe I'm out of the loop, but I've never heard of using OpenMP or MPI on a Hadoop cluster. MPI simply can't be used there, and OpenMP isn't necessary anyway.