b****t (posts: 114) | 1
Hi All,
I did a quick search, but no luck.
Say I use qsub to submit multiple jobs to a cluster. Each job, running on a
computing node, will create a folder and copy a file into that folder. That
means it's possible that multiple computing nodes will be copying the same file
to different folders at the same time. (Note that all the folders are on a
shared filesystem.)
Do you think that would be problematic? I believe that, occasionally, there were
crashes in which the copied file was corrupted and became empty. If the problem
is caused by these concurrent copies, how can I fix it?
Thanks very much,
Beet | m********5 (posts: 17667) | 2
Your problem description is too unclear.
Please try to express your problem with a short piece of pseudocode.
[In reply to b****t's post above]
| M*****r (posts: 1536) | 3
1. Does the code check whether the directory/file already exists?
2. Is there any log file for all the mkdir/cp/scp operations?
[In reply to b****t's post above]
| b****t (posts: 114) | 4
Sorry for the confusion, but the question is fairly general.
Say I have one file, fieldinput.DATA, in the current folder.
Now I have 20 parallel jobs; job i does the following:
1. mkdir job$i
2. cp fieldinput.DATA job$i
So across these 20 jobs on 20 computing nodes, it's possible that multiple jobs
are copying the same fieldinput.DATA to their corresponding folders job$i at the
same time.
Would that be problematic? Could it corrupt the file fieldinput.DATA?
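The two steps above, as a runnable sketch (the job index is hard-coded here so it runs stand-alone; on the cluster it would come from the scheduler, e.g. an array-job variable under qsub):

```shell
#!/bin/sh
# Sketch of one job's two setup steps. $i is hard-coded for illustration;
# in a real qsub array job it would come from the scheduler environment.
i=1
echo "sample data" > fieldinput.DATA   # stand-in for the real input file
mkdir -p "job$i"                       # -p: no error if the folder already exists
cp fieldinput.DATA "job$i/" || { echo "copy failed for job$i" >&2; exit 1; }
```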
Thanks again for your comments.
Beet
[In reply to m********5's post above]
| l*******G (posts: 1191) | 5
How big is the file? If it is more than 1 GB and you are using NFS, you may
have a problem. The Lustre file system handles this better. | b****t (posts: 114) | 7
They are small files. Thanks.
I think this must be a very frequently asked question, but surprisingly there
are no quick answers on the net.
So, a general question:
Could it possibly be a problem if A FILE were copied at the SAME TIME by
multiple applications to different places on Linux?
Regards,
Beet
[In reply to l*******G's post above]
| l*******G (posts: 1191) | 8
Because a disk is a serial device, the same file can only be accessed by one
process at a given time.
If you use a lock-and-unlock mechanism, you will not have a problem.
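A minimal sketch of one lock-and-unlock approach, using an atomic mkdir as a poor-man's mutex (the lock-directory name is made up for illustration; this assumes a POSIX shell, and mkdir's create-or-fail behavior is atomic even on most shared filesystems):

```shell
#!/bin/sh
# Sketch: serialize concurrent copies with an atomic mkdir-based lock.
i=1
echo "sample data" > fieldinput.DATA     # stand-in for the real input
mkdir -p "job$i"

until mkdir fieldinput.lock 2>/dev/null; do
    sleep 1                              # another job holds the lock; wait
done
cp fieldinput.DATA "job$i/"              # critical section: at most one copier
rmdir fieldinput.lock                    # release the lock
```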
Instead of copying, why not just make a link? | b****t (posts: 114) | 9
Thanks for your suggestion, but the linked or copied file will still be read
simultaneously by the parallel jobs, so it is the same issue.
Beet
[In reply to l*******G's post above]
| w*******g (posts: 205) | 10
Agreed.
[In reply to l*******G's post above]