Java board - How can I quickly process more than ten thousand files?
c*******t
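Posts: 32
1
The program needs to read more than ten thousand files from disk quickly
and store their contents in a database.
Opening and reading the files one by one in a loop is too slow. Is there a better way?
Thanks.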
xt
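Posts: 17532
2

If that is too slow, then there is nothing to be done. I can't think of a faster way.

[c*******t wrote:]
: The program needs to read more than ten thousand files from disk quickly
: and store their contents in a database.
: Opening and reading the files one by one in a loop is too slow. Is there a better way?
: Thanks.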

c*******t
Posts: 32
3
Would multithreading make it any faster?

[xt wrote:]
:
: If that is too slow, then there is nothing to be done. I can't think of a faster way.

xt
Posts: 17532
4

Maybe. Hard to say.

[c*******t wrote:]
: Would multithreading make it any faster?
e***g
Posts: 158
5
Not likely; this is I/O bound.

[xt wrote:]
:
: Maybe. Hard to say.

m******t
Posts: 2416
6
Unless his "database" happens to be another file located
on the same hard drive, I think multithreading would improve
performance a lot. It would take some experimenting to find
the optimal number of worker threads, though.
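A minimal sketch of the worker-thread idea, not code from this thread. It assumes Java 8 or later; the class name `ParallelFileLoader`, the method `loadAll`, and the `sink` callback (a stand-in for the real database insert) are all hypothetical. The `main` method times a few worker counts, which is the kind of experiment suggested above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

public class ParallelFileLoader {

    // Reads every file on a fixed pool of `workers` threads and passes
    // (path, content) to `sink`, a hypothetical stand-in for the DB insert.
    // The sink must be thread-safe because several workers call it at once.
    public static int loadAll(List<Path> files, int workers, BiConsumer<Path, String> sink)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Path file : files) {
                pending.add(pool.submit(() -> {
                    try {
                        String content =
                                new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                        sink.accept(file, content);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for the task and rethrows any read or sink failure
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo data only: 200 small temp files. The OS will cache them, so
        // these timings say little about a real disk; use the real files.
        Path dir = Files.createTempDirectory("loader-demo");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            files.add(Files.write(dir.resolve("f" + i + ".txt"),
                    ("row " + i).getBytes(StandardCharsets.UTF_8)));
        }
        Map<Path, String> fakeDb = new ConcurrentHashMap<>(); // assumption: in-memory "database"
        for (int workers : new int[] {1, 2, 4, 8}) {
            long start = System.nanoTime();
            loadAll(files, workers, fakeDb::put);
            System.out.printf("%d workers: %d ms%n",
                    workers, (System.nanoTime() - start) / 1_000_000);
        }
    }
}
```

Whether more workers help depends on where the time goes: with a remote database the workers mostly overlap network waits, while on a single spinning disk extra readers may gain little.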

[e***g wrote:]
: Not likely; this is I/O bound.
r*****s
Posts: 985
7
The bottleneck here is the hard drive and the file system. Even if you
read the files sequentially, it won't be much different from
the multithreaded solutions, since the job is I/O bound only. Multithreading
helps only if the work is both I/O and CPU bound.
Therefore, you might need a high-performance file system, such as
IBM GPFS ...

[m******t wrote:]
: Unless his "database" happens to be another file located
: on the same hard drive, I think multithreading would improve
: performance a lot. It would take some experimenting to find
: the optimal number of worker threads, though.

xt
Posts: 17532
8

A SCSI drive will be good enough to handle that.

[r*****s wrote:]
: The bottleneck here is the hard drive and the file system. Even if you
: read the files sequentially, it won't be much different from
: the multithreaded solutions, since the job is I/O bound only. Multithreading
: helps only if the work is both I/O and CPU bound.
: Therefore, you might need a high-performance file system, such as
: IBM GPFS ...

e***g
Posts: 158
9
In that case, use a typical producer/consumer design: two threads should be
enough, with a queue in between. More threads writing to the database will cause
unnecessary concurrency control on an already busy database.
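A minimal sketch of the two-thread producer/consumer layout described here, not code from this thread. It assumes Java 8 or later; the names `ReaderWriterPipeline`, `run`, and the `dbWriter` callback (standing in for the real database insert) are hypothetical, and the queue capacity of 64 is an arbitrary choice.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class ReaderWriterPipeline {

    // Sentinel compared by identity; tells the consumer that input has ended.
    private static final String END = new String("<end-of-input>");

    // The calling thread is the producer (reads files); one extra thread is
    // the consumer (calls dbWriter, a hypothetical stand-in for the DB insert).
    public static int run(List<Path> files, Consumer<String> dbWriter)
            throws IOException, InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(64); // bounded: reader cannot run far ahead
        AtomicInteger written = new AtomicInteger();
        AtomicReference<RuntimeException> failure = new AtomicReference<>();

        Thread consumer = new Thread(() -> {
            try {
                for (String item = queue.take(); item != END; item = queue.take()) {
                    if (failure.get() != null) {
                        continue; // after a failure, keep draining so the producer never blocks
                    }
                    try {
                        dbWriter.accept(item);
                        written.incrementAndGet();
                    } catch (RuntimeException e) {
                        failure.set(e);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "db-writer");
        consumer.start();

        try {
            for (Path file : files) {
                queue.put(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
            }
        } finally {
            queue.put(END); // always let the consumer terminate
        }
        consumer.join();
        if (failure.get() != null) {
            throw failure.get();
        }
        return written.get();
    }

    public static void main(String[] args) throws Exception {
        // Demo data only: three small temp files and a println as the "database".
        Path dir = Files.createTempDirectory("pipeline-demo");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            files.add(Files.write(dir.resolve("f" + i + ".txt"),
                    ("row " + i).getBytes(StandardCharsets.UTF_8)));
        }
        int count = run(files, content -> System.out.println("stored: " + content));
        System.out.println("rows written: " + count);
    }
}
```

With a single consumer the database sees only one writer, which is the point made above about avoiding extra concurrency control on the database side.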

[m******t wrote:]
: Unless his "database" happens to be another file located
: on the same hard drive, I think multithreading would improve
: performance a lot. It would take some experimenting to find
: the optimal number of worker threads, though.

m******t
Posts: 2416
10

It's not the only bottleneck. Another potential bottleneck would
be the DB plus network round trip. A multithreaded design would allow
the application to do DB and local I/O concurrently (again, assuming
the DB is not local).
Also, we don't yet know the details of the OP's application, and
it's not unusual for some processing to happen to the data
once it has been read into memory. A multithreaded design would also
allow the application to improve its CPU utilization in that case.

[r*****s wrote:]
: The bottleneck here is the hard drive and the file system. Even if you
: read the files sequentially, it won't be much different from
: the multithreaded solutions, since the job is I/O bound only. Multithreading
: helps only if the work is both I/O and CPU bound.
: Therefore, you might need a high-performance file system, such as
: IBM GPFS ...

m******t
Posts: 2416
11

Well, it depends. If the data is written to different tables,
or to different pages in the same table, most modern database
products have very sophisticated concurrency support to avoid resource
contention.

[e***g wrote:]
: In that case, use a typical producer/consumer design: two threads should be
: enough, with a queue in between. More threads writing to the database will cause
: unnecessary concurrency control on an already busy database.

c********e
Posts: 383
12

First, on the server side, trust your database and let it do the optimization.
Second, on the client (your) side, if networking is really the bottleneck,
asynchronous handling could be a good measure.
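One possible reading of "asynchronous handling" on the client side, sketched with `CompletableFuture` (Java 8 or later); this is an illustration, not code from this thread. The names `AsyncDbWriter`, `loadAsync`, the `dbWrite` callback (standing in for the real database call), and the `maxInFlight` limit are all hypothetical. The read loop stays sequential and hands each write off without waiting for its round trip.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;

public class AsyncDbWriter {

    // Reads files one by one on the calling thread and dispatches each DB
    // write asynchronously. At most `maxInFlight` writes are pending at once,
    // so memory use stays bounded. dbWrite must be thread-safe.
    public static int loadAsync(List<Path> files, Consumer<String> dbWrite, int maxInFlight)
            throws IOException, InterruptedException {
        ExecutorService dbPool = Executors.newFixedThreadPool(maxInFlight);
        Semaphore slots = new Semaphore(maxInFlight);
        List<CompletableFuture<Void>> writes = new ArrayList<>();
        try {
            for (Path file : files) {
                String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                slots.acquire(); // blocks only when maxInFlight writes are already pending
                writes.add(CompletableFuture
                        .runAsync(() -> dbWrite.accept(content), dbPool)
                        .whenComplete((ok, err) -> slots.release()));
            }
            // join() rethrows a failed write as a CompletionException
            CompletableFuture.allOf(writes.toArray(new CompletableFuture[0])).join();
            return writes.size();
        } finally {
            dbPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo data only: five small temp files and a println as the "database".
        Path dir = Files.createTempDirectory("async-demo");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            files.add(Files.write(dir.resolve("f" + i + ".txt"),
                    ("row " + i).getBytes(StandardCharsets.UTF_8)));
        }
        int count = loadAsync(files, content -> System.out.println("stored: " + content), 2);
        System.out.println("writes completed: " + count);
    }
}
```

Writes may complete in a different order than the files were read, so this layout only fits when insertion order does not matter.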

[m******t wrote:]
:
: Well, it depends. If the data is written to different tables,
: or to different pages in the same table, most modern database
: products have very sophisticated concurrency support to avoid resource
: contention.
