Java board - [Repost] a question on XML parser
l***r
Posts: 459
1
[The following text is reposted from the Programming board; the original follows]
From: laoer (You know what!), Board: Programming
Subject: a question on XML parser
Site: Unknown Space - 未名空间 (Tue Jun 15 22:29:14 2004) WWW-POST
Greetings,
I have several "<.." in one file. Right now, I first divide this
file into many strings. Each string is one XML record. Then I use the Java SAX
parser to parse each one. The dividing and parsing turn out to be very slow.
Is there a better way, such as parsing all the records in this file in one
pass?
Tha
x***n
Posts: 39
2
My guess is that your file operations are what take so long.
Post the snippet where you chop the file into pieces.

[In reply to l***r]

z****g
Posts: 2497
3
Why are you dividing the file into strings?
A SAX parser is a progressive (streaming) parser: it goes through the input
sequentially, line by line.
If you use JDOM, it reads the whole file into memory.
For large files, a SAX parser performs better than a DOM
parser.
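
To make the streaming point concrete, here is a minimal sketch using the JDK's built-in SAX API. The class name `SaxStreamingDemo`, the helper `countElements`, and the sample element names are made up for illustration; nothing here comes from the original poster's code. The handler gets a callback for each start tag as the parser reads, so no tree of the whole document is ever built.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxStreamingDemo {
    /** Counts start tags with the given name (hypothetical helper for illustration). */
    static int countElements(String xml, String name) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Called once per start tag, in document order.
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input; a real run would pass a file-backed InputSource.
        String xml = "<records><hit/><hit/><other/></records>";
        System.out.println(countElements(xml, "hit")); // prints 2
    }
}
```

For a large file one would hand the parser a `FileInputStream` or `File` instead of a string, so that the input is streamed as well.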

[In reply to l***r]

l***r
Posts: 459
4
Sorry, I guess I didn't describe the problem clearly. The XML file looks like
this:
[the archive stripped the markup; only the tail of each DOCTYPE line survives]

"NCBI_BlastOutput.dtd">
...

"NCBI_BlastOutput.dtd">
...

"NCBI_BlastOutput.dtd">
...

[In reply to z****g]

w*r
Posts: 2421
5
OK, your XML file is not well-formed, so it can't be parsed as it is. My
suggestion: write a class that first gets rid of all the document headers
and puts all the records, well-formed, into one file (or stream).
Then all you need to do is write an XSLT to transform the
XML into whatever format you want and parse that into your application.
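
A rough sketch of the header-stripping step, under some loud assumptions: every XML declaration and every DOCTYPE sits on a single line of its own (as the sample in this thread suggests but does not prove), the DTD defines no entities that the records rely on, and a wrapper root element is acceptable downstream. The class `HeaderStripper`, the method `mergeRecords`, and the root name `records` are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class HeaderStripper {
    /**
     * Copies the input to the output, dropping every XML declaration and
     * DOCTYPE line and wrapping what is left in one synthetic root element.
     * Assumes each header occupies exactly one line.
     */
    static void mergeRecords(Reader in, Writer out, String rootName) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        out.write("<" + rootName + ">\n");
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.startsWith("<?xml") || trimmed.startsWith("<!DOCTYPE")) {
                continue; // skip the repeated document headers
            }
            out.write(line);
            out.write('\n');
        }
        out.write("</" + rootName + ">\n");
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Made-up two-document input standing in for the real file.
        String input = "<?xml version=\"1.0\"?>\n<!DOCTYPE r SYSTEM \"r.dtd\">\n<r>1</r>\n"
                     + "<?xml version=\"1.0\"?>\n<!DOCTYPE r SYSTEM \"r.dtd\">\n<r>2</r>\n";
        StringWriter merged = new StringWriter();
        mergeRecords(new StringReader(input), merged, "records");
        System.out.print(merged);
    }
}
```

The result is a single well-formed document, so one SAX pass or one XSLT run can cover all records. With a `FileReader` and `FileWriter` the copy runs in constant memory.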

[In reply to l***r]

z****g
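Posts: 2497
6
Re-read the SAX parser sample code.
Your understanding is wrong.
A SAX parser reads each element in sequence.
Also, your XML doc seems to have some problems.
Why are there so many XML declarations and DTDs? They are like an HTML header:
there should be only one.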
l***r
Posts: 459
7

Really? What's my mistake?

It should be no problem, because this file is created by a commercial program.
And my SAX parser works for this format.

[In reply to z****g]

z****g
Posts: 2497
8
It does not read chunk by chunk; it reads
in order, or you could say line by line.
DOM is the one where the whole file gets fed in.
Got it?

[In reply to l***r]

x***n
Posts: 39
9
1. Chop your monolithic(?) file (a collection of XML documents) into a
collection of XML files, and parse them one by one.
2. Find a fast way to feed one XML document (part of the file) to a parser,
then do a second parse for the second XML DOCUMENT (unfortunately
it's the second part of your physical file), and so on.
1 or 2.

[In reply to z****g]

w*r
Posts: 2421
10
Man, you've got no other choice. Your XML doc has multiple XML declaration
headers;
what can you expect from the parser? Magic? NO! All you can do is
design your own 'feeder' for the parser: skip the declaration part and feed
each record to the parser. Both 1 and 2 will work; it just depends on how big
your file is. If it's millions and millions of records, I suggest 2; if it's a
small number of records, 1 is OK.
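
One possible shape for such a feeder, as a sketch only. It is a variant of option 2: rather than skipping the headers, it cuts the input at each line that starts with `<?xml`, hands every document (header included) to one reused SAX parser, and resolves the external DTD to an empty source so the parser never goes looking for the .dtd file. The names `RecordFeeder`, `feed`, and `CountingHandler` are hypothetical, and the code assumes each declaration starts its own line and that a single record fits comfortably in memory.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RecordFeeder {
    /** Example handler: counts elements and ignores the external DTD. */
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // Hand back an empty DTD instead of fetching the referenced file.
            return new InputSource(new StringReader(""));
        }
    }

    /** Parses each concatenated document in turn; returns how many were parsed. */
    static int feed(Reader in, DefaultHandler handler) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BufferedReader reader = new BufferedReader(in);
        StringBuilder doc = new StringBuilder();
        int parsed = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("<?xml")) {
                // A new declaration closes the document collected so far.
                if (doc.toString().trim().length() > 0) {
                    parser.parse(new InputSource(new StringReader(doc.toString())), handler);
                    parsed++;
                }
                doc.setLength(0);
            }
            doc.append(line).append('\n');
        }
        if (doc.toString().trim().length() > 0) {
            parser.parse(new InputSource(new StringReader(doc.toString())), handler);
            parsed++;
        }
        return parsed;
    }
}
```

Only one record is buffered at a time, so memory use follows the largest record rather than the whole file, and the file is read once instead of being split up front.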


[In reply to x***n]

x***n
Posts: 39
11
Do you expect him to do a big project?
// BTW, I'm not the one doing this project.

[In reply to w*r]
