l******9 Posts: 579 | 1 [The following text is reposted from the JobHunting board]
From: light009 (light009), Board: JobHunting
Subject: error counting the total number of lines in a txt file on MS-DOS
Posted at: BBS 未名空间站 (Thu Nov 20 18:34:45 2014, US Eastern)
I would like to find the total number of lines in a text file (> 60 GB) in MS-DOS.
I used:
findstr /R /N "^" file.txt | find /C ":"
But the returned result is a negative number.
Is it an overflow?
The file has no more than 5 billion lines.
A 4-byte signed integer ranges from −2,147,483,648 to 2,147,483,647.
So, do I need to design a script that counts the lines by dividing the result
by 1000?
If so, please help me design such a script in MS-DOS.
Thanks |
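A negative result is what a signed 32-bit counter would produce once the true count passes 2,147,483,647. The sketch below only illustrates that arithmetic; whether findstr or find really keeps its count in a 32-bit signed integer is an assumption, not something this thread confirms, and `wrap_int32` is a hypothetical helper name.

```python
def wrap_int32(n):
    """Interpret n as a signed 32-bit two's-complement integer."""
    n &= 0xFFFFFFFF                      # keep only the low 32 bits
    return n - 0x100000000 if n >= 0x80000000 else n


# Illustrative line counts, not measurements from the actual file.
for true_count in (2_000_000_000, 3_000_000_000, 5_000_000_000):
    print(true_count, "->", wrap_int32(true_count))
```

Counts between 2^31 and 2^32 come out negative, while a count above 2^32 can wrap back to a positive but still wrong value, so the sign alone does not show whether an overflow happened.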
S*A Posts: 7142 | 2 Linux has "wc -l"
That might just work out of the box. |
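S*A Posts: 7142 | 3 The wc source is here:
https://www.gnu.org/software/cflow/manual/html_node/Source-of-wc-command.html
The counter is an unsigned long, so a 64-bit build has a 64-bit counter. |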
l******9 Posts: 579 | 4 I used "wc -l" but I got a wrong number for a large file (120 GB).
The result is a positive number, but it is wrong.
Any help would be appreciated.
[Quoting S*A's post] : The wc source is here. : https://www.gnu.org/software/cflow/manual/html_node/Source-of-wc-command.html : The counter is an unsigned long, so a 64-bit build : has a 64-bit counter.
|
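w********m Posts: 1137 | 5 Use Python.
Space O(1), time O(n):
# Stream the file line by line; only the counter stays in memory.
cnt = 0
with open('file.txt', 'r') as infile:
    for _ in infile:
        cnt += 1
print(cnt)
Space O(n), time O(n/k):
# Let Spark split the file into partitions and count them in parallel.
import pyspark
sc = pyspark.SparkContext()
infile = sc.textFile('file.txt')
print(infile.count()) |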
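A variant of the O(1)-space loop above, sketched under the assumption that only the number of newline characters is needed (the same thing "wc -l" reports): reading fixed-size binary chunks avoids decoding text and avoids holding a very long line in memory. The function name `count_newlines` and the 1 MiB default chunk size are illustrative choices, not part of the original post.

```python
def count_newlines(path, chunk_size=1 << 20):
    """Count b'\\n' bytes in a file, reading chunk_size bytes at a time."""
    total = 0                            # Python ints do not overflow
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:                # empty bytes object means end of file
                break
            total += chunk.count(b"\n")
    return total
```

Calling `count_newlines('file.txt')` returns the count as an arbitrary-precision integer, so a multi-billion-line file cannot wrap around. A final line without a trailing newline is not counted, which matches "wc -l".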
n*****t Posts: 22014 | 6 Wrong in what way? Does it exceed 2^16?
[Quoting l******9's post] : I used "wc -l" but I got a wrong number for a large file (120 GB). : The result is a positive number but it is wrong. : Any help would be appreciated.
|
l******9 Posts: 579 | 7 I am not sure.
I used Cygwin to SSH into the server where the file is located.
Then I ran "wc -l" and got the wrong number, but I do not think it is an overflow
because it is positive, not negative.
[Quoting n*****t's post] : Wrong in what way? Does it exceed 2^16?
|
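A positive result does not rule out an overflow: an unsigned 32-bit counter never goes negative, and any 32-bit counter wraps back into positive territory once the true count passes 2^32. The sketch below lists every true count consistent with a wrapped value. It assumes a 32-bit counter, which the thread does not confirm, and the helper name `candidates`, the reported value, and the upper bound are all illustrative.

```python
MODULUS = 2 ** 32                        # assumed width of the wrapped counter


def candidates(reported, max_lines):
    """True line counts <= max_lines that a 32-bit counter shows as `reported`."""
    return list(range(reported % MODULUS, max_lines + 1, MODULUS))


# Illustrative numbers only: a file with exactly 5 billion lines
# would be reported as 705,032,704 by such a counter.
print(candidates(705_032_704, 5_000_000_000))
```

If the reported count plus a multiple of 4,294,967,296 matches an independent estimate (for example file size divided by average line length), a wrapped counter is a plausible explanation.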
l******9 Posts: 579 | 8 I am not allowed to install Python on the server.
I can only access the file remotely, so processing a large file (120 GB) this
way would take a very long time.
[Quoting w********m's post] : Use Python. : Space O(1), time O(n) : cnt = 0 : with open('file.txt', 'r') as infile: : for _ in infile: : cnt += 1 : print cnt : Space O(n), time O(n/k) : import pyspark : sc = pyspark.SparkContext()
|
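S*A Posts: 7142 | 9 Cygwin is 32-bit; the 64-bit version is only in alpha and has a lot of problems when you run it.
You need a 64-bit wc.
You could try 64-bit mingw. If it includes wc,
that wc should be a 64-bit build. |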
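Whether a 64-bit build gives a 64-bit counter depends on how wide unsigned long is on that platform, not only on the pointer size. On 64-bit Linux it is 64 bits, but native 64-bit Windows toolchains generally keep long at 32 bits, so this is worth checking rather than assuming. The sketch below reports both widths for the C ABI that the running Python interpreter was built against; it says nothing about any particular wc binary, and it assumes a machine where Python is available.

```python
import ctypes
import struct

# Width of C `unsigned long` for the ABI this interpreter was built against.
ulong_bits = ctypes.sizeof(ctypes.c_ulong) * 8

# Width of a C pointer, i.e. whether this is a 32-bit or 64-bit process.
pointer_bits = struct.calcsize("P") * 8

print("unsigned long:", ulong_bits, "bits")
print("pointer:", pointer_bits, "bits")
```

If unsigned long turns out to be 32 bits in a 64-bit process, a counter declared as unsigned long would still wrap at 4,294,967,296 lines.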