a********r Posts: 218 | 1 There are many large files. How can I quickly find the phone numbers in these
files? A phone number has two formats:
0123456789
012 345 6789
He gave example files in a Google document:
File 1:
From: s***[email protected]
To: b*[email protected]
Subject: Request #12365712
Date: Fri, 2 Sep 2011 10:34:83
Hi Bob
You can contact our sales department on 9379273429 or our marketing
department on 038 739 7392.
Regards
Jenny
File 2:
ACME consulting
Contact information:
Enquires: 187 834 9382
In the examples above, how do you find the phone numbers 9379273429,
038 739 7392, and 187 834 9382?
Could the experts here give me some pointers? Many thanks!!! | g*********g Posts: 114 | 2 1. Write a small program.
Open the folder and open every file.
You can decide on all the conditions that define a phone number.
In this case, I would normalize the phone number format as follows.
First, delete the spaces using a Replace method.
Then remove every '-', every '(', and every ')'.
Then use IndexOf (or InStr) to find 0123456789.
If it is found, save the folder path and file name to a file using the
StreamWriter class.
2. The files look like emails,
so you should be able to import them into Outlook. Then use Outlook's search
function to search the title, content, From, and To fields and find all the files
that include this phone number. | c****p Posts: 6474 | 3 Just use a state machine.
【In reply to a********r】
| a********r Posts: 218 | 4 Could you elaborate? Many thanks.
【In reply to c****p】
| a********r Posts: 218 | 5 Thanks for your reply.
I think he just wants to extract the phone numbers from the other text in the
files rather than search for a specific phone number. The file format could be
any type, not limited to email or HTML ...
【In reply to g*********g】
| g*****x Posts: 799 | 6 A regular expression will do the trick:
grep -rowE "[0-9]{10}|[0-9]{3} [0-9]{3} [0-9]{4}" /files | S**********n Posts: 250 | 7 I also favor the state machine solution.
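Just draw the DFA (deterministic finite automaton) diagram, and the state machine
code follows naturally from it.
I had almost exactly the same problem as homework in the first week of a compiler
course, with three phone number formats:
1234567890
123-456-7890
(123)456-7890
The one difference from your question is that we were not required to handle
"many" "large" files. I don't know whether you also need to take that into account.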
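A minimal sketch of the state machine idea for the two formats in the original question (0123456789 and 012 345 6789), in Python. The names `find_phone_numbers` and `read_chars` are made up for illustration, and the boundary rule is an assumption: any non-digit character ends a digit run, and the groups must be separated by exactly one space. Instead of one named state per DFA node, the state is kept in three small variables (the current digit run, the last few completed runs, and whether exactly one space followed the last run), so memory stays constant however large the file is.

```python
from collections import deque
from typing import Iterable, Iterator


def find_phone_numbers(chars: Iterable[str]) -> Iterator[str]:
    """Yield numbers shaped like 0123456789 or 012 345 6789 from a stream of characters."""
    groups = deque(maxlen=3)  # most recent digit runs that were joined by single spaces
    run = []                  # digits of the run being read, capped at 11 characters
    single_space = False      # True if exactly one space followed the last digit run

    def end_run():
        text = "".join(run)
        run.clear()
        if len(text) == 10:                       # plain format: 0123456789
            yield text
        groups.append(text)
        if [len(g) for g in groups] == [3, 3, 4]:  # grouped format: 012 345 6789
            yield " ".join(groups)

    for c in chars:
        if "0" <= c <= "9":
            if not run and not single_space:
                groups.clear()      # this run is not chained to the previous one
            if len(run) <= 10:      # an 11th digit is enough to know the run is too long
                run.append(c)
        elif run:
            yield from end_run()
            single_space = (c == " ")
        else:
            single_space = False    # a second separator breaks the chain
    if run:                         # flush a run that ends at end of input
        yield from end_run()


def read_chars(path: str, chunk_size: int = 1 << 20) -> Iterator[str]:
    """Stream a file one character at a time, reading it in fixed-size chunks."""
    with open(path, encoding="utf-8", errors="replace") as f:
        while chunk := f.read(chunk_size):
            yield from chunk
```

Feeding it the text of File 1 and File 2 from the first post, for example `list(find_phone_numbers(text))`, gives 9379273429, 038 739 7392 and 187 834 9382; the 8-digit run in "Request #12365712" is skipped. For files on disk the same scanner can be driven with `find_phone_numbers(read_chars(path))`. A per-character loop in pure Python is slow, so for really large inputs the same transitions would probably be better written in a compiled language.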
【In reply to a********r】