THU board - parsing bibliography and sorting
c******n
Posts: 4965
1
My wife was spending a lot of time sorting the bibliography of her thesis.
Her bib came in plain-text form, so to sort it I first have to parse out the
first author's last name from each entry. I wrote this little piece of code
for that. Hope it will be useful for someone else too....
Right now it fails to parse single-author entries, because it's difficult to
tell a human name from other words. But in biology, a paper mostly has
multiple authors.
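use strict;
use warnings;

# Return the sort key for one bibliography line: the first author's last name.
sub get_first_author {
    my ($line) = @_;
    # Cut off the first two chunks at a comma, the word "and", or a digit.
    # \b keeps "and" from matching inside names such as "Sanders".
    my ($author, $second_possible) = split /,|\band\b|\d/, $line, 3;
    my $lastname = find_author_lastname($author);
    # Fall back to the second chunk, e.g. when the first one is only initials.
    return $lastname ne '' ? $lastname : find_author_lastname($second_possible);
}

# Guess the last name in a chunk: the longest word that is not initials.
sub find_author_lastname {
    my ($author) = @_;
    return '' unless defined $author;
    my @candidates_for_last;
    foreach my $s (split /[, ]+/, $author) {
        next if uc($s) eq $s;                 # all upper case (or empty)
        next if length($s) == 1;              # only a single letter
        next if $s =~ /^([[:alpha:]]\.)+$/;   # A.B.C. pattern
        push @candidates_for_last, $s;
    }
    @candidates_for_last = sort { length($b) <=> length($a) } @candidates_for_last;
    return $candidates_for_last[0] // '';     # '' when nothing looks like a name
}

# Sort the input lines by first author's last name (Schwartzian transform).
print join "", map { $_->[1] } sort { $a->[0] cmp $b->[0] }
    map { [ get_first_author($_), $_ ] } <>;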
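
A small sketch of what the split step does, in case it helps when adapting the pattern. The sample entry is made up for illustration, and the variable names are hypothetical. It compares a bare `and` in the pattern with a word-bounded `\band\b`; the bare form also matches inside names that happen to contain "and".

```perl
use strict;
use warnings;

# Hypothetical bibliography entry, made up for illustration.
my $line = "Sanders, J.K., Lee, M. and Brown, T. 2003. Some title.";

# Limit 3: first chunk, second chunk, and the untouched remainder.
my @plain   = split /,|and|\d/,     $line, 3;   # bare "and"
my @bounded = split /,|\band\b|\d/, $line, 3;   # "and" only as a whole word

print "plain:   [$plain[0]] [$plain[1]]\n";
print "bounded: [$bounded[0]] [$bounded[1]]\n";
```

With the bare pattern the first chunk is cut inside "Sanders", so the entry would end up sorted under a fragment of the name. The word-bounded pattern keeps the name whole.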