Finding duplicate article titles in my .bib file

Answer 1

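You can use perl to walk through the bib file, saving every title as a hash key with the line numbers where it occurs as the hash value, then loop over the hash and print each title whose value has more than one entry. To do so, create a file, e.g. "finddupls.pl", with the following content, change the bib file name, and run perl finddupls.pl in a terminal: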

#!perl
use strict;
use warnings;

my %seen = ();

my $line = 0;
open my $B, '<', 'file.bib' or die "Cannot open file.bib: $!";
while (<$B>) {
    $line++;
    # only look at lines which start with the title field
    next unless /^\s*title\s*=\s*(.+)$/i;
    # lower-case everything to be case-insensitive
    my $title = lc $1;
    # remove all non-alphanumeric characters, because bibtex could have " or { to encapsulate strings etc
    $title =~ s/[^a-z0-9 _-]//g;
    # remember every line number on which this title was seen
    $seen{$title} .= "$line,";
}
close $B;

# loop through the titles and count the number of lines found
foreach my $title (keys %seen) {
    # count number of elements separated by comma
    my $num = $seen{$title} =~ tr/,//;
    print "title '$title' found $num times, lines: $seen{$title}\n" if $num > 1;
}

# write sorted list into file
open my $S, '>', 'sorted_titles.txt' or die "Cannot write sorted_titles.txt: $!";
print {$S} join("\n", sort keys %seen), "\n";
close $S;

It prints something like this directly in the terminal:

title 'observation on soil moisture of irrigation cropland by cosmic-ray probe' found 2 times, lines: 99,1350,
title 'multiscale and multivariate evaluation of water fluxes and states over european river basins' found 2 times, lines: 199,1820,
title 'calibration of a non-invasive cosmic-ray probe for wide area snow water equivalent measurement' found 2 times, lines: 5,32,

It also writes a file, sorted_titles.txt, listing all titles in alphabetical order, which you can scan by hand to spot duplicates.
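For a quick check without saving a script, the same idea (normalize each title, then report those seen more than once) can be sketched in shell with awk. This is only a sketch: dup_titles is a hypothetical helper name, and it assumes each title field sits on a single line of the form title = ..., as in the Perl script above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer):
# print "<count> <normalized title> lines: <line numbers>" for repeated titles.
dup_titles() {
    awk '
        # only lines whose field name is exactly "title" (any case)
        tolower($0) ~ /^[ \t]*title[ \t]*=/ {
            t = tolower($0)
            sub(/^[ \t]*title[ \t]*=[ \t]*/, "", t)   # drop the field name
            gsub(/[^a-z0-9 _-]/, "", t)               # drop braces, quotes, commas
            lines[t] = lines[t] NR ","                # remember the line number
            count[t]++
        }
        END {
            for (t in count)
                if (count[t] > 1)
                    printf "%d %s lines: %s\n", count[t], t, lines[t]
        }
    ' "$1"
}
```

Call it as dup_titles file.bib. The order of the reported titles is not defined, so pipe the result through sort if a stable order matters.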

Answer 2

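If you can trust that the title fields are written identically, it is very simple:

grep -n 'title =' bibliography.bib | sort -s -k 2 | uniq -cdf 1

This prints only the non-unique lines (-d), together with the number of times each occurs (-c) and the line number at which it appears in bibliography.bib (-n); -f 1 tells uniq to ignore the first field, which is that line number. Because uniq only compares adjacent lines, sort -s -k 2 first groups identical title lines together; -s (supported by GNU and BSD sort) keeps them in file order, so the line number shown is that of the first occurrence.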

So if you get a line like this:

     2 733:  title =    {Ethica Nicomachea},

you know that title = {Ethica Nicomachea}, occurs twice, and that the first occurrence is on line 733 of the .bib file.
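Since uniq only compares neighbouring lines, duplicates separated by other entries are caught only if identical title lines are grouped first. A minimal sketch of that variant follows; find_dup_title_lines is a hypothetical wrapper name, and the -s (stable) option is assumed to be available, as it is in GNU and BSD sort.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the original answer).
# grep -n      : prefix each matching line with its line number
# sort -s -k 2 : group lines by everything after the line-number field,
#                keeping file order among identical titles
# uniq -cd -f 1: print one line per repeated title, with its count,
#                ignoring the line-number field when comparing
find_dup_title_lines() {
    grep -n 'title =' "$1" | sort -s -k 2 | uniq -cd -f 1
}
```

Run it as find_dup_title_lines bibliography.bib. Note that the pattern 'title =' also matches booktitle = lines, so repeated book titles are reported too.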
