How to find the common path from a list of paths/files

Question 1

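This answer uses Python. Since the OP may want to remove directories that are covered by their parents (I think that is one possible reading of the question), I started by writing a separate program that removes covered entries: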

Example:

$ echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2\n/home/phil\n/home/phil/file1' | removecoverings 
/home/phil
/home/dave

Code for the removecoverings command:

#!/usr/bin/env python3

import sys

def list_startswith(a, b):
    # True if list a begins with every element of list b
    return len(a) >= len(b) and a[:len(b)] == b

def removecoverings(it):
    g = list(it)
    # Deepest paths first, so g.pop() always returns a shallowest remaining path
    g.sort(key=lambda v: len(v.split('/')), reverse=True)
    o = []
    while g:
        c = g.pop()
        # Drop every remaining path that lies under c (compared component-wise)
        g = [v for v in g if not list_startswith(v.split('/'), c.split('/'))]
        o.append(c)
    return o

# Skip blank lines: '' splits to [''], which would "cover" every absolute path
for o in removecoverings(l.strip() for l in sys.stdin if l.strip()):
    print(o)

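This answer uses Python. It also computes the common prefix component-wise rather than string-wise. For paths, it is better for /ex/ample and /exa/mple to have / as their common prefix, not /ex. This assumes that what is wanted is the single largest common prefix, not a list of prefixes with the covered entries removed. If you have /home/dave /home/dave/file1 /home/phil /home/phil/file2 and expect /home/dave /home/phil rather than /home, this is not the answer you are looking for.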

Example:

$ echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2' | commonprefix 
/home/dave

Code for the commonprefix command:

#!/usr/bin/env python3

import sys

def commonprefix(l):
    # Unlike os.path.commonprefix, which compares character by character,
    # this compares path components, so it always returns a whole path prefix
    ls = [p.split('/') for p in l]
    if not ls:
        return ''
    cp = []
    # zip stops at the end of the shortest path
    for parts in zip(*ls):
        s = set(parts)
        if len(s) != 1:
            break
        cp.append(s.pop())
    # Absolute paths sharing only the root leave cp == ['']; that prefix is '/'
    if cp == ['']:
        return '/'
    return '/'.join(cp)

print(commonprefix(l.strip() for l in sys.stdin if l.strip()))
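As a sketch of the component-wise versus string-wise difference described above, Python 3's standard library offers both behaviours: os.path.commonprefix works character by character, while os.path.commonpath (Python 3.5 and later) works on whole path components, much like the commonprefix script. The variable names are only for illustration:

```python
import os.path

paths = ['/ex/ample', '/exa/mple']

# Character-wise: the longest common string prefix, not a real directory here
string_prefix = os.path.commonprefix(paths)   # '/ex'

# Component-wise: the longest common sub-path
path_prefix = os.path.commonpath(paths)       # '/'

# The sample input from the example above
sample = ['/home/dave', '/home/dave/file1', '/home/dave/sub2/file2']
common_dir = os.path.commonpath(sample)       # '/home/dave'

print(string_prefix)
print(path_prefix)
print(common_dir)
```

Note that os.path.commonpath raises ValueError when given an empty list or a mix of absolute and relative paths, so input read from stdin may need checking first.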

Question 2

Assuming the input is sorted, the pseudocode would be:

$seen = a string that no line can start with
for each current_line:
    if current_line begins exactly with $seen then skip it
    else { output current_line; $seen = current_line }

Translated into Perl code (yes, Perl, the most beautiful scripting language):

perl -e '
my $l = "\n";
while (<>) {
    if ($_ !~ /^\Q$l/) {
        print;
        chomp;
        $l = $_;
    }
}
'

Credit: Ben Bacarisse @bsb.me.uk, from comp.lang.perl.misc. Thanks, Ben, it works great!
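The pseudocode and the Perl version compare plain string prefixes, so /home/davey would be skipped after /home/dave. Below is a minimal Python sketch of the same single-pass idea with two changes that the original answer does not make: it only treats a path as covered when the match ends at a '/', and it sorts by path components itself instead of assuming sorted input. The second change matters once the comparison is component-wise, because a character-wise sort can put a sibling such as /home/dave-old between /home/dave and /home/dave/file1. The name remove_covered is hypothetical.

```python
def remove_covered(lines):
    """Return the paths in lines that do not lie inside another listed path.

    Hypothetical helper: sorts by path components itself instead of
    assuming pre-sorted input, and compares whole components.
    """
    # Strip newlines, drop blank lines and duplicates.
    paths = {line.strip() for line in lines if line.strip()}
    # Sorting the component lists puts every directory directly before
    # everything beneath it ('/home/dave' < '/home/dave/file1' < '/home/dave-old').
    kept = []
    for path in sorted(paths, key=lambda p: p.split('/')):
        # Covered if it starts with the last kept path followed by '/'.
        if kept and path.startswith(kept[-1].rstrip('/') + '/'):
            continue
        kept.append(path)
    return kept


sample = ['/home/dave', '/home/dave/file1', '/home/dave/sub2/file2',
          '/home/davey', '/home/dave-old', '/home/phil', '/home/phil/file2']
result = remove_covered(sample)
print(result)   # ['/home/dave', '/home/dave-old', '/home/davey', '/home/phil']
```

To use it as a filter, pass sys.stdin instead of the sample list and print each returned path.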

Question 3

And here is a one-liner version of xpt's answer. Again, this assumes the input is sorted:

perl -lne 'BEGIN { $l="\n"; }; if ($_ !~ /^\Q$l/) { print $_; $l = $_; }'

Running it on the sample input

/home/dave
/home/dave/file1
/home/dave/sub2/file2
/home/phil
/home/phil/file2

using

echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2\n/home/phil\n/home/phil/file2' | perl -lne 'BEGIN { $l="\n"; }; if ($_ !~ /^\Q$l/) { print $_; $l = $_; }'

gives

/home/dave
/home/phil

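The magic is in perl's command-line switches: -e lets us give the script on the command line, -n loops over the lines of the input (putting each line in $_), and -l handles newlines for us (it chomps each input line and makes print append a newline).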

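The script works by using $l to keep track of the last prefix seen. The BEGIN block runs before the first line is read and initializes $l to a string that will never be seen at the start of a line (a newline; thanks to -l, the lines themselves contain none). The condition then runs on every line of the input (stored in $_) and says: "if the line does not have the current value of $l as a prefix, print the line and store it as the new value of $l." Because of the command-line switches, this is essentially the same as the other script.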
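To make the roles of the switches concrete, here is a rough Python equivalent of the one-liner (an illustrative sketch; the function name perl_style_filter is hypothetical). The for loop stands in for -n, stripping the newline stands in for -l, and setting last before the loop stands in for the BEGIN block. Like the Perl, it compares plain strings, so a line such as /home/davey would be hidden after /home/dave:

```python
def perl_style_filter(lines):
    # Mimics: perl -lne 'BEGIN { $l="\n"; }; if ($_ !~ /^\Q$l/) { print $_; $l = $_; }'
    last = "\n"                        # BEGIN block: a value no chomped line can start with
    out = []
    for line in lines:                 # -n: visit each input line in turn
        line = line.rstrip("\n")       # -l: chomp the trailing newline
        if not line.startswith(last):  # like $_ !~ /^\Q$l/ (\Q makes $l a literal string)
            out.append(line)           # print $_ (-l would add the newline back)
            last = line                # $l = $_
    return out


print(perl_style_filter(['/home/dave\n', '/home/dave/file1\n', '/home/phil\n']))
# ['/home/dave', '/home/phil']
```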

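The catch is that both scripts assume the common prefix is present as a line of its own, so they cannot find the common prefix of input such as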

/home/dave/file1
/home/dave/file2
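A small sketch that reproduces this limitation: running those two lines through the same prefix test keeps both of them, while a component-wise common prefix (here os.path.commonpath from Python 3.5+, used as a stand-in for the commonprefix script in the first answer) does recover /home/dave:

```python
import os.path

lines = ['/home/dave/file1', '/home/dave/file2']

# The one-liner's test: keep a line unless it starts with the last line kept.
kept, last = [], '\n'
for line in lines:
    if not line.startswith(last):
        kept.append(line)
        last = line

# Component-wise common prefix of all lines
common = os.path.commonpath(lines)

print(kept)     # ['/home/dave/file1', '/home/dave/file2']
print(common)   # /home/dave
```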
