How to find the common paths from a list of paths/files

Question 1

This answer uses Python. As the OP wanted to remove the directories covered by their parents as I had seen as a possiblity I began writing a different program to remove coverings:

Example:

$ echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2\n/home/phil\n/home/phil/file1' | removecoverings 
/home/phil
/home/dave

Code of the removecoverings command:

#!/usr/bin/env python2

import sys

def list_startswith(a, b):
    if not len(a) >= len(b):
        return False
    return all(x == y for x,y in zip(a[:len(b)],b))

def removecoverings(it):
    g = list(it)
    g.sort(key=lambda v: len(v.split('/')), reverse=True)
    o = []
    while g:
        c = g.pop()
        d = []
        for v in g:
            if list_startswith(v.split('/'), c.split('/')):
                d.append(v)
        for v in d:
            g.remove(v)
        o.append(c)
    return o

for o in removecoverings(l.strip() for l in sys.stdin.readlines()):
    print o

This answer uses Python. It also does a component-wise rather than string-wise common prefix. Better for paths as the common prefix of /ex/ample and /exa/mple should be / not /ex. This assumes that what is wanted is the greatest common prefix and not a list of prefixes with their coverings removed. If you have /home/dave /home/dave/file1 /home/phil /home/phil/file2 and expect /home/dave /home/phil rather than /home. This is not the answer you would be looking for.

Example:

$ echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2' | commonprefix 
/home/dave

Code of the commonprefix command:

#!/usr/bin/env python2

import sys

def commonprefix(l):
    # this unlike the os.path.commonprefix version
    # always returns path prefixes as it compares
    # path component wise
    cp = []
    ls = [p.split('/') for p in l]
    ml = min( len(p) for p in ls )

    for i in range(ml):

        s = set( p[i] for p in ls )         
        if len(s) != 1:
            break

        cp.append(s.pop())

    return '/'.join(cp)

print commonprefix(l.strip() for l in sys.stdin.readlines())

Answer

This answer uses Python. As the OP wanted to remove the directories covered by their parents as I had seen as a possiblity I began writing a different program to remove coverings:

Example:

$ echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2\n/home/phil\n/home/phil/file1' | removecoverings 
/home/phil
/home/dave

Code of the removecoverings command:

#!/usr/bin/env python2

import sys

def list_startswith(a, b):
    if not len(a) >= len(b):
        return False
    return all(x == y for x,y in zip(a[:len(b)],b))

def removecoverings(it):
    g = list(it)
    g.sort(key=lambda v: len(v.split('/')), reverse=True)
    o = []
    while g:
        c = g.pop()
        d = []
        for v in g:
            if list_startswith(v.split('/'), c.split('/')):
                d.append(v)
        for v in d:
            g.remove(v)
        o.append(c)
    return o

for o in removecoverings(l.strip() for l in sys.stdin.readlines()):
    print o

This answer uses Python. It also does a component-wise rather than string-wise common prefix. Better for paths as the common prefix of /ex/ample and /exa/mple should be / not /ex. This assumes that what is wanted is the greatest common prefix and not a list of prefixes with their coverings removed. If you have /home/dave /home/dave/file1 /home/phil /home/phil/file2 and expect /home/dave /home/phil rather than /home. This is not the answer you would be looking for.

Example:

$ echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2' | commonprefix 
/home/dave

Code of the commonprefix command:

#!/usr/bin/env python2

import sys

def commonprefix(l):
    # this unlike the os.path.commonprefix version
    # always returns path prefixes as it compares
    # path component wise
    cp = []
    ls = [p.split('/') for p in l]
    ml = min( len(p) for p in ls )

    for i in range(ml):

        s = set( p[i] for p in ls )         
        if len(s) != 1:
            break

        cp.append(s.pop())

    return '/'.join(cp)

print commonprefix(l.strip() for l in sys.stdin.readlines())

Question 2

Given that the input is sorted, The pseudo code would be:

$seen = last_line;
if current_line begins exactly as $seen then next
else { output current_line; $seen = current_line }

Translating into Perl code (Yes Perl, the most beautiful script language of all):

perl -e '
my $l = "\n";
while (<>) {
    if ($_ !~ /^\Q$l/) {
        print;
        chomp;
        $l = $_;
    }
}
'

Credit: Ben Bacarisse @bsb.me.uk, from comp.lang.perl.misc. Thanks Ben, it works great!

Answer

Given that the input is sorted, The pseudo code would be:

$seen = last_line;
if current_line begins exactly as $seen then next
else { output current_line; $seen = current_line }

Translating into Perl code (Yes Perl, the most beautiful script language of all):

perl -e '
my $l = "\n";
while (<>) {
    if ($_ !~ /^\Q$l/) {
        print;
        chomp;
        $l = $_;
    }
}
'

Credit: Ben Bacarisse @bsb.me.uk, from comp.lang.perl.misc. Thanks Ben, it works great!

Question 3

And, the one liner version of xpt's answer. Again, assuming sorted input:

perl -lne 'BEGIN { $l="\n"; }; if ($_ !~ /^\Q$l/) { print $_; $l = $_; }'

Run on the example input

/home/dave
/home/dave/file1
/home/dave/sub2/file2
/home/phil
/home/phil/file2

using

echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2\n/home/phil\n/home/phil/file2' | perl -lne 'BEGIN { $l="\n"; }; if ($_ !~ /^\Q$l/) { print $_; $l = $_; }'

gives

/home/dave
/home/phil

The magic is in the command-line arguments to perl: -e allows us to give a script on the command line, -n iterates over the lines of the file (placing each line in $_), and -l deals with newlines for us.

The script works by using l to track the last prefix seen. The BEGIN block is run before the first line is read, and initializes the variable to a string that won't be seen (no newlines). The conditional is run on each line of the file (held by $_). The conditional is executed on all of the lines of the file, and says "if the line does not have the current value of l as a prefix, then print the line and save it as the value of l." Because of the command-line arguments, this is essentially identical to the other script.

The catch is that both scripts assume that the common prefix exists as its own line, so don't find the common prefix for input like

/home/dave/file1
/home/dave/file2

Answer

And, the one liner version of xpt's answer. Again, assuming sorted input:

perl -lne 'BEGIN { $l="\n"; }; if ($_ !~ /^\Q$l/) { print $_; $l = $_; }'

Run on the example input

/home/dave
/home/dave/file1
/home/dave/sub2/file2
/home/phil
/home/phil/file2

using

echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2\n/home/phil\n/home/phil/file2' | perl -lne 'BEGIN { $l="\n"; }; if ($_ !~ /^\Q$l/) { print $_; $l = $_; }'

gives

/home/dave
/home/phil

The magic is in the command-line arguments to perl: -e allows us to give a script on the command line, -n iterates over the lines of the file (placing each line in $_), and -l deals with newlines for us.

The script works by using l to track the last prefix seen. The BEGIN block is run before the first line is read, and initializes the variable to a string that won't be seen (no newlines). The conditional is run on each line of the file (held by $_). The conditional is executed on all of the lines of the file, and says "if the line does not have the current value of l as a prefix, then print the line and save it as the value of l." Because of the command-line arguments, this is essentially identical to the other script.

The catch is that both scripts assume that the common prefix exists as its own line, so don't find the common prefix for input like

/home/dave/file1
/home/dave/file2

How to find the common paths from a list of paths/files

Antwort1

Antwort2

Antwort3

verwandte Informationen