
Auftakt:
Wie lassen sich bei einer sortierten Eingabe einer Liste von Pfaden/Dateien die gemeinsamen Pfade finden?
Um es in einen technischen Begriff zu übersetzen: Wie wählt man das kürzeste richtige Präfix aus der Standardeingabe aus, wenn die sortierte Eingabe von der Standardeingabe erfolgt?
Hier hat das „Präfix“ die normale Bedeutung, z. B. hat die Zeichenfolge „abcde“ das Präfix „abc“. Hier ist meine Beispieleingabe
$ echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2'
/home/dave
/home/dave/file1
/home/dave/sub2/file2
Dies ist ein Beispiel fürEntfernen Sie aufeinanderfolgende richtige Präfixevon der Standardeingabe aus mit dem folgenden Befehl sed
:
$ echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2' | sed "N; /^\(.*\)\n\1\//D; P; D"
/home/dave/file1
/home/dave/sub2/file2
Frage:
Meine Frage ist, wie ichBewahren Sie das richtige Präfixstattdessen und entfernen Sie alle Zeilen mit diesem Präfix. Wenn sowohl /home/dave/file1
als auch /home/dave/sub2/file2
das Präfix haben /home/dave
, /home/dave
bleibt das erhalten, während die anderen beiden nicht erhalten bleiben. D. h., es wird das genaue Gegenteil des obigen sed
Befehls bewirkt.
Mehr Info:
- Die Eingabe wäre bereits sortiert
/home/dave /home/dave/file1 /home/phil /home/phil/file2
Wenn ich ( ) habeecho -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2\n/home/phil\n/home/phil/file2'
, würde ich erwarten, dass/home/dave
und/home/phil
die Antwort ist.
Anwendung:
I have two disk volumes containing similiar content. I want to copy what's in v1 but missing from v2 into another disk volume, v3. Using find
, sort
, and comm
, I am able to get a list of what to copy, but I need to further clean up that list. I.e., as long as I have /home/dave
in the list, I don't need the other two.
Thanks!
Antwort1
This answer uses Python. As the OP wanted to remove the directories covered by their parents as I had seen as a possiblity I began writing a different program to remove coverings:
Example:
$ echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2\n/home/phil\n/home/phil/file1' | removecoverings
/home/phil
/home/dave
Code of the removecoverings
command:
#!/usr/bin/env python2
import sys
def list_startswith(a, b):
if not len(a) >= len(b):
return False
return all(x == y for x,y in zip(a[:len(b)],b))
def removecoverings(it):
g = list(it)
g.sort(key=lambda v: len(v.split('/')), reverse=True)
o = []
while g:
c = g.pop()
d = []
for v in g:
if list_startswith(v.split('/'), c.split('/')):
d.append(v)
for v in d:
g.remove(v)
o.append(c)
return o
for o in removecoverings(l.strip() for l in sys.stdin.readlines()):
print o
This answer uses Python. It also does a component-wise rather than string-wise common prefix. Better for paths as the common prefix of /ex/ample
and /exa/mple
should be /
not /ex
. This assumes that what is wanted is the greatest common prefix and not a list of prefixes with their coverings removed. If you have /home/dave /home/dave/file1 /home/phil /home/phil/file2
and expect /home/dave /home/phil
rather than /home
. This is not the answer you would be looking for.
Example:
$ echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2' | commonprefix
/home/dave
Code of the commonprefix
command:
#!/usr/bin/env python2
import sys
def commonprefix(l):
# this unlike the os.path.commonprefix version
# always returns path prefixes as it compares
# path component wise
cp = []
ls = [p.split('/') for p in l]
ml = min( len(p) for p in ls )
for i in range(ml):
s = set( p[i] for p in ls )
if len(s) != 1:
break
cp.append(s.pop())
return '/'.join(cp)
print commonprefix(l.strip() for l in sys.stdin.readlines())
Antwort2
Given that the input is sorted, The pseudo code would be:
$seen = last_line;
if current_line begins exactly as $seen then next
else { output current_line; $seen = current_line }
Translating into Perl code (Yes Perl, the most beautiful script language of all):
perl -e '
my $l = "\n";
while (<>) {
if ($_ !~ /^\Q$l/) {
print;
chomp;
$l = $_;
}
}
'
Credit: Ben Bacarisse @bsb.me.uk, from comp.lang.perl.misc. Thanks Ben, it works great!
Antwort3
And, the one liner version of xpt's answer. Again, assuming sorted input:
perl -lne 'BEGIN { $l="\n"; }; if ($_ !~ /^\Q$l/) { print $_; $l = $_; }'
Run on the example input
/home/dave
/home/dave/file1
/home/dave/sub2/file2
/home/phil
/home/phil/file2
using
echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2\n/home/phil\n/home/phil/file2' | perl -lne 'BEGIN { $l="\n"; }; if ($_ !~ /^\Q$l/) { print $_; $l = $_; }'
gives
/home/dave
/home/phil
The magic is in the command-line arguments to perl: -e
allows us to give a script on the command line, -n
iterates over the lines of the file (placing each line in $_
), and -l
deals with newlines for us.
The script works by using l
to track the last prefix seen. The BEGIN
block is run before the first line is read, and initializes the variable to a string that won't be seen (no newlines). The conditional is run on each line of the file (held by $_
). The conditional is executed on all of the lines of the file, and says "if the line does not have the current value of l
as a prefix, then print the line and save it as the value of l
." Because of the command-line arguments, this is essentially identical to the other script.
The catch is that both scripts assume that the common prefix exists as its own line, so don't find the common prefix for input like
/home/dave/file1
/home/dave/file2