Script to extract text using grep

Answer 1

The idea is to process grep's output yourself and append it to the output file explicitly. That way you can still use the console to print debug messages.

#!/bin/bash

# Save output to this file
outputFile='./xmldocs/1.txt'
rm -f "$outputFile"

# Iterate over *.xml files only (a glob is safer than parsing `ls`)
for i in *.xml
do
    # Echo which file is being processed (printed only to the console)
    echo "Processing: $i"
    # Grep, delete newlines (joins multiple matches into one line) and append to $outputFile
    grep -s "Document ID:" "$i" | tr -d '\n' >> "$outputFile"
    # Add a separator character
    printf "~" >> "$outputFile"
    # Grep, delete newlines and append to $outputFile
    grep -s 'CI[^"]' "$i" | tr -d '\n' >> "$outputFile"
    # Print a newline to end this file's record
    printf "\n" >> "$outputFile"
done

echo '!! done'

If this doesn't work, post some of the other lines you are trying to grep so I can test against them.
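
To try the approach above without touching real data, here is a minimal sketch that wraps the same grep/tr/printf steps in a function and runs it on a throwaway directory. The function name extract_to_file and the sample file contents are hypothetical. The debug message goes to stderr here; the answer uses plain echo, and either way it reaches the console because the data is written to the file explicitly.

```shell
#!/bin/bash
# Minimal sketch of the answer's approach; extract_to_file is a hypothetical name.

# extract_to_file DIR OUTFILE: write one "<Document ID line>~<CI line(s)>" record
# per *.xml file in DIR to OUTFILE.
extract_to_file() {
    local dir=$1 out=$2 f
    rm -f "$out"
    for f in "$dir"/*.xml; do
        [ -e "$f" ] || continue        # glob matched nothing: skip the literal pattern
        echo "Processing: $f" >&2      # debug message: console only, not the file
        grep -s 'Document ID:' "$f" | tr -d '\n' >> "$out"
        printf '~' >> "$out"
        grep -s 'CI[^"]' "$f" | tr -d '\n' >> "$out"
        printf '\n' >> "$out"
    done
}

# Usage on a temporary directory holding one file with hypothetical lines.
tmp=$(mktemp -d)
printf 'Document ID: 12345\nCI-42 example\n' > "$tmp/sample.xml"
extract_to_file "$tmp" "$tmp/out.txt"
cat "$tmp/out.txt"    # Document ID: 12345~CI-42 example
rm -r "$tmp"
```

A file with no CI match still gets a record ending in a bare ~, so every input file produces exactly one output line.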

Answer 2

What you want is paste:

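#!/bin/bash
# For each XML file, pair the n-th "Document ID:" line with the n-th CI line, separated by '~'.
# The output of the whole loop is redirected once to the result file.
for f in *.xml
do
    paste -d '~' <(grep 'Document ID:' "$f") <(grep 'CI[\^"]' "$f")
done > /xmldocs/1.txt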
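
A small sketch of what paste -d '~' does with its two inputs: it joins line n of the first input to line n of the second, with ~ between them. Two temporary files stand in here for the two grep results that the answer feeds in through bash process substitution <(...); their contents and the helper name join_tilde are hypothetical.

```shell
#!/bin/bash
# join_tilde FILE1 FILE2 (hypothetical helper): line n of FILE1, '~', line n of FILE2.
join_tilde() {
    paste -d '~' "$1" "$2"
}

tmp=$(mktemp -d)
printf 'Document ID: 1\nDocument ID: 2\n' > "$tmp/ids"
printf 'CI-a\nCI-b\n' > "$tmp/cis"
printf 'CI-a\n' > "$tmp/cis_short"

join_tilde "$tmp/ids" "$tmp/cis"
# Document ID: 1~CI-a
# Document ID: 2~CI-b

# If one input has fewer lines, paste treats the missing ones as empty lines.
join_tilde "$tmp/ids" "$tmp/cis_short"
# Document ID: 1~CI-a
# Document ID: 2~

rm -r "$tmp"
```

Because paste pairs lines purely by position, this approach assumes each file yields its Document ID and CI matches in the same number and order. Note also that <(...) is a bash feature, so the script needs bash rather than plain sh.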

Answer 3

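As for why your script hangs with grep 'CI[^"]': you need to escape the ^. Using grep 'CI[\^"]' solved the problem. This is because a caret placed first inside the brackets is interpreted as negation, so [^"] matches any character except a double quote.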

Edit: correction by steeldriver
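
The difference between the patterns can be checked directly. This sketch assumes POSIX/GNU grep bracket-expression rules (a leading ^ negates the set, and a backslash inside brackets is an ordinary character) and runs three patterns over four hypothetical sample lines. Under those rules [\^"] is a set of three literal characters (\, ^ and "), and putting the caret anywhere but first, as in ["^], also makes it literal.

```shell
#!/bin/bash
# Four hypothetical sample lines: CI followed by ", ^, x and \ respectively.
tmp=$(mktemp)
printf '%s\n' 'CI"' 'CI^' 'CIx' 'CI\' > "$tmp"

for pattern in 'CI[^"]' 'CI[\^"]' 'CI["^]'; do
    printf '%s matches: ' "$pattern"
    grep -e "$pattern" "$tmp" | tr '\n' ' '   # list matching lines on one row
    echo
done
# CI[^"]  -> CI^ CIx CI\   (negated set: anything but ")
# CI[\^"] -> CI" CI^ CI\   (literal \, ^ or ")
# CI["^]  -> CI" CI^       (literal " or ^)

rm -f "$tmp"
```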
