
My input file, input.txt, has two lines as shown below. I need to extract claimStartDate from the first line and claimEndDate from the second line.
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180409120000102" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd" claimActionCode="00">
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd" claimActionCode="00">
rm input.txt
awk '/<ProfessionalClaim/' test.xml | head -1 > input.txt
awk '/<ProfessionalClaim/' test.xml | tail -1 >> input.txt
awk '{match($0, "claimStartDate=\"([^\"]+)\"", start); print start[1]} \
{match($0, "claimEndDate=\"([^\"]+)\"", end); print end[1]}' input.txt
Answer 1
$ awk '/F_LINE/ {match($0, "claimStartDate=\"([^\"]+)\"", start); print start[1]} \
/L_LINE/ {match($0, "claimEndDate=\"([^\"]+)\"", end); print end[1]}' input.txt
2018-04-02
2018-04-17
Edit based on your new information:
$ awk 'NR==1 {match($0, "claimStartDate=\"([^\"]+)\"", start); print start[1]} \
NR==2 {match($0, "claimEndDate=\"([^\"]+)\"", end); print end[1]}' input.txt
2018-04-02
2018-04-17
You can also do all of this in one go:
$ grep "<ProfessionalClaim" text.xml \
| sed -n '1p;$p' \
| awk 'NR==1 {match($0, "claimStartDate=\"([^\"]+)\"", start); print start[1]} \
NR==2 {match($0, "claimEndDate=\"([^\"]+)\"", end); print end[1]}'
grep finds all lines in text.xml that contain <ProfessionalClaim.
sed reduces those lines to only the first and the last.
awk prints claimStartDate from the first line and claimEndDate from the second line.
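The three-argument form of match() used above is a GNU awk extension, so those commands may fail on other awk implementations. As a rough sketch only, the same result can be had in a single pass with POSIX awk by using RSTART and RLENGTH. The helper function attr, the variable names, and the temporary sample file are made up for illustration, and it assumes each ProfessionalClaim element sits on one line with double-quoted attributes:

```shell
# Hypothetical sample file standing in for the real XML.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<root>
<ProfessionalClaim paymentIndicator="P" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd">
<ProfessionalClaim paymentIndicator="P" claimEndDate="2018-04-10" claimStartDate="2018-04-10" sourceSystemId="abcd">
<ProfessionalClaim paymentIndicator="P" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd">
</root>
EOF

result=$(awk '
# attr(line, name): return the value of name="..." in line, or "" if absent.
function attr(line, name,    re) {
    re = name "=\"[^\"]*\""
    if (!match(line, re)) return ""
    # Skip the attribute name plus =" at the front and the closing quote at the end.
    return substr(line, RSTART + length(name) + 2, RLENGTH - length(name) - 3)
}
/<ProfessionalClaim/ {
    if (!seen++) first = attr($0, "claimStartDate")   # taken from the first match only
    last = attr($0, "claimEndDate")                    # overwritten until the last match
}
END { if (seen) { print first; print last } }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

This avoids both the intermediate input.txt and the grep/sed stages, at the cost of treating the XML as plain text.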
Answer 2
Assuming some XML input document like the following:
<?xml version="1.0"?>
<root>
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180409120000102" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd" claimActionCode="00"/>
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd" claimActionCode="00"/>
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-18" claimStartDate="2018-04-18" sourceSystemId="abcd" claimActionCode="00"/>
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-19" claimStartDate="2018-04-19" sourceSystemId="abcd" claimActionCode="00"/>
</root>
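... we can use xmlstarlet to extract the value of the claimStartDate attribute from each ProfessionalClaim node that is followed by another ProfessionalClaim node, together with the value of the claimEndDate attribute of that next ProfessionalClaim node: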
xmlstarlet select --template \
--match '//ProfessionalClaim[following-sibling::ProfessionalClaim/@claimEndDate]' \
--value-of 'concat(@claimStartDate, " ", following-sibling::ProfessionalClaim/@claimEndDate)' \
-nl input.txt
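This first matches each ProfessionalClaim node that is followed by another ProfessionalClaim node.
For each such node, the value of its claimStartDate attribute is concatenated with the value of the claimEndDate attribute of the following ProfessionalClaim node, using a single space character as the delimiter.
Given my example document above, this would produce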
2018-04-02 2018-04-17
2018-04-17 2018-04-18
2018-04-18 2018-04-19
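If xmlstarlet is not available, the same pairing (each start date with the next element's end date) can be approximated in POSIX awk. This is only a sketch: it treats the XML as plain text, assumes one ProfessionalClaim element per line with double-quoted attributes, and the helper attr, the variable names, and the temporary file are made up for illustration.

```shell
# Recreate the example document in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0"?>
<root>
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180409120000102" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd" claimActionCode="00"/>
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd" claimActionCode="00"/>
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-18" claimStartDate="2018-04-18" sourceSystemId="abcd" claimActionCode="00"/>
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-19" claimStartDate="2018-04-19" sourceSystemId="abcd" claimActionCode="00"/>
</root>
EOF

result=$(awk '
# attr(line, name): return the value of name="..." in line, or "" if absent.
function attr(line, name,    re) {
    re = name "=\"[^\"]*\""
    if (!match(line, re)) return ""
    return substr(line, RSTART + length(name) + 2, RLENGTH - length(name) - 3)
}
/<ProfessionalClaim/ {
    # Pair the previous start date with the end date on the current line.
    if (have_prev) print prev_start, attr($0, "claimEndDate")
    prev_start = attr($0, "claimStartDate")
    have_prev = 1
}
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Unlike the xmlstarlet command, this breaks if an element spans several lines or if several elements share one line, so the XML-aware approach remains the safer choice for real documents.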