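I am trying to get the file command to detect some Windows text files that were never meant to be classified by file. The best option seems to be a regex that matches line contents, but I can't find a single example of its use (the generic keywords "file", "magic" and "regex" don't help in a Google-centric world). The man page doesn't help.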
In addition, I can't get ^ and $ to work.
Both files start with
Project Units: <stuff>
Units & Scale - <stuff>
<blank line>
The next line is a header that begins with either 4a) Object point ID,Photo #, or 4b) Id,Name,
The magic rules I tried for this are:
0 string Project\040Units:
>2 regex ^Object\040point\040ID,Photo\040#, PhotoModeler 2D export table
0 string Project\040Units:
>2 regex ^Id,Name, PhotoModeler 3D export table
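That is, match "Project Units:" on the first line, then try the regex over at most 2+1 lines. The regex is anchored to the start of the line for speed.
This is Ubuntu 14.04, file 5.14.
Example of file type 1 (first 10 lines only):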
Project Units: Meters
Units & Scale - Active, Translation - Active, Rotation - Active

Object Point ID,Photo #,X (pixels),Y (pixels),Residual X,Residual Y,Residual Vector,Mark Type,Layer,Material,Tags
2,1,1429.187065,1456.427823,-0.164541,0.182824,0.245964,LSM Circle,Default,White,
2,2,666.583514,1126.807078,-0.168174,0.109780,0.200833,LSM Circle,Default,White,
2,3,716.264669,1196.788962,0.152059,0.082258,0.172882,LSM Circle,Default,White,
2,4,674.145595,442.969428,0.119315,-0.050084,0.129401,LSM Circle,Default,White,
2,5,330.056929,836.292587,0.048372,-0.022235,0.053238,LSM Circle,Default,White,
2,6,1147.101715,39.253316,0.475434,-0.189514,0.511814,LSM Circle,Default,White,
Example of file type 2 (first 10 lines only):
Project Units: Meters
Units & Scale - Active, Translation - Active, Rotation - Active

Id,Name,Photos (used),X (project units),Y (project units),Z (project units),X Precision,Y Precision,Z Precision,Precision Vector Length,Tightness (percent),Tightness (project units),Angle (deg.),Control Name,RMS Residual (pixels),Largest Residual (pixels),Photo Largest Residual,Material,Layer,Tags,Type,Use In Processing,Frozen,#Constraints,Target Code,Target Bit,Ref. Check Tag,Photos (marked),Color (R),Color (G),Color (B)
2," ","1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21",0.285721,1.143037,-0.000990,0.000044,0.000043,0.000075,0.000097,0.037511,0.000682,85.604862,0.261,0.006,"1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21",255,255,255
3," ","1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21",0.428622,1.143108,-0.000230,0.000044,0.000042,0.000074,0.000096,0.033814,0.000615,86.326354,,0.22883,028,56,5,6354,,0.22,"1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21",255,255,255
4," ","1,2,3,4,5,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21",0.142979,1.143124,-0.000840,0.000045,0.000044,0.000078,0.000100,0.030045,0.000546,84.468461,,0.239445,0.374918,16,White,Default,,Normal,yes,no,0,n/a,n/a,,"1,2,3,4,5,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21",255,255,255
5," ","1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21",0.571353,1.143164,0.000784,0.000044,0.000042,0.000074,0.000096,0.027194,0.000494,86.593419,027194,0.000494,86.593419,,0.215546,035,46,0,00,469 "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21",255,255,255
6," ","1,2,3,4,5,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21",0.000141,1.143101,-0.000885,0.000. 2,3,4,5,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21",255,255,255
7," ","1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21",0.714058,1.143134,0.000247,0.000044,0.000043,0.000075,0.000097,0.030057,0.000547,86.326626,,0.210626,,0.210626,,0.210626,,0.210695,0626,,0.21. "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21",255,255,255
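Answer 1
The file(1) man page only tells you how to run the command. For a description of the magic patterns, see magic(5). However, its section on regex is not particularly detailed. Extensive examples of its use can be found in the pattern files that ship with file: https://github.com/file/file/tree/master/magic/Magdir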
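Your main problem is that the caret needs escaping: \^ for start of line, \\^ for a literal ^. I haven't worked out what special meaning an unescaped ^ has. Spaces can also be escaped, which makes the pattern more readable.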
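You intended to restrict the match to a narrow range of lines. regex accepts a /<length> option (after the word regex, not after the pattern), which limits where the search ends. If the length is followed by l, it counts lines rather than bytes. In my testing, /1l could only match an empty line; a non-empty line, even with an exact starting offset, needed at least /2l.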
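For the start of the search, the offset is interpreted as a byte count, even with regex. (Before version 5.19, the documentation said it was interpreted as a "line count", but that statement was removed without a matching code change, so I doubt it was accurate before then.) The >2 in your rules therefore starts the search at byte 2, in the middle of the first line.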
In addition, "start of line" also matches "start of the search range" (i.e. from offset), whether or not that is the start of a line in the file.
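So, to match more strictly, you can use a full-line regex on each line and use an &1 offset on the next match. That skips the preceding newline and puts the search in the right place for \^ to work as expected. This is probably overkill for identifying your custom file types.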
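The point that a line-start anchor also matches at the start of the search range has a close analogue in Python's re module, which can make the pitfall easier to see. This is only an illustration, not how file(1) is implemented: the sample data, the offset of 8 and the helper name anchored_hits are all assumptions made up for the example. Searching a slice of the data treats the slice start as a line start, whereas passing the offset as pos keeps the real line structure.

```python
import re

# Hypothetical sample: the first two header lines of an export file.
DATA = "Project Units: Meters\nUnits & Scale - Active\n"
LINE_START = re.compile(r"^Units", re.MULTILINE)


def anchored_hits(data, offset):
    """Return (hit when the range is sliced out, hit with a true line anchor)."""
    # Slicing makes the offset look like a line start, so "^" matches there.
    sliced = LINE_START.search(data[offset:])
    # Passing pos keeps the real line structure: "^" only matches at the
    # true start of the data or right after a newline.
    strict = LINE_START.search(data, offset)
    return (
        offset + sliced.start() if sliced else None,
        strict.start() if strict else None,
    )


# Offset 8 is the "U" of "Units:" in the middle of the first line.
print(anchored_hits(DATA, 8))  # (8, 22)
```

The first number is the mid-line hit that the sliced search accepts; the second is the start of the real "Units & Scale" line. Starting each search just past the previous newline, as the &1 offset does, makes the two agree.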
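Finally, you don't need to repeat the common part. Rules at the same indentation level (the same number of > characters) are tried one after another once the rule one level up has matched, so both header variants can hang off the shared prefix.
Putting all of this together: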
0 regex/2l \^Project\ Units:.*$
>&1 regex/2l \^Units\ &\ Scale.*$
>>&1 regex/1l \^$
>>>&1 regex/2l \^Object\ Point\ ID Photo Modeler 2D export table
>>>&1 regex/2l \^Id,Name,Photos Photo Modeler 3D export table
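To sanity-check what these rules are expected to accept without involving file(1), the same line-by-line logic can be sketched in Python. This is an independent approximation, not a reimplementation of file's matcher: the names classify, COMMON and VARIANTS are invented for the example, and the patterns and descriptions are copied from the rules above with the magic-specific escaping removed.

```python
import re

# Common header lines, checked in order like the nested ">" levels.
COMMON = [r"^Project Units:.*$", r"^Units & Scale.*$", r"^$"]
# Sibling rules at the deepest level: (pattern, description).
VARIANTS = [
    (r"^Object Point ID", "Photo Modeler 2D export table"),
    (r"^Id,Name,Photos", "Photo Modeler 3D export table"),
]


def classify(text):
    """Return the description the rule tree is expected to print, or None."""
    lines = text.splitlines()  # also handles Windows CRLF line endings
    if len(lines) < len(COMMON) + 1:
        return None
    # Every common header line must match before the variants are tried.
    if not all(re.match(p, line) for p, line in zip(COMMON, lines)):
        return None
    for pattern, description in VARIANTS:
        if re.match(pattern, lines[len(COMMON)]):
            return description
    return None


sample = (
    "Project Units: Meters\r\n"
    "Units & Scale - Active\r\n"
    "\r\n"
    "Object Point ID,Photo #,X (pixels)\r\n"
)
print(classify(sample))
```

Running a few real export files through a check like this first can help tell a wrong pattern apart from a magic-syntax problem when file does not report the expected type.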
Answer 2
One solution is @JigglyNaga's: escape the caret. The snippet below is now part of my .magic file.
0 string Project\040Units:
>2 regex \^Id, PhotoModeler 3D export table
0 string Project\040Units:
>2 regex \^Object\040Point\040ID, PhotoModeler 2D export table