How can I find duplicate images that may have different resolutions?

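I have a collection of images (photos) that contains duplicates, but however I sort them, the duplicates end up scattered because of differing resolutions and irregular file names.

I tried gm compare, but I could not figure out which metric to use or which values indicate a match.

Below is an example with two images that look identical, except that the second one has twice the resolution (better quality):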

gm compare -metric MAE "7920068.jpg" "7920034.jpg"
gm compare -metric MSE "7920068.jpg" "7920034.jpg"
gm compare -metric PAE "7920068.jpg" "7920034.jpg"
gm compare -metric PSNR "7920068.jpg" "7920034.jpg"
gm compare -metric RMSE "7920068.jpg" "7920034.jpg"

Image Difference (MeanAbsoluteError):
           Normalized    Absolute
          ============  ==========
     Red: 0.1751787015    11480.3
   Green: 0.1168407563     7657.2
    Blue: 0.0029600541      194.0
   Total: 0.0983265040     6443.8

Image Difference (MeanSquaredError):
           Normalized    Absolute
          ============  ==========
     Red: 0.0910979679     5970.1
   Green: 0.0274231091     1797.2
    Blue: 0.0000203617        1.3
   Total: 0.0395138129     2589.5

Image Difference (PeakAbsoluteError):
           Normalized    Absolute
          ============  ==========
     Red: 1.0000000000    65535.0
   Green: 0.7803921569    51143.0
    Blue: 0.0784313725     5140.0
   Total: 1.0000000000    65535.0

Image Difference (PeakSignalToNoiseRatio):
           PSNR
          ======
     Red: 10.40
   Green: 15.62
    Blue: 46.91
   Total: 14.03

Image Difference (RootMeanSquaredError):
           Normalized    Absolute
          ============  ==========
     Red: 0.3018243991    19780.1
   Green: 0.1655992426    10852.5
    Blue: 0.0045123979      295.7
   Total: 0.1987808163    13027.1

Using GraphicsMagick's identify, I found these values:

          |image a        |image a @2x    |image b
Red:
  Minimum:|  0.00 (0.0000)|  0.00 (0.0000)|  0.00 (0.0000)
  Maximum:|255.00 (1.0000)|255.00 (1.0000)|255.00 (1.0000)
  Mean:   |175.81 (0.6894)|176.00 (0.6902)|117.79 (0.4619)
  Std Dev:| 65.59 (0.2572)| 65.73 (0.2577)| 61.55 (0.2414)
Green:
  Minimum:|  0.00 (0.0000)|  0.00 (0.0000)|  0.00 (0.0000)
  Maximum:|255.00 (1.0000)|255.00 (1.0000)|255.00 (1.0000)
  Mean:   |161.58 (0.6336)|162.47 (0.6371)| 99.07 (0.3885)
  Std Dev:| 71.14 (0.2790)| 71.26 (0.2794)| 64.94 (0.2547)
Blue:
  Minimum:|  0.00 (0.0000)|  0.00 (0.0000)|  0.00 (0.0000)
  Maximum:|255.00 (1.0000)|255.00 (1.0000)|255.00 (1.0000)
  Mean:   |153.59 (0.6023)|153.27 (0.6010)|104.50 (0.4098)
  Std Dev:| 71.65 (0.2810)| 71.67 (0.2811)| 60.09 (0.2357)

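It looks like I could use these values for the comparison: the two "image a" files have very similar values to each other, while "image b" is clearly different. I just need a good threshold that indicates a possible match.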

I will use these images as examples:

  1. Different image: Boss 8
  2. Subject image: Boss 1
  3. Subject image at half size: Boss 12

Here is their output:

$ gm identify -verbose BOSS-1.jpg
Image: BOSS-1.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Geometry: 591x1049
  Class: DirectClass
  Type: true color
  Depth: 8 bits-per-pixel component
  Channel Depths:
    Red:      8 bits
    Green:    8 bits
    Blue:     8 bits
  Channel Statistics:
    Red:
      Minimum:                     7.00 (0.0275)
      Maximum:                   255.00 (1.0000)
      Mean:                       89.97 (0.3528)
      Standard Deviation:         79.68 (0.3125)
    Green:
      Minimum:                    11.00 (0.0431)
      Maximum:                   255.00 (1.0000)
      Mean:                      108.55 (0.4257)
      Standard Deviation:         70.34 (0.2758)
    Blue:
      Minimum:                     8.00 (0.0314)
      Maximum:                   255.00 (1.0000)
      Mean:                      126.50 (0.4961)
      Standard Deviation:         68.28 (0.2678)
  Resolution: 72x72 pixels
  Filesize: 129.6Ki
  Interlace: No
  Orientation: Unknown
  Background Color: white
  Border Color: #DFDFDF
  Matte Color: #BDBDBD
  Page geometry: 591x1049+0+0
  Compose: Over
  Dispose: Undefined
  Iterations: 0
  Compression: JPEG
  JPEG-Quality: 93
  JPEG-Colorspace: 2
  JPEG-Colorspace-Name: RGB
  JPEG-Sampling-factors: 2x2,1x1,1x1
  Signature: 06a764225a290be783b0b3b90c72356f71b0032af8f58e88857c33d6e59b8ccc
  Profile-EXIF: 74 bytes
    Exif Offset: 26
    Color Space: 1
    Exif Image Width: 591
    Exif Image Length: 1049
  Tainted: False
  Elapsed Time: 0m:0.011805s
  Pixels Per Second: 50.1Mi

$ gm identify -verbose BOSS-1-50.jpg
Image: BOSS-1-50.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Geometry: 296x525
  Class: DirectClass
  Type: true color
  Depth: 8 bits-per-pixel component
  Channel Depths:
    Red:      8 bits
    Green:    8 bits
    Blue:     8 bits
  Channel Statistics:
    Red:
      Minimum:                     7.00 (0.0275)
      Maximum:                   255.00 (1.0000)
      Mean:                       89.34 (0.3504)
      Standard Deviation:         78.83 (0.3091)
    Green:
      Minimum:                    12.00 (0.0471)
      Maximum:                   255.00 (1.0000)
      Mean:                      107.87 (0.4230)
      Standard Deviation:         70.29 (0.2756)
    Blue:
      Minimum:                    14.00 (0.0549)
      Maximum:                   255.00 (1.0000)
      Mean:                      125.77 (0.4932)
      Standard Deviation:         68.19 (0.2674)
  Resolution: 72x72 pixels
  Filesize: 44.2Ki
  Interlace: No
  Orientation: Unknown
  Background Color: white
  Border Color: #DFDFDF
  Matte Color: #BDBDBD
  Page geometry: 296x525+0+0
  Compose: Over
  Dispose: Undefined
  Iterations: 0
  Compression: JPEG
  JPEG-Quality: 93
  JPEG-Colorspace: 2
  JPEG-Colorspace-Name: RGB
  JPEG-Sampling-factors: 2x2,1x1,1x1
  Signature: 2c12437d162d8bf92ad49497e2644ca3a5edd9d3c8947d44445a5923565123cc
  Profile-EXIF: 74 bytes
    Exif Offset: 26
    Color Space: 1
    Exif Image Width: 296
    Exif Image Length: 525
  Tainted: False
  Elapsed Time: 0m:0.002051s
  Pixels Per Second: 72.3Mi

$ gm identify -verbose BOSS-8.jpg   
Image: BOSS-8.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Geometry: 584x1050
  Class: DirectClass
  Type: true color
  Depth: 8 bits-per-pixel component
  Channel Depths:
    Red:      8 bits
    Green:    8 bits
    Blue:     8 bits
  Channel Statistics:
    Red:
      Minimum:                     0.00 (0.0000)
      Maximum:                   255.00 (1.0000)
      Mean:                       91.51 (0.3589)
      Standard Deviation:         85.21 (0.3341)
    Green:
      Minimum:                     0.00 (0.0000)
      Maximum:                   255.00 (1.0000)
      Mean:                      110.18 (0.4321)
      Standard Deviation:         83.58 (0.3278)
    Blue:
      Minimum:                     0.00 (0.0000)
      Maximum:                   255.00 (1.0000)
      Mean:                      132.97 (0.5214)
      Standard Deviation:         87.69 (0.3439)
  Resolution: 72x72 pixels
  Filesize: 180.5Ki
  Interlace: No
  Orientation: Unknown
  Background Color: white
  Border Color: #DFDFDF
  Matte Color: #BDBDBD
  Page geometry: 584x1050+0+0
  Compose: Over
  Dispose: Undefined
  Iterations: 0
  Compression: JPEG
  JPEG-Quality: 93
  JPEG-Colorspace: 2
  JPEG-Colorspace-Name: RGB
  JPEG-Sampling-factors: 2x2,1x1,1x1
  Signature: 9d12ad4d93d1c8d219d41ef9755984bcb151a8de502c70279aea4b69202c99d1
  Profile-EXIF: 74 bytes
    Exif Offset: 26
    Color Space: 1
    Exif Image Width: 584
    Exif Image Length: 1050
  Tainted: False
  Elapsed Time: 0m:0.016498s
  Pixels Per Second: 35.4Mi
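
As a rough sketch of the threshold idea, the Mean and Standard Deviation lines of the gm identify -verbose output above can be extracted and compared pairwise against a tolerance. The function names channel_stats and stats_match and the tolerance of 2 are assumptions made for illustration, not something GraphicsMagick provides, and the tolerance would need tuning on real data. In the outputs above the channel means of BOSS-1 and BOSS-8 are within a few units of each other, so the standard deviations are included as well.

```shell
#!/bin/sh
# Sketch only: helper names and the tolerance are illustrative assumptions.

# Read `gm identify -verbose` output on stdin and print the Mean and
# Standard Deviation of every channel on a single space-separated line.
channel_stats() {
    awk '/^ *(Mean|Standard Deviation):/ { printf "%s ", $(NF-1) }
         END { print "" }'
}

# Succeed (exit status 0) when the two value lists $1 and $2 have the same
# length and every pair of values differs by at most the tolerance $3.
stats_match() {
    awk -v a="$1" -v b="$2" -v tol="$3" 'BEGIN {
        n = split(a, x, " "); m = split(b, y, " ")
        if (n == 0 || n != m) exit 1
        for (i = 1; i <= n; i++) {
            d = x[i] - y[i]; if (d < 0) d = -d
            if (d > tol) exit 1
        }
        exit 0
    }'
}

# Possible usage (requires gm to be installed):
#   a=$(gm identify -verbose BOSS-1.jpg    | channel_stats)
#   b=$(gm identify -verbose BOSS-1-50.jpg | channel_stats)
#   stats_match "$a" "$b" 2 && echo "possible match"
```

With the values listed above, BOSS-1 and BOSS-1-50 differ by less than 1 in every statistic, while BOSS-1 and BOSS-8 differ by more than 5 in each standard deviation, so a tolerance of 2 separates them in this one example.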

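Answer 1

You could try normalizing the images by resizing them to a square aspect ratio with a known resolution. Comparing the normalized images then gives a fairly low value for the MSE metric when the images match (roughly 100 in the Absolute column):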

$ gm convert -geometry 1000x1000! same-big.jpg norm-same-big.jpg
$ gm convert -geometry 1000x1000! same-small.jpg norm-same-small.jpg
$ gm convert -geometry 1000x1000! different.jpg norm-different.jpg

$ gm compare -metric mse norm-same-big.jpg norm-same-small.jpg
Image Difference (MeanSquaredError):
           Normalized    Absolute
          ============  ==========
     Red: 0.0015487693      101.5
   Green: 0.0009830381       64.4
    Blue: 0.0015041910       98.6
   Total: 0.0013453328       88.2

$ gm compare -metric mse norm-same-big.jpg norm-different.jpg
Image Difference (MeanSquaredError):
           Normalized    Absolute
          ============  ==========
     Red: 0.0829284628     5434.7
   Green: 0.0682458298     4472.5
    Blue: 0.0753763994     4939.8
   Total: 0.0755168974     4949.0

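You could easily turn this into a script that takes two file names, normalizes the files, compares the normalized images, and reports the original file names when the difference is small enough.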
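
A minimal sketch of such a script follows. It reuses the gm convert and gm compare invocations shown above; the function names, the temporary file layout, and the default threshold of 0.01 on the normalized Total value are assumptions chosen for illustration. The threshold sits between the two results above (0.0013 for the matching pair, 0.0755 for the different pair) and would need tuning on a larger set.

```shell
#!/bin/sh
# Sketch only: helper names and the 0.01 threshold are illustrative assumptions.

# Resize image $1 to a fixed 1000x1000 geometry (ignoring aspect ratio)
# and write the result to $2.
normalize() {
    gm convert -geometry '1000x1000!' "$1" "$2"
}

# Read `gm compare -metric mse` output on stdin and print the
# normalized value from the "Total:" row.
total_mse() {
    awk '$1 == "Total:" { print $2 }'
}

# Succeed (exit status 0) when the value $1 is below the threshold $2.
below_threshold() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 < t + 0) }'
}

# Normalize and compare files $1 and $2; print both names when the
# normalized total MSE is below the threshold $3 (default 0.01).
report_if_duplicate() {
    mse=
    tmp=$(mktemp -d) || return 1
    normalize "$1" "$tmp/a.jpg" && normalize "$2" "$tmp/b.jpg" &&
        mse=$(gm compare -metric mse "$tmp/a.jpg" "$tmp/b.jpg" | total_mse)
    rm -rf "$tmp"
    [ -n "$mse" ] && below_threshold "$mse" "${3:-0.01}" &&
        printf '%s %s\n' "$1" "$2"
}

# Possible usage (requires gm to be installed):
#   report_if_duplicate same-big.jpg same-small.jpg
```

Comparing every pair this way grows quadratically with the number of files, so for a large collection it may be worth normalizing each file once and keeping the normalized copies around.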

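Answer 2

Not an answer as such, but some other bases for comparison:

  • Use GraphicsMagick to get the image dimensions and compare the ratio of width to height. There may be small deltas due to rounding, but the ratio should be the same for images that were rescaled to different sizes.

  • Use ImageMagick to extract similar information, which may be better suited for comparison.

  • Use exiftool to extract the EXIF data. When one image is derived from another, there is an option to preserve the EXIF data, and if the original has that data, the data in both files should be essentially the same.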
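
The aspect-ratio check from the first bullet could be sketched as follows. The function names and the tolerance of 0.002 are assumptions for illustration; the tolerance only has to absorb the rounding that happens when the scaled dimensions are truncated to whole pixels.

```shell
#!/bin/sh
# Sketch only: helper names and the tolerance are illustrative assumptions.

# Print "width height" for image $1 using GraphicsMagick.
dimensions() {
    gm identify -format '%w %h' "$1"
}

# Succeed (exit status 0) when the ratios $1/$2 and $3/$4 differ by at
# most the tolerance $5. Arguments: width1 height1 width2 height2 tolerance.
same_ratio() {
    awk -v w1="$1" -v h1="$2" -v w2="$3" -v h2="$4" -v tol="$5" 'BEGIN {
        if (h1 + 0 <= 0 || h2 + 0 <= 0) exit 1
        d = w1 / h1 - w2 / h2; if (d < 0) d = -d
        exit !(d <= tol)
    }'
}

# Possible usage (requires gm to be installed):
#   set -- $(dimensions BOSS-1.jpg) $(dimensions BOSS-1-50.jpg)
#   same_ratio "$@" 0.002 && echo "same aspect ratio"
```

With the geometries from the question, 591x1049 and 296x525 give ratios that differ by about 0.0004, while 591x1049 and 584x1050 differ by about 0.007. A matching ratio only narrows down the candidates, since unrelated photos from the same camera usually share an aspect ratio too.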
