Ceph does not free disk space after files are deleted

2024-6-23

Ceph version: 16.2.13 (pacific). (I know pacific is deprecated, but the whole environment is legacy (CentOS 7.3, etc.), and I do not have the authority to upgrade it.) The cluster has 6 servers (22 OSDs, 97 PGs). There is a CephFS that is exported over NFS. Clients access the cluster via NFSv4.1 (NFS-Ganesha). On the client I mount it with the following command:

# mount -t nfs -o nfsvers=4.1,noauto,soft,sync,proto=tcp 172.20.0.31:/exports /cephmnt

I copied a folder (about 5.2 GB) to /cephmnt.

# cp -r sysdir /cephmnt

As expected, the used space grew accordingly (confirmed from the output of df -Th and ceph df detail).

# df -Th | grep -i ceph
172.20.0.31:/exports    nfs4    26T    5.2G    26T    1%    /cephmnt

# ceph df | grep -i cephfs
cephfs.new_storage.meta    8    32    26 MiB    28      79 MiB    0      25 TiB
cephfs.new_storage.data    9    32    5.2 GiB   1.42k   15 GiB    0.02   25 TiB

However, the used space did not shrink after I deleted the folder.

# rm -rf sysdir

# df -Th | grep -i ceph
172.20.0.31:/exports    nfs4    26T    5.2G    26T    1%    /cephmnt

# ceph df | grep -i cephfs
cephfs.new_storage.meta    8    32    26 MiB     28       79 MiB    0       25 TiB
cephfs.new_storage.data    9    32    5.2 GiB    1.42k    15 GiB    0.02    25 TiB

The objects in the data pool can be listed with:

# rados -p cephfs.new_storage.data ls
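
To get a rough idea of how many files still have data in the pool, the listing can be reduced to distinct inode numbers. This is a minimal sketch, assuming the usual CephFS data object naming of `<inode in hex>.<block index in hex>`; `count_inodes` is a hypothetical helper name, not a Ceph command.

```shell
# Hypothetical helper: count distinct inode prefixes in a `rados ls` listing read on stdin.
# Assumes CephFS data objects are named <inode hex>.<block index hex>.
count_inodes() {
    cut -d. -f1 | sort -u | wc -l | tr -d ' '
}

# On a live cluster:
#   rados -p cephfs.new_storage.data ls | count_inodes
```

A count that stays well above zero after the delete would suggest that the objects of the removed files have not been purged yet.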

I am fairly new to Ceph, so I am not sure whether this is normal Ceph behavior or not, but I suspect it is not, so I investigated.

Snapshots are disabled, and neither pool has any existing snapshots.

# ceph fs set new_storage allow_new_snaps false
# rados -p cephfs.new_storage.meta lssnap
0 snaps
# rados -p cephfs.new_storage.data lssnap
0 snaps

I read somewhere that BlueStore on the OSDs automatically discards unused data when bdev_async_discard and bdev_enable_discard are set to true, so I set them.

# ceph config get osd bdev_async_discard
true
# ceph config get osd bdev_enable_discard
true

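However, this has no effect. I unmounted and remounted the NFS share several times (once I left it unmounted overnight), but every time I remount it, df -Th and ceph df show the same space as occupied. I also changed into the /cephmnt directory with cd and ran the sync command. Still no effect.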

How can I free the space of the deleted files?

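I read here that CephFS has lazy deletion, but I am not sure whether that is what is happening in my case or whether there is a different problem. If it is lazy deletion, how can I verify that and trigger the actual deletion? If it is not lazy deletion, what is the actual problem?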
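
One way to look for lazy deletion is to watch the MDS stray and purge-queue counters. The following is a minimal sketch, not a confirmed diagnosis. It assumes the MDS reports the `num_strays`, `num_strays_delayed`, `strays_enqueued`, `pq_executing` and `pq_executed` counters in its `perf dump` output, and that `ceph tell mds.<name> perf dump` is available on this release (otherwise `ceph daemon mds.<name> perf dump` on the MDS host). `stray_counters` is a hypothetical helper name.

```shell
# Hypothetical helper: print stray and purge-queue counters from an MDS perf dump read on stdin.
stray_counters() {
    grep -oE '"(num_strays|num_strays_delayed|strays_enqueued|pq_executing|pq_executed)": *[0-9]+' \
        | tr -d '" '
}

# On a live cluster (MDS name taken from the `ceph fs status` output in this post;
# the availability of `ceph tell ... perf dump` is an assumption):
#   ceph tell mds.new_storage.cephserver2.gvflgv perf dump | stray_counters
```

If `num_strays` stays non-zero while `pq_executed` does not advance, the deleted files may still be waiting to be purged.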

Please ask if any other data is needed for troubleshooting. I have been working on this for about 3 days and am completely out of ideas, so any help is greatly appreciated.

Edit 1: Added more details

[root@cephserver1 ~]# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE   DATA      OMAP     META     AVAIL    %USE  VAR   PGS  STATUS
 0    hdd  3.63869   1.00000  3.6 TiB   1.6 GiB   593 MiB    2 KiB  1.0 GiB  3.6 TiB  0.04  1.07   10      up
 1    hdd  3.63869   1.00000  3.6 TiB   1.1 GiB   544 MiB   19 KiB  559 MiB  3.6 TiB  0.03  0.71    9      up
 2    hdd  3.63869   1.00000  3.6 TiB   1.7 GiB   669 MiB    6 KiB  1.0 GiB  3.6 TiB  0.05  1.12   13      up
 4    hdd  3.63869   1.00000  3.6 TiB   1.6 GiB   742 MiB   26 KiB  918 MiB  3.6 TiB  0.04  1.07   13      up
13    hdd  3.63869   1.00000  3.6 TiB   1.7 GiB   596 MiB    4 KiB  1.2 GiB  3.6 TiB  0.05  1.15    8      up
 5    hdd  3.63869   1.00000  3.6 TiB   1.9 GiB   1.2 GiB   56 MiB  713 MiB  3.6 TiB  0.05  1.26   16      up
 6    hdd  3.63869   1.00000  3.6 TiB   1.6 GiB   407 MiB  124 MiB  1.1 GiB  3.6 TiB  0.04  1.04    9      up
 7    hdd  3.63869   1.00000  3.6 TiB   1.3 GiB   418 MiB   67 MiB  887 MiB  3.6 TiB  0.04  0.89   12      up
 8    hdd  3.63869   1.00000  3.6 TiB   1.1 GiB   667 MiB   73 MiB  372 MiB  3.6 TiB  0.03  0.72   15      up
 9    hdd  3.63869   1.00000  3.6 TiB   1.7 GiB   1.2 GiB    7 KiB  526 MiB  3.6 TiB  0.05  1.13   18      up
10    hdd  3.63869   1.00000  3.6 TiB   1.5 GiB   906 MiB    8 KiB  579 MiB  3.6 TiB  0.04  0.96   11      up
11    hdd  3.63869   1.00000  3.6 TiB   1.7 GiB   1.1 GiB    6 KiB  628 MiB  3.6 TiB  0.05  1.15   11      up
12    hdd  3.63869   1.00000  3.6 TiB   1.8 GiB   600 MiB   16 MiB  1.2 GiB  3.6 TiB  0.05  1.17   15      up
 3    hdd  3.63869   1.00000  3.6 TiB   2.8 GiB   1.6 GiB   37 MiB  1.2 GiB  3.6 TiB  0.08  1.86   17      up
14    hdd  3.63869   1.00000  3.6 TiB   1.6 GiB   857 MiB   37 KiB  781 MiB  3.6 TiB  0.04  1.06   12      up
15    hdd  3.63869   1.00000  3.6 TiB   1.9 GiB   1.4 GiB    2 KiB  499 MiB  3.6 TiB  0.05  1.26   12      up
16    hdd  3.63869   1.00000  3.6 TiB   2.2 GiB   972 MiB    1 KiB  1.2 GiB  3.6 TiB  0.06  1.44   15      up
17    hdd  3.63869   1.00000  3.6 TiB  1002 MiB   981 MiB    8 KiB   20 MiB  3.6 TiB  0.03  0.65   17      up
18    hdd  3.63869   1.00000  3.6 TiB   935 MiB   915 MiB    3 KiB   20 MiB  3.6 TiB  0.02  0.60   17      up
19    hdd  3.63869   1.00000  3.6 TiB   1.0 GiB  1006 MiB      0 B   28 MiB  3.6 TiB  0.03  0.67   10      up
20    hdd  3.63869   1.00000  3.6 TiB   866 MiB   835 MiB      0 B   31 MiB  3.6 TiB  0.02  0.56   20      up
21    hdd  3.63869   1.00000  3.6 TiB   731 MiB   709 MiB      0 B   22 MiB  3.6 TiB  0.02  0.47   11      up
                       TOTAL   80 TiB    33 GiB    19 GiB  374 MiB   14 GiB   80 TiB  0.04
MIN/MAX VAR: 0.47/1.86  STDDEV: 0.01

[root@cephserver1 ~]# ceph fs status
new_storage - 4 clients
======================
RANK  STATE                      MDS                        ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  new_storage.cephserver2.gvflgv  Reqs:    0 /s   161    163     52    154
               POOL                   TYPE     USED  AVAIL
cephfs.new_storage.meta  metadata  79.4M  25.3T
cephfs.new_storage.data    data    18.2G  25.3T
               STANDBY MDS
new_storage.cephserver3.wxrhxm
new_storage.cephserver4.xwpidi
new_storage.cephserver1.fwjpoi
MDS version: ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)

[root@cephserver1 ~]# ceph -s
  cluster:
    id:     dcad37bc-1185-11ee-88c0-7cc2556f5050
    health: HEALTH_WARN
            1 failed cephadm daemon(s)

  services:
    mon: 5 daemons, quorum cephserver1,cephserver2,cephserver3,cephserver4,cephserver5 (age 8d)
    mgr: cephserver2.sztiyq(active, since 2w), standbys: cephserver1.emjcaa
    mds: 1/1 daemons up, 3 standby
    osd: 22 osds: 22 up (since 3h), 22 in (since 8d)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 1.81k objects, 6.2 GiB
    usage:   33 GiB used, 80 TiB / 80 TiB avail
    pgs:     97 active+clean

  io:
    client:   462 B/s rd, 0 op/s rd, 0 op/s wr

[root@cephserver1 ~]# ceph health detail
HEALTH_WARN 1 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
    daemon grafana.cephserver1 on cephserver1 is in error state

Edit 2: I forgot to mention an important point: the entire storage cluster is in an air-gapped environment.

Edit 3: Following the suggestion in eblock's comment, I tried compacting the OSDs online, and it partially worked. This is what ceph df showed before compaction:

[root@cephserver1 ~]# ceph df
--- RAW STORAGE ---
CLASS    SIZE   AVAIL    USED  RAW USED  %RAW USED
hdd    80 TiB  80 TiB  33 GiB    33 GiB       0.04
TOTAL  80 TiB  80 TiB  33 GiB    33 GiB       0.04

--- POOLS ---
POOL                     ID  PGS   STORED  OBJECTS    USED  %USED  MAX AVAIL
device_health_metrics     1    1   17 MiB       29  50 MiB      0     25 TiB
cephfs.new_storage.meta   8   32   26 MiB       28  79 MiB      0     25 TiB
cephfs.new_storage.data   9   32  5.2 GiB    1.42k  15 GiB   0.02     25 TiB
.nfs                     10   32  1.7 KiB        7  40 KiB      0     25 TiB

After compaction, the raw usage dropped from 33 GiB to 23 GiB, as shown below:

[root@cephserver1 ~]# ceph df
--- RAW STORAGE ---
CLASS    SIZE   AVAIL    USED  RAW USED  %RAW USED
hdd    80 TiB  80 TiB  23 GiB    23 GiB       0.03
TOTAL  80 TiB  80 TiB  23 GiB    23 GiB       0.03

--- POOLS ---
POOL                     ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
device_health_metrics     1    1   18 MiB       29   54 MiB      0     25 TiB
cephfs.new_storage.meta   8   32   26 MiB       28   79 MiB      0     25 TiB
cephfs.new_storage.data   9   32  5.2 GiB    1.42k   15 GiB   0.02     25 TiB
.nfs                     10   32   32 KiB        7  131 KiB      0     25 TiB

However, the data in the pools did not decrease, so further suggestions are very welcome.
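
The exact compaction command is not shown in this edit. As a minimal sketch, assuming the online compaction was done with `ceph tell osd.<id> compact`, the per-OSD commands could be generated like this (`compact_cmds` is a hypothetical helper that only prints the commands, so they can be reviewed before running):

```shell
# Hypothetical helper: print one online-compaction command per OSD id given as arguments.
# It does not contact the cluster; it only prints the commands.
compact_cmds() {
    for id in "$@"; do
        echo "ceph tell osd.${id} compact"
    done
}

# On a live cluster, feed it the ids from `ceph osd ls` and pipe to sh to execute:
#   compact_cmds $(ceph osd ls) | sh
```

Running the OSDs one at a time like this keeps the extra I/O load from compaction limited to a single OSD at any moment.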

Edit 4: I mounted CephFS natively (that is, with the kernel client rather than through NFS) using the following command:

# mount -t ceph 172.30.0.31:6789,172.30.0.32:6789,172.30.0.33:6789:/ /cephmnt -o name=user1

After mounting, ls -a /cephmnt shows none of the old data. However, running df -Th on the client where CephFS is mounted still shows the space occupied by the old data (5.2 GB). So I suspect the problem is not in NFS.
