@chongchongxiao
Contributor

Purpose

Linked issue: close #134

Read snapshot files concurrently to collect the set of used files.

Tests

No.

API and Format

No.

Documentation

No.

@lucasfang requested a review from Copilot on February 11, 2026, 03:26
@lucasfang
Collaborator

+1

lucasfang previously approved these changes Feb 11, 2026

Copilot AI left a comment

Pull request overview

This PR improves the performance of orphan file cleaning by parallelizing snapshot/manifest scanning when computing the set of “used” files (linked to #134).

Changes:

  • Parallelize OrphanFilesCleanerImpl::GetUsedFiles() by dispatching per-snapshot work to the configured Executor.
  • Factor snapshot-specific logic into a new helper GetUsedFilesBySnapshot.
  • Add a new clean metric key (snapshotFiles) and record the snapshot count.
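The fan-out/fan-in pattern described in the changes above can be sketched roughly as follows. This is only an illustration, not the PR's code: the `Snapshot` struct, the free-function signatures, and the `std::map` used for metrics are hypothetical stand-ins, and `std::async` is swapped in for the project's configured Executor.

```cpp
#include <cstdint>
#include <future>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a snapshot: an id plus the files it references.
struct Snapshot {
    int64_t id;
    std::vector<std::string> files;
};

// Hypothetical per-snapshot helper. The real helper would read the snapshot's
// manifests; here it only deduplicates the file names carried by the struct.
std::set<std::string> GetUsedFilesBySnapshot(const Snapshot& snapshot) {
    return std::set<std::string>(snapshot.files.begin(), snapshot.files.end());
}

// Dispatches one task per snapshot, then merges the per-snapshot results.
// std::async is an assumption made for this sketch; the PR uses the
// project's own Executor instead.
std::set<std::string> GetUsedFiles(const std::vector<Snapshot>& snapshots,
                                   std::map<std::string, uint64_t>* metrics) {
    std::vector<std::future<std::set<std::string>>> futures;
    futures.reserve(snapshots.size());
    for (const Snapshot& snapshot : snapshots) {
        // `snapshots` outlives every task because all futures are joined below.
        futures.push_back(std::async(std::launch::async, [&snapshot] {
            return GetUsedFilesBySnapshot(snapshot);
        }));
    }

    // Aggregate on the calling thread, so the shared set needs no lock.
    std::set<std::string> used_files;
    for (auto& future : futures) {
        std::set<std::string> partial = future.get();
        used_files.insert(partial.begin(), partial.end());
    }

    // Record the snapshot count under the metric key named in the review.
    (*metrics)["snapshotFiles"] = snapshots.size();
    return used_files;
}
```

Merging on the calling thread keeps each task free of shared mutable state, at the cost of a serial aggregation step once all tasks finish.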

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/paimon/core/operation/orphan_files_cleaner_impl.h | Declares new helper for per-snapshot used-file collection. |
| src/paimon/core/operation/orphan_files_cleaner_impl.cpp | Implements concurrent per-snapshot scanning and aggregates results; emits new metric. |
| src/paimon/core/operation/metrics/clean_metrics.h | Adds CLEAN_SNAPSHOT_FILES metric key. |

@zjw1111 changed the title from "optimize orphan files cleaner" to "feat: optimize orphan files cleaner" on Feb 11, 2026

Development

Successfully merging this pull request may close these issues.

[Feature] optimize clean orphan files

2 participants