# Architecture
Arc is a version control system with its own data model, storage format, and a
git bridge for interoperability.
## Repository Layout
An arc repository keeps all state in an `.arc/` directory at the worktree root:
| Path | Format | Purpose |
|------|--------|---------|
| `HEAD` | YAML | Current state — one of three variants: **unborn** (no commits yet; has `bookmark`), **attached** (on a bookmark; has `bookmark` + `commit`), or **detached** (raw commit; has `commit`). |
| `config.yml` | YAML | Local repository configuration. |
| `commits/<id>.zst` | Zstandard-compressed bincode | Commit objects. Each file contains a `CommitObject` that bundles a `Commit` and its `Delta`. |
| `bookmarks/<name>.yml` | YAML | One file per bookmark. Contains a `RefTarget` with an optional `commit` field. |
| `tags/<name>.yml` | YAML | Same format as bookmarks. |
| `stashes/state.yml` | YAML | Tracks the active stash. |
| `stashes/named/<name>.yml` | YAML | Per-stash state files. |
| `remotes.yml` | YAML | Map of remote names to URLs. |
| `git/` | Bare git repo | Shadow repository used by the git bridge. |
| `git-map.yml` | YAML | Bidirectional mapping between arc commit IDs and git OIDs. |
## Data Model (`src/model.rs`)
`CommitId` and `DeltaId` are newtype wrappers around `String`, holding SHA-256
hex hashes.

**Commit**: `id`, `parents` (Vec), `delta` (DeltaId), `message`,
`author` (optional `Signature`), `timestamp` (i64 Unix time),
`ssh_signature` (optional PEM string).

**Delta**: `id`, `base` (optional parent CommitId), `changes` (Vec of
`FileChange`).

**FileChange**: `path` plus a `kind` (Add, Modify, Delete, or Rename).

**FileContentDelta**: either `Full { bytes }` (complete snapshot) or
`Patch { format, data }` (incremental).

**Head**: enum with variants Unborn, Attached, and Detached.
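
As a reading aid, the types above might look roughly like the following in Rust. This is a sketch, not the contents of `src/model.rs`: the derives, the element type of `parents`, the fields of `Signature`, the payloads of each `FileChangeKind` variant, and the `commit()` helper are all assumptions. The `Head` fields follow the `HEAD` row of the repository layout table.

```rust
/// Newtype wrappers around SHA-256 hex strings.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct CommitId(pub String);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DeltaId(pub String);

/// Hypothetical author identity; the real fields are not documented here.
#[derive(Debug, Clone, PartialEq)]
pub struct Signature {
    pub name: String,
    pub email: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Commit {
    pub id: CommitId,
    pub parents: Vec<CommitId>, // element type assumed
    pub delta: DeltaId,
    pub message: String,
    pub author: Option<Signature>,
    pub timestamp: i64,
    pub ssh_signature: Option<String>, // PEM text
}

#[derive(Debug, Clone, PartialEq)]
pub enum FileContentDelta {
    Full { bytes: Vec<u8> },
    Patch { format: String, data: Vec<u8> },
}

/// Variant payloads are assumptions made for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum FileChangeKind {
    Add { content: FileContentDelta },
    Modify { content: FileContentDelta },
    Delete,
    Rename { from: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct FileChange {
    pub path: String,
    pub kind: FileChangeKind,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Delta {
    pub id: DeltaId,
    pub base: Option<CommitId>,
    pub changes: Vec<FileChange>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Head {
    Unborn { bookmark: String },
    Attached { bookmark: String, commit: CommitId },
    Detached { commit: CommitId },
}

impl Head {
    /// Hypothetical helper: the commit HEAD points at, if any.
    pub fn commit(&self) -> Option<&CommitId> {
        match self {
            Head::Unborn { .. } => None,
            Head::Attached { commit, .. } | Head::Detached { commit } => Some(commit),
        }
    }
}
```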
## Storage (`src/store.rs`)
`CommitObject` bundles a `Commit` and its `Delta` into a single unit that is
serialized with bincode, then compressed with Zstandard at level 3. Files are
written atomically (write to a `.tmp` file, then rename it into place). Each ID
is the SHA-256 hash of the bincode-serialized content it addresses.
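
The write-then-rename step can be sketched with the standard library alone. The function name `write_atomic`, the choice to append `.tmp` to the destination file name, and the `sync_all` call are assumptions for illustration. The bincode and Zstandard steps are left out, so `bytes` stands for an already serialized and compressed payload.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `bytes` to a sibling `<dest>.tmp` file, then renames it over `dest`.
/// Readers see either the old file or the complete new one, never a partial write.
pub fn write_atomic(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Build "<dest>.tmp" without disturbing any existing extension.
    let mut tmp_name = dest.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        // Assumption: flush to disk before the rename for durability.
        file.sync_all()?;
    }

    // On the same filesystem, rename replaces the destination in one step.
    fs::rename(&tmp, dest)
}
```

Keeping the temporary file next to its destination matters because a rename across filesystems is not atomic and may fail outright.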
## Tracking (`src/tracking.rs`)
`FileTree` is a `BTreeMap<String, Vec<u8>>` mapping relative paths to file
content.
- `scan_worktree` — recursively walks the working directory, respecting ignore
rules and skipping `.arc/` and `.git/`.
- `materialize_committed_tree` — rebuilds the full file tree by replaying the
linear delta chain from the root commit.
- `detect_changes` — compares the committed tree against the worktree to produce
a list of `FileChange` entries (adds, modifies, deletes).
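
The relationship between change detection and delta replay can be sketched over plain in-memory trees. The `Change` enum below is a simplified stand-in for `FileChange` (full content only, no renames or patches), and the function signatures are assumptions rather than the ones in `src/tracking.rs`.

```rust
use std::collections::BTreeMap;

pub type FileTree = BTreeMap<String, Vec<u8>>;

/// Simplified stand-in for `FileChange`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Change {
    Add { path: String, bytes: Vec<u8> },
    Modify { path: String, bytes: Vec<u8> },
    Delete { path: String },
}

/// Compares the committed tree with the worktree and lists the differences.
pub fn detect_changes(committed: &FileTree, worktree: &FileTree) -> Vec<Change> {
    let mut changes = Vec::new();
    for (path, bytes) in worktree {
        match committed.get(path) {
            None => changes.push(Change::Add { path: path.clone(), bytes: bytes.clone() }),
            Some(old) if old != bytes => {
                changes.push(Change::Modify { path: path.clone(), bytes: bytes.clone() })
            }
            Some(_) => {} // unchanged
        }
    }
    for path in committed.keys() {
        if !worktree.contains_key(path) {
            changes.push(Change::Delete { path: path.clone() });
        }
    }
    changes
}

/// Applies one delta's changes to a tree in place.
pub fn apply_changes(tree: &mut FileTree, changes: &[Change]) {
    for change in changes {
        match change {
            Change::Add { path, bytes } | Change::Modify { path, bytes } => {
                tree.insert(path.clone(), bytes.clone());
            }
            Change::Delete { path } => {
                tree.remove(path);
            }
        }
    }
}

/// Rebuilds a tree by replaying deltas in order, starting from the root commit.
pub fn materialize(chain: &[Vec<Change>]) -> FileTree {
    let mut tree = FileTree::new();
    for delta in chain {
        apply_changes(&mut tree, delta);
    }
    tree
}
```

Under these assumptions the two operations are inverses: replaying the changes detected between two trees on top of the first tree yields the second.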
## Ignore System (`src/ignore.rs`)
Reads `.arcignore` first, falling back to `.ignore`. Always ignores `.arc/` and
`.git/`. Supported pattern syntax:
- `*` and `?` glob wildcards.
- `!` prefix for negation.
- Patterns without `/` match any path component; patterns with `/` match the
full relative path.
- Patterns ending with `/` match directories only.
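
The rules above can be sketched as a small matcher. Several details are assumptions not stated in this document: wildcards do not cross `/`, the last matching pattern wins (which is what makes `!` useful), `.arc/` and `.git/` are only checked at the worktree root, and a directory-only pattern also covers files beneath a matching directory. The names `glob_match` and `is_ignored` are hypothetical.

```rust
/// Matches `text` against a pattern where `*` matches any run of
/// non-`/` characters and `?` matches exactly one non-`/` character.
fn glob_match(pattern: &[char], text: &[char]) -> bool {
    match pattern.split_first() {
        None => text.is_empty(),
        Some((&'*', rest)) => {
            // Try every possible length for the `*`, stopping at a separator.
            let mut i = 0;
            loop {
                if glob_match(rest, &text[i..]) {
                    return true;
                }
                if i < text.len() && text[i] != '/' {
                    i += 1;
                } else {
                    return false;
                }
            }
        }
        Some((&'?', rest)) => match text.split_first() {
            Some((&c, tail)) if c != '/' => glob_match(rest, tail),
            _ => false,
        },
        Some((&p, rest)) => match text.split_first() {
            Some((&c, tail)) if c == p => glob_match(rest, tail),
            _ => false,
        },
    }
}

/// Decides whether a relative, `/`-separated `path` is ignored.
pub fn is_ignored(patterns: &[&str], path: &str, is_dir: bool) -> bool {
    // Built-in rule: the repository's own metadata is always ignored.
    if matches!(path.split('/').next(), Some(".arc") | Some(".git")) {
        return true;
    }
    let full: Vec<char> = path.chars().collect();
    let components: Vec<&str> = path.split('/').collect();
    let mut ignored = false;
    for raw in patterns {
        let line = raw.trim();
        if line.is_empty() {
            continue;
        }
        let (negated, body) = match line.strip_prefix('!') {
            Some(rest) => (true, rest),
            None => (false, line),
        };
        let (dir_only, body) = match body.strip_suffix('/') {
            Some(rest) => (true, rest),
            None => (false, body),
        };
        let pat: Vec<char> = body.chars().collect();
        let matched = if body.contains('/') {
            // Patterns containing `/` are matched against the full path.
            (!dir_only || is_dir) && glob_match(&pat, &full)
        } else {
            // Ancestors are always directories; the last component only
            // counts for a directory-only pattern if the path is a directory.
            let n = if dir_only && !is_dir { components.len() - 1 } else { components.len() };
            components[..n]
                .iter()
                .any(|c| glob_match(&pat, &c.chars().collect::<Vec<char>>()))
        };
        if matched {
            ignored = !negated; // assumption: last match wins
        }
    }
    ignored
}
```

With the patterns `*.log`, `!keep.log`, and `build/`, this sketch ignores `debug.log` and the `build` directory, but not `keep.log` or a regular file that happens to be named `build`.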
## Git Bridge (`src/bridge.rs`)
Maintains a shadow bare git repository under `.arc/git/`.
`GitMap` provides bidirectional mapping (`arc_to_git` / `git_to_arc`) persisted
in `.arc/git-map.yml`.
- `arc_to_git` recursively converts arc commits to git commits, materializing
file trees as git tree objects.
- `git_to_arc` does the reverse, computing deltas from git tree diffs.
- SSH authentication via agent or key files (`~/.ssh/id_ed25519`, `id_rsa`,
`id_ecdsa`).
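
One way to picture how the mapping and the recursive conversion fit together is a memoized walk over parents. In this sketch the `GitMap` field types, the `insert` method, the `convert` function, and the `write_git_commit` callback are all hypothetical. The callback stands in for the real work of materializing a tree and writing a git commit, and the history is given as a plain parent table.

```rust
use std::collections::HashMap;

/// Bidirectional ID mapping; both directions are kept in sync by `insert`.
#[derive(Debug, Default)]
pub struct GitMap {
    pub arc_to_git: HashMap<String, String>,
    pub git_to_arc: HashMap<String, String>,
}

impl GitMap {
    pub fn insert(&mut self, arc_id: &str, git_oid: &str) {
        self.arc_to_git.insert(arc_id.to_string(), git_oid.to_string());
        self.git_to_arc.insert(git_oid.to_string(), arc_id.to_string());
    }
}

/// Converts `arc_id` and, recursively, any ancestors not yet in the map.
/// Returns the git OID for `arc_id`.
pub fn convert<F>(
    arc_id: &str,
    parents: &HashMap<String, Vec<String>>,
    map: &mut GitMap,
    write_git_commit: &mut F,
) -> String
where
    F: FnMut(&str, &[String]) -> String,
{
    // Already converted: reuse the recorded OID.
    if let Some(oid) = map.arc_to_git.get(arc_id) {
        return oid.clone();
    }
    // Parents must exist on the git side before the child can reference them.
    let mut git_parents = Vec::new();
    if let Some(list) = parents.get(arc_id) {
        for parent in list {
            git_parents.push(convert(parent, parents, &mut *map, &mut *write_git_commit));
        }
    }
    let oid = (*write_git_commit)(arc_id, &git_parents);
    map.insert(arc_id, &oid);
    oid
}
```

Because the map doubles as a memo table, converting the same commit twice does no extra work, which is what makes repeated pushes incremental in this sketch.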
## Merge (`src/merge.rs`)
Full three-way merge with line-level merging for text files using Myers diff.
Conflicts are marked with `<<<<<<< ours` / `=======` / `>>>>>>> theirs`.
Binary files fall back to keeping the "ours" version on conflict.
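
The per-file decision in a three-way merge can be sketched as follows. This is deliberately coarser than the line-level Myers merge described above: when both sides changed a text file, the sketch wraps the two whole versions in conflict markers instead of merging line by line. Treating valid UTF-8 as "text", and the names `Merged` and `merge_file`, are assumptions.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Merged {
    /// One side (or neither) changed, or both made the same change.
    Clean(Vec<u8>),
    /// Both sides changed differently; the content needs attention.
    Conflict(Vec<u8>),
}

/// File-level three-way merge of `ours` and `theirs` against their common `base`.
pub fn merge_file(base: &[u8], ours: &[u8], theirs: &[u8]) -> Merged {
    if ours == theirs {
        return Merged::Clean(ours.to_vec());
    }
    if ours == base {
        return Merged::Clean(theirs.to_vec()); // only theirs changed
    }
    if theirs == base {
        return Merged::Clean(ours.to_vec()); // only ours changed
    }
    match (std::str::from_utf8(ours), std::str::from_utf8(theirs)) {
        (Ok(o), Ok(t)) => {
            // Simplification: mark the whole file as one conflict region.
            let mut out = String::from("<<<<<<< ours\n");
            out.push_str(o);
            if !o.ends_with('\n') {
                out.push('\n');
            }
            out.push_str("=======\n");
            out.push_str(t);
            if !t.ends_with('\n') {
                out.push('\n');
            }
            out.push_str(">>>>>>> theirs\n");
            Merged::Conflict(out.into_bytes())
        }
        // Binary content: keep the "ours" version, as described above.
        _ => Merged::Conflict(ours.to_vec()),
    }
}
```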
## Signing (`src/signing.rs`)
Optional SSH key signing using the `ssh-key` crate. Signs with SHA-512 under
the `arc` namespace. Verification extracts the public key from the signature
itself. Supports `~` expansion in key paths.
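
The `~` expansion is small enough to sketch directly. The function name `expand_tilde` and the choice to pass the home directory as a parameter (rather than reading it from the environment) are assumptions. Forms such as `~user/...` are left untouched here.

```rust
use std::path::{Path, PathBuf};

/// Expands a leading `~` or `~/` in `path` to `home`; other paths pass through.
pub fn expand_tilde(path: &str, home: &Path) -> PathBuf {
    if path == "~" {
        return home.to_path_buf();
    }
    match path.strip_prefix("~/") {
        Some(rest) => home.join(rest),
        None => PathBuf::from(path),
    }
}
```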
## Source Modules
| Module | Responsibility |
|--------|----------------|
| `main.rs` | Entry point, macro definitions |
| `cli.rs` | Clap-based CLI parsing and command dispatch |
| `model.rs` | Core data types |
| `store.rs` | Commit/delta serialization and content-addressing |
| `tracking.rs` | Worktree scanning, change detection, commit logic |
| `repo.rs` | Repository init/open/discover, path validation |
| `config.rs` | YAML config loading, merging (local-first), effective config |
| `refs.rs` | Bookmark/tag CRUD, switch, worktree write/clean |
| `bridge.rs` | Git bridge (shadow repo, push, pull, clone, migrate, sync) |
| `diff.rs` | Unified diff rendering |
| `inspect.rs` | Log, show, history (blame), Myers diff |
| `merge.rs` | Three-way merge |
| `modify.rs` | Reset, revert, merge command, graft |
| `resolve.rs` | Target/range resolution (bookmarks, tags, prefixes, HEAD) |
| `ignore.rs` | Ignore file parsing and matching |
| `signing.rs` | SSH commit signing and verification |
| `stash.rs` | Named stash system |
| `remote.rs` | Remote management (remotes.yml) |
| `error.rs` | Error types |
| `ui.rs` | Colored output formatting |