Backend architecture
The Alcoves backend is a pure JSON API written in Go. It runs as a single binary
against PostgreSQL, a Dragonfly/Redis-compatible queue, and either local or S3
blob storage. It serves no HTML — the Nuxt frontend is a separate process that
proxies /api/** and /s/** to this service.
If you are contributing a new feature, adding a service, or debugging a production issue, this page gives you the mental model you need: how the process boots, how auth and library access are enforced before your handler runs, how async work is queued and tracked, and where the extension points are.
For the overall system topology, see Architecture. For environment variables and operator configuration, see Configuration.
Key dependencies
| Library | Role |
|---|---|
| Echo v4 | HTTP framework, routing, middleware |
| GORM + gorm.io/driver/postgres | ORM over PostgreSQL |
| Goose v3 | SQL migrations (embedded, applied at startup) |
| Asynq | Redis-backed async job queue and inspector UI |
| govips (libvips) | Image processing (CGO) |
| onnxruntime_go | CPU-only ONNX Runtime bindings for face, object, and audio ML |
| golang.org/x/oauth2 | Google OAuth2 |
| coder/websocket | WebSocket for the real-time activity feed |
| miniredis v2 | In-process Redis for tests |
Native dependencies shipped in the Docker image: libvips, ffmpeg, the ONNX Runtime shared library, and the whisper-cli binary from whisper.cpp.
Startup sequence
The entire process is wired in strict dependency order in backend/cmd/server/main.go.
A contributor adding a new service or route edits this file.
1. Load config: read and validate all ALCOVES_* environment variables. Startup aborts immediately if ALCOVES_SESSION_SECRET is missing; it is the only required field.
2. Connect to the database: open the GORM PostgreSQL connection with a bounded connection pool (MaxOpenConns=25, MaxIdleConns=5). GORM never manages foreign-key constraints; the schema is owned entirely by Goose migrations.
3. Apply migrations: run all pending Goose SQL migrations from the embedded migrations.FS before any handler is registered. Startup aborts on failure. This is what makes a rolling Kubernetes deployment auto-apply schema changes: the first api pod to start applies the migration; later pods see it as already applied.
4. Construct services, in dependency order:
   - Auth service (AES-GCM cookie crypto)
   - Library access service (RBAC)
   - Files service, settings service (single-row app_settings cache)
   - Storage driver + service
   - Asynq client and inspector
   - Activity bus (Redis pub/sub) and activity hub (in-process WebSocket fan-out)
   - All media and ML services: face detection, object detection, video proxy, transcription, audio detection, waveform, moment export, file hashing, image proxy
5. Pre-fetch ONNX models (mode=all|worker): a background goroutine calls EnsureModels for face and object detection. A failure is non-fatal: it logs a warning, and the first job blocks while the model downloads lazily.
6. Start the Asynq worker (mode=all|worker): concurrency 8, with per-job-type queue priorities (weighted by importance ÷ complexity) that let interactive image-transform requests run ahead of queued batch ML work and keep the heavy long-runners (video transcode, whisper transcription) from starving fast jobs like thumbnailing.
7. Configure Echo: install the custom validator, then the global middleware chain: logger, recover, CORS (strict allowlist, never a wildcard), auth middleware, library access middleware.
8. Register routes: application routes are skipped in worker mode, which serves only /api/health and /api/version. All handlers register under /api via their RegisterRoutes method.
9. Start the activity bus goroutine: it subscribes to Redis pub/sub and fans messages out to the in-process WebSocket hub.
10. Listen and graceful shutdown: on interrupt, the Asynq server shuts down first, then Echo with a 10-second timeout.
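The shutdown ordering in the final step can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in main.go: the stopper and httpShutdowner interfaces, the recorder, and both fakes are hypothetical stand-ins for the Asynq server and Echo, which expose comparable Shutdown methods.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// stopper is a hypothetical stand-in for the queue worker (the Asynq server
// in the real process); it only needs a blocking Shutdown method here.
type stopper interface{ Shutdown() }

// httpShutdowner is satisfied by *http.Server; Echo offers a similar
// context-based Shutdown.
type httpShutdowner interface {
	Shutdown(ctx context.Context) error
}

// shutdownInOrder stops the worker first, then gives the HTTP server up to
// timeout to drain in-flight requests.
func shutdownInOrder(worker stopper, srv httpShutdowner, timeout time.Duration) error {
	if worker != nil { // nil when no worker runs in this process
		worker.Shutdown()
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return srv.Shutdown(ctx)
}

// recorder notes the order in which components were stopped.
type recorder struct{ order []string }

type fakeWorker struct{ r *recorder }

func (w fakeWorker) Shutdown() { w.r.order = append(w.r.order, "worker") }

type fakeHTTP struct{ r *recorder }

func (h fakeHTTP) Shutdown(ctx context.Context) error {
	h.r.order = append(h.r.order, "http")
	return nil
}

func main() {
	r := &recorder{}
	if err := shutdownInOrder(fakeWorker{r}, fakeHTTP{r}, 10*time.Second); err != nil {
		fmt.Println("shutdown error:", err)
	}
	fmt.Println(r.order) // [worker http]

	// A real *http.Server satisfies the same interface.
	var _ httpShutdowner = &http.Server{}
}
```

In a real process this function would typically run after a signal-aware context (for example one from signal.NotifyContext) is cancelled.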
Runtime modes
A single binary, three behaviors selected by the ALCOVES_MODE environment
variable:
| Mode | HTTP routes | Asynq worker | Activity hub | Typical use |
|---|---|---|---|---|
| all (default) | yes | yes | yes | Single-node, local dev |
| api | yes | no | yes | Horizontally-scaled request path |
| worker | health + version only | yes | no | CPU/RAM-heavy background jobs |
In Kubernetes the Helm chart runs backend-api and backend-worker as
separate deployments from the same image, against the same database, queue, and
shared storage. The worker deployment intentionally has no CPU limit —
whisper.cpp, ffmpeg, and ONNX are bursty; CFS throttling hurts throughput more
than it protects the cluster.
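The mode table can be read as a small mapping from the ALCOVES_MODE value to the subsystems that start. The sketch below is illustrative only: the components struct and resolveMode function are hypothetical names, and trimming, lower-casing, and rejecting unknown values are assumptions rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// components records which subsystems a mode turns on.
type components struct {
	FullRoutes  bool // all /api routes; when false, only health + version
	Worker      bool // Asynq worker
	ActivityHub bool // in-process WebSocket fan-out
}

// resolveMode maps an ALCOVES_MODE value to the subsystems in the table.
// An empty value is treated as the default, "all".
func resolveMode(raw string) (components, error) {
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "", "all":
		return components{FullRoutes: true, Worker: true, ActivityHub: true}, nil
	case "api":
		return components{FullRoutes: true, ActivityHub: true}, nil
	case "worker":
		return components{Worker: true}, nil
	default:
		// Assumption: unknown values are rejected rather than defaulted.
		return components{}, fmt.Errorf("unknown ALCOVES_MODE %q", raw)
	}
}

func main() {
	for _, m := range []string{"", "api", "worker"} {
		c, err := resolveMode(m)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-8q routes=%t worker=%t hub=%t\n", m, c.FullRoutes, c.Worker, c.ActivityHub)
	}
}
```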
Configuration
Config is loaded once at startup into a flat struct via ALCOVES_* environment
variables. The only required field is ALCOVES_SESSION_SECRET (≥ 32 bytes; used
as the AES-GCM key for encrypted session cookies).
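As a rough sketch of what loading and validating such a flat struct might look like: the Config fields shown, the load function, and the exact getEnv signature are assumptions for illustration (only the getEnv name, the variable names, and the 32-byte minimum come from this page). The lookup function is injected so the logic can be exercised without touching the real environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config holds a small illustrative subset of settings; the real struct has
// many more fields.
type Config struct {
	SessionSecret string
	Mode          string
	WhisperModel  string
}

// lookupFunc matches the signature of os.LookupEnv.
type lookupFunc func(key string) (string, bool)

// getEnv returns the variable's value, or fallback when it is unset or empty.
func getEnv(lookup lookupFunc, key, fallback string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return fallback
}

// load reads the environment once and validates the only required field.
func load(lookup lookupFunc) (Config, error) {
	cfg := Config{
		SessionSecret: getEnv(lookup, "ALCOVES_SESSION_SECRET", ""),
		Mode:          getEnv(lookup, "ALCOVES_MODE", "all"),
		WhisperModel:  getEnv(lookup, "ALCOVES_WHISPER_MODEL", "large-v3"),
	}
	if cfg.SessionSecret == "" {
		return Config{}, errors.New("ALCOVES_SESSION_SECRET is required")
	}
	if len(cfg.SessionSecret) < 32 {
		return Config{}, fmt.Errorf("ALCOVES_SESSION_SECRET must be at least 32 bytes, got %d", len(cfg.SessionSecret))
	}
	return cfg, nil
}

func main() {
	cfg, err := load(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("mode:", cfg.Mode, "whisper model:", cfg.WhisperModel)
}
```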
Beyond the core server and database settings covered in Configuration, the backend exposes tuning knobs for each ML pipeline:
Face detection / recognition
| Variable | Description |
|---|---|
| ALCOVES_FACE_DETECTION_MIN_SCORE | Minimum confidence to keep a detected face |
| ALCOVES_FACE_RECOGNITION_MAX_DISTANCE | Maximum embedding distance for cluster assignment |
| ALCOVES_FACE_RECOGNITION_MIN_FACES | Minimum faces before a cluster becomes a named person |
| ALCOVES_MODELS_PATH | Directory where ONNX model files are stored |
Object detection
| Variable | Description |
|---|---|
| ALCOVES_OBJECT_DETECTION_MIN_SCORE | Minimum confidence threshold |
| ALCOVES_OBJECT_DETECTION_MAX_DETECTIONS | Cap on detections per image |
| ALCOVES_OBJECT_DETECTION_NMS_THRESHOLD | Non-maximum suppression threshold |
Whisper transcription
| Variable | Default | Description |
|---|---|---|
| ALCOVES_WHISPER_BINARY | whisper-cli | Path to the whisper.cpp CLI binary |
| ALCOVES_WHISPER_MODEL | large-v3 | Default model (overridable in admin UI) |
| ALCOVES_WHISPER_LANGUAGE | auto | Transcription language |
| ALCOVES_WHISPER_VAD_MODEL | silero-v6.2.0 | Voice-activity-detection model |
Audio event detection
| Variable | Default | Description |
|---|---|---|
| ALCOVES_AUDIO_DETECT_WINDOW_SEC | 10.0 | Analysis window in seconds |
| ALCOVES_AUDIO_DETECT_THRESHOLD | 0.2 | Minimum confidence to record an audio tag |
| ALCOVES_AUDIO_DETECT_TOP_K | 5 | Number of top events to retain per window |
For CORS, the backend builds a strict origin allowlist and never reflects a
wildcard origin (browsers reject a wildcard when credentials are attached).
The allowlist always includes the scheme + host from ALCOVES_BASE_URL and any
entries in ALCOVES_EXTRA_CORS_ORIGINS (comma-separated); in development mode it
also allows localhost:3000 and localhost:5173.
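The allowlist construction described above can be sketched as follows. The buildAllowlist function and its parameters are hypothetical; the http scheme on the two localhost origins, the trailing-slash trimming, and the explicit skip of a literal "*" entry are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildAllowlist returns the set of exact origins allowed to make
// credentialed requests. It never admits a wildcard.
func buildAllowlist(baseURL, extraOrigins string, dev bool) (map[string]bool, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, fmt.Errorf("parse base URL: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return nil, fmt.Errorf("base URL %q needs a scheme and host", baseURL)
	}
	// Only scheme + host form an origin; any path on the base URL is dropped.
	allowed := map[string]bool{u.Scheme + "://" + u.Host: true}

	for _, o := range strings.Split(extraOrigins, ",") {
		o = strings.TrimRight(strings.TrimSpace(o), "/")
		if o != "" && o != "*" { // assumption: a literal wildcard entry is ignored
			allowed[o] = true
		}
	}
	if dev {
		// Assumption: the development origins use plain http.
		allowed["http://localhost:3000"] = true
		allowed["http://localhost:5173"] = true
	}
	return allowed, nil
}

func main() {
	allowed, err := buildAllowlist("https://photos.example.com/app", "https://a.example, https://b.example/", true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, origin := range []string{"https://photos.example.com", "https://b.example", "https://evil.example"} {
		fmt.Printf("%s allowed=%t\n", origin, allowed[origin])
	}
}
```

A CORS middleware would then echo the request's Origin header back only when it is a key in this set.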
Global middleware
Two custom middlewares run after Echo’s built-in logger, recover, and CORS.
Auth middleware
Runs on every request and decides whether the path requires an authenticated session.
Public paths (no session required):
- /api/auth/{login,register,providers,logout,google,google/callback}
- /api/_auth/session (used by the Nuxt auth guard)
- /api/share/** (public moment share)
- /api/health, /api/version, /api/_meta/**
- GET /api/invites/{token} (invite lookup; the accept POST is guarded inside the handler)
For all other /api/** paths the middleware validates the encrypted session
cookie. On failure it returns 401. On success it sets the authenticated user
and session token into the request context so handlers can read them without
touching the session layer again.
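The public-path decision can be illustrated with a small matcher. This is a sketch, not the middleware itself: isPublic, publicExact, and publicPrefixes are hypothetical names, and treating the invite token as exactly one path segment is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// publicExact lists paths that never require a session.
var publicExact = map[string]bool{
	"/api/auth/login":           true,
	"/api/auth/register":        true,
	"/api/auth/providers":       true,
	"/api/auth/logout":          true,
	"/api/auth/google":          true,
	"/api/auth/google/callback": true,
	"/api/_auth/session":        true,
	"/api/health":               true,
	"/api/version":              true,
}

// publicPrefixes covers the /** wildcard entries.
var publicPrefixes = []string{"/api/share/", "/api/_meta/"}

// isPublic reports whether a request may proceed without a session cookie.
func isPublic(method, path string) bool {
	if publicExact[path] {
		return true
	}
	for _, p := range publicPrefixes {
		if strings.HasPrefix(path, p) {
			return true
		}
	}
	// Invite lookup is public for GET only; the accept POST is not.
	const invites = "/api/invites/"
	if method == "GET" && strings.HasPrefix(path, invites) {
		token := strings.TrimPrefix(path, invites)
		// Assumption: the token is a single, non-empty path segment.
		return token != "" && !strings.Contains(token, "/")
	}
	return false
}

func main() {
	fmt.Println(isPublic("POST", "/api/auth/login"))  // true
	fmt.Println(isPublic("GET", "/api/invites/abc"))  // true
	fmt.Println(isPublic("POST", "/api/invites/abc")) // false
	fmt.Println(isPublic("GET", "/api/libraries"))    // false
}
```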
Library access middleware
Activates only for paths shaped like /api/libraries/{id}/.... It parses the
library UUID, requires an authenticated user, then gates by HTTP method:
- GET/HEAD/OPTIONS: viewer role or above is sufficient
- POST/PUT/PATCH/DELETE: admin or owner role required
The resolved access record (library ID, name, owner, role, IsOwner, IsAdmin)
is placed in the request context for handlers to read.
RBAC rules:
- The library owner always has the owner role.
- Default (personal) libraries are never collaborative — non-owners always resolve to no access.
- Collaborative library members get the role stored in library_members.
- Mutations that only make sense on a collaborative library (e.g. managing members) additionally reject calls against personal libraries.
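The role resolution and method gating above can be sketched in a few lines. Everything here is hypothetical scaffolding: the Role ordering (viewer below admin below owner, with no intermediate roles), the Library struct, and denying unrecognized HTTP methods are assumptions, and the Members map merely stands in for the library_members table.

```go
package main

import "fmt"

// Role is ordered so that a higher value implies more access.
type Role int

const (
	RoleNone Role = iota
	RoleViewer
	RoleAdmin
	RoleOwner
)

// Library carries just enough state to resolve access.
type Library struct {
	OwnerID  string
	Personal bool            // default (personal) libraries are never collaborative
	Members  map[string]Role // stand-in for rows in library_members
}

// resolveRole applies the RBAC rules in order: owner, personal, membership.
func resolveRole(lib Library, userID string) Role {
	if userID == lib.OwnerID {
		return RoleOwner
	}
	if lib.Personal {
		return RoleNone
	}
	return lib.Members[userID] // missing key yields RoleNone
}

// allowed gates a request by HTTP method and resolved role.
func allowed(method string, role Role) bool {
	switch method {
	case "GET", "HEAD", "OPTIONS":
		return role >= RoleViewer
	case "POST", "PUT", "PATCH", "DELETE":
		return role >= RoleAdmin
	default:
		return false // assumption: unknown methods are denied
	}
}

func main() {
	shared := Library{OwnerID: "alice", Members: map[string]Role{"bob": RoleViewer}}
	personal := Library{OwnerID: "alice", Personal: true, Members: map[string]Role{"bob": RoleAdmin}}

	fmt.Println(allowed("GET", resolveRole(shared, "bob")))      // true
	fmt.Println(allowed("DELETE", resolveRole(shared, "bob")))   // false
	fmt.Println(allowed("GET", resolveRole(personal, "bob")))    // false
	fmt.Println(allowed("DELETE", resolveRole(shared, "alice"))) // true
}
```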
Endpoints outside /api/libraries/* that still need library scoping (such as
the image/file proxy) call the access service directly and return 404 (not
403) to non-members to avoid leaking that a library exists at all.
Route map
| Route | Purpose |
|---|---|
| GET /api/health | Health check; always available, always public |
| GET /api/version | Build version, commit SHA, build time |
| /api/auth/** | Login, register, logout, session, avatar, Google OAuth |
| /api/_auth/session | Session validation for the Nuxt route guard |
| /api/libraries | Library CRUD |
| /api/libraries/:id/** | Files, folders, tags, moments, members, people, objects, downloads, notifications |
| /api/invites/** | Invite lookup (public GET) and accept |
| /api/notifications/**, /api/ws | Global notification feed, dismiss, WebSocket |
| /api/search | Cross-library full-text search |
| /api/admin/** | Owner-gated admin panel and Asynq job-queue dashboard |
| /api/tus/** | TUS resumable uploads |
| /api/files/proxy/** | Authenticated image transform and file serve |
| /api/share/:token/** | Public moment share (no auth; used for OG embeds) |
In worker mode, the only routes served are /api/health and /api/version.
Handler pattern
Every handler in backend/internal/handlers/ follows the same shape:
- A struct holding injected services, constructed in main.go.
- A RegisterRoutes(g *echo.Group) method that binds methods to paths.
- Individual handler functions that read auth context, bind and validate the request body, call services, and serialize a JSON response.
Admin routes layer an extra ownership guard: a middleware checks that the
authenticated user has the owner role and returns 403 otherwise. This guard
wraps both the admin panel and the job-queue dashboard.
Services in backend/internal/services/ contain all business logic and expose
no HTTP routes. They are injected into handlers and into the Asynq worker mux.
Async job model
Media and ML processing never blocks a request. When a file is uploaded, or a moment’s time range is edited, the API enqueues a job and returns immediately. Status, progress, and results are stored as columns on the affected row and polled by the frontend.
Job state machine
Each job type tracks five fields on the row being processed:
| Column | Values |
|---|---|
| <job>_status | queued, processing, ready, not_needed, failed |
| <job>_progress | 0–100 |
| <job>_eta_seconds | nullable, estimated time remaining |
| <job>_error | nullable, failure reason |
| <job>_version / <job>ed_version | optimistic versioning pair |
Jobs tracked this way: proxy, transcribe, audio_detect, waveform (on
files) and export (on moments).
Queue tasks
| Task | Queue | What it does |
|---|---|---|
| image:proxy | imageproxy | On-demand image transforms (resize, crop, format) |
| file:metadata | metadata | EXIF/GPS + ffprobe metadata extraction |
| video:thumbnail | thumbnail | Extracts a thumbnail from a video file |
| file:hash | hash | SHA-256 dedup hash |
| moment:export | moment-export | Exports a named moment clip to MP4 |
| file:waveform | waveform | Extracts and stores the audio waveform |
| object:detect | object-detection | COCO object detection for a file |
| face:detect | face-detection | Face detection and embedding for a file |
| file:audio-detect | audio-detection | AudioSet audio-event classification |
| video:proxy | video-transcode | Video transcoding to HLS/MP4 |
| file:transcribe | transcription | whisper.cpp speech transcription |
| image:prewarm | maintenance | Hourly pre-warm of every image-proxy variant |
The queue runs at concurrency 8. Each job type has its own queue, weighted by
importance ÷ complexity (see internal/queues): interactive image
transforms rank highest, fast post-upload derivations (metadata, thumbnails,
hashes) next, then ML inference, then the heavy long-runners (full video
transcode, whisper transcription) just above background maintenance. As a
result, a long transcription or transcode job never sits ahead of an
interactive thumbnail request, and whisper transcription is deliberately
ranked below thumbnailing.
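To make the tiering concrete, the sketch below models the priorities as a map from queue name to integer weight, which is the general shape Asynq accepts for weighted queues. The numbers are hypothetical and chosen only to preserve the ordering described above; the real values live in internal/queues. Only the queues whose tier is named in the text are included, and the share and byPriority helpers are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// queueWeights uses queue names from the task table. The weights are
// hypothetical; only their relative order reflects the documented tiers.
var queueWeights = map[string]int{
	"imageproxy":       6, // interactive image transforms
	"metadata":         4, // fast post-upload derivations
	"thumbnail":        4,
	"hash":             4,
	"object-detection": 3, // ML inference
	"face-detection":   3,
	"audio-detection":  3,
	"video-transcode":  2, // heavy long-runners
	"transcription":    2,
	"maintenance":      1, // background maintenance
}

// share returns the fraction of the total weight held by one queue, a rough
// proxy for how often a weighted scheduler favors it when all queues are busy.
func share(weights map[string]int, name string) float64 {
	total := 0
	for _, w := range weights {
		total += w
	}
	if total == 0 {
		return 0
	}
	return float64(weights[name]) / float64(total)
}

// byPriority lists queue names from highest to lowest weight, breaking ties
// alphabetically so the output is deterministic.
func byPriority(weights map[string]int) []string {
	names := make([]string, 0, len(weights))
	for n := range weights {
		names = append(names, n)
	}
	sort.Slice(names, func(i, j int) bool {
		if weights[names[i]] != weights[names[j]] {
			return weights[names[i]] > weights[names[j]]
		}
		return names[i] < names[j]
	})
	return names
}

func main() {
	for _, n := range byPriority(queueWeights) {
		fmt.Printf("%-17s weight=%d share=%.3f\n", n, queueWeights[n], share(queueWeights, n))
	}
}
```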
Idempotency and re-triggering
Jobs are designed to be re-enqueued safely:
- Idempotency: stateless detectors skip already-processed inputs. Face and object detection skip files that already have detection rows; file hashing skips files with a non-null hash; video proxy skips files already in ready status.
- Version-bump re-trigger: to force a re-run, callers increment <job>_version. The worker captures the target version at job start and discards the result if the version changed mid-flight (e.g. the user edited the moment’s time range while the export was running). This is how moment export handles concurrent edits: each edit bumps export_version, and the output is written to a version-stamped cache key ({lib}/moments/{moment}/v{version}.mp4) so old exports survive until a new encode succeeds.
- Audio detection additionally uses Asynq’s uniqueness option (2-hour window) to deduplicate double-enqueues, and replaces detections transactionally (DELETE + bulk INSERT) to avoid a torn read.
- Image proxy is the only task that can respond inline: it coordinates worker results over Redis pub/sub and falls back to synchronous processing when no queue is configured (useful for tests and minimal dev setups).
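The version-bump check for moment export can be sketched with an in-memory stand-in for the moments table. The store type, runExport, and the encode callback are hypothetical; only the export_version / exported_version column pair and the cache-key layout come from the description above.

```go
package main

import "fmt"

// store is an in-memory stand-in for the moments table.
type store struct {
	exportVersion   map[string]int // export_version: bumped on every edit
	exportedVersion map[string]int // exported_version: last version whose output was kept
}

// cacheKey builds the version-stamped key {lib}/moments/{moment}/v{version}.mp4.
func cacheKey(lib, moment string, version int) string {
	return fmt.Sprintf("%s/moments/%s/v%d.mp4", lib, moment, version)
}

// runExport captures the target version at job start, runs the encode, and
// keeps the result only if no edit bumped the version in the meantime.
func (s *store) runExport(lib, moment string, encode func(key string)) (string, bool) {
	target := s.exportVersion[moment]
	key := cacheKey(lib, moment, target)

	encode(key) // long-running work; edits may land while this runs

	if s.exportVersion[moment] != target {
		return "", false // stale: discard, a newer job will produce the output
	}
	s.exportedVersion[moment] = target
	return key, true
}

func main() {
	s := &store{
		exportVersion:   map[string]int{"m1": 3},
		exportedVersion: map[string]int{},
	}

	key, ok := s.runExport("lib1", "m1", func(string) {})
	fmt.Println(key, ok) // lib1/moments/m1/v3.mp4 true

	// Simulate the user editing the time range while the encode is running.
	key, ok = s.runExport("lib1", "m1", func(string) { s.exportVersion["m1"]++ })
	fmt.Println(key == "", ok, s.exportedVersion["m1"]) // true false 3
}
```

Because each version writes to its own key, the v3 file is still present after the stale run is discarded, which matches the "old exports survive" behavior.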
Activity events are emitted as best-effort notifications when jobs complete; the durable record is always the database row, and clients re-fetch over HTTP on reconnect.
Data layer
PostgreSQL via GORM. All primary keys are UUIDs with gen_random_uuid()
database defaults and BeforeCreate hooks in Go. File sizes use bigint to
support files larger than 2 GB. Password hashes and internal session fields are
excluded from JSON serialization.
Migrations via Goose. SQL files are embedded in the binary and applied
automatically at startup. The pgvector extension is bootstrapped in the first
migration. Later migrations add per-job status columns, moments and share
links, collaborative library invites, the activity feed, and the 512-dimension
HNSW vector index used for face recognition. The HNSW index is created with
CREATE INDEX CONCURRENTLY inside a NO TRANSACTION migration so it does not
lock the table.
Face embeddings. The 512-dimension face_detections.embedding column is
a pgvector vector(512) type. It is written and queried via raw SQL with
explicit ::vector casts rather than through GORM’s type system.
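One way to picture the raw-SQL approach: serialize the embedding into pgvector's text form and bind it as a parameter that the query casts with ::vector. The vectorLiteral helper and the query below are illustrative assumptions, not the project's actual code; the id column and the choice of the <-> (L2 distance) operator in particular are guesses.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// vectorLiteral renders an embedding in pgvector's text format, for example
// "[0.5,-1,0.25]", so it can be bound as a string and cast with ::vector.
func vectorLiteral(v []float32) string {
	var b strings.Builder
	b.WriteByte('[')
	for i, x := range v {
		if i > 0 {
			b.WriteByte(',')
		}
		// bitSize 32 yields the shortest text that round-trips a float32.
		b.WriteString(strconv.FormatFloat(float64(x), 'g', -1, 32))
	}
	b.WriteByte(']')
	return b.String()
}

// nearestFacesSQL is a hypothetical nearest-neighbor query. The table and
// embedding column are from the text above; everything else is assumed.
const nearestFacesSQL = `
SELECT id, embedding <-> $1::vector AS distance
FROM face_detections
ORDER BY embedding <-> $1::vector
LIMIT $2`

func main() {
	emb := []float32{0.5, -1, 0.25} // a real embedding would have 512 values
	fmt.Println(vectorLiteral(emb)) // [0.5,-1,0.25]
	fmt.Println(nearestFacesSQL)
}
```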
Extending the backend
Adding a handler: create a struct in backend/internal/handlers/, add a
RegisterRoutes method, construct it in main.go, and call RegisterRoutes
in the route registration block. Read auth context via the middleware helpers;
call services for business logic.
Adding an async task: add a task-type constant and a ProcessTask function
in the relevant service under backend/internal/services/. Register the handler
in the Asynq mux in main.go. Add the <job>_status / <job>_version columns
via a new Goose migration and track them on the affected GORM model.
Adding a config field: add the field to the Config struct in
backend/internal/config/config.go and read it from the environment with
getEnv. Pass it through to the service constructor in main.go.