#18137 · @BYK · opened Mar 18, 2026 at 9:37 PM UTC · last updated Mar 21, 2026 at 10:24 AM UTC
fix(opencode): reduce memory usage during prompting with lazy boundary scan and context windowing
Summary
This PR implements three targeted optimizations (lazy boundary scanning, context windowing, and prompt-loop caching) to cut peak memory usage during prompting in opencode from 4-8GB down to ~1.2GB. This addresses a critical performance bottleneck identified in issue #18136.
Description
Issue for this PR
Closes #18136
Type of change
- [x] Bug fix
- [ ] New feature
- [x] Refactor / code improvement
- [ ] Documentation
What does this PR do?
Three targeted changes to reduce peak RSS during prompting from ~4-8GB down to ~1.2GB:
1. Lazy compaction boundary scan (filterCompactedLazy)
The prompt loop calls filterCompacted(stream(sessionID)) which streams ALL messages newest→oldest, loading parts for every message. For compacted sessions, most of those parts are discarded once the boundary is found.
New approach: probe the newest 50 message infos (1 DB query, no parts). If a compaction summary is detected, use a two-phase scan — info-only scan to find the boundary, then hydrate parts only for messages after it. If no compaction summary is found, fall back to the original single-pass filterCompacted(stream()) to avoid wasted info-only queries.
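The probe-then-two-phase logic described above can be sketched as follows. This is an illustrative sketch, not opencode's actual code: `MessageInfo`, `hasRecentSummary`, `findBoundary`, and `selectWindow` are hypothetical names, and the real implementation works against DB queries and streams rather than in-memory arrays.

```typescript
// Hypothetical sketch of the lazy compaction-boundary scan.
// Messages are ordered newest→oldest, matching the PR's description.
type MessageInfo = { id: string; isCompactionSummary: boolean }

const PROBE_SIZE = 50 // newest infos fetched by the cheap probe (1 query, no parts)

// Probe: does a compaction summary appear among the newest infos?
function hasRecentSummary(infos: MessageInfo[]): boolean {
  return infos.slice(0, PROBE_SIZE).some((m) => m.isCompactionSummary)
}

// Phase 1: info-only scan to locate the boundary index.
function findBoundary(infos: MessageInfo[]): number {
  const i = infos.findIndex((m) => m.isCompactionSummary)
  return i === -1 ? infos.length : i
}

// Phase 2: keep only messages up to and including the boundary summary;
// only these would have their parts hydrated. If the probe finds no summary,
// the caller falls back to the original single-pass filterCompacted(stream()).
function selectWindow(infos: MessageInfo[]): MessageInfo[] {
  if (!hasRecentSummary(infos)) return infos
  return infos.slice(0, findBoundary(infos) + 1)
}
```

The fallback matters: for uncompacted sessions, an info-only scan of the whole history would be a pure extra cost, so the original path is kept for that case.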
2. Context-window message windowing
toModelMessages was called with ALL messages (e.g., 7,704 for a long session), creating ModelMessage wrapper objects for every one. These flow through 4-5 copy layers (toModelMessages → convertToModelMessages → ProviderTransform.message → convertToLanguageModelPrompt), each creating ~60MB of wrapper objects.
Now the prompt loop estimates which messages from the tail fit in the LLM context window (model.limit.context × 4 chars/token) and only passes those to toModelMessages. For a 7,704-message session where ~200 fit, this cuts the conversion pipeline from ~300MB to ~10MB.
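The tail-window estimate can be sketched like this. The 4 chars/token heuristic and the `model.limit.context` budget come from the PR description; the `Msg` shape and `windowForContext` name are hypothetical simplifications.

```typescript
// Illustrative sketch of context-window message windowing.
type Msg = { text: string }

const CHARS_PER_TOKEN = 4 // rough heuristic from the PR description

// Walk the tail newest→oldest, accumulating an estimated character budget
// (contextLimit tokens × 4 chars/token); return only the messages that fit,
// preserving their original order.
function windowForContext(messages: Msg[], contextLimit: number): Msg[] {
  const budget = contextLimit * CHARS_PER_TOKEN
  let used = 0
  let start = messages.length
  for (let i = messages.length - 1; i >= 0; i--) {
    used += messages[i].text.length
    if (used > budget) break
    start = i
  }
  return messages.slice(start)
}
```

Only the windowed slice is then handed to toModelMessages, so the 4-5 downstream copy layers allocate wrappers for ~200 messages instead of 7,704.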
3. Prompt loop caching
The conversation is loaded once before the loop. On normal tool-call iterations, only the latest 200-message page is fetched and merged into the cache. Full reload only happens after compaction.
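A minimal sketch of the cache-merge step, assuming messages carry a stable id and some notion of revision; `CachedMsg` and `mergeLatestPage` are illustrative names, and the 200-message page size mirrors the PR description.

```typescript
// Hypothetical sketch of merging the latest page into the cached conversation.
type CachedMsg = { id: string; rev: number }

// Entries from the freshly fetched page (the latest ~200 messages) replace
// any stale cached copies; everything else in the cache is kept as-is.
function mergeLatestPage(cache: CachedMsg[], page: CachedMsg[]): CachedMsg[] {
  const byId = new Map<string, CachedMsg>()
  for (const m of cache) byId.set(m.id, m)
  for (const m of page) byId.set(m.id, m) // latest page wins over stale cache
  return Array.from(byId.values())
}
```

Compaction invalidates older entries wholesale, which is why it remains the one case that triggers a full reload rather than a merge.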
How did you verify your code works?
- Monitored RSS via `/proc/<PID>/status` every 30s during active prompting
- Before: peak 4.8GB, idle ~1GB
- After: peak 1.2GB, idle ~580MB
- All session tests pass (118 pass, 4 skip, 0 fail)
- Tested with both compacted and uncompacted sessions
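The RSS figures above come from `/proc/<PID>/status`, whose `VmRSS` (current resident set) and `VmHWM` (peak resident set, the high-water mark) fields are standard Linux kernel fields. A small sketch of how such a sample could be parsed; the parser itself is illustrative, not part of this PR.

```typescript
// Parse VmRSS and VmHWM (reported in kB) out of a /proc/<pid>/status dump
// and convert to whole MB, matching the rss_mb/hwm_mb columns in the log.
function parseStatusMb(status: string): { rssMb: number; hwmMb: number } {
  const grab = (key: string): number => {
    const m = status.match(new RegExp(`^${key}:\\s+(\\d+) kB`, "m"))
    return m ? Math.round(Number(m[1]) / 1024) : 0
  }
  return { rssMb: grab("VmRSS"), hwmMb: grab("VmHWM") }
}
```

Note that VmHWM is monotonic for the life of the process, which is why the hwm_mb column below stays pinned at the peak while rss_mb falls back to idle.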
Screenshots / recordings
Memory monitoring (30s intervals) after fix:
```
time,rss_mb,hwm_mb
20:46:02,942,1236   ← active prompting
20:48:02,1020,1236  ← peak during tool calls
20:50:02,606,1236   ← settled after activity
20:55:32,568,1236   ← stable idle
```
Checklist
- [x] I have tested my changes locally
- [x] I have not included unrelated changes in this PR
Linked Issues
#18136 perf: prompt loop loads entire conversation history into memory on every step
Comments
No comments.
Changed Files
- packages/app/src/components/dialog-connect-provider.tsx +1−1
- packages/app/src/components/dialog-custom-provider.tsx +1−1
- packages/opencode/src/cli/cmd/tui/component/dialog-provider.tsx +1−1
- packages/opencode/src/session/message-v2.ts +102−0
- packages/opencode/src/session/prompt.ts +43−2