Optimize
10 practical techniques to get the most out of your monthly Copilot allowance — without sacrificing quality.
10 Tips: Actionable techniques
3 Phases: Plan, implement, monitor
1 Goal: More output, fewer credits
Match the Model to the Task
More capable models cost more per token. Use the right tool for each job.
Light Models
GPT-4o mini, Claude Haiku
Best for
- Quick edits
- Boilerplate generation
- Straightforward Q&A
- Auto-complete tasks
Balanced Models
GPT-4o, Claude Sonnet
Best for
- Code review
- Feature implementation
- Documentation writing
- General debugging
Reasoning Models
o1, Claude Opus
Best for
- Complex refactoring
- Architectural decisions
- Multi-step debugging
- System design
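As a rough illustration, the three tiers above can be written as a lookup table. This is a minimal sketch: the task labels, the `pick_tier` helper, and the fallback to the balanced tier for unknown tasks are all assumptions made for illustration, not part of Copilot or VS Code.

```python
# Hypothetical lookup from task type to model tier, mirroring the lists above.
TIER_BY_TASK = {
    "quick edit": "light",
    "boilerplate": "light",
    "q&a": "light",
    "auto-complete": "light",
    "code review": "balanced",
    "feature implementation": "balanced",
    "documentation": "balanced",
    "general debugging": "balanced",
    "complex refactoring": "reasoning",
    "architectural decision": "reasoning",
    "multi-step debugging": "reasoning",
    "system design": "reasoning",
}


def pick_tier(task: str) -> str:
    """Return the tier for a task label.

    Unknown labels fall back to "balanced" (an assumption of this sketch).
    """
    return TIER_BY_TASK.get(task.strip().lower(), "balanced")


if __name__ == "__main__":
    for task in ("Boilerplate", "Code review", "System design"):
        print(f"{task}: {pick_tier(task)}")
```

The point of the table is the habit, not the code: decide the tier before opening the chat, and reach for a reasoning model only when the task is on the last list.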
Auto Model Selection
Let VS Code route each request to an efficient model that balances quality and cost. The model picker in chat shows cost details in the hover menu, including cost per token type and a generic cost tier label.
10 Ways to Extend Your Credits
Each technique independently reduces token consumption. Stack them for maximum efficiency.
Plan Before You Implement
Separate planning and implementation phases. Use a reasoning model for planning, then switch to a faster model for execution once the plan is solidified.
Thinking Effort Defaults
VS Code sets default effort levels with adaptive reasoning. Only increase thinking effort for genuinely complex problems.
New Chat, New Task
Start fresh when changing topics. Accumulated context from previous messages consumes tokens without improving results.
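To see why accumulated context adds up, consider a simplified model in which every request resends the whole conversation so far. The figure of 500 tokens per turn and the helper `total_input_tokens` are assumptions for illustration; real sessions vary, and prompt caching changes what repeated tokens cost.

```python
def total_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when request k carries all k turns sent so far."""
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


# One 20-turn chat versus two separate 10-turn chats (hypothetical sizes).
one_long_chat = total_input_tokens(20, 500)
two_short_chats = 2 * total_input_tokens(10, 500)

print(one_long_chat)    # 105000
print(two_short_chats)  # 55000
```

Under these assumptions, splitting the work into two chats roughly halves the input tokens, because input volume grows with the square of the number of turns.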
Leverage Forking
Fork the conversation to explore alternatives without re-establishing context. Type /fork to branch from the current message.
Disable Unneeded Tools
Every tool call produces output that consumes context window space. Disable tools and MCP servers you don't need for the current task.
Exclude Files from Context
Large generated files, build outputs, and irrelevant directories inflate token usage. Use .gitignore and files.exclude to keep context lean.
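As a sketch of the `files.exclude` part, the snippet below builds a settings fragment that could be pasted into a workspace `settings.json`. The directory names and the `files_exclude_fragment` helper are illustrative assumptions; `files.exclude` itself is the VS Code setting that maps glob patterns to `true` or `false`.

```python
import json

# Example directories to hide; these names are illustrative, adjust to your project.
noisy_dirs = ["dist", "build", "coverage"]


def files_exclude_fragment(dirs):
    """Build a `files.exclude` mapping of glob pattern -> True for each directory."""
    return {"files.exclude": {f"**/{d}": True for d in dirs}}


fragment = files_exclude_fragment(noisy_dirs)
print(json.dumps(fragment, indent=2))
```

Review the generated patterns before applying them, since excluded folders also disappear from the Explorer view.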
Manage Context with /compact
When a conversation grows long, use /compact to summarize older parts and reclaim context window space. Add optional instructions to guide the summary.
Monitor Your Usage
View current Copilot usage in the Status Bar dashboard. It shows the percentage of your monthly AI credit allowance that you have used.
Inspect Token Usage & Caching
Use Agent Debug Logs to understand credit consumption. The Cache Explorer shows prompt cache hit rates — cached tokens cost less.
Separate Planning from Implementation
The biggest credit drain is generating code before the approach is right. A two-phase workflow fixes this.
Planning
Implementation
Why This Saves Credits
Jumping straight into code generation with a reasoning model for the entire session wastes credits if the approach is wrong. By separating phases, you use the expensive model only for the short planning phase, then switch to a cost-effective model for the longer implementation phase — where most tokens are consumed.
85% of tokens are in implementation — use the cheapest model there.
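A worked example of the arithmetic behind that split follows. It is a sketch under stated assumptions: the relative prices (10 for a reasoning model, 1 for a cost-effective one) and the 100,000-token session are hypothetical, and the 15% planning share simply assumes that everything outside the 85% implementation share is planning.

```python
IMPLEMENTATION_SHARE = 0.85  # share of tokens quoted above
PLANNING_SHARE = 0.15        # assumed: the remainder is all planning

# Hypothetical relative prices per token; real rates differ by model.
REASONING_PRICE = 10.0
LIGHT_PRICE = 1.0


def session_cost(total_tokens: int, plan_price: float, impl_price: float) -> float:
    """Cost of a session given a per-token price for each phase."""
    blended_price = PLANNING_SHARE * plan_price + IMPLEMENTATION_SHARE * impl_price
    return total_tokens * blended_price


single_model = session_cost(100_000, REASONING_PRICE, REASONING_PRICE)
two_phase = session_cost(100_000, REASONING_PRICE, LIGHT_PRICE)

print(f"single model: {single_model:,.0f}")
print(f"two-phase:    {two_phase:,.0f}")
print(f"savings:      {1 - two_phase / single_model:.1%}")
```

With these made-up prices the two-phase session costs about a quarter of the single-model session. The exact ratio depends entirely on the real price gap between the two models.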
Know Where Your Credits Go
The Agent Debug Logs and Copilot Status Dashboard give you full visibility into token consumption.
Summary View
Aggregate token usage for the session, total tool calls, and overall duration.
Cache Explorer
Prompt cache hit rates and how many input tokens were reused from previous requests.
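The two numbers can be related with a short calculation. This is a sketch only: the function names, the token counts, and the 50% discount on cached tokens are assumptions for illustration, not figures from the Cache Explorer or from Copilot pricing.

```python
def cache_hit_rate(cached_input_tokens: int, total_input_tokens: int) -> float:
    """Fraction of input tokens that were served from the prompt cache."""
    if total_input_tokens == 0:
        return 0.0
    return cached_input_tokens / total_input_tokens


def effective_input_cost(
    cached_input_tokens: int,
    total_input_tokens: int,
    price_per_token: float,
    cached_discount: float,
) -> float:
    """Input cost when cached tokens are billed at a discount (hypothetical rate)."""
    uncached = total_input_tokens - cached_input_tokens
    cached_price = price_per_token * (1 - cached_discount)
    return uncached * price_per_token + cached_input_tokens * cached_price


# Hypothetical session: 40,000 input tokens, 30,000 of them reused from cache.
rate = cache_hit_rate(30_000, 40_000)
cost = effective_input_cost(30_000, 40_000, price_per_token=1.0, cached_discount=0.5)

print(f"hit rate: {rate:.0%}")
print(f"cost in token-equivalents: {cost:,.0f}")
```

A falling hit rate across a session suggests the prompt prefix is changing between requests, for example after switching tools or models partway through.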
Monthly AI Credit Allowance
Copilot Status Dashboard
Available via the VS Code Status Bar. Shows the percentage of your monthly allowance used. Run /chronicle:cost-tips for personalized recommendations.
Start Optimizing Today
Run /chronicle:cost-tips in any chat session to get personalized recommendations based on your recent activity.
/chronicle:cost-tips