Optimise AI usage

GitHub Copilot now uses usage-based billing instead of premium requests: each AI request consumes tokens from your user budget. This guide provides practical strategies to minimise token consumption whilst maintaining code quality and development velocity.

Why optimising AI usage matters

With usage-based billing, every interaction with Copilot, except inline code suggestions, consumes AI credits. By adopting the strategies below, you can reduce costs without sacrificing productivity.

1. Choose the correct model

This is the single most effective way to reduce token consumption. Different models have different capabilities and token costs. Using a high-tier model for a simple task wastes tokens, while using a low-tier model for a complex task may require multiple iterations, also wasting tokens.

Model tiers

Reasoning models (Claude Opus, GPT-5.5)

Best for: Complex architectural decisions, multi-file refactoring, significant changes affecting the codebase
Token cost: Highest
Use when: Problems cannot be solved with simpler models

Mid-tier models (Claude Sonnet, GPT-5.4)

Best for: General code generation, bug fixes, writing tests, code review
Token cost: Moderate
Use by default for most coding tasks

Low-tier models (Claude Haiku, GPT-mini)

Best for: Simple completions, quick explanations, documentation
Token cost: Lowest
Use for straightforward, well-scoped tasks
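
The trade-off between tiers can be made concrete with some simple arithmetic. The sketch below uses made-up relative costs and attempt counts (they are not published Copilot prices) to show why a cheap model that needs several retries can cost more than a mid-tier model that succeeds first time, and why a reasoning model is wasteful on a simple task.

```python
# Hypothetical relative cost per request for each tier. These numbers are
# illustrative only and do not reflect real Copilot pricing.
RELATIVE_COST = {"low": 1, "mid": 3, "reasoning": 10}


def total_cost(tier: str, attempts: int) -> int:
    """Return the relative cost of reaching a usable answer."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return RELATIVE_COST[tier] * attempts


# Complex task: assume the low tier needs four tries and the mid tier one.
complex_task_low = total_cost("low", 4)   # 1 * 4 = 4
complex_task_mid = total_cost("mid", 1)   # 3 * 1 = 3

# Simple task: assume every tier succeeds on the first try.
simple_task_low = total_cost("low", 1)              # 1 * 1 = 1
simple_task_reasoning = total_cost("reasoning", 1)  # 10 * 1 = 10
```

Under these assumed numbers the mid tier is cheaper for the complex task, and the low tier is ten times cheaper for the simple one.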

Auto mode

Enable auto-model selection in your GitHub Copilot settings. Copilot’s auto mode automatically detects task intent and routes requests to the appropriate model tier. This removes the burden of manual selection whilst optimising costs.

2. Provide clear, precise prompts

The quality of your prompt directly affects token consumption. Vague prompts often lead to unsatisfactory results, forcing you to re-prompt and consume more tokens. Clear, precise prompts reduce iteration cycles. It’s better to focus on improving agent quality through better prompts than to rely on more powerful models to fix poor instructions.

Best practices for prompt engineering

Be explicit about requirements - Include output format, constraints, and edge cases
Add stop signals - Specify where Copilot should stop generating (e.g., “Stop after the function definition”)
Provide known context beforehand - Include relevant code snippets, function signatures, or architectural patterns above your request. This avoids forcing Copilot to search your codebase
Optimise for quality - A single well-crafted prompt is cheaper than multiple attempts to fix poor initial results
Use code comments - Add inline comments that explain what the next code block should do; this reduces the need for lengthy written prompts

Example: Instead of “write a function to process data”, write:

Write a function that:
- Takes a list of dictionaries containing user records
- Filters records where age >= 18
- Returns a sorted list (by name, ascending)
- Raises ValueError if input is not a list
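
A prompt this specific leaves little room for interpretation. One plausible result is sketched below in Python; the function name and the "name" and "age" keys are assumptions made for illustration, since the prompt does not fix them.

```python
def adult_users_sorted(records):
    """Return records with age >= 18, sorted by name in ascending order."""
    if not isinstance(records, list):
        raise ValueError("records must be a list")
    # Assumes every record has "name" and "age" keys.
    adults = [record for record in records if record["age"] >= 18]
    return sorted(adults, key=lambda record: record["name"])


users = [
    {"name": "Zoe", "age": 30},
    {"name": "Adam", "age": 17},
    {"name": "Liam", "age": 18},
]
print(adult_users_sorted(users))
# [{'name': 'Liam', 'age': 18}, {'name': 'Zoe', 'age': 30}]
```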

3. Follow a research, plan, implement workflow

Structured development reduces backtracking and prompting cycles.

Research - Understand the problem space. Browse existing code, documentation, and dependencies before asking Copilot for help
Plan - Outline your approach with comments or pseudocode. Let Copilot fill in the details rather than generating from scratch
Implement - Use Copilot for code generation now that intent is clear
Validate - Run tests and verify output before moving on

This workflow keeps Copilot focused and prevents token waste on exploration or rework.
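
As a rough illustration of the Plan step, the stub below records the intended approach as comments before any generation is requested. The function name and the rules in the comments are hypothetical; the point is that the plan lives in the file, so the prompt itself can stay short.

```python
def deduplicate_emails(emails: list[str]) -> list[str]:
    """Return unique email addresses in their original order."""
    # Plan:
    # 1. Normalise each address: strip whitespace and lower-case it.
    # 2. Skip empty strings.
    # 3. Keep the first occurrence of each address, preserving order.
    raise NotImplementedError("to be filled in from the plan above")
```

With the plan in place, the request can be as brief as "implement this function following the plan in the comments".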

4. Add deterministic controls

Automated checks reduce the need for AI-assisted code review, freeing up tokens for tasks only AI can handle.

Essential tools

Unit tests - Run before committing. Catch errors early and reduce need for Copilot-assisted debugging
Linters and formatters - Use ESLint, Pylint, Prettier, black, etc. to enforce style; Copilot respects existing patterns
Security scanning - SAST (static application security testing) tools find vulnerabilities without AI; use Copilot only for remediation advice
Type checking - TypeScript, mypy, etc. catch errors early; this reduces tokens spent on fixes later

These tools provide fast feedback without token cost, allowing Copilot to focus on higher-value tasks like architecture and complex logic.
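
A minimal sketch of two such controls working together, using a hypothetical apply_discount function: the type hints give a checker such as mypy something to verify, and the unit tests catch regressions without spending any tokens.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by percent, which must be between 0 and 100."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTests(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_rejects_out_of_range_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 120)
```

Saved to a file, the tests can be run with python -m unittest followed by the file name.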

5. Maintain a concise copilot-instructions.md

Copilot instructions (.github/copilot-instructions.md) guide Copilot’s behaviour within your repository. A concise, well-maintained file:

Reduces token waste from incorrect outputs, and avoids paying for bloated instructions that Copilot has to read as context
Serves as an agent-miss log - document where agents performed poorly so that later sessions avoid repeating the same mistakes
Trims unnecessary output - specify output length and format preferences to avoid re-prompting

Keep instructions under 2,000 tokens. Focus on:

Repository structure and architecture
Coding standards specific to your project
Known limitations or gotchas
Examples of preferred patterns
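
One way to keep an eye on the size limit is a quick estimate like the sketch below. It assumes roughly four characters per token, a common rule of thumb for English text; real tokenisers differ between models, so treat the result as an approximation only. The function names are illustrative.

```python
from pathlib import Path

# Assumption: about 4 characters per token. This is a heuristic, not the
# tokeniser that any particular model actually uses.
CHARS_PER_TOKEN = 4
TOKEN_BUDGET = 2_000  # the limit suggested in this guide


def estimate_tokens(text: str) -> int:
    """Estimate the token count of text with a characters-per-token heuristic."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def within_budget(path: str) -> bool:
    """Return True if the file at path is estimated to fit the token budget."""
    tokens = estimate_tokens(Path(path).read_text(encoding="utf-8"))
    print(f"{path}: ~{tokens} tokens (budget {TOKEN_BUDGET})")
    return tokens <= TOKEN_BUDGET
```

Call within_budget with the path to your instructions file, for example from a pre-commit hook.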

Additional tools and techniques

rtk - CLI token reducer

rtk is a CLI proxy that reduces token consumption by 60-90% on common development commands. It summarises command output before the output is sent to the LLM.

Installation (Windows):

# Inside WSL
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh
rtk init -g

Usage:

# Use rtk as a prefix to reduce output verbosity
rtk ls .
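
To illustrate the general idea of trimming command output before it reaches a model, here is a minimal Python sketch. It is not how rtk is implemented; the function names and the head/tail strategy are assumptions chosen only to show why shorter output means fewer tokens.

```python
import subprocess


def summarise_output(text: str, head: int = 5, tail: int = 5) -> str:
    """Keep the first and last few lines and note how many were dropped."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # already short enough, return unchanged
    omitted = len(lines) - head - tail
    kept = (
        lines[:head]
        + [f"... {omitted} lines omitted ..."]
        + lines[len(lines) - tail:]
    )
    return "\n".join(kept)


def run_summarised(command: list[str]) -> str:
    """Run command and return a shortened version of its standard output."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return summarise_output(result.stdout)
```

A 20-line listing passed through summarise_output comes back as 11 lines: five from the start, one marker line, and five from the end.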

Copilot CLI insights

Use the Chronicle slash command in the GitHub Copilot CLI to analyse your token usage patterns:

copilot # Starts GitHub Copilot CLI
/chronicle tips

This command shows:

Current usage statistics
Efficiency recommendations
Model distribution

Run this regularly to identify further optimisation opportunities.

Practical workflow examples

Here’s how to apply these strategies to typical tasks:

Task 1: Add input validation to a function

Research - Review existing validation patterns in your codebase (look for type checks, range validation, null handling)
Plan - Comment the validation rules (type, range, required fields) above the function
Choose model - Use a low-tier model or auto-mode for straightforward validation
Craft prompt - “Add input validation to this function (attach appropriate context). Include checks for type, range, and required fields”
Implement - Review and accept Copilot’s suggestion
Validate - Run unit tests, linter, security scan
Optimise - If tests fail, debug locally before re-prompting
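
For Task 1, the outcome might resemble the sketch below. The create_booking function, its "name" and "guests" fields, and the 1 to 10 range are all hypothetical; they stand in for whatever rules your own plan comments describe.

```python
def create_booking(data: dict) -> dict:
    """Validate a booking request and return a normalised copy."""
    # Required fields
    for field in ("name", "guests"):
        if field not in data:
            raise ValueError(f"missing required field: {field}")

    name, guests = data["name"], data["guests"]

    # Type checks (bool is a subclass of int, so reject it explicitly)
    if not isinstance(name, str):
        raise TypeError("name must be a string")
    if isinstance(guests, bool) or not isinstance(guests, int):
        raise TypeError("guests must be an integer")

    # Range check
    if not 1 <= guests <= 10:
        raise ValueError("guests must be between 1 and 10")

    return {"name": name.strip(), "guests": guests}
```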

Task 2: Write end-to-end tests for a user workflow

Research - Review existing end-to-end test patterns and test frameworks in use
Plan - Outline test cases for the workflow: successful path, validation errors, edge cases
Choose model - Use a mid-tier model; test design benefits from better reasoning
Craft prompt - “Write end-to-end tests for the user registration workflow (attach appropriate context). Include successful registration, validation errors, and duplicate email scenarios”
Implement - Review and accept Copilot’s suggestion
Validate - Run the test suite, check code coverage, verify against acceptance criteria
Optimise - If tests fail or coverage is incomplete, debug locally before re-prompting
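
The three scenarios named in the Task 2 prompt could be laid out as in the sketch below. RegistrationService is a hypothetical in-memory stand-in so that the example runs on its own; a real end-to-end test would exercise your application through its actual interface, and the validation rules shown are assumptions.

```python
import unittest


class RegistrationError(Exception):
    """Raised when a registration request is rejected."""


class RegistrationService:
    """Hypothetical in-memory stand-in for the system under test."""

    def __init__(self):
        self._emails = set()

    def register(self, email: str, password: str) -> dict:
        email = email.strip().lower()
        if "@" not in email:
            raise RegistrationError("invalid email")
        if len(password) < 8:
            raise RegistrationError("password too short")
        if email in self._emails:
            raise RegistrationError("email already registered")
        self._emails.add(email)
        return {"email": email, "status": "registered"}


class RegistrationWorkflowTests(unittest.TestCase):
    def setUp(self):
        self.service = RegistrationService()

    def test_successful_registration(self):
        result = self.service.register("ada@example.com", "correct-horse")
        self.assertEqual(result["status"], "registered")

    def test_validation_errors(self):
        with self.assertRaises(RegistrationError):
            self.service.register("not-an-email", "correct-horse")
        with self.assertRaises(RegistrationError):
            self.service.register("ada@example.com", "short")

    def test_duplicate_email(self):
        self.service.register("ada@example.com", "correct-horse")
        with self.assertRaises(RegistrationError):
            self.service.register("ADA@example.com", "correct-horse")
```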

Task 3: Create infrastructure-as-code for environment provisioning

Research - Review existing infrastructure definitions and provisioning patterns in your repository
Plan - Comment the infrastructure requirements (networking, security groups, scaling, health checks)
Choose model - Use a mid-tier model; infrastructure design requires architectural decision-making
Craft prompt - “Create Azure resources for this infrastructure using Terraform (attach appropriate context). Include networking, security groups, auto-scaling, and health checks”
Implement - Review and accept Copilot’s suggestion
Validate - Run terraform validate for syntax and tflint for linting, deploy to staging, and verify that all resources are provisioned correctly
Optimise - If deployment fails, debug locally before re-prompting

These examples follow a structured research → plan → implement → validate workflow, which consumes fewer tokens than an exploratory approach.

Key takeaways

Match models to task complexity; use auto mode
Invest time in clear, precise prompts to reduce iteration cycles
Structure development as research → plan → implement → validate
Use deterministic tools (tests, linters, security scanning) to reduce AI workload
Keep copilot-instructions.md concise and up-to-date
Monitor usage with /chronicle tips
Consider tools like rtk to reduce token consumption on CLI commands

By following these practices, you’ll develop faster, spend fewer tokens, and maintain higher code quality.