Optimise AI usage

GitHub Copilot now uses usage-based billing instead of premium requests: each AI request consumes tokens from your user budget. This guide provides practical strategies to minimise token consumption whilst maintaining code quality and development velocity.

Why optimising AI usage matters

With usage-based billing, every interaction with Copilot, except inline code suggestions, consumes AI credits. By adopting the strategies below, you can reduce costs without sacrificing productivity.

1. Choose the correct model

This is the single most effective way to reduce token consumption. Different models have different capabilities and token costs. Using a high-tier model for a simple task wastes tokens, while using a low-tier model for a complex task may require multiple iterations, also wasting tokens.

Model tiers

Reasoning models (Claude Opus, GPT-5.5)

  • Best for: Complex architectural decisions, multi-file refactoring, significant changes affecting the codebase
  • Token cost: Highest
  • Use when: Problems cannot be solved with simpler models

Mid-tier models (Claude Sonnet, GPT-5.4)

  • Best for: General code generation, bug fixes, writing tests, code review
  • Token cost: Moderate
  • Use when: You need a default for most coding tasks

Low-tier models (Claude Haiku, GPT-mini)

  • Best for: Simple completions, quick explanations, documentation
  • Token cost: Lowest
  • Use when: The task is straightforward and well scoped
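
The tier guidance above can be read as a simple decision rule. The sketch below is purely illustrative: `suggest_tier`, its parameters, and the three-file threshold are hypothetical and are not part of any Copilot API or setting.

```python
def suggest_tier(files_touched: int, needs_design_decision: bool, well_scoped: bool) -> str:
    """Suggest a model tier for a task (illustrative heuristic, not a Copilot feature)."""
    # Architectural decisions or wide multi-file changes: reasoning tier.
    # The threshold of 3 files is an arbitrary assumption for this sketch.
    if needs_design_decision or files_touched > 3:
        return "reasoning"
    # Small, well-scoped work in a single file: low tier.
    if well_scoped and files_touched <= 1:
        return "low"
    # Everything else falls back to the mid tier, the suggested default.
    return "mid"


print(suggest_tier(files_touched=1, needs_design_decision=False, well_scoped=True))
```

Writing the rule down, even informally, makes it easier to notice when a task has been sent to a more expensive tier than it needs.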

Auto mode

Enable auto model selection in your GitHub Copilot settings. Auto mode detects task intent and routes each request to an appropriate model tier, which removes the burden of manual selection whilst optimising costs.

2. Provide clear, precise prompts

The quality of your prompt directly affects token consumption. Vague prompts often lead to unsatisfactory results, forcing you to re-prompt and consume more tokens. Clear, precise prompts reduce iteration cycles. It’s better to focus on improving agent quality through better prompts than to rely on more powerful models to fix poor instructions.

Best practices for prompt engineering

  • Be explicit about requirements - Include output format, constraints, and edge cases
  • Add stop signals - Specify where Copilot should stop generating (e.g., “Stop after the function definition”)
  • Provide known context beforehand - Include relevant code snippets, function signatures, or architectural patterns above your request. This avoids forcing Copilot to search your codebase
  • Optimise for quality - A single well-crafted prompt is cheaper than multiple attempts to fix poor initial results
  • Use code comments - Add inline comments that explain what the next code block should do; this reduces the need for lengthy written prompts

Example: Instead of “write a function to process data”, write:

Write a function that:
- Takes a list of dictionaries containing user records
- Filters records where age >= 18
- Returns a sorted list (by name, ascending)
- Raises ValueError if input is not a list
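
A prompt this specific leaves little room for interpretation. As a rough sketch, the result might look like the following Python; the function name `adult_users_sorted` and the `name` and `age` keys are assumptions, since the prompt does not name them.

```python
from typing import Any


def adult_users_sorted(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return records with age >= 18, sorted by name in ascending order."""
    # Requirement: raise ValueError if the input is not a list.
    if not isinstance(records, list):
        raise ValueError("records must be a list")
    # Requirement: keep only records where age >= 18.
    # Assumes every record has "age" and "name" keys.
    adults = [record for record in records if record["age"] >= 18]
    # Requirement: sort by name, ascending.
    return sorted(adults, key=lambda record: record["name"])


users = [
    {"name": "Priya", "age": 31},
    {"name": "Alex", "age": 17},
    {"name": "Bo", "age": 18},
]
print([user["name"] for user in adult_users_sorted(users)])  # ['Bo', 'Priya']
```

Each line of the prompt maps to one check or step in the code, which is what makes the output easy to verify on the first attempt.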

3. Follow a research, plan, implement, validate workflow

Structured development reduces backtracking and prompting cycles.

  1. Research - Understand the problem space. Browse existing code, documentation, and dependencies before asking Copilot for help
  2. Plan - Outline your approach with comments or pseudocode. Let Copilot fill in the details rather than generating from scratch
  3. Implement - Use Copilot for code generation now that intent is clear
  4. Validate - Run tests and verify output before moving on

This workflow keeps Copilot focused and prevents token waste on exploration or rework.
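
The Plan step can be as light as a stub whose comments state the intent. A minimal sketch, assuming a hypothetical `parse_settings` helper for `key=value` lines: the comments are the plan, and the code under each comment is the detail Copilot would be asked to fill in.

```python
def parse_settings(text: str) -> dict[str, str]:
    """Parse 'key=value' lines into a dictionary."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        # Plan step 1: skip blank lines and '#' comment lines.
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Plan step 2: split on the first '=' only, so values may contain '='.
        key, separator, value = line.partition("=")
        # Plan step 3: reject lines without '=' rather than guessing.
        if not separator:
            raise ValueError(f"expected key=value, got: {line!r}")
        settings[key.strip()] = value.strip()
    return settings


print(parse_settings("# demo\nname = demo\nurl=a=b\n"))
```

Because the intent is already written down, the prompt can be short, and the Validate step has concrete behaviour to test against.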

4. Add deterministic controls

Automated checks reduce the need for AI-assisted code review, freeing up tokens for tasks only AI can handle.

Essential tools

  • Unit tests - Run before committing. Catch errors early and reduce need for Copilot-assisted debugging
  • Linters and formatters - Use ESLint, Pylint, Prettier, black, etc. to enforce style; Copilot tends to follow the patterns already present in your code
  • Security scanning - SAST (static application security testing) tools find vulnerabilities without AI; use Copilot only for remediation advice
  • Type checking - TypeScript, mypy, etc. catch errors early; this reduces tokens spent on fixes later

These tools provide fast feedback without token cost, allowing Copilot to focus on higher-value tasks like architecture and complex logic.
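
As a sketch of what a deterministic check looks like, the example below pairs a type-annotated function with unit tests from Python's standard `unittest` module. The `apply_discount` function and its rules are hypothetical; the point is that a failure here is caught without spending any tokens.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by percent; percent must be between 0 and 100."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTests(unittest.TestCase):
    def test_typical_discount(self) -> None:
        # 25% off 200.0 is 150.0.
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_rejects_out_of_range_percent(self) -> None:
        # Percentages above 100 are rejected rather than producing a negative price.
        with self.assertRaises(ValueError):
            apply_discount(200.0, 120)
```

Saved in a file whose name starts with `test`, these tests run with `python -m unittest`, and the type annotations can be checked separately with a type checker such as mypy.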

5. Maintain a concise copilot-instructions.md

Copilot instructions (.github/copilot-instructions.md) guide Copilot’s behaviour within your repository. A concise, well-maintained file:

  • Reduces token waste - the file is added as context to requests, so a shorter file costs fewer tokens, and clear guidance leads to fewer incorrect outputs
  • Serves as an agent-miss log - document why agents performed poorly so that later requests avoid the same mistakes
  • Trims unnecessary output - specify output length and format preferences to avoid re-prompting

Keep instructions under 2,000 tokens. Focus on:

  • Repository structure and architecture
  • Coding standards specific to your project
  • Known limitations or gotchas
  • Examples of preferred patterns
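
A rough estimate is usually enough to check the 2,000-token guideline. The sketch below assumes about four characters per token, a common rule of thumb for English text rather than the tokeniser any particular model uses; `estimate_tokens` and `within_budget` are hypothetical helper names.

```python
import math

CHARS_PER_TOKEN = 4  # Assumption: rough average for English prose, not exact.
TOKEN_BUDGET = 2000  # The guideline from this section.


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def within_budget(text: str, budget: int = TOKEN_BUDGET) -> bool:
    """Return True if the estimated token count fits within the budget."""
    return estimate_tokens(text) <= budget


# Stand-in for the contents of the instructions file.
sample = "Use 4-space indentation.\n" * 100
print(estimate_tokens(sample), within_budget(sample))
```

To check a real file, read its contents with `pathlib.Path(...).read_text(encoding="utf-8")` and pass the result to `within_budget`. Treat the answer as an approximation; code and non-English text often tokenise less efficiently than prose.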

Additional tools and techniques

rtk - CLI token reducer

rtk is a CLI proxy that, according to its maintainers, reduces token consumption by 60-90% on common development commands. It condenses command output before the output is sent to the LLM.

Installation (Windows):

# Inside WSL
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh
rtk init -g

Usage:

# Use rtk as a prefix to reduce output verbosity
rtk ls .

Copilot CLI insights

Use the Chronicle slash command in the GitHub Copilot CLI to analyse your token usage patterns:

copilot # Starts GitHub Copilot CLI
/chronicle tips

This command shows:

  • Current usage statistics
  • Efficiency recommendations
  • Model distribution

Run this regularly to identify further optimisation opportunities.

Practical workflow examples

Here’s how to apply these strategies to typical tasks:

Task 1: Add input validation to a function

  1. Research - Review existing validation patterns in your codebase (look for type checks, range validation, null handling)
  2. Plan - Comment the validation rules (type, range, required fields) above the function
  3. Choose model - Use a low-tier model or auto-mode for straightforward validation
  4. Craft prompt - “Add input validation to this function (attach appropriate context). Include checks for type, range, and required fields”
  5. Implement - Accept Copilot’s suggestion
  6. Validate - Run unit tests, linter, security scan
  7. Optimise - If tests fail, debug locally before re-prompting
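
As a sketch of where Task 1 might end up, the example below validates type, required fields, and range in that order. The `create_booking` function, its field names, and the 1 to 10 seat range are all hypothetical.

```python
from typing import Any

REQUIRED_FIELDS = ("email", "seats")  # Hypothetical required fields.


def create_booking(payload: dict[str, Any]) -> dict[str, Any]:
    """Validate payload and return a normalised booking record."""
    # Type check: the payload itself must be a dictionary.
    if not isinstance(payload, dict):
        raise TypeError("payload must be a dict")
    # Required fields: report everything that is missing in one error.
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    # Type check: bool is a subclass of int in Python, so exclude it explicitly.
    seats = payload["seats"]
    if isinstance(seats, bool) or not isinstance(seats, int):
        raise TypeError("seats must be an integer")
    # Range check: the 1-10 limit is an assumption for this sketch.
    if not 1 <= seats <= 10:
        raise ValueError("seats must be between 1 and 10")
    return {"email": str(payload["email"]).strip().lower(), "seats": seats}


print(create_booking({"email": " A@Example.com ", "seats": 2}))
```

Each check corresponds to one rule commented in the Plan step, so a failing unit test points directly at the rule that was missed.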

Task 2: Write end-to-end tests for a user workflow

  1. Research - Review existing end-to-end test patterns and test frameworks in use
  2. Plan - Outline test cases for the workflow: successful path, validation errors, edge cases
  3. Choose model - Use a mid-tier model; test design benefits from better reasoning
  4. Craft prompt - “Write end-to-end tests for the user registration workflow (attach appropriate context). Include successful registration, validation errors, and duplicate email scenarios”
  5. Implement - Accept Copilot’s suggestion
  6. Validate - Run the test suite, check code coverage, verify against acceptance criteria
  7. Optimise - If tests fail or coverage is incomplete, debug locally before re-prompting
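
The three scenarios named in the prompt translate into three test cases. The sketch below uses a hypothetical in-memory `RegistrationService` as a stand-in for the real system; a genuine end-to-end suite would drive the deployed application through its UI or API instead.

```python
import unittest


class RegistrationService:
    """Hypothetical in-memory stand-in for the system under test."""

    def __init__(self) -> None:
        self._emails: set[str] = set()

    def register(self, email: str, password: str) -> str:
        # Deliberately simple validation rules, assumed for this sketch.
        if "@" not in email or len(password) < 8:
            raise ValueError("invalid email or password")
        key = email.lower()
        if key in self._emails:
            raise ValueError("email already registered")
        self._emails.add(key)
        return key


class RegistrationWorkflowTests(unittest.TestCase):
    def setUp(self) -> None:
        # A fresh service per test keeps the scenarios independent.
        self.service = RegistrationService()

    def test_successful_registration(self) -> None:
        self.assertEqual(
            self.service.register("Ana@example.com", "s3cretpass"),
            "ana@example.com",
        )

    def test_validation_error(self) -> None:
        with self.assertRaises(ValueError):
            self.service.register("not-an-email", "s3cretpass")

    def test_duplicate_email(self) -> None:
        self.service.register("ana@example.com", "s3cretpass")
        with self.assertRaises(ValueError):
            self.service.register("ANA@example.com", "s3cretpass")
```

Listing the scenarios in the prompt, as Task 2 does, is what lets the generated tests be checked against the plan one by one.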

Task 3: Create infrastructure-as-code for environment provisioning

  1. Research - Review existing infrastructure definitions and provisioning patterns in your repository
  2. Plan - Comment the infrastructure requirements (networking, security groups, scaling, health checks)
  3. Choose model - Use a mid-tier model; infrastructure design requires architectural decision-making
  4. Craft prompt - “Create Azure resources for this infrastructure using Terraform (attach appropriate context). Include networking, network security groups, auto-scaling, and health checks”
  5. Implement - Accept Copilot’s suggestion
  6. Validate - Run syntax validation with terraform validate and linting with tflint, deploy to staging, and verify that all resources are provisioned correctly
  7. Optimise - If deployment fails, debug locally before re-prompting

These examples reduce token consumption compared to exploratory approaches by following a structured research → plan → implement → validate workflow.

Key takeaways

  • Match models to task complexity; use auto mode
  • Invest time in clear, precise prompts to reduce iteration cycles
  • Structure development as research → plan → implement → validate
  • Use deterministic tools (tests, linters, security scanning) to reduce AI workload
  • Keep copilot-instructions.md concise and up-to-date
  • Monitor usage with /chronicle tips
  • Consider tools like rtk to reduce token consumption on CLI commands

By following these practices, you’ll develop faster, spend fewer tokens, and maintain higher code quality.

Further reading