GitHub Copilot updates data usage policy for model training

Policy change details
GitHub announced that starting April 24, 2026, interaction data from Copilot Free, Pro, and Pro+ users will be used to train and improve GitHub's AI models unless users opt out. Copilot Business and Copilot Enterprise users are not affected by this update.
If you previously opted out of data collection for product improvements, your preference has been retained. Otherwise, you can opt out in your Copilot settings under "Privacy."
What data is collected
The interaction data that may be collected and used includes:
- Outputs accepted or modified by you
- Inputs sent to GitHub Copilot, including code snippets shown to the model
- Code context surrounding your cursor position
- Comments and documentation you write
- File names, repository structure, and navigation patterns
- Interactions with Copilot features (chat, inline suggestions, etc.)
- Your feedback on suggestions (thumbs up/down ratings)
What data is NOT used
This program does not use:
- Interaction data from Copilot Business, Copilot Enterprise, or enterprise-owned repositories
- Interaction data from users who opt out of model training in their Copilot settings
- Content from your issues, discussions, or private repositories at rest
GitHub notes that it uses the phrase "at rest" deliberately: Copilot does process code from private repositories while you are actively using Copilot. That interaction data is required to run the service and may be used for model training unless you opt out.
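As an informal reading aid, the rules above can be sketched as a small Python check. This is only an illustration of the policy as summarized here, not GitHub's implementation: the `Interaction` type, its field names, and the `may_be_used_for_training` function are all hypothetical, and treating interactions before the start date as excluded is an assumption the summary does not confirm.

```python
from dataclasses import dataclass
from datetime import date

# Start date and affected plans as stated in the summary above.
POLICY_START = date(2026, 4, 24)
AFFECTED_PLANS = {"free", "pro", "pro+"}


@dataclass
class Interaction:
    """Hypothetical record of one Copilot interaction (illustrative only)."""
    plan: str                     # e.g. "free", "pro", "pro+", "business", "enterprise"
    opted_out: bool               # user opted out of model training in settings
    enterprise_owned_repo: bool   # repository is owned by an enterprise
    occurred_on: date


def may_be_used_for_training(i: Interaction) -> bool:
    """Return True if, under this simplified reading, the data may be used."""
    # Assumption: only interactions on or after the start date are in scope.
    if i.occurred_on < POLICY_START:
        return False
    # Business and Enterprise plans are not affected.
    if i.plan.lower() not in AFFECTED_PLANS:
        return False
    # Enterprise-owned repositories are excluded.
    if i.enterprise_owned_repo:
        return False
    # Otherwise the data may be used unless the user opted out.
    return not i.opted_out
```

Under this reading, a Pro user who has not opted out and is working in a personal private repository would still fall in scope, because the "at rest" exclusion does not cover code processed during active use.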
Data sharing and background
The data used in this program may be shared with GitHub affiliates, including Microsoft. This data will not be shared with third-party AI model providers or other independent service providers.
GitHub states that it has already been incorporating interaction data from Microsoft employees and has seen meaningful improvements, including increased acceptance rates in multiple languages. It will also begin using interaction data from GitHub employees.
GitHub's initial models were built using a mix of publicly available data and hand-crafted code samples.
📖 Read the full source: HN LLM Tools