ICML 2026 Desk-Rejects 2% of Papers for LLM Review Policy Violations

ICML 2026 implemented a two-policy framework for LLM use in peer review and took disciplinary action against reviewers who violated the policy they had agreed to. As a result, the conference desk-rejected 497 papers, approximately 2% of all submissions.

Policy Framework and Violations

ICML 2026 established two distinct policies for LLM use in reviewing:

Policy A (Conservative): No LLM use allowed
Policy B (Permissive): LLMs allowed to help understand papers and related works, and to polish reviews

Reviewers indicated which policy they preferred to operate under, and no reviewer who strongly preferred Policy B was assigned to Policy A. The only reviewers assigned to Policy A were those who explicitly selected "Policy A" or "I am okay with either [Policy] A or B."

Detection and Consequences

A total of 795 reviews (~1% of all reviews), written by 506 unique reviewers assigned to Policy A, were flagged as having been produced with an LLM. These reviewers had explicitly agreed not to use LLMs. Every flagged instance was manually verified by a human to avoid false positives.

When the designated Reciprocal Reviewer for a submission produced such a review, that submission was desk-rejected, resulting in 497 rejections in total. All Policy A reviews detected to be LLM-generated were removed from the system.

If more than half of the reviews submitted by a Policy A reviewer were detected to be LLM-generated, all of that reviewer's reviews were deleted and the reviewer was removed from the reviewer pool. In total, 51 Policy A reviewers (about 10% of the 506 detected reviewers) fell into this category.
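
The three consequences described above can be sketched as a small rule set. This is an illustration only, not the conference's actual tooling: the function name, the tuple layout, and the `reciprocal_for` mapping are all hypothetical, and "more than half" is read as strictly more than half.

```python
from collections import defaultdict


def enforce_policy_a(reviews, reciprocal_for):
    """Apply the article's stated rules to hypothetical review records.

    reviews: list of (reviewer_id, review_id, flagged) tuples.
    reciprocal_for: dict mapping reviewer_id to the submission for which
        that reviewer is the designated Reciprocal Reviewer (assumed shape).
    """
    by_reviewer = defaultdict(list)
    for reviewer_id, review_id, flagged in reviews:
        by_reviewer[reviewer_id].append((review_id, flagged))

    deleted_reviews = set()
    removed_reviewers = set()
    desk_rejected = set()

    for reviewer_id, items in by_reviewer.items():
        flagged_ids = [rid for rid, flagged in items if flagged]
        if not flagged_ids:
            continue
        # Every flagged Policy A review is removed from the system.
        deleted_reviews.update(flagged_ids)
        # A flagged Reciprocal Reviewer causes their submission to be rejected.
        if reviewer_id in reciprocal_for:
            desk_rejected.add(reciprocal_for[reviewer_id])
        # Strictly more than half flagged: delete all reviews, drop reviewer.
        if 2 * len(flagged_ids) > len(items):
            deleted_reviews.update(rid for rid, _ in items)
            removed_reviewers.add(reviewer_id)

    return deleted_reviews, removed_reviewers, desk_rejected


# Made-up records: r1 has 2 of 3 flagged, r2 has 1 of 2, r3 has none.
reviews = [
    ("r1", "rev1", True), ("r1", "rev2", True), ("r1", "rev3", False),
    ("r2", "rev4", True), ("r2", "rev5", False),
    ("r3", "rev6", False),
]
deleted, removed, rejected = enforce_policy_a(
    reviews, {"r1": "paper-17", "r3": "paper-42"}
)
```

In this made-up data, r1 crosses the more-than-half threshold and loses all three reviews, while r2 sits at exactly half and loses only the flagged one.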

Technical Detection Method

The detection method involved watermarking submission PDFs with hidden LLM instructions that would subtly influence any review produced via an LLM. The technique:

Created a dictionary of 170,000 phrases
For each paper, sampled two phrases randomly from this dictionary (so the probability of any given pair being selected is smaller than one in ten billion)
Watermarked PDFs with instructions visible only to an LLM, instructing it to include the two selected phrases in the review
These watermarks would not be directly visible to a human reading the PDF
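
The "one in ten billion" figure follows from the dictionary size, and a short sketch makes the arithmetic concrete. The code below assumes the two phrases are distinct and drawn uniformly; the article does not say how the sampling was done, and the placeholder phrases and function names are hypothetical.

```python
import math
import random

DICTIONARY_SIZE = 170_000  # dictionary size reported in the article


def pair_probability(n):
    """Chance that one specific unordered pair is drawn from n phrases,
    assuming two distinct phrases are sampled uniformly (an assumption)."""
    return 1 / math.comb(n, 2)


def sample_watermark(phrases, rng):
    """Pick two distinct phrases to embed in one paper's hidden instruction."""
    first, second = rng.sample(phrases, 2)
    return first, second


# Placeholder dictionary; the real phrases are not given in the article.
phrases = [f"phrase-{i}" for i in range(DICTIONARY_SIZE)]
rng = random.Random(0)  # seeded only so the sketch is reproducible
pair = sample_watermark(phrases, rng)

num_pairs = math.comb(DICTIONARY_SIZE, 2)       # 14,449,915,000 unordered pairs
probability = pair_probability(DICTIONARY_SIZE)  # below 1e-10
```

With 170,000 phrases there are roughly 14.4 billion unordered pairs, so a review containing a paper's exact pair by coincidence is extremely unlikely. If order mattered or repeats were allowed, the count would be larger still.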

The method was based on recent work by Rao, Kumar, Lakkaraju, and Shah. The conference notes this technique may only catch the most egregious and careless uses of LLMs in reviewing, particularly where reviewers input the PDF to an LLM and directly copy-paste the output.
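
The check on the reviewing side can be sketched as a plain substring match, which also shows why the technique mainly catches copy-pasted output. This is a minimal illustration, not the method from the cited work: the function name, the normalization, and the example phrases are assumptions.

```python
def contains_watermark(review_text, phrase_pair):
    """Return True only if BOTH watermark phrases appear in the review.

    Matching is case-insensitive and ignores differences in whitespace;
    both choices are assumptions made for this sketch.
    """
    normalized = " ".join(review_text.lower().split())
    return all(
        " ".join(phrase.lower().split()) in normalized
        for phrase in phrase_pair
    )


# Hypothetical phrase pair assigned to one paper.
pair = ("a notably lucid exposition", "warrants closer empirical scrutiny")

copied = (
    "The paper offers a notably lucid exposition of the method, "
    "although the central claim warrants closer empirical scrutiny."
)
paraphrased = (
    "The method is explained very clearly, but the main claim "
    "needs more experimental support."
)

copied_flagged = contains_watermark(copied, pair)
paraphrased_flagged = contains_watermark(paraphrased, pair)
```

The copied text is flagged and the paraphrased text is not, consistent with the conference's caveat about careless use. A match of this kind would only be a candidate; the article states that every flagged instance was manually verified by a human.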

Impact and Context

The conference emphasized that it is not judging the quality of the flagged reviews or the reviewers' intentions, but simply enforcing the policies reviewers agreed to. Enforcement has been disruptive: it required removing violating reviews, potentially finding new reviewers, and desk-rejecting some submissions that had already received a full set of reviews.

This approach reflects the broader challenge conferences face in adapting to AI integration in research workflows while maintaining review integrity.

📖 Read the full source: HN LLM Tools