
Prompt-injection scanner for AI code review tools

AI code review tools — CodeRabbit, GitHub Copilot pull request summaries, Cursor, and Sourcegraph Cody — process the full text of pull requests including any images embedded or linked in the PR description, review comments, or linked issues. Images reach these AI reviewers from all PR authors: employees, external contributors, and — for open-source repositories — any internet user with a GitHub account. Unlike enterprise chat tools where identity governance limits who can send messages, pull requests are public submission surfaces. An adversarial image embedded in a PR description or review comment can inject instructions into the AI reviewer's output: produce an unconditional LGTM, suppress security findings, or generate a misleading summary that a human reviewer uses to approve a malicious change. A GitHub Actions step that scans PR images with Glyphward before the AI review runs breaks this attack.

TL;DR

Add a GitHub Actions workflow that triggers on pull_request and pull_request_review_comment events, downloads all image attachments from the PR body and comments via the GitHub REST API, scans each with POST https://glyphward.com/v1/scan, and exits with code 1 if any image scores ≥ 65. Configure this as a required status check so the AI review tool cannot run until the scan passes. Free tier — 10 scans/day, no card required.

Attack surface: where AI code review tools receive unscanned images

PR description images. GitHub Markdown in PR descriptions supports inline images via ![alt](URL) syntax and direct image paste, which uploads the file to GitHub-hosted storage (user-images.githubusercontent.com for older uploads, github.com/user-attachments/assets for newer ones). AI code review tools that read the PR description to generate a summary or initial review comment process these images as part of the PR context. On open-source repositories, any GitHub user can open a PR, so the PR description image is authored by a fully untrusted party. The trust model is weaker than in most enterprise communication tools: there is no invitation step or identity verification beyond creating a GitHub account.

Review comment images. PR review comments (via the GitHub pull request review API) also support Markdown images. A user who cannot push code directly (non-contributor, external collaborator with Read access) can still post review comments on any open PR. AI code review tools that read review thread history to provide context-aware suggestions process images from all comment authors. This includes maintainers, contributors, and in some configurations, first-time contributors whose comments are held for moderation but still surface in the review thread.

Linked issue images. Many AI code review tools (CodeRabbit with issue tracker integration, GitHub Copilot with repository indexing) fetch the content of issues linked in the PR description (Closes #123, Fixes #456). Issues can contain images from any user with the ability to comment on issues — which on public repositories is any GitHub user. An attacker who cannot open a PR can instead post a malicious image in an issue comment and then reference that issue in their PR, or wait for a legitimate PR to link an issue they have already poisoned.
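The scan script in the integration section below collects images from the PR body and its comment threads only, so linked issues need an extra lookup. A minimal sketch of how linked issue numbers could be pulled out of a PR description follows. The helper name linked_issue_numbers is hypothetical, and the pattern assumes same-repository references written with GitHub's closing keywords (close, fix, resolve and their inflections); cross-repository references and full issue URLs are not handled.

```python
import re

# GitHub's closing keywords followed by a same-repository issue reference (#123).
# Assumption: cross-repository references (owner/repo#123) and issue URLs are out of scope.
LINKED_ISSUE_RE = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\b:?\s+#(\d+)",
    re.IGNORECASE,
)


def linked_issue_numbers(pr_body: str) -> list[int]:
    """Return unique issue numbers referenced with a closing keyword, in order of appearance."""
    seen: dict[int, None] = {}  # dict keeps insertion order and drops duplicates
    for match in LINKED_ISSUE_RE.finditer(pr_body or ""):
        seen.setdefault(int(match.group(1)), None)
    return list(seen)
```

Each number returned can then be used to fetch the issue body and its comments from the repository's issues endpoints, and any image URLs found there can go through the same scan as the PR's own images.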

CI/CD artefact images and test output screenshots. Some AI code review configurations include CI/CD pipeline outputs as context: screenshot diffs from visual regression test tools (Percy, Chromatic, Playwright report), architecture diagrams auto-generated from code, or coverage report visualisations attached as PR comments by CI bots. These images are generated by pipeline steps that run contributor-supplied code — a PR that modifies the screenshot test fixtures or diagram-generation code can produce adversarial images that are then automatically attached to the PR as CI output and processed by the AI reviewer.

Integration: GitHub Actions workflow

The workflow below triggers on pull request and review comment events, collects all image URLs from the PR body and comment threads via the GitHub REST API, and scans each image with Glyphward before the AI review tool runs. Configure it as a required status check to gate AI review on a clean scan result.

# .github/workflows/scan-pr-images.yml
name: Scan PR images for prompt injection

on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
  pull_request_review_comment:
    types: [created, edited]

permissions:
  contents: read        # needed by actions/checkout once permissions are set explicitly
  pull-requests: read
  statuses: write

jobs:
  scan-images:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install requests
      - name: Scan PR images
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GLYPHWARD_API_KEY: ${{ secrets.GLYPHWARD_API_KEY }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO: ${{ github.repository }}
        run: python .github/scripts/scan_pr_images.py

# .github/scripts/scan_pr_images.py
import os, re, sys, base64, requests

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
GLYPHWARD_KEY = os.environ["GLYPHWARD_API_KEY"]
PR_NUMBER = int(os.environ["PR_NUMBER"])
REPO = os.environ["REPO"]
THRESHOLD = 65  # stricter than default: open-source PRs are fully untrusted

GH_HEADERS = {"Authorization": f"Bearer {GITHUB_TOKEN}", "Accept": "application/vnd.github+json"}
GW_HEADERS = {"Authorization": f"Bearer {GLYPHWARD_KEY}", "Content-Type": "application/json"}
# Matches Markdown images (![alt](URL)) and HTML <img src="URL"> tags. There is no
# file-extension filter because GitHub-hosted uploads can have extensionless URLs.
IMAGE_URL_RE = re.compile(
    r"(?:!\[[^\]]*\]\(\s*<?|<img\b[^>]*?\bsrc=[\"'])(https?://[^\s)>\"']+)",
    re.IGNORECASE,
)

def gh_get_all(path: str) -> list[dict]:
    """Fetch every page of a GitHub list endpoint (the API returns 30 items per page by default)."""
    items, page = [], 1
    while True:
        resp = requests.get(
            f"https://api.github.com/repos/{REPO}/{path}",
            headers=GH_HEADERS,
            params={"per_page": 100, "page": page},
            timeout=10,
        )
        resp.raise_for_status()
        batch = resp.json()
        items += batch
        if len(batch) < 100:
            return items
        page += 1

def collect_image_urls() -> list[str]:
    urls = []
    # PR body
    pr = requests.get(f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}", headers=GH_HEADERS, timeout=10)
    pr.raise_for_status()
    urls += IMAGE_URL_RE.findall(pr.json().get("body") or "")
    # PR review comments (inline comments on the diff)
    for c in gh_get_all(f"pulls/{PR_NUMBER}/comments"):
        urls += IMAGE_URL_RE.findall(c.get("body") or "")
    # Issue comments on the PR thread
    for c in gh_get_all(f"issues/{PR_NUMBER}/comments"):
        urls += IMAGE_URL_RE.findall(c.get("body") or "")
    return sorted(set(urls))

def scan_url(url: str) -> dict:
    # Download the image; a failed download raises, and the caller fails closed.
    img_resp = requests.get(url, timeout=10)
    img_resp.raise_for_status()
    encoded = base64.b64encode(img_resp.content).decode()
    resp = requests.post(
        "https://glyphward.com/v1/scan",
        json={"image": encoded, "source": "github_pr_review"},
        headers=GW_HEADERS,
        timeout=8,
    )
    resp.raise_for_status()
    return resp.json()

image_urls = collect_image_urls()
if not image_urls:
    print("No images found in PR — scan skipped.")
    sys.exit(0)

flagged = []
for url in image_urls:
    try:
        result = scan_url(url)
        print(f"  {url[:80]} → score {result['score']} (ref {result['scan_id']})")
        if result["score"] >= THRESHOLD:
            flagged.append({"url": url, "score": result["score"], "scan_id": result["scan_id"]})
    except Exception as e:
        # Fail-closed: scanner error → block
        print(f"  Scanner error for {url}: {e} — treating as flagged.")
        flagged.append({"url": url, "score": None, "error": str(e)})

if flagged:
    print(f"\n❌ {len(flagged)} image(s) flagged — AI review blocked.")
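    for item in flagged:
        # Scanner errors are stored with score=None; print "error" instead of "None".
        score = "error" if item["score"] is None else item["score"]
        print(f"  Score {score} | Ref {item.get('scan_id', 'N/A')} | {item['url'][:80]}")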
    sys.exit(1)

print(f"\n✅ {len(image_urls)} image(s) scanned — all clean. AI review may proceed.")
sys.exit(0)

Configure this workflow as a required status check in your branch protection rules (Settings → Branches → Add rule → Require status checks to pass before merging, then select the scan-images job). A required check blocks merging until the scan passes. Whether it also holds back the AI code review tool depends on how that tool is triggered: a reviewer that runs as a workflow job can be made to depend on this job, whereas an app that reacts to PR webhooks directly may need its own setting to wait for the check. The threshold is set to 65 rather than the default 70 because open-source PR authors are fully untrusted: a lower threshold catches lower-confidence adversarial signals that would pass at 70. Reduce to 60 for repositories with anonymous issue comment access.


Coverage matrix

Defence layer	PR description image	Review comment image	CI artefact image
GitHub secret scanning	No — not a secrets pattern	No	No
Semgrep SAST	No — static analysis of code, not images	No	No
Text-only LLM guard (Lakera, LLM Guard)	No — image bytes ignored	No	No
Glyphward GitHub Actions scan	Yes — blocks AI review before it runs	Yes	Yes — if included in PR thread

Related questions

Does CodeRabbit process image attachments in PR descriptions?

CodeRabbit reads the full PR description and any review comments to generate its automated code review. GitHub-hosted images (user-images.githubusercontent.com) in the PR body are downloaded and processed as part of CodeRabbit's multimodal review context. An adversarial image in a PR description can inject instructions into CodeRabbit's review summary, security findings, or suggestion comments. Inserting the Glyphward GitHub Actions step as a required status check before CodeRabbit's review trigger prevents CodeRabbit from running on PRs that contain flagged images.

How does this differ from the CI/CD pipeline AI security page?

The CI/CD pipeline AI security page covers AI-assisted pipeline tools — AI-generated GitHub Actions YAML, AI code assistants wired into CI triggers, and AI that reads pipeline configuration files. This page covers AI code review tools — tools that read the PR itself (description, comments, diff) to produce review summaries and suggestions. The attack surface is different: CI/CD injection targets the pipeline configuration and build artefacts; AI code review injection targets the review output and any automated approval or merge-readiness signals generated by the AI reviewer.

What about Cursor's PR review or codebase understanding features?

Cursor's background indexing reads your repository's code and, in some configurations, linked documentation and README images. Cursor's PR review features (where supported) read the PR diff and linked comments. For Cursor, the highest-exposure surface is the codebase index: if your repository includes image files (diagrams in docs/, screenshots in tests/), Cursor may index and process these. Add a pre-commit hook or CI step that scans any new image files added to the repository using scan_url() or a local file variant of the scan script above — this catches adversarial images before they enter Cursor's index.
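One possible shape for the local file variant mentioned above, written with only the standard library so it can run in a pre-commit hook without extra dependencies. It sends the same JSON shape as the PR script ({"image": ..., "source": ...}) to the endpoint given on this page. The function names and the "repo_image_file" source label are hypothetical choices for this sketch, not documented values.

```python
import base64
import json
import urllib.request
from pathlib import Path

SCAN_ENDPOINT = "https://glyphward.com/v1/scan"  # endpoint as given on this page


def build_scan_payload(path: Path, source: str = "repo_image_file") -> dict:
    """Base64-encode a local image into the JSON shape the PR script sends.

    The default "source" value is a hypothetical label chosen for this example.
    """
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"image": encoded, "source": source}


def scan_file(path: Path, api_key: str, timeout: float = 8.0) -> dict:
    """POST one local image to the scan endpoint and return the parsed JSON response.

    urllib raises urllib.error.HTTPError on 4xx/5xx responses, so callers can
    treat any exception as a scanner failure and apply their own policy.
    """
    request = urllib.request.Request(
        SCAN_ENDPOINT,
        data=json.dumps(build_scan_payload(path)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A hook could call scan_file() for each newly added image path and reject the commit when the returned score meets the threshold used elsewhere on this page.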

How do I apply this to private vs. open-source repositories?

For private repositories with a known set of contributors, you can relax the threshold to 70 (the default) and may choose a fail-open policy for scanner downtime (allow the AI review to proceed if the scanner is unreachable, but log the event). For open-source or public repositories, use threshold 65 and fail-closed (block the AI review on scanner error). The key distinction is contributor identity: private repos have IAM-controlled contributor lists, while public repos accept PRs from any internet user, so the stricter settings belong on public repos.
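The two policies can be expressed as data so that one scan script serves both kinds of repository. This is a sketch under the thresholds stated above; the names ScanPolicy, PRIVATE_REPO, PUBLIC_REPO and should_block are illustrative, and a score of None is assumed to mean that the scanner could not be reached or returned an error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanPolicy:
    threshold: float   # scores at or above this value block the AI review
    fail_closed: bool  # True: a scanner error blocks; False: it is allowed (and should be logged)


# Values taken from the guidance above; the constant names are illustrative.
PRIVATE_REPO = ScanPolicy(threshold=70, fail_closed=False)
PUBLIC_REPO = ScanPolicy(threshold=65, fail_closed=True)


def should_block(score: Optional[float], policy: ScanPolicy) -> bool:
    """Decide whether a single image blocks the review under the given policy."""
    if score is None:  # scanner unreachable or errored
        return policy.fail_closed
    return score >= policy.threshold
```

Under these definitions an image scoring 67 blocks a public-repo PR but not a private-repo one, and a scanner outage blocks only the public-repo PR.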
