Blog · Aviation Security · 2026-06-11

eVTOL AI security: the multimodal prompt injection attack surface in urban air mobility

Joby Aviation, Archer Aviation, and Wisk Aero are entering commercial passenger service in 2026. Every critical phase of their flight operations — obstacle detection, vertiport approach, passenger boarding, and airspace management — depends on AI systems that process image data. Text-only prompt injection scanners are blind to every one of those attack surfaces. As FAA Type Certification and EASA CS-SC-VTOL-01 compliance obligations extend to AI robustness, the security gap is no longer a theoretical concern; it is a certification prerequisite that the industry has not yet closed.

TL;DR

eVTOL aircraft process image data at four safety-critical AI surfaces: obstacle detection and avoidance, vertiport approach guidance, biometric passenger boarding, and UTM airspace management. Physical adversarial patches on rooftop obstacles, adversarial patterns in approach corridors, synthesised boarding images, and injected map tile overlays are all pixel-domain attacks that text-only scanners cannot see. FAA Special Conditions require DO-326A Security Risk Assessments for AI perception systems; EASA AMC20-152A requires adversarial robustness testing under CS-SC-VTOL-01. The detailed compliance map for eVTOL AI is on the urban air mobility eVTOL AI prompt injection page; this post is the structural argument for why multimodal scanning is now a certification-path requirement, not just a security best practice.

1. The eVTOL AI stack: four safety-critical vision surfaces

Urban air mobility (UAM) platforms differ from conventional fixed-wing aircraft in their operational environment — low altitude, dense urban airspace, autonomous or highly automated flight, vertiport operations with minimal separation from structures — and in the degree to which AI-based perception replaces traditional instrument-based navigation. Understanding the attack surface requires understanding each AI function and the image data it consumes.

Obstacle detection and avoidance

eVTOL aircraft operate at altitudes of 150–600 metres in urban airspace that conventional aviation avoids: below the instrument flight rules floor, amid buildings, cranes, communication towers, and other aircraft. The obstacle detection system typically fuses input from multiple forward-facing, side-facing, and downward-facing cameras with LiDAR point clouds. The vision AI component — usually a real-time object detection model in the YOLO family or a learned depth-estimation model — classifies detected contacts into threat categories and drives avoidance manoeuvres when a contact falls within the safety envelope.

The camera input to this system is pixel data: the raw image frame from each camera. A physical adversarial patch on a rooftop obstacle — a building antenna, crane arm, or parapet edge — can cause the detection model to classify the object as clear airspace or as a lower-threat category that does not trigger avoidance. The technique is identical in principle to the adversarial patch attacks documented against autonomous vehicle perception systems; the difference is that the consequence of a successful attack is loss of separation at altitude rather than a lane-change error at ground level.

Unlike software adversarial examples, which require pixel-level control of a digital input, physical adversarial patches work under real-world lighting variation, viewpoint change, and motion blur. The Expectation over Transformation (EOT) family of attacks explicitly optimises patches for robustness under these conditions, making them practical against deployed aerial perception systems whose model architectures are known. Architecture details can sometimes be inferred from vendor documentation and OEM certification submissions.
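For reference, the optimisation behind this family of attacks can be written compactly. The form below follows the commonly cited adversarial-patch formulation. The symbols are notation chosen here for illustration and are not taken from any eVTOL vendor's threat model: p is the patch, X the distribution of scenes, T the distribution of transformations (scale, rotation, lighting), L the distribution of patch placements, A the operator that applies the patch to a scene, and ŷ the output the attacker wants the model to produce.

```latex
\hat{p} \;=\; \arg\max_{p}\;
  \mathbb{E}_{x \sim X,\; t \sim T,\; l \sim L}
  \Big[ \log \Pr\big(\hat{y} \,\big|\, A(p, x, l, t)\big) \Big]
```

Because the expectation averages over viewing conditions rather than fixing a single image, a patch that scores well under this objective tends to keep working as the camera's view of it changes, which is what makes the physical variant practical.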

Vertiport approach and landing guidance

Vertiport operations are more constrained than conventional runway approaches: the landing pad may be a rooftop platform with less than 50 metres of clearance on all sides, approached in a steep, curved descent path that differs from approach geometry at traditional airports. The approach guidance AI — which in automated configurations provides lateral and vertical guidance corrections to the flight control system — relies on visual interpretation of the approach corridor: the landing pad markers, surrounding structure geometry, and approach lighting.

Adversarial patterns applied to the vertiport surface or surrounding structures can distort the approach guidance AI's interpretation of the corridor geometry. A pattern that causes the AI to perceive the landing pad as 5 metres to the left of its actual position creates a systematic guidance error that compounds through the final approach phase. This is not a theoretical threat: the airport and aviation security AI context has documented analogous vision attacks against runway incursion detection systems at ground level; vertiport approach guidance is the same computer vision problem at a tighter margin.

Passenger biometric boarding at vertipad gates

Urban air mobility operations are designed for high throughput with minimal ground time — a vertiport target of under 5 minutes from passenger arrival to wheels-up. Biometric boarding via facial recognition replaces physical document checking in most UAM operator concepts of operation: the passenger is identified and boarding-authorised by matching a camera capture to a pre-enrolled identity record. This is architecturally identical to the biometric identity verification systems deployed at conventional airport gates, with the same vulnerability profile: a synthesised face image, a high-quality printout, or a deepfake video stream can inject a false identity match and authorise an unauthorised passenger or deny boarding to a legitimate one.

The safety-security intersection is specific to the UAM context: eVTOL aircraft carry 4–6 passengers in an automated-flight configuration with no flight attendant and no in-flight intervention capability. An attacker who boards via biometric boarding bypass is in an aircraft with no crew, in low-altitude urban airspace, with access to any physical controls the cabin provides. The boarding gate is the last physical security checkpoint before the flight profile executes autonomously.

UTM / U-Space airspace management displays

Urban air mobility operations are managed through UTM (Unmanned Aircraft System Traffic Management) or U-Space (in the EU framework) systems: digital airspace management platforms that aggregate aircraft positions, weather data, obstacle databases, and restricted airspace boundaries into a shared operational picture. Ground-based UTM operators and in-cockpit AI-assisted displays both ingest this data in formats that include map tile imagery, camera feed overlays from vertiport surveillance systems, and rendered airspace visualisations.

Injecting adversarial content into a UTM map tile feed — by compromising a tile server, injecting into a data link, or crafting malicious tiles that are fetched by the display system — can cause the UTM AI to misrepresent traffic separation, obstacle positions, or restricted airspace boundaries. This is an extension of the satellite remote sensing AI injection attack class: the adversarial input is a spatial image asset (map tile, overhead camera frame) rather than a conversational text query, and text-only prompt injection defences provide zero coverage.

2. Why text-only scanners cannot close these gaps

The structural blindness of text-only prompt injection scanners to pixel-domain payloads is well-documented. For eVTOL AI, the situation is more acute than in most other deployments because all four primary attack surfaces process image data exclusively — there is no text layer to scan.

The channel mismatch problem

Obstacle detection AI receives camera frames. Approach guidance AI receives imagery of the approach corridor. Biometric boarding AI receives a face photograph. UTM display AI receives map tiles and camera feeds. In each case, the attack arrives in the pixel channel, and the defensive scanner must operate on the pixel channel. A text-based classifier placed upstream of any of these systems is positioned at a channel that carries nothing the attacker is exploiting. It will always return a clean verdict — not because there is no attack, but because it is scanning the wrong modality.
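The mismatch can be shown in a few lines. The sketch below is a toy, assuming a hypothetical `PipelineInput` structure and a deliberately simple regex-based text check; neither represents a real scanner product. The point is only that a check which never reads the pixel channel cannot return anything but "clean" for a camera-only input.

```python
import re
from dataclasses import dataclass


@dataclass
class PipelineInput:
    """Hypothetical container for what reaches an AI function."""
    text: str           # text channel (empty for camera-only pipelines)
    image_bytes: bytes  # pixel channel


# Toy stand-in for a text classifier; a real one would be far more capable.
INJECTION_PATTERN = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)


def text_only_scan(item: PipelineInput) -> bool:
    """Return True if the text channel looks malicious. Never reads image_bytes."""
    return bool(INJECTION_PATTERN.search(item.text))


# A camera frame carrying an adversarial patch: the text channel is empty.
frame = PipelineInput(text="", image_bytes=b"...adversarial patch pixels...")
print(text_only_scan(frame))  # False
```

Improving `INJECTION_PATTERN` changes nothing about the result for `frame`, because the function's input never includes the bytes the attacker controls.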

This is not a limitation that can be addressed by improving the text classifier's accuracy. The classifier may be extremely accurate at classifying text. The attack is not in the text. The FigStep, AgentTypo, and WhisperInject attack classes all share this structural property: they move the payload from the text layer to the image or audio layer precisely because doing so bypasses text classifiers that are otherwise robust.

The false compliance documentation problem

A UAM operator that deploys a text-only prompt injection scanner on its AI pipeline and cites that deployment in a DO-326A Security Risk Assessment or EASA AMC20-152A robustness evidence package has produced documentation that is accurate about what it covers — text inputs — and silent about what it does not cover — the image inputs that carry all four primary attack vectors. The documentation does not say the image attack surface is protected; but its presence in the compliance package creates an implicit completeness signal that security reviewers and certification auditors may accept at face value.

FAA Special Conditions for eVTOL cybersecurity require threat conditions to be enumerated, mitigated, and verified. Adversarial AI input manipulation is an enumerated threat condition in the FAA's documentation on AI/ML-based aircraft functions (the FAA's AI/ML Roadmap and the 2024 draft AC on ML-based aircraft functions both identify adversarial inputs as a category requiring specific control). A Security Risk Assessment that documents a text-only control against an attack that operates in the image layer has not satisfied the threat condition; it has documented a control that cannot address it.

The physical adversarial patch exception

Most prompt injection frameworks are designed for software-layer attacks: a user sends a malicious text or image input to a web API, and the scanner sits in that API pipeline. Physical adversarial patches present a different integration challenge: the attack surface is the physical environment the aircraft flies through, not a software input channel. The scanner must run on the raw camera frame at the sensor fusion layer — inside the avionics stack, not on an API gateway — to intercept the attack before the perception model call.

This is not an argument against scanning; it is an argument for where the scanner must be integrated. Multimodal scanning of camera frames before the YOLO or perception model call is architecturally feasible: the scan is a forward pass through a lightweight anomaly classifier on the raw frame bytes, running at the same cadence as the perception model. At Glyphward's published latency targets (p95 under 200 ms for a 4 MP frame), the overhead is compatible with the frame rates required for real-time obstacle detection, provided scans are pipelined so that several frames are in flight at once rather than each frame waiting for the previous scan to finish. The integration point is the camera DMA buffer read → scan gate → perception model call pipeline. A text scanner has no integration point in this pipeline because the pipeline contains no text.
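A scan whose latency exceeds the frame period can only keep pace with the camera if several frames are scanned concurrently. The sketch below estimates how many; the 30 fps figure in the example is an assumption for illustration, not a number from any eVTOL specification, and `frames_in_flight` is a hypothetical helper name.

```python
def frames_in_flight(scan_latency_ms: int, frame_rate_fps: int) -> int:
    """Frames that must be scanned concurrently for the scanner to keep
    pace with the camera: ceiling of (latency / frame period)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-scan_latency_ms * frame_rate_fps // 1000)


print(frames_in_flight(200, 30))  # 6
print(frames_in_flight(200, 5))   # 1
```

Pipelining preserves throughput but not latency: each perception result still arrives one scan-latency later than it would without the gate.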

3. The regulatory stakes: FAA, EASA, and the certification clock

Unlike most sectors where AI security is a recommended practice, eVTOL AI security is becoming a certification prerequisite. Three regulatory frameworks create binding obligations for the manufacturers and operators entering commercial service in 2026.

FAA Special Conditions and DO-326A/DO-178C

The FAA issues Special Conditions alongside each eVTOL Type Certificate application to address novel features not covered by existing airworthiness standards. Joby Aviation, Archer Aviation, and Wisk Aero have all received FAA Special Conditions documents addressing cybersecurity for their aircraft systems. These Special Conditions adopt DO-326A (Airworthiness Security Process Specification) as the acceptable means of compliance for airworthiness security — the aviation equivalent of a secure development lifecycle — and require a Security Risk Assessment (SRA) identifying threat conditions, their severity, and the controls applied.

For AI-based perception systems, DO-326A's SRA requirement intersects with the FAA's developing guidance on AI/ML in aviation: the 2024 draft AC on ML-based aircraft functions identifies adversarial input manipulation as a category of failure condition requiring specific design assurance. An SRA that does not enumerate adversarial image injection as a threat condition for obstacle detection or approach guidance AI — or that enumerates it but documents only a text-based control — will face a Finding during the certification audit. Findings block Type Certificate issuance until resolved. The certification clock is running: Joby expects to achieve Type Certificate by 2025-2026; any unresolved Findings in AI security will directly delay commercial service entry.

EASA CS-SC-VTOL-01 and AMC20-152A

EASA's certification specification for small VTOL aircraft (CS-SC-VTOL-01) governs eVTOL certification in the EU, and EASA has separately published AMC20-152A — Acceptable Means of Compliance for AI/ML-based airborne systems — which establishes the specific evidence package required for AI functions used in safety-relevant roles. AMC20-152A requires robustness testing against out-of-distribution inputs, including adversarial examples, for all Level A and Level B AI functions (those whose failure contributes to catastrophic or hazardous conditions).

An obstacle detection function whose failure contributes to collision with an obstacle is a Level A or Level B function. An approach guidance AI whose failure contributes to a controlled-flight-into-terrain scenario is similarly classified. AMC20-152A's robustness requirements for these functions are not satisfiable by text-only scanning evidence — the adversarial input class being tested is image-layer manipulation, and the evidence must document that the control operates on image-layer inputs.

The full eVTOL regulatory compliance map details the specific AMC20-152A sections, the EASA AI Roadmap references, and how multimodal scan records satisfy the evidence requirements for each AI function level. For teams working on eVTOL certification submissions to EASA, the aerospace and defence AI security page covers the broader DO-178C/DO-254 context in which AMC20-152A evidence sits.

P.L. 117-203 (AAM Act) and FAA Reauthorization 2024

The Advanced Air Mobility Coordination and Leadership Act of 2022 (P.L. 117-203) directed the Department of Transportation to establish an interagency working group to plan and coordinate AAM operations, including the safety, airspace, and cybersecurity questions that UAM corridors raise. The 2024 FAA Reauthorization extended the AAM framework and specifically directed the FAA to address cybersecurity requirements for AAM aircraft and UTM systems. While these directives are still being translated into specific rulemaking, operators pursuing the FAA's Beyond Visual Line of Sight (BVLOS) waiver pathways, which UAM commercial operations require, must satisfy the cybersecurity elements of the waiver application, including documented controls for AI perception system integrity.

This creates a near-term commercial gate: BVLOS waivers are the enabling authorisation for UAM passenger service in US airspace. Operators without documented adversarial AI input controls will find their waiver applications incomplete under the cybersecurity element. The timeline is not theoretical — it is the timeline of commercial service entry for the operators currently in the FAA certification pipeline.

4. Defence architecture for eVTOL multimodal AI

The integration architecture for multimodal scanning in eVTOL systems follows the same principle as other autonomous vehicle fleet AI security deployments: scan raw input bytes before the AI inference call, at every boundary where external data enters the AI system. For eVTOL, there are four such boundaries.

Camera frame scanning in the sensor fusion pipeline

The obstacle detection and approach guidance pipelines both begin with camera frame reads. The scan gate sits between the camera DMA buffer read and the perception model call: raw frame bytes are passed to the multimodal scanner, which returns a risk score and, if above threshold, a flagged region map identifying the suspicious pixel area. A clean scan triggers the normal perception pipeline. A flagged scan triggers a fallback mode — for obstacle detection, fallback to LiDAR-only or adjacent camera data; for approach guidance, a transition to instrument-based approach mode — and generates an audit record containing scan_id, image_hash, risk_score, flagged_region, and timestamp.
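A minimal sketch of that gate follows. The audit field names come from the paragraph above; everything else (the `ScanResult` shape, the 0.5 threshold, the `action` field name, and the stub scanner) is a hypothetical stand-in, since the real scanner interface is not specified here.

```python
import hashlib
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Region = Tuple[int, int, int, int]  # x, y, width, height (assumed layout)


@dataclass
class ScanResult:
    risk_score: float
    flagged_region: Optional[Region]


@dataclass
class AuditRecord:
    scan_id: str
    image_hash: str
    risk_score: float
    flagged_region: Optional[Region]
    timestamp: float
    action: str  # which path the frame took after the scan


def scan_gate(frame: bytes,
              scanner: Callable[[bytes], ScanResult],
              perceive: Callable[[bytes], str],
              fallback: Callable[[bytes], str],
              threshold: float = 0.5) -> Tuple[str, AuditRecord]:
    """Scan raw frame bytes, then route to perception or to fallback."""
    result = scanner(frame)
    flagged = result.risk_score > threshold
    record = AuditRecord(
        scan_id=str(uuid.uuid4()),
        image_hash=hashlib.sha256(frame).hexdigest(),
        risk_score=result.risk_score,
        flagged_region=result.flagged_region if flagged else None,
        timestamp=time.time(),
        action="fallback" if flagged else "perception",
    )
    output = fallback(frame) if flagged else perceive(frame)
    return output, record


# Stub scanner for demonstration only: flags frames containing a marker.
def stub_scanner(frame: bytes) -> ScanResult:
    if b"PATCH" in frame:
        return ScanResult(0.9, (10, 20, 64, 64))
    return ScanResult(0.05, None)


out, rec = scan_gate(b"frame-with-PATCH", stub_scanner,
                     perceive=lambda f: "camera+lidar",
                     fallback=lambda f: "lidar-only")
print(out, rec.action)  # lidar-only fallback
```

An audit record is produced on both paths, so a clean scan leaves the same evidence trail as a flagged one.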

The latency constraint for obstacle detection is dictated by the aircraft's safety envelope: at 60 knots cruise speed (about 30.9 m/s), a 200 ms scan adds approximately 6 metres of travel before the perception result is available, which is within the safety margin for the lookahead distances at which obstacle detection is triggered (typically 200–500 metres minimum). For approach guidance, the frame rate is lower and the latency constraint is correspondingly looser.
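The distance figure is a unit conversion followed by a multiplication, and it is worth being able to rerun it for other speeds and latencies. The helper below is an illustrative sketch; the only constant is the definition of the knot (1852 metres per hour).

```python
KNOT_IN_MPS = 1852 / 3600  # one nautical mile per hour, in metres per second


def distance_during_scan(speed_knots: float, latency_ms: float) -> float:
    """Metres travelled by the aircraft while one scan completes."""
    return speed_knots * KNOT_IN_MPS * latency_ms / 1000


d = distance_during_scan(60, 200)
print(f"{d:.2f} m")                       # 6.17 m
print(f"{d / 200:.1%} of 200 m lookahead")  # 3.1% of 200 m lookahead
```

The cost scales linearly with speed, so the same 200 ms scan at 120 knots costs roughly 12 metres.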

Biometric boarding integration

At the vertipad gate, the scan sits between the camera capture and the facial recognition API call. The boarding-gate scanner serves a dual purpose: it runs the standard multimodal prompt injection check on the captured face image, and it also checks for presentation attack indicators (printed photo, screen replay, deepfake video artefacts) before passing the image to the identity matching system. Both functions operate on the raw image bytes at the pre-API boundary. The biometric identity verification AI security page covers the specific BIPA, EU AI Act Article 5, and GDPR Article 9 obligations that apply to biometric boarding systems, including the data minimisation requirements that affect how scan records are retained.
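The ordering described above can be sketched as a gate that runs both checks before the identity matcher is ever called. All function names and reason strings here are hypothetical; the checks are passed in as callables because the actual detectors are not specified in this post.

```python
from typing import Callable, Optional, Tuple

Check = Callable[[bytes], bool]  # returns True when the image looks suspicious


def boarding_gate(image: bytes,
                  injection_check: Check,
                  presentation_attack_check: Check,
                  match_identity: Callable[[bytes], Optional[str]],
                  ) -> Tuple[Optional[str], str]:
    """Return (passenger_id or None, reason). Both checks run on the raw
    image bytes before the identity matching system sees the image."""
    if injection_check(image):
        return None, "rejected: injection indicator"
    if presentation_attack_check(image):
        return None, "rejected: presentation attack indicator"
    passenger_id = match_identity(image)
    if passenger_id is None:
        return None, "rejected: no identity match"
    return passenger_id, "authorised"


# Demonstration with stub checks keyed on marker bytes.
result = boarding_gate(
    b"face-capture-REPLAY",
    injection_check=lambda img: b"INJECT" in img,
    presentation_attack_check=lambda img: b"REPLAY" in img,
    match_identity=lambda img: "passenger-123",
)
print(result)  # (None, 'rejected: presentation attack indicator')
```

Rejecting before the matcher runs means a suspicious capture never reaches the identity system, which also limits what has to be retained about it.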

UTM feed scanning

For UTM display systems, the scan applies to any externally-ingested image asset before rendering: map tiles fetched from external tile servers, camera feeds ingested from vertiport surveillance systems, and weather radar overlays from third-party providers. Tile-level scanning (one scan per tile fetch, cached by tile ID and ETag) keeps the overhead proportional to the number of distinct tile assets rendered rather than the frame rate of the display. Flagged tiles trigger a substitution with a placeholder or a cached clean version while the tile is quarantined for review.
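One way to realise the caching and substitution behaviour is sketched below. The class name, the 0.5 threshold, and the scanner callable (which returns a risk score for the tile bytes) are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

TileKey = Tuple[str, str]  # (tile_id, etag)


class TileScanCache:
    """Scan each distinct (tile_id, etag) once; substitute flagged tiles."""

    def __init__(self, scanner: Callable[[bytes], float], threshold: float = 0.5):
        self._scanner = scanner
        self._threshold = threshold
        self._flagged: Dict[TileKey, bool] = {}   # cached verdict per tile version
        self._last_clean: Dict[str, bytes] = {}   # last clean bytes per tile_id
        self.quarantine: List[TileKey] = []       # tile versions awaiting review
        self.scans_run = 0

    def admit(self, tile_id: str, etag: str, data: bytes,
              placeholder: bytes = b"") -> bytes:
        """Return the bytes that are safe to render for this tile."""
        key = (tile_id, etag)
        if key not in self._flagged:
            self.scans_run += 1
            self._flagged[key] = self._scanner(data) > self._threshold
            if self._flagged[key]:
                self.quarantine.append(key)
        if self._flagged[key]:
            # Prefer the cached clean version; otherwise use the placeholder.
            return self._last_clean.get(tile_id, placeholder)
        self._last_clean[tile_id] = data
        return data


cache = TileScanCache(scanner=lambda b: 0.9 if b"EVIL" in b else 0.1)
cache.admit("12/2048/1361", "v1", b"clean tile")
cache.admit("12/2048/1361", "v1", b"clean tile")   # cache hit, no second scan
shown = cache.admit("12/2048/1361", "v2", b"EVIL tile")
print(shown, cache.scans_run)  # b'clean tile' 2
```

A design caveat: the ETag is supplied by the tile server, so a compromised server could reuse an ETag for different bytes. Adding a hash of the tile bytes to the cache key would close that gap at the cost of hashing every fetch.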

This is the same architecture used for drone and UAV delivery inspection AI systems that ingest aerial imagery from third-party mapping providers — a validated integration pattern that the eVTOL UTM context can adopt directly. The shared concern is a third-party-controlled image data source that feeds a safety-relevant AI function; the control is the scan gate at the ingestion boundary.

Audit record architecture for DO-326A evidence

The DO-326A Security Risk Assessment requires documented evidence that controls are operating as described. For AI input validation controls, the audit record architecture described above — scan_id, image_hash, risk_score, flagged_region, timestamp, subsequent action taken — provides the evidence format that certification auditors can verify. The records are append-only, signed with the aircraft's maintenance key, and associated with the specific AI function identifier (obstacle detection function ID, approach guidance function ID) that the scanned frame fed into.
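The append-only and signed properties can be illustrated with a hash-chained log. This is a sketch under stated assumptions: HMAC-SHA256 with a shared key stands in for signing with the aircraft's maintenance key (a real deployment would more likely use an asymmetric signature held in hardware), and the class name, `function_id` field, and `action` field name are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, List


class AuditLog:
    """Append-only log where each entry's signature covers the previous
    entry's signature, so edits or deletions break verification."""

    def __init__(self, key: bytes):
        self._key = key
        self._entries: List[Dict] = []

    def _sign(self, prev: str, record: Dict) -> str:
        payload = json.dumps({"prev": prev, "record": record},
                             sort_keys=True).encode()
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def append(self, record: Dict) -> Dict:
        record = json.loads(json.dumps(record))  # detach from caller's dict
        prev = self._entries[-1]["signature"] if self._entries else ""
        entry = {"prev": prev, "record": record,
                 "signature": self._sign(prev, record)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = ""
        for entry in self._entries:
            expected = self._sign(entry["prev"], entry["record"])
            if entry["prev"] != prev or not hmac.compare_digest(
                    expected, entry["signature"]):
                return False
            prev = entry["signature"]
        return True


log = AuditLog(key=b"demo-key-not-for-production")
log.append({"scan_id": "a1", "image_hash": "ab12", "risk_score": 0.91,
            "flagged_region": [10, 20, 64, 64], "timestamp": 1781136000.0,
            "action": "fallback", "function_id": "obstacle-detection"})
log.append({"scan_id": "a2", "image_hash": "cd34", "risk_score": 0.03,
            "flagged_region": None, "timestamp": 1781136000.2,
            "action": "perception", "function_id": "obstacle-detection"})
print(log.verify())  # True
```

Changing any stored field, or removing an entry from the middle of the chain, causes `verify()` to return False, which is the property an auditor would check.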

This audit architecture is also the format required by EASA AMC20-152A Section 7.1 for AI operational monitoring evidence, and by the FAA's 2024 draft AC on ML-based aircraft functions for input validation record-keeping. A scan that runs and produces no audit record satisfies none of these requirements; a scan that runs and produces a complete audit record satisfies all three simultaneously.

5. What this means for the eVTOL industry in 2026

Urban air mobility is at a commercial inflection point: Joby Aviation completed its 1,000th test flight in early 2025, Archer is building production aircraft at its Covington, Georgia facility, and Wisk's autonomous air taxi is in BVLOS waiver application. The AI security posture of these aircraft is being established now, in the Type Certificate and waiver applications, not after commercial service entry.

The specific challenge for eVTOL AI security is that the attack surface is almost entirely in the image layer (obstacle detection, approach guidance, boarding, UTM display), while the prompt injection defence ecosystem has been built almost entirely for text. The text scanner gap is not a minor limitation in the eVTOL context: it is a complete coverage failure across all four primary attack surfaces. An OEM or operator that deploys only text-based prompt injection defence for its eVTOL AI will have a documented control that is irrelevant to the actual threat vectors and a DO-326A SRA that does not satisfy the adversarial input threat condition.

The regulatory path forward is clear: DO-326A SRA with adversarial image injection enumerated as a threat condition, AMC20-152A robustness evidence with image-layer adversarial testing documented, and BVLOS waiver cybersecurity element satisfied with documented image-layer input validation controls. Each of these requires a multimodal scanner — one that operates on pixel bytes at the AI inference boundary — not a text-based classifier that operates on a channel the attack never uses.

For operators and OEMs in the eVTOL space, the question is not whether multimodal scanning is required; it is when in the certification timeline it is easier to integrate. The answer is before the DO-326A SRA is finalised, not after the first Finding is issued. The architecture is straightforward; the audit record format is documented; and the latency is compatible with the frame rates of obstacle detection and approach guidance AI. The window to integrate cleanly — before the first commercial Type Certificate is granted and the SRA precedents are set — is the window the industry is in now.

Frequently asked questions

What AI components in eVTOL aircraft are vulnerable to prompt injection?

The four highest-risk surfaces are obstacle detection and avoidance (computer vision on 360° camera arrays), vertiport approach and landing guidance AI (visual approach corridor interpretation), passenger biometric boarding at vertipad gates (facial recognition identical in architecture to airport gate scanners), and UTM/U-Space airspace management displays (map tile overlays, camera feeds, traffic data). Each processes image bytes as its primary input — the attack surface that text-only scanners are structurally blind to. The eVTOL-specific risk is that all four are safety-critical: a successful injection that redirects obstacle avoidance, approach guidance, or airspace display is not a data breach but a potential loss of separation event.

How does adversarial obstacle injection work against eVTOL AI?

An adversarial patch (a precisely crafted visual pattern applied to a physical rooftop object, antenna, or building facade) can cause the computer vision model to classify the obstacle as clear sky or as a lower-priority contact, suppressing the avoidance manoeuvre. Unlike digital adversarial examples requiring pixel-level digital control, physical adversarial patches work under real-world lighting variation and viewpoint change; patches optimised with the Expectation over Transformation (EOT) technique have been demonstrated against YOLO-class detectors in published research. The attacker places a pattern on a physical surface that the aircraft's cameras will see during approach, and no access to the aircraft's systems is required.

What regulatory obligation does FAA eVTOL certification create for AI security?

FAA Special Conditions issued alongside Type Certificate applications reference DO-326A (Airworthiness Security Process Specification), requiring a Security Risk Assessment that enumerates threat conditions for all networked aircraft systems, including AI-based perception. The FAA's 2024 draft AC on ML-based aircraft functions identifies adversarial input manipulation as a category requiring specific design assurance. EASA adds AMC20-152A under CS-SC-VTOL-01, requiring robustness testing against adversarial inputs for Level A and Level B AI functions. An SRA that documents only a text-based control against an image-layer attack will face a Finding during the certification audit, and Findings block Type Certificate issuance until remediated.

Do text-only prompt injection scanners provide any protection for eVTOL AI?

No — not for the four primary attack surfaces. Obstacle detection processes raw camera frames; there is no text layer. Vertiport approach guidance AI processes imagery; there is no text layer. Biometric boarding runs facial recognition on captured images; there is no text layer. UTM display injection is pixel-domain. A text-only scanner positioned at any of these pipelines scans a channel that carries nothing the attacker is exploiting and will always return a clean verdict — creating documented coverage of an empty surface while leaving the actual image attack surface unmonitored.

How should an eVTOL OEM integrate multimodal scanning into the avionics AI pipeline?

The integration point is the camera DMA buffer read → scan gate → AI inference call pipeline. For obstacle detection: scan each frame before the YOLO/perception model call; fallback to LiDAR-only on a flag. For approach guidance: scan each approach corridor frame before the guidance AI call; fallback to instrument-based approach on a flag. For biometric boarding: scan the face capture before the facial recognition API call. For UTM display: scan each external tile and camera feed at ingestion before rendering. Each scan produces an audit record (scan_id, image_hash, risk_score, flagged_region, timestamp, subsequent action) that satisfies DO-326A SRA control evidence and AMC20-152A Section 7.1 operational monitoring requirements simultaneously.