How Autonomous Vehicles Are Being Tested and What Still Has Not Been Solved

From Waymo's robotaxis navigating San Francisco to Tesla's contested Full Self-Driving data, the autonomous vehicle industry is moving faster than ever. But edge cases, bad weather, legal gaps, and a scalability problem nobody wants to talk about are keeping fully driverless cars further away than the headlines suggest.

Posted by Kaptain Kush

I have spent over a decade working in and around automotive technology, sitting in test vehicles across Phoenix, San Francisco, and Pittsburgh, watching engineers argue in parking lots at midnight about why a LiDAR sensor misread a shopping cart as a pedestrian.

I have seen the genuine brilliance of what autonomous vehicle technology can do on a clear Tuesday afternoon in a mapped suburb. I have also watched a robotaxi freeze in the middle of an intersection because a road crew set up cones in a configuration that no training dataset had ever accounted for.

The contrast between those two experiences tells you almost everything you need to know about where this technology actually stands in 2026.

The autonomous vehicle industry has made remarkable strides since Google’s first self-driving car prototype bumbled around Mountain View in 2009. But the phrase “self-driving car” remains one of the most elastically misused terms in the English language.

There is a world of difference between a vehicle that can keep itself centered in a highway lane with a human’s hands hovering nearby and a robotaxi navigating downtown Atlanta without any human occupant watching over it.

The Society of Automotive Engineers codified this distinction in its J3016 automation levels framework, ranging from Level 0 (no automation) to Level 5 (full automation under all conditions). Most of what automakers marketed as self-driving technology through the early 2020s was Level 2.

What the industry is now scrambling toward is Level 4, meaning the vehicle handles everything within a defined operational design domain, or ODD, without any human fallback.

The gap between Level 2 and Level 4 is not incremental. It is a chasm.
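
For readers who want the taxonomy pinned down, here is the same framework expressed as a minimal Python sketch; the one-line summaries are my paraphrases of J3016, not SAE’s official wording.

```python
from enum import IntEnum

class SAELevel(IntEnum):
    """SAE J3016 driving automation levels (summaries paraphrased)."""
    L0 = 0  # No automation: the human does everything
    L1 = 1  # Driver assistance: steering OR speed control, not both
    L2 = 2  # Partial automation: steering AND speed, constant human supervision
    L3 = 3  # Conditional automation: system drives, human must take over on request
    L4 = 4  # High automation: no human fallback, but only inside a defined ODD
    L5 = 5  # Full automation: all roads, all conditions, no ODD restriction

def requires_human_fallback(level: SAELevel) -> bool:
    """Levels 0-3 assume a human is available to take over; 4-5 do not."""
    return level <= SAELevel.L3

print(requires_human_fallback(SAELevel.L2))  # True
print(requires_human_fallback(SAELevel.L4))  # False
```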

How Autonomous Vehicles Are Actually Tested

If you imagine AV testing as a car silently gliding down an empty highway while engineers sip coffee and monitor dashboards from a glass-walled control room, you are imagining something closer to a concept video than reality. Real AV testing is chaotic, iterative, and expensive in ways that defy easy summary.

The modern AV testing pipeline operates across three primary environments: closed-course track testing, simulation, and public road validation. Each has its own logic and its own failure modes.

Closed-course testing is where teams put vehicles through choreographed scenarios, deliberately placing mannequins, cyclist dummies, and cut-off vehicles in front of test cars to evaluate emergency braking and object detection.

Facilities like the American Center for Mobility in Michigan and GoMentum Station in California were built precisely for this. Closed-course work is valuable for validating specific subsystems and detecting obvious sensor failures, but it has a fundamental flaw: you already know what will happen. Real roads do not give you that courtesy.
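
Closed-course scenarios are typically parameterized so the same maneuver can be replayed with controlled variation. Here is a sketch of what such a scenario definition might look like; every field name and value is an illustrative assumption, not any facility’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class CutInScenario:
    """Illustrative closed-course test: a dummy vehicle cuts into the ego lane.

    All fields and defaults are hypothetical examples, not a real
    test facility's format.
    """
    ego_speed_mps: float = 20.0           # ego approach speed (~45 mph)
    cut_in_gap_m: float = 15.0            # longitudinal gap when the cut-in begins
    dummy_lateral_speed_mps: float = 1.5  # how aggressively the dummy merges
    expected_outcome: str = "AEB engages, no contact"

# A test matrix sweeps the parameter that matters most for braking margin
matrix = [CutInScenario(cut_in_gap_m=gap) for gap in (10.0, 15.0, 20.0, 30.0)]
for scenario in matrix:
    print(scenario)
```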

Simulation has become the backbone of modern AV safety validation, largely because the sheer volume of miles needed to statistically prove safety is staggering. RAND Corporation estimated years ago that autonomous vehicles would need to drive hundreds of millions of miles to demonstrate reliability beyond human baseline performance.

No fleet can accumulate that on physical roads in any reasonable timeframe. Simulation platforms like CARLA, NVIDIA DRIVE Sim, and internally developed environments at companies like Waymo allow engineers to run millions of virtual miles, stress-testing edge cases, adverse weather conditions, and sensor failure scenarios at a fraction of the cost.

Virtual testing has become a core means of evaluating automated vehicle safety because it can construct scenarios, and evaluate a vehicle’s risk decision-making against them, far faster than real roads can supply them. The challenge is that simulation is only as good as its fidelity to the physical world, and the physical world has a habit of producing things that nobody thought to model.
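
As a concrete taste of the simulation workflow, here is a minimal sketch against CARLA’s open-source Python API, assuming a CARLA server is already running on the default local port; the weather values are arbitrary stress-test settings, not a calibrated scenario.

```python
import carla  # pip install carla; requires a running CARLA server

# Connect to a locally running simulator (default port 2000)
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Stress-test perception under adverse conditions: heavy rain plus fog
world.set_weather(carla.WeatherParameters(
    cloudiness=90.0,
    precipitation=80.0,
    fog_density=40.0,
    sun_altitude_angle=15.0,
))

# Spawn a vehicle at a predefined spawn point and hand it to the
# simulator's autopilot so its behavior can be observed and logged
blueprint = world.get_blueprint_library().filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(blueprint, spawn_point)
vehicle.set_autopilot(True)
```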

Public road testing is where the rubber literally meets the road, and it is also where the most contentious debates play out. NHTSA’s April 2025 regulatory framework announcement set out three principles: prioritize the safety of ongoing AV operations on public roads, remove unnecessary regulatory barriers to innovation, and enable commercial deployment that enhances safety and mobility for the American public.

This framework matters because, until recently, AV testing on public roads operated in a regulatory patchwork of state-by-state rules, creating situations where a company could run driverless vehicles in California but be legally required to have a safety driver in Texas.

The sensor stack inside a modern AV test vehicle is itself a marvel worth understanding. Most companies rely on some combination of LiDAR, radar, cameras, and ultrasonic sensors, with onboard computers performing sensor fusion to produce a coherent picture of the vehicle’s surroundings.

Waymo’s fifth-generation Waymo Driver system, for instance, integrates LiDAR, radar, and high-definition cameras in a 360-degree configuration. Tesla, by contrast, has bet heavily on a camera-only vision system for its Full Self-Driving software, arguing that LiDAR is expensive and unnecessary if the machine learning algorithms are robust enough. This is not merely a technical debate.

It is a philosophical argument about the nature of perception, and both sides have real-world data to support their positions.
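
To make the phrase “sensor fusion” concrete, here is a toy illustration of one of its simplest forms, inverse-variance weighting, in which the less noisy sensor dominates the combined estimate. This is a teaching sketch, not any company’s production stack; real systems use far more elaborate filters and learned models.

```python
def fuse_ranges(lidar_range_m: float, lidar_var: float,
                radar_range_m: float, radar_var: float) -> float:
    """Inverse-variance weighted fusion of two noisy range estimates.

    The estimate with lower variance (higher confidence) gets more
    weight; with lidar_var << radar_var, the fused value tracks LiDAR.
    """
    w_lidar = 1.0 / lidar_var
    w_radar = 1.0 / radar_var
    return (w_lidar * lidar_range_m + w_radar * radar_range_m) / (w_lidar + w_radar)

# Clear weather: LiDAR is precise, so it dominates the fused estimate
print(fuse_ranges(42.0, 0.01, 45.0, 1.0))  # ~42.03 m
# Heavy fog: LiDAR variance degrades, so weight shifts toward radar
print(fuse_ranges(42.0, 4.0, 45.0, 1.0))   # ~44.4 m
```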

What Waymo Has Built (and What It Reveals)

Waymo is currently the most operationally mature autonomous vehicle company in the world by almost any measurable standard. Waymo’s robotaxi service operates in Austin, the San Francisco Bay Area, Phoenix, Atlanta, and Los Angeles, and the company crossed an estimated 450,000 weekly paid rides in late 2025, having served 14 million trips throughout the year.

Those are not test rides with safety drivers. Those are commercial, fully driverless trips where the vehicle is entirely on its own. In November 2025, Waymo began extending its operations to freeway routes, which represents a meaningful leap beyond the structured urban environments where its vehicles had previously been geofenced.

Waymo has published detailed data showing that its vehicles are approximately 5 times safer than human drivers overall and 12 times safer in pedestrian incidents. These numbers are frequently cited and frequently contested, and both reactions are legitimate. The data is real, but it comes with caveats.

Waymo operates within carefully defined geofenced zones that have been mapped in extraordinary detail. Its vehicles do not drive in heavy snow, do not navigate rural highways, and do not handle the full range of operational conditions that a human driver encounters. The safety numbers are impressive within the ODD. What happens outside of it remains largely untested.

Between July 2021 and November 2025, 1,429 incidents involving Waymo vehicles were reported to NHTSA, with the caveat that these incidents involved, but were not necessarily caused by, a Waymo vehicle. Context matters enormously here. A Waymo being rear-ended by an inattentive human driver counts as an “incident.”

A Waymo gently nudging a roadside barrier at two miles per hour counts the same way. The raw numbers without narrative context are nearly meaningless, which is part of why the industry’s approach to crash reporting remains such a point of contention.

NHTSA opened an investigation in October 2025 into Waymo’s performance around school buses after the Austin Independent School District reported 19 instances of Waymo vehicles illegally passing stopped school buses. Waymo issued a voluntary software recall in December 2025 to address prediction errors. This incident is worth sitting with.

The school bus stop-arm scenario is not exotic or obscure. It is a legally mandated, universally understood road rule that every licensed human driver in America learns in driver’s education.

The fact that a mature, commercially deployed autonomous driving system needed a software recall to handle it correctly tells you something important about the granularity of edge cases that still remain unsolved.

Tesla’s Full Self-Driving and the Robotaxi Question

Tesla occupies a uniquely complicated position in the autonomous vehicle landscape. It has by far the largest deployed fleet of vehicles running advanced driver assistance software, which means it has access to more real-world driving data than any other company on earth. It also has a track record of marketing that has consistently outrun the actual capabilities of its technology.

Tesla launched its robotaxi pilot service in Austin in late June 2025 and, through October 15, reported seven collisions involving its 2026 Model Y vehicles to NHTSA. Those vehicles were equipped with Tesla’s newer automated driving systems, and the collisions were not severe according to NHTSA data. By early 2026, the total incident count had grown.

Tesla filed reports bringing the total to 15 incidents since the program launched, including collisions with a bus, a heavy truck, a pole, and multiple fixed objects, all at low speeds with the autonomous system engaged.

The more revealing issue is not the crash count but the transparency surrounding it. Tesla redacts the entire narrative section of its crash reports to NHTSA, marking them as confidential business information. Every other AV operator filing with NHTSA, including Waymo, Zoox, and Aurora, provides full descriptions of what happened in their incidents.

Crash narratives are where safety learning actually happens. Without knowing whether a vehicle failed to detect a stationary object, misidentified a shadow, or made a poor speed judgment, the incident data is nearly impossible to act on in any systematic way. The opacity matters.

Tesla claims that drivers using FSD travel about 2.9 million miles between major collisions, compared to NHTSA data showing all drivers travel about 505,000 miles per major collision.

If these numbers hold up under scrutiny, they represent a genuine safety improvement over human driving. But the methodology of how Tesla defines “major collision,” how it segments FSD-active miles, and what it excludes from its dataset continues to be challenged by independent researchers and regulators alike.
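
The headline ratio is easy to reproduce from the two figures above, and reproducing it shows how sensitive the claim is to the baseline chosen; the 1,000,000-mile alternative baseline below is a hypothetical for illustration, not an NHTSA figure.

```python
fsd_miles_per_major_collision = 2_900_000     # Tesla's claimed figure
baseline_miles_per_major_collision = 505_000  # NHTSA all-driver figure cited above

ratio = fsd_miles_per_major_collision / baseline_miles_per_major_collision
print(f"Implied safety multiple: {ratio:.1f}x")  # ~5.7x

# The multiple is only as solid as the baseline. If FSD-active miles skew
# toward easy highway conditions, the fair human comparison may be much
# higher; against a hypothetical 1,000,000-mile baseline, the multiple
# drops below 3x.
print(f"Against a 1M-mile baseline: {fsd_miles_per_major_collision / 1_000_000:.1f}x")
```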

California threatened to ban sales of Tesla vehicles after a judge found the company had engaged in deceptive marketing around its Full Self-Driving and Autopilot systems, falsely implying they were fully automated. The naming problem is not trivial.

When a product is called “Full Self-Driving” or “Autopilot,” ordinary consumers draw natural conclusions about what those names imply. Those conclusions can be deadly.

The Problems That Remain Stubbornly Unsolved

Having watched this industry from inside test vehicles, research labs, and regulatory hearings, I can tell you that the unsolved problems are not matters of refinement. Some of them are genuinely hard in ways that the industry does not always advertise.

The Long Tail of Edge Cases

The edge case problem is the one that keeps autonomy engineers up at night. A self-driving car system can achieve extraordinary performance across the vast majority of everyday driving scenarios, which lulls people into underestimating the difficulty of what remains.

But rare and unusual scenarios are not rare in aggregate. Across millions of miles and thousands of vehicles, every unusual event happens with some regularity.

A plastic bag blowing across a freeway. A construction zone where temporary markings contradict permanent lane lines.

A child chasing a ball into the street from between parked cars. A funeral procession. A flash mob. Scenario generators for AV testing still struggle to integrate data-driven approaches, optimization techniques, and combinatorial methods into a cohesive framework, and the trade-off between broad scenario coverage and targeted discovery of rare edge cases remains a significant unsolved issue.
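
A toy illustration of why coverage is so hard: even a deliberately tiny combinatorial scenario space explodes quickly. The axes and values below are invented for illustration.

```python
import itertools

# A deliberately tiny scenario space; real ODDs have dozens of axes
weather = ["clear", "rain", "fog", "snow"]
actors = ["pedestrian", "cyclist", "stopped school bus", "cone pattern"]
occlusion = ["none", "parked cars", "glare"]
road = ["urban grid", "freeway", "construction zone"]

scenarios = list(itertools.product(weather, actors, occlusion, road))
print(len(scenarios))  # 144 combinations from just four small axes

# Every axis added multiplies the space, and continuous parameters
# (speeds, gaps, approach angles) make exhaustive coverage impossible,
# which is why the coverage-vs-edge-case trade-off remains unsolved.
```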

Adverse Weather and Sensor Degradation

LiDAR, which remains the gold standard for autonomous vehicle perception, works extraordinarily well in clean conditions. Rain, snow, fog, and dust are another matter entirely. Water droplets and ice crystals scatter laser pulses in ways that confuse depth estimation.

Heavy snow can occlude camera lenses and coat sensor housings. X-in-the-loop (XiL) testing methods inherit the problem rather than solving it: they rely on models of human behavior, vehicles, and the environment that often fail to replicate real-world conditions accurately. Waymo currently does not operate in heavy winter weather.

Tesla’s FSD has shown improvement in rain but remains unreliable in heavy snowfall. No commercially deployed AV system today operates with full confidence in all weather conditions, and this is not a rounding error. Weather is a foundational part of the driving experience in most of the world.
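
The physics behind the degradation can be sketched with a standard first-order model: laser return power falls off exponentially with range under the Beer-Lambert law, so fog that adds extinction cuts effective detection range sharply. The extinction coefficients below are rough illustrative values in the range typically quoted for fog, not measurements from any specific sensor.

```python
import math

def roundtrip_transmission(range_m: float, extinction_per_m: float) -> float:
    """Fraction of laser pulse power surviving a round trip through fog,
    per the Beer-Lambert law: T = exp(-2 * alpha * R). The factor of 2
    reflects that the pulse traverses the range twice."""
    return math.exp(-2.0 * extinction_per_m * range_m)

# Moderate to dense fog corresponds very roughly to alpha of 0.02-0.05
# per meter (visibility on the order of 80-200 m); illustrative only.
for alpha in (0.0, 0.02, 0.05):
    t = roundtrip_transmission(100.0, alpha)
    print(f"alpha={alpha:.2f}/m -> {t:.1%} of return signal at 100 m")
# alpha=0.05/m leaves ~0.005% of the signal: a target at 100 m has
# effectively vanished from the point cloud.
```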

V2X Communication and Infrastructure Dependency

Vehicle-to-everything (V2X) communication, the ability of autonomous vehicles to communicate with traffic signals, other vehicles, pedestrians, and roadside infrastructure, has been discussed as a force multiplier for AV safety for years. The theoretical benefits are substantial.

An AV that receives a signal from a traffic light 300 meters ahead knows the signal will turn red before its cameras can confirm it. A vehicle warned by another vehicle of a hazard around a blind corner can slow proactively rather than reactively.

The problem is that V2X requires infrastructure investment that most municipalities simply have not made. Building that infrastructure demands expensive investment and can disrupt the existing socioeconomic fabric of cities, which makes widespread deployment of connected autonomous vehicles difficult without major upgrades to what already exists.

An autonomous vehicle designed to leverage V2X communication is still functionally operating without it in most of the real world.
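
To make the traffic-light example concrete, here is a toy decision rule driven by a SPaT-style (signal phase and timing) message. The message fields and thresholds are invented for illustration; real deployments use standardized SAE J2735 messages and far more careful logic.

```python
from dataclasses import dataclass

@dataclass
class SpatMessage:
    """Simplified stand-in for a SPaT broadcast (illustrative fields only)."""
    distance_to_stop_line_m: float
    seconds_until_red: float

def should_brake_early(msg: SpatMessage, speed_mps: float,
                       safety_margin_s: float = 1.0) -> bool:
    """Brake proactively if the vehicle cannot reach the stop line
    before the phase changes, with a safety margin."""
    time_to_reach = msg.distance_to_stop_line_m / speed_mps
    return time_to_reach + safety_margin_s > msg.seconds_until_red

# 300 m out at 15 m/s: 20 s to arrive, light turns red in 12 s -> brake
# now, long before onboard cameras could even resolve the signal head.
print(should_brake_early(SpatMessage(300.0, 12.0), 15.0))  # True
```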

Cybersecurity and Software Integrity

Research focused on trustworthy AI in autonomous vehicles highlights cybersecurity as one of the essential unsolved requirements, alongside transparency, robustness, and fairness.

An autonomous vehicle is, at its core, a networked computer on wheels, and it faces the same categories of vulnerability as any networked computing system, compounded by the fact that a successful attack can have immediate physical consequences.

Over-the-air (OTA) update systems, which allow manufacturers to push software improvements to vehicles remotely, are both a massive safety advantage and a significant attack surface. A malicious actor who can push a software update to a fleet of autonomous vehicles does not need a physical presence anywhere near those vehicles.
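
The standard mitigation is to cryptographically sign update payloads so the vehicle refuses anything not signed by the manufacturer’s key. Here is a minimal sketch using Ed25519 signatures via Python’s widely used cryptography library; production OTA systems, such as those built on the Uptane framework, layer rollback protection, multiple signing roles, and metadata freshness checks on top of this.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Manufacturer side: sign the firmware image with a private key
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()  # provisioned into the vehicle at build time
firmware = b"...update payload bytes..."
signature = private_key.sign(firmware)

# Vehicle side: verify before installing; reject anything unsigned or altered
def verify_update(payload: bytes, sig: bytes) -> bool:
    try:
        public_key.verify(sig, payload)
        return True
    except InvalidSignature:
        return False

print(verify_update(firmware, signature))                 # True
print(verify_update(firmware + b"tampered", signature))   # False
```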

Liability and the Legal Framework

The question of who is responsible when an autonomous vehicle causes an accident remains genuinely unresolved in most jurisdictions. Unlike traditional accidents where fault is typically assigned to human drivers, self-driving car incidents introduce multiple potentially liable parties: the AV developer, the sensor hardware manufacturer, the software engineers, and, in some cases, the mapping data provider.

Existing tort law was built around the premise of a human operator making decisions. When the decision-maker is a neural network trained on petabytes of data, the legal frameworks designed to determine negligence are working with concepts that do not translate cleanly.

The Scalability Problem

Waymo’s technology is impressive. It is also extraordinarily expensive to deploy. Each Waymo vehicle costs significantly more to build and operate than a conventional vehicle, and the high-definition mapping that underpins its navigation requires continuous updating across every road it intends to operate on.

Waymo’s fleet will likely expand to at least 10,000 vehicles over the next year from more than 2,500 now, and the company still loses over two billion dollars annually despite revenues growing steadily. The path from “it works in Phoenix and San Francisco” to “it works everywhere” runs directly through the scalability problem, and nobody has solved it yet.
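
The unit economics behind that sentence are worth spelling out as back-of-envelope arithmetic, using only the figures cited above and ignoring how costs actually split between fixed R&D and per-vehicle operations.

```python
annual_loss_usd = 2_000_000_000  # "over two billion dollars annually"
fleet_now = 2_500                # "more than 2,500 now"
fleet_next_year = 10_000         # planned expansion

# Crude per-vehicle bound, not a cost model: fixed R&D and per-vehicle
# operating costs are lumped together here.
print(f"${annual_loss_usd / fleet_now:,.0f} loss per vehicle per year now")
print(f"${annual_loss_usd / fleet_next_year:,.0f} per vehicle if total "
      f"losses stayed flat at 4x the fleet")
# $800,000 vs $200,000: scaling helps, but only if losses do not scale too.
```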

The Human Factor Nobody Talks About Enough

There is a dimension to autonomous vehicle deployment that gets consistently underweighted in technical discussions: human behavior is not just a variable to be predicted. It is an active participant in the safety equation.

When a human driver encounters an autonomous vehicle on the road, their behavior often changes in ways that trained models do not adequately account for. Pedestrians who notice a robotaxi approaching have been documented testing the vehicle, stepping into the street and then back, curious about whether it will stop.

Cyclists sometimes use AV vehicles as rolling blockers, knowing that the system’s conservative behavior makes it unlikely to proceed. Other drivers cut in front of robotaxis with a frequency that veteran engineers have described as noticeably higher than normal, apparently intuiting that the autonomous vehicle will brake.

These behavioral adaptations are not in any training dataset because they are emergent properties of autonomous vehicles sharing public spaces with humans for the first time.

This is not a critique of any specific company’s technology. It is an observation about the fundamental complexity of inserting a new kind of actor into a sociotechnical system that was built around human judgment, human communication, and human unpredictability.

Where the Industry Goes From Here

NHTSA’s biannual report to Congress in early 2026 stressed that the mobility and safety benefits of autonomous vehicles can only be achieved through public trust grounded in demonstrable safety, and that the technical and policy challenges must be addressed decisively. That is the kind of statement that is easy to agree with and extraordinarily difficult to operationalize.

The honest assessment, from someone who has watched this industry across more than a decade, is this: autonomous vehicle technology is genuinely safer than human driving within the specific conditions it has been designed and validated for.

That is a real achievement, and it should not be minimized. Waymo’s data on pedestrian safety outcomes alone represents a meaningful contribution to public safety in cities where it operates.

But the distance between “safer than humans in mapped urban geofences during good weather” and “ready to replace human driving as a general transportation technology” is enormous. The edge case problem requires not just more miles, but more conceptually diverse miles.

The weather problem requires hardware advances that are still in research stages. The liability framework requires legislative action that moves at the pace of legislatures. The V2X infrastructure problem requires coordinated public investment that has barely begun.

The companies that will win this race are not necessarily the ones with the most sophisticated technology today. They are the ones disciplined enough to be honest about what their systems cannot yet do, rigorous enough to test against their own assumptions, and patient enough to build trust rather than just build hype. The road ahead is real. So are the roadblocks.

What People Ask

What does it mean for an autonomous vehicle to be fully self-driving?
A fully self-driving vehicle, classified as SAE Level 5, can operate under all conditions, on all roads, and in all weather without any human input or supervision. No commercially deployed vehicle has reached that level. Most advanced systems operating today, including Waymo’s robotaxis and Tesla’s Full Self-Driving software, function at Level 4 or below, meaning they require defined geographic zones, favorable conditions, or some form of human oversight to operate safely.
How are autonomous vehicles tested before they are allowed on public roads?
Autonomous vehicles go through a multi-stage testing pipeline that includes closed-course track testing, simulation-based validation, and public road testing. Closed-course testing evaluates specific sensor and braking responses in controlled environments. Simulation allows companies to run millions of virtual miles and stress-test rare edge cases without physical risk. Public road testing gathers real-world performance data across varying traffic, road surfaces, and driver behaviors. All three stages typically precede any serious regulatory consideration of wider commercial deployment, though the specific requirements vary by jurisdiction.
What are the SAE levels of driving automation?
The SAE International framework defines six levels of driving automation. Level 0 means no automation at all. Level 1 covers single-function assistance like adaptive cruise control. Level 2 allows the system to handle steering and acceleration together but requires constant human supervision. Level 3 allows the vehicle to manage most driving tasks but requires the human to take over when prompted. Level 4 is full automation within a defined operational design domain, meaning no human is needed inside that zone. Level 5 is complete autonomy under all conditions, which no production vehicle has achieved.
Is Waymo’s robotaxi actually safer than a human driver?
Within its defined operational zones, Waymo’s published data shows its vehicles are approximately five times safer than human drivers overall and around twelve times safer in pedestrian incidents. However, those figures apply specifically to the geofenced urban areas where Waymo operates, under favorable weather conditions, and on roads that have been meticulously mapped in advance. Waymo does not currently operate in heavy snow, on rural highways, or in areas outside its mapped service zones, which means the safety comparison does not yet apply to driving in general.
What sensors do autonomous vehicles use to navigate?
Most autonomous vehicles use a combination of LiDAR, radar, cameras, and ultrasonic sensors fused together by onboard computing systems to build a real-time picture of the surrounding environment. LiDAR measures distances using laser pulses and produces detailed three-dimensional maps of the vehicle’s surroundings. Radar handles long-range detection and performs well in low-visibility conditions. Cameras provide visual context for signs, signals, lane markings, and object classification. Ultrasonic sensors assist with close-range detection during parking and low-speed maneuvers. Tesla is notable for relying almost entirely on cameras rather than LiDAR, a design philosophy that remains contested among engineers and researchers.
What is an operational design domain in autonomous driving?
An operational design domain, commonly referred to as an ODD, defines the specific conditions under which an autonomous driving system is designed and validated to operate. This includes geographic boundaries, road types, speed limits, weather conditions, time of day, and traffic density. When an autonomous vehicle operates within its ODD, it is performing as designed. When it encounters conditions outside that domain, such as an unmapped road, a severe storm, or an unusual traffic configuration, the system may fail or require human intervention. The ODD is one of the most important, and most underreported, factors in evaluating AV safety claims.
Why do autonomous vehicles still struggle with bad weather?
Weather remains one of the most persistent unsolved challenges in autonomous vehicle development. Rain, heavy snow, fog, and dust interfere with LiDAR by scattering laser pulses, reducing the accuracy of depth perception. Snow can occlude camera lenses and cover lane markings that the vehicle depends on for navigation. Radar handles adverse conditions better than LiDAR or cameras, but it offers less spatial detail. No commercially deployed autonomous vehicle system currently operates with full confidence across all weather conditions, which is one reason why most robotaxi services operate only in climates with mild, predictable weather patterns.
Who is legally responsible when an autonomous vehicle causes an accident?
Liability in autonomous vehicle accidents is one of the most unresolved legal questions in the industry. Unlike conventional crashes where fault is typically assigned to a human driver, an AV accident can implicate the software developer, the hardware manufacturer, the mapping data provider, or the vehicle operator, sometimes all at once. Existing tort law was designed around human decision-making, and courts and legislators are still working out how concepts like negligence and product liability apply when the decision-maker is an algorithm. Most jurisdictions have not yet passed comprehensive AV liability legislation, leaving injured parties in legally ambiguous territory.
What is the edge case problem in autonomous driving?
The edge case problem refers to the long tail of rare, unusual, or unexpected driving scenarios that an autonomous vehicle may encounter infrequently but that occur regularly across a large fleet operating millions of miles. Examples include construction zones where temporary markings conflict with permanent lane lines, unusual weather events, animals crossing roads, or vehicles behaving unpredictably. Each individual scenario may be rare, but collectively they happen constantly at scale. Training a self-driving system to handle every possible edge case requires either an impossibly large and diverse dataset or fundamentally new approaches to generalized reasoning that the industry has not yet developed.
How does Tesla’s Full Self-Driving differ from Waymo’s autonomous system?
Tesla’s Full Self-Driving software is a driver assistance system that requires a licensed human driver to remain in the seat, attentive, and ready to intervene at any time, making it a Level 2 system in most operational contexts despite its name. Waymo’s Waymo Driver is a Level 4 autonomous system that operates without any human driver in defined service areas, with no expectation of human takeover. Beyond that classification difference, the two systems take opposing approaches to perception: Waymo relies on a multi-sensor stack including LiDAR, radar, and cameras combined with pre-built high-definition maps, while Tesla uses a camera-only neural network approach without LiDAR or pre-mapped routes. The philosophical and technical gap between the two is significant.
What is V2X communication and why does it matter for autonomous vehicles?
V2X, short for vehicle-to-everything, is a communication technology that allows autonomous vehicles to exchange data with traffic signals, other vehicles, pedestrians, and roadside infrastructure in real time. It gives AVs the ability to receive information about upcoming signal changes, road hazards, and the movements of other vehicles before their onboard sensors can detect them directly. The safety benefits are substantial in theory, but widespread V2X deployment requires coordinated infrastructure investment from local governments and transportation authorities that has been slow to materialize. Most autonomous vehicles operating today do not benefit from V2X because the infrastructure simply does not exist at scale in most cities.
Are autonomous vehicles vulnerable to hacking and cyberattacks?
Cybersecurity is a serious and underappreciated risk in autonomous vehicle deployment. An AV is essentially a networked computer operating at highway speeds, and it shares many of the same vulnerabilities as any internet-connected system, with far greater physical consequences if compromised. Over-the-air software update systems, which allow manufacturers to push improvements to vehicles remotely, are both a major safety feature and a potential attack surface. Researchers have demonstrated the ability to remotely interfere with vehicle systems in controlled environments. Regulatory frameworks around AV cybersecurity are still being developed, and the industry does not yet have a unified standard for how to validate and disclose vulnerabilities.