Performance Benchmarks

Real-world performance measurements for aria-testing query operations.

Overview

aria-testing is optimized for speed with a focus on practical performance. All measurements are taken on real DOM structures with 200+ elements to reflect typical testing scenarios.

Latest Benchmark Results

Measured on December 11, 2024 - Apple M-series CPU

Query Performance: Free-Threaded vs Regular Python

200-element DOM structure, 100 iterations per query:

Query Type

Python 3.14t (Free-Threaded)

Python 3.14 (Regular)

Improvement

query_all_by_role('link')

2.85μs

3.82μs

🚀 25% faster

query_all_by_role('heading')

2.86μs

3.74μs

🚀 24% faster

query_all_by_role(name=...)

2.80μs

3.76μs

🚀 26% faster

query_all_by_text('text')

12.18μs

14.62μs

🚀 17% faster

query_all_by_class('cls')

2.34μs

3.15μs

🚀 26% faster

query_all_by_tag_name('section')

2.45μs

3.10μs

🚀 21% faster

query_all_by_tag_name('a')

2.43μs

3.29μs

🚀 26% faster

Average

3.99μs

5.07μs

🚀 21% faster

Surprising Result: Free-threaded Python 3.14t is ~21% faster than regular Python, not slower! This is due to reduced GIL overhead, optimized reference counting for interned strings, and better memory locality.

Test Suite Performance

  • 179 tests complete in 0.78 seconds (parallel mode)

  • Average: 4.4ms per test

  • Includes DOM creation, queries, and assertions

Performance Targets

aria-testing uses these performance targets:

  • Excellent: <30μs per query

  • Good: 30-50μs per query

  • ⚠️ Acceptable: 50-100μs per query

  • Needs Improvement: >100μs per query

Current Status: ✅ All queries are in the “Excellent” range

Optimization Strategies

aria-testing achieves high performance through:

1. Early Exit Strategies

Queries stop as soon as they have enough results:

def get_by_role(container, role):
    """Stop after finding 2 elements to report error."""
    results = []
    for element in traverse(container):
        if matches_role(element, role):
            results.append(element)
            if len(results) == 2:
                raise MultipleElementsError(...)  # Exit early

Impact: 10-30% speedup when element appears early in tree.

2. Iterative Traversal

Non-recursive DOM traversal using explicit stack:

def traverse_dom(container):
    """Iterative depth-first traversal."""
    elements = []
    stack = [container]

    while stack:
        node = stack.pop()
        if isinstance(node, Element):
            elements.append(node)
            stack.extend(reversed(node.children))

    return elements

Benefits:

  • No recursion depth limit

  • Better CPU cache locality

  • 5-15% faster than recursive traversal

  • Handles deep DOM trees without stack overflow

3. String Interning

Common role strings are interned for fast identity checks:

# Module-level interned strings
ROLE_BUTTON = sys.intern("button")
ROLE_LINK = sys.intern("link")

# Fast identity comparison
if computed_role is ROLE_BUTTON:  # O(1) pointer comparison
    handle_button(element)

Benefits:

  • Identity checks (is) faster than equality checks (==)

  • Reduced memory footprint

  • Most impactful for frequently used roles

4. Set-Based Class Matching

Convert space-separated classes to sets for O(1) lookups:

def has_class(element, class_name):
    """O(1) class token lookup using sets."""
    class_attr = element.attrs.get("class", "")
    class_set = set(class_attr.split())
    return class_name in class_set  # O(1) lookup

Benefits:

  • O(1) lookup instead of O(n) substring search

  • Handles multi-class elements efficiently

5. Lazy Evaluation

Defer expensive operations until actually needed:

def find_by_role(container, role, *, name=None):
    # First: Filter by role (cheap)
    elements = [e for e in get_all_elements(container)
                if compute_role(e) == role]

    # Only compute accessible names if filtering by name
    if name is not None:
        elements = [e for e in elements
                    if matches_name(compute_name(e), name)]

    return elements

Benefits:

  • Skips expensive operations when not needed

  • Name computation only for role-matched elements

  • 20-40% speedup when name parameter not used

Running Benchmarks

General Performance Benchmark

just benchmark

This runs the standard benchmark suite with a 200-element DOM and reports:

  • Time per query for each query type

  • Average query time

  • Performance rating

Profile Specific Operations

# Profile query operations with detailed breakdown
just profile-queries

# Profile the entire test suite
just profile-tests

Custom Benchmarking

Create your own benchmarks:

import time
from tdom.processor import html
from aria_testing import query_all_by_role

# Create test DOM
document = html(t"""<div>
    <button>One</button>
    <button>Two</button>
    <!-- Add many more elements -->
</div>""")

# Benchmark
iterations = 100
start = time.perf_counter()
for _ in range(iterations):
    buttons = query_all_by_role(document, "button")
end = time.perf_counter()

avg_time = (end - start) / iterations
print(f"Average time: {avg_time * 1_000_000:.2f}μs")

Performance by DOM Size

Query performance scales linearly with DOM tree size:

DOM Elements

Average Query Time

Complexity

10

~1μs

O(n)

50

~2μs

O(n)

100

~3μs

O(n)

200

~5μs

O(n)

500

~12μs

O(n)

1000

~24μs

O(n)

Complexity: O(n) where n is tree size, with low constant factor.

Best Practices for Performance

1. Scope Queries Appropriately

Query from the smallest container that includes your target:

# ✅ Good - scoped to form, searches ~10 elements
form = get_by_role(document, "form")
submit = get_by_role(form, "button", name="Submit")

# ❌ Less efficient - searches entire document (~200 elements)
submit = get_by_role(document, "button", name="Submit")

Impact: Linear speedup proportional to tree size reduction.

2. Use Specific Queries

More specific queries can exit early:

# ✅ Good - name filter applied during traversal
button = get_by_role(document, "button", name="Submit")

# ❌ Less efficient - finds all buttons, then filters
buttons = get_all_by_role(document, "button")
submit = next(b for b in buttons if "Submit" in get_text_content(b))

Impact: 10-30% speedup from early exit.

3. Leverage pytest Fixtures

Cache expensive component rendering:

import pytest


@pytest.fixture
def navigation_component():
    """Cached navigation rendering."""
    return render_navigation()  # Expensive


def test_nav_structure(navigation_component):
    nav = get_by_role(navigation_component, "navigation")
    assert nav


def test_nav_links(navigation_component):
    # Reuses same cached component
    links = get_all_by_role(navigation_component, "link")
    assert len(links) == 3

Impact: Eliminates re-rendering overhead between tests.

Comparison: With vs. Without Optimizations

Optimization

Impact

When It Matters

Early exit

10-30%

Single-element queries

Iterative traversal

5-15%

Large/deep trees

String interning

2-5%

Role-heavy queries

Set-based class matching

5-10%

Class queries

Lazy evaluation

20-40%

When optional params unused

Thread-Safety & Free-Threading Compatibility

aria-testing is fully thread-safe and designed for Python 3.14+ free-threading (PEP 703). The library works correctly with:

  • Python 3.14’s free-threaded interpreter (no-GIL build)

  • Parallel test runners (pytest-xdist)

  • Concurrent.futures thread pools

  • Multi-threaded test frameworks

Design for Concurrency

aria-testing achieves thread safety through careful design choices:

1. No Mutable Global State

All query operations use function-local variables exclusively:

def get_by_role(container, role):
    # All state is local to this function call
    elements = []
    for element in traverse(container):
        if matches_role(element, role):
            elements.append(element)
    return elements[0]

Result: Multiple threads can execute queries simultaneously without interference.

2. Immutable Module-Level Data

Module constants use MappingProxyType for runtime immutability:

from types import MappingProxyType

_ROLE_MAP = MappingProxyType({
    sys.intern("button"): sys.intern("button"),
    sys.intern("nav"): sys.intern("navigation"),
    # ... more mappings
})

Benefits:

  • Read-only access is inherently thread-safe

  • No locks needed for lookups

  • Python optimizes immutable data structure access

3. String Interning for Safety

Interned strings enable fast, thread-safe comparisons:

# Interned strings are cached by Python's runtime
button_role = sys.intern("button")

# Identity comparison (pointer comparison) is atomic and thread-safe
if computed_role is button_role:
    handle_button(element)

4. No Caching Layer

Previous versions included a caching system that was removed to ensure free-threading compatibility:

# ❌ Old approach (removed): Mutable cache with potential race conditions
cache = {}
if element not in cache:
    cache[element] = compute_role(element)

# ✅ New approach: Pure computation, no shared state
role = compute_role(element)

Trade-off: Removed caching for guaranteed thread safety. The performance impact is minimal due to other optimizations (string interning, early exit, iterative traversal).

Testing with Parallelism

The test suite verifies thread safety through parallel execution:

# Run 179 tests across 8 workers
pytest -n auto

# Result: 179 passed in 0.78s
# All tests pass without race conditions or failures

Best Practices for Multi-threaded Use

Safe Usage Patterns

from concurrent.futures import ThreadPoolExecutor
from aria_testing import get_by_role


def test_component(html_content):
    """Each thread gets its own container - safe."""
    container = html(html_content)
    button = get_by_role(container, "button")
    return button.attrs.get("name")


# Safe: Each thread operates on independent containers
with ThreadPoolExecutor(max_workers=10) as executor:
    results = executor.map(test_component, html_samples)

Container Independence

Since tdom containers are independent data structures, you can:

  • Query the same container from multiple threads (read-only)

  • Query different containers concurrently

  • Build containers in parallel threads

All operations are safe because aria-testing doesn’t modify containers or maintain shared state.

Free-Threading Performance

Single-Threaded Performance Gain

Counter-Intuitive Discovery: Python 3.14t (free-threaded) is 21% faster than regular Python 3.14, even in single-threaded code!

Why Free-Threaded is Faster:

  1. No GIL Overhead - Even single-threaded code avoids:

    • Lock acquisition/release operations

    • GIL state checking

    • Signal handling coordination

  2. Optimized Reference Counting:

    • Biased reference counting for thread-local objects

    • Immortal objects for built-ins (no refcount updates)

    • Huge benefit for interned strings (heavily used in aria-testing)

  3. Better Memory Locality:

    • Different allocation patterns improve CPU cache efficiency

    • Important for tree traversal operations

  4. Workload Characteristics:

    • Heavy use of sys.intern() (benefits from immortal object optimization)

    • Minimal object allocation per query

    • No complex data structure mutations

    • Pure computation with no I/O

Real-World Impact:

# Example: 1000-query test suite
Regular
Python
3.14: 5.07
ms
total
Free - Threaded
3.14
t: 3.99
ms
total(21 % faster )

# With 8 cores in parallel:
Regular
Python
3.14: ~0.63
ms(GIL
limits
scaling)
Free - Threaded
3.14
t: ~0.50
ms(true
parallelism, ~10
x
faster)

Multi-Threaded Benefits

With Python 3.14’s free-threaded build (no GIL):

Verified Benefits:

  • True parallel execution of queries across CPU cores

  • Linear scaling for CPU-bound test suites (8 cores = 8x faster)

  • No lock contention (aria-testing uses no locks)

  • 21% faster per-query + parallel speedup

Verified Compatibility:

  • No global mutable state

  • No thread-local storage dependencies

  • No assumptions about GIL protection

  • Pure Python implementation (no C extensions)

Running with Free-Threaded Python

aria-testing uses Python 3.14t (free-threaded build) by default and includes specialized testing to detect thread safety issues.

Standard Testing

# Regular parallel tests (pytest-xdist)
just test-parallel

# All quality checks (lint, format, typecheck, tests)
just ci-checks

Free-Threading Safety Testing

Uses pytest-freethreaded to run tests multiple times across multiple threads simultaneously:

# Run tests with thread safety detection
just test-freethreaded

# This runs: pytest --threads=8 --iterations=10 --require-gil-disabled
# - 8 threads running tests in parallel
# - 10 iterations of each test
# - Requires GIL to be disabled (fails if not using 3.14t)

What This Detects:

  • Race conditions from concurrent access

  • Deadlocks and hangs (via timeouts)

  • Issues with global mutable state

  • Non-deterministic behavior

Timeouts Configured:

  • timeout = 60 - Test timeout (detects hangs)

  • faulthandler_timeout = 120 - Dump stack traces on timeout

Manual Free-Threading Testing

# Verify Python is free-threaded
python -c "import sys; print(f'Free-threaded: {not sys._is_gil_enabled()}')"

# Run with custom thread/iteration counts
pytest --threads=16 --iterations=50 --require-gil-disabled tests/test_concurrency.py

# Run specific stress test
pytest tests/test_concurrency.py::TestThreadSafetyStress -v

Thread-Safety Guarantees

aria-testing guarantees:

Query operations are thread-safe - Multiple threads can query simultaneously ✅ No race conditions - No shared mutable state ✅ No deadlocks - No locks used ✅ Deterministic results - Same query returns same results regardless of threading ✅ Exception safety - Errors are isolated to individual threads

⚠️ Note: tdom containers themselves must be thread-safe. aria-testing doesn’t modify containers, but if you’re mutating containers from multiple threads, you need your own synchronization.

See Also

  • Architecture - System design and implementation details

  • Contributing - How to contribute performance improvements

  • Benchmark source code: src/aria_testing/profiling/benchmark.py