2026-04-23 iOS Test Plans, Parallel XCTest & xcresult Triage for CI on Leased Cloud Mac
iOS QA owners and CI maintainers who run xcodebuild test on leased Apple Silicon Macs in HK / JP / KR / SG / US quickly outgrow ad hoc -only-testing flags. This 2026-04-23 runbook standardizes test plans, shows how to parallelize by destination, and explains how to ship xcresult bundles to your object store under stable names. It complements the guides on headless Simulator tests, DerivedData + xcresult isolation, and parallel archive fan-out.
Why Test Plans Belong in Remote CI
A *.xctestplan is version-controlled intent: it lists default test targets, options, and environment variable matrices your team can review in Git. That beats giant shell strings embedded in a CI YAML file—especially when a PM asks “which configurations actually ran” after a point release. Pair plans with a single scheme and keep the scheme stable; the plan is where you model Debug vs Release, address sanitizer toggles, or disable flaky UI suites temporarily without deleting tests.
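Because the plan is the artifact your team reviews, a CI job can refuse to start when the plan it was asked to run is missing from the checkout or exists twice. The helper below is a minimal sketch: `require_test_plan` is a hypothetical name, not an Xcode tool, and it assumes plan files are committed somewhere under the repository root.

```shell
#!/usr/bin/env bash
# require_test_plan ROOT PLAN
# Hypothetical helper: succeeds (and prints the path) only if exactly one
# PLAN.xctestplan exists under ROOT, so CI never runs a missing or ambiguous
# plan. Anything inside a DerivedData folder is ignored.
require_test_plan() {
  local root=$1 plan=$2 matches count
  matches=$(find "$root" -name "${plan}.xctestplan" ! -path '*/DerivedData/*' 2>/dev/null)
  if [ -z "$matches" ]; then
    echo "error: ${plan}.xctestplan not found under ${root}" >&2
    return 1
  fi
  # wc -l pads its output on macOS, so strip spaces before comparing.
  count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
  if [ "$count" -ne 1 ]; then
    echo "error: found ${count} copies of ${plan}.xctestplan under ${root}" >&2
    return 1
  fi
  printf '%s\n' "$matches"
}
```

A job would call `require_test_plan "$PWD" Nightly` before invoking xcodebuild and stop on a nonzero exit.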
Test Plan vs Scheme-Only: Decision Table
| Approach | When it shines | Operability on SSH |
|---|---|---|
| Committed .xctestplan + -testPlan Name | Reproducible across laptops and builders; diffs in PRs | High: one command per shard |
| Scheme defaults + -only-testing | Spikes and hotfix lanes | Medium: string sprawl in YAML |
| Test target splits across separate schemes | Monorepos with unrelated apps | Lower: duplicated scheme maintenance |
xcodebuild Test Invocation Blueprint
Pin DEVELOPER_DIR to the Xcode you validated on the host, then run tests with an explicit result bundle path and a separate DerivedData path per job. Keep the job ID in both paths to simplify correlation with self-hosted runners or in-house schedulers.
```shell
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer \
xcodebuild test \
  -workspace App.xcworkspace \
  -scheme App \
  -testPlan Nightly \
  -destination 'platform=iOS Simulator,name=iPhone 16' \
  -resultBundlePath "$CI_RESULTS/run-${CI_SHARD_ID}.xcresult" \
  -derivedDataPath "$CI_DERIVED_DATA/${CI_SHARD_ID}"
```
xcodebuild test processes on a 12-core M4 class host before contention dominates wall time.
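When the invocation above moves into a CI script, building the arguments as a bash array keeps destinations with spaces intact and makes a dry run easy to log. This is a sketch under stated assumptions: `run_shard`, `XCODE_APP`, and `DRY_RUN` are hypothetical names chosen for illustration, while `CI_RESULTS` and `CI_DERIVED_DATA` come from the command above.

```shell
#!/usr/bin/env bash
# run_shard SHARD_ID DESTINATION
# Hypothetical per-shard wrapper. Requires CI_RESULTS and CI_DERIVED_DATA;
# XCODE_APP (default /Applications/Xcode.app) and DRY_RUN are assumptions.
run_shard() {
  local shard=$1 dest=$2
  if [ -z "${CI_RESULTS:-}" ] || [ -z "${CI_DERIVED_DATA:-}" ]; then
    echo "error: CI_RESULTS and CI_DERIVED_DATA must be set" >&2
    return 1
  fi
  local bundle="${CI_RESULTS}/run-${shard}.xcresult"
  local derived="${CI_DERIVED_DATA}/${shard}"

  # Never reuse a bundle path: each run gets its own, predictable name.
  if [ -e "$bundle" ]; then
    echo "error: ${bundle} already exists" >&2
    return 1
  fi

  # An array keeps "platform=iOS Simulator,name=iPhone 16" as one argument.
  local cmd=(xcodebuild test
    -workspace App.xcworkspace
    -scheme App
    -testPlan Nightly
    -destination "$dest"
    -resultBundlePath "$bundle"
    -derivedDataPath "$derived")

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[@]}"   # one argument per line, for log review
    return 0
  fi
  DEVELOPER_DIR="${XCODE_APP:-/Applications/Xcode.app}/Contents/Developer" "${cmd[@]}"
}
```

Running `DRY_RUN=1 run_shard test-1-of-4 'platform=iOS Simulator,name=iPhone 16'` prints the exact argument list without touching the Simulator.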
Parallel Lanes, Shards, and Queues
There are two distinct axes: (a) multiple simulators on one host vs (b) separate child jobs each with one destination. Option (a) is seductive for cost but drives CoreSimulator and disk I/O into nonlinear slowdowns. Option (b) mirrors how serious teams in Singapore and US East run fan-out: each shard owns its CORESIMULATOR_HOST prefix and derived data, then uploads its own *.xcresult. Re-use the queue discipline from the parallel xcodebuild article, but with test-specific retry budgets and per-shard timeouts that are 20–35% looser than your archive jobs—XCTest startup variance is real.
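Option (b) and the looser test timeouts can be sketched together. The example assumes a hypothetical RUNNER command that takes a shard ID and a destination; it launches shards as background processes only to stay self-contained, whereas your scheduler would normally create separate child jobs. `test_timeout` applies the 20–35% loosening to an archive-job timeout.

```shell
#!/usr/bin/env bash
# test_timeout ARCHIVE_SECONDS LOOSEN_PERCENT -> integer seconds
test_timeout() {
  echo $(( $1 * (100 + $2) / 100 ))
}

# fan_out RUNNER DEST...
# Starts one shard per destination, named test-I-of-N, then waits for all
# of them. Prints the number of failed shards; returns nonzero if any failed.
fan_out() {
  local runner=$1; shift
  local total=$# i=0 failed=0 pids="" dest pid
  for dest in "$@"; do
    i=$((i + 1))
    # Runner output goes to stderr so stdout carries only the failure count.
    "$runner" "test-${i}-of-${total}" "$dest" >&2 &
    pids="$pids $!"
  done
  # Wait for every shard; one failure does not cancel its siblings.
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"
  [ "$failed" -eq 0 ]
}
```

With a 3600-second archive timeout, `test_timeout 3600 25` yields 4500 seconds for the matching test shard.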
Shard Ownership, Naming, and Reruns
Give every child job a stable CI_SHARD_ID that is independent of the Git branch name, because branch names with slashes and emoji break naive shell globbing. Encode the index and count in that ID so operators can read logs without opening YAML: test-3-of-12 beats shard-uuid in pager scripts. When someone clicks “re-run failed tests only,” have your orchestration map only the failed test identifiers into -only-testing: arguments while keeping the same DEVELOPER_DIR and the same resultBundlePath naming family. That keeps a second, unrelated xcresult with an almost identical name from colliding in your storage bucket. If you gate merges on “all green,” document whether reruns may substitute a new shard label; ambiguity here is what turns a flake into a policy incident.
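The naming and rerun rules above fit in three small functions. This is a sketch: the `.rerun-N` suffix is an assumed convention, and the input file is assumed to hold one failed test identifier per line in the Target/Class/method form that -only-testing accepts; extracting those identifiers from the previous xcresult is left to your tooling.

```shell
#!/usr/bin/env bash
# shard_id INDEX COUNT -> e.g. test-3-of-12 (never derived from the branch)
shard_id() {
  printf 'test-%d-of-%d\n' "$1" "$2"
}

# rerun_bundle_path RESULTS_DIR SHARD_ID ATTEMPT
# Keeps reruns in the first attempt's name family (assumed .rerun-N suffix).
rerun_bundle_path() {
  printf '%s/run-%s.rerun-%d.xcresult\n' "$1" "$2" "$3"
}

# only_testing_args FILE -> one -only-testing:ID argument per line.
only_testing_args() {
  local id
  # "|| [ -n ... ]" also handles a final line without a trailing newline.
  while IFS= read -r id || [ -n "$id" ]; do
    case "$id" in ''|\#*) continue ;; esac   # skip blanks and comments
    printf '%s\n' "-only-testing:${id}"
  done < "$1"
}
```

A rerun job would append the printed arguments to the same xcodebuild command and point -resultBundlePath at `rerun_bundle_path "$CI_RESULTS" "$CI_SHARD_ID" 1`.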
For mixed pipelines where unit tests finish in minutes and UI tests stretch past lunch, do not let slow shards block fast feedback: publish partial xcresult uploads to a staging prefix, then promote the set to “release” only when the last shard passes—your dashboard can still show wall-clock progress from partial bundles without conflating incomplete runs with a green mainline. This pattern pairs naturally with the isolation practices in the DerivedData article, because every shard can delete its DerivedData root independently during cleanup while leaving sibling shards untouched on the same NVMe.
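The staging-then-promote step can be sketched with local directories standing in for the two object-store prefixes; with a real bucket you would swap `mv` for your storage CLI. It assumes a hypothetical convention in which each shard writes a `run-test-I-of-N.passed` marker next to its bundle only when it passes.

```shell
#!/usr/bin/env bash
# promote_if_complete STAGING_DIR RELEASE_DIR COUNT
# Promotes nothing until every shard has both a bundle and a .passed marker.
promote_if_complete() {
  local staging=$1 release=$2 count=$3 i name missing=0

  i=1
  while [ "$i" -le "$count" ]; do
    name="run-test-${i}-of-${count}"
    if [ ! -d "${staging}/${name}.xcresult" ] || [ ! -e "${staging}/${name}.passed" ]; then
      echo "not ready: ${name}" >&2
      missing=$((missing + 1))
    fi
    i=$((i + 1))
  done
  [ "$missing" -eq 0 ] || return 1

  # Every shard is present and green: move the set to the release prefix.
  mkdir -p "$release" || return 1
  i=1
  while [ "$i" -le "$count" ]; do
    name="run-test-${i}-of-${count}"
    mv "${staging}/${name}.xcresult" "${release}/" || return 1
    i=$((i + 1))
  done
}
```

Dashboards can keep reading partial progress from the staging directory while the release directory only ever holds complete, green sets.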
xcresult Bundles, Merging, and What to Store
Each xcresult is a self-contained test report—keep one per shard rather than trying to “merge on disk” in ways Xcode does not guarantee across versions. Downstream, many teams export JUnit or JSON in CI to feed quality dashboards, while also archiving the raw xcresult for interactive triage. Align retention with the same NVMe policy you use for dSYM: see dSYM and crash symbolication for retention matrices when tests and builds share a host.
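Keeping one bundle per shard also keeps upload simple. An .xcresult is a directory bundle, so a sketch can archive it as-is under its existing stable name and hand the archive to whatever upload command your object store uses (not shown here; that CLI is left as an assumption).

```shell
#!/usr/bin/env bash
# package_bundle BUNDLE_PATH OUT_DIR -> prints the archive path
# Tars one .xcresult directory unchanged (no merging) as NAME.xcresult.tar.gz.
package_bundle() {
  local bundle=${1%/} out=$2 base
  if [ ! -d "$bundle" ]; then
    echo "error: ${bundle} is not a bundle directory" >&2
    return 1
  fi
  base=$(basename "$bundle")
  mkdir -p "$out" || return 1
  # -C keeps paths inside the archive relative to the bundle's parent.
  tar -czf "${out}/${base}.tar.gz" -C "$(dirname "$bundle")" "$base" || return 1
  printf '%s\n' "${out}/${base}.tar.gz"
}
```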
Flake Triage Matrix for XCTest on Bare Metal
| Pattern | Likely root cause | First action |
|---|---|---|
| 1-in-6 failures; tests touch UI + animations | Timeouts too tight for Simulator render variance | Stabilize animations; raise timeouts and capture screen recordings as artifacts |
| Entire class fails in one shard | Shard-specific env or missing data seed | Assert env parity; mount fixtures read-only from a shared volume |
| All shards slower starting one Monday | OS patch + Simulator runtime change | Re-baseline p95; pin Xcode and runtime versions explicitly |
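For the “assert env parity” action, one option is to have each shard dump `env | sort` to a file and diff two dumps while ignoring variables that are expected to differ. Both the dump convention and the ignore list below are assumptions to adapt; multi-line environment values would need a different format.

```shell
#!/usr/bin/env bash
# env_parity FILE_A FILE_B
# Diffs two environment dumps (one KEY=VALUE per line), ignoring variables
# that legitimately differ per shard. Returns 0 when the rest matches.
env_parity() {
  local ignore='^(CI_SHARD_ID|HOSTNAME|PWD|OLDPWD|SHLVL|_)='
  diff <(grep -Ev "$ignore" "$1" | sort) <(grep -Ev "$ignore" "$2" | sort)
}
```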
NVMe, 1–2 TB Options, and “Why Not Unlimited CI?”
Test lanes multiply transient I/O faster than archive-only lanes. With 1 TB or 2 TB configurations on a leased Mac, schedule weekly sweeps: delete xcresult bundles older than 7 days for green branches, keep 30 days for default branches, and keep 90 days for tags that correspond to App Store Connect submissions; your compliance team can tighten those windows. When signing flakiness mingles with test flakiness, re-check the signing host guidance in keychain and provisioning so you do not misread XCTest output.
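The weekly sweep can be expressed with `find -mtime`, which works on both macOS and Linux. The `green`, `default`, and `tags` subdirectories are a hypothetical layout; map them to however your artifact root is organized.

```shell
#!/usr/bin/env bash
# sweep_xcresults DIR DAYS
# Removes *.xcresult bundles directly under DIR last modified more than DAYS
# whole days ago (find's -mtime +N rounding). Prints each path it deletes.
sweep_xcresults() {
  local dir=$1 days=$2
  [ -d "$dir" ] || return 0
  find "$dir" -mindepth 1 -maxdepth 1 -type d -name '*.xcresult' \
    -mtime +"$days" -print -exec rm -rf {} +
}

# weekly_sweep ROOT  (assumed layout: ROOT/green, ROOT/default, ROOT/tags)
weekly_sweep() {
  local root=$1
  sweep_xcresults "$root/green" 7
  sweep_xcresults "$root/default" 30
  sweep_xcresults "$root/tags" 90
}
```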
Related MacXCode Guides
Connect this runbook to Simulator test basics, the help pages on access patterns, and node selection when you split test and archive fleets.
FAQ: Test Plans in Headless Environments
| Question | Practical answer |
|---|---|
| Do I need VNC for test debugging? | Rarely for automation: rely on artifacts, and keep VNC as a break-glass option for UI issues. |
| How many shards per Mac? | Start with CPU cores / 3 for UI tests; raise only after p95 is stable for a week. |
| Should I mix unit + UI in one plan? | OK for small apps; for large repos split plans by layer to shorten the critical path to signal. |
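The “CPU cores / 3” starting point from the table is simple integer arithmetic. This sketch reads the online core count with `getconf _NPROCESSORS_ONLN`, which works on macOS and Linux; on Apple Silicon that count includes efficiency cores, so pass a number explicitly if you prefer to count only performance cores.

```shell
#!/usr/bin/env bash
# starting_ui_shards [CORES] -> CORES / 3, never below 1.
# Without an argument, uses the online CPU count reported by getconf.
starting_ui_shards() {
  local cores=${1:-$(getconf _NPROCESSORS_ONLN)}
  local shards=$(( cores / 3 ))
  [ "$shards" -ge 1 ] || shards=1
  echo "$shards"
}
```

On a 12-core host this yields 4 UI shards, the value to hold until p95 has stayed stable for a week.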
Why Mac mini M4 Bare Metal Fits XCTest-Heavy CI
Mac mini M4 nodes on MacXCode give deterministic CoreSimulator performance and NVMe that does not sit behind a noisy hypervisor—exactly the surface XCTest p95 metrics care about. With regions across HK · JP · KR · SG · US, you can colocate test builders next to archive hosts while keeping the same SSH workflows, add 1–2 TB for artifact-heavy orgs, and elastically add hosts from pricing when your retry budgets prove you are I/O or CPU bound—not misconfigured test plans.