Troubleshooting
What you will find here
Real failure modes observed in current BNNR runs, with reproducible checks and fixes.
When to use this page
Use this when a command fails, a run hangs, dashboard is empty, or export output is unexpected.
1) Training finishes but process does not exit
Cause:
- run started with
--with-dashboard(default), so live server stays active.
Fix:
- stop with
Ctrl+C, or - run one-shot with
--without-dashboard.
2) Dashboard backend dependencies are missing
Cause:
- dashboard extras not installed.
Fix:
python3 -m pip install -e ".[dashboard]"3) --data-path is required ...
Cause:
- missing path for dataset types that require external structure.
Applies to:
imagefolder
Fix: pass --data-path.
4) Dashboard shows zero runs
Cause:
- wrong
--run-dir, or run directory missingevents.jsonl.
Fix:
- use a parent folder containing
run_*directories, or - point directly at a run directory that has
events.jsonl.
6) (Historical) Python 3.9 and union types in CLI/FastAPI
Supported releases (0.1.1+) require Python >=3.10; this scenario does not apply to current wheels.
If you run an older checkout on Python 3.9, you could see unsupported operand type(s) for | from runtime-evaluated X | None annotations. Mitigations were: compatible annotations in introspected paths, or eval-type-backport for python < 3.10 (removed when 3.9 support was dropped).
7) CI/test import error: ModuleNotFoundError: No module named 'httpx'
Cause:
- async dashboard tests require
httpx.
Fix:
- include
httpxin test/dev dependencies.
8) pip install -e ".[dashboard]" or python -m build fails in restricted environments
Cause:
- isolated build environment cannot fetch build backend packages (for example
hatchling) due network restrictions.
Fix:
- in restricted/offline environments, use prepared build env and
python -m build --no-isolation, - in GitHub CI (network-enabled), standard
python -m buildshould work.
9) CUDA appears unavailable
Check:
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"Fix:
- install CUDA-compatible PyTorch build for your OS/driver.
10) Augmentation instability on grayscale datasets
Symptom:
- shape/broadcast errors with aggressive presets on grayscale data.
Fix:
- start with RGB datasets for preset stress tests,
- keep grayscale smoke runs minimal and conservative.
11) python3 -m venv fails with ensurepip is not available
Cause:
- Python venv support package is missing on the host OS.
Fix:
- install system venv package (for example
python3.12-venvon Ubuntu), - recreate the virtual environment and continue quickstart steps.
12) QR code is visible but dashboard does not open on phone
Cause:
- phone is on a different network,
- local firewall blocks dashboard port,
- router/client isolation blocks peer-to-peer traffic.
Fix:
- confirm phone and machine are on the same Wi-Fi,
- open Network URL manually on phone (not only QR scan),
- test with explicit port, e.g.
--dashboard-port 8080, - allow Python/server traffic for that port in firewall settings.
13) Loading BNNR checkpoints for inference
BNNR checkpoints include RNG state for deterministic resume. When loading
for inference with PyTorch >= 2.6, pass weights_only=False:
import torch
ckpt = torch.load(
"checkpoints/iter_1_augname.pt",
map_location="cpu",
weights_only=False,
)
model.load_state_dict(ckpt["model_state"])
model.eval()Checkpoint keys: model_state, iteration, augmentation_name, metrics,
config_snapshot, rng_state (safe to ignore for inference).
14) Choosing XAI target layers for SimpleTorchAdapter
By default, SimpleTorchAdapter picks the last Conv2d layer for XAI.
To override, pass target_layers explicitly:
# Example: EfficientNet-B0
target_layer = model.features[-1][0] # last MBConv block
adapter = SimpleTorchAdapter(
model=model,
criterion=criterion,
optimizer=optimizer,
target_layers=[target_layer],
)Common choices:
- ResNet:
model.layer4[-1] - EfficientNet:
model.features[-1][0] - ViT: last attention block (may require custom wrapper)
15) Windows: RuntimeError: An attempt has been made to start a new process
Cause:
- PyTorch
DataLoaderwithnum_workers > 0on Windows requiresif __name__ == "__main__":guard.
Fix:
- wrap your training script entry point:
if __name__ == "__main__":
main()- or set
num_workers=0in your DataLoader.
16) Multi-label: task: multilabel in YAML but training still looks single-label
Cause:
bnnr trainpreset pipelines (build_*_pipelineinpipelines.py) always useCrossEntropyLossand single-label targets, regardless oftaskin the config file.
Fix:
- integrate multi-label data with
SimpleTorchAdapter(multilabel=True)andBCEWithLogitsLoss(Golden Path), or runexamples/multilabel/multilabel_demo.py; do not expect--dataset cifar10(or similar) alone to become multi-label.
17) Detection XAI with YOLO / Ultralytics
Symptom:
- Log says detection XAI or probe prediction snapshots were skipped for an Ultralytics backbone.
Cause:
- BNNR needs the high-level
YOLO.predictpath and the same image scaling as training. A rawultralytics.nn.tasks.DetectionModelpassed throughDetectionAdapterdoes not expose that.
Fix:
- Use
UltralyticsDetectionAdapterfrombnnr.detection_adapterfor Ultralytics YOLO training. That adapter implementspredict_detection_dictsso detection XAI (activation / occlusion) and probe snapshots work withxai_enabled=True.