Troubleshooting
"RuntimeError: Event loop is closed"
This happens when an async task is called multiple times and the event loop created by the first call was closed before the next call. evalwire uses a per-thread persistent event loop (since v0.2.2) to prevent this. If you see it:
- Make sure you are on evalwire >= 0.2.2.
- Do not call asyncio.run() inside your task function. Use await directly.
Phoenix connection errors
Symptom: ConnectionRefusedError or httpx.ConnectError when running evalwire upload or evalwire run.
Causes and fixes:
- Phoenix is not running. Start it with docker compose up -d or follow the Phoenix docs.
- The wrong endpoint is set. evalwire uses the PHOENIX_COLLECTOR_ENDPOINT environment variable. Check that it points to your Phoenix instance:
- A firewall or Docker network is blocking the port.
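To tell a stopped server or a wrong endpoint apart from other failures, a small standard-library check can read PHOENIX_COLLECTOR_ENDPOINT and attempt a plain TCP connection to its host and port. This is a diagnostic sketch, not part of evalwire; the fallback URL http://localhost:6006 is an assumption about a default local Phoenix install.

```python
import os
import socket
from urllib.parse import urlparse


def parse_endpoint(url: str) -> tuple[str, int]:
    """Split an endpoint URL into (host, port), using scheme defaults if no port."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return host, port


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    host, port = parse_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


if __name__ == "__main__":
    # Assumption: fall back to a typical local Phoenix address when unset.
    endpoint = os.environ.get("PHOENIX_COLLECTOR_ENDPOINT", "http://localhost:6006")
    status = "reachable" if is_reachable(endpoint) else "NOT reachable"
    print(f"{endpoint}: {status}")
```

If the port is reachable but evalwire still fails, the problem is more likely the URL path or a proxy than a firewall.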
"No CSV path provided"
Either pass --csv data/testset.csv on the command line or add this to evalwire.toml:
Dataset not found during evalwire run
evalwire looks up each experiment directory name as a Phoenix dataset. If the dataset does not exist, the run fails. Make sure you have run evalwire upload first and that the directory name matches the dataset tag exactly (case-sensitive).
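Because the match is exact and case-sensitive, it can help to compare the experiment directory names against the dataset names side by side. The sketch below is a standalone check, not an evalwire API: compare_names and experiment_dir_names are hypothetical helpers, and the list of dataset names is assumed to come from the Phoenix UI or from the tag values in your CSV.

```python
from pathlib import Path


def experiment_dir_names(root: str) -> list[str]:
    """Names of direct subdirectories of root (assumed to be one per experiment)."""
    # Assumption: skip hidden and private folders such as __pycache__.
    return [
        p.name
        for p in Path(root).iterdir()
        if p.is_dir() and not p.name.startswith((".", "_"))
    ]


def compare_names(experiment_dirs: list[str], dataset_names: list[str]) -> dict[str, list[str]]:
    """Classify each directory as an exact match, a case-only mismatch, or missing."""
    datasets = set(dataset_names)
    by_lower = {name.lower(): name for name in datasets}
    report: dict[str, list[str]] = {"ok": [], "case_mismatch": [], "missing": []}
    for name in sorted(experiment_dirs):
        if name in datasets:
            report["ok"].append(name)
        elif name.lower() in by_lower:
            report["case_mismatch"].append(f"{name} (dataset is {by_lower[name.lower()]!r})")
        else:
            report["missing"].append(name)
    return report


# Illustrative input: replace with your own directory names and dataset names.
print(compare_names(["qa", "Routing", "summaries"], ["qa", "routing"]))
```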
Run a subset to narrow down which experiment is failing:
Import errors for optional extras
ImportError: langgraph is required
Install the langgraph extra:
ImportError: langchain-core is required to use make_llm_judge_evaluator
Install the llm-judge extra:
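To see which of these optional dependencies the current interpreter can actually import (useful when several virtual environments are involved), a standard-library check with importlib.util.find_spec is enough. The function name missing_optional is hypothetical; langgraph and langchain_core are the usual import names for the langgraph and langchain-core distributions.

```python
import importlib.util

# Distribution name -> import name, for the extras named in the errors above.
OPTIONAL_MODULES = {
    "langgraph": "langgraph",  # needed by the langgraph extra
    "langchain-core": "langchain_core",  # needed by the llm-judge extra
}


def missing_optional(modules: dict[str, str] = OPTIONAL_MODULES) -> list[str]:
    """Return the distributions whose top-level module cannot be found."""
    return [dist for dist, mod in modules.items() if importlib.util.find_spec(mod) is None]


print("missing:", missing_optional() or "none")
```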
CSV formatting issues
Rows appear in the wrong dataset or not at all.
- Check that the tag column name matches tag_column (default: "tags"). Use --tag-column or set it in evalwire.toml.
- Check that the delimiter matches delimiter (default: "|"). If your values contain pipes for other reasons, choose a different delimiter and set --delimiter.
- Rows with an empty tag cell are skipped silently.
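A quick way to preview how rows will be grouped is to parse the CSV yourself and list the rows whose tag cell is empty, since those are dropped without a warning. This sketch assumes the delimiter separates multiple tags inside the tag cell of an otherwise comma-separated file; group_rows_by_tag is a hypothetical helper, not evalwire's loader.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_tag(csv_text: str, tag_column: str = "tags", delimiter: str = "|"):
    """Group CSV rows by tag and report row numbers with an empty tag cell."""
    groups: dict[str, list[dict]] = defaultdict(list)
    skipped: list[int] = []
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or tag_column not in reader.fieldnames:
        raise KeyError(f"tag column {tag_column!r} not in header {reader.fieldnames}")
    # Row numbers assume no multi-line quoted cells; line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        tags = [t.strip() for t in (row[tag_column] or "").split(delimiter) if t.strip()]
        if not tags:
            skipped.append(line_no)  # these rows would be skipped silently
            continue
        for tag in tags:
            groups[tag].append(row)
    return dict(groups), skipped


sample = "input,expected_output,tags\nhello,HELLO,qa|smoke\nbye,BYE,\nping,pong,routing\n"
groups, skipped = group_rows_by_tag(sample)
print({tag: len(rows) for tag, rows in groups.items()}, "skipped lines:", skipped)
```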
expected_output is not parsed correctly.
The _parse_expected helper uses ast.literal_eval on string values. This means numeric-looking strings like "0" or "1.5" are converted to Python numeric types (0, 1.5), which can cause type mismatches in string-based evaluators. Where possible, use expected outputs that are not valid Python literals, for example non-numeric strings.
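The sketch below reproduces the conversion described above so you can check how a given cell will come back, and shows one way to make an evaluator tolerant of it by comparing both sides as strings. parse_expected_sketch is a hypothetical approximation (literal_eval with a fallback to the raw string), not evalwire's actual _parse_expected, and exact_match is an illustrative evaluator.

```python
import ast


def parse_expected_sketch(value):
    """Hypothetical approximation: literal_eval strings, else keep them unchanged."""
    if isinstance(value, str):
        try:
            return ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value
    return value


def exact_match(output, expected) -> bool:
    """Compare as strings so that 0 and "0" are treated as equal."""
    return str(output).strip() == str(expected).strip()


for raw in ["0", "1.5", "['a', 'b']", "yes", "0 apples"]:
    parsed = parse_expected_sketch(raw)
    print(f"{raw!r:>12} -> {parsed!r} ({type(parsed).__name__})")
```

String comparison does not undo every conversion: "1.50" parses to 1.5, whose string form is "1.5", so values with formatting that matters are still better kept non-numeric.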
"At least one evaluator is required"
Phoenix raises this when run_experiment is called with an empty evaluator list. Make sure each experiment directory contains at least one .py file (other than task.py) that defines a callable matching the filename.
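To find which directory is missing an evaluator, you can load each .py file other than task.py and check that it defines a callable named after the file. This is a standalone sketch built on importlib, not an evalwire command; find_evaluators is a hypothetical name, and note that loading a file executes its top-level code.

```python
import importlib.util
from pathlib import Path


def find_evaluators(experiment_dir: str) -> tuple[list[str], list[str]]:
    """Return (valid, invalid) evaluator file stems for one experiment directory."""
    valid: list[str] = []
    invalid: list[str] = []
    for path in sorted(Path(experiment_dir).glob("*.py")):
        if path.name == "task.py":
            continue  # the task is not an evaluator
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the file's top-level code
        # A valid evaluator file defines a callable whose name equals the file stem.
        if callable(getattr(module, path.stem, None)):
            valid.append(path.stem)
        else:
            invalid.append(path.stem)
    return valid, invalid
```

An empty valid list for a directory corresponds to the empty evaluator list that triggers this error.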
Experiment name conflicts
Phoenix experiment names include a timestamp, so duplicate names are unlikely. Re-running the same experiment preserves the old results, and the new run appears alongside them in the UI.