Configuration Reference¶
evalwire can be driven entirely from a evalwire.toml file at the root of your project. CLI flags always take precedence over config file values, which take precedence over hardcoded defaults.
Precedence¶
Example file¶
[dataset]
csv_path = "data/testset.csv"
on_exist = "skip"
input_keys = ["user_query"]
output_keys = ["expected_output"]
tag_column = "tags"
delimiter = "|"
[experiments]
dir = "experiments"
prefix = "eval"
concurrency = 4
[dataset] section¶
Used by evalwire upload.
| Key | Type | Default | Description |
|---|---|---|---|
csv_path |
string | none (required) | Path to the CSV testset file. |
on_exist |
"skip" / "overwrite" / "append" |
"skip" |
How to handle a dataset that already exists in Phoenix. |
input_keys |
list of strings | ["user_query"] |
CSV column names treated as example inputs. |
output_keys |
list of strings | ["expected_output"] |
CSV column names treated as expected outputs. |
tag_column |
string | "tags" |
CSV column used to split rows into separate datasets. |
delimiter |
string | "\|" |
Character used to assign a row to multiple datasets. |
on_exist modes¶
| Value | Behaviour |
|---|---|
skip |
Leave the existing dataset untouched. |
overwrite |
Delete the existing dataset and re-create it from the CSV. |
append |
Add the new rows to the existing dataset. |
[experiments] section¶
Used by evalwire run.
| Key | Type | Default | Description |
|---|---|---|---|
dir |
string | "experiments" |
Path to the experiments directory. |
prefix |
string | "eval" |
Prefix prepended to every experiment name in Phoenix. |
concurrency |
integer | 1 |
Number of experiments to run in parallel. |
[phoenix] section¶
Reserved for future use. Currently read by get_phoenix_config() but not consumed by the CLI.
CLI flag reference¶
evalwire upload¶
| Flag | Config key | Default |
|---|---|---|
--csv PATH |
dataset.csv_path |
none |
--on-exist MODE |
dataset.on_exist |
skip |
--input-keys COLS |
dataset.input_keys |
user_query |
--output-keys COLS |
dataset.output_keys |
expected_output |
--tag-column COL |
dataset.tag_column |
tags |
--delimiter CHAR |
dataset.delimiter |
\| |
--config PATH |
n/a | ./evalwire.toml |
evalwire run¶
| Flag | Config key | Default |
|---|---|---|
--experiments PATH |
experiments.dir |
experiments |
--name NAME |
n/a | all experiments |
--prefix PREFIX |
experiments.prefix |
eval |
--concurrency N |
experiments.concurrency |
1 |
--dry-run [N] |
n/a | off |
--config PATH |
n/a | ./evalwire.toml |
--name is repeatable: evalwire run --name es_search --name source_router.
--dry-run runs the task but does not upload results to Phoenix. Optionally accepts a number to limit the run to the first N examples per dataset.