evalwire.uploader¶
DatasetUploader reads a CSV testset and uploads it to Arize Phoenix as one named dataset per unique tag value. It handles three conflict modes (skip, overwrite, append) and supports multi-tag rows via a configurable delimiter.
Basic usage¶
from phoenix.client import Client
from evalwire.uploader import DatasetUploader

client = Client()
uploader = DatasetUploader(
    csv_path="data/testset.csv",
    phoenix_client=client,
)
datasets = uploader.upload(on_exist="skip")
print(datasets)  # {"es_search": <Dataset>, "source_router": <Dataset>}
CSV format¶
The CSV must contain at least a tag column, one input column, and one expected-output column:
user_query,expected_output,tags
"find cycling paths","url-a | url-b","es_search | source_router"
"find parks","url-c","es_search"
Pipe-delimited values in any column are split into lists. A row with tags = "es_search | source_router" is added to both datasets.
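The splitting rule can be sketched with the standard library alone. This illustrates the behaviour described above and is not evalwire's actual implementation: the helper names `split_cell` and `group_rows_by_tag` are hypothetical, and the `|` delimiter with surrounding whitespace stripped is an assumption based on the sample CSV.

```python
import csv
import io
from collections import defaultdict

SAMPLE_CSV = '''user_query,expected_output,tags
"find cycling paths","url-a | url-b","es_search | source_router"
"find parks","url-c","es_search"
'''


def split_cell(cell, delimiter="|"):
    """Split a multi-value cell and strip whitespace around each part."""
    return [part.strip() for part in cell.split(delimiter) if part.strip()]


def group_rows_by_tag(csv_text, tag_column="tags", delimiter="|"):
    """Hypothetical helper: map each tag to the rows that carry it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A row with several tags is appended to every matching group.
        for tag in split_cell(row[tag_column], delimiter):
            groups[tag].append(row)
    return dict(groups)


groups = group_rows_by_tag(SAMPLE_CSV)
print({tag: len(rows) for tag, rows in groups.items()})
# {'es_search': 2, 'source_router': 1}
```

The first row appears in both groups, which matches the "added to both datasets" rule stated above.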
Conflict modes¶
| on_exist | Behaviour |
|---|---|
| "skip" | Do nothing if the dataset already exists. |
| "overwrite" | Delete the existing dataset and re-create it. |
| "append" | Call add_examples_to_dataset on the existing dataset. If not found, create it. |
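The three modes amount to a small dispatch on whether the dataset already exists. The sketch below models that logic against an in-memory stand-in; `InMemoryStore` and `upload_one` are hypothetical names for illustration and do not reflect the Phoenix client API or evalwire's internals.

```python
class InMemoryStore:
    """Stand-in for a dataset backend; NOT the Phoenix client API."""

    def __init__(self):
        self.datasets = {}

    def get(self, name):
        # Assumed to mirror a lookup that raises ValueError when missing.
        if name not in self.datasets:
            raise ValueError(f"dataset {name!r} not found")
        return self.datasets[name]

    def create(self, name, examples):
        self.datasets[name] = list(examples)
        return self.datasets[name]

    def delete(self, name):
        del self.datasets[name]


def upload_one(store, name, examples, on_exist="skip"):
    """Apply one conflict mode to a single named dataset."""
    if on_exist not in {"skip", "overwrite", "append"}:
        raise ValueError(f"unknown on_exist mode: {on_exist!r}")
    try:
        existing = store.get(name)
    except ValueError:
        # Not found: every mode falls back to creating the dataset.
        return store.create(name, examples)
    if on_exist == "skip":
        return existing
    if on_exist == "overwrite":
        store.delete(name)
        return store.create(name, examples)
    existing.extend(examples)  # "append"
    return existing


store = InMemoryStore()
upload_one(store, "es_search", ["a"])
after_skip = list(upload_one(store, "es_search", ["b"], on_exist="skip"))
after_append = list(upload_one(store, "es_search", ["b"], on_exist="append"))
after_overwrite = list(upload_one(store, "es_search", ["c"], on_exist="overwrite"))
print(after_skip, after_append, after_overwrite)
# ['a'] ['a', 'b'] ['c']
```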
Pitfalls¶
- Phoenix raises ValueError (not a Phoenix-specific exception) when get_dataset is called for a non-existent dataset. evalwire catches this and creates the dataset instead.
- There is no official delete method in the Phoenix Python client. evalwire calls the REST endpoint DELETE /v1/datasets/{id} directly for the overwrite mode.
- Creating a dataset with a name that already exists returns a 409 Conflict error, not a new version. Use on_exist="overwrite" to replace it.
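For reference, a request against the DELETE /v1/datasets/{id} endpoint can be constructed with the standard library as sketched below. The base URL `http://localhost:6006`, the dataset id `example-id`, and the helper name `build_delete_request` are assumptions for illustration; this is not how evalwire itself issues the call, and the request is only built here, not sent.

```python
import urllib.parse
import urllib.request


def build_delete_request(base_url, dataset_id):
    """Hypothetical helper: build (but do not send) a dataset DELETE request."""
    # Percent-encode the id so it is safe to place in the URL path.
    safe_id = urllib.parse.quote(dataset_id, safe="")
    url = f"{base_url.rstrip('/')}/v1/datasets/{safe_id}"
    return urllib.request.Request(url, method="DELETE")


# Assumed values: a local Phoenix server and a placeholder dataset id.
req = build_delete_request("http://localhost:6006", "example-id")
print(req.get_method(), req.full_url)
# DELETE http://localhost:6006/v1/datasets/example-id
```

Sending it would require passing `req` to `urllib.request.urlopen` against a running server, plus whatever authentication headers the deployment expects.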
See also¶
- Configuration Reference for evalwire.toml keys
- CLI Reference for evalwire upload
evalwire.uploader¶
DatasetUploader — uploads a CSV testset to Arize Phoenix as named datasets.
DatasetUploader¶
Upload a human-curated CSV testset to Arize Phoenix.
Each unique value found in tag_column becomes a separate Phoenix
dataset. A row tagged with multiple pipe-delimited values is added to
each corresponding dataset.
Parameters¶
csv_path:
Path to the CSV file.
phoenix_client:
An initialised phoenix.client.Client instance.
input_keys:
Column names that form the input of each dataset example.
output_keys:
Column names that form the output of each dataset example.
tag_column:
Column used to split rows into separate datasets.
delimiter:
Delimiter used to split multi-value cells (tags and output columns).
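To make the roles of input_keys, output_keys, and delimiter concrete, the sketch below turns one CSV row into an input mapping and an output mapping. The function name `row_to_example`, the returned shape, and the `|` default are assumptions for illustration; the actual structure evalwire sends to Phoenix may differ. Following the delimiter description above, only output columns are split here.

```python
def row_to_example(row, input_keys, output_keys, delimiter="|"):
    """Hypothetical helper: select input/output columns from one CSV row."""
    # Input columns are passed through unchanged.
    inputs = {key: row[key] for key in input_keys}
    # Output columns are split into lists of whitespace-stripped values.
    outputs = {
        key: [part.strip() for part in row[key].split(delimiter)]
        for key in output_keys
    }
    return inputs, outputs


row = {
    "user_query": "find cycling paths",
    "expected_output": "url-a | url-b",
    "tags": "es_search | source_router",
}
inputs, outputs = row_to_example(row, ["user_query"], ["expected_output"])
print(inputs)   # {'user_query': 'find cycling paths'}
print(outputs)  # {'expected_output': ['url-a', 'url-b']}
```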
Source code in src/evalwire/uploader.py
upload(on_exist='skip')¶
Upload one Phoenix dataset per unique tag value found in the CSV.
Parameters¶
on_exist:
How to handle a dataset that already exists in Phoenix:
- "skip" — leave the existing dataset untouched (default).
- "overwrite" — delete and re-create.
- "append" — add new examples to the existing dataset.
Returns¶
dict[str, Any]:
Mapping of tag name → created/updated dataset object.