Analytics
Read Bilbis spend, quality, engine usage, and monthly budget alerts.
What this means
Analytics helps you understand how Bilbis is being used and what it costs. It combines spend, completed pipeline count, quality scores, engine breakdowns, and budget alerts in one place.
Use Analytics to answer questions like:
- How much did Bilbis spend this month?
- Which engines are used most often?
- Which repositories or engines cost the most?
- Are quality scores improving or dropping?
- Should we add or adjust a monthly budget alert?
Before you start
You need access to the organization you want to review.
Charts are built from completed pipelines, so the organization needs at least one completed pipeline before they show useful data. In-flight pipelines are not yet included in the Analytics cost rows.
Choose a time window
The Window selector in the top-right corner sets the time range for most trend cards. The available windows are:
| Window | Use it for |
|---|---|
| Last 7 days | Recent changes, debugging a spike, checking this week's activity. |
| Last 30 days | Normal monthly operating review. |
| Last 90 days | Longer trends and engine comparisons. |
The This month KPI and budget alerts use the current calendar month, not the selected window.
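The difference between the selected window and the calendar month can be sketched in a few lines. This is an illustration only, not how Bilbis computes these values: the record shape `(completed_on, cost_usd)`, the helper names, and the assumption that a window includes today are all hypothetical.

```python
from datetime import date, timedelta

# Window labels come from the table above; the day counts follow from the labels.
WINDOW_DAYS = {"Last 7 days": 7, "Last 30 days": 30, "Last 90 days": 90}

def window_start(today: date, window: str) -> date:
    """First day of a rolling window that ends today (assumed inclusive)."""
    return today - timedelta(days=WINDOW_DAYS[window] - 1)

def month_start(today: date) -> date:
    """First day of the current calendar month."""
    return today.replace(day=1)

def spend_between(runs, start: date, end: date) -> float:
    """Sum the cost of completed runs whose completion day falls in [start, end]."""
    return sum(cost for day, cost in runs if start <= day <= end)

# Hypothetical completed runs: (completed_on, cost_usd).
today = date(2024, 5, 20)
runs = [
    (date(2024, 4, 28), 12.0),
    (date(2024, 5, 2), 8.0),
    (date(2024, 5, 18), 5.0),
]

window_spend = spend_between(runs, window_start(today, "Last 30 days"), today)
month_spend = spend_between(runs, month_start(today), today)
print(window_spend, month_spend)  # 25.0 13.0
```

In this example the 30-day window reaches back into April, so it reports more spend than the This month figure that budget alerts use.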
KPI reference
| KPI | What it means | Notes |
|---|---|---|
| This month | Sum of completed pipeline costs in the current calendar month. | Used by budget alerts. |
| Spend - last window | Total cost in the selected 7-, 30-, or 90-day window. | Aggregated across all engines. |
| Pipelines - last window | Completed pipeline runs in the selected window. | Based on completed runs in engine stats. |
| Avg quality | Weighted average quality score on a 0-100 scale. | Weighted by pipeline count so one run does not dominate the score. |
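To see why weighting by pipeline count matters for Avg quality, here is a small sketch. It assumes scores arrive as groups of `(average_quality, pipeline_count)`; the grouping unit and the function name are hypothetical, not taken from Bilbis.

```python
def weighted_avg_quality(groups):
    """Average quality weighted by pipeline count.

    groups: list of (avg_quality, pipeline_count) pairs (hypothetical shape).
    Returns None when there are no completed pipelines.
    """
    total_runs = sum(count for _, count in groups)
    if total_runs == 0:
        return None
    return sum(quality * count for quality, count in groups) / total_runs

# One group with a single high-scoring run, one group with nine lower-scoring runs.
groups = [(90.0, 1), (60.0, 9)]

weighted = weighted_avg_quality(groups)
unweighted = sum(quality for quality, _ in groups) / len(groups)
print(weighted, unweighted)  # 63.0 75.0
```

The unweighted mean lets the single run pull the score up to 75, while the weighted mean stays close to the score of the nine runs that make up most of the activity.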
Charts and tables
| Area | What it shows | How to read it |
|---|---|---|
| Cost trend | Daily LLM spend stacked by engine. | Look for sudden spikes and which engine caused them. |
| Quality trend | Average quality score per day on a 0-100 scale. | Use it as a directional signal, not a replacement for code review. |
| Quality by engine | Average quality score per engine across completed pipelines. | Compare engines only when each has enough runs to be meaningful. |
| Engine x repo breakdown | Engine, repo, run count, average cost, total cost, average quality, and success rate. | Use it to find expensive repos, low-success combinations, or engines worth changing. |
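The columns of the engine x repo breakdown can be reproduced from raw run records with a simple group-by. The sketch below is an assumption about how such a table could be derived; the record fields (`engine`, `repo`, `cost`, `quality`, `ok`) and the engine and repo names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical completed-run records.
runs = [
    {"engine": "engine-a", "repo": "api", "cost": 2.0, "quality": 80, "ok": True},
    {"engine": "engine-a", "repo": "api", "cost": 4.0, "quality": 70, "ok": False},
    {"engine": "engine-b", "repo": "web", "cost": 1.0, "quality": 90, "ok": True},
]

def breakdown(runs):
    """Group runs by (engine, repo) and compute the breakdown columns."""
    groups = defaultdict(list)
    for run in runs:
        groups[(run["engine"], run["repo"])].append(run)

    rows = {}
    for key, items in groups.items():
        n = len(items)
        total = sum(r["cost"] for r in items)
        rows[key] = {
            "runs": n,
            "avg_cost": total / n,
            "total_cost": total,
            "avg_quality": sum(r["quality"] for r in items) / n,
            "success_rate": sum(r["ok"] for r in items) / n,  # True counts as 1
        }
    return rows

rows = breakdown(runs)
print(rows[("engine-a", "api")])
```

A row with a high average cost and a low success rate, like `engine-a` on `api` here, is the kind of combination worth a closer look.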
Budget alerts
Budget alerts send an email notification when spend in the current calendar month reaches or passes a threshold. Each alert triggers at most once per calendar month and resets on the 1st.
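The once-per-month behavior can be pictured as a check against the date the alert last fired. This is a minimal sketch under stated assumptions, not the Bilbis implementation: `should_fire` and its parameters are hypothetical names.

```python
from datetime import date
from typing import Optional

def should_fire(spend: float, threshold: float, enabled: bool,
                last_fired: Optional[date], today: date) -> bool:
    """Return True if an alert would send email today (illustrative logic)."""
    # Disabled alerts never notify; spend must have reached the threshold.
    if not enabled or spend < threshold:
        return False
    # An alert that already fired in this calendar month stays quiet
    # until the month changes.
    already_fired_this_month = (
        last_fired is not None
        and (last_fired.year, last_fired.month) == (today.year, today.month)
    )
    return not already_fired_this_month

print(should_fire(120.0, 100.0, True, None, date(2024, 5, 10)))              # True
print(should_fire(120.0, 100.0, True, date(2024, 5, 3), date(2024, 5, 10)))  # False
```

Because the check compares year and month, an alert that fired on April 30 is eligible again on May 1.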
Field reference
| Field | What it means | Default |
|---|---|---|
| Name | Short label for the alert. It appears in the alerts list and email subject. | Required |
| Monthly threshold (USD) | Dollar amount that triggers the email when current-month spend reaches or passes it. | 100 |
| Notify email | Email address that receives the alert. Leave blank to use the organization's default billing contact. | Optional |
| Enabled | Whether the alert can send email. Disabled alerts stay saved but do not notify anyone. | On |
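The fields and defaults above map naturally onto a small record type. The sketch below is only a way to picture them; the class name, attribute names, and the `recipient` helper are hypothetical and do not describe a Bilbis API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BudgetAlert:
    name: str                              # required, shown in the list and email subject
    monthly_threshold_usd: float = 100.0   # default from the field reference
    notify_email: Optional[str] = None     # blank means "use the billing contact"
    enabled: bool = True                   # alerts are on by default

def recipient(alert: BudgetAlert, billing_contact: str) -> str:
    """Pick the address that receives the alert email."""
    return alert.notify_email or billing_contact

alert = BudgetAlert(name="Team cap")
print(recipient(alert, "billing@example.com"))  # billing@example.com
```

Setting `notify_email` on the alert overrides the billing contact for that alert only.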
Reading alert status
| Status | What it means |
|---|---|
| Disabled | The alert is saved but will not send email. |
| Near threshold | Current monthly spend is at least 80% of the alert threshold. |
| Breached | Current monthly spend has reached or passed the threshold. |
| Last fired | The last date this alert sent an email. |
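The status rules above reduce to two comparisons against the threshold. The following sketch assumes that Disabled takes precedence over the spend-based statuses and that Breached takes precedence over Near threshold; that ordering and the function name are assumptions, not documented behavior.

```python
from typing import Optional

NEAR_RATIO = 0.8  # "at least 80% of the alert threshold"

def alert_status(spend: float, threshold: float, enabled: bool) -> Optional[str]:
    """Return the status label for an alert, or None when no label applies."""
    if not enabled:
        return "Disabled"
    if spend >= threshold:
        return "Breached"
    if spend >= NEAR_RATIO * threshold:
        return "Near threshold"
    return None

print(alert_status(85.0, 100.0, True))   # Near threshold
print(alert_status(100.0, 100.0, True))  # Breached
```

Last fired is not covered here because it is a date shown alongside the alert rather than a state derived from spend.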
Steps
Create a budget alert
- Open Analytics.
- Go to Budget alerts.
- Select Add alert.
- Enter a name.
- Set the monthly threshold in USD.
- Add a notify email, or leave it blank to use the default billing contact.
- Keep Enabled on if the alert should send email.
- Create the alert.
Delete a budget alert
- Open Analytics.
- Go to Budget alerts and find the alert you want to remove.
- Select the delete button.
- Confirm the deletion.
Deleting an alert stops all future notifications for that threshold.
A note on quality scores
Quality scores come from agent self-assessment plus reviewer feedback. They are useful for spotting changes over time, but they are not a substitute for human code review.
Treat quality as a signal. A low score means "inspect this area," not automatically "the code is bad." A high score means the agent self-assessment and reviewer feedback were positive, not that the change is risk-free.
Problems and fixes
| Problem | What to check |
|---|---|
| Analytics is empty | Complete at least one pipeline. Most cards rely on completed runs. |
| Spend does not include a running pipeline | Current Analytics cost rows include completed pipelines only. Check the pipeline detail page for in-progress cost. |
| A budget alert did not send | Confirm the alert is enabled, the threshold was crossed this calendar month, and the notify email or billing contact is correct. |
| A budget alert only sent once | This is expected. Each alert triggers once per month and resets on the 1st. |
| Quality looks noisy | Use a longer window and compare engines only after several completed runs. |
| An engine looks expensive | Check the engine x repo breakdown, then consider a cheaper engine, a smaller task, or a stricter budget cap for future pipelines. |