25 August, 2025

Insights

The first version of evaluation insights is now live! 🎉

With evaluation insights, you can get a detailed view of your team's performance at the criterion level.

You can identify trends or issues that may need immediate attention and break results down by group or individual. This shows which groups or individuals contribute most to each root cause, and whether an issue is driven by specific individuals or is an org-level problem.

We've fixed redirection from reporting to the evaluations table in several places.

Also, we've hidden scores of archived workloads from AutoQA > Evaluation scores.

AutoQA

We've released AutoQA Optimisation! For each criterion, we automatically collect and analyse all the co-pilot and reviewed AutoQA evaluations that were corrected, compare them against the instructions, and suggest new, optimised instructions.

We also backtest the new optimised instructions on past data and estimate how much they could reduce errors.
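The exact backtesting method isn't described here. As a minimal sketch, assume each past evaluation records the reviewer's corrected answer together with AutoQA's answer under both the current and the optimised instructions. The estimated error reduction is then the share of current errors that the optimised instructions eliminate. `PastEvaluation` and `estimate_error_reduction` are hypothetical names and are not part of the product's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PastEvaluation:
    """One historical evaluation (hypothetical shape used only for this sketch)."""
    corrected_answer: str        # the answer the reviewer settled on
    old_instruction_answer: str  # AutoQA's answer with the current instructions
    new_instruction_answer: str  # AutoQA's answer when re-run with the optimised instructions


def estimate_error_reduction(evals: list[PastEvaluation]) -> float:
    """Return the relative drop in errors (0.5 means half as many errors)."""
    old_errors = sum(e.old_instruction_answer != e.corrected_answer for e in evals)
    new_errors = sum(e.new_instruction_answer != e.corrected_answer for e in evals)
    if old_errors == 0:
        # Nothing to improve on; avoid dividing by zero.
        return 0.0
    return (old_errors - new_errors) / old_errors


# Illustrative sample data (invented for this example, not real results).
sample = [
    PastEvaluation("Yes", "No", "Yes"),
    PastEvaluation("No", "Yes", "No"),
    PastEvaluation("Yes", "No", "No"),
    PastEvaluation("No", "No", "No"),
]

print(f"{estimate_error_reduction(sample):.0%}")  # prints "67%"
```

In this sample the current instructions get 3 of 4 answers wrong and the optimised ones get 1 wrong, so the estimate is (3 - 1) / 3, about a 67% reduction.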

Click "Review and apply" to make any edits you want to the new instructions, then either apply them or reject them if you're not happy with the content or the performance.


If you already have "Pending" optimised instructions for a criterion, you will need to either accept or reject them before you receive a new suggestion.

We've improved how we pass context to AutoQA to avoid formatting issues.

Scorecard

We've fixed the scorecards table and form so that "Soft" criteria are no longer included in the Max score and Pass score calculations.

The form also now shows the type of each criterion.
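To illustrate the Max score fix, here is a minimal sketch that assumes each criterion has a maximum rating and a flag marking it as "Soft". Soft criteria are filtered out before the maximum ratings are totalled. `Criterion`, `scoring_criteria`, and `max_score` are hypothetical names. The Pass score formula isn't described here, so it is left out; the sketch only assumes it would start from the same filtered set of criteria.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """A scorecard criterion (hypothetical shape for this sketch)."""
    name: str
    max_rating: int
    soft: bool = False  # assumed flag for a "Soft" criterion


def scoring_criteria(criteria):
    """Keep only criteria that count towards the score (i.e. not Soft)."""
    return [c for c in criteria if not c.soft]


def max_score(criteria):
    """Sum the maximum ratings of the non-Soft criteria."""
    return sum(c.max_rating for c in scoring_criteria(criteria))


scorecard = [
    Criterion("Greeting", 5),
    Criterion("Resolution", 10),
    Criterion("Tone", 5, soft=True),  # Soft: excluded from Max score
]

print(max_score(scorecard))  # prints 15
```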

Misc

In both evaluations and disputes, we now support all the basic markdown functionalities, so you can better format feedback and communication between evaluators and reviewees.

We've updated the "Reviewees" drop-down filter across the app to make it more comprehensive.

Calibration sessions can no longer be created with duplicate names.

On the criterion form, we've also fixed an issue so that ratings always appear in descending order and validation no longer fails.

We now handle the case of multiple open tabs, so that more than one in-progress evaluation can't be created for the same ticket.
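One way to enforce a single in-progress evaluation per ticket is sketched below. It assumes a single process with an in-memory store, where a lock makes the check-and-create step atomic so that two tabs racing to start an evaluation end up with the same one. `EvaluationRegistry` and its methods are hypothetical; a real service would more likely rely on a database uniqueness constraint.

```python
import threading


class EvaluationRegistry:
    """Hypothetical in-memory store allowing one in-progress evaluation per ticket."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_progress = {}  # ticket_id -> evaluation_id

    def start_evaluation(self, ticket_id, evaluation_id):
        """Return the ticket's in-progress evaluation, creating it only if none exists."""
        with self._lock:  # check and create atomically
            existing = self._in_progress.get(ticket_id)
            if existing is not None:
                return existing
            self._in_progress[ticket_id] = evaluation_id
            return evaluation_id

    def complete_evaluation(self, ticket_id):
        """Mark the ticket's evaluation as no longer in progress."""
        with self._lock:
            self._in_progress.pop(ticket_id, None)


registry = EvaluationRegistry()
print(registry.start_evaluation("T-100", "eval-1"))  # tab A prints "eval-1"
print(registry.start_evaluation("T-100", "eval-2"))  # tab B also prints "eval-1"
```

Returning the existing evaluation, rather than raising an error, lets the second tab pick up the first tab's work; rejecting the request outright would be an equally valid design.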
