AI Tutor Coach

Designing Human-Centered AI for Scalable Supervision to Support Invisible Work

I led the design of Tutor Coach, an AI-powered dashboard that automates repetitive supervision tasks for managing 100+ remote tutors.

PLUS (Personalized Learning Squared) is a tutoring platform that combines human and AI tutoring to boost learning gains for middle school students from historically underserved communities. The platform supports over 3,000 students and 500 tutors, completing more than 90,000 hours of tutoring each month.

Instead of replacing human judgment, the tool amplifies human work without losing the human heart.

The problem

Watching lots of Zoom session recordings.

For supervisors, manually reviewing 100+ sessions weekly was exhausting and inefficient, leaving no room for deep insights or proactive coaching.

How might we automate the tedious parts of supervision?

Early research

Ease this burden, but how do we design the AI?

We saw an opportunity to ease this burden by automating low-value tasks.

Before jumping into design, I partnered with researchers and engineers to align on how AI could responsibly support this shift—without compromising human oversight.

We crafted the narrative by understanding AI's uses and limits.

We used confusion matrices, user flows, and capability studies to understand where AI could be used.

We crafted the narrative by keeping humans in the loop.

We ran co-design sessions with end users to align on what they expected AI to help with.

AI design principles

What should we follow to bridge the technical and the human?

After conducting AI evaluations and talking with end users, we settled on the following AI design principles.

Automate detection, not judgment
Facilitated alignment on what should and shouldn't be automated.
Prioritize explainability over complexity
Created scoring matrices (desirability × feasibility × explainability).
Keep humans in charge of outcomes
Drafted ethical constraints to guide feature development.
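
A scoring matrix of this kind can be sketched in a few lines. This is only an illustration: the 1 to 5 rating scale, the choice to multiply the three axes, the `FeatureIdea` and `rank` names, and the ratings themselves are all assumptions, not the values the team used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureIdea:
    """A candidate AI feature rated 1 (low) to 5 (high) on each axis."""
    name: str
    desirability: int
    feasibility: int
    explainability: int

    @property
    def score(self) -> int:
        # Multiplying (rather than summing) lets one weak axis sink an idea.
        return self.desirability * self.feasibility * self.explainability


def rank(ideas: list[FeatureIdea]) -> list[FeatureIdea]:
    """Return ideas sorted from highest to lowest combined score."""
    return sorted(ideas, key=lambda idea: idea.score, reverse=True)


# Hypothetical ratings, for illustration only.
ideas = [
    FeatureIdea("Emotion scoring from audio", desirability=4, feasibility=1, explainability=1),
    FeatureIdea("No-show pattern detection", desirability=4, feasibility=5, explainability=5),
]

for idea in rank(ideas):
    print(f"{idea.name}: {idea.score}")
```

Under these made-up ratings, an idea that users want but that is hard to build and hard to explain falls to the bottom of the list.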

Design goals

Where AI Could Help, What to Leave to Humans

From user interviews and technical feasibility reviews, we identified clear opportunities for AI to reduce manual workload—without overstepping human judgment.

Supervisors didn’t need AI to evaluate emotional nuance or replace their decisions; they needed help surfacing what matters most.

Business Goal: Efficiency

Scale tutor supervision efficiently without risking unjust churn or eroding trust.

User Goal: Speed

Supervisors need to reduce time spent on repetitive performance reviews while maintaining fairness & human oversight.

Design Goal: Humanity

Design AI features that automate pattern detection while keeping humans in control of final decisions.

Solution #1

Engagement measurements

Can AI understand "engagement"?

Supervisors were spending hours watching Zoom recordings to gauge whether tutors were engaged.

They wanted help detecting signals like “warmth” or “proactiveness,” which led us to initially use AI models to score sentiment, tone, and conversational pace.

Unreliable, Biased & Infeasible

Our early approach ran into several problems: the models were unreliable, biased, and technically infeasible to ship in the product.

NLP Models Misinterpreted Speech
NLP models misinterpreted accents, low-quality audio, or quiet speakers.
Culturally biased and unexplainable
Emotional scoring felt culturally biased and unexplainable.
Unstable Performance
High technical complexity and low confidence rates made the system unusable.

Design Decisions

Turn AI into a tracker, not an enforcer

We redesigned the system to surface tutor-level performance patterns while keeping humans in control: AI identifies repeated issues but doesn't act on them.

Supervisors receive trend summaries and alerts in the dashboard, and they can override an alert and add notes so that the final decision stays with them.
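
The split between detection and judgment can be sketched as a data model in which a flag carries no consequence until a supervisor reviews it. Everything named here is hypothetical: the `Flag` fields, the status values, and the threshold of 3 are assumptions for illustration, not the product's actual schema or settings.

```python
from dataclasses import dataclass
from typing import Optional

NO_SHOW_THRESHOLD = 3  # hypothetical cutoff, not the product's actual value


@dataclass
class Flag:
    """A pattern surfaced for review. It carries no automatic consequence."""
    tutor: str
    reason: str
    status: str = "needs_review"  # "needs_review", "confirmed", or "overridden"
    supervisor_note: Optional[str] = None


def detect_flags(no_shows_by_tutor: dict[str, int]) -> list[Flag]:
    """Detection only: report tutors at or above the threshold."""
    return [
        Flag(tutor, f"{count} no-shows (threshold {NO_SHOW_THRESHOLD})")
        for tutor, count in no_shows_by_tutor.items()
        if count >= NO_SHOW_THRESHOLD
    ]


def review(flag: Flag, confirm: bool, note: str) -> Flag:
    """Judgment stays with the supervisor, who records the reason."""
    flag.status = "confirmed" if confirm else "overridden"
    flag.supervisor_note = note
    return flag


flags = detect_flags({"tutor_a": 4, "tutor_b": 1})
reviewed = review(flags[0], confirm=False, note="Two absences were excused.")
print(reviewed.status, "-", reviewed.reason)
```

Keeping the reason string on the flag means the supervisor sees why it was raised, and the note preserves why it was confirmed or overridden.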

👥 User Impact

Gave supervisors control and context over tutor issues, improving fairness and reducing micromanagement.

💼 Business Impact

Saved supervisor time and improved evaluation consistency by replacing vague cues with actionable data.

Solution #2

Keep tracking humanity

Tracking Tutors Is Manual

Supervisors were manually documenting performance issues like no-shows by checking Excel sign-up sheets and matching them with Zoom attendance logs.

It was time-consuming and unreliable to track patterns at the tutor level—supervisors could only see if a single session was missed, not how often someone missed sessions overall.
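
To make the gap concrete, here is a minimal sketch of the matching step done in code: sign-up rows are compared against attendance records and rolled up per tutor. The record shapes, tutor names, and the `no_show_counts` helper are hypothetical, since the real sheet and log formats are not described here.

```python
from collections import Counter

# Hypothetical records; the real sheet and log formats are not described here.
signups = [  # (tutor, session_id) rows from the sign-up sheet
    ("tutor_a", "s1"), ("tutor_a", "s2"), ("tutor_a", "s3"),
    ("tutor_b", "s1"), ("tutor_b", "s2"),
]
attended = {  # (tutor, session_id) pairs seen in the attendance log
    ("tutor_a", "s1"), ("tutor_b", "s1"), ("tutor_b", "s2"),
}


def no_show_counts(signups, attended):
    """Count, per tutor, sign-ups with no matching attendance record."""
    counts = Counter()
    for tutor, session_id in signups:
        if (tutor, session_id) not in attended:
            counts[tutor] += 1
    return counts


print(no_show_counts(signups, attended))
```

The per-tutor count is the view supervisors lacked when they could only check one session at a time.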

Initial Approach: Too Robotic, Too Many False Alarms
We designed AI to detect common performance issues (e.g., missed sign-ups, late logins, no-shows) and automatically issue warnings once a threshold was reached. The goal was to summarize session data into tutor-level performance insights and scale up intervention.

AI makes harsh decisions

Early AI models were too opaque or made decisions on their own, eroding trust. Supervisors wanted assistance, not automation without explanation.

Real Impact & Recognition

Tutor Coach brought measurable impact across the PLUS platform

Tutors received fairer, more transparent feedback, while supervisors saved time by focusing only on sessions that needed attention.

+25%

AI insight interaction rate

+92%

Override usage on AI-flagged warnings

-75%

Avg. time per week per supervisor

Reflection

(1) AI and Humanity?

AI doesn’t need to feel human—it needs to make humans feel confident.

(2) Great AI UX means knowing what not to automate.

The value of AI isn’t just speed—it’s clarity, focus, and trust.