Most Copilot pilots we are asked to review at month three have not really happened. The licences were bought, the seats were assigned, two or three users tried it for a fortnight, and the question "should we roll wider" gets answered on vibes. That is not a pilot; it is a software purchase that did not get used.
A pilot is a structured experiment with a defined population, a defined duration, defined measurements, and a defined decision at the end. Five users, three months, four numbers. We have run this shape a dozen times now and the pattern holds.
Why five users
Not three (statistically thin and noisy if one is on holiday for a fortnight), not ten (the budget conversation gets harder, and the variance does not narrow much). Five is the population that gives a usable signal at a cost the business can absorb if the answer is "do not roll wider".
Who the five are matters more than how many. Three principles:
- Cover the work shape. Not five people from the same team doing the same job. A spread across the kinds of work the business does: a sales person, a finance person, a delivery lead, an operations person, an admin role. If Copilot does not earn its keep for one of these, you want to know before you commit £10,000 a year.
- Pick people who will use it. Not the loudest sceptic, not the loudest enthusiast. The colleague who tries things, gives honest feedback, and is busy enough that they will not use a tool that costs them time.
- Brief them once, properly. A 45-minute session explaining what Copilot can and cannot do, with examples from their actual work, not a generic training video. The pilot starts with people who know what they have been given.
Why three months
Two months is not enough; the team is still in the novelty phase, the workflows have not stabilised, and the early enthusiasm has not faded into the realistic mid-quarter pattern. Four months is more than is needed; the answer is usually visible by week ten and waiting longer just defers the decision.
Three months also matches the natural M365 billing rhythm and gives enough monthly check-ins to spot patterns. The shape we run:
- Week 1. Briefing, account setup, first workflow scoped per user. The pilot lead checks each user has a real first task to try by Friday.
- Weeks 2 to 4. Daily use. A 15-minute check-in at the end of week 2 and week 4. The check-in is not a survey; it is a conversation about what worked and what did not.
- Month 2. Steady-state use. A 30-minute review at the midpoint. Workflows that are sticking should be visible. Workflows that are not sticking get a second look.
- Month 3. Continued use plus a structured 60-minute review at the end. The decision conversation is the deliverable.
What to measure
Four numbers, all of which the business can collect without buying a measurement tool.
Usage frequency per user. The Microsoft 365 admin center gives you Copilot usage per user per app per week. Not a perfect measure (a user who opens Copilot twice a day for thirty seconds each time looks the same as one who runs a forty-minute summarisation session), but a usable proxy. The pattern to look for is a steady or rising line. A user whose usage drops to zero in week six is telling you something.
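As a rough illustration of that pattern check, the sketch below assumes the weekly counts have been copied by hand from the usage report into a plain dictionary. The names (`weekly_usage`, `flag_dropoffs`), the two-week threshold, and the figures are all hypothetical; none of this is a Microsoft API.

```python
def flag_dropoffs(weekly_usage, quiet_weeks=2):
    """Return users who used Copilot earlier but logged zero use
    in each of the most recent `quiet_weeks` weeks."""
    flagged = []
    for user, counts in weekly_usage.items():
        if len(counts) <= quiet_weeks:
            continue  # not enough history to compare against
        earlier = counts[:-quiet_weeks]
        recent = counts[-quiet_weeks:]
        # Flag only a genuine drop: some earlier use, none recently.
        if any(earlier) and not any(recent):
            flagged.append(user)
    return sorted(flagged)


# Made-up weekly counts for weeks 1 to 6 of a pilot.
weekly_usage = {
    "sales": [4, 9, 11, 12, 10, 12],
    "finance": [6, 8, 5, 2, 0, 0],
    "ops": [0, 0, 0, 0, 0, 0],
}

dropoffs = flag_dropoffs(weekly_usage)
print(dropoffs)
```

A user who never started (all zeros) is deliberately not flagged here; that is a briefing problem rather than a drop-off, and is better picked up at the week 2 check-in.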
Self-reported time saved per user per week. Ask the pilot users at each check-in: "Hours saved this week by Copilot, your honest estimate." The number is fuzzy by design. What matters is whether it is rising, stable, or falling, and whether the user can give an example of where the saving came from. "Three hours, mostly in meeting summaries and Outlook drafting" is a useful answer. "Hard to say, maybe an hour" means the value is not yet clear.
Quality concerns per user. Did Copilot get something wrong this week, and what happened? A non-zero answer is fine and expected. A pattern (the same user catching the same kind of mistake repeatedly) tells you which workflows to add to the band-three list in the policy.
Would you keep it? Asked at the end of month three. A binary yes or no, with one sentence of why. The sentence is the value of the question.
That is the dashboard: four numbers, ten minutes to update per user per week.
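For teams that would rather keep the dashboard in a script than a spreadsheet, here is a minimal sketch of one weekly record per user and a crude rising, stable, or falling read on the self-reported hours. The field names, the half-hour tolerance, and the first-half versus second-half comparison are our assumptions for illustration, not a prescribed method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeeklyEntry:
    user: str
    week: int
    usage_count: int              # copied from the admin usage report
    hours_saved: float            # the user's honest estimate
    quality_concerns: int         # things Copilot got wrong this week
    example: str = ""             # where the saving came from
    keep: Optional[bool] = None   # only asked at the end of month three


def hours_trend(entries, user, tolerance=0.5):
    """Compare average hours saved in the first half of the pilot so far
    with the second half. `tolerance` (in hours) is an assumed threshold."""
    ordered = sorted(entries, key=lambda e: e.week)
    hours = [e.hours_saved for e in ordered if e.user == user]
    if len(hours) < 2:
        return "unknown"
    mid = len(hours) // 2
    early = sum(hours[:mid]) / mid
    late = sum(hours[mid:]) / (len(hours) - mid)
    if late > early + tolerance:
        return "rising"
    if late < early - tolerance:
        return "falling"
    return "stable"


# Made-up entries for one pilot user.
entries = [
    WeeklyEntry("finance", 1, 6, 1.0, 0),
    WeeklyEntry("finance", 2, 8, 2.0, 1, example="Outlook drafting"),
    WeeklyEntry("finance", 3, 9, 3.0, 0, example="meeting summaries"),
    WeeklyEntry("finance", 4, 9, 3.0, 0, example="meeting summaries"),
]

trend = hours_trend(entries, "finance")
print(trend)
```

The trend label is only a prompt for the check-in conversation. The example sentence the user gives still carries more weight than the number.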
The decision conversation
The 60-minute review at the end of month three is the point. The shape:
- Each pilot user reports. Usage, hours saved, quality concerns, keep-or-drop, in two minutes each. Ten minutes total.
- Patterns across the five. Which workflows showed up everywhere? Which kinds of work was Copilot good at, and which was it bad at? Twenty minutes.
- The roll-wider question. If we expanded to the next twenty users, which of those would land in band one and which would not? Twenty minutes.
- Decision. Roll wider with a defined scope, run a second pilot with a different population, or do not buy. Ten minutes.
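The timings above can be sanity-checked with a few lines. This is only a sketch of the arithmetic, with the agenda items taken from the list and the variable names our own.

```python
PILOT_USERS = 5
MINUTES_PER_REPORT = 2

# (agenda item, minutes) in the order the review runs.
agenda = [
    ("Each pilot user reports", PILOT_USERS * MINUTES_PER_REPORT),
    ("Patterns across the five", 20),
    ("The roll-wider question", 20),
    ("Decision", 10),
]


def running_order(items):
    """Return (start_minute, end_minute, title) for each agenda item."""
    rows = []
    start = 0
    for title, minutes in items:
        rows.append((start, start + minutes, title))
        start += minutes
    return rows


total_minutes = sum(minutes for _, minutes in agenda)

for start, end, title in running_order(agenda):
    print(f"{start:02d}-{end:02d} min  {title}")
print(f"Total: {total_minutes} minutes")
```

If the pilot population changes, adjusting `PILOT_USERS` shows immediately how much the other items need to shrink to keep the review inside the hour.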
The decision is usually clear by the end of the hour. When it is not, the answer is "run another pilot with a different team", not "buy it anyway and hope".
What to do with a mixed answer
Sometimes the pilot lands ambiguous. Two users are getting value, three are not, and the team is split on whether to expand.
Three follow-ups that work better than "vote on it":
- Look at what the two are doing differently. If their workflows are recognisably different from the three, the question is not whether Copilot is good. It is whether the business has enough of those workflows to justify the seat cost.
- Try a different five. A second pilot with a different population. Cheaper than rolling wider on a guess.
- Drop to a smaller licence count and run on. If two users want to keep it, run two seats for a quarter and revisit. Most Copilot licensing is on an annual commitment, but some packages allow monthly true-ups; check before assuming.
Where this lands with us
The pilot shape is the part of the AI Enablement engagement that we run alongside the client. We run the briefing, the check-ins, the review at month three, and the decision conversation. The deliverables are the four numbers, the patterns synthesis, and a roll-wider plan if the answer is yes.
The pilot that lands is the pilot that is run properly. The pilot that does not is the one that gets bought, half-tried, and rolled wider because the renewal date came around. We have helped more clients unwind the second pattern than we have helped run the first, which is why this shape exists.
About to start a Copilot pilot and want a second pair of eyes on the shape? Drop us a note at info@jmopartners.co.uk.
JMO|Partners · Enterprise IT, sized for SMEs.