What is The Test Strategist?

The Test Strategist is a 3-step prompt flow for building test suites that actually catch bugs — not just test suites that pass CI and hit coverage thresholds. It chains CRISPE to frame the model as an experienced test engineer reasoning over the real risk surface of your code, Self-Consistency to generate multiple independent coverage strategies and find consensus on what matters most, and Few-Shot to anchor the final test output to your codebase's real style and patterns.

The flow addresses a common testing anti-pattern: generating tests bottom-up from the code, which produces tests that cover what exists rather than what could go wrong. It reverses that order: risk surface first, consensus strategy second, style-anchored generation third.

When to Use The Test Strategist

🐛

Critical Business Logic

Testing payment processing, authentication flows, data transformations, or any code where a bug has direct business impact.

🔄

Refactoring Coverage

Adding tests before a refactor to ensure behavior is preserved — where you need confidence, not just coverage percentage.

🧩

Complex Integration Points

Testing code at the boundary between two systems — where assumptions on both sides can be wrong simultaneously.

📊

Data Processing Pipelines

Validating ETL steps, aggregation logic, and data transformations where edge cases in input data cause silent failures.

🔐

Security-Sensitive Code

Testing authorization checks, input sanitization, and rate limiting where a missed edge case is a vulnerability.

🆕

New Team Onboarding

When a new team takes over a module and needs to understand the risk surface before making changes.

The Flow Algorithm

1

CRISPE — Map the Risk Surface

Use CRISPE to frame the analysis:

  • Capacity: senior test engineer with experience in this domain (payment processing, auth systems, etc.)
  • Role: your task is to identify every failure mode, not to generate tests
  • Insight: here is the code and its business context
  • Statement: produce a ranked risk map
  • Personality: be specific and skeptical; assume the implementation has bugs
  • Experiment: identify the 3 highest-risk areas and what test types would catch failures there

The Insight section should include the actual code plus business context that explains what a bug would cost.

Produces:

A ranked risk map: the top failure modes by likelihood and impact, the specific edge cases and boundary conditions within each, and the test types (unit, integration, property-based, etc.) most likely to catch each class of failure.
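
To keep the six fields consistent from run to run, the Step 1 prompt can be assembled programmatically. The following is a minimal Python sketch rather than part of the flow itself; the `CrispePrompt` class and its `render` method are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrispePrompt:
    """Holds the six CRISPE fields plus the code under analysis (hypothetical helper)."""

    capacity: str
    role: str
    insight: str
    statement: str
    personality: str
    experiment: str
    code: str

    def render(self) -> str:
        # Emit the fields in the order the flow lists them, then append
        # the code so the model reads the framing before the implementation.
        sections = [
            ("Capacity", self.capacity),
            ("Role", self.role),
            ("Insight", self.insight),
            ("Statement", self.statement),
            ("Personality", self.personality),
            ("Experiment", self.experiment),
        ]
        header = "\n".join(f"{label}: {text.strip()}" for label, text in sections)
        return f"{header}\n\n{self.code.strip()}\n"
```

Making every field a required argument means a prompt with a missing Insight or Experiment section fails at construction time instead of silently producing a weaker risk map.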

2

Self-Consistency — Validate Coverage Priorities

Generate three independent coverage strategies from the same risk map using Self-Consistency: ask the model to approach the coverage question three times with different reasoning paths (e.g., from the perspective of a developer who has been paged at 3am, a security auditor, and a QA engineer doing exploratory testing). Look for consensus: which risk areas do at least two of the three strategies agree need tests? Which areas does only one strategy flag? High-consensus areas get tests first; single-strategy flags get investigated before you invest test time in them.

Produces:

A validated coverage strategy with confidence levels — high confidence on consensus items, flagged-for-investigation on single-strategy items. This prevents both under-testing critical paths and over-testing low-risk code.
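
The consensus check can be sketched as a simple vote count. This example assumes the test cases from each perspective have already been normalized to short shared identifiers (in practice the model may phrase the same test differently in each perspective, so that normalization is a manual or separate step). The function name, the perspective keys, and the test-case IDs are all hypothetical.

```python
from collections import Counter


def classify_by_consensus(strategies, threshold=2):
    """Split test-case IDs into (high_confidence, investigate) lists.

    strategies maps a perspective name to its list of test-case IDs.
    An ID is high confidence when at least `threshold` perspectives list it.
    """
    votes = Counter()
    for cases in strategies.values():
        # Count each ID once per perspective, even if it is repeated.
        votes.update(set(cases))

    # Most-agreed-upon items first; ties broken alphabetically.
    high = sorted(
        (case for case, n in votes.items() if n >= threshold),
        key=lambda case: (-votes[case], case),
    )
    investigate = sorted(case for case, n in votes.items() if n < threshold)
    return high, investigate


# Illustrative data only, not real model output.
strategies = {
    "on_call_developer": ["expired_code", "reused_single_use_code", "negative_total"],
    "security_auditor": ["reused_single_use_code", "code_enumeration", "expired_code"],
    "qa_explorer": ["expired_code", "case_sensitivity", "reused_single_use_code"],
}
high, investigate = classify_by_consensus(strategies)
```

With this sample data, `high` holds the two IDs that all three perspectives list, and `investigate` holds the three IDs that only one perspective raised.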

3

Few-Shot — Generate Style-Anchored Tests

Provide 3 existing tests from your codebase as Few-Shot examples, ideally one simple unit test, one test with mocking, and one async or integration test. Then ask the model to generate tests for the high-consensus risk areas identified in Step 2, using the same patterns, assertion style, naming conventions, and setup/teardown structure as the examples. The Few-Shot anchoring keeps the output close enough to existing conventions that it should integrate with little or no style review.

Produces:

Test cases targeting the validated highest-risk areas, written in the codebase's actual style — ready to commit alongside the code with minimal review friction.
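
A Step 3 prompt can be built from the example tests and the consensus risk areas with a few lines of string assembly. This is a sketch under the assumption that the examples are available as plain strings; `build_few_shot_prompt` is a hypothetical helper, and the instruction wording is taken from the example prompt in this document.

```python
def build_few_shot_prompt(examples, risk_areas):
    """Assemble a Few-Shot prompt from example tests and target risk areas."""
    if not examples:
        raise ValueError("at least one example test is required")
    if not risk_areas:
        raise ValueError("at least one risk area is required")

    parts = [
        f"Here are {len(examples)} examples of tests from our codebase. "
        "Study the style: assertion library, naming convention, "
        "describe/it structure, mock setup pattern, and error message format."
    ]
    # Number the examples so the model can refer back to them.
    for i, example in enumerate(examples, start=1):
        parts.append(f"Example {i}:\n{example.strip()}")

    parts.append(
        "Now write tests for the following high-confidence risk areas "
        "(from Step 2 consensus):"
    )
    parts.append(
        "\n".join(f"{i}. {area}" for i, area in enumerate(risk_areas, start=1))
    )
    parts.append(
        "Match the style of the examples exactly. Do not introduce new "
        "assertion libraries or patterns not present in the examples."
    )
    return "\n\n".join(parts)
```

Rejecting an empty example list up front is a deliberate choice here: without examples the prompt degrades into plain test generation and the style anchoring is lost.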

Example Prompt Sequence

Step 1 — CRISPE Risk Surface Analysis

Capacity: You are a senior test engineer with 10+ years of experience writing tests for financial transaction systems. You have been paged at 3am because a bug in code like this caused a production incident.
Role: Your task is to produce a ranked risk map for this code — NOT to write tests yet. Identify failure modes, edge cases, and boundary conditions.
Insight: This is a discount code validation function for an e-commerce checkout. A bug here either gives users unauthorized discounts (revenue loss) or incorrectly blocks valid codes (lost sales).
Statement: Produce: (1) top 5 failure modes by likelihood × impact, (2) the specific inputs or states that trigger each, (3) what test type would catch each.
Personality: Be specific and skeptical. Assume there are bugs. Do not flag obvious happy-path cases.
Experiment: For each failure mode, write a one-line hypothesis: "If [condition], then [failure]."

[PASTE FUNCTION CODE HERE]
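
The likelihood × impact ranking requested in the Statement can be reproduced outside the model, which is useful for checking or adjusting its ordering by hand. The sketch below assumes each failure mode is scored on a 1 to 5 scale for both dimensions; the scale, the `FailureMode` class, and the sample scores are illustrative assumptions, not values the flow prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    hypothesis: str   # "If [condition], then [failure]."
    likelihood: int   # assumed scale: 1 (rare) to 5 (expected)
    impact: int       # assumed scale: 1 (cosmetic) to 5 (revenue or security loss)
    test_type: str

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def rank_failure_modes(modes, top_n=5):
    # sorted() is stable, so modes with equal scores keep their input order.
    return sorted(modes, key=lambda m: m.score, reverse=True)[:top_n]


# Illustrative entries for the discount code example; scores are made up.
modes = [
    FailureMode("If two requests redeem a single-use code concurrently, "
                "then both succeed.", 2, 5, "integration"),
    FailureMode("If the code expired one second ago, "
                "then it is still accepted.", 3, 4, "unit"),
    FailureMode("If the code differs only in letter case, "
                "then a valid code is rejected.", 4, 2, "unit"),
]
ranked = rank_failure_modes(modes)
```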

Step 2 — Self-Consistency Coverage Validation

Using the risk map below, generate three independent test coverage strategies. Approach the problem from three different perspectives:

Perspective A: A developer who was paged at 3am when this code broke in production — what tests would have caught it?
Perspective B: A security auditor looking for discount abuse vectors — what inputs would they try?
Perspective C: A QA engineer doing exploratory testing on the checkout flow — what user behavior would they test?

For each perspective, list the top 5 test cases in priority order.
Then: identify which test cases appear in at least 2 of the 3 perspectives (high confidence) vs. only 1 (investigate further).

Risk map from Step 1:
[PASTE STEP 1 OUTPUT HERE]

Step 3 — Few-Shot Style-Anchored Generation

Here are 3 examples of tests from our codebase. Study the style: assertion library, naming convention, describe/it structure, mock setup pattern, and error message format.

Example 1: [PASTE EXISTING TEST]
Example 2: [PASTE EXISTING TEST]
Example 3: [PASTE EXISTING TEST]

Now write tests for the following high-confidence risk areas (from Step 2 consensus):
1. [RISK AREA 1]
2. [RISK AREA 2]
3. [RISK AREA 3]

Match the style of the examples exactly. Do not introduce new assertion libraries or patterns not present in the examples.
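
The three prompts form a chain in which each step consumes the previous step's output. A minimal sketch of that data flow follows. It assumes `model` is any callable that takes a prompt string and returns the response text (no specific provider API is implied), and that `select_risk_areas` stands in for the human review of the Step 2 consensus; both names, and `run_flow` itself, are hypothetical.

```python
def run_flow(model, step1_prompt, step2_prompt, step3_prompt, select_risk_areas):
    """Run the three steps in order and return every intermediate output."""
    # Step 1: CRISPE prompt produces the ranked risk map.
    risk_map = model(step1_prompt)

    # Step 2: the risk map is appended to the perspectives prompt.
    strategies = model(f"{step2_prompt}\n\nRisk map from Step 1:\n{risk_map}")

    # Review point: choose the high-consensus areas from the Step 2 output.
    risk_areas = select_risk_areas(strategies)
    numbered = "\n".join(f"{i}. {area}" for i, area in enumerate(risk_areas, start=1))

    # Step 3: the Few-Shot prompt receives only the selected areas.
    tests = model(f"{step3_prompt}\n\n{numbered}")
    return {"risk_map": risk_map, "strategies": strategies, "tests": tests}


def stub_model(prompt):
    # Stand-in for a real model call, used only to show the wiring.
    return f"[response to a {len(prompt)}-character prompt]"


outputs = run_flow(
    stub_model,
    "Step 1 prompt text",
    "Step 2 prompt text",
    "Step 3 prompt text",
    select_risk_areas=lambda strategies: ["expired codes", "single-use reuse"],
)
```

Returning the intermediate outputs alongside the tests keeps the risk map and the three strategies available for review, rather than discarding them once the tests exist.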

Pros and Cons

Strengths

  • Risk-driven approach catches bugs that coverage metrics miss
  • Self-Consistency reduces blind spots in coverage strategy
  • Few-Shot style anchoring reduces review friction on generated tests
  • Works for any language, framework, or test runner
  • Produces a risk map artifact useful beyond testing

Trade-offs

  • More setup than "generate tests for this function"
  • Requires existing test examples for optimal style anchoring
  • Self-Consistency step requires reviewing 3 independent outputs
  • Not optimized for simple utility functions with obvious test cases

Frequently Asked Questions

What is The Test Strategist prompt flow?

The Test Strategist is a 3-step prompt flow that chains CRISPE, Self-Consistency, and Few-Shot to build test suites that catch real bugs rather than just hitting coverage metrics. It reasons about risk before writing tests, validates coverage priorities through multiple independent strategies, and anchors test output to the codebase's actual style using examples.

Why start with risk analysis instead of just generating tests?

Most AI-generated tests cover the happy path and obvious inputs, which is exactly where bugs are least likely to hide. CRISPE's Capacity and Insight parameters steer the model to reason as an experienced test engineer who has seen the specific failure modes of this type of code. The output is a ranked risk map, not a list of test cases, so every test you write is justified by a specific risk.
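
The difference between a happy-path test and a risk-justified test can be made concrete with a small example. The sketch below uses Python's `unittest` and a hypothetical `is_code_valid` function invented for illustration; the rule that a code is valid strictly before its expiry instant is an assumption, not a requirement taken from any real checkout system.

```python
import unittest
from datetime import datetime, timedelta, timezone


def is_code_valid(expires_at: datetime, now: datetime) -> bool:
    """Hypothetical function under test: valid strictly before expiry."""
    return now < expires_at


EXPIRY = datetime(2030, 1, 1, tzinfo=timezone.utc)


class DiscountExpiryTests(unittest.TestCase):
    def test_code_valid_well_before_expiry(self):
        # Happy path: the kind of test bottom-up generation tends to produce.
        self.assertTrue(is_code_valid(EXPIRY, EXPIRY - timedelta(days=30)))

    def test_code_rejected_at_exact_expiry_instant(self):
        # Risk hypothesis: "If now equals expires_at, then an expired
        # code is accepted." An off-by-one (<= instead of <) fails here.
        self.assertFalse(is_code_valid(EXPIRY, EXPIRY))

    def test_code_accepted_just_before_expiry(self):
        # Other side of the boundary: a still-valid code must not be blocked.
        self.assertTrue(is_code_valid(EXPIRY, EXPIRY - timedelta(microseconds=1)))
```

Saved as a module, this runs with `python -m unittest`. Only the first test would be written without a risk map; the other two each trace back to a one-line hypothesis of the kind Step 1 asks for.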

What does Self-Consistency add to test strategy?

A single coverage strategy reflects one reasoning path, which may have blind spots. Self-Consistency generates three independent strategies from the same risk map and looks for consensus: if all three strategies agree that the authentication edge cases are highest priority, that consensus is high-confidence. If only one strategy flags a risk area, that's a signal to investigate further before investing tests there.

Why use Few-Shot for the actual test generation?

Tests that don't match the codebase's existing style create friction: they use different assertion libraries, naming conventions, or setup patterns. Few-Shot anchors the generated tests to 3 existing tests from the real test suite, so the output reads like the rest of the suite and should need little or no style review. The style anchoring is as important as the content.

How many existing test examples should I provide in Step 3?

Three examples is the recommended minimum — one for a simple unit test, one for a test with mocking, and one for an integration or async test if applicable. The examples should represent the actual patterns in use, not the cleanest tests in the suite. You want the model to match what's there, not produce idealized tests that conflict with the codebase's conventions.

Can this flow work for a codebase with no existing tests?

Yes, with a modification. In Step 3, instead of providing existing test examples, provide the testing framework documentation and 2-3 examples from the framework's own docs as your Few-Shot examples. Clearly state in the prompt that these are canonical style examples, not codebase examples. The output will match the framework's conventions, which is a reasonable starting point for a greenfield test suite.