Self-Consistency Prompting

What is Self-Consistency Prompting?

Self-Consistency is a decoding strategy and prompting technique introduced by Wang et al. at Google Brain in 2022. It extends Chain-of-Thought prompting by sampling multiple diverse reasoning paths for the same question and aggregating their final answers via majority vote.

The core intuition: if a correct answer can be reached through many different valid reasoning chains, then generating multiple chains and picking the most common final answer is usually more reliable than trusting any single chain. Reasoning errors that lead to wrong answers tend to vary between paths, while correct reasoning converges on the same answer from different angles.

The process has three steps:

Sample: Run the same Chain-of-Thought prompt N times with temperature > 0 to get N diverse reasoning paths.
Extract: Parse the final answer from each reasoning path.
Aggregate: Select the answer that appears most frequently (majority vote).
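
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `sample_fn` is a hypothetical stand-in for whatever model call you use (with temperature > 0), and the "Answer:" line format is an assumption about how the prompt asks for the final answer.

```python
import re
from collections import Counter
from typing import Callable, Optional

# Matches a line of the form "Answer: <text>". The format is an assumption
# about how the prompt asks the model to report its final result.
ANSWER_LINE = re.compile(r"^Answer:[ \t]*(.+)$", re.MULTILINE)


def extract_answer(response: str) -> Optional[str]:
    """Extract step: return the text of the last 'Answer:' line, if any."""
    matches = ANSWER_LINE.findall(response)
    return matches[-1].strip() if matches else None


def self_consistency(sample_fn: Callable[[str], str], prompt: str, n: int = 5) -> Optional[str]:
    """Sample n responses, extract each final answer, return the most common one."""
    responses = [sample_fn(prompt) for _ in range(n)]   # Sample
    answers = [extract_answer(r) for r in responses]    # Extract
    answers = [a for a in answers if a is not None]     # drop unparseable paths
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]        # Aggregate


# Hypothetical stand-in for a model call: replays canned responses in order.
canned = iter([
    "15 - 3 = 12, then 12 * 2 = 24.\nAnswer: 24",
    "Double first: 30, minus 6 is 24.\nAnswer: 24",
    "15 * 2 = 30, minus 3 is 27.\nAnswer: 27",
    "Remove 3 to get 12; doubling gives 24.\nAnswer: 24",
    "I am not sure how to proceed.",
])
result = self_consistency(lambda prompt: next(canned), "toy prompt", n=5)
print(result)  # 24
```

Responses without a parseable answer are dropped rather than counted, and ties go to whichever answer was seen first; both are design choices of this sketch.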

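In the original paper, Self-Consistency improved Chain-of-Thought accuracy by roughly 11 to 18 percentage points on arithmetic reasoning benchmarks (GSM8K, SVAMP, AQuA), with smaller gains of about 4 to 6 points on commonsense benchmarks (StrategyQA, ARC-challenge).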

When to Use Self-Consistency

🧮

Math Word Problems

Multi-step arithmetic and algebra where calculation errors in any single chain can compound to a wrong final answer.

⚖️

Logical Reasoning

Syllogisms, deductive reasoning, and multi-premise logical problems where a single wrong inference step ruins the conclusion.

📋

Multiple-Choice Questions

High-stakes multiple-choice tests (bar exams, medical licensing, standardized tests) where confidence and accuracy are critical.

🔬

Scientific Analysis

Interpreting experimental data, reasoning about causality, or evaluating competing hypotheses where the "right" answer matters.

💼

High-Stakes Decisions

Business decisions, risk assessments, and strategic choices where a reasoning error could have significant consequences.

🏥

Medical Diagnosis Support

Differential diagnosis reasoning where ruling out wrong conclusions before committing to a recommendation is essential.

How to Use Self-Consistency

1. Write a strong Chain-of-Thought prompt
Self-Consistency amplifies CoT, so start with a well-crafted CoT prompt (zero-shot with "Let's think step by step" or few-shot with worked examples). The diversity of reasoning paths depends on the quality of the base prompt.
2. Set temperature above 0
Use temperature 0.5–0.8 when calling the API. Temperature 0 (greedy decoding) returns the same, or nearly the same, output on every run, so the vote has nothing to aggregate. Sampling with temperature above 0 is what lets each run take a different reasoning path.
3. Run the prompt N times
Sample 5–20 independent responses for the same query. More paths give a more stable vote but cost more. For most practical tasks, 5–10 paths provide substantial improvement with manageable cost.
4. Extract and aggregate final answers
Parse the final answer from each response. Count occurrences and select the majority answer. For text answers, you may need a normalization step or an aggregation prompt: "Given these 5 answers, what is the most common conclusion?"
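
A normalization step for the final answers might look like the sketch below. The rules in the hypothetical `normalize` helper (lowercasing, dropping a trailing period, stripping "$" and thousands separators, collapsing numeric forms) are illustrative assumptions; the right rules depend on the task.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Map superficially different answers to one canonical string."""
    text = answer.strip().lower().rstrip(".")
    # Try to read the answer as a number, ignoring "$" and thousands separators.
    cleaned = text.replace(",", "").replace("$", "")
    try:
        number = float(cleaned)
    except ValueError:
        # Not numeric: just collapse repeated whitespace.
        return " ".join(text.split())
    # Numeric: "42.0" and "42" should count as the same vote.
    return str(int(number)) if number.is_integer() else repr(number)


raw_answers = ["42", "42.0", " 42 ", "38", "$42."]
votes = Counter(normalize(a) for a in raw_answers)
print(votes.most_common(1)[0])  # ('42', 4)
```

Without normalization, the same five responses would split into five distinct strings and the vote would be meaningless.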

Prompt Examples

Self-Consistency Setup (run this prompt 5–10 times)

Solve this problem step by step. Show all your work clearly.
State your final answer on the last line, preceded by "Answer:".

Problem: A train leaves Station A at 9:00 AM traveling at 80 mph.
Another train leaves Station B (240 miles away) at 10:00 AM
traveling toward Station A at 60 mph.
At what time will the two trains meet?

Let's think step by step.
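
To check the majority answer for this example, the problem can be worked out by hand. The derivation below assumes both trains run on the same line and that the first train is heading toward Station B, which the problem statement implies but does not say outright.

```latex
\begin{align*}
\text{Head start by 10:00 AM:} \quad & 80 \text{ mph} \times 1 \text{ h} = 80 \text{ miles} \\
\text{Remaining gap:} \quad & 240 - 80 = 160 \text{ miles} \\
\text{Closing speed:} \quad & 80 + 60 = 140 \text{ mph} \\
\text{Time to meet after 10:00 AM:} \quad & t = \frac{160}{140} = \frac{8}{7} \text{ h} \approx 1 \text{ h } 8.6 \text{ min}
\end{align*}
```

The trains meet at about 11:08:34 AM. Sampled responses may round this differently ("11:09 AM", "about 11:08 AM"), so the extracted answers usually need to be normalized before they are counted.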

Single-Prompt Self-Consistency Simulation

Solve the following problem using 5 different reasoning approaches.
For each approach, show the full reasoning chain and state the answer.
After all 5, identify which answer appears most frequently and
declare it the final answer.

Problem: If a store has a 30% off sale, and then offers an additional
20% off the sale price, what is the total percentage discount
from the original price?

Approach 1 — Algebraic:
[reason through algebraically]

Approach 2 — Numerical example ($100 original price):
[reason through with concrete numbers]

Approach 3 — Percentage multiplication:
[reason through as decimals]

Approach 4 — Step-by-step discount:
[reason through each discount sequentially]

Approach 5 — Common mistake check:
[deliberately test if 30+20=50% is wrong and why]

Final Answer (majority vote):
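
For reference, all five approaches should converge on the same result. With an original price P:

```latex
\begin{align*}
\text{Price after both discounts} &= P \times (1 - 0.30) \times (1 - 0.20) = 0.56\,P \\
\text{Total discount} &= 1 - 0.56 = 0.44 = 44\%
\end{align*}
```

The total discount is 44%, not the 50% that simply adding the two percentages would suggest.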

Aggregation Prompt (after collecting N responses)

I asked an AI model the same math problem 8 times and got these answers:
42, 42, 38, 42, 45, 42, 38, 42

Count the occurrences of each unique answer and identify
the majority answer. Then explain whether the minority answers
suggest any plausible alternative interpretations of the problem.
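
When the answers are already clean values like these, the counting itself does not need a model call. A minimal sketch using Python's standard `collections.Counter` on the same eight answers:

```python
from collections import Counter

answers = [42, 42, 38, 42, 45, 42, 38, 42]
tally = Counter(answers)

# List each distinct answer with its vote count, most frequent first.
for answer, count in tally.most_common():
    print(f"{answer}: {count}/{len(answers)}")

majority = tally.most_common(1)[0][0]
print("Majority answer:", majority)  # Majority answer: 42
```

Here 42 receives 5 of 8 votes, 38 receives 2, and 45 receives 1. The aggregation prompt remains useful for the second half of the task, interpreting why the minority answers appeared.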

Pros and Cons

🟢 Pros	🔴 Cons
Reported gains of up to about 18 percentage points on arithmetic reasoning benchmarks	N× the token cost and latency of a single query
Works without any prompt changes — just run CoT multiple times	Requires aggregation logic — more complex to implement
Naturally surfaces uncertainty (low consensus = low confidence)	Not useful for creative, open-ended, or subjective tasks
Composable with CoT, Few-Shot, and Role Prompting	Majority vote can be wrong if errors are systematic (not random)
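
The "low consensus = low confidence" signal from the table can be made explicit by reporting the winning answer's vote share and abstaining when it is too low. The function name `vote_with_agreement` and the 0.6 threshold below are illustrative assumptions, not values from the paper.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def vote_with_agreement(
    answers: Sequence[Hashable], min_share: float = 0.6
) -> Tuple[Optional[Hashable], float]:
    """Return (majority answer, vote share), or (None, share) on weak consensus."""
    if not answers:
        return None, 0.0
    ranked = Counter(answers).most_common()
    top_answer, top_count = ranked[0]
    share = top_count / len(answers)
    # Treat an exact tie for first place as no consensus.
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if tied or share < min_share:
        return None, share
    return top_answer, share


print(vote_with_agreement([42, 42, 38, 42, 45, 42, 38, 42]))  # (42, 0.625)
print(vote_with_agreement(["A", "B", "A", "B"]))              # (None, 0.5)
```

A `None` result can be routed to a fallback such as sampling more paths or escalating to human review. A high vote share does not rule out systematic errors, since every path can be wrong in the same way.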

Frequently Asked Questions

What is Self-Consistency prompting?

Self-Consistency is a prompting technique introduced by Wang et al. (2022) that generates multiple independent reasoning paths for the same question and then aggregates the results, typically by selecting the most common (majority-vote) final answer. Instead of trusting a single chain of thought, it samples diverse reasoning paths and uses their collective output to improve accuracy — particularly on tasks where reasoning mistakes are possible.

How does Self-Consistency improve accuracy?

By generating multiple independent reasoning chains (typically 5–20), Self-Consistency reduces the impact of any single incorrect reasoning path. If 8 out of 10 reasoning paths arrive at the same answer through different routes, that answer is much more likely to be correct than the output of any single chain, provided the errors are not systematic. The diversity of paths catches different types of reasoning errors that would go undetected in a single CoT trace.
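
A simplified model shows why voting helps. Assume, purely for illustration, that each path is independently correct with probability p. The chance that a strict majority of n paths is correct is then a binomial tail sum, which is also a lower bound on the plurality vote being correct, since wrong answers usually scatter across several values. Real samples from one model are not fully independent, so treat the numbers as intuition only.

```python
from math import comb


def majority_correct_prob(p: float, n: int) -> float:
    """P(more than half of n independent paths are correct), each correct w.p. p."""
    k_min = n // 2 + 1  # smallest count that is a strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for n in (1, 5, 15):
    print(n, round(majority_correct_prob(0.6, n), 4))
```

With p = 0.6, five paths raise the probability from 0.6 to about 0.68, and more paths raise it further. With p below 0.5 the effect reverses and voting amplifies the error, which is the systematic-error failure mode.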

When should I use Self-Consistency prompting?

Self-Consistency is best for high-accuracy reasoning tasks where errors are costly: math word problems, logical deduction, multiple-choice questions, scientific reasoning, and any task where you need high confidence in the correctness of the final answer. It is overkill for creative tasks, open-ended questions, and tasks without a single correct answer.

How many reasoning paths should I generate?

Research shows that most of the gain arrives within the first 5–10 paths, with diminishing returns beyond 20–40. For most practical applications, 5–10 paths provide a good balance between accuracy gain and token cost. If cost is a concern, start with 5; if you need maximum accuracy for a critical decision, use 20+.

How do I aggregate Self-Consistency results?

The standard approach is majority voting: run the same prompt N times (with temperature > 0 for diversity), extract the final answer from each response, and select the answer that appears most frequently. For numerical answers, this is straightforward. For text answers, you may need a normalization step or a separate aggregation prompt to identify the most common conclusion.

What temperature should I use for Self-Consistency?

Use temperature > 0 (typically 0.5–0.8) to ensure diverse reasoning paths. If you use temperature 0 (greedy decoding), all paths will be identical or nearly so, and self-consistency adds no value. The diversity of paths is what makes the technique work: you need the model to explore different reasoning routes, not repeat the same one.

Is Self-Consistency compatible with other techniques?

Yes — Self-Consistency is most powerful when combined with Chain-of-Thought. Each of the N paths uses CoT reasoning, producing N diverse reasoning chains. The combination (CoT + Self-Consistency) is one of the best-performing prompting approaches for mathematical and logical reasoning benchmarks. You can also combine it with Role Prompting to add expert personas to each reasoning path.