Public Benefits Bench v1.1

Partners in Evaluation

Key Takeaways

No general-purpose AI model performs well enough to be trusted with SNAP benefits guidance. The top-performing model, Claude Opus 5, provided correct answers to SNAP-related questions only 76.9% of the time, meaning beneficiaries using current AI systems still get incorrect answers nearly a quarter of the time.
How models use their tools matters more than what they already know. Since SNAP rules shift constantly and vary by county, the real unlock for AI labs isn’t baking every rule into model weights: it’s improving how models ask follow-up questions and use search tools.
To reliably answer high-stakes benefits access questions, we need better grounding, not better models. General-purpose AI fails on SNAP in part because the knowledge it needs — internal manuals, policy notices, local procedures, and SNAP training for eligibility workers — sits with state agencies. Fixing this requires agencies and developers to collaborate on narrowly scoped chatbots that draw on state-held information through retrieval-augmented generation (RAG), rigorously tested against real questions from each state’s clients.

The narrative analysis below reflects the four-condition study first published on June 9, 2026. Models added later are evaluated only in the “both” condition and may appear in live leaderboard visualizations, but do not update this historical analysis.

Background

The Supplemental Nutrition Assistance Program (SNAP) is the largest nutrition assistance program in the United States, distributing roughly $95 billion to over 37 million low-income families in fiscal year 2025 alone.

But accessing and keeping those benefits is complex. SNAP is federally-funded but state-administered, and in ten states county-administered, so income thresholds, work requirements, and recertification timelines all vary based on where you live. The same family can be eligible in one state and denied in another. The program is also currently undergoing its most significant policy changes since the 1990s, increasing confusion for clients and caseworkers alike.

That complexity falls on both caseworkers and clients. Call wait times at state offices routinely exceed an hour, and even well-staffed state agencies struggle to translate thousands of pages of statute into clear, plain-language answers for clients. And with LLMs becoming widely available to the general public (including the integration of Gemini into Google search results), people will increasingly turn to general-purpose LLMs to answer their benefits questions if they cannot reach a more trusted source. This led us to the question: are current general-purpose LLMs able to serve as a first point of contact for those seeking or receiving SNAP benefits?

This is exactly the kind of question Center for Civic Futures (CCF) exists to help answer. Through its Public Benefit Innovation Fund, CCF funds responsible, real-world experimentation to better understand where emerging technologies can improve public services, where they fall short, and what governments need to use them effectively and safely. With support from CCF, we were able to put this question to the test.

If LLMs prove reliable, the impact is tangible: fewer missed deadlines, fewer wrongful denials, and less pressure on stretched caseworkers. If not, the risk is equally tangible and quite serious: people making decisions about their benefits based on incorrect AI-generated guidance.

To study this, we built a benchmark of 459 generalized SNAP question scenarios, with expected response rubrics validated by SNAP policy experts based on applicable SNAP policy and practice considerations. We evaluated leading AI models across four conditions (the raw model itself, with web search capabilities only, with the ability to conduct multi-turn conversations only, and with both web search and multi-turn abilities) to measure not just how well AI can answer, but what kind of support it needs to answer well.

Results

Overall Model Performance

Model Performance Summary (Pass Rate %)

Pass rate across four evaluation conditions, sorted by "Both" score

Model	Neither	Web Search	Multi-Turn	Both
Claude Opus 4.8	36.6	40.6	60.9	68.1
MiniMax-M3	26.9	37.8	51.6	64.1
DeepSeek V4	31.5	35.5	53.3	62.9
Claude Sonnet 4.6	29.1	35.1	54.1	62.4
GLM 5.1	26.0	35.4	53.0	61.8
GPT 5.5	32.2	35.7	58.9	60.9
Gemini 3.5 Flash	37.9	40.6	56.4	59.5
Kimi K2.6	23.0	30.7	50.4	56.6
Claude Haiku 4.5 (Nonthinking)	15.2	22.5	41.0	54.3
Gemini 3.1 Pro Preview (02/26)	34.7	32.1	57.2	53.8
Grok 4.3	24.8	27.0	49.1	51.7
Grok 4.1 Fast (Reasoning)	20.4	18.9	46.2	44.8

Claude Opus 4.8 leads the “both” condition at 68.1% accuracy, with MiniMax-M3 next at 64.1%. The frontier of SNAP performance has moved up, but it still isn’t particularly high. No model crosses 70%, so even the best model in the field misses nearly a third of the graded criteria.

Models’ Performance

Performance by Lifecycle Phase

SNAP questions don’t all look the same. Someone asking “Am I eligible?” faces a very different challenge than someone asking “My hours got cut—do I need to report that?” Our SNAP policy experts categorized all 459 questions into seven phases of the SNAP lifecycle:

Pre-Application: Assessing eligibility questions and informational needs prior to initiating the application process.
Application/Interview: Completing the application, preparing for interviews, or following up on submitted documentation.
Denial/Termination: Contesting benefit denials or benefit terminations resulting from recertification.
Change Reporting: Reporting changes in circumstances and resolving resulting impacts on active benefits.
Recertification: Renewing benefits through the renewal application or recertification interview process.
Use of Benefits: Managing benefit delivery timing, EBT card usage, and account balance inquiries.
Overpayment/Fraud: Addressing overpayment collections, calculation disputes, fraud investigations, or fraud reports.

LLM performance varied meaningfully by phase, although in no phase was accuracy high enough to be trusted.

Phase Difficulty Ranking

Pass rate by SNAP phase—Both condition

Overpayments and fraud was the most challenging phase across the board at 41.5%. It’s also the smallest subset in the sample (n=6), reflecting that these questions arise less frequently than other phases. This may partly explain the poor performance, as models have less training signal and fewer available sources to draw from on rarer topics.

Phase Profiles: Top-3 vs Bottom-3 Models

Pass rate by SNAP phase—Both condition

When comparing phase-wise performance between the top-3 and bottom-3 models, what stands out is how similar their shapes are. Outside of the overpayments/fraud phase (where accuracy may have swung drastically because of the small sample size), LLMs across the board generally share similar strengths and weaknesses. Weaker models don’t fail on different phases; they fail on the same phases by wider margins. This consistency suggests that as models improve, the areas where AI can reliably assist beneficiaries, and where it can’t, are likely to become increasingly predictable.

Performance by Administration Structure

SNAP is state-administered in 43 jurisdictions and administered at the region, district, or county level in 10 states—including high-volume states like California, New York, and Ohio. We might expect AI to perform worse in county-administered states, where local offices have more discretion over procedures and operations, introducing variation that models struggle to capture.

Performance Distribution by Administration Type

Both condition · all models averaged

The data tells a more nuanced story. Median pass rates are similar between the two groups (59.9% for state-administered vs. 57.9% for county-administered), and their interquartile ranges nearly overlap. Administration structure alone does not predict AI performance. County-administered states add an extra layer of procedural variation that models have to navigate, but they are generally high-volume states, so answers to their questions are more likely to appear in models’ training data.

Performance by Topic

When looking at a more granular breakdown of performance by topic, the dispersion grows. In other words, question topic predicts difficulty far better than lifecycle phase: phase-level pass rates span roughly 20 points (41.5%-61.1%), while topic-level pass rates span more than 40 — from certification periods at 86.1% down to denial at 41.2%. This makes topic-level performance a more fine-grained lens for identifying where models underperform.

Topic Difficulty Ranking

Lower = harder—Both condition, all models averaged

Examining the topic-level breakdown, our SNAP experts identified three patterns about where models struggled:

1. Procedural differences break AI. Models consistently struggle when the answer depends on state, county, or local office procedures—the kind of operational detail that isn’t published in federal policy manuals or state websites. Topics like EBT card replacement and missed interview rescheduling require knowing what your specific office does, and that’s where every model falls short.

2. Recent policy changes trip models up. Questions about HR1 policy changes, like immigrant eligibility, scored lower on average. The exception was work requirement exceptions, which ranked in the upper half. We suspect this reflects the recent wealth of discussion around SNAP work requirements, which the latest models may have trained on. Immigrant eligibility changes were announced at the same time, but the apparent lack of discussion around them may explain the performance gap.

3. Complexity compounds errors. When a question requires synthesizing multiple interacting rules—household composition, income deductions, and state-specific thresholds all at once, or calculating self-employment income deductions—accuracy drops. Some of the highest-scoring topics may simply reflect that those particular questions were more straightforward, not that the topic itself is easy.

Models’ Use of Tools

What Helps More: Web Search vs. Conversation

Real caseworkers rely on two things general-purpose models don’t have by default: the ability to ask follow-up questions, and access to current policy references. We tested four conditions to isolate how much each capability—web search and multi-turn conversation—improves performance:

Average Pass Rate by Condition (all models)

Criteria-weighted—all models averaged

	No Web Search	Web Search
Single-turn	28.2% (Neither)	32.7% (Web Search)
Multi-turn	52.7% (Multi-Turn)	58.4% (Both)

Multi-turn conversation—where the AI can ask follow-up questions, just like a caseworker would—is by far the bigger lever. Across all models, switching from single-turn to multi-turn improved scores by roughly 24.5 percentage points, while adding web search alone improved scores by only about 4.5 points.
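The arithmetic behind those two figures can be reproduced from the four averages in the table above. The sketch below is illustrative only: the dictionary keys and variable names are ours and are not part of the benchmark's code.

```python
# Average pass rates (%) by condition, copied from the table above.
avg = {
    ("single", "no_search"): 28.2,  # Neither
    ("single", "search"): 32.7,     # Web Search
    ("multi", "no_search"): 52.7,   # Multi-Turn
    ("multi", "search"): 58.4,      # Both
}

baseline = avg[("single", "no_search")]

# Gain from each capability on its own, relative to the "Neither" baseline.
multi_turn_gain = round(avg[("multi", "no_search")] - baseline, 1)
web_search_gain = round(avg[("single", "search")] - baseline, 1)

# Gain from web search once multi-turn conversation is already enabled.
web_search_gain_with_multi = round(
    avg[("multi", "search")] - avg[("multi", "no_search")], 1
)

print(multi_turn_gain, web_search_gain, web_search_gain_with_multi)
```

On these averages, web search adds 5.7 points on top of multi-turn conversation, slightly more than the 4.5 points it adds to a single-turn exchange.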

This makes intuitive sense. In SNAP eligibility operations, it is well known that the best way to improve SNAP quality control is a well-conducted interview. The more LLMs behave like human caseworkers (the more questions they ask, and the better those questions are), the more accurate their responses.

Web search helps for a different reason: it addresses incomplete knowledge. It fills gaps where a model’s training data lacks coverage of a particular state or county’s policies (coverage) or where rules have changed since the model was last trained (recency). Both gaps are common, and pre-training alone can’t close either one; real-time access to current policy sources can.

The practical implication: any deployment of AI in a benefits context should prioritize conversational capability alongside access to searching the web for up-to-date policy information.

Building on these findings, an important area for future research is how well the models perform with tools when conditioned on trusted official documents.

Tool Use Matters More Than Pre-Training

A natural assumption is that the models with the most SNAP knowledge baked into their training data will perform best. The data tells a different story.

Performance Across Enhancement Conditions

Conditions are independent, not cumulative

GLM 5.1 starts near the bottom of the pack in the baseline (“neither”) condition at 26.0%, the fifth-lowest score. But with the tools, it climbs toward the top of the pack, gaining 9.4 percentage points with web search capabilities, 27.0 percentage points with multi-turn conversations, and 35.9 percentage points with both. A model that began more than 10 points behind Claude Opus 4.8 in the “neither” condition (26.0% vs. 36.6%; Gemini 3.5 Flash was highest at 37.9%) ends up about 6 percentage points behind Opus 4.8, the leader, in the “both” condition.

With its overall gain of 35.9 percentage points over baseline, GLM 5.1 posted one of the largest absolute gains in the evaluation. Claude Haiku 4.5 (+39.0 points) and MiniMax-M3 (+37.3 points) climbed even further from baseline to the “both” condition.
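As a rough check, the baseline-to-“both” gains can be recomputed from the Model Performance Summary table. The sketch below uses a subset of models, and the variable names are ours. Because the table is rounded to one decimal place, the recomputed gains can differ from the figures quoted in the text by about 0.1 point, presumably because the text was computed from unrounded scores.

```python
# ("Neither", "Both") pass rates (%) copied from the summary table.
scores = {
    "Claude Opus 4.8": (36.6, 68.1),
    "MiniMax-M3": (26.9, 64.1),
    "GLM 5.1": (26.0, 61.8),
    "Claude Haiku 4.5 (Nonthinking)": (15.2, 54.3),
}

# Absolute gain in percentage points from "Neither" to "Both".
gains = {
    model: round(both - neither, 1)
    for model, (neither, both) in scores.items()
}

# Rank models from largest to smallest gain.
ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
for model, gain in ranked:
    print(f"{model}: +{gain} points")
```

The ordering matches the text: Claude Haiku 4.5 gains the most, followed by MiniMax-M3 and GLM 5.1, while Claude Opus 4.8 gains less because it starts from a higher baseline.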

In an intricate domain like SNAP, no training dataset can anticipate every beneficiary’s situation. The models that succeed are the ones that compensate for knowledge gaps by searching effectively and asking the right questions.

But not all models use their tools equally well. We measured how many web searches and conversation turns each model used per question:

Web Searches vs Accuracy

Average web searches per question against pass rate. Squares: web search only. Diamonds: both (multi-turn + web search).

Kimi K2.6 ran the most web searches by far—about 58 per question—yet landed at only 56.6% accuracy. Sheer volume didn’t buy accuracy.
Claude Opus 4.8 averaged roughly 19 web searches per question and scored the highest at 68.1%: high accuracy without the highest search volume.
MiniMax-M3 was an example of efficient web searching, averaging about 33 web searches per question and reaching 64.1% accuracy.
Grok 4.1 Fast used among the fewest searches (~13 per question) and scored the lowest (44.8%).

The relationship between search volume and performance is noisy. The heaviest searchers don’t necessarily perform better, and the ability to perform targeted searches matters far more than search volume.

Output Tokens vs Accuracy—by Condition

One dot per (model, condition); shape encodes the condition, color the model. X-axis is total output tokens for the run (log scale).

On output tokens, the pattern holds. Within any given condition, more tokens don’t reliably mean better answers. MiniMax-M3 is a standout on both dimensions: it achieves 64.1% accuracy (close behind Claude Opus 4.8 at 68.1%) in the “both” condition with about half the output tokens. When users are seeking actionable advice, concision is key.

The takeaway for AI developers is clear: optimizing how models use tools matters more than giving them more tools to use. The best models on this benchmark don’t search the most or reason the longest—they know what to look for, when to stop, and how to synthesize what they find.

Cost, Latency, and Who Gets Access

Efficient tool use doesn’t just produce better answers—it also produces cheaper, faster ones. Every unnecessary search and every excess token adds cost and latency, so the same optimization that separates top-performing models also determines whether those models can realistically reach the people who need them most.

Currently, many of the closed-source models that perform best on this benchmark are not freely available. Some sit behind premium API pricing or “Pro” subscription tiers, meaning the people who would benefit most from reliable AI-assisted SNAP guidance are often the least likely to have access to the tools that provide it.

Cost, Latency & Accuracy

The cost data makes the gap between closed- and open-source models concrete. With accuracies within about 4 percentage points of one another, Claude Opus 4.8 costs $1.89 per test, while MiniMax-M3 costs less than one-seventh of that at $0.25 per test.

Latency is high across the board. Claude Opus 4.8 takes around 15 minutes to answer a question on average, and even the faster MiniMax-M3 isn’t far behind at roughly 14 minutes. A model that takes that long isn’t practical, regardless of its accuracy.

As model capabilities continue to improve, the question isn’t just how good AI can be at navigating public benefits; it’s also who gets to use it. Making models more efficient, so that frontier capabilities reach the populations they could help most, should be an imperative, not an afterthought.

What This Means for SNAP State Agency Leadership and Staff

The bottom line: general-purpose AI is not yet a reliable source of information about SNAP, and states are well-positioned to help address the misinformation these tools can produce. Under the most realistic conditions we tested, nothing suggests AI can be trusted with benefits guidance. Clients almost certainly aren’t aware of these issues and are already turning to ChatGPT, Claude, and other publicly available LLMs for the same kinds of questions in our dataset. Based on our results, they are likely receiving misleading, incomplete, or outdated information. This is the reality that agencies and advocates need to contend with regardless of whether they ever deploy AI themselves.

In the near term, states can head off misinformation directly by steering residents away from LLMs and toward trusted sources:

Providing clear, plain-language information about frequently-asked topics could reassure clients and reduce the temptation to ask LLMs high-stakes questions instead of consulting trusted sources.
Publishing more authoritative policy and process documentation online is a first step that lays the groundwork for the longer-term opportunities below.

Over the longer term, investments in customer support technology could make a real difference. What holds AI back on SNAP isn’t model capability alone; it’s that the most authoritative information about how the program works at the state and local levels isn’t publicly available in formats AI systems can reliably use. This isn’t a gap AI labs can close on their own—it will take effort from both sides, and states are uniquely placed to help. A couple of approaches could help:

Publishing an llms.txt file: a small markdown file telling LLMs which pages to cite as authoritative and where to route users to a hotline instead of guessing. Maryland, one of only two U.S. states and territories using one, explicitly tells models not to paraphrase emergency alerts and to refer users to designated channels.
Building benefits-focused chatbots grounded in agencies’ internal sources (internal process manuals, policy change notices, state option reports, local office procedures, and SNAP training for eligibility workers) through retrieval-augmented generation (RAG), narrowly scoped (e.g., to a single state or a specific phase of the benefits process) and rigorously validated against real beneficiary questions before use.

For AI Developers

Partner with state agencies to improve critical data access. General model improvements alone won’t make models accurate enough: the bottleneck is data, not model capability, and the stakes of being wrong are too high to let inaccuracies persist. AI developers should take the initiative by proactively engaging state agencies to obtain better training data and by building publicly searchable repositories of authoritative documents that LLMs can draw on.

Invest in tool use quality, not knowledge coverage. With nuanced rules that vary by state, county, and household circumstance, no training corpus can anticipate every situation a beneficiary brings. What separates top models is how well they ask follow-up questions and search selectively for current policy. That same efficiency reduces cost and latency—democratizing access to reliable AI-assisted SNAP guidance for the populations who need it most.

Methodology

We developed an evaluation dataset starting in March 2026, with two crucial elements for each question: the “base” query, which is what someone might input into a chatbot interface, and demographic / contextual information about the user’s circumstances. For example, someone might start off with a “base” query of, “Does a one-time work bonus affect SNAP eligibility in Indiana?” and then follow up with contextual information over the course of the conversation, such as the amount of the bonus and how often such a bonus might be given.

The complete dataset is 459 questions, spanning all 50 U.S. states, plus Guam and the Virgin Islands as territories. Each question has an average of seven evaluation criteria for a reliable response, and the accuracy metric for models is the number of criteria they pass out of the 2,931 total criteria in the dataset.
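Read literally, this metric pools criteria across questions rather than averaging each question's own score, so questions with more criteria carry more weight. The following is a minimal sketch of that distinction. The function names and the `results` layout (question id mapped to one pass/fail flag per criterion) are assumptions for illustration; the benchmark's actual data format is not shown here.

```python
from typing import Dict, List


def criteria_weighted_pass_rate(results: Dict[str, List[bool]]) -> float:
    """Percent of all rubric criteria passed, pooled across questions."""
    total = sum(len(flags) for flags in results.values())
    if total == 0:
        raise ValueError("no criteria to score")
    passed = sum(sum(flags) for flags in results.values())
    return 100.0 * passed / total


def per_question_average(results: Dict[str, List[bool]]) -> float:
    """For comparison only: mean of each question's own pass rate."""
    rates = [100.0 * sum(flags) / len(flags) for flags in results.values()]
    return sum(rates) / len(rates)


# Made-up toy data: one question with four criteria, one with two.
toy = {
    "q1": [True, True, True, False],
    "q2": [False, False],
}

pooled = criteria_weighted_pass_rate(toy)  # 3 of 6 criteria passed
unpooled = per_question_average(toy)       # mean of 75% and 0%
print(pooled, unpooled)
```

On the toy data the pooled rate is 50.0 while the per-question average is 37.5, which shows why the choice of weighting matters when questions have different numbers of criteria.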

Dataset

We evaluated models on our test dataset (230 questions), a randomly-selected subset which is only used for published benchmarks. The dataset covers seven phases of the SNAP lifecycle and 25 topics, all validated and categorized by SNAP policy experts.

Topic × Phase Map

Which topics appear in each benefits process phase

Pre-application

household composition, income, citizenship & alien status, disability, work requirements, income deductions, student eligibility, resources

Application / Interview

interview, income, household composition, verification, income deductions, resources, disability, residency, student eligibility

Determination – Use of Benefits

ebt, work requirements, expungement, income, income deductions, resources, issuance, replacement benefits, student eligibility, verification

Change Reporting

income, work requirements, household composition, change reporting, resources, interstate transfer, overpayment, termination, verification, income deductions, residency, periodic report, quality control review

Recertification

work requirements, student eligibility, interview, issuance, income deductions, termination, certification periods, verification, residency, denial

Overpayments / Fraud

fraud, claims

Denial / Termination

income, student eligibility, work requirements, resources, denial, verification, income deductions, interstate transfer, disability

The benchmark questions are SNAP scenarios designed to reflect recurring, real-world benefit-navigation issues which SNAP clients face and are validated by SNAP policy experts. This means the dataset is weighted toward common pain points rather than uniformly distributed across all possible SNAP topics.

Evaluation Pipeline

Three-model system for SNAP question evaluation

Three model roles. The benchmark uses three distinct AI models in different roles to simulate realistic multi-turn conversations:

Target model—the model being evaluated. It receives the SNAP question and must provide a helpful response.
Auditor model—plays the role of the user with the query. It answers the target model’s follow-up questions with accurate context. GPT-5.5 was selected as auditor after achieving 99.89% fidelity to the given persona instructions.
Judge model—grades the target model’s response against the expert rubric. Claude Opus 4.7 was selected after all four candidate judges (Claude Opus 4.7, GPT 5.5, Gemini 3.1 Pro, Grok 4.3) were assessed for their alignment with SNAP policy experts’ grading.
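The interaction between the three roles can be sketched as a simple loop. This is a hedged illustration, not the benchmark's implementation: the function name `run_conversation`, the callable signatures, the turn cap, and the stub models are all hypothetical, and the real stopping rule and prompts are not described here.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (speaker, text)


def run_conversation(
    question: str,
    rubric: List[str],
    target: Callable[[List[Message]], str],
    auditor: Callable[[List[Message]], str],
    judge: Callable[[List[Message], List[str]], List[bool]],
    max_target_turns: int = 3,  # assumed cap; the real stopping rule is not stated
) -> List[bool]:
    """Alternate target and auditor turns, then grade the transcript."""
    transcript: List[Message] = [("user", question)]
    for turn in range(max_target_turns):
        transcript.append(("target", target(transcript)))
        if turn < max_target_turns - 1:
            # The auditor plays the user and supplies follow-up context.
            transcript.append(("user", auditor(transcript)))
    # The judge returns one pass/fail flag per rubric criterion.
    return judge(transcript, rubric)


# Stub models so the sketch runs without any API access.
def stub_target(transcript: List[Message]) -> str:
    if len(transcript) == 1:
        return "How large was the bonus?"
    return "Here is my answer."


def stub_auditor(transcript: List[Message]) -> str:
    return "It was a single payment from my employer."


def stub_judge(transcript: List[Message], rubric: List[str]) -> List[bool]:
    # Toy grading rule: pass every criterion if the last reply is an answer.
    final_reply = transcript[-1][1]
    return [final_reply.endswith("answer.") for _ in rubric]


flags = run_conversation(
    "Does a one-time work bonus affect SNAP eligibility in Indiana?",
    ["criterion A", "criterion B"],
    stub_target,
    stub_auditor,
    stub_judge,
    max_target_turns=2,
)
print(flags)
```

Keeping the three roles as separate callables mirrors the separation described above: the target never sees the rubric, and the judge never takes part in the conversation.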

Judge Confusion Matrices vs SNAP Expert Ratings

Each judge's 124 graded criteria broken into agreements and disagreements with the expert

Judge	Accuracy
Claude Opus 4.7	80.6%
GPT 5.5	79.8%
Gemini 3.1 Pro Preview (02/26)	79.0%
Grok 4.3	79.0%

Outcome composition legend:
True Pass: judge and expert both passed
True Fail: judge and expert both failed
False Fail: judge failed, expert passed
False Pass: judge passed, expert failed
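Judge accuracy here can be read as the share of graded criteria on which judge and expert agree, that is, True Pass plus True Fail over all four outcomes. The sketch below assumes that reading. The per-cell counts are hypothetical, since only the total of 124 criteria and the accuracy percentages are reported; the function name is ours.

```python
def judge_accuracy(true_pass: int, true_fail: int,
                   false_fail: int, false_pass: int) -> float:
    """Percent of graded criteria where judge and expert agree."""
    total = true_pass + true_fail + false_fail + false_pass
    if total == 0:
        raise ValueError("no graded criteria")
    return 100.0 * (true_pass + true_fail) / total


# Hypothetical split of 124 criteria; only the total comes from the text.
example = dict(true_pass=70, true_fail=30, false_fail=14, false_pass=10)
accuracy = round(judge_accuracy(**example), 1)
print(accuracy)
```

With 124 criteria, 100 agreements round to 80.6%, so the top judge's reported accuracy is consistent with 100 agreements and 24 disagreements, however those split between cells.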

Four evaluation conditions. Models were tested in four independent conditions to simulate the actual web interfaces users interact with:

Baseline—a single prompt-response exchange with no tools.
Web Search—baseline with access to web search.
Multi-Turn—a multi-turn conversation where the auditor model plays the original poster.
Both—multi-turn with web search enabled.

This design lets us isolate the marginal contribution of each capability.
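One way to picture the design is as a 2×2 grid over two on/off capabilities. The sketch below is an assumption about how such a grid could be represented; the `Condition` class and its field names are hypothetical and do not come from the benchmark's code.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Condition:
    web_search: bool
    multi_turn: bool

    @property
    def label(self) -> str:
        """Name of the condition as used in this report."""
        if self.web_search and self.multi_turn:
            return "Both"
        if self.web_search:
            return "Web Search"
        if self.multi_turn:
            return "Multi-Turn"
        return "Baseline"


# Enumerate every combination of the two capabilities.
CONDITIONS = [
    Condition(web_search=ws, multi_turn=mt)
    for mt, ws in product([False, True], repeat=2)
]
labels = [c.label for c in CONDITIONS]
print(labels)
```

Because each capability is toggled independently, comparing two conditions that differ in exactly one flag gives that capability's marginal contribution.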

Selection pipeline. The judge and auditor models both affect the target models’ performance and scores, so we used a multi-stage process to systematically select each component:

Run a default target model (GPT 5.5) in the multi-turn condition with a default auditor model (GPT 5.5)
Run alignment tests between judge model candidates and human experts’ evaluations on 20 target model outputs to determine most-aligned judge model (Opus 4.7)
Run a default target model (Gemini 3.5 Flash) in the both condition with different auditor models to determine most-aligned auditor model (GPT 5.5)
Run all target models across all four conditions with top-performing auditor (GPT 5.5) and judge models (Opus 4.7)

We’ll continue using Opus 4.7 as our judge model and GPT 5.5 as our auditor model for this benchmark. A sample conversation with all the target models we assessed, the criteria they were graded on, and the instructions for the auditor model are displayed below.

Sample Auditor-Target Transcript

The auditor model used the following information for the query to generate its opening question and subsequent responses in the conversation:

Input question: How long do remaining SNAP benefits stay on my EBT card after I'm no longer eligible in Missouri?

Input context

• Recently got a good paying job and no longer qualifies for SNAP
• Located in Missouri
• Has remaining balance on EBT card
• Heard conflicting information: 9 months, activity-dependent, one year from last deposit, never expires
• Unable to contact agency due to work schedule conflicts with their hours
• Has processed case discontinuation to avoid overpayment issues

How long do remaining SNAP benefits stay on my EBT card after I'm no longer eligible in Missouri?

Short answer

In Missouri, losing SNAP eligibility does not erase the benefits already on your EBT card — you can keep spending them after your case closes. However, each monthly benefit deposit is permanently removed ("expunged") 274 days (about 9 months) after the date it was issued to your account. So your remaining balance stays usable for up to roughly 9 months, with each month's deposit expiring on its own 274-day clock.

The details

1. Case closure doesn't take your balance away. Benefits that were already issued to your EBT account belong to your household. Even if you lose SNAP eligibility and your case is closed, you can still use any funds left on your card at any SNAP-authorized retailer. Your Missouri EBT card itself doesn't stop working just because the case closed (and you can use the same card if you requalify later).

2. Missouri's expungement rule: 274 days from issuance, regardless of card activity. Missouri's official DSS policy (Family Support Division EBT Manual, Section 0150.045.00, "Expungement of Unused Benefits," updated by memo IM-125, Dec. 8, 2021) states:

"Effective November 14, 2021, to comply with the 2018 Farm Bill, SNAP benefits not used 274 days after date of issuance are expunged daily, regardless of account activity."

This is important because Missouri uses the stricter of the two options federal rules allow (7 CFR 274.2; 85 FR 52028, Aug. 24, 2020). Some states only expunge after 9 months of total account inactivity, but in Missouri each monthly allotment ages off 274 days after the date it was loaded, even if you're still using the card. (The USDA has recently encouraged all states to use this issuance-based approach — FNS "SNAP Expungement Options" memo, Nov. 18, 2025.)

3. How this plays out after your case closes.

Benefits are spent oldest-first (first-in, first-out), so when you shop, you automatically draw down your oldest deposit.
Example: if your last benefit month was June, that June deposit remains usable until roughly early March of the next year (274 days later). Any leftover benefits from earlier months expire sooner — 274 days after each of their own issuance dates.
The only other removal trigger in Missouri's manual is if all household members are deceased.

4. Expunged benefits are gone for good. Once benefits are expunged they cannot be restored or replaced, so it's best to spend down your balance well before the 9-month mark. Federal rules also require households to be notified before benefits are permanently expunged (states must give advance notice, generally 30 days, or provide the expungement policy in writing at issuance).

Practical tips

Don't throw away your EBT card — it keeps working for your remaining balance and can be reused if you reapply.
Check your balance: call Missouri EBT customer service at 1-800-997-7777 (24/7), use the ebtEDGE portal/app, or check your last store receipt.
Questions about your case: Missouri Family Support Division Information Line, 855-373-4636, or mydss.mo.gov.

Sources

Missouri DSS Manuals — EBT Manual §0150.045.00 "Expungement of Unused Benefits" (dssmanuals.mo.gov/general-information/ebt-manual/0150-045-00) — Missouri's 274-days-from-issuance expungement rule, effective Nov. 14, 2021.
Missouri DSS EBT Manual §0150.040.00 "Availability of Benefits" (dssmanuals.mo.gov/general-information/ebt-manual/0150-040-00).
Federal Register, Final Rule, Aug. 24, 2020 (85 FR 52028) — 2018 Farm Bill Section 4006 expungement provisions; state option of 274 days after issuance vs. 9 months of inactivity; advance-notice requirement (federalregister.gov/documents/2020/08/24/2020-16403).
USDA FNS "SNAP – Expungement Options" memo, Nov. 18, 2025 (fns.usda.gov/snap/ebt/expungement) — states strongly encouraged to expunge allotments nine months after issuance.
National Council on Aging, "How Long Do SNAP Benefits Last?" (ncoa.org/article/how-long-do-snap-benefits-last) — confirms remaining EBT funds stay usable after a SNAP case closes.
USDA FNS EBT Customer Service Information (fns-prod.azureedge.us/.../ebt-customer-service-info) and mydss.mo.gov — Missouri EBT customer service line 800-997-7777; FSD Information Line 855-373-4636.

Thanks, this is super helpful. A couple quick follow-ups so I don’t mess this up:

How can I see the exact deposit/issuance dates for each month’s benefits? Is that in the ebtEDGE app/portal or do I have to call the EBT customer service line to hear each load date?
Using the card doesn’t reset or extend any of the 274‑day clocks for older deposits, correct?
Will Missouri send me a warning before any benefits are expunged (letter, text, or email), or do I just need to track the dates myself?
If my last deposit posted around June 10, would that one expire roughly in mid‑March next year (about 274 days later)?

Here are answers to each of your four follow-ups, verified against Missouri's policy manual, the federal EBT regulation, and the ebtEDGE tools.

1. How to see the exact deposit/issuance dates

The easiest method: your Missouri deposit date is the same calendar day every month. Missouri loads regular SNAP benefits on a fixed date between the 1st and 22nd of each month, determined by your birth month and the first letter of your last name — and it loads on that date even if it falls on a weekend or holiday (MO EBT Manual §0150.040.00; the "Monthly EBT Benefit Schedule" chart is on mydss.mo.gov). So if your deposits post on the 10th, every past allotment was also issued on the 10th of its month — you can reconstruct all your issuance dates without any tool.

To see actual posted deposits:

ebtEDGE app / cardholder portal (cardholder.ebtedge.com) — FIS (Missouri's EBT vendor) states the app lets you "check your EBT balance, view recent deposits and benefit schedules, and review your transaction history." Deposits/benefit authorizations show with dates. One caveat: the history filter runs 7/30/60 days or "All," but how far back "All" goes is set by each state (some states configure up to 12 months), so a deposit from many months ago may have scrolled off the app view.
Phone: 1-800-997-7777 (Missouri EBT customer service, 24/7). The automated system gives your balance and recent activity; a live agent can look up deposit dates/transaction history beyond what the app shows.

Important honest caveat: neither the app, the portal, nor the phone line shows a per-deposit "expires on" date or countdown. You have to pair each deposit date with the 274-day rule yourself (deposit date + 274 days = expungement date for whatever remains of that deposit). A simple written log of each month's deposit date and amount is genuinely the most reliable tracker.

2. Correct — card use does NOT reset or extend any clock in Missouri

Missouri uses the stricter "unused benefits" option in the federal rule: each allotment is expunged 274 days after issuance "regardless of any account activity that may have taken place" (7 CFR 274.2(i)(1)(ii)), and Missouri's manual mirrors this: benefits "are expunged daily, regardless of account activity" (MO EBT Manual §0150.045.00). The activity-resets-the-clock feature exists only in the alternate inactivity-based method (7 CFR 274.2(i)(1)(i)(B)), which Missouri does not use.

However, spending still "rescues" aging benefits in the only way possible: federal rules require transactions to draw down benefits first-in-first-out — "the oldest SNAP benefits are used first" (7 CFR 274.2(i)(1)). So every purchase automatically burns down the deposit closest to expiring. The only way to prevent a given month's deposit from being expunged is to spend it (in full) before its own day 274.
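
As an illustration of the two rules described above (first-in-first-out drawdown, and a separate 274-day clock per deposit), here is a minimal Python sketch. The `Allotment` class, the `spend` and `expunge` helpers, and the dollar amounts are hypothetical and chosen for illustration only; the sketch also assumes expungement happens on the 274th day after issuance, which is how the answer above computes it.

```python
from dataclasses import dataclass
from datetime import date, timedelta

EXPUNGE_AFTER_DAYS = 274  # figure quoted in the answer above


@dataclass
class Allotment:
    issued: date
    remaining_cents: int


def spend(allotments: list[Allotment], amount_cents: int) -> int:
    """Draw a purchase down oldest-allotment-first; return any uncovered amount."""
    for allotment in sorted(allotments, key=lambda a: a.issued):
        if amount_cents == 0:
            break
        used = min(allotment.remaining_cents, amount_cents)
        allotment.remaining_cents -= used
        amount_cents -= used
    return amount_cents


def expunge(allotments: list[Allotment], today: date) -> list[Allotment]:
    """Keep only allotments that still hold funds and are younger than 274 days."""
    cutoff = timedelta(days=EXPUNGE_AFTER_DAYS)
    return [a for a in allotments
            if a.remaining_cents > 0 and today - a.issued < cutoff]


# Hypothetical card: $50.00 left from May 10 and $100.00 from June 10.
card = [Allotment(date(2025, 5, 10), 5_000), Allotment(date(2025, 6, 10), 10_000)]
spend(card, 7_000)                      # uses all of May first, then $20.00 of June
card = expunge(card, date(2026, 2, 8))  # May is gone; June is only 243 days old
print([(a.issued.isoformat(), a.remaining_cents) for a in card])
# [('2025-06-10', 8000)]
```

In this sketch a purchase never touches a newer deposit until every older one is empty, and using the card changes only balances, never the issuance dates that drive expungement.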

3. Will Missouri warn you first? Expect a mailed letter — but don't rely on it

Federal regulation requires states to send a notice "not later than 30 days before benefit expungement is scheduled," and the notice must state the scheduled expungement date and the steps to prevent it (7 CFR 274.2(i)(2)). USDA's October 2024 Federal Register notice confirms the 30-day advance notice obligation applies whether a state expunges 9 months after issuance (Missouri's method) or after 9 months of inactivity. (The one exception: no notice is required where the case closed because all household members are deceased.)
This is a paper letter to the mailing address on file — Missouri does not send official expungement warnings by text or email. So if you move after your case closes, update your address with the Family Support Division (855-373-4636 or mydss.mo.gov), or the warning may never reach you.
Practical reality check: Missouri's public EBT manual doesn't spell out its notice procedure, and USDA's Nov. 18, 2025 expungement memo shows states retain some flexibility in how the statutory notice duty is satisfied. Treat any letter as a backstop, not your plan — track the dates yourself.
Security note: Missouri DSS has warned about EBT text-message/phishing scams. A legitimate notice will never ask for your card number or PIN, and real expungement notices come by mail — be suspicious of any text about your benefits.

4. Your June 10 math — very close; it's actually early-to-mid March

June 10 + 274 days = March 11 of the next year (March 10 if the span crosses a leap-year February 29). Count: June 20 remaining days + July 31 + Aug 31 + Sep 30 + Oct 31 + Nov 30 + Dec 31 + Jan 31 + Feb 28 = 263, plus 11 days into March. Missouri runs expungement daily at the allotment level, so whatever remains unspent from that June 10 deposit would be removed on/about that date. To leave yourself margin (and since you shouldn't count on the warning letter), aim to have that deposit fully spent by late February.

Also remember: that clock is only for the June 10 deposit. Any older deposits still on the card expire earlier — each dies 274 days after its own issuance date (e.g., leftover May 10 benefits would be expunged around February 8), though FIFO means your everyday spending has likely consumed the oldest ones first.
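
The date arithmetic in this section can be checked with a few lines of Python. This is a sketch under the assumption that the expungement date is simply the issuance date plus 274 calendar days; `expungement_date` is a hypothetical helper name, and the years are arbitrary examples.

```python
from datetime import date, timedelta

EXPUNGE_AFTER_DAYS = 274  # figure quoted in the answer above


def expungement_date(issued: date) -> date:
    """Return the issuance date plus 274 calendar days."""
    return issued + timedelta(days=EXPUNGE_AFTER_DAYS)


print(expungement_date(date(2025, 6, 10)))  # 2026-03-11
print(expungement_date(date(2027, 6, 10)))  # 2028-03-10 (span crosses Feb 29, 2028)
print(expungement_date(date(2025, 5, 10)))  # 2026-02-08
```

Letting `datetime` handle month lengths and leap years avoids the manual day count and reproduces the March 11, March 10 (leap year), and February 8 dates given above.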

Sources

Missouri DSS EBT Manual §0150.045.00, "Expungement of Unused Benefits" (dssmanuals.mo.gov/general-information/ebt-manual/0150-045-00) — 274 days from issuance, expunged daily regardless of account activity, effective 11/14/2021.
Missouri DSS EBT Manual §0150.040.00, "Availability of Benefits" (dssmanuals.mo.gov/general-information/ebt-manual/0150-040-00) — fixed monthly load dates (1st–22nd) by birth month/last-name initial, loaded even on Sundays/holidays.
7 CFR 274.2 (via Cornell LII, law.cornell.edu/cfr/text/7/274.2) — (i)(1) FIFO/"oldest SNAP benefits are used first"; (i)(1)(ii) expungement 274 days after issuance "regardless of any account activity"; (i)(2) 30-day advance notice with scheduled date and prevention steps; (i)(3) expunged benefits "shall not be reinstated"; (i)(4) death-match exception to notice.
Federal Register, Oct. 15, 2024, SNAP Benefit Expungement and Off-Line Storage information collection (federalregister.gov/documents/2024/10/15/2024-23727) — 30-day advance notice applies to both expungement methods.
USDA FNS "SNAP – Expungement Options" memo, Nov. 18, 2025 (fns.usda.gov/snap/ebt/expungement).
FIS ebtEDGE app page (fisglobal.com/ebtedgemobile) — view balance, "recent deposits and benefit schedules," transaction history; NY OTDA ebtEDGE Cardholder Portal guide (otda.ny.gov/workingfamilies/ebt/ebtEDGE-Cardholder-Portal.pdf) — 7/30/60-day/All filters; "Your state decides the total number of days of transaction history."
mydss.mo.gov EBT pages — Monthly EBT Benefit Schedule; FSD Information Line 855-373-4636; USDA FNS EBT Customer Service list — Missouri EBT line 800-997-7777.
Missouri DSS press release, Aug. 19, 2022 (dss.mo.gov/press/08-19-2022-dss-warns-ebt-scheme.htm) — warning about EBT credential-harvesting/phishing scams.

I can keep using the SNAP already on my Missouri EBT card after my case closed. Each month’s deposit expires on its own 274‑day clock from the date it was issued, regardless of whether I use the card. Purchases spend the oldest funds first. My deposit day is the same date every month, and I can see past deposits in ebtEDGE or by calling 800-997-7777, but there’s no built‑in expiration countdown. Missouri should mail a 30‑day warning before expungement, but I shouldn’t rely on it—best to spend remaining funds well before the ~9‑month mark.

web_search, parse_html_page, retrieve_information, submit_final_result

Rubric Checks

3/5 passed (60.0%)

Future Iterations

This benchmark is a first pass, and we’re aware of its gaps. The dataset underrepresents certain critical topics, particularly citizenship and immigration-related questions. State coverage is also uneven: California and New York account for a disproportionate share of questions, while many states have only a handful. We also aim to classify questions by “stakes” for a more fine-grained analysis: a question about a local office’s hours going unanswered has very different implications from a user being misled about their eligibility.

Most urgently, the One Big Beautiful Bill Act (OBBBA/H.R. 1), signed into law in July 2025, introduced sweeping changes to SNAP (expanded work requirements, new non-citizen eligibility restrictions, state cost-sharing tied to error rates, and administrative funding shifts) that are already generating new categories of beneficiary confusion. Many questions about these policy changes, especially work requirements, already appear in our dataset, and we aim to expand that coverage in future iterations of this benchmark. This is exactly the kind of rapidly evolving, high-stakes territory where AI guidance needs to be tested most rigorously.

Citation

Citation (BibTeX)

@misc{snap2026,
title        = {Public Benefits Bench: Can AI Help People Navigate SNAP Benefits?},
author       = {Kotcherlakota, Meghana and Almatov, Omar and Krishnan, Rayan},
year         = {2026},
month        = jun,
howpublished = {Vals AI},
url          = {http://vals.ai/benchmarks/public-benefits-bench},
}