May 28, 2026

By Vaughn L. Woods, CFP®, MBA

Senior Portfolio Manager | Vaughn Woods Financial Group, Inc.

Executive Summary

Every time someone asks an artificial intelligence system a question, something extraordinary happens — and almost no one sees it. The machine does not simply “look up” an answer. It processes language one tiny piece at a time, called a token, and assembles meaning from those pieces faster than any human mind could follow. What that process costs — in electricity, in hardware, in capital — is quietly reshaping the global economy. Understanding context, and what it takes to build it computationally, is rapidly becoming one of the most consequential ideas in both technology and investment strategy. For wealth management clients and advisors alike, the token economy is not an abstraction. It is the infrastructure beneath the next generation of asset creation.

The Word That Runs the World

Consider the sentence: “The bank was steep.”

Is “bank” a place to keep money? Or the edge of a river?

A seven-year-old decides instantly. Why? Because that child’s brain uses surrounding words, lived experience, and memory to fill in the meaning. The word “steep” points toward a riverbank, not a savings account. That process — using surrounding information to determine what something means — is called context.

Artificial intelligence systems face the same challenge, but at a scale almost impossible to imagine. Every word fed into an AI must be broken into tokens — roughly three-quarters of an English word each — before the system can process meaning. A single sentence contains about twenty tokens. A detailed legal contract, a financial plan, or a research report could run fifty thousand tokens or more. And every single one of those tokens must be paid for, processed, and held in computational memory simultaneously.
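The three-quarters rule of thumb above can be turned into a quick estimate. The sketch below is only an approximation: real tokenizers vary by model and by text, and the word counts used here are hypothetical examples chosen to line up with the figures in the paragraph.

```python
# Rule of thumb from the text: one token is roughly three-quarters of an English word.
WORDS_PER_TOKEN = 0.75  # approximation; actual tokenizers differ by model and text


def estimate_tokens(word_count: int) -> int:
    """Estimate how many tokens a passage of the given word count would use."""
    return round(word_count / WORDS_PER_TOKEN)


# Hypothetical word counts, chosen only for illustration.
for label, words in [("single sentence", 15), ("long report", 37_500)]:
    print(f"{label}: {words:,} words is about {estimate_tokens(words):,} tokens")
```

Under this assumption, a fifteen-word sentence comes to about twenty tokens, and a report of 37,500 words comes to about fifty thousand.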

This is not a footnote to the AI revolution. It is the revolution.

The Brain Does It Too — Just Differently

In 2005, I began a thesis on Context, published in 2007, examining different human intelligence models and how they could be accommodated within a unified framework. My central conclusion was this: the human brain represents the world ten to one hundred times every second in what can be described as a neurobiological global-overlay-of-events. Call it worldview. Call it learning events. Call it an anticipatory schema, an integrated meaning field, a valence weighting map, a narrative compression engine — the naming matters less than the mechanism.

Think of it. Every fraction of a second, the brain assembles a fresh snapshot of reality, blending sensory input, memory, emotion, culture, and language into a single coherent experience. Refresh. Refresh. Refresh. That is what I called context in 2005. It is not a feature of cognition. It is the operating environment within which all cognition occurs.

By 2006, I was interviewing doctoral researchers on the topic — neuroscientists at UC San Diego, Vanderbilt University Brain Institute, and the UC Davis Center for Neuroscience, each engaged in fascinating computational research at the frontier of brain science. None of them, at the time, saw meaning-making as a tractable scientific variable. Qualitative expressions of human intelligence — the felt sense of understanding, the will behind a decision, the weight of a memory — could not be measured. No metric existed.

But that was then. Today, there is a metric. It is measured in tokens.

The Price of Understanding

Less than three years ago, processing one million tokens through a leading AI system cost approximately thirty dollars. Today, that same level of performance costs under one dollar, a decline of more than thirtyfold in roughly thirty months. Epoch AI data shows that inference prices are falling by a factor of nine to nine hundred per year depending on the task category, with a median of fifty times per year. Gartner projects that inference costs will fall more than ninety percent further by 2030.
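To put the headline figures on an annual footing, the short sketch below converts a thirty-dollar to one-dollar decline over thirty months into a yearly rate. It assumes a constant rate of decline, which is a simplification, and because the ending price is described as "under one dollar," the result is a lower bound.

```python
def annualized_decline_factor(start_price: float, end_price: float, months: float) -> float:
    """Per-year factor by which the price falls, assuming a constant rate of decline."""
    total_factor = start_price / end_price
    return total_factor ** (12 / months)


# Figures from the text: about $30 per million tokens falling to about $1 in roughly 30 months.
factor = annualized_decline_factor(30.0, 1.0, 30)
print(f"Implied decline: about {factor:.1f}x per year")
```

The result is a little under four times per year. That is well below the fifty-times median attributed to Epoch AI, which suggests the two figures are measuring different things; the comparison is offered only as a sanity check on the arithmetic.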

But there is a critical twist that every investor must understand. Even as the cost per token falls, total AI expenditure is rising — because each task now requires vastly more tokens than before. Asking AI “What is the capital of France?” might use thirty tokens. Asking AI to review a financial plan, cross-reference current tax law across multiple jurisdictions, compare three portfolio strategies, and produce a recommendation memo might use three hundred thousand tokens — ten thousand times more. As Gartner framed it directly: “It will unlock higher-value applications. Those applications are going to be more expensive, not less.”

This is the defining paradox of the token economy: deflation at the unit level, inflation at the application level. Understanding which layer you are investing in determines whether you are positioned in a margin-compressing commodity or a durable, high-powered growth engine.
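The paradox is easy to verify with the article's own numbers. The sketch below pairs the simple thirty-token question with the older price and the three-hundred-thousand-token analysis with today's price; that pairing is an illustrative assumption, not a claim about any specific product.

```python
def task_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of one task: tokens used times the price per token."""
    return tokens / 1_000_000 * price_per_million


# Figures from the text; pairing each task with each price is an illustrative assumption.
simple_then = task_cost(30, 30.00)        # simple question at the older price
complex_now = task_cost(300_000, 1.00)    # multi-step analysis at today's price

price_drop = 30.00 / 1.00                 # unit price fell about 30x
token_growth = 300_000 / 30               # tokens per task rose 10,000x
cost_growth = complex_now / simple_then   # cost per task still rose

print(f"Simple question then: ${simple_then:.4f}")
print(f"Complex analysis now: ${complex_now:.2f}")
print(f"Cost per task grew about {cost_growth:.0f}x")
```

A thirtyfold drop in unit price set against a ten-thousandfold rise in tokens per task leaves the cost of the task roughly 333 times higher. That is deflation at the unit level and inflation at the application level in a single line of arithmetic.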

What Makes Context So Hard to Build

When an AI processes a long document, it does not read it line by line the way a human skims a report. It loads every token into computational working memory, known as a context window, and examines the relationship between every token and every other token simultaneously. The number of relationships therefore grows with the square of the document’s length. For a document containing one hundred thousand tokens, that means evaluating ten billion token-to-token relationships in a fraction of a second.
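The ten-billion figure follows from simple squaring. The sketch below assumes plain, unoptimized attention of the kind described in Vaswani et al. (2017), where every token is compared with every token; production systems may use shortcuts that reduce the count.

```python
def attention_pairs(num_tokens: int) -> int:
    """Token-to-token relationships when every token is compared with every token."""
    return num_tokens * num_tokens


# Sentence, long report, and very long document, using token counts from the text.
for n in (20, 50_000, 100_000):
    print(f"{n:>7,} tokens -> {attention_pairs(n):>14,} relationships")
```

Doubling the length of a document quadruples the work, which is why long-context tasks are disproportionately expensive.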

This requires memory technology capable of moving data at speeds measured in terabytes per second. The chip architecture that makes this possible is High Bandwidth Memory: memory dies stacked vertically and packaged directly alongside the processing units, so that data can travel between storage and compute with minimal delay. Without it, building context at scale would be physically impossible. The context window is not software. It is a hardware problem masquerading as a software feature.

The Token Tier Economy

By 2026, a tiered pricing structure for AI inference has emerged, mirroring with striking precision how airlines price seats or how financial services price execution:

Tier	Speed	Cost per 1M Tokens (USD)
Free/Batch	Hours	~$0
Standard	Minutes	~$0.30 – $3
High Priority	Fast	~$5 – $15
Premium/Agentic	Real-Time	~$25 – $180
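To see what the tiers mean for a single piece of work, the sketch below prices a three-hundred-thousand-token task, the size of the multi-step financial analysis described earlier, at each paid tier. The price ranges are taken from the table; the idea that one task is billed entirely at one tier is a simplifying assumption.

```python
# Price ranges per one million tokens, in dollars, taken from the table in the text.
TIERS = {
    "Standard": (0.30, 3.00),
    "High Priority": (5.00, 15.00),
    "Premium/Agentic": (25.00, 180.00),
}


def cost_range(tokens: int, low_price: float, high_price: float) -> tuple[float, float]:
    """Low and high dollar cost for a task of the given size at one tier."""
    millions = tokens / 1_000_000
    return millions * low_price, millions * high_price


task_tokens = 300_000  # size of the multi-step analysis example in the text
for tier, (low_price, high_price) in TIERS.items():
    low, high = cost_range(task_tokens, low_price, high_price)
    print(f"{tier:<16} ${low:>6.2f} to ${high:>6.2f}")
```

The same task runs from about nine cents at the bottom of the Standard tier to about fifty-four dollars at the top of the Premium tier, a spread of six hundred to one.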

The economic logic at the infrastructure layer is equally striking. A single server rack costing five million dollars in hardware is projected to generate approximately seventy-five million dollars in token revenue over its operational lifetime: fifteen dollars of revenue for every dollar of hardware, before power, facilities, and other operating costs. This is why the world’s largest technology companies continue spending hundreds of billions on compute infrastructure despite geopolitical headwinds, supply chain constraints, and macroeconomic uncertainty. The return on context is exceptional.

The Prefill Problem — and Its Solution

Before an AI generates its first word in response to a query, it must process every token in the input. This phase — called prefill — can consume eighty percent or more of total compute time for long-context tasks. It is the computational equivalent of a senior advisor reading a five-hundred-page prospectus from cover to cover before answering a single client question. The work is real. The cost is real. And it compounds with every new query on the same document.

The next generation of chip architecture is being engineered specifically to attack this bottleneck. An emerging solution — the token warehouse — stores previously processed context so the AI does not re-read the same document with every new interaction. This innovation alone could reduce inference costs on long-context tasks by fifty percent or more, while dramatically compressing response latency for enterprise and agentic applications.
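A toy model shows why storing processed context matters. The sketch below counts only the input tokens that must be processed during prefill, with and without reuse of a previously processed document. The document size, question size, and number of questions are hypothetical, and the model ignores the cost of storing the cached context and of generating the answers.

```python
def prefill_tokens(doc_tokens: int, question_tokens: int, num_questions: int, cached: bool) -> int:
    """Input tokens processed across a series of questions about the same document."""
    if cached:
        # The document is processed once; each question adds only its own tokens.
        return doc_tokens + num_questions * question_tokens
    # Without reuse, the whole document is re-read for every question.
    return num_questions * (doc_tokens + question_tokens)


# Hypothetical workload: ten short questions about one long document.
without_cache = prefill_tokens(100_000, 200, 10, cached=False)
with_cache = prefill_tokens(100_000, 200, 10, cached=True)
savings = 1 - with_cache / without_cache

print(f"Without reuse: {without_cache:,} tokens")
print(f"With reuse:    {with_cache:,} tokens")
print(f"Prefill work avoided: {savings:.0%}")
```

In this simplified case the work falls from 1,002,000 tokens to 102,000. Real savings depend on how often the same context is reused and on what it costs to keep that context in memory, which is why a more conservative estimate of fifty percent or more is reasonable.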

From Neuroscience to the Data Center

The human brain does something remarkably parallel: it maintains a working model of the world and updates it in real time, ten to one hundred times per second, without re-reading every book it has ever encountered. It draws from compressed, continuously updated context — what neuroscientists describe as working memory and predictive processing — fluidly and without interruption.

Artificial intelligence is building toward exactly this capability: persistent, compressed, context-aware memory that updates over time rather than resetting with every query. The token warehouse is, in a meaningful sense, a machine’s first approximation of human working memory. These systems are no longer calculators running faster. They are beginning to approximate the architecture of contextual meaning-making that defines biological intelligence — the same architecture I was attempting to describe two decades ago when I wrote about the neurobiological global-overlay-of-events.

The difference, and it is a profound one, is this: the human brain builds context biologically — running on glucose, sleep, and twenty years of embodied development. Artificial intelligence builds context computationally — running on silicon, electricity, high-bandwidth memory, and data centers located on the other side of the world. Every token processed is an energy expenditure. Every long-context query is an infrastructure event. There is no free meaning. There never was.

What This Means for Investment Strategy

The token framework provides a clear and actionable investment map. The deflationary layer — commodity inference, simple queries, batch processing — compresses margins for any firm selling basic AI access as a standalone product. The premium layer — agentic AI, long-context reasoning, real-time enterprise applications, multi-step financial analysis — is where pricing power, revenue durability, and growth velocity converge.

The infrastructure supporting that premium layer — high-bandwidth memory chips, next-generation processing units, the networking fabric connecting distributed compute — represents the durable, capital-intensive value creation in this economy. The historical pattern is unmistakable. When railroads were built, the land corridors they traversed became permanently more valuable. When electrical grids were deployed, the utilities that owned transmission infrastructure captured decades of pricing power. When the internet was wired, the fiber and routing backbone — not the content riding it — was the enduring asset.

The firms building and supplying compute infrastructure for context-hungry AI are positioned at a chokepoint between near-unlimited demand for meaning-making and the finite physical resources required to deliver it. That chokepoint is where durable margin lives. That is where long-term capital should pay attention.

Context is not free. For the human brain, it runs on glucose, rest, and decades of accumulated experience. For artificial intelligence, it runs on silicon, electricity, and precision-engineered memory chips manufactured in a handful of facilities worldwide. Understanding who builds, owns, and profits from delivering computational context at scale may be one of the most important investment insights of the next decade.

Take the Next Step

If you are a high-net-worth individual, business owner, or institution navigating the investment implications of the AI infrastructure buildout, this analysis is the beginning of the conversation — not the end of it.

At Vaughn Woods Financial Group, we work with clients who want their portfolios to be as sophisticated as their thinking. We offer fee-only, fiduciary wealth management grounded in deep market research, macroeconomic analysis, and a long history of identifying structural investment themes before they become consensus.

If the ideas in this paper resonate with how you think about capital allocation, we invite you to schedule a complimentary strategy consultation.

📧 vw@vaughnwoods.com

📞 858-245-2445

🌐 vaughnwoods.com

📍 2226 Avenida De La Playa, La Jolla, CA 92037

The price of meaning is rising. The question is whether your portfolio is positioned to collect it.

Vaughn L. Woods, CFP®, MBA, is Senior Portfolio Manager and founder of Vaughn Woods Financial Group, Inc., based in La Jolla, California. His 2007 thesis on context and human intelligence models is held in the UC San Diego Geisel Library Research Collections.

This article is for informational purposes only and does not constitute investment advice. Past performance is not indicative of future results.

Sources (APA 7th Edition)

Anthropic. (2024). Claude 3 technical report: Long-context capabilities and performance benchmarks.

Bommasani, R., et al. (2021). On the opportunities and risks of foundation models. Stanford Center for Research on Foundation Models.

Brynjolfsson, E., & McAfee, A. (2014). The second machine age: Work, progress, and prosperity in a time of brilliant technologies. W. W. Norton & Company.

Epoch AI. (2025). LLM inference price trends: How frontier model costs have evolved since 2023.

Gartner, Inc. (2026, March). AI inference costs set to plunge 90 percent by 2030 [Research report]. https://www.gartner.com

NVIDIA Corporation. (2026, April 14). NVIDIA platform delivers lowest token cost enabled by extreme co-design.

Patterson, D., et al. (2021). Carbon considerations for large AI models. Communications of the ACM, 64(11), 58–68.

Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1706.03762

Woods, V. L. (2006). The neurobiological global congruence-overlay-of-events and models of human decision-making [Undergraduate thesis, Point Loma Nazarene University]. UC San Diego Geisel Library Research Collections.

This article is for general informational purposes only and should not be considered legal, tax, or individualized investment advice. Trustees and families should consult their estate attorney, CPA, and qualified financial professional before acting.

Disclosures

Vaughn Woods, CFP®, MBA is President and Founder of Vaughn Woods Financial Group, Inc., an Investment Advisor Representative of Bolton Global Capital, Inc. Client assets are held in custody through Pershing LLC, a subsidiary of Bank of New York Mellon. This article is for informational purposes only and does not constitute personalized investment or tax advice.

We are unable to accept orders via email. If you wish to place an order, please consult your registered representative or contact the home office trading desk at (800) 649-4554.

This email system is for business purposes only and any information, including attachments, transmitted in this email is not confidential. Any message may be reviewed by authorized compliance personnel and/or produced to regulatory agencies or others with a legal right to access such information.

Past investment performance is not indicative of future results. Securities offered through Bolton Global Capital, Inc., Bolton, MA. Member FINRA, SIPC. Advisory services offered through Bolton Global Asset Management, a registered investment advisor, 579 Main St., Bolton, MA 01740 (978) 779-5361.

Investors should be aware that there are risks inherent in all investments, such as fluctuations in investment principal. Past performance is not a guarantee of future results. Asset allocation cannot assure a profit nor protect against loss. Although the information has been gathered from sources believed to be reliable, it cannot be guaranteed. Views expressed in this newsletter are those of Vaughn Woods and Vaughn Woods Financial Group and may not reflect the views of Bolton Global Capital or Bolton Global Asset Management. The information provided is for general informational purposes only and should not be considered an individual recommendation or personalized investment advice. Representatives and Advisors of Vaughn Woods Financial Group are not tax or legal professionals. If you need tax or legal advice, please consult a tax professional/CPA and/or a lawyer.

VW1/VWA0376

The Price of Meaning: What Every AI Answer Actually Costs
