The Vision

There is a concept I’ve heard referred to in multiple forms, but I think it’s appropriate to call it The Eye or The Vision. This is essentially the way your brain hews to a learned process such that the processing occurs before your conscious awareness. The most obvious example is what you are likely doing right now. The words you see on the page are processed as a whole before you could even sound out each letter. You could probably make out the next word with fair accuracy if you only saw its first letter.

When you learned to read, you developed The Vision for words, which is something you were not born with. As a baby, letters were garbled shapes, and words an incomprehensible visual obstruction. When I was learning to read, it seemed comical that I would ever be able to stare at the confused mess on the page and know what it was saying, but over time, it happened: I entered a mental space where meaning springs forth automatically whenever I glance at a word.

One way of thinking about culture is that it is a shared inside joke among people who have never met each other, and a language is its most pervasive, ubiquitous element. Even far away from anything and anyone, it is still carried within your brain, shaping your neurons at every syllable of your inner monologue. It is largely an arbitrary permutation of sounds and grunts that, in anyone who learns it, is inexorably mapped to their model of the world.

That is why puns register as humorous: they refer to a confusion we all collectively had to disambiguate at some point as our precise sense of the language resolved in our brains. In a pun, you are violating the gradation between words, poking at how two dissimilar concepts sit adjacent in lexical space. Reading that sentence just now hopefully makes it clear why puns aren’t really that funny. Nevertheless, being able to recognize them is among the smallest units of camaraderie we have with any speaker of a given language.

There is a common reference I see to “taste” in software engineering. Absent the need to perform many menial tasks, and many more complex ones, thanks to LLMs and agentic programming tools, the most leverage, or alpha, an individual possesses is their ability to distinguish good code from bad, efficient architecture from inefficient, and to direct the many machinations of constructed logic constantly humming towards better ends.

This “taste” is the Vision for systems, logical structures, and how the technology we’ve built either exploits those structures or works around them. It is knowing what just “makes sense” to build, what looks good when looking good has a logical underpinning. Because it is affixed to the utility of structures that are constantly reinvented and reformed, maintaining this taste requires constant immersion in the changing landscape of the tools to which it is applied.

When you “copy” a snippet of words on your computer, your present awareness that you have something yet to be pasted places a mental weight on your brain. It is the smallest unit of thought, a pin in your attention that your other thoughts must navigate around without disturbing too much before you deliver your payload to the desktop.

This is the base atomic unit of the mental work you perform when programming. You hold hundreds of variables, concepts, and snippets working out their form in your head, all suspended by the needs, barriers, and shape of the present codebase. This mental exercise is how good code is written, and the process by which people become great engineers. When you deprive yourself of it, you gradually allow that great construct of mental deftness to atrophy. You deprive yourself of your lease on the Vision.

I don't know precisely what the future of work will look like. Undoubtedly many of the tasks performed manually by programmers, engineers, and the full breadth of the technological economy will be automated in the coming years. There is a choice, though, to keep the Vision or not: to hold all this great making of the future within clear resolution, or let it dissolve into a haze whose complexity gradually exceeds one's view.

“Vibe Coding” as Empathetic Practice

The major criticism of “vibe coding”, or essentially just telling some AI agent what kind of app/tool/software product you want, is that it provides the user functional code only “to a point”. Past that point, the capability of the model, stretched over an unwieldy breadth of context, hits a wall, and the user is left adrift with a pile of code they possess little understanding of.

I think this is certainly true at the present moment, but I don’t think even clumsy requests to a model to perform some complex software task are altogether a dead end. The practice is just more akin to the role of a product manager, or of someone commissioning a work of art: one who knows what they want, and simply needs to evaluate with high specificity until they get it.

I think vibe coding, or working in a somewhat non-deterministic dialogue with a language model, will become an increasingly essential paradigm that will break free from its present “triviality” in the coming months and years. This is because AI models, and the agents they operate, will become an unavoidable force in the world, running a larger and larger share of GDP on their autonomous machinations alone (rather than serving as a mere enhancer of existing human knowledge work).

This is set to put humans in a strange position, as it has been so far with certain specific capabilities of language models, where a great, unknown utility exists “within” the model, but can only be summoned with the right combination of prompt, context, and scaffolding/tool calling that sets it on a fortuitous path. I believe the true nature of this paradigm is lost on many people, whose instincts with respect to interacting with software are affixed to the fundamental nature of how software has been for the last 50 years: deterministic and closed-ended.

The strange discontinuity of this emerging reality will only become more acute as the utility of LLMs exceeds that of human experts in broader and broader domains. From that point on, the most useful skill in the universe is a sort of cognitive empathetic practice: an understanding of how models think, how they react, and what their true capabilities are in the short breaths of seeming consciousness they are granted within the tokenized range of their context window.