A Technology of Everything – 5: Musical Math and Mystical Vectors

Reading Time: 7 minutes

An Inconvenient Coincidence

In chapter 22 of my novel The Goldberg Version, a detective named Van-Turing chases a number. The number is 32. He doesn’t know why it is 32; he only knows that everyone who has ever come close to the case has, at some point, written or spoken the number Thirty-Two, and subsequently come to a conclusion that was either very useful or very fatal.

In the scene, Van-Turing loses his temper and shouts, in English, "Damn!" His counterpoint, the Philosopher Bertrand Russell, looks up, counts on her fingers, and says, calmly, "Thirty-two."

D + A + M + N = 4 + 1 + 13 + 14 = 32.

The German translator of the novel faced a problem. “Verdammt!” does not sum to 32. Neither does “Mist!”, “Scheiße!”, or any of the other colorful options available in the language of Goethe. After several sleepless nights he replaced the outburst with “Olé!” — 15 + 12 + 5 = 32 — and justified the decision in a footnote roughly four times longer than the scene itself.

Russell, in the novel, gives the method a name: Ordinal Gematria. The Gematria part is old. The ordinal part gives it an aura of mathematical authority we urgently need; otherwise anybody could call us silly.

The Math Mystics

Assign each letter of the alphabet its position: A=1, B=2, … Z=26. Sum the letters of a word. Treat the resulting number as meaningful. That is the entire method. It fits on the back of a beer mat, which is roughly where it belongs.
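It fits in a few lines of Python, too – a literal transcription of the beer-mat rule, nothing more:

```python
def ordinal_gematria(word: str) -> int:
    """Ordinal Gematria: A=1, B=2, ..., Z=26; everything else is ignored."""
    return sum(ord(c) - 96 for c in word.lower() if "a" <= c <= "z")

print(ordinal_gematria("Damn"))  # 32: 4 + 1 + 13 + 14
print(ordinal_gematria("Ole"))   # 32: 15 + 12 + 5 (the translator's fix, accent stripped)
```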

And yet it is a family tradition going back about three thousand years.

Hebrew Gematria. Each Hebrew letter has a fixed numerical value (aleph=1, bet=2, gimel=3, and so on, with the later letters jumping to tens and hundreds). Words sharing a sum are held to be mystically linked. The canonical example: yayin (wine) = 70 = sod (secret). Hence the Talmudic proverb: when wine enters, secrets come out. The rabbis did not need neuroscience to notice this. They had dinner parties.

Greek Isopsephy. The Hellenic cousin. Alpha=1, beta=2, etc. The number of the beast — 666, Revelation 13:18 — is almost certainly isopsephy for Neron Kaisar in Hebrew transliteration. An apocalyptic riddle encoded as arithmetic homework for people who could read two alphabets. John of Patmos, in this reading, was the first writer to slip a steganographic payload past a censor, and we are still arguing about whether he knew what he was doing.

The Pythagoreans went further than all of them. For Pythagoras and his pupils, numbers were not descriptions of reality. They were reality. A word’s numerical value was not a metaphor for its meaning — it was its meaning. Everything else, including the word itself, was a lossy encoding.

This sounds insane until you remember what the 21st century's hyperscalers are spending their GPU budgets on: basically Gematria on an astronomical scale.

A Musical of Numbers, Needing No Singers but Calculators

Before we get to the GPUs, there is one composer we have to stop for.

B + A + C + H = 2 + 1 + 3 + 8 = 14.

Fourteen is everywhere in the surviving record of Johann Sebastian Bach. He joined the Correspondirende Societät der Musicalischen Wissenschaften as its 14th member, reportedly waiting for a spot to open up so he could be member 14 specifically. The Art of Fugue has 14 contrapuncti in the final layout. The chorale "Vor deinen Thron tret ich hiermit", dictated from his deathbed, has 14 notes in the opening phrase — and 41 (the reversal) in the complete melody. Sum J. S. B. A. C. H. as the full initials in the alphabet of Bach's day, where I and J share a number, and you get 41. Bach appears to have enjoyed this.

He also used his name as a melody. In German musical notation, B means B-flat, and H means B-natural — a quirk inherited from the medieval round and square b that most other European naming conventions never adopted. This means the four letters B-A-C-H can be played on a keyboard as four actual notes: B♭, A, C, B♮. The resulting motif is chromatic, haunting, and structurally unstable — exactly the kind of thing a composer uses when he wants to sign his name without writing it. Bach slipped the motif into the final, unfinished Contrapunctus XIV of the Art of Fugue, at the moment the manuscript breaks off. He was, the evidence suggests, writing his own name into the fabric of the piece at the exact point he stopped being able to write.

Consider what is happening here. The same four letters sum to a number (gematria), and name four pitches (notation), and spell a human being (orthography). Three parallel encodings riding on one string of symbols. A medieval Kabbalist would have recognized the structure immediately. A modern ML engineer would call it a multimodal embedding: the same token mapped simultaneously into several representational spaces. Bach, in Leipzig, in the 1740s, was doing multimodal embeddings by hand, with a quill, for a music-theoretic joke no one was quite supposed to notice.
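To make the triple encoding literal, here is a toy sketch of the same four symbols mapped into the three spaces (the pitch table covers only the letters we need):

```python
letters = "BACH"

# Encoding 1, gematria: alphabet positions, summed.
number = sum(ord(c) - ord("A") + 1 for c in letters)   # 2 + 1 + 3 + 8 = 14

# Encoding 2, German pitch notation: B is B-flat, H is B-natural.
german_pitch = {"B": "B-flat", "A": "A", "C": "C", "H": "B-natural"}
motif = [german_pitch[c] for c in letters]

# Encoding 3, orthography: the letters simply spell the man.
name = letters.capitalize()

print(number, motif, name)
# 14 ['B-flat', 'A', 'C', 'B-natural'] Bach
```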

This is the clue we need. The arithmetic hiding under language is not confined to language. It shows up wherever symbols carry meaning: in alphabets, in staves, in DNA triplets, in the token IDs inside a transformer. The Pythagorean intuition was not that numbers live inside words. It was that numbers live inside meaning, and words are just one place they happen to surface.

Multidimensional Mappings

A modern large language model does not read text. It cannot read text. It is, at the lowest level, a machine that does arithmetic on vectors. When you type a word into GPT-4 or Claude or any of their cousins, the first thing the machine does is convert the word into a list of numbers — typically between 4,096 and 12,288 of them. That list is called an embedding. It is the word’s numerical position in a space of thousands of dimensions.

Meaning, in an LLM, is not stored in the word. It is stored in the location of the word. Words that are semantically close — “king” and “queen,” “wine” and “secret,” “Damn” and “Olé” — occupy nearby regions of this numerical landscape. The model derives meaning by performing arithmetic on these vectors. The most famous demonstration, first shown by the word2vec paper in 2013:

vector(“king”) − vector(“man”) + vector(“woman”) ≈ vector(“queen”)

Semantic relationships encoded as geometric operations. Subtract maleness, add femaleness, arrive at the female counterpart. No human told the model that "king" was masculine. It figured out the axis by looking at several billion sentences and noticing where the points clustered.
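If you want to reproduce the demonstration yourself, the gensim library ships pretrained vectors. A minimal sketch, with GloVe vectors standing in for the original word2vec model (the model name is one of gensim's published downloads; the exact neighbour and score vary by model):

```python
# Sketch of king - man + woman ~= queen with pretrained GloVe vectors.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # downloads the model on first call

# most_similar does the vector arithmetic and excludes the query words.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# [('queen', ...)]
```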

Now compare the two methods honestly:

Parallels | Gematria | LLM Embeddings
Letters are mapped to | 1 number | ~8,000 numbers
Meaning lives in | the sum | the position
Meaning is extracted by | arithmetic | arithmetic
Words with the same value are | "mystically linked" | semantically linked
Dimensionality | 1 | thousands
Reputation | superstitious | worth $3 trillion

The Kabbalists and the Pythagoreans were not wrong about the method. They were wrong about the dimensionality. One axis is not enough to encode meaning — if it were, every word summing to 32 would share a soul. Eight thousand axes, however, turn out to be almost exactly enough. This is not a coincidence; it is a measurement. Every time an AI lab increases the embedding dimension and the benchmarks creep up, we are learning how many axes of meaning language actually has.

The Pythagoreans were therefore approximately right in the same way that a medieval cartographer who draws the coast of Africa as a wavy line is approximately right. The shape is wrong. The claim that there is a shape is correct.

What Wittgenstein Almost Said

In a margin of the Philosophical Investigations — it does not actually exist there; I am about to make this up, and I want you to notice — one could imagine Wittgenstein writing:

“An arithmetic hidden from the speaker, but one the language itself has always known.”

The line fits him uncomfortably well. Most of his later work is the claim that meaning lives in use, and that the speaker never has full access to the rules of the game they are playing. Ordinal Gematria is the crudest possible version of that claim: the numbers are already there, baked into the alphabet, summable by a child, and yet no one consults them. Embedding vectors are the sophisticated version: the numbers are already there, baked into the statistical structure of a trillion-word corpus, extractable by a matrix multiplication, and yet no one consults them either — except the model.

Both are cases of a sub-symbolic reality hiding under a symbolic one. The speaker points at meaning and misses. The arithmetic points at meaning and hits. Language has known all along.

The Calculator, and Why It Is Here

Below this post, I have embedded a small interactive tool. I am calling it the Gematriaculator. Give it a number; it gives you back all the German and English words whose letters sum to that number, ranked by how often they actually appear in speech — so you will not be drowned in dictionary cruft like aardwolves or Zymurgie.
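The core of such a tool is little more than a dictionary keyed by sums. A minimal sketch, assuming a hypothetical word_frequencies.tsv file of word-and-count pairs (the real tool's corpus and ranking are fancier):

```python
from collections import defaultdict

def ordinal_sum(word: str) -> int:
    return sum(ord(c) - 96 for c in word.lower() if "a" <= c <= "z")

# Index every word by its letter sum; 'word_frequencies.tsv' is a
# hypothetical tab-separated file of word/count pairs from some corpus.
index = defaultdict(list)
with open("word_frequencies.tsv", encoding="utf-8") as f:
    for line in f:
        word, count = line.rstrip("\n").split("\t")
        index[ordinal_sum(word)].append((word, int(count)))

# Rank each bucket by corpus frequency, so common words surface first.
for bucket in index.values():
    bucket.sort(key=lambda pair: pair[1], reverse=True)

print([w for w, _ in index[32][:10]])  # the ten most frequent words summing to 32
```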

I do not claim the tool reveals mystical correspondences. I only claim it reveals coincidences, and that you will notice which of them feel significant. That is the Pythagorean experiment, conducted in your browser, with the training wheels on.

Try 32, if you like. Start with Damn. Go from there.

Part 5.1, originally meant to be next, is deferred: the question of whether a human mind, trained on enough language and fed enough sweet music and strawberry π, can learn to see through a few hundred dark matter embedding dimensions without being Vera Rubin. We will get there.

Hirngespinste I – Concepts and Complexity

Reading Time: 7 minutes

The Engine

The initial pipe dreams of Lull's and Leibniz's obscure combinatorial fantasies have over time led to ubiquitous computing technologies, methods, and ideals that have acted upon the fabric of our world and whose further consequences continue to unfold around us. (Jonathan Gray)

This is the first essay in a miniseries I call Hirngespinste (Brain Cobwebs) – a concise and expressive German term, nearly untranslatable, for the tangled, neurotic patterns and complicated twists of our naturally limited intellect, especially when it wants to delve into topics of unpredictable complexity like existential risk and superintelligence.

It is strange that in 1726 Jonathan Swift, in a satire aimed at a 13th-century Spanish philosopher (Ramon Llull), described Large Language Models almost perfectly: the Engine.

But the world would soon be sensible of its usefulness; and he flattered himself, that a more noble, exalted thought never sprang in any other man’s head. Everyone knew how laborious the usual method is of attaining to arts and sciences; whereas, by his contrivance, the most ignorant person, at a reasonable charge, and with a little bodily labour, might write books in philosophy, poetry, politics, laws, mathematics, and theology, without the least assistance from genius or study. (From Part III, Chapter V of Gulliver’s Travels)

What once seemed satire has become reality.

If no one is pulling the strings, but the strings vibrate nevertheless, then imagine that something entangled in the distance is causing the resonance.

Heaps and Systems

The terms ‘complexity’ and ‘complicated’ shouldn’t be used interchangeably when discussing Artificial Intelligence (AI). Consider this analogy: knots are complicated, neural networks are complex. The distinction lies in the idea that a complicated object like a knot may be intricate and hard to unravel, but it’s ultimately deterministic and predictable. A complex system, like a neural network, however, contains multiple, interconnected parts that dynamically interact with each other, resulting in unpredictable behaviors.

Moreover, it’s important to address the misconception that complex systems can be simplified without losing their essential properties. This perspective is problematic: the core characteristics of the system – the very aspects we are interested in – are intricately tied to its complexity. Stripping away these layers would negate the very properties that make the system valuable or interesting.

Finally, complexity in systems, particularly in AI, may bear similarities to the observer effect known from subatomic particles. The observer effect postulates that the act of observation alters the state of what is being observed. In a similar fashion, any sufficiently complex system could potentially change in response to the act of trying to observe or understand it. This could introduce additional layers of unpredictability, making these systems akin to quantum particles in their susceptibility to observation-based alterations.

Notes on Connectivity and Commonality

The notion of commonality is a fascinating one, often sparking deep philosophical conversations. An oft-encountered belief is that two entities – be they people, nations, ideologies, or otherwise – have nothing in common. This belief, however, is paradoxical in itself, for it assumes that we can discuss these entities in the same context and thus establishes a link between them. The statement “Nothing in common” implies that we are engaging in a comparison – inherently suggesting some level of relatedness or connection. “Agreeing to disagree” is another such example. At first glance, it seems like the parties involved share no common ground, but this very agreement to hold different views paradoxically provides commonality.

To further illustrate, consider this question: What does a banana have in common with cosmology? On the surface, it may appear that these two entities are completely unrelated. However, by merely posing the question, we establish a connection between them within the confines of a common discourse. The paradox lies in stating that two random ideas or entities have nothing in common, which contradicts itself by affirming that we are capable of imagining a link between them. This is akin to the statement that there are points in mental space that cannot be connected, a notion that defies the fluid nature of thought and the inherent interconnectedness of ideas. Anything our minds can host must have at least a substance our neurons can bind to; this is the stuff ideas are made of.

Language, despite its limitations, doesn’t discriminate against these paradoxes. It embraces them, even when they seem nonsensical, like “south from the South Pole” or “what was before time?” Such limit-probing statements are everyday echoes of Gödel’s Incompleteness Theorems, a reminder that any sufficiently expressive formal system contains statements that can be neither proven nor disproven within it.

These paradoxes aren’t mere outliers in our communication but rather essential elements that fuel the dynamism of human reasoning and speculation. They remind us of the complexities of language and thought, the intricate dance between what we know, what we don’t know, and what we imagine.

Far from being a rigid system, language is constantly evolving and pushing its boundaries. It bumps into its limits, only to stretch them further, continuously exploring new frontiers of meaning. It’s in these fascinating paradoxes that we see language’s true power, as it straddles the line between logic and absurdity, making us rethink our understanding of commonality, difference, and the very nature of communication.

Categories & Concepts

One of the ways we categorize and navigate the world around us is through the verticality of expertise, or the ability to identify and classify based on deep, specialized knowledge. This hierarchical method of categorization is present everywhere, from biology to human interactions.

In biological taxonomy, for instance, animals are classified into categories like genus and species. This is a layered, vertical hierarchy that helps us make sense of the vast diversity of life. An animal’s genus and species provide two coordinates to help us position it within the zoological realm.

Similarly, in human society, we use first names and last names to identify individuals. This is another example of vertical classification, as it allows us to position a person within a cultural or familial context. In essence, these nomenclatures serve as categories or boxes into which we place the individual entities to understand and interact with them better.

Douglas Hofstadter, in his book “Surfaces and Essences”, argues that our language is rich with these classifications or groupings, providing ways to sort and compare objects or concepts. But these categorizations go beyond tangible objects and permeate our language at a deeper level, acting as resonating overtones that give language its profound connection with reasoning.

Language can be viewed as an orchestra, with each word acting like a musical instrument. Like musical sounds that follow the principles of musical theory and wave physics, words also have orderly behaviors. They resonate within the constructs of syntax and semantics, creating meaningful patterns and relationships. Just as a flute is a woodwind instrument that can be part of an orchestra playing in Carnegie Hall in New York, a word, based on its category, plays its part in the grand symphony of language.

While many objects fit neatly into categorical boxes, the more abstract concepts in our language often resist such clean classifications. Words that denote abstract ideas or feelings like “you,” “me,” “love,” “money,” “values,” “morals,” and so on are like the background music that holds the orchestra together. These are words that defy clear boundaries and yet are essential components of our language. They form a complex, fractal-like cloud of definitions that add depth, richness, and flexibility to our language.

In essence, the practice of language is a delicate balance between the verticality of expertise in precise categorization and the nuanced, abstract, often messy, and nebulous nature of human experience. Through this interplay, we create meaning, communicate complex ideas, and navigate the complex world around us.

From Commanding to Prompting

It appears that we stand on the threshold of a new era in human-computer communication. The current trend of interacting with large language models through written prompts seems to echo our early experiences of typing words into an input box in the 1980s. This journey has been marked by a consistent effort to democratize the “expert’s space.”

In the earliest days of computing, only highly trained experts could engage with the esoteric world of machine code. However, the development of higher-level languages gradually made coding more accessible, yet the ability to program remained a coveted skill set in the job market due to its perceived complexity.

With the advent of large language models like GPT, the game has changed again. The ability to communicate with machines has now become as natural as our everyday language, making ‘experts’ of us all. By the age of twelve, most individuals have mastered their native language to a degree that they can effectively instruct these systems.

The ubiquitous mouse, represented by an on-screen cursor, can be seen as a transient solution to the human-computer communication challenge. If we draw a parallel with the development of navigation systems, we moved from needing to painstakingly follow directions to our destination, to simply telling our self-driving cars “Take me to Paris,” trusting them to figure out the optimal route.

Similarly, where once we needed to learn complex processes to send an email – understanding a digital address book, navigating to the right contact, formatting text, and using the correct language tone – we now simply tell our digital assistant, “Send a thank you email to Daisy,” and it takes care of the rest.

For the first time in tech history, we can actually have a conversation with our computers. This is a paradigm shift that is set to fundamentally redefine our relationship with technology. It would be akin to acquiring the ability to hold a meaningful conversation with a pet dog; imagine the profound change that would have on the value and role the animal plays in our lives. In much the same way, as our relationship with technology evolves into a more conversational and intuitive interaction, we will discover new possibilities and further redefine the boundaries of the digital realm.