Do Androids scheme eclectic sheets?

Reading Time: 8 minutes



Imagine a scene in the not-so-distant future. Someone has been murdered. Two investigation teams arrive at the scene, but it is unclear who has jurisdiction. The human team is led by the charismatic detective Sheerluck Holmes, while the android team is led by Bot-OX. The question is: Is the perpetrator human, android, or something in between? Should we expect that the police of the future have established a well-defined procedure or algorithm to decide this quickly?

We will try to answer this and the more pressing issue we currently face: Do we have a good chance of coming up with a practical algorithm that allows us, by looking only at the crime scene (the generated text), to decide whether a bot or a human created it? Developing such an algorithm is currently one of the most sought-after goals in computer science. A robust black-box algorithm could save most of our academic conventions and allow us to keep the ways we test children, adolescents, and adults. Without it, these systems will need to be rebuilt at great expense.

In a world where more and more people work and train remotely, it is crucial that we can reliably determine that humans did their intellectual work themselves, which is not the case at the moment. Additionally, with the reach of social media, fake news, images, and videos can have a devastating impact on societal consensus. Such an algorithm—if it exists—is not watertight, but with enough training data, it might even hold up in court.

The outlook is not promising, though. OpenAI abandoned its own attempt, the OpenAI Classifier, within six months of launch. The practical and monetary value of such an algorithm cannot be overstated. If grabby aliens were to sell it for a trillion dollars, call me—I want in.


Introduction of the Differentiation Test Engine

The task of differentiating between machine-generated text (MGT) and human-generated text (HGT) is remotely related to the original Turing test, the so-called imitation game. There are additional factors: whereas the original Turing Test only allowed for human judges, our differentiation test allows for other machines to assist the human judges. We will call such a machine a Differentiation Test Engine (DTE). It has one purpose and one purpose only: to decide whether a text was generated by a human or a machine.

The first intuition is that such a DTE should be relatively easy to implement. We currently have the technology to detect and identify human faces and voices, which are much more complex and prone to noise than text. The decision of whether a given picture shows a machine or a human is easily made by any current object classifier system. Should it not then be easy to train a Large Language Model (LLM) with 1 trillion human texts and 1 trillion machine texts and let it learn to classify them? The DTE would not be a simple algorithm but its own transformer model specialized in impersonation detection.
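To make that intuition concrete, here is a deliberately naive sketch of such a classifier. Everything in it is an illustrative assumption: two hand-picked surface statistics stand in for learned features, and a nearest-centroid rule stands in for the fine-tuned transformer the text imagines.

```python
import re
from statistics import mean

def features(text: str) -> tuple[float, float]:
    """Two toy features: type-token ratio and mean sentence length."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    ttr = len(set(words)) / len(words)
    avg_len = mean(len(s.split()) for s in sentences)
    return (ttr, avg_len)

def centroid(samples: list[str]) -> tuple[float, float]:
    """Average feature vector of a labeled training corpus."""
    feats = [features(t) for t in samples]
    return (mean(f[0] for f in feats), mean(f[1] for f in feats))

def classify(text: str, human_c, machine_c) -> str:
    """Nearest-centroid decision in the 2-D feature space."""
    f = features(text)
    d_h = sum((a - b) ** 2 for a, b in zip(f, human_c))
    d_m = sum((a - b) ** 2 for a, b in zip(f, machine_c))
    return "human" if d_h < d_m else "machine"
```

A real detector would replace the two hand-picked features with representations learned from the trillion-text corpora mentioned above, but the training-and-classify loop is conceptually the same.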

In math and computer science, the complexity of a problem is often orthogonal to the simplicity of its description. Most NP-complete problems are deceptively easy to state, yet millions of computer scientists and mathematicians have struggled for decades to make progress on them. My guess is that black-boxing attempts will fail in practical applications.


Theoretical Framework

Black-box detection methods are limited to API-level access to LLMs. They rely on collecting text samples from human and machine sources respectively to train a classification model that can be used to discriminate between LLM- and human-generated texts. Black-box detectors work well because current LLM-generated texts often show linguistic or statistical patterns. However, as LLMs evolve and improve, black-box methods are becoming less effective. An alternative is white-box detection. In this scenario, the detector has full access to the LLMs and can control the model’s generation behavior for traceability purposes. In practice, black-box detectors are commonly constructed by external entities, whereas white-box detection is generally carried out by LLM developers.

Defining the Basic Detection System

For practical purposes, we will specify what we can reasonably expect from such a DTE. Given an input of a certain token length, the algorithm should, within a finite amount of time and with more than 50% confidence, give a definite output on how much of a given text came from a human and how much from a machine.

An implementation could be as follows:

  1. Please input your text: …
  2. Please input your required confidence: 0.8
  3. Your text has to be at least 8K tokens long to reach at least an 80% probability of giving the correct answer.
  4. Under the current parameters, the algorithm will run for 5 minutes. Shall I proceed (Y/N)? … Y

The output should then be something like: “I can say with 80% confidence that 95% of the text was written by a machine and 5% by a human.”
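The dialog above can be sketched as a thin wrapper. The numbers are invented to match the example: a hypothetical evidence model in which 80% confidence requires roughly 8K tokens, and anything below the required length is answered with a coin flip.

```python
import math

MIN_TOKENS = 50  # below this, any verdict would be pure noise

def required_tokens(confidence: float) -> int:
    """Hypothetical lower bound: assume each token adds a fixed
    amount of evidence, so required length grows with the log of
    the odds ratio (tuned so that 0.8 -> 8000 tokens)."""
    if not 0.5 < confidence < 1.0:
        raise ValueError("confidence must be in (0.5, 1.0)")
    odds = confidence / (1.0 - confidence)
    return max(MIN_TOKENS, int(4000 * math.log2(odds)))

def report(n_tokens: int, confidence: float, machine_share: float) -> str:
    """Produce the DTE's answer in the format described above."""
    if n_tokens < required_tokens(confidence):
        return ("Input too short for the requested confidence -- "
                "any answer would be a coin flip (50%).")
    return (f"I can say with {confidence:.0%} confidence that "
            f"{machine_share:.0%} of the text was written by a machine "
            f"and {1 - machine_share:.0%} by a human.")
```

The specific evidence model is a placeholder; the point is only that confidence, input length, and runtime must be traded off explicitly.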

Before tackling the details, we should further clarify the possible outcomes when trying to develop such an algorithm:

  1. Such an algorithm is in principle impossible (e.g., just as it is impossible to create an algorithm that outputs the largest prime number, because no such number exists).
  2. Such an algorithm is practically impossible (e.g., it runs too long or needs more computational power than is available; think NP-hard problems).
  3. It is undecidable (e.g., it falls under the Halting Problem, and we can never say whether it will eventually stop).
  4. It is possible but not practical (which collapses into 2).
  5. It is possible and practical (good enough).

What we would like to end up with is a situation where we can calculate a lower bound on input length that then lets us decide, with more than 50% probability, whether a text is HGT or MGT.

Falsifiability: Such an algorithm is easily debunked if, for example, we input the text “The sky is blue” and it gives us any probability other than roughly 50%; four words simply cannot carry enough signal for a confident verdict.

Sidenotes on The Obfuscation Engine

Conceptually, we run into problems as soon as we design a Differentiation Engine (Diff). We face the following paradox: we want Diff (which detects whether a human or a machine wrote a given input) always to halt (give a definitive answer) and to answer correctly. Say our algorithm halts and outputs “Human.” We now construct a “pathological” program, Obf (Obfuscation Engine), of the form Obf(Diff(input)): modify the input until Diff’s answer is inverted (if Diff says Machine, the modified text reads as Human, and vice versa). This may be a purely theoretical problem, since mounting such an attack would require understanding why the machine formulates text the way it does, demanding far more mechanistic-interpretability competence than we currently possess; at the moment, the sheer complexity of LLMs protects them from such attacks in practice. But if that is true, it is also highly likely that we lack the knowledge to build a general Differentiator in the first place. These objections might be irrelevant for real-world implementations if we could show that differentiation and obfuscation are sufficiently asymmetric, meaning differentiation is some large factor (10^x times) faster than obfuscation, making the attack impractical (think of how factoring a semiprime is much harder than multiplying two primes).
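The Obf(Diff(input)) construction can be made concrete with a toy pair: a differentiator that keys on a single statistic, and an obfuscator that queries it and perturbs the text until the verdict flips. Both the statistic and the word-swap table are placeholders, not a real attack.

```python
def diff(text: str) -> str:
    """Toy differentiator: long average word length -> 'machine'.
    A stand-in for any black-box detector."""
    words = text.split()
    avg = sum(len(w) for w in words) / len(words)
    return "machine" if avg > 5 else "human"

def obf(text: str, target: str, max_rounds: int = 100) -> str:
    """Toy obfuscator: repeatedly query the differentiator and
    perturb the text until the verdict equals `target` -- the
    Obf(Diff(input)) construction from the paradox above."""
    swaps = {"utilize": "use", "demonstrate": "show",
             "approximately": "about"}
    for _ in range(max_rounds):
        if diff(text) == target:
            return text
        changed = False
        for long_word, short_word in swaps.items():
            if long_word in text:
                text = text.replace(long_word, short_word, 1)
                changed = True
                break
        if not changed:
            break  # no edits left; the attack fails on this input
    return text
```

Note that the obfuscator needs nothing but query access to the differentiator, which is exactly why a public DTE invites this attack.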

The Profiling System

A crucial aspect of differentiating between human and machine-generated texts is profiling. Profiling involves collecting and analyzing external data to provide context for the text. By understanding the typical characteristics of various types of texts, we can statistically determine the likelihood of a text being human or machine-generated.

For instance, technical documents, creative writing, and casual social media posts each have distinct stylistic and structural features. By building profiles based on these categories, the Differentiation Test Engine (DTE) can make more informed decisions. Additionally, factors such as vocabulary richness, sentence complexity, and topic consistency play a role in profiling. Machine-generated texts often exhibit certain statistical regularities, whereas human texts tend to show more variability and creativity.
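A minimal profiler might extract exactly the signals mentioned here. The functions below compute them with standard-library tools; the per-genre baselines are made-up numbers for illustration, not measured values.

```python
import re
from statistics import mean, pstdev

def profile(text: str) -> dict[str, float]:
    """Collect the profiling signals discussed above: vocabulary
    richness, sentence complexity, and length variability."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sent_lens = [len(s.split())
                 for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "vocab_richness": len(set(words)) / len(words),
        "avg_sentence_len": mean(sent_lens),
        "sentence_len_spread": pstdev(sent_lens),  # humans vary more
    }

# Hypothetical per-genre baselines the DTE could compare against.
BASELINES = {
    "technical": {"vocab_richness": 0.45, "avg_sentence_len": 22.0},
    "social_media": {"vocab_richness": 0.75, "avg_sentence_len": 9.0},
}

def distance_to_genre(text: str, genre: str) -> float:
    """How far a text sits from a genre profile (smaller = closer)."""
    p = profile(text)
    return sum(abs(p[k] - v) for k, v in BASELINES[genre].items())
```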


The “DNA Trace”

One innovative approach to differentiating between human and machine-generated texts is the concept of a “DNA trace.” This involves analyzing the fundamental building blocks of texts, such as tokens for machines and words for humans. Token-based algorithms focus on patterns and sequences that are characteristic of machine generation, while human-generated texts can be examined through a more holistic word-based approach.

Spectral analysis, a method used to examine the frequency and distribution of elements within a text, can be particularly useful. By applying spectral analysis, we can detect subtle differences in the way machines and humans construct sentences. Machines might follow more rigid and repetitive patterns, whereas humans exhibit a broader range of stylistic nuances.
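One way to sketch this is to treat the sequence of sentence lengths as a signal and take its discrete Fourier transform: a strong peak at a single frequency would indicate the rigid, periodic rhythm hypothesized for machine text. The naive O(n²) transform below is enough for illustration; choosing sentence length as the signal is an assumption of the sketch.

```python
import cmath
import re

def dft_magnitudes(signal: list[float]) -> list[float]:
    """Naive discrete Fourier transform (O(n^2)), magnitudes only."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(signal)))
        for k in range(n)
    ]

def sentence_length_spectrum(text: str) -> list[float]:
    """Spectrum of the sentence-length sequence; sharp peaks would
    hint at a mechanical, repetitive cadence."""
    lengths = [len(s.split())
               for s in re.split(r"[.!?]+", text) if s.strip()]
    mean_len = sum(lengths) / len(lengths)
    centered = [l - mean_len for l in lengths]  # drop the DC component
    return dft_magnitudes(centered)
```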

The Ethical Implications

Examining the ethical implications of developing and using a Differentiation Test Engine is essential. All current GPT systems share a similar artificial “DNA,” meaning that text, image, video, or audio differentiation engines face the same challenges. Deepfakes or content that is machine-generated but mimics human creation pose significant risks to societal trust and authenticity.

As machine-generated content becomes more sophisticated, the potential for misuse grows. Ensuring that these differentiation technologies are transparent and accountable is crucial. There is also a risk that over-reliance on these technologies could lead to new forms of bias and discrimination. Thus, it is imperative to develop ethical guidelines and regulatory frameworks to govern their use.

Technical Solutions

Exploring purely technical solutions to the differentiation problem involves several approaches:

Parallel Web: This concept involves running parallel versions of the internet, one strictly for verified human content and another for mixed content. This segregation could help maintain the integrity of human-generated content.

Special Domains: Creating special domains or zones within the web where content is verified as human-generated can help users trust the authenticity of the information.

Prompt.Claims: Similar to how patents and citations work, this system would allow creators to claim and verify their prompts, adding a layer of accountability and traceability to the content creation process.

Inquisitorial Solutions: We could also imagine a scenario where we interact directly with the artifact (text) to inquire about its origin. Similar to interrogating a suspect, we could recreate the prompt that generated the text. If we can reverse-engineer the original prompt, we might find clues about its generation. This approach hinges on the idea that machine-generated texts are the product of specific prompts, whereas human texts stem from more complex thought processes.
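An inquisitorial check could be sketched as a search over candidate prompts. The overlap score below is a crude placeholder; a real prompt-inversion system would instead score each candidate by the likelihood the model assigns to the text given that prompt.

```python
def overlap_score(prompt: str, text: str) -> float:
    """Toy scoring: fraction of prompt words that surface in the
    text. A placeholder for likelihood-based prompt inversion."""
    p_words = set(prompt.lower().split())
    t_words = set(text.lower().split())
    return len(p_words & t_words) / len(p_words)

def best_candidate_prompt(text: str, candidates: list[str]) -> str:
    """Pick the candidate prompt most consistent with the artifact."""
    return max(candidates, key=lambda p: overlap_score(p, text))
```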

Consequences for Alignment: The challenge of differentiating between human and machine-generated texts ties into broader issues of AI alignment. Ensuring that AI systems align with human values and expectations is paramount. If we cannot reliably differentiate AI-generated content, it undermines our ability to trust and effectively manage these systems. This problem extends to all forms of AI-generated content, making the development of robust differentiation technologies a key component of achieving superalignment.


In conclusion, the task of differentiating between human and machine-generated texts presents significant challenges and implications. The development of a reliable Differentiation Test Engine is intertwined with ethical considerations, technical innovations, and broader AI alignment issues. As we move forward, it is essential to adopt a multidisciplinary approach, integrating insights from computer science, ethics, and regulatory frameworks to navigate this complex landscape.

When exploring the problems we face in building general differentiation engines, we quickly learn that this problem is nested within a wide array of related problems. Adversarial attacks against image recognition systems, for example, have shown that we consistently overestimate the resilience of these models. It was recently shown that even an amateur player could win against a top Go program with the help of another AI that found an exploit (see the Vice article).

Thus, it seems very likely that even if we came up with an algorithm that could initially differentiate HGT from MGT, the same program could then be turned against itself to flip the outcome. Another interesting aspect is that all digital computers are Turing machines, which implies that any algorithm developed for differentiation could in principle also be repurposed for obfuscation.


Hirngespinste I – Concepts and Complexity

Reading Time: 7 minutes

The Engine

The initial pipe dreams of Lull’s and Leibniz’s obscure combinatorial fantasies have over time led to ubiquitous computing technologies, methods, and ideals that have acted upon the fabric of our world and whose further consequences continue to unfold around us. (Jonathan Grey)

This is the first essay in a miniseries that I call Hirngespinste (Brain Cobwebs) – this concise and expressive German term, which seems untranslatable, describes the tangled, neurotic patterns and complicated twists of our nature-limited intellect, especially when we want to delve into topics of unpredictable complexity like existential risks and superintelligence.

It is super-strange that in 1726 Jonathan Swift perfectly described Large Language Models in a satire on a 13th-century Spanish philosopher: the Engine.

But the world would soon be sensible of its usefulness; and he flattered himself, that a more noble, exalted thought never sprang in any other man’s head. Everyone knew how laborious the usual method is of attaining to arts and sciences; whereas, by his contrivance, the most ignorant person, at a reasonable charge, and with a little bodily labour, might write books in philosophy, poetry, politics, laws, mathematics, and theology, without the least assistance from genius or study. (From Part III, Chapter V of Gulliver’s Travels)

What once seemed satire has become reality.

If no one is drawing the strings, but the strings vibrate nevertheless, then imagine something entangled in the distance causes the resonance.

Heaps and Systems

The terms ‘complexity’ and ‘complicated’ shouldn’t be used interchangeably when discussing Artificial Intelligence (AI). Consider this analogy: knots are complicated, neural networks are complex. The distinction lies in the idea that a complicated object like a knot may be intricate and hard to unravel, but it’s ultimately deterministic and predictable. A complex system, like a neural network, however, contains multiple, interconnected parts that dynamically interact with each other, resulting in unpredictable behaviors.

Moreover, it’s important to address the misconception that complex systems can be overly simplified without losing their essential properties. This perspective may prove problematic, as the core characteristics of the system – the very aspects we are interested in – are intricately tied to its complexity. Stripping away these layers could essentially negate the properties that make the system valuable or interesting.

Finally, complexity in systems, particularly in AI, may bear similarities to the observer effect observed in subatomic particles. The observer effect postulates that the act of observation alters the state of what is being observed. In similar fashion, any sufficiently complex system could potentially change in response to the act of trying to observe or understand it. This could introduce additional layers of unpredictability, making these systems akin to quantum particles in their susceptibility to observation-based alterations.

Notes on Connectivity and Commonality

The notion of commonality is a fascinating one, often sparking deep philosophical conversations. An oft-encountered belief is that two entities – be they people, nations, ideologies, or otherwise – have nothing in common. This belief, however, is paradoxical in itself, for it assumes that we can discuss these entities in the same context and thus establishes a link between them. The statement “Nothing in common” implies that we are engaging in a comparison – inherently suggesting some level of relatedness or connection. “Agreeing to disagree” is another such example. At first glance, it seems like the parties involved share no common ground, but this very agreement to hold different views paradoxically provides commonality.

To further illustrate, consider this question: What does a banana have in common with cosmology? On the surface, it may appear that these two entities are completely unrelated. However, by merely posing the question, we establish a connection between them within the confines of a common discourse. The paradox lies in stating that two random ideas or entities have nothing in common, which contradicts itself by affirming that we are capable of imagining a link between them. This is akin to the statement that there are points in mental space that cannot be connected, a notion that defies the fluid nature of thought and the inherent interconnectedness of ideas. Anything our minds can host must have some substance that our neurons can bind to; this is the stuff ideas are made of.

Language, despite its limitations, doesn’t discriminate against these paradoxes. It embraces them, even when they seem nonsensical, like “south of the South Pole” or “what was before time?” Such self-referential statements echo Gödel’s Incompleteness Theorem in our everyday language, reminding us that any sufficiently expressive formal system contains statements that can neither be proven nor disproven within it.

These paradoxes aren’t mere outliers in our communication but rather essential elements that fuel the dynamism of human reasoning and speculation. They remind us of the complexities of language and thought, the intricate dance between what we know, what we don’t know, and what we imagine.

Far from being a rigid system, language is constantly evolving and pushing its boundaries. It bumps into its limits, only to stretch them further, continuously exploring new frontiers of meaning. It’s in these fascinating paradoxes that we see language’s true power, as it straddles the line between logic and absurdity, making us rethink our understanding of commonality, difference, and the very nature of communication.

Categories & Concepts

One of the ways we categorize and navigate the world around us is through the verticality of expertise, or the ability to identify and classify based on deep, specialized knowledge. This hierarchical method of categorization is present everywhere, from biology to human interactions.

In biological taxonomy, for instance, animals are classified into categories like genus and species. This is a layered, vertical hierarchy that helps us make sense of the vast diversity of life. An animal’s genus and species provide two coordinates to help us position it within the zoological realm.

Similarly, in human society, we use first names and last names to identify individuals. This is another example of vertical classification, as it allows us to position a person within a cultural or familial context. In essence, these nomenclatures serve as categories or boxes into which we place the individual entities to understand and interact with them better.

Douglas Hofstadter, in his book “Surfaces and Essences”, argues that our language is rich with these classifications or groupings, providing ways to sort and compare objects or concepts. But these categorizations go beyond tangible objects and permeate our language at a deeper level, acting as resonating overtones that give language its profound connection with reasoning.

Language can be viewed as an orchestra, with each word acting like a musical instrument. Like musical sounds that follow the principles of musical theory and wave physics, words also have orderly behaviors. They resonate within the constructs of syntax and semantics, creating meaningful patterns and relationships. Just as a flute is a woodwind instrument that can be part of an orchestra playing in Carnegie Hall in New York, a word, based on its category, plays its part in the grand symphony of language.

While many objects fit neatly into categorical boxes, the more abstract concepts in our language often resist such clean classifications. Words that denote abstract ideas or feelings like “you,” “me,” “love,” “money,” “values,” “morals,” and so on are like the background music that holds the orchestra together. These are words that defy clear boundaries and yet are essential components of our language. They form a complex, fractal-like cloud of definitions that add depth, richness, and flexibility to our language.

In essence, the practice of language is a delicate balance between the verticality of expertise in precise categorization and the nuanced, abstract, often messy, and nebulous nature of human experience. Through this interplay, we create meaning, communicate complex ideas, and navigate the complex world around us.

From Commanding to Prompting

It appears that we stand on the threshold of a new era in human-computer communication. The current trend of interacting with large language models through written prompts seems to echo our early experiences of typing words into an input box in the 1980s. This journey has been marked by a consistent effort to democratize the “expert’s space.”

In the earliest days of computing, only highly trained experts could engage with the esoteric world of machine code. However, the development of higher-level languages gradually made coding more accessible, yet the ability to program remained a coveted skill set in the job market due to its perceived complexity.

With the advent of large language models like GPT, the game has changed again. The ability to communicate with machines has now become as natural as our everyday language, making ‘experts’ of us all. By the age of twelve, most individuals have mastered their native language to a degree that they can effectively instruct these systems.

The ubiquitous mouse, represented by an on-screen cursor, can be seen as a transient solution to the human-computer communication challenge. If we draw a parallel with the development of navigation systems, we moved from needing to painstakingly follow directions to our destination, to simply telling our self-driving cars “Take me to Paris,” trusting them to figure out the optimal route.

Similarly, where once we needed to learn complex processes to send an email – understanding a digital address book, navigating to the right contact, formatting text, and using the correct language tone – we now simply tell our digital assistant, “Send a thank you email to Daisy,” and it takes care of the rest.

For the first time in tech history, we can actually have a conversation with our computers. This is a paradigm shift that is set to fundamentally redefine our relationship with technology. It would be akin to acquiring the ability to hold a meaningful conversation with a pet dog; imagine the profound change that would have on the value and role the animal plays in our lives. In much the same way, as our relationship with technology evolves into a more conversational and intuitive interaction, we will discover new possibilities and further redefine the boundaries of the digital realm.