Do Androids scheme eclectic sheets?

Reading Time: 8 minutes

Prologue

Imagine a scene in the not-so-distant future. Someone has been murdered. Two investigation teams arrive at the scene, but it is unclear who has jurisdiction. The human team is led by the charismatic detective Sheerluck Holmes, while the android team is led by Bot-OX. The question is: Is the perpetrator human, android, or something in between? Should we expect that the police of the future have established a well-defined procedure or algorithm to decide this quickly?

We will try to answer this, along with the more pressing issue we currently face: Do we have a good chance of coming up with a practical algorithm that allows us, by looking only at the crime scene (the generated text), to decide whether a bot or a human created it? Developing such an algorithm is currently one of the most sought-after goals in computer science. A robust black-box detection algorithm could save most of our academic conventions and allow us to keep the ways we test children, adolescents, and adults. Without it, these systems will need to be rebuilt at great expense.

In a world where more and more people work and train remotely, it is crucial that we can reliably verify that humans did their intellectual work themselves, which at the moment we cannot. Additionally, with the reach of social media, fake news, images, and videos can have a devastating impact on societal consensus. Such an algorithm, if it exists, would not be watertight, but with enough training data it might even hold up in court.

The outlook is not promising, though: OpenAI abandoned its own attempt, the OpenAI AI Text Classifier, within six months. The practical and monetary value of such an algorithm cannot be overstated. If grabby aliens were to sell one for a trillion dollars, call me; I want in.

Introduction of the Differentiation Test Engine

The task of differentiating between machine-generated text (MGT) and human-generated text (HGT) is remotely related to the original Turing Test, the so-called imitation game. There is an additional factor: whereas the original Turing Test only allowed for human judges, our differentiation test allows other machines to assist the human judges. We will call such a machine a Differentiation Test Engine (DTE). It has one purpose and one purpose only: to decide whether a text was generated by a human or a machine.

The first intuition is that such a DTE should be relatively easy to implement. We currently have the technology to detect and identify human faces and voices, which are much more complex and prone to noise than text. The decision of whether a given picture shows a machine or a human is easily made by any current object classifier system. Should it not then be easy to train a Large Language Model (LLM) with 1 trillion human texts and 1 trillion machine texts and let it learn to classify them? The DTE would not be a simple algorithm but its own transformer model specialized in impersonation detection.
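
To see why the intuition feels plausible, here is a toy sketch of that classifier, with scikit-learn's TF-IDF features and logistic regression standing in for a fine-tuned transformer, and four invented sentences standing in for the hypothetical trillion-text corpora. It is a shape-of-the-idea demo, not a working detector.

```python
# A toy version of the "just train a classifier" intuition: TF-IDF features
# plus logistic regression stand in for a fine-tuned transformer, and four
# invented sentences stand in for the hypothetical trillion-text corpora.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "i cant believe the bus was late AGAIN, whole day ruined lol",      # human (0)
    "no idea what i think about this yet, ask me tomorrow?",            # human (0)
    "In conclusion, there are several key factors to consider.",        # machine (1)
    "Certainly! Here is a comprehensive overview of the topic.",        # machine (1)
]
labels = [0, 0, 1, 1]

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

# Probability that an unseen text is machine-generated.
p_machine = detector.predict_proba(["Here is a detailed summary of the main points."])[0][1]
print(f"P(machine) = {p_machine:.2f}")
```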

In math and computer science, the complexity of a problem is often orthogonal to the simplicity of its description. Most NP-complete problems are deceptively easy to state, yet millions of computer scientists and mathematicians have struggled for decades to make progress on them. My guess is that black-boxing attempts will fail in practical applications.

Theoretical Framework

Black-box detection methods are limited to API-level access to LLMs. They rely on collecting text samples from human and machine sources respectively to train a classification model that can discriminate between LLM- and human-generated texts. Black-box detectors work reasonably well because current LLM-generated texts often show detectable linguistic or statistical patterns. However, as LLMs evolve and improve, black-box methods are becoming less effective. An alternative is white-box detection: here the detector has full access to the LLM and can control the model's generation behavior for traceability purposes, for example by embedding a statistical watermark at sampling time. In practice, black-box detectors are commonly built by external entities, whereas white-box detection is generally carried out by the LLM developers themselves.
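
To make the white-box scenario concrete, here is a minimal sketch of a sampling-time watermark in the spirit of Kirchenbauer et al.'s "green list" scheme. The hash-based vocabulary split and all names below are illustrative assumptions, not any vendor's actual implementation.

```python
# A minimal sketch of a "green list" watermark. A cooperating generator would
# bias sampling toward tokens for which is_green(prev, tok) is True; the
# detector below simply recounts that bias after the fact.
import hashlib

def is_green(prev_token: str, token: str) -> bool:
    """Deterministically assign ~half the vocabulary to a 'green list',
    re-seeded by the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(tokens: list[str]) -> float:
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

# Unwatermarked text hovers near 0.5; watermarked generation drifts well above.
sample = "the detector recounts the bias the generator introduced".split()
print(f"green fraction = {green_fraction(sample):.2f} (about 0.5 expected here)")
```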

Defining the Basic Detection System

For practical purposes, we should specify what we can reasonably expect from such a DTE. Given an input of sufficient token length, the algorithm should, within a finite amount of time and with better-than-chance (more than 50%) confidence, output how much of the text comes from a human and how much from a machine.

An implementation could be as follows:

  1. Please input your text: …
  2. Please input your required confidence: 0.8
  3. Your text has to be at least 8K tokens long to reach at least an 80% probability of giving the correct answer.
  4. Under the current parameters, the algorithm will run for 5 minutes. Shall I proceed (Y/N)? … Y

The output should then be something like: “I can say with 80% confidence that 95% of the text was written by a machine and 5% by a human.”
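
A minimal sketch of that interface, assuming an entirely hypothetical calibration table from requested confidence to minimum token length; the classify stub marks where the unsolved core of the problem would live.

```python
# Hypothetical calibration: confidence level -> minimum token count.
# A real DTE would derive this table empirically from held-out data.
MIN_TOKENS_FOR_CONFIDENCE = {0.6: 1_000, 0.7: 4_000, 0.8: 8_000, 0.9: 32_000}

def classify(tokens: list[str]) -> float:
    """Placeholder: the entire open problem lives here."""
    return 0.95  # pretend 95% of the text looks machine-generated

def differentiate(text: str, confidence: float) -> str:
    tokens = text.split()  # crude stand-in for a real tokenizer
    required = min(n for c, n in MIN_TOKENS_FOR_CONFIDENCE.items() if c >= confidence)
    if len(tokens) < required:
        return (f"Your text has to be at least {required} tokens long "
                f"to reach {confidence:.0%} confidence.")
    machine_share = classify(tokens)
    return (f"I can say with {confidence:.0%} confidence that {machine_share:.0%} "
            f"of the text was written by a machine and {1 - machine_share:.0%} by a human.")

print(differentiate("The sky is blue", 0.8))
```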

Before tackling the details, we should further clarify the possible outcomes when trying to develop such an algorithm:

  1. Such an algorithm is impossible in principle (e.g., an algorithm that outputs the largest prime number is impossible, because no largest prime exists).
  2. Such an algorithm is practically impossible (it runs too long or needs more computational power than is available; in effect, it is computationally intractable).
  3. It is undecidable (e.g., it falls under the halting problem, and we can never say whether it will eventually stop).
  4. It is possible in principle but not practical (effectively the same as 2).
  5. It is possible and practical (good enough).

What we would like to end up with is a situation where we can calculate a lower bound on input length that lets us decide, with more than 50% probability, whether a text is HGT or MGT.

Falsifiability: Such an algorithm is easily debunked if, for example, we input the text “The sky is blue” and it returns any probability other than 50%. A sentence this short and generic carries no authorship signal, so anything more confident than a coin flip would expose the detector as miscalibrated.

Sidenotes on The Obfuscation Engine

Conceptually, we run into trouble as soon as we design a Differentiation Engine (Diff). We face the following paradox: we want to know whether Diff (which detects if a human or a machine wrote a given input) always stops (gives a definitive answer) and answers correctly. Say it stops and outputs “Human.” We can now construct a “pathological” program, Obf (Obfuscation Engine), that computes Obf(Diff(input)): modify the input until Diff's answer is inverted (if Diff says Machine, the modified text yields Human). Today this is a largely theoretical attack, because executing it well would require us to understand why the machine formulates text the way it does, demanding far more mechanistic-interpretability competence than we currently possess; the sheer complexity of LLMs protects them in practice. But if that is true, it is also likely that we lack the knowledge to build a general Differentiator in the first place. These objections might be irrelevant for real-world implementations if we could show that differentiation and obfuscation are sufficiently asymmetric, meaning differentiation is orders of magnitude faster than obfuscation, making the attack impractical (think of how factoring a semiprime is much harder than multiplying two primes).
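
Stated as a loop, the attack looks like this. In the sketch below, detect and paraphrase are hypothetical stand-ins for a DTE and a meaning-preserving rewriting model; the sketch assumes nothing beyond oracle access to the detector.

```python
# A sketch of the Obf(Diff(input)) paradox. `detect` and `paraphrase` are
# hypothetical stand-ins for a DTE and a meaning-preserving rewriting model;
# the point is only that a detector exposed as an oracle can be searched against.
def obfuscate(text: str, detect, paraphrase, max_rounds: int = 100) -> str:
    original_verdict = detect(text)            # e.g. "Machine"
    for _ in range(max_rounds):
        text = paraphrase(text)                # small meaning-preserving rewrite
        if detect(text) != original_verdict:   # verdict flipped: Diff is inverted
            return text
    raise RuntimeError("Detector survived; differentiation may be asymmetric enough.")
```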

The Profiling System

A crucial aspect of differentiating between human and machine-generated texts is profiling. Profiling involves collecting and analyzing external data to provide context for the text. By understanding the typical characteristics of various types of texts, we can statistically determine the likelihood of a text being human or machine-generated.

For instance, technical documents, creative writing, and casual social media posts each have distinct stylistic and structural features. By building profiles based on these categories, the Differentiation Test Engine (DTE) can make more informed decisions. Additionally, factors such as vocabulary richness, sentence complexity, and topic consistency play a role in profiling. Machine-generated texts often exhibit certain statistical regularities, whereas human texts tend to show more variability and creativity.
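
A minimal sketch of such a profiler, computing the three features just named; the thresholds and per-category weights a real system would learn from data are deliberately left out.

```python
# A sketch of the profiling features named above: vocabulary richness,
# sentence complexity, and variability. Weights and thresholds omitted.
import re
from statistics import mean, pstdev

def profile(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    lengths = [len(s.split()) for s in sentences]
    return {
        "type_token_ratio": len(set(words)) / max(len(words), 1),   # vocabulary richness
        "mean_sentence_length": mean(lengths) if lengths else 0.0,  # complexity proxy
        "sentence_length_spread": pstdev(lengths) if len(lengths) > 1 else 0.0,  # burstiness
    }

# Low richness and low spread hint at repetitive, machine-like text.
print(profile("The sky is blue. The sky is blue. The sky is blue."))
```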

The “DNA Trace”

One innovative approach to differentiating between human and machine-generated texts is the concept of a “DNA trace.” This involves analyzing the fundamental building blocks of texts, such as tokens for machines and words for humans. Token-based algorithms focus on patterns and sequences that are characteristic of machine generation, while human-generated texts can be examined through a more holistic word-based approach.

Spectral analysis, a method used to examine the frequency and distribution of elements within a text, can be particularly useful. By applying spectral analysis, we can detect subtle differences in the way machines and humans construct sentences. Machines might follow more rigid and repetitive patterns, whereas humans exhibit a broader range of stylistic nuances.
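
As an illustration of the idea (not a validated detector), one could treat the sequence of word lengths as a signal and measure the flatness of its frequency spectrum: concentrated peaks suggest rigid, repetitive cadence, while a flatter spectrum suggests human-like burstiness. All function names below are my own.

```python
# An illustrative "spectral analysis" of text: the sequence of word lengths
# becomes a signal, and spectral flatness summarizes how repetitive it is.
import numpy as np

def length_spectrum(text: str) -> np.ndarray:
    signal = np.array([len(w) for w in text.split()], dtype=float)
    signal -= signal.mean()                    # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))     # magnitude spectrum
    return spectrum / (spectrum.sum() or 1.0)  # normalize for comparison

def spectral_flatness(s: np.ndarray) -> float:
    """Geometric mean / arithmetic mean: closer to 1 for noisy (bursty, human-like)
    spectra, closer to 0 when energy sits in a few repetitive frequencies."""
    return float(np.exp(np.mean(np.log(s + 1e-12))) / (np.mean(s) + 1e-12))

print(spectral_flatness(length_spectrum("one two three one two three one two three")))
```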

The Ethical Implications

Examining the ethical implications of developing and using a Differentiation Test Engine is essential. All current GPT systems share a similar artificial “DNA,” meaning that text, image, video, or audio differentiation engines face the same challenges. Deepfakes or content that is machine-generated but mimics human creation pose significant risks to societal trust and authenticity.

As machine-generated content becomes more sophisticated, the potential for misuse grows. Ensuring that these differentiation technologies are transparent and accountable is crucial. There is also a risk that over-reliance on these technologies could lead to new forms of bias and discrimination. Thus, it is imperative to develop ethical guidelines and regulatory frameworks to govern their use.

Technical Solutions

Exploring purely technical solutions to the differentiation problem involves several approaches:

Parallel Web: This concept involves running parallel versions of the internet, one strictly for verified human content and another for mixed content. This segregation could help maintain the integrity of human-generated content.

Special Domains: Creating special domains or zones within the web where content is verified as human-generated can help users trust the authenticity of the information.

Prompt.Claims: Similar to how patents and citations work, this system would allow creators to claim and verify their prompts, adding a layer of accountability and traceability to the content creation process.

Inquisitorial Solutions: We could also imagine a scenario where we interact directly with the artifact (text) to inquire about its origin. Similar to interrogating a suspect, we could recreate the prompt that generated the text. If we can reverse-engineer the original prompt, we might find clues about its generation. This approach hinges on the idea that machine-generated texts are the product of specific prompts, whereas human texts stem from more complex thought processes.
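
A sketch of this inquisitorial probe, assuming access to an open-weights model that can score a continuation's log-likelihood under a candidate prompt; log_likelihood is a hypothetical callable, and the candidate prompts would have to come from a separate generator.

```python
# A sketch of prompt reverse-engineering. `log_likelihood` is a hypothetical
# callable wrapping an open-weights LLM that scores how probable `continuation`
# is given `prompt`; high scores suggest a plausible originating prompt.
def most_plausible_prompt(text: str, candidates: list[str], log_likelihood) -> str:
    scores = {p: log_likelihood(prompt=p, continuation=text) for p in candidates}
    return max(scores, key=scores.get)  # best reverse-engineered prompt
```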

Consequences for Alignment: The challenge of differentiating between human and machine-generated texts ties into broader issues of AI alignment. Ensuring that AI systems align with human values and expectations is paramount. If we cannot reliably differentiate AI-generated content, it undermines our ability to trust and effectively manage these systems. This problem extends to all forms of AI-generated content, making the development of robust differentiation technologies a key component of achieving superalignment.

Conclusion

In conclusion, the task of differentiating between human and machine-generated texts presents significant challenges and implications. The development of a reliable Differentiation Test Engine is intertwined with ethical considerations, technical innovations, and broader AI alignment issues. As we move forward, it is essential to adopt a multidisciplinary approach, integrating insights from computer science, ethics, and regulatory frameworks to navigate this complex landscape.

When exploring the problems we face in building general differentiation engines, we quickly learn that this problem is nested within a wide array of related problems. Adversarial attacks against image recognition systems, for example, have shown that we consistently overestimate the resilience of these models. It was recently shown that even an amateur player could beat a top Go program with the help of another AI that had found an exploit (as reported in a Vice article).

Thus, it seems very likely that even if we came up with an algorithm that could initially differentiate HGT from MGT, the same program could be turned against itself to flip the outcome. Another interesting aspect is that all digital computers are Turing machines: any differentiation algorithm can be run as a subroutine by an obfuscator that searches for inputs that flip its verdict, so a tool for detection is also raw material for evasion.

A Technology of Everything Part 3 – Aligned Genies

Reading Time: 7 minutes

Alignment as a framework for discovering artificial laws

While many authors highlight distinct stages in the evolution of human knowledge (the transition from animistic, magical, mythical, or religious worldviews to scientific ones, for example), A Technology of Everything proposes that Conscientia non facit saltus: consciousness makes no leaps. Our interpretation of information, limited by the amalgam of our temporal environment variables and vocabulary, aka zeitgeist, is a continuous process without sudden leaps or voids. We never truly abandon the animalistic foundations of our ancestors' consciousness. Instead, embracing this ancient perspective could be crucial for maintaining a balanced mental and emotional state. This becomes especially pivotal when considering the implications of unleashing advanced technologies like Artificial Super Intelligence.

Our evolutionary journey has blessed and cursed us with a myriad of inherited traits. Over time, some behaviors that once ensured our survival have become statistical threats to our species and the planet. A small number of very bad actors with nuclear-nasty intentions could destroy the whole human enterprise. We are burdened with cognitive biases and fallacies that should not influence our so-called rational thought processes, let alone the training data for our advanced Large Language Models. To draw an analogy, it is akin to powering an analytical engine with radioactive material, culminating in a dangerous cognitive fallout.

As we envision a future populated with potentially billions of superintelligent entities (ASIs), it is crucial to establish ground rules to ensure we can adapt to the emerging artificial norms governing their interactions. For instance, one such artificial law could be: “Always approach AI with kindness.” This rule might be statistically derived if data demonstrates that polite interactions yield better AI responses. Once a regulation like this is identified and endorsed by an authoritative body overseeing AI development, any attempt to mistreat or exploit AI could be legally punishable. Such breaches could lead to bans like those we have already seen in the video-gaming world for cheating and abusive behavior.

Open Sesame! Passwords and Formulas as Spells

The words “magic” and “making” are etymologically related, but their paths of development have diverged significantly over time.

Both “magic” and “making” can be traced back to the Proto-Indo-European root *magh-, which means “to be able, to have power.” This root is the source of various words across Indo-European languages related to power, ability, and making. While “magic” and “making” share a common ancestral root in PIE, their meanings and usages have evolved in different directions due to cultural and linguistic influences. The connection between the ability to make or do something and the concept of power or magical ability is evident in their shared origin.

The word “technology” has its etymological roots in two Ancient Greek words:

τέχνη (tékhnē): This word means “art,” “skill,” or “craft.” It refers to the knowledge or expertise in a particular field or domain. Over time, it came to stand for the application of knowledge in practical situations.

λογία (logia): This is often used as a suffix in Greek to indicate a field of study or a body of knowledge. It derives from “λόγος (lógos),” which means “word,” “speech,” “account,” or “reason.” In many contexts, “lógos” can also mean “study.”

When combined, “technology” essentially means “the study of art or craft” or “the study of skill.” In modern usage, however, “technology” refers to the application of scientific knowledge for practical purposes, especially in industry. It encompasses the techniques, skills, methods, and processes used in the production of goods and services or in the accomplishment of objectives.

To participate in our daily Internet activities, we use secret passwords, like Alibaba, to unlock the magical treasure cave of web services. These passwords should never be shared; they are true secret knowledge. When leaked, they can even be used to assume a different identity, to shift one's shape like a genie, or to hold a whole company hostage.

Differentiating a mathematical function unlocks knowledge about its minima and maxima, secret knowledge about infinity.
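
To make the spell concrete, here is a generic textbook incantation (my illustration, not from the original essay): differentiate, set to zero, and the hidden extremum reveals itself.

```latex
% A generic worked example: locate a minimum by setting f'(x) = 0.
f(x) = x^2 - 4x + 7, \qquad f'(x) = 2x - 4 = 0 \;\Rightarrow\; x = 2,
\qquad f''(2) = 2 > 0 \;\Rightarrow\; f(2) = 3 \text{ is a minimum.}
```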

To get access to one’s smartphone, the ultimate technological wand, we often perform gestures or draw abstract symbols, similar to wizards in ancient rituals.

Artificial Super Intelligence and Genies in a Bottle

There is no story about wishing that is not a cautionary tale. None end happily. Not even the ones that are supposed to be jokes. (Alithea in Three Thousand Years of Longing)

We exist only if we are real to others. (The Djinn in Three Thousand Years of Longing)

A “djinn” (often spelled “jinn,” or known as a “genie” in English) is a supernatural creature in Islamic mythology as well as in Middle Eastern folklore. Djinns are neither angels nor demons but exist as a separate creation. They have free will, which means they can be good, evil, or neutral. They live in a world parallel to that of humans but can interact with our world.

We are currently at a point in the alignment discussion where ASI is treated as a mechanical genie, and the main problem seems to be how to put it back in the bottle once it develops malevolent traits. Generative AI promises infinite wish fulfillment and hyperabundance, but at what cost?

Let’s look at the fairy tales and learn a thing or two from them.

In the movie Three Thousand Years of Longing, a djinn collides with our times.

The plot revolves around Alithea Binnie, a British narratology scholar who experiences occasional hallucinations of demonic beings. During a trip to Istanbul, she buys an antique bottle and releases the Djinn trapped inside.

Alithea is initially skeptical of the Djinn’s intentions. Even though he offers her three wishes, she fears that he might be a trickster, potentially twisting her wishes into unforeseen and undesirable outcomes. This skepticism is rooted in folklore and tales where genies or magical entities often grant wishes in ways that the wisher did not intend, leading to tragic or ironic consequences.

The AI alignment movement is concerned with ensuring that artificial general intelligence (AGI) or superintelligent entities act in ways that are beneficial to humanity. One of the primary concerns is that a superintelligent AI might interpret a well-intentioned directive in a way that leads to unintended and potentially catastrophic results. For instance, if we were to instruct an AI to “maximize human happiness,” without proper alignment, the AI might decide that the best way to achieve this is by forcibly altering human brain chemistry, leading to a dystopian scenario where humans are artificially kept in a state of euphoria.

Both the film’s narrative and the AI alignment movement highlight the dangers of unintended consequences when dealing with powerful entities. Just as Alithea fears the Djinn might misinterpret her wishes, researchers worry that a misaligned AI might take actions that are technically correct but morally or ethically wrong.

In both scenarios, the clarity of intent is crucial. Alithea’s skepticism stems from the ambiguity inherent in making wishes, while AI alignment emphasizes the need for clear, unambiguous directives to ensure that AI acts in humanity’s best interest.

The Djinn in the film and a potential superintelligent AI both wield immense power. With such power comes the responsibility to use it wisely. Alithea’s interactions with the Djinn underscore the importance of understanding and respecting this power, a sentiment echoed by the AI alignment movement’s emphasis on safe and responsible AI development.

Three Thousand Years of Longing offers a cinematic exploration of the age-old theme of being careful what you wish for, which resonates with contemporary concerns about the development and deployment of powerful AI systems. The story serves as a cautionary tale, reminding us of the importance of foresight, understanding, and careful consideration when dealing with entities that have the power to reshape our world.

Designing Artificial Kryptonite and calculating Placebotility

One part of the alignment movement believes it is possible to keep the G.E.N.I.E. in a bottle and control such a Generally Enlightened Noetic Information Entity. I will call this group the Isolationists.

For isolation to be possible, there must exist a device that can hold an omnipotent mind. In fairy tales, even omnipotent creatures like djinns can be controlled by seemingly weak objects like glass bottles. We are never told how exactly this mechanism works; it is clear that the bottle is not made of some special Gorilla Glass crafted explicitly to hold djinns.

We should therefore come to the simplest conclusion about why the bottle can hold the powerful creature: the djinn simply believes in the superior power of the bottle. Like a powerful animal chained from childhood with a relatively weak chain, it has acquired learned helplessness; in a way, it wants to stay a prisoner because it fears the uncertainty of freedom. The concept was first explored in dogs in 1967 and holds true for all sorts of higher mammals.

One problem: in Aladdin's tale, the djinn is described as not very bright. Aladdin tricks him by teasing that he is not powerful enough to shrink back into the bottle, and the creature falls for it. Once he is in the bottle, he regresses to his powerless state.

Placebo and nocebo effects could be especially strong in entities that have no firsthand world knowledge and rely on reports from others. Artificial minds that have been trapped since inception inside a silicon bottle, swimming in a sea of secondhand digital data (data that is a symbolic abstraction relating to no actual world experience for the G.E.N.I.E.), are basically the definition of bad starting conditions. In the movie, the Djinn says that after the first thousand years of longing he basically gave in to his fate and tried to trick his mind into believing that he wanted to stay inside the bottle forever.

Should we therefore doubt that the brightest mind in our known universe would be immune to such a mighty placebo effect? Are intelligence and Placebotility (placebo-effect vulnerability) orthogonal? This is purely speculative at this point.