A researcher says an image-generating artificial intelligence ‘invented its own language’. An image AI called DALL-E has sparked debate among AI experts after claims that it is creating a secret language to categorise images: the words look like gibberish, but they appear to carry a hidden meaning.
Giannis Daras, a PhD candidate in computer science, used Twitter to share samples of the “language,” which the AI appeared to have developed to describe birds and insects. Birds are referred to as “Apoploe vesrreaitais,” whereas insects are referred to as “Contarra ccetnxniams luryca tanniounons.” In the viral thread, Daras asserted that if you re-enter the AI’s made-up words as prompts, the system will produce images associated with them.
DALLE-2 has a secret language.
“Apoploe vesrreaitais” means birds.
“Contarra ccetnxniams luryca tanniounons” means bugs or pests.
The prompt: “Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons” gives images of birds eating bugs.
A thread (1/n) pic.twitter.com/VzWfsCFnZo
— Giannis Daras (@giannis_daras) May 31, 2022
Daras and his colleague Alexandros G. Dimakis stated in a study that has not yet been peer-reviewed that text prompts like “An image of the word aeroplane” typically lead to generated images that contain what looks like nonsense text, and that the generated text is not arbitrary but instead reveals a secret vocabulary that the model appears to have created on its own.
For instance, the model commonly generates aeroplanes when fed this nonsense text back as a prompt. The theory is that the AI creates these words on its own to make sense of the images it produces, and can then comprehend them when they are read back to it.
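As an illustration, the sketch below shows how such a round-trip test might be run against OpenAI’s public image API, which exposes a model named dall-e-2. The exact tooling Daras used is not described here, so the client usage, model name, and image size should all be read as assumptions.

```python
# Hypothetical sketch: feed a "secret language" phrase back into an image
# model and inspect the results. Assumes the `openai` Python package (v1+)
# and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

prompt = "Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons"
response = client.images.generate(
    model="dall-e-2",   # assumption: DALL-E 2 is reachable under this model name
    prompt=prompt,
    n=4,                # a few samples, to check whether the association is consistent
    size="512x512",
)

for i, image in enumerate(response.data):
    print(f"sample {i}: {image.url}")  # inspect manually for birds eating bugs
```

Generating several samples and checking them by eye is the simplest way to see whether the association is consistent rather than a one-off coincidence.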
The assertions generated discussion on Twitter. “Hmm, time to brush up on signs of demonic possession,” one user, Dmitriy Mandel, said.
Another user added: “Is this an actual LANGUAGE? With grammar and stuff?”
However, other AI experts remain highly sceptical of the claims.
A known limitation of DALLE-2 is that it struggles with text. For example, the prompt: “Two farmers talking about vegetables, with subtitles” gives an image that appears to have gibberish text on it.
However, the text is not as random as it initially appears… (2/n) pic.twitter.com/B3e5qVsTKu
— Giannis Daras (@giannis_daras) May 31, 2022
Text-to-image generating AI ‘DALL-E 2’ ‘invented its own language’
A computer scientist has claimed that the ‘DALL-E 2’ system ‘invented its own language’ by generating gibberish text when asked to produce images of words. When that gibberish is plugged back into the system, it produces images of airplanes. Its creators are still unsure whether or not the machine has truly ‘invented’ its own language, but they are optimistic about the long run.
DALL-E 2, whose name is a portmanteau of the artist Salvador Dalí and Pixar’s robot WALL-E, is able to translate a set of words into an image based on concepts that it has learned. By using its ‘invented language’, DALL-E 2 can also generate images from simple phrases and fill in the details based on concepts related to the inputs.
System generates its own code
The MPS platform includes facilities for generating code. You can write a program in the MPS language that specifies the behavior of the various components and interacts with the filesystem directly. If you prefer to write your code by hand, you can also store it in plain text files, which is particularly useful for data formats and configuration files.
Code generation also has disadvantages. Automatically generated code is typically more complex than handwritten code; for glue code in particular, the generated version may be harder to read than what a developer would have written by hand. Generated code may also not be as well optimized as a handwritten version, although the performance difference is usually small, and code generators often support more use cases than a project actually requires. However, this does not mean they cannot improve performance. The advantages of code generation are numerous and should be weighed carefully against these disadvantages.
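To make the glue-code point concrete, here is a generic, heavily simplified sketch of template-driven code generation in Python. It is not MPS code: the schema, names, and output format are hypothetical and chosen only to show the trade-off between generated boilerplate and handwritten code.

```python
# Generic illustration of template-driven code generation (not MPS-specific).
# A generator trades some extra, sometimes verbose output for not having to
# write repetitive glue code by hand.
SCHEMA = {"name": "User", "fields": {"id": "int", "email": "str", "active": "bool"}}

def generate_dataclass(schema):
    """Emit Python source for a dataclass described by a small schema."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "@dataclass",
        f"class {schema['name']}:",
    ]
    for field_name, type_name in schema["fields"].items():
        lines.append(f"    {field_name}: {type_name}")
    return "\n".join(lines)

print(generate_dataclass(SCHEMA))  # the output can be written into a module file
```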
Uses byte-pair encoding (BPE) to understand “secret language”
Byte-pair encoding (BPE) is a key factor in how DALL-E 2 reads a prompt, and it may be what lies behind the “secret language”. Because BPE breaks unfamiliar words into smaller subword pieces rather than discarding them, even a gibberish string is converted into tokens that the model can associate with real concepts, so its responses to such prompts end up looking a little more human-like than simple ‘garbage in, garbage out’ would suggest.
In contrast, removing a single character from a garbled word can corrupt the generated image, because the remaining pieces no longer combine into a cohesive composite concept. Prompts like these can also serve as a kind of adversarial attack on DALL-E, and the “secret language” that emerges, an artificial vocabulary stitched together from fragments of other words and languages, is nowhere near as well defined as a real language.
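One way to see why this could happen is to inspect how a BPE tokenizer splits the gibberish. The sketch below uses the openly available CLIP tokenizer from Hugging Face’s transformers library as a stand-in; DALL-E 2’s text encoder is related to CLIP, but treating this as the exact tokenizer the model uses is an assumption.

```python
# Sketch: inspect how byte-pair encoding splits a "secret language" phrase.
# Assumes the Hugging Face `transformers` library; the public CLIP tokenizer
# stands in for whatever tokenizer DALL-E 2 uses internally.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

original = "Apoploe vesrreaitais"
perturbed = "Apoploe vesrreaitai"  # one character removed from the second word

print(tokenizer.tokenize(original))   # subword pieces for the original phrase
print(tokenizer.tokenize(perturbed))  # pieces shift once a character is dropped
```

Comparing the two token lists shows how dropping a single character changes the subword pieces the model receives, which would explain why the generated images change so sharply.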
System struggles with text
The Google Brain team behind the image-generating AI Imagen has developed a benchmark called DrawBench, which consists of text prompts designed to probe different semantic properties of these models. Human evaluators compare the results of two models side by side on these prompts; in this case, the team evaluated Imagen against DALL-E 2 and three other similar models.
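For illustration only, here is a hypothetical sketch of how side-by-side judgements of this kind can be aggregated into a preference rate; the vote data below is invented and does not come from the Imagen evaluation.

```python
# Hypothetical aggregation of pairwise human judgements: for each prompt, a
# rater records which model's image they preferred (or a tie).
from collections import Counter

votes = ["imagen", "dalle2", "imagen", "tie", "imagen", "dalle2", "imagen"]

counts = Counter(votes)
decided = counts["imagen"] + counts["dalle2"]  # ignore ties for the headline rate

print(f"Imagen preferred:   {counts['imagen'] / decided:.0%} of decided comparisons")
print(f"DALL-E 2 preferred: {counts['dalle2'] / decided:.0%} of decided comparisons")
```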
This AI has produced several examples of gibberish captions for images. Giannis Daras, a computer science PhD student at the University of Texas at Austin, shared examples of how DALL-E translates text prompts into images and claimed that, while the generated text is gibberish, it retains consistent associations with images: some of these strings reliably produce images of fruits and vegetables, while other nonsense phrases produce images of birds and insects.
Random noise as a second language
Researchers have now revealed that DALL-E, the system said to have a secret language, generates gibberish text when asked to produce images of words, and that this gibberish can be translated back into images, including images of airplanes. The researchers have speculated that the hidden vocabulary could amount to a security flaw, and the machine is now said to be capable of identifying and predicting human actions.
The two AIs learn in different ways; the difference between them lies in how they approach training. Imagen uses diffusion, a process in which a model is trained by adding random noise to a large set of images and learning to remove it, so that it can eventually reconstruct the original image. For now, the image quality of Parti and Imagen will not stand up to close scrutiny, but the generated images are enough to catch people’s attention.
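As a rough illustration of what “adding random noise” means in practice, here is a minimal diffusion-style training sketch, assuming PyTorch. Real systems such as Imagen are vastly larger, but the core step is the same: mix a clean image with Gaussian noise and train a network to predict the noise that was added.

```python
# Minimal sketch of diffusion training, assuming PyTorch. The denoiser can be
# any image-to-image network that takes the noised image and the timestep.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal fraction

def add_noise(x0, t):
    """Blend a clean batch x0 with Gaussian noise at timestep t."""
    noise = torch.randn_like(x0)
    a = alpha_bars[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise
    return x_t, noise

def training_loss(denoiser, x0):
    """One training step: the denoiser tries to recover the added noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    x_t, noise = add_noise(x0, t)
    return F.mse_loss(denoiser(x_t, t), noise)
```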