Interesting OpenAI projects

2021/05/02

Artificial intelligence already offers us many positive opportunities, and improving it promises even more. Methods such as deep learning make development smoother and easier, and AI moves ever closer to human-level intelligence.

Although the technology is not there yet, many scientists, including Stephen Hawking, have expressed concern that if advanced AI someday gains the ability to redesign itself, it could become an existential threat to humans. Beyond that, whoever develops human-level AI first would gain powerful advantages over everyone else: a company or state could become an AI superpower and use it against others, and the economic advantages alone could lead to serious conflicts.

The mission of OpenAI, a non-profit research center founded in 2015, is to prevent the catastrophe outlined above by turning artificial intelligence into a universally available mass product. “If everyone has AI powers, then there's not any one person or a small set of individuals who can have AI superpower,” said Elon Musk, co-founder of OpenAI. The organization guarantees access by releasing open-source code, and its primary aim is for humanity to benefit as much as possible from the application of AI.

The solutions on their website can be freely integrated with other business systems, and Régens is happy to support its customers in such projects. You can read about some of these solutions in more detail below.

GPT-2

GPT-2 is a transformer language model that can generate coherent text. It was trained, without supervision, on the content of more than 8 million web pages with a single objective: to predict the next word of a given text. It is then evaluated in a zero-shot setting, meaning the model is not specially trained for any particular task, and its ability to handle a task is only measured afterwards. In spite of this, GPT-2 can generate continuations of given texts and can also be used for reading comprehension, summarization, question answering and translation without any fine-tuning. On top of that, it adapts to the style and content of the original text, so the generated continuations are coherent and realistic. The model can still make mistakes: it may state false information or switch topics without reason, but eliminating these weaknesses is already an active area of research.
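To make the next-word-prediction objective concrete, here is a toy sketch of the idea. This is not GPT-2 (which uses a large transformer over subword tokens); it stands in a simple bigram count model and a made-up corpus, purely to illustrate what “predict the next word” means.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; GPT-2's corpus is 8+ million web pages.
corpus = "the cat sat on the mat . the cat ran .".split()

# Count which word follows which.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the most frequent word seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

def generate(start, length):
    """Greedily continue a text by repeatedly predicting the next word."""
    words = [start]
    for _ in range(length):
        words.append(predict_next(words[-1]))
    return " ".join(words)

print(predict_next("the"))     # "cat" follows "the" most often here
print(generate("the", 4))      # a short greedy continuation
```

A real language model replaces the bigram counts with a neural network conditioned on the whole preceding context, which is what lets GPT-2 stay on topic over long passages.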

The application ideas listed above can contribute to better chatbots and virtual assistants. However, GPT-2 can also easily be used for malicious purposes, for instance generating fake news or abusive content. That is why OpenAI initially released neither the fully trained model nor its training code and dataset, only a much smaller version, which can still be put to good use in research.

Image GPT

Image GPT operates on the same principle as the GPT transformer language models: during pre-training, by analyzing a huge amount of data and discovering patterns and repetitions, the model learns to predict the next element of a sequence. In the case of Image GPT, pixels take the place of words. This way, the model can generate images by arranging pixels, either creating entirely new images or completing existing ones. The picture below shows an example of image completion: a given half image on the left, the original image on the right, and the completions generated by Image GPT between the two. (source: openai.com)

 

Although the resolution of the input images has to be reduced to keep the computation manageable, the model clearly does a good job, because the deep learning methods used in the transformer models mentioned above can be applied to pixels just as well as to words. Open-source code for Image GPT is available on the website of OpenAI.
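The “pixels take the place of words” idea can be sketched in a few lines. The toy below is not Image GPT; it flattens a tiny hand-made image into a one-dimensional sequence and completes the hidden bottom half using simple transition counts from the visible top half, just to show how image completion can be framed as next-element prediction.

```python
from collections import Counter, defaultdict

# Tiny 4x4 "image" of vertical stripes: 0 = black, 1 = white.
image = [
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
]

# Flatten the image into a 1-D pixel sequence, as Image GPT does.
sequence = [p for row in image for p in row]

# "Learn" from the visible top half: count which pixel follows which.
visible = sequence[: len(sequence) // 2]
following = defaultdict(Counter)
for cur, nxt in zip(visible, visible[1:]):
    following[cur][nxt] += 1

# Greedily complete the bottom half, one pixel at a time.
completed = list(visible)
while len(completed) < len(sequence):
    completed.append(following[completed[-1]].most_common(1)[0][0])

print(completed == sequence)  # the stripe pattern is recovered exactly
```

Image GPT does the same thing with a transformer instead of counts, which is what lets it complete natural photographs rather than simple repeating patterns.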

DALL-E

DALL-E is a 12-billion-parameter version of GPT-3 trained to generate images from text descriptions. From a short natural-language prompt, it can create entirely new, often surprisingly plausible images, and it can even combine unrelated concepts in a single picture.

 

According to OpenAI, DALL-E has some of the capabilities of 3D rendering software: based on a text description alone, we can immediately get a complete image that can be used in many areas. For example, architects can use it to visualize buildings, and archaeologists to reconstruct ancient structures. It is also well suited to creating animations for the film industry or for educational purposes. Designers, or in fact anyone, can draw inspiration from the images generated by DALL-E, leading to interesting ideas, design elements and advertisements.

CLIP

CLIP is a neural network that can classify images into given categories. Compared with other vision models, it has the advantage of being pre-trained on more than 400 million images from the Internet paired with full image descriptions, not just one-word labels. As a result, CLIP can be used not only with the labels it was trained on but also with ones it has never seen before. Users only need to define a list of possible classes or descriptions, and CLIP predicts which class a given image most likely falls into, based on its prior knowledge. Think of it as asking the model, “Which of these captions best matches this image?”
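The matching step can be illustrated with plain vectors. Real CLIP embeds the image with an image encoder and each caption with a text encoder into a shared space; the sketch below skips the encoders entirely and uses made-up, hypothetical embedding vectors, only to show how the best caption is chosen by cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embedding of a dog photo (in reality produced by CLIP's
# image encoder from the pixels).
image_embedding = [0.9, 0.1, 0.2]

# Hypothetical embeddings of candidate captions (in reality produced by
# CLIP's text encoder). The user is free to list any captions at all.
captions = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
    "a diagram of a page layout": [0.2, 0.1, 0.9],
}

# Zero-shot classification: pick the caption closest to the image.
best = max(captions, key=lambda c: cosine(image_embedding, captions[c]))
print(best)
```

Because the candidate captions are just text supplied at prediction time, the same model handles label sets it was never explicitly trained on.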

This kind of flexible image classification can make searching images much easier: for example, if we classified the pictures in documents, we could search and sort by diagrams or layouts. The open-source code of CLIP is available on the website of OpenAI.

MuseNet

MuseNet is a deep neural network that can generate 4-minute musical compositions with 10 different instruments in the style of a specific genre or composer. Like the language models above, it was developed through unsupervised learning, trained on hundreds of thousands of MIDI files to predict the next note in a given musical sequence. Its compositions build on the many musical patterns and repetitions it has discovered.

If we were curious what a combination of Bon Jovi’s and Mozart’s music would sound like played on guitar and piano, MuseNet could give us the answer, so it is well suited to entertainment. The significance of the project goes further than that, however, because it shows that the same general-purpose technology used for text and images can also interpret music and musical notes.


If you are interested in any of the artificial intelligence-based projects, visit the website of OpenAI for further information, and feel free to contact Régens if you would like to implement them in your business.


Source: OpenAI, robotflow, VentureBeat