October 12, 2022

Marvik Digest #5

By Natalia Cohn

🚀 Welcome to the latest Marvik Digest 🚀

Last month we covered some interesting stories involving multimodal transformers, Stable Diffusion, multilingual language models and more.

➡️ Want us to cover a specific topic? DM us or write to [email protected] with your suggestions.

Stay tuned!

Hugging Face’s new multimodal Transformer model

Great news: the #TF version of the #LayoutLMv3 multimodal #Transformer model is now available on Hugging Face! 🚀

Its simple yet revolutionary architecture improves on its predecessors across many benchmarks, and it is the first Document AI model that does not rely on a CNN or R-CNN backbone to extract visual features.

🟢 Main highlights:

📌 One of its biggest advantages is that it is a general-purpose model for both text-centric and image-centric Document AI

📌 It unifies the concept of transformers for text-centric purposes with the OCR and visual-centric models used for AI tasks

At Marvik, we have used this model for object detection tasks and it has yielded amazing results 🤩

💡 If you are facing a similar problem or have an idea to discuss on Document AI, let’s talk. Reach out to [email protected] and discover how we can help you.

➡️ To access the model: https://bit.ly/3CXm7B8

#ai #artificialintelligence #machinelearning #ml #tensorflow #nlp #naturallanguageprocessing #languagemodels #multimodaltransformer #transformers

Stability AI’s Stable Diffusion

Give me “A corgi with sunglasses driving a tesla” and you get… 🤔

Generative AI has come a long way. The introduction of #GANs allowed the #ML space to reach new heights, but a new development is set to power the next generation of #AI image generation.

🚀 We are talking about Stability AI’s Stable Diffusion 🚀

How does this one differ from other diffusion models such as #GLIDE, #DALL·E 2 (OpenAI) and #Imagen (Google)?

📌 Truly free and open source, both models and code

📌 Thanks to latent diffusion, the model can run on a consumer #GPU or even on an M1 chip
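As a rough illustration of why working in latent space is so much lighter than working on pixels, here is a back-of-the-envelope sketch. It assumes the commonly cited Stable Diffusion v1 settings (512×512 RGB images, an autoencoder that downsamples each side by 8 into 4 latent channels); these figures and all names in the snippet are assumptions added for illustration, not taken from the announcement.

```python
# Assumed Stable Diffusion v1 settings (illustrative, not from the digest).
IMAGE_SIZE = 512         # pixels per side
IMAGE_CHANNELS = 3       # RGB
DOWNSAMPLE_FACTOR = 8    # autoencoder shrinks each side by this factor
LATENT_CHANNELS = 4      # channels in the latent representation


def tensor_elements(height, width, channels):
    """Number of values in a height x width x channels tensor."""
    return height * width * channels


# Size of the image the diffusion process would handle in pixel space.
pixel_elements = tensor_elements(IMAGE_SIZE, IMAGE_SIZE, IMAGE_CHANNELS)

# Size of the latent the diffusion process handles instead.
latent_side = IMAGE_SIZE // DOWNSAMPLE_FACTOR
latent_elements = tensor_elements(latent_side, latent_side, LATENT_CHANNELS)

compression = pixel_elements / latent_elements

print(f"pixel space:  {pixel_elements} values")
print(f"latent space: {latent_elements} values")
print(f"the denoiser works on {compression:.0f}x fewer values")
```

Under these assumptions the denoising network operates on a 64×64×4 tensor rather than a 512×512×3 one, which is a large part of why the memory and compute needs drop to consumer-hardware levels.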

This means we can all finally use this powerful technique in our projects and play as much as we want with the amazing capabilities it offers, such as:

📌 Text to image generation (similar to #DALL·E)

📌 Super resolution (#Denoising)

📌 Image in-painting (removes or replaces items in images)

📌 Image out-painting (extends an image beyond its original borders)

📌 Layout/Segmentation (Image generation)

📌 Class image generation (generates images following a single class, for example a car)

All this sounds nice, but why is it relevant?

Even though it’s really early in the life of diffusion models, they are already performing on par with or better than GANs, one of the strongest options for image generation. Imagine all the possibilities that open up 🤩 🤩

Some ideas that come to mind:

📌 Infinite stock images

📌 Texture generation for games

📌 Artist inspiration for creating art

📌 Logo creation

📌 Clothing and fashion inspiration

📌 Image colorization

At Marvik we have extensive experience using #GAN models and have some very exciting ideas on how to leverage this new era of generative AI 🙌🏻

Want to join in and see where we are heading? Reach out to [email protected] to find out 🔍

➡️ To learn more about Stability.ai: https://bit.ly/3Br5fBJ

➡️ To access the full paper: https://bit.ly/3QpeV3T

➡️ To access the code: https://bit.ly/3QorcG1

#generativeai #imagegeneration #machinelearning #stablediffusion #diffusionmodels #artificialintelligence #deeplearningai #deepneuralnetworks #neuralnetworks #deeplearning #nlp #computervision #AI

Amazon’s new AlexaTM 20B

Another breakthrough in the field of #NLP (#naturallanguageprocessing) 🚀

Amazon’s new multilingual language model (AlexaTM 20B) beats GPT-3 and other decoder-based language models in several NLP tasks 🤩

🟢 Highlights

📌 Achieves state-of-the-art performance on 1-shot summarisation tasks and outperforms the much larger #PaLM decoder model, which has 540 billion parameters

📌 In the zero-shot setting, it even outperforms GPT-3 on the #SuperGLUE and #SQuADv2 datasets

📌 It also offers state-of-the-art performance on multilingual tasks like #XNLI, #XCOPA, #PAWS-X and #XWinograd

➡️ Github repository: https://bit.ly/3QDOuHV

➡️ More on AlexaTM 20B: https://bit.ly/3RY7qSP

#machinelearning #ml #deeplearning #languagemodels #LLM #gpt3

OpenAI’s Whisper

🚀 Another milestone in the realm of speech recognition 🚀

OpenAI is open-sourcing #Whisper, an automatic speech recognition (#ASR) system that approaches human-level robustness and accuracy on English speech recognition.

🟢 Highlights

📌 Trained on 680,000 hours of multilingual and multitask supervised data collected from the web

📌 Enables transcription in multiple languages and translation from those languages into English

📌 The use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language

📌 About ⅓ of the dataset is non-English

📌 Shows strong ASR results in about 10 languages

📌 Models & inference code are open-sourced
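For readers who want to try it, a minimal sketch of transcription and translation with the open-sourced `whisper` Python package might look like the following. It assumes the package and ffmpeg are installed; the helper name `transcribe_file`, the default model size and the file name in the note below are illustrative choices, not something from the announcement.

```python
def transcribe_file(audio_path, model_size="base", translate_to_english=False):
    """Return the text Whisper produces for a single audio file."""
    # Imported lazily so this module still loads where whisper is not installed.
    import whisper

    # Downloads the checkpoint on first use; larger sizes are slower but more accurate.
    model = whisper.load_model(model_size)

    # "transcribe" keeps the spoken language, "translate" outputs English.
    task = "translate" if translate_to_english else "transcribe"
    result = model.transcribe(audio_path, task=task)
    return result["text"]
```

Calling `transcribe_file("interview.mp3")` on a hypothetical recording would return the transcript in the spoken language, while passing `translate_to_english=True` asks for an English translation instead.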

➡️ More on Whisper here: https://bit.ly/3R9tvgm

#speechrecognition #speechprocessing #speechanalytics #ml #ai #machinelearning #artificialintelligence #naturallanguageprocessing #nlp

Size recommendation for e-commerce fashion

To all online shoppers out there, have you ever struggled to find your perfect fit? 🤔

In the global fashion market, the sizing of garments tends to vary from brand to brand and even within a single brand’s collection. Shoppers must rely on sizing charts, product descriptions and images 👚👖👔. For users, this is a great challenge, since the human body, with its diversity of shapes and dimensions, does not follow a standard pattern 🧍‍♂️🧍. This often leads to over-ordering, returns and purchases that don’t meet consumers’ needs.

💡As e-commerce becomes the predominant form of fashion retail, there is an urgent need for fashion brands to solve this challenge, creating experiences that remove customer friction and make shopping fast and seamless.

🟢 At Marvik we are working with #deeplearning and #computervision techniques to build a size recommendation system that lets e-commerce shoppers learn their body measurements and their recommended clothing size simply by uploading a pair of pictures 👩🏻🧔🏽‍♂️

We are reaching out to our community to ask for your support on this exciting project 🙏🏻

➡️ To participate in this initiative, please fill out this form: https://bit.ly/3dNWBo1

#sizerecommender #cv #machinelearning #ml #artificialintelligence #fashion #ecommerce #onlineshopping #fashionretail #recsys