🚀 Welcome to the latest Marvik Digest 🚀
Last month we covered some interesting stories involving multimodal transformers, Stable Diffusion, multilingual language models and more.
➡️ Want us to cover a specific topic? DM us or email [email protected] with your suggestions.
Hugging Face’s new multimodal Transformer model
Its simple yet revolutionary architecture improves on its predecessors across many benchmarks, and it is the first Document AI model that does not rely on a CNN or R-CNN backbone to extract visual features.
🟢 Main highlights:
📌 One of its biggest advantages is that it is a general-purpose model for both text-centric and image-centric Document AI
📌 It unifies the concept of transformers for text-centric purposes with the OCR & visual-centric models used for AI tasks
At Marvik, we have used this model for object-detection tasks and it has yielded amazing results 🤩
💡 If you are facing a similar problem or have an idea to discuss on Document AI, let’s talk. Reach out to [email protected] and discover how we can help you.
➡️ To access the model: https://bit.ly/3CXm7B8
Stability AI’s Stable Diffusion
Give me “A corgi with sunglasses driving a tesla” and you get… 🤔
🚀 We are talking about Stability AI’s Stable Diffusion 🚀
📌 Truly free and open source, both models and code
📌 Using latent diffusion, the model can be run on a consumer #GPU or even on an M1 chip
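💡 Why does latent diffusion fit on consumer hardware? A rough back-of-the-envelope sketch: the diffusion process runs on a compressed latent instead of the full image. The shapes below are assumptions based on the commonly cited Stable Diffusion v1 defaults (512x512 RGB images, a downsampling factor of 8 and 4 latent channels); they are not stated in this digest, and the helper names are purely illustrative.

```python
# Compare the number of values the model must denoise in pixel space
# versus latent space. All shapes are ASSUMED defaults (see note above).

def tensor_elements(height, width, channels):
    """Number of values in a height x width x channels tensor."""
    return height * width * channels

def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape of the latent produced by an autoencoder that shrinks each
    spatial side by `downsample` (hypothetical helper for illustration)."""
    return height // downsample, width // downsample, latent_channels

pixel_elements = tensor_elements(512, 512, 3)
lat_h, lat_w, lat_c = latent_shape(512, 512)
latent_elements = tensor_elements(lat_h, lat_w, lat_c)
compression = pixel_elements / latent_elements

print(f"pixel space:  {pixel_elements} values")
print(f"latent space: {latent_elements} values ({lat_h}x{lat_w}x{lat_c})")
print(f"the latent is {compression:.0f}x smaller")
```

Under these assumptions, every denoising step works on a tensor about 48 times smaller than the image itself, which is a large part of why memory and compute stay within reach of a single consumer GPU.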
This means we can all finally use this powerful technique in our projects and play as much as we want with the amazing capabilities it offers, such as:
📌 Text to image generation (similar to #DALL·E)
📌 Super resolution (#Denoising)
📌 Image in-painting (removes or replaces items in images)
📌 Image out-painting (extends an image beyond its original borders)
📌 Layout/Segmentation (Image generation)
📌 Class image generation (generates images following a single class, for example a car)
All this sounds nice, but why is it relevant?
Even though it’s really early in the life of diffusion models, they are already performing on par with or better than GANs, one of the strongest options for image generation. Imagine all the possibilities that open up 🤩 🤩
Some ideas that come to mind:
📌 Infinite stock images
📌 Texture generation for games
📌 Artist inspiration for creating art
📌 Logo creation
📌 Clothing Fashion inspiration
At Marvik we have extensive experience using #GAN models and have some very exciting ideas on how to leverage this new era of generative AI 🙌🏻
Want to join in and see where we are heading? Reach out to [email protected] to find out 🔍
➡️ To access the full paper: https://bit.ly/3QpeV3T
➡️ To access the code: https://bit.ly/3QorcG1
Amazon’s new AlexaTM 20B
Amazon’s new multilingual language model (AlexaTM 20B) beats GPT-3 and other decoder-based language models in several NLP tasks 🤩
📌 Achieves state-of-the-art performance on 1-shot summarisation tasks and outperforms the much larger #PaLM decoder model, which has 540 billion parameters
➡️ Github repository: https://bit.ly/3QDOuHV
➡️ More on AlexaTM 20B: https://bit.ly/3RY7qSP
OpenAI’s Whisper
🚀 Another milestone in the realm of speech recognition 🚀
📌 Trained on 680,000 hours of multilingual and multitask supervised data collected from the web
📌 Enables transcription in multiple languages and translation from those languages into English
📌 The use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language
📌 About ⅓ of the dataset is non-English
📌 ASR shows strong results for nearly 10 languages
📌 Models & inference code are open-sourced
➡️ More on Whisper here: https://bit.ly/3R9tvgm
Size recommendation for e-commerce fashion
To all online shoppers out there, have you ever struggled to find your perfect fit? 🤔
In the global fashion market, the sizing of garments tends to vary from brand to brand and even within a single brand’s collection. Shoppers must rely on sizing charts, product descriptions and images 👚👖👔. For users, this is a great challenge, since the human body, with its diversity of shapes and dimensions, does not follow a standard pattern 🧍‍♂️🧍. This often leads to over-ordering, returns and purchases that don’t meet consumers’ needs.
💡 As e-commerce becomes the predominant form of fashion retail, there is an urgent need for fashion brands to solve this challenge, creating experiences that remove customer friction and make shopping fast and seamless.
🟢 At Marvik we are working with #deeplearning and #computervision techniques to build a size recommendation system that allows e-commerce buyers to know their body measurements and their recommended clothing size simply by uploading a pair of pictures 👩🏻🧔🏽‍♂️
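💡 To make the idea concrete, here is a toy sketch of the very last step only: turning body measurements into a size label. It is not Marvik's system. It assumes the measurements have already been estimated from the pictures, and the size chart, its values (in centimetres) and the function name are all hypothetical.

```python
# Toy size recommender: pick the chart entry closest to the buyer's
# measurements. Chart values are HYPOTHETICAL, in centimetres.
SIZE_CHART = {
    "S": {"chest": 88, "waist": 74},
    "M": {"chest": 96, "waist": 82},
    "L": {"chest": 104, "waist": 90},
    "XL": {"chest": 112, "waist": 98},
}

def recommend_size(measurements, chart=SIZE_CHART):
    """Return the size label whose chart entry has the smallest squared
    distance to the given measurements."""
    def distance(size_spec):
        return sum((measurements[key] - size_spec[key]) ** 2 for key in size_spec)
    return min(chart, key=lambda label: distance(chart[label]))

# Example buyer: measurements assumed to come from an upstream vision model.
buyer = {"chest": 98, "waist": 80}
print(recommend_size(buyer))
```

A real system would need per-brand charts, more measurements and some notion of fit preference, but a nearest-match lookup like this shows where the estimated measurements end up being used.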
We are reaching out to our community to ask for your support on this exciting project 🙏🏻
➡️ To participate in this initiative, please fill out this form https://bit.ly/3dNWBo1