Marvik Digest #5
By Natalia Cohn
Welcome to the latest Marvik Digest
Last month we covered some interesting stories involving multimodal transformers, stable diffusion, multilingual language models and more.
➡️ Want us to cover a specific topic? DM us or write to [email protected] with your suggestions.
Stay tuned!
Hugging Face’s new multimodal Transformer model
Great news: the #TF (TensorFlow) version of the #LayoutLMv3 multimodal #Transformer model is now available on Hugging Face!
Its simple yet revolutionary architecture improved on many of its predecessors’ benchmarks, and it is the first Document AI model that does not rely on a CNN or Faster R-CNN backbone to extract visual features.
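For a rough intuition of what dropping the CNN backbone means, here is a minimal sketch in plain Python: the page image is cut into fixed-size patches and each flattened patch goes through a single linear projection, which is how the LayoutLMv3 paper describes its visual embedding. The toy sizes (8×8 image, 4×4 patches, 5-dimensional embeddings), the random weights and the helper names are our own assumptions for illustration, not the model’s actual code.

```python
import random


def split_into_patches(image, patch):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [
                value
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for value in pixel
            ]
            patches.append(flat)
    return patches


def linear_project(patches, weights):
    """Embed each flattened patch with one matrix multiply (no convolution stack)."""
    return [
        [sum(x * w for x, w in zip(patch, row)) for row in weights]
        for patch in patches
    ]


# Toy sizes (assumptions) so the example runs instantly.
HEIGHT = WIDTH = 8
CHANNELS = 3
PATCH = 4
EMBED_DIM = 5

rng = random.Random(0)
image = [
    [[rng.random() for _ in range(CHANNELS)] for _ in range(WIDTH)]
    for _ in range(HEIGHT)
]
# One weight row per output dimension; in a real model these would be learned.
weights = [
    [rng.uniform(-1, 1) for _ in range(PATCH * PATCH * CHANNELS)]
    for _ in range(EMBED_DIM)
]

patches = split_into_patches(image, PATCH)
embeddings = linear_project(patches, weights)
print(len(patches), len(patches[0]), len(embeddings[0]))  # 4 48 5
```

With a 224×224 input and 16×16 patches (the configuration we understand the base model to use), the same split yields 196 patch embeddings, which the Transformer then processes alongside the text tokens.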
📢 Main highlights:
- One of its biggest advantages is that it is a general-purpose model for both text-centric and image-centric Document AI tasks
- It unifies the transformer approach used for text-centric tasks with the OCR and visual-centric models used for AI tasks
At Marvik, we have used this model for object-detection-related tasks and it has yielded amazing results 🤩
💡 If you are facing a similar problem or have an idea to discuss on Document AI, let’s talk. Reach out to [email protected] and discover how we can help you.
➡️ To access the model: https://bit.ly/3CXm7B8
#ai #artificialintelligence #machinelearning #ml #tensorflow #nlp #naturallanguageprocessing #languagemodels #multimodaltransformer #transformers
Stability AI’s Stable Diffusion
Give me “A corgi with sunglasses driving a Tesla” and you get…
Generative AI has come a long way. The introduction of #GANs allowed the field to reach new heights in the #ML space, but a new development is set to power the next generation of #AI image generation.
We are talking about Stability AI’s Stable Diffusion
How does this one differ from other diffusion models like #GLIDE, #DALL·E 2 (OpenAI) and #Imagen (Google)?
- Truly free and open source, both models and code
- Because it uses latent diffusion, the model can run on a consumer #GPU or even on an M1 chip
This means we can all finally use this powerful technique in our projects and play as much as we want with the amazing capabilities it offers, such as:
- Text-to-image generation (similar to #DALL·E)
- Super resolution (#Denoising)
- Image in-painting (removes items from images)
- Image out-painting (extends an image beyond its original borders)
- Layout/segmentation-to-image generation (generates images from a layout or segmentation map)
- Class image generation (generates images following a single class, for example a car)
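To put a number on the consumer-GPU point above, here is some back-of-the-envelope arithmetic. It assumes the publicly described Stable Diffusion v1 setup (512×512 RGB images, an autoencoder that downsamples each side by 8, and 4 latent channels); the constant and function names are ours.

```python
def tensor_size(height, width, channels):
    """Number of values in a height x width x channels tensor."""
    return height * width * channels


# Assumed Stable Diffusion v1 configuration (see the note above).
IMAGE_SIDE = 512
RGB_CHANNELS = 3
DOWNSAMPLE_FACTOR = 8
LATENT_CHANNELS = 4

pixel_values = tensor_size(IMAGE_SIDE, IMAGE_SIDE, RGB_CHANNELS)
latent_side = IMAGE_SIDE // DOWNSAMPLE_FACTOR
latent_values = tensor_size(latent_side, latent_side, LATENT_CHANNELS)
reduction = pixel_values // latent_values

print(pixel_values, latent_values, reduction)  # 786432 16384 48
```

Under these assumptions, the denoising network works on a tensor 48 times smaller than the full-resolution image, which is a large part of why the model fits on modest hardware.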
All this sounds nice, but why is it relevant?
Even though it’s still really early in the life of diffusion models, they already perform on par with or better than GANs, one of the strongest options for image generation. Imagine all the possibilities that open up 🤩 🤩
Some ideas that come to mind:
- Infinite stock images
- Texture generation for games
- Artist inspiration for creating art
- Logo creation
- Clothing and fashion inspiration
- Image colorization
At Marvik we have extensive experience using #GAN models and have some very exciting ideas on how to leverage this new era of generative AI
Want to join in and see where we are heading? Reach out to [email protected] to find out.
➡️ To learn more about Stability.ai: https://bit.ly/3Br5fBJ
➡️ To access the full paper: https://bit.ly/3QpeV3T
➡️ To access the code: https://bit.ly/3QorcG1
#generativeai #imagegeneration #machinelearning #stablediffusion #diffusionmodels #artificialintelligence #deeplearningai #deepneuralnetworks #neuralnetworks #deeplearning #nlp #computervision #AI
Amazon’s new AlexaTM 20B
Another breakthrough in the field of #NLP (#naturallanguageprocessing)
Amazon’s new multilingual language model (AlexaTM 20B) beats GPT-3 and other decoder-only language models in several NLP tasks 🤩
📢 Highlights
- Achieves state-of-the-art performance on 1-shot summarisation tasks, outperforming the much larger #PaLM decoder model with 540 billion parameters
- In the zero-shot setting, it even outperforms GPT-3 on the #SuperGLUE and #SQuADv2 datasets
- It also offers state-of-the-art performance on multilingual tasks like #XNLI, #XCOPA, #Paws-X and #XWinograd
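If the terms are new to you: in the zero-shot setting the model only receives the task input, while in the 1-shot setting it also sees a single worked example first. The sketch below builds both kinds of prompt for summarisation. The template and the function name are hypothetical and chosen for readability; this is not AlexaTM 20B’s actual input format.

```python
def build_prompt(article, examples=()):
    """Build a summarisation prompt with len(examples) worked examples before the query."""
    parts = [
        f"Article: {example_article}\nSummary: {example_summary}"
        for example_article, example_summary in examples
    ]
    # The final block is left open so the model completes the summary.
    parts.append(f"Article: {article}\nSummary:")
    return "\n\n".join(parts)


query = "OpenAI is open-sourcing Whisper, an automatic speech recognition system."

# Zero-shot: the model sees only the article it has to summarise.
zero_shot = build_prompt(query)

# 1-shot: one solved example comes first, then the article to summarise.
one_shot = build_prompt(
    query,
    examples=[
        (
            "The TensorFlow version of LayoutLMv3 is now available on Hugging Face.",
            "LayoutLMv3 gets a TensorFlow release.",
        )
    ],
)

print(one_shot)
```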
➡️ GitHub repository: https://bit.ly/3QDOuHV
➡️ More on AlexaTM 20B: https://bit.ly/3RY7qSP
#machinelearning #ml #deeplearning #languagemodels #LLM #gpt3
OpenAI’s Whisper
Another milestone in the realm of speech recognition
OpenAI is open-sourcing #Whisper, an automatic speech recognition (#ASR) system that approaches human-level robustness and accuracy on English speech recognition.
📢 Highlights
- Trained on 680,000 hours of multilingual and multitask supervised data collected from the web
- Enables transcription in multiple languages and translation from those languages into English
- The use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language
- About ⅓ of the dataset is non-English
- ASR shows strong results for nearly 10 languages
- Models and inference code are open-sourced
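Since the models and inference code are public, trying the two capabilities listed above takes only a few lines. The sketch below assumes the open-source `whisper` Python package (installed as `openai-whisper`) and its `load_model` / `transcribe` calls; the wrapper functions, their names and the default model size are our own choices, and the package is imported lazily so the option helper can be used without it.

```python
def build_decode_options(task="transcribe", language=None):
    """Return keyword arguments for Whisper's transcribe() call.

    task: "transcribe" keeps the spoken language, "translate" outputs English.
    language: optional language code such as "es"; None lets Whisper detect it.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    options = {"task": task}
    if language is not None:
        options["language"] = language
    return options


def run_whisper(audio_path, model_name="base", task="transcribe", language=None):
    """Transcribe or translate one audio file and return the resulting text."""
    import whisper  # lazy import: requires the openai-whisper package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **build_decode_options(task, language))
    return result["text"]
```

For example, `run_whisper("interview.mp3", task="translate")` would return an English rendering of a non-English recording; the file name here is a placeholder.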
➡️ More on Whisper here: https://bit.ly/3R9tvgm
#speechrecognition #speechprocessing #speechanalytics #ml #ai #machinelearning #artificialintelligence #naturallanguageprocessing #nlp
Size recommendation for e-commerce fashion
To all online shoppers out there, have you ever struggled to find your perfect fit?
In the global fashion market, the sizing of garments tends to vary from brand to brand and even within a single brand’s collection. Shoppers must rely on sizing charts, product descriptions and images. For users this is a real challenge, since the human body, with its diversity of shapes and dimensions, does not follow a standard pattern. This often leads to over-ordering, returns and purchases that don’t meet consumers’ needs.
💡 As e-commerce becomes the predominant form of fashion retail, there is an urgent need for fashion brands to solve this challenge, creating experiences that remove customer friction and make shopping fast and seamless.
📢 At Marvik we are working with #deeplearning and #computervision techniques to build a size recommendation system that lets e-commerce buyers learn their body measurements and their recommended clothing size simply by uploading a pair of pictures
We are reaching out to our community to ask for your support on this exciting project
➡️ To participate in this initiative, please fill out this form: https://bit.ly/3dNWBo1
#sizerecommender #cv #machinelearning #ml #artificialintelligence #fashion #ecommerce #onlineshopping #fashionretail #recsys