By Natalia Cohn
🚀 Welcome to the latest Marvik Digest 🚀
This month we have some interesting stories involving multi-GAN optimization, Microsoft’s new IoT Insider Lab, speech-to-speech translation models, advancements in transformer architectures, and more.
➡️ Want us to cover a specific topic? DM us or email [email protected] with your suggestions.
In the realm of #ComputerVision, generating full-body human images is still a huge challenge 🧍‍♀️🧍‍♂️. Every person looks different: each of us has a unique identity, appearance, shape and pose.
#Generativeadversarialnetworks (#GANs) have emerged as a successful image generation paradigm. 🔴 However, issues arise when dealing with classes that show complex variations 🔴
In a recent paper, researchers from Adobe Research, KAUST and University College London propose 🟢#InsetGAN🟢, an innovative method that combines multiple pretrained GANs: one #GAN generates a global canvas, and a series of specialized GANs generate different body parts that can be inserted into it.
➡️ Main takeaways:
📌 Introduces a multi-GAN optimization framework that jointly optimizes the latent codes of two or more collaborative generators such that the final image, formed by inserting the part insets on the canvas, does not exhibit any seams (e.g., a face, when added to the body, will be consistent in skin tone, clothing boundaries, and hair flow).
📌 Different canvas/part GANs can be trained at different resolutions, thus lowering the data (quality) requirements.
📌 Setup demonstrated by combining a full body GAN with a dedicated high-quality face GAN to produce plausible-looking humans.
📌 Tested on a custom dataset, with results evaluated through quantitative metrics and user studies.
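To make the joint-optimization idea from the takeaways above more concrete, here is a minimal toy sketch. It is not the InsetGAN implementation: the two "generators" are hypothetical linear functions producing 1-D pixel strips, the only loss is a seam term on the inset border, and the names `canvas_generator`, `part_generator`, `seam_loss` and `joint_optimize` are ours. The point is only to show two latent codes being updated together until the inserted part matches the canvas at its border.

```python
# Toy stand-ins for two pretrained generators (hypothetical, NOT the InsetGAN models).
def canvas_generator(w):
    """Map a 2-D latent code to an 8-pixel 'body' strip."""
    return [0.5 * w[0] + 0.1 * i * w[1] for i in range(8)]

def part_generator(w):
    """Map a 2-D latent code to a 4-pixel 'face' strip."""
    return [0.8 * w[0] - 0.2 * j * w[1] + 0.3 for j in range(4)]

INSET_START = 2   # the part is pasted over canvas pixels 2..5
BORDER = (0, 3)   # first and last pixel of the inset form its border

def seam_loss(w_canvas, w_part):
    """Squared mismatch between canvas and part at the inset border."""
    canvas = canvas_generator(w_canvas)
    part = part_generator(w_part)
    return sum((canvas[INSET_START + j] - part[j]) ** 2 for j in BORDER)

def numerical_grad(f, w, eps=1e-5):
    """Central finite-difference gradient of f at the list w."""
    grad = []
    for k in range(len(w)):
        hi = w[:k] + [w[k] + eps] + w[k + 1:]
        lo = w[:k] + [w[k] - eps] + w[k + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def joint_optimize(w_canvas, w_part, steps=500, lr=0.05):
    """Gradient descent on both latent codes at once."""
    w_canvas, w_part = list(w_canvas), list(w_part)
    for _ in range(steps):
        # Both gradients are evaluated at the current point before any update.
        g_c = numerical_grad(lambda w: seam_loss(w, w_part), w_canvas)
        g_p = numerical_grad(lambda w: seam_loss(w_canvas, w), w_part)
        w_canvas = [w - lr * g for w, g in zip(w_canvas, g_c)]
        w_part = [w - lr * g for w, g in zip(w_part, g_p)]
    return w_canvas, w_part

def compose(w_canvas, w_part):
    """Paste the generated part onto the generated canvas."""
    image = canvas_generator(w_canvas)
    image[INSET_START:INSET_START + 4] = part_generator(w_part)
    return image

if __name__ == "__main__":
    start_canvas, start_part = [1.0, 1.0], [0.0, 0.0]
    print("seam loss before:", seam_loss(start_canvas, start_part))
    w_c, w_p = joint_optimize(start_canvas, start_part)
    print("seam loss after: ", seam_loss(w_c, w_p))
```

A real setup would add further terms (for example, keeping each generator's output realistic), which this sketch deliberately leaves out.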
👉 Find out more here https://bit.ly/3tjNJuP
👉 Visit www.marvik.ai or reach out to [email protected] to learn more about our experience using GANs.
A few days ago we had the chance to share some incredible moments during our team #getaway. We spent the whole weekend in a beautiful house, surrounded by nature and breathtaking landscapes 🍂 🌳 🌅.
There was room for everything: board games by the fireplace 🔥, spirited ping-pong competitions 🏓, and improvised guitar jams and sing-alongs 🎤 🎸. On top of that, part of the team volunteered to cook and treated us to a delicious Uruguayan barbecue and mouth-watering arepas 🇻🇪.
Even more rewarding was having most of the Marvik team together, both from Uruguay 🇺🇾 and from different parts of Argentina 🇦🇷. For some of them, it was their first time visiting 🇺🇾, and certainly the first time we met in person.
Our team keeps growing and growing, and this is just the beginning. 🚀 Will you risk missing our next getaway?
Make sure that doesn’t happen. 👉 Click here https://bit.ly/3yYYIh4 to see all our open positions, or drop us an email at [email protected] to find out more.
Microsoft IoT Insider Lab
📢 Some great news for the #artificialintelligence community in Latin America 📢 Microsoft has chosen #Uruguay 🇺🇾 to host its new #AI & #IoT Insider Lab, the first of its kind in the region and only the third outside the US 🇺🇸
💡 This is game-changing given the growing impact of AI & IoT on the way people, devices and data interact in all aspects of life. Moreover, it puts Uruguay on the path to becoming an “innovation hub” for the region, acting as a facilitator of #innovation and creativity to transform business realities.
🚀 The lab’s mission is to show startups, corporations and organizations across industries how to leverage AI and IoT technologies to solve related challenges, while providing guidance and recommendations from experts so they can achieve their full potential.
➡️ The lab will offer:
📌 Experience-based knowledge from experts: #electricalengineers, #cloudengineers, #datascientists, #programmanagers, #projectmanagers, and #softwareengineers.
📌 On-demand dedication from highly qualified #Microsoft collaborators.
📌 Project management, design, architecture, prototyping, and post-implementation customer and partner guidance.
👉 More on this initiative here https://bit.ly/3NPyNgk
👉 If you’re curious about how Microsoft’s AI & IoT Labs work, click here https://bit.ly/3NSyu46
New speech-to-speech translation model
Meta AI has recently released a new research paper on speech-to-speech translation (#S2ST) that does not rely on #textgeneration as an intermediate step 💡
This method enables faster inference and supports translation between unwritten languages (important, since more than 40% of the world’s languages have no writing system). Instead of the traditional approach of translating source speech into target speech spectrograms, the authors use discretized speech units obtained by clustering self-supervised speech representations.
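As a rough illustration of what "discretized speech units obtained by clustering" means, the toy sketch below runs a small k-means over per-frame feature vectors and then replaces each frame with the index of its nearest centroid. Everything here is an assumption for illustration: the 2-D frames are hand-made rather than real self-supervised speech features, the initial centroids are fixed by hand, and the function names are ours, not Meta AI’s.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(frame, centroids):
    """Index of the centroid closest to the given frame."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(frame, centroids[k]))

def kmeans(frames, centroids, iterations=10):
    """Plain k-means starting from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for frame in frames:
            clusters[nearest(frame, centroids)].append(frame)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

def to_units(frames, centroids):
    """Turn a sequence of feature frames into a sequence of discrete unit ids."""
    return [nearest(frame, centroids) for frame in frames]

if __name__ == "__main__":
    # Hand-made 2-D "speech features", one vector per frame (illustrative only).
    frames = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9], [0.05, 0.05], [9.9, 0.1]]
    centroids = kmeans(frames, [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
    print(to_units(frames, centroids))  # prints [0, 0, 1, 1, 0, 2]
```

The translation model can then predict sequences of these unit ids instead of continuous spectrogram frames.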
🟢 Main achievements:
📌 First of its kind trained on real-world, open-source audio data for multiple language pairs
📌 Outperforms previous direct S2ST systems in terms of #runtime, #FLOPS, and #maxmemory
📌 Leverages pretraining with unlabeled speech data
👉 Click here to learn more https://bit.ly/3HEetvS
In our latest blog post, our #mlengineer Diego Sellanes discusses #DIET, Rasa’s latest transformer architecture, which handles both entity recognition and intent classification. He explains how it works, walks through its different modules, and covers its main advantages compared to similar models.
“RASA’s DIET transformer has a very powerful architecture. It proposes a new way of understanding state-of-the-art transformers, with a clever loss function which sums up every aspect of the model.”
👉 Visit our blog for the full story https://bit.ly/3zZ1rqY
👉 At Marvik, we have used Transformers to execute several #NLP projects. DM or reach out to [email protected] if you are curious about how you could apply them to enhance your #NLPmodels.
🚀YOLOv6 is finally out 🚀
#YOLOv6 is a single-stage object detection framework dedicated to industrial applications, with hardware-friendly efficient design and high performance.
🟢 Main takeaways:
📌 Efficient Decoupled Head with SIoU Loss
📌 Hardware-friendly Design for Backbone/Neck
📌 Detection accuracy and inference speed far exceed those of #YOLOv5
📌 Released under the GNU General Public License v3.0
📌 Coming soon: more deployment options and quantization tools
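For readers new to box-regression losses, here is a minimal sketch of plain IoU (intersection over union), the overlap measure that losses in the IoU family build on. This is not the SIoU loss used in YOLOv6: as we understand it, SIoU adds angle, distance and shape penalties on top of the overlap term, and those are left out here. The `(x1, y1, x2, y2)` box format and the function names are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred_box, target_box):
    """Basic IoU loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou(pred_box, target_box)

if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))       # overlap 1, union 7
    print(iou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
```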
👉 Check out the repo here https://bit.ly/3AaQHpy
📢 Google AI has recently launched the Pathways Autoregressive Text-to-Image model (#Parti), its second text-to-image generator model 📢
Parti uses an autoregressive model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge.
📌 Treats text-to-image generation as a sequence-to-sequence modeling problem (akin to machine translation) → allows it to benefit from advances in large language models.
📌 Shows consistent quality improvements by scaling its encoder-decoder up to 20B parameters.
📌 Achieves state-of-the-art zero-shot #FID score.
📌 Complementary to #Imagen (its predecessor): the two explore different families of generative models, autoregressive and diffusion → opens up exciting opportunities to combine both.
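To illustrate what "autoregressive" and "sequence-to-sequence" mean in this setting, the toy sketch below generates a sequence of image tokens one at a time, each conditioned on the text tokens and on the image tokens produced so far. It is not Parti: `next_token_scores` is a hypothetical, hand-written stand-in for a trained decoder, the 8-token vocabulary is arbitrary, and greedy selection is just the simplest decoding choice.

```python
VOCAB_SIZE = 8  # size of a hypothetical image-token codebook

def next_token_scores(text_tokens, image_tokens):
    """Hypothetical stand-in for a trained decoder: one score per candidate token.

    The arithmetic is arbitrary; what matters is that the scores depend on
    both the text prompt and everything generated so far.
    """
    context = sum(text_tokens) + sum((i + 1) * t for i, t in enumerate(image_tokens))
    return [-abs((context + 3 * v) % VOCAB_SIZE - 4) for v in range(VOCAB_SIZE)]

def generate_image_tokens(text_tokens, length):
    """Greedy autoregressive decoding: pick the best next token, append, repeat."""
    image_tokens = []
    for _ in range(length):
        scores = next_token_scores(text_tokens, image_tokens)
        best = max(range(VOCAB_SIZE), key=scores.__getitem__)
        image_tokens.append(best)
    return image_tokens

if __name__ == "__main__":
    prompt = [3, 1, 4]  # toy ids standing in for a tokenized text prompt
    print(generate_image_tokens(prompt, length=6))
```

In a full system, a separate image tokenizer would map the generated token sequence back to pixels; that step is omitted here.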
It’s exciting to witness all these breakthroughs in text-to-image generation 🚀
👉 Click here to learn more about Parti https://bit.ly/3I4lMxe