Marvik Digest #2
By Natalia Cohn
Welcome to the latest Marvik Digest
This month we have some interesting stories involving multi-GAN optimization, Microsoft’s new IoT Insider Lab, speech-to-speech translation models, advancements in transformer architectures, and more.
Want us to cover a specific topic? DM us or email [email protected] with your suggestions.
Stay tuned!
InsetGAN
In the realm of #ComputerVision, generating full-body human images is still a huge challenge. As humans, we are all different from each other: each of us has a unique identity, appearance, shape and pose.
#Generativeadversarialnetworks (#GANs) have emerged as a successful image generation paradigm. However, issues arise when dealing with classes that show complex variations.
In a recent paper, researchers from Adobe Research, KAUST and University College London propose #InsetGAN, a method that combines multiple pretrained GANs: one #GAN generates a global canvas, and a series of specialized GANs focus on different body parts that are then inserted into it.
Main takeaways:
Introduces a multi-GAN optimization framework that jointly optimizes the latent codes of two or more collaborative generators such that the final image, formed by inserting the part insets on the canvas, does not exhibit any seams (e.g., a face, when added to the body, will be consistent in skin tone, clothing boundaries, and hair flow).
Different canvas/part GANs can be trained at different resolutions, thus lowering the data (quality) requirements.
Setup demonstrated by combining a full body GAN with a dedicated high-quality face GAN to produce plausible-looking humans.
Tested on a custom dataset and evaluated with quantitative metrics and user studies.
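To make the idea of jointly optimizing latent codes more concrete, here is a minimal toy sketch. It is not the paper's implementation: the two "generators" (`canvas_gan`, `inset_gan`) are hypothetical linear functions producing 1D rows of pixels, the seam loss only compares the two border pixels of the inset with the canvas underneath, and the drift penalty and all numeric values are assumptions made for illustration.

```python
# Toy stand-ins for two pretrained generators (hypothetical, not real GANs).
def canvas_gan(w):
    """Map a 2-value latent code to an 8-pixel 'canvas' row."""
    return [w[0] + w[1] * x for x in range(8)]

def inset_gan(w):
    """Map a 2-value latent code to a 3-pixel 'inset' row."""
    return [w[0] + w[1] * x for x in range(3)]

OFFSET = 2  # canvas position where the inset is pasted (assumed fixed)

def seam_loss(w_canvas, w_inset):
    """Squared mismatch between the inset border pixels and the canvas below."""
    canvas, inset = canvas_gan(w_canvas), inset_gan(w_inset)
    left = inset[0] - canvas[OFFSET]
    right = inset[-1] - canvas[OFFSET + len(inset) - 1]
    return left ** 2 + right ** 2

def objective(params, start, lam=0.01):
    """Seam loss plus a small penalty for drifting away from the start codes."""
    drift = sum((p - s) ** 2 for p, s in zip(params, start))
    return seam_loss(params[:2], params[2:]) + lam * drift

def joint_optimize(start, steps=500, lr=0.01, eps=1e-5):
    """Gradient descent over BOTH latent codes at once (finite differences)."""
    params = list(start)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            hi, lo = params[:], params[:]
            hi[i] += eps
            lo[i] -= eps
            grad.append((objective(hi, start) - objective(lo, start)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params

# First two values: canvas latent code; last two: inset latent code.
start = [0.2, 0.1, 0.9, -0.1]
result = joint_optimize(start)
seam_before = seam_loss(start[:2], start[2:])
seam_after = seam_loss(result[:2], result[2:])
print(f"seam loss before: {seam_before:.4f}, after: {seam_after:.6f}")
```

The point of the sketch is that both codes move together: neither generator is retrained, and the seam shrinks because the canvas and the inset each adjust a little.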
Find out more here https://bit.ly/3tjNJuP
Visit www.marvik.ai or reach out to [email protected] to learn more about our experience using GANs.
Weekend Getaway
A few days ago we had the chance to share some incredible moments during our team #getaway. We spent the whole weekend in a beautiful house, surrounded by nature and breathtaking landscapes.
There was room for everything: board games by the fireplace, spirited ping-pong competitions, and improvised guitar jams and sing-alongs. On top of that, part of the team volunteered to cook and delighted us with a nice Uruguayan barbecue and mouth-watering arepas 🇻🇪.
Even more rewarding was having most of the Marvik team together, from both Uruguay 🇺🇾 and different parts of Argentina 🇦🇷. For some of them, it was their first time visiting 🇺🇾, and certainly the first time we met in person.
Our team keeps growing and growing, and this is just the beginning. Will you risk missing our next getaway?
Make sure that doesn’t happen. Click here https://bit.ly/3yYYIh4 to see all our open positions, or drop us an email at [email protected] to find out more.
Microsoft IoT Insider Lab
Some great news for the #artificialintelligence community in Latin America: Microsoft has chosen #Uruguay 🇺🇾 to host its new #AI & #IoT Insider Lab, the first of its kind in the region and only the third outside the US 🇺🇸.
This is game-changing given the growing impact of AI & IoT on the way people, devices and data interact in all aspects of life. Moreover, it puts Uruguay on the path to becoming an “innovation hub” for the region, acting as a facilitator of #innovation and creativity to transform business realities.
The lab’s mission is to show startups, corporations and organizations across industries how to leverage AI and IoT technologies to solve related challenges, while providing guidance and recommendations from experts so they can achieve their full potential.
The lab will offer:
Experience-based knowledge from experts: #electricalengineers, #cloudengineers, #datascientists, #programmanagers, #projectmanagers, and #softwareengineers.
On-demand dedication from highly qualified #Microsoft collaborators.
Project management, design, architecture, prototyping, and post-implementation customer and partner guidance.
More on this initiative here https://bit.ly/3NPyNgk
If you’re curious about how Microsoft’s AI & IoT Labs work, click here https://bit.ly/3NSyu46
New speech-to-speech translation model
Meta AI has recently released a new research paper on speech-to-speech translation (#S2ST) that does not rely on #textgeneration as an intermediate step.
This method enables faster inference and supports translation between unwritten languages (important since more than 40% of the world’s languages have no writing system). Instead of the traditional approach (translating source speech into target speech spectrograms), the authors use discretized speech units obtained by clustering self-supervised speech representations.
Main achievements:
First of its kind trained on real-world open sourced audio data for multiple language pairs
Outperforms previous direct S2ST systems in terms of #runtime, #FLOPS, and #maxmemory
Leverages pretraining with unlabeled speech data
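As a rough illustration of what "discretized speech units obtained by clustering" can mean, here is a minimal sketch. It assumes k-means as the clustering method and uses made-up 2D vectors in place of real self-supervised speech features; the function names, the tiny data set and the step that collapses repeated units are illustrative assumptions, not Meta AI's code.

```python
def nearest(frame, centroids):
    """Index of the centroid closest to a feature frame (squared distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(frame, c)) for c in centroids]
    return dists.index(min(dists))

def kmeans(frames, init_centroids, iters=10):
    """Plain k-means: assign frames to centroids, then recompute the means."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for frame in frames:
            groups[nearest(frame, centroids)].append(frame)
        for k, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def to_units(frames, centroids):
    """Replace every frame by the id of its cluster: the 'discrete unit'."""
    return [nearest(frame, centroids) for frame in frames]

def collapse_repeats(units):
    """Merge consecutive duplicates so each unit appears once per run."""
    reduced = []
    for unit in units:
        if not reduced or reduced[-1] != unit:
            reduced.append(unit)
    return reduced

# Made-up per-frame features standing in for a speech encoder's output.
frames = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9),
          (1.1, 1.0), (0.0, 0.0), (2.0, 2.1), (2.1, 2.0)]
centroids = kmeans(frames, [frames[0], frames[2], frames[6]])
units = to_units(frames, centroids)
reduced = collapse_repeats(units)
print(units)    # [0, 0, 1, 1, 1, 0, 2, 2]
print(reduced)  # [0, 1, 0, 2]
```

A translation model can then predict a sequence of unit ids like these instead of text or spectrogram frames, which is what makes the approach usable for languages without a writing system.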
Click here to learn more https://bit.ly/3HEetvS
DIET Transformer
In our latest blog post, our #mlengineer Diego Sellanes discusses #DIET, Rasa’s latest transformer architecture, which handles both entity recognition and intent classification. He goes on to explain how it works, its different modules, and its main advantages compared to similar models.
“RASA’s DIET transformer has a very powerful architecture. It proposes a new way of understanding state-of-the-art transformers, with a clever loss function which sums up every aspect of the model.”
Visit our blog for the full story https://bit.ly/3zZ1rqY
At Marvik, we have used Transformers to execute several #NLP projects. DM or reach out to [email protected] if you are curious about how you could apply them to enhance your #NLPmodels.
YOLOv6
YOLOv6 is finally out
#YOLOv6 is a single-stage object detection framework dedicated to industrial applications, with a hardware-friendly, efficient design and high performance.
Main takeaways:
Efficient Decoupled Head with SIoU Loss
Hardware-friendly Design for Backbone/Neck
Detection accuracy and inference speed far exceed those of the earlier #YOLOv5
Released under the GNU General Public License v3.0
Coming soon: more deployment options and quantization tools
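For readers new to IoU-based box losses such as the SIoU loss mentioned above, the sketch below shows only the plain intersection-over-union term that this family of losses builds on. It does not implement the additional SIoU penalty terms, it is not taken from the YOLOv6 repo, and the `(x1, y1, x2, y2)` box format and function names are assumptions.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the overlapping rectangle, if there is one.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred, target):
    """Basic IoU loss: 0 for a perfect match, 1 for boxes with no overlap."""
    return 1.0 - box_iou(pred, target)

pred, target = (0, 0, 2, 2), (1, 1, 3, 3)
print(box_iou(pred, target))   # overlap area 1, union area 7
print(iou_loss(pred, target))
```

Plain IoU gives no useful signal once two boxes stop overlapping, since the loss stays at 1; that limitation is a common motivation for IoU variants that add extra terms.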
Check out the repo here https://bit.ly/3AaQHpy
Parti Model
Google AI has recently launched the Pathways Autoregressive Text-to-Image model (#Parti), its second text-to-image generator model.
Parti uses an autoregressive model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge.
Highlights:
Treats text-to-image generation as a sequence-to-sequence modeling problem (akin to machine translation), which allows it to benefit from advances in large language models.
Shows consistent quality improvements by scaling its encoder-decoder up to 20B parameters.
Achieves a state-of-the-art zero-shot #FID score.
Complementary to #Imagen (its predecessor): the two explore different families of generative models, autoregressive and diffusion, which opens up exciting opportunities to combine both.
It’s exciting to witness all these breakthroughs in text-to-image generation.
Click here to learn more about Parti https://bit.ly/3I4lMxe