July 5, 2022

Marvik Digest #2

By Natalia Cohn

🚀 Welcome to the latest Marvik Digest 🚀

This month we have some interesting stories involving multi-GAN optimization, Microsoft’s new IoT Insider Lab, speech-to-speech translation models, advancements in transformer architectures, and more.

➡️ Want us to cover a specific topic? DM us or write to [email protected] with your suggestions.

Stay tuned!



In the realm of #ComputerVision, generating full-body human images is still a huge challenge 🧍‍♀️🧍‍♂️. As humans, we all differ from one another: each of us has a unique identity, appearance, shape and pose.

#Generativeadversarialnetworks (#GANs) have emerged as a successful image generation paradigm. 🔴 However, issues arise when dealing with classes that show complex variations 🔴

In a recent paper, researchers from Adobe Research, KAUST and University College London propose 🟢#InsetGAN🟢, an innovative method that combines multiple pretrained GANs: one #GAN generates a global canvas, and a series of specialized GANs focus on different body parts that can be inserted into it.

➡️ Main takeaways:

📌 Introduces a multi-GAN optimization framework that jointly optimizes the latent codes of two or more collaborating generators so that the final image, formed by inserting the part insets into the canvas, shows no visible seams (e.g., a face added to a body stays consistent in skin tone, clothing boundaries, and hair flow).

📌 Different canvas/part GANs can be trained at different resolutions, thus lowering the data (quality) requirements.

📌 The setup is demonstrated by combining a full-body GAN with a dedicated high-quality face GAN to produce plausible-looking humans.

📌 Tested on a custom dataset, with results evaluated through quantitative metrics and user studies.
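To make the joint optimization idea more concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the two "generators" are tiny hypothetical linear maps (`CANVAS_W`, `INSET_W`), the "seam" is a hand-picked list of pixel pairs (`SEAM`), and the loss is a simple squared mismatch. The only point it illustrates is that both latent codes are updated together so that the inset and the canvas agree along the border.

```python
# Toy sketch of joint latent optimization (assumed setup, not InsetGAN itself).

def generate(weights, latent):
    """Hypothetical linear 'generator': each pixel is a weighted sum of the latent."""
    return [sum(w * z for w, z in zip(row, latent)) for row in weights]

def seam_loss(canvas_w, inset_w, z_canvas, z_inset, seam_idx):
    """Squared mismatch between canvas and inset on the shared border pixels."""
    canvas = generate(canvas_w, z_canvas)
    inset = generate(inset_w, z_inset)
    return sum((canvas[c] - inset[p]) ** 2 for c, p in seam_idx)

def joint_optimize(canvas_w, inset_w, z_canvas, z_inset, seam_idx, lr=0.1, steps=200):
    """Plain gradient descent that updates BOTH latent codes at every step."""
    z_c, z_p = list(z_canvas), list(z_inset)
    for _ in range(steps):
        canvas = generate(canvas_w, z_c)
        inset = generate(inset_w, z_p)
        grad_c = [0.0] * len(z_c)
        grad_p = [0.0] * len(z_p)
        for c, p in seam_idx:
            diff = canvas[c] - inset[p]
            for k in range(len(z_c)):
                grad_c[k] += 2 * diff * canvas_w[c][k]   # d(loss)/d(z_canvas)
            for k in range(len(z_p)):
                grad_p[k] -= 2 * diff * inset_w[p][k]    # d(loss)/d(z_inset)
        z_c = [z - lr * g for z, g in zip(z_c, grad_c)]
        z_p = [z - lr * g for z, g in zip(z_p, grad_p)]
    return z_c, z_p

# Made-up weights: 4 canvas pixels and 3 inset pixels, 2 latent dimensions each.
CANVAS_W = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.4], [0.1, 0.9]]
INSET_W = [[0.7, 0.1], [-0.3, 0.5], [0.2, -0.9]]
SEAM = [(1, 0), (2, 2)]  # (canvas pixel, inset pixel) pairs that must match

z_canvas, z_inset = joint_optimize(CANVAS_W, INSET_W, [1.0, -1.0], [0.5, 0.5], SEAM)
```

In a real setting the generators would be pretrained GANs and the loss would compare image regions rather than single numbers, but the structure of the loop (one shared objective, two latent codes) stays the same.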

👉 Find out more here https://bit.ly/3tjNJuP

👉 Visit www.marvik.ai or reach out to [email protected] to learn more about our experience using GANs.


Weekend Getaway

A few days ago we had the chance to share some incredible moments during our team #getaway. We spent the whole weekend in a beautiful house, surrounded by nature and breathtaking landscapes 🍂 🌳 🌅.

There was room for everything: board games by the fireplace 🔥, spirited ping-pong competitions 🏓, and improvised guitar jams and sing-alongs 🎤 🎸. Part of the team also volunteered to cook and delighted us with a Uruguayan barbecue and mouth-watering arepas 🇻🇪.

Even more rewarding was having most of the Marvik team together, both from Uruguay 🇺🇾 and different parts of Argentina 🇦🇷. For some of them, it was their first time visiting 🇺🇾, and certainly the first time we met in person.

Our team keeps growing and growing, and this is just the beginning. 🚀 Will you risk missing our next getaway?

Make sure that doesn’t happen. 👉 Click here https://bit.ly/3yYYIh4 to see all our open positions, or drop us an email at [email protected] to find out more.

Microsoft IoT Insider Lab

📢 Some great news for the #artificialintelligence community in Latin America 📢 Microsoft has chosen #Uruguay 🇺🇾 to host its new #AI & #IoT Insider Lab, the first of its kind in the region and only the third outside the US 🇺🇸

💡 This is game-changing given the growing impact of AI & IoT on the way people, devices and data interact in all aspects of life. Moreover, it puts Uruguay on the path to becoming an “innovation hub” for the region, acting as a facilitator of #innovation and creativity to transform business realities.

🚀 The lab’s mission is to show startups, corporations and organizations across industries how to leverage AI and IoT technologies to solve related challenges, while providing guidance and recommendations from experts so they can achieve their full potential.

➡️ The lab will offer:

📌 Experience-based knowledge from experts: #electricalengineers, #cloudengineers, #datascientists, #programmanagers, #projectmanagers, and #softwareengineers.

📌 On-demand dedication from highly qualified #Microsoft collaborators.

📌 Project management, design, architecture, prototyping, and post-implementation customer and partner guidance.

👉 More on this initiative here https://bit.ly/3NPyNgk

👉 If you’re curious about how Microsoft’s AI & IoT Labs work, click here https://bit.ly/3NSyu46


New speech-to-speech translation model

Meta AI has recently released a new research paper on speech-to-speech translation (#S2ST) that does not rely on #textgeneration as an intermediate step 💡

This method enables faster inference and supports translation between unwritten languages (important, since more than 40% of the world’s languages have no writing system). Instead of the traditional approach of translating source speech into target speech spectrograms, the authors use discretized speech units obtained by clustering self-supervised speech representations.
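As a rough illustration of what "discretized speech units obtained by clustering" means, the sketch below runs a small k-means over toy 2-D vectors and maps each frame to the ID of its nearest centroid. Everything here is an assumption for illustration: the function names are hypothetical, the toy vectors stand in for real self-supervised speech features, and collapsing consecutive repeated units is an optional post-processing choice, not necessarily what the paper does.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, frame):
    """Index of the centroid closest to this frame (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: squared_distance(centroids[i], frame))

def kmeans(frames, k, iterations=20, seed=0):
    """Basic k-means: returns k centroids learned from the feature frames."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for frame in frames:
            clusters[nearest(centroids, frame)].append(frame)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

def to_units(frames, centroids, collapse=True):
    """Turn a sequence of frames into discrete unit IDs (cluster indices)."""
    units = [nearest(centroids, f) for f in frames]
    if collapse:
        # Optionally merge runs of the same unit into a single symbol.
        units = [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
    return units

# Toy "speech features": two well-separated groups of frames.
frames = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [0.0, 0.0]]
centroids = kmeans(frames, k=2)
units = to_units(frames, centroids)
```

The resulting unit sequence plays the role that text tokens would play in a cascaded system, which is why no written form of the language is needed.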

🟢 Main achievements:

📌 First system of its kind trained on real-world, open-source audio data for multiple language pairs

📌 Outperforms previous direct S2ST systems in terms of #runtime, #FLOPS, and #maxmemory

📌 Leverages pretraining with unlabeled speech data

👉 Click here to learn more https://bit.ly/3HEetvS

DIET Transformer

In our latest blog post, our #mlengineer Diego Sellanes discusses #DIET, Rasa’s latest transformer architecture, which handles both entity recognition and intent classification. He goes on to explain how it works, its different modules, and its main advantages compared to similar models.

“RASA’s DIET transformer has a very powerful architecture. It proposes a new way of understanding state-of-the-art transformers, with a clever loss function which sums up every aspect of the model.”

👉 Visit our blog for the full story https://bit.ly/3zZ1rqY

👉 At Marvik, we have used Transformers to execute several #NLP projects. DM or reach out to [email protected] if you are curious about how you could apply them to enhance your #NLPmodels.

Example of response from the model


🚀 YOLOv6 is finally out 🚀

#YOLOv6 is a single-stage object detection framework dedicated to industrial applications, with a hardware-friendly, efficient design and high performance.

🟢 Main takeaways:

📌 Efficient Decoupled Head with SIoU Loss

📌 Hardware-friendly Design for Backbone/Neck

📌 Detection accuracy and inference speed far exceed those of the previous #YOLOv5

📌 Released under the GNU General Public License v3.0

📌 Coming soon: more deployment options and quantization tools
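For readers less familiar with box-regression losses: SIoU belongs to the family of losses built on intersection over union (IoU). The sketch below computes only plain IoU for two axis-aligned boxes. It is not SIoU and not YOLOv6's code; the function name and the `(x1, y1, x2, y2)` box format are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Plain intersection over union for boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A predicted box and a ground-truth box that partially overlap.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A loss such as `1 - iou(pred, target)` penalizes poor overlap; variants like SIoU add further penalty terms on top of this basic quantity.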

👉 Check out the repo here https://bit.ly/3AaQHpy


Parti Model

📢 Google AI has recently launched the Pathways Autoregressive Text-to-Image model (#Parti), its second text-to-image generator model 📢

Parti uses an autoregressive model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge.

🟢 Highlights:

📌 Treats text-to-image generation as a sequence-to-sequence modeling problem (akin to machine translation) → allows it to benefit from advances in large language models.

📌 Shows consistent quality improvements by scaling its encoder-decoder up to 20B parameters.

📌 Achieves a state-of-the-art zero-shot #FID score.

📌 Complementary to #Imagen (its predecessor), exploring two different families of generative models, autoregressive and diffusion → opens up exciting opportunities to combine both.

It’s exciting to witness all these breakthroughs in text-to-image generation 🚀

👉 Click here to learn more about Parti https://bit.ly/3I4lMxe

A teddy bear wearing a motorcycle helmet and cape is car surfing on a taxi cab in New York City