SUPERTONE

The Future of Content Creation:
GPUaaS Accelerates the AI Voice Revolution

Supertone is a startup that develops artificial intelligence (AI) voice synthesis technology and services. It was founded in March 2020 by Professor Kyogu Lee of Seoul National University’s Music and Audio Research Group, together with CTO Heo Hoon, a former Samsung Electronics engineer, and four other co-founders.

Supertone offers creative voice experiences through a range of AI technologies, including voice synthesis, voice conversion, and real-time voice transformation. Building on these capabilities, Supertone is opening up new possibilities in content creation across music, film, gaming, and animation.

Industry: Entertainment

Since 2020, AI technology has made remarkable advances in speech recognition and synthesis, reaching a stage where it can precisely replicate and creatively expand upon the human voice. Amid this wave of innovation, AI voice company Supertone has adopted Samsung Cloud Platform (SCP) GPUaaS to speed up its technological advancement, and is leading a new era, the “AI Voice Renaissance”, where technology and art converge. Leveraging high-performance GPU infrastructure, Supertone has shortened its service development cycle, improving the quality and reliability of its AI voice synthesis technology while bringing services to market faster.

Accelerating Product Launch While Enhancing User Experience

  • With Samsung Cloud Platform (SCP) GPUaaS infrastructure equipped with the latest H100 GPUs, Supertone has established a highly reliable MLOps platform, dramatically accelerating both AI model training and the voice synthesis development cycle.
  • Leveraging SCP’s multi-node GPU cluster, Supertone has overcome sentence length limitations and achieved more natural speech expression, enhancing the overall user experience.

Reliable Multi-Node GPUaaS: The Key to Rapid Time-to-Market

The pace of advancement in AI voice technology is astonishingly fast, but critical challenges must be addressed along the way. In particular, securing robust computing infrastructure is a decisive factor in the growth of AI startups, as it enables them to train larger models on more data and operate them reliably.

Supertone was no exception. In its early days, the company purchased and hosted its own GPU servers for research purposes to develop AI models. However, as time went on, it faced several limitations, including aging hardware, high maintenance costs, a lack of personnel specialized in infrastructure management, and data loss caused by server failures.

To address these challenges, Supertone adopted SCP’s multi-node GPUaaS to secure both the scalability and the reliability of its infrastructure. This shift has allowed the company to focus exclusively on AI model development without the burden of infrastructure management.

“Our on-premises GPU servers experienced increasingly frequent hardware failures over time, leading to more server downtime and lengthy recovery periods. In contrast, with SCP GPUaaS, even during a rare GPU downtime in over six months of use, the issue was resolved perfectly in less than a day. Hardware issues of GPU equipment can occur anywhere. Whenever a problem arose, SCP demonstrated a significant difference by responding quickly and proactively. This prompt response and problem-solving capability were the most notable improvements in system reliability and availability that we experienced after adopting the GPUaaS.”
– Ilji Choi, MLOps Engineer, MLE Team, Supertone

3x Faster TTS Training with High-Performance GPU Computing Power

Since adopting a product-centric growth strategy in 2023, Supertone has placed great importance on text-to-speech (TTS) products.

In the past, in an on-premises environment, training the NANSY model, a foundational model for various voice synthesis tasks, took two months. This was followed by six weeks for TTS training based on that foundation, resulting in a total of three and a half months to train the entire model. However, after adopting the SCP infrastructure, training time under the same settings was reduced by a factor of three, and further optimization of the model architecture and training methods enabled the TTS model to be trained in just four days.
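The figures in the paragraph above can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: it counts one month as four weeks (so that "two months" plus "six weeks" matches "three and a half months"), and it takes the six-week on-premises TTS stage as the baseline for both the threefold reduction and the four-day result. The overall speedup it derives is not a number stated in the source.

```python
# Durations quoted in the text, converted to days.
# Assumption (not from the source): one month is counted as 4 weeks.
DAYS_PER_WEEK = 7
WEEKS_PER_MONTH = 4


def weeks_to_days(weeks: float) -> float:
    """Convert a duration in weeks to days."""
    return weeks * DAYS_PER_WEEK


# On-premises baseline: foundation model (NANSY) followed by TTS training.
nansy_on_prem_days = weeks_to_days(2 * WEEKS_PER_MONTH)  # "two months"
tts_on_prem_days = weeks_to_days(6)                      # "six weeks"
total_on_prem_days = nansy_on_prem_days + tts_on_prem_days
total_on_prem_months = total_on_prem_days / (DAYS_PER_WEEK * WEEKS_PER_MONTH)

# On SCP with the same settings: "reduced by a factor of three".
# Assumption: the factor is applied to the TTS stage.
SAME_SETTINGS_SPEEDUP = 3
tts_same_settings_days = tts_on_prem_days / SAME_SETTINGS_SPEEDUP

# After further optimization: "just four days".
tts_optimized_days = 4
# Derived figure, not stated in the source.
tts_overall_speedup = tts_on_prem_days / tts_optimized_days

print(f"On-premises total: {total_on_prem_months} months")
print(f"TTS on SCP, same settings: {tts_same_settings_days} days")
print(f"TTS overall speedup after optimization: {tts_overall_speedup}x")
```

Under these assumptions, the hardware change alone accounts for roughly a threefold gain, and the rest of the reduction to four days comes from the architecture and training-method optimizations the text mentions.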

Thanks to this improvement in speed, Supertone was able to update model checkpoints much more frequently and rapidly enhance product quality. Previously, the company could only evaluate the model’s performance after training was fully completed. Now, Supertone can assess and adjust the AI model’s performance at intermediate stages, allowing the company to manage release schedules more strategically.
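The idea of assessing a model at intermediate stages, rather than only after training finishes, can be sketched as a training loop that evaluates a checkpoint every fixed number of steps. This is a minimal illustration, not Supertone's pipeline: `train_step`, `evaluate`, `Checkpoint`, and the simulated loss curve are all hypothetical stand-ins introduced here.

```python
import random
from dataclasses import dataclass


@dataclass
class Checkpoint:
    step: int
    val_loss: float


def train_step(step: int) -> None:
    """Hypothetical placeholder for one optimizer update."""


def evaluate(step: int, rng: random.Random) -> float:
    """Hypothetical validation: a simulated loss that falls as training proceeds."""
    return 1.0 / (1 + step / 100) + rng.uniform(0, 0.01)


def train_with_intermediate_eval(total_steps: int, eval_every: int, seed: int = 0):
    """Run training and record a checkpoint evaluation every `eval_every` steps."""
    rng = random.Random(seed)
    history = []
    for step in range(1, total_steps + 1):
        train_step(step)
        if step % eval_every == 0:
            # Intermediate assessment: quality is known before training ends.
            history.append(Checkpoint(step, evaluate(step, rng)))
    return history


history = train_with_intermediate_eval(total_steps=1000, eval_every=250)
best = min(history, key=lambda c: c.val_loss)
for ckpt in history:
    print(f"step {ckpt.step}: simulated validation loss {ckpt.val_loss:.4f}")
print(f"best checkpoint so far: step {best.step}")
```

With shorter training runs, each of these intermediate evaluations arrives sooner, which is what makes it practical to adjust the model or the release plan mid-run.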

On-Premises System vs. SCP Infrastructure

  • On-Premises System - Training record loss due to an unreliable system; long training period
  • SCP Infrastructure - Stable long-term model training with high reliability; short training period

AI Voice Emotional Expressiveness Amplified by Multi-Node GPU Clusters

The core of AI voice technology lies in its ability to express emotions as naturally as a human, understand context, and smoothly deliver sentences in various languages and of various lengths.

To achieve these goals, Supertone has actively leveraged multi-node GPU clusters. Previously, AI voice models trained in traditional on-premises environments were limited to processing sentences of up to 200 characters at a time. Now, that limit has been extended to 300 characters, and depending on the configuration, the models can handle even longer and more complex sentences with ease. This advancement has addressed the length limitation issue that was particularly inconvenient for English-speaking users, contributing to the expansion of the service. As a result, Supertone has grown into a beloved global service, now used in over 150 countries.

AI Voice Emotional Expressiveness Enhancement

  • Natural-Sounding Speech - Enable AI to handle longer sentences more effectively
  • Increased Immersion in Storytelling - Help AI produce more creative results
  • Enhanced Context Understanding - Help AI understand deeper context
  • Advanced Emotional Expressiveness - Deliver richer emotions to users

“The advancement of AI voice technology is not merely a technical breakthrough. It is about providing creators with an environment where they can more easily express their creative intentions. Through Samsung Cloud Platform GPUaaS, we are now unlocking new possibilities in the field of AI voice technology.” – Heo Hoon, CTO of Supertone

Technical Support as a Competitive Edge: A Trusted Partnership for AI Companies

AI infrastructure should offer more than just high-performance computing resources. When dependable technical support is in place to quickly respond to and resolve unexpected issues, companies can operate their services reliably.

Supertone was deeply impressed with the technical support provided while using SCP GPUaaS. When unexpected errors occurred, Samsung SDS responded with persistent analysis and professional technical support, enabling the company to resolve the issues quickly. Through this experience, Supertone realized the value of having a true technical support partner, rather than just a service provider.

A Trusted Partner for Reliable Operations

  • Quick Issue Resolution - Prompt response to issues
  • Professional Expertise - Technical capabilities and expertise of SCP
  • Responsible Support - Trusted and accountable technical support

“The GPUs on Samsung Cloud Platform (SCP) are like a furnace: incredibly robust, unbreakable, and always a presence you can rely on.” – Heo Hoon, CTO of Supertone

Supertone is …

Supertone is developing audio watermarking technology that inserts identifiable messages into audio files to prevent the misuse of AI technology and enhance trust. This will help protect voices from being used without their owners' consent. In addition, Supertone plans to further advance its AI voice technology and develop new services leveraging large language models (LLMs) and multimodal AI technologies. Achieving these goals will require even more powerful GPU infrastructure, and SCP GPUaaS will be a crucial partner in reliably realizing these ambitions.

As AI voice technology becomes more sophisticated and natural, it will remove the boundaries of content creation. Look forward to the future of AI voice technology envisioned by Supertone.
