
Building Bridges with Machine Translation: Memory Chips Help Erase Language Barriers

September 27, 2022

Language is one of our most common means of communication, and its media and carriers are not limited to voice and text. Saved as data in devices such as computers, language can serve information exchange and communication just as well after processes like copying, transmission, and translation. According to surveys, over 7,100 languages are still in use around the world1. Language barriers have undeniably caused numerous challenges in the era of globalization, and translation is now rising to the occasion, building bridges among countries and cultures.

“A World without Barriers”, the theme of the 2022 International Translation Day (September 30), has long been a hopeful vision for the future. For years, technologies like AI (Artificial Intelligence) have been helping us reach for that dream. With the rise of AI translation, a world without language barriers seems closer than ever before.

 

The Rising Demand for Translation

Under the wave of globalization, countries are now deeply connected in fields such as the economy, trade, and culture. As effective communication and mutual understanding are fundamental to international collaboration, breaking language barriers to reach consensus accurately has become an urgent issue. In fact, a study shows that beyond other cultural factors such as value differences or stereotypes, speaking different languages remains the primary challenge for effective cross-cultural communication2. As a result, the global language market is constantly expanding and is expected to reach 57.7 billion US dollars this year3.

While English, widely used and learned around the world, has established itself as a world language, about 40% of the world’s languages are on the verge of extinction. As language diversity is indispensable to cultural diversity, we need to hear the voices of endangered languages in order to prevent the loss of minority cultures. Translation has thus become a key means of reaching these cultures.

So far, translation services can be roughly categorized into two types: human translation and machine translation. In the pursuit of “faithfulness, expressiveness and elegance”, human translation can usually offer higher-quality results. However, closer global communication has brought exponential growth in translation workloads, making it difficult for human translators to fully take on the challenge. The massive volume of data to be translated also contains a great deal of redundant, miscellaneous information, for which human translation can be quite wasteful given its high labor cost. In addition, human translators’ own interpretations and styles may affect the translation outcome. Therefore, since its appearance in the mid-20th century, machine translation has gradually become a common tool for everyday translation needs.

Figure 1. Global Machine Translation Market Size

 

Development of Machine Translation

The development of machine translation can be divided into three stages: Rule-based Machine Translation (RBMT), Statistical Machine Translation (SMT), and Neural Machine Translation (NMT).

Figure 2. Development of Machine Translation

 

Among them, neural machine translation adopts an end-to-end encoder-decoder structure: without applying preset translation rules, splitting sentences, or translating word-by-word, it directly decodes the source text, processing the input and output globally. Decades of neural network development have laid the foundation for neural machine translation’s extremely rapid growth.
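The encoder-decoder data flow above can be sketched in a few lines of code. This is only a toy illustration of the idea, with a hypothetical two-word vocabulary and hand-set vectors in place of learned embeddings and neural network layers; a real NMT system learns all of these from large parallel corpora.

```python
# Toy sketch of the encoder-decoder flow: the encoder folds the whole
# source sentence into one context vector, and the decoder generates
# target tokens from that global context. Vocabulary and vectors are
# hypothetical stand-ins for what a real neural network would learn.

SRC_EMB = {"hola": [1.0, 0.0], "mundo": [0.0, 1.0]}   # toy source "embeddings"
TGT_EMB = {"hello": [1.0, 0.0], "world": [0.0, 1.0]}  # toy target "embeddings"

def encode(tokens):
    """Encoder: compress the entire source sentence into a single
    context vector (a stand-in for the encoder's hidden states)."""
    ctx = [0.0, 0.0]
    for t in tokens:
        ctx = [c + e for c, e in zip(ctx, SRC_EMB[t])]
    return ctx

def decode(ctx):
    """Decoder: emit target tokens one by one from the global context,
    stopping when it is used up (a stand-in for generating until an
    end-of-sentence token)."""
    out = []
    while any(c > 0.5 for c in ctx):
        best = max(TGT_EMB, key=lambda w: sum(a * b for a, b in zip(ctx, TGT_EMB[w])))
        out.append(best)
        ctx = [c - e for c, e in zip(ctx, TGT_EMB[best])]
    return out

print(decode(encode(["hola", "mundo"])))  # ['hello', 'world']
```

Note that the decoder never maps source words to target words one-to-one; it only ever sees the pooled context, which is the key difference from rule-based word-for-word translation.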

Figure 3. NMT Optimizes the Translation Process

 

From 2015 to 2016, Baidu and Google successively released their own self-developed online NMT systems using the most advanced training technology of the time, kickstarting the era of NMT. Since then, many other major Internet and ICT companies have also ventured into the NMT arena, each integrating its own system with its corporate vision or products and services.

For example, the “No Language Left Behind (NLLB)” project by Meta aims to help billions of people around the world translate among more than 200 languages with high quality. In July this year, the company announced a plan to build an open-source AI model for NLLB with more than 50 billion parameters, trained on an AI supercomputer and estimated to perform over 25 billion translations per day4. NVIDIA’s Maxine Software Development Kit (SDK), meanwhile, is designed to provide better real-time communication experiences, offering high-quality real-time audio translation through an AI-driven SDK. In addition, its augmented reality (AR) SDK introduces interactive features such as face tracking and eye contact to bring smoother, more intuitive communication to video calls5.

 

Memory Upgrades for the Diversity & Prosperity of All Languages

The evolution of machine translation raises both higher requirements and stronger motivation for the development of computing technology. Since the rise of SMT, the construction and continuous expansion of text corpora has posed another challenge for data storage.

With the development of ICT, the amount of data generated by everyday communications has become immense: according to IBM’s statistics, 2.5 exabytes of data are generated every day6. This amount will very likely increase further with the progress of civilization and the diversification of technology applications. To ensure effective cross-cultural communication and protect cultural diversity, information and communication technologies must provide powerful, timely support. Semiconductor chip characteristics such as data transfer speed and read/write speed are thus becoming essential considerations.

Figure 4. World’s first 238-layer 512Gb TLC 4D NAND developed by SK hynix

 

In the era of AI and big data, fast data access has become a primary technological requirement to maximize the use of massive data corpora for more effective machine translation. The development of flash memory technologies therefore plays an important role. This August, SK hynix announced the successful development of the world’s first 238-layer 512Gb TLC 4D NAND, which is expected to begin mass production in the first half of 2023. The data-transfer speed of the 238-layer product is 2.4Gb per second, a 50% increase from the previous generation. This 4D product has a smaller cell area per unit than its 3D counterpart, leading to higher production efficiency. It will be supplied for high-capacity server SSDs in the future, potentially supporting the operation of massive text corpora.

Meanwhile, for NMT, deep learning is indispensable. As a highly complex form of machine learning, deep learning aims to give machines learning and analytical capabilities like those of humans. By running the training data through computing devices, it eventually arrives at an optimal result via billions of neural network operations and adjustments.
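The “operations and adjustments” loop at the heart of this process can be shown at its smallest scale. The sketch below trains a single hypothetical parameter by repeatedly nudging it to reduce prediction error; real NMT models adjust billions of parameters with the same basic mechanism.

```python
# Minimal sketch of the adjustment loop behind deep learning:
# repeatedly nudge a parameter to reduce prediction error.
# Toy data follows the rule y = 2x, so w should converge to ~2.0.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy (input, target) pairs
w = 0.0    # single learnable parameter (real models have billions)
lr = 0.05  # learning rate: how big each adjustment step is

for _ in range(200):                  # training iterations
    for x, y in data:
        pred = w * x                  # forward pass: make a prediction
        grad = 2 * (pred - y) * x     # gradient of the squared error
        w -= lr * grad                # adjust the parameter downhill
print(round(w, 3))  # -> 2.0
```

Scaled up, the same loop over billions of parameters and sentence pairs is what makes the quantity and quality of training data so decisive for NMT output quality.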

To continuously optimize learning outcomes and generate results that better fit the context, researchers need to increase the amount of training data for each language pair’s model. Factors such as the complexity of the language model and learning algorithm used, as well as their error tolerance, further determine the total amount of data required for deep learning7. The quantity and quality of data ultimately affect the results of the algorithm model and are therefore crucial to the output quality of NMT.

 

Figure 5. SK hynix DDR5 DRAM CXL™ Memory

Figure 6. SK hynix HBM3 DRAM

 

Therefore, NMT needs the support of large-scale floating-point operations to improve model performance such as inference speed, making improved computing power an essential technological support8. To address such processing demands, SK hynix recently developed HBM3 DRAM and its first CXL memory, which open up more advantages and possibilities for AI and deep learning. The expandable DDR5 DRAM-based CXL memory allows flexible memory expansion, in contrast to current server platforms, where memory capacity and performance are fixed once the platform is adopted. With a total bandwidth of 360-480GB/s and a total capacity of 1.15TB, the CXL memory product appeals strongly to fields that require high-performance computing. Moreover, SK hynix’s HBM3, which adopts a 16-channel architecture running at 6.4Gbps and can process up to 819GB of data per second, is now in mass production, supporting significant improvements in computing performance. With upgraded computing power, more languages can be converted into data, achieving better accuracy in translation and communication and helping create a future world without language barriers.
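The HBM3 bandwidth figure follows directly from its interface geometry. As a back-of-the-envelope check, assuming the standard HBM3 layout of 16 channels of 64 data pins each (a 1,024-bit interface) at 6.4Gbps per pin:

```python
# Back-of-the-envelope check of the ~819GB/s HBM3 figure, assuming the
# standard HBM3 interface: 16 channels x 64 data pins = 1,024 pins,
# each transferring 6.4 gigabits per second.
pins = 16 * 64                        # 1,024 data pins in total
gbps_per_pin = 6.4                    # per-pin transfer rate
total_GBps = pins * gbps_per_pin / 8  # divide by 8: bits -> bytes
print(total_GBps)                     # -> 819.2
```

The result, 819.2GB/s, matches the quoted processing rate of up to 819GB per second.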

The constant breakthrough in technology is pushing our imagination of the future beyond past frameworks and limitations. What was once “impossible” gradually becomes the reality today. With globalization connecting countries and communities around the world, the developments of different societies are no longer isolated by geographic or language barriers. Every leap in the evolution of technology is helping create a world where our vision for the greater common good for all mankind can be eventually achieved.

While embracing challenges, constantly innovating, and striving to offer better products and technologies, SK hynix keeps a close eye on the global community and hopes to make this world a better place with its expertise. Behind the world’s grand narrative, tiny semiconductor chips hidden in our devices are sparing no effort to create infinite possibilities for our future society.

 

 

1. https://blog.busuu.com/most-spoken-languages-in-the-world/
2. https://www.researchgate.net/figure/Barriers-for-effective-cross-cultural-communication-and-interaction-Source-made-by-the_fig2_338242321
3. https://www.researchgate.net/figure/Barriers-for-effective-cross-cultural-communication-and-interaction-Source-made-by-the_fig2_338242321
4. https://ai.facebook.com/research/no-language-left-behind/
5. https://developer.nvidia.com/maxine
6. https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/?sh=788642c860ba
7. https://postindustria.com/how-much-data-is-required-for-machine-learning/
8. https://www.sciencedirect.com/science/article/pii/S2666651020300024