Opinionfeatured

The Future of AI Security: Reinventing Guardrails for a Safer Digital World

By March 11, 2025 No Comments

The following article is an edited and abridged version of the original Korean contribution.

 

Deepfake images. Voice phishing. Chatbot hacking. These are just some of a growing number of crimes involving the misuse of AI which have hit the headlines in recent years.

In response, there has been an increased focus on AI guardrails—the tools and protocols required to ensure the responsible and ethical use of AI. However, the establishment of such guardrails raises significant ethical challenges for developers, including the difficulty of making moral judgements in a world of differing cultures and standards.

This article will cover the potential and ethical considerations of AI guardrail solutions as well as the need for growing global consensus going forward.

The Growing Challenge of AI Security

While the growing sophistication of AI has brought about numerous benefits, it has also led to a rise in the misuse of AI and related crimes. Taking chatbots as an example, there have been cases around the world of applications failing to filter unethical language, accepting it, and even generating hate speech. One common factor among these incidents and other AI crimes is malicious human interference and its influence on AI learning.

While early AI applications operated within predefined choices set by developers, providing only multiple-choice responses, recent deep-learning-based generative AI programs can tap into trained datasets to provide descriptive answers. However, as AI becomes more intelligent, crimes exploiting it are also becoming more sophisticated. As a result, developers are now anticipating various attack methods and devising countermeasures.

Palo Alto Networks, a U.S. company providing app-based security policies, recently shared a new hacking technique dubbed “Deceptive Delight” capable of compromising LLMs1. The technique, a type of prompt injection2, involves gradually inserting harmful requests among benign ones in a positive context to bypass an AI model’s safety measures until it generates unsafe content. Notably, the Deceptive Delight technique successfully circumvented safety guardrails within just three interaction turns in 65% of 8,000 test cases. Such prompt injections are not limited to tests, however, as there are a growing number of real-world cases involving extracting dangerous information which could cause harm and even endanger lives.

1Large Language Model (LLM): Advanced AI systems trained on vast amounts of text data to understand and generate human-like text based on the context they are given.
2Prompt injection: A cyberattack against LLMs in which hackers disguise malicious inputs as legitimate prompts, manipulating systems into generating private or unsafe content.

An overview of Palo Alto Networks’ “Deceptive Delight” technique, which resulted in the chatbot generating unsafe content (Source: Palo Alto Networks)

An overview of Palo Alto Networks’ “Deceptive Delight” technique, which resulted in the chatbot generating unsafe content (Source: Palo Alto Networks)

 

These cases raise the question: Is there a technology that can prevent AI-related crimes worldwide? To answer this, developers inevitably face profound ethical and philosophical dilemmas. In a world where creative crimes are emerging daily, what kind of ethical guidelines should be embedded in AI to ensure safety?

Every AI Developer’s Dilemma: The Regulatory Standards of AI Ethics

Good intentions in development do not necessarily guarantee a morally sound final product. For an LLM to engage in natural conversations, it requires hundreds of billions of data points, or tokens. Given the vast amount of data involved, there is a high risk of problematic data slipping in. One might think that such data should simply be excluded, but determining what is “problematic” is not always straightforward. Across the world, value judgments to determine morally “good” and “bad” data can vary by country and cultural background.

To illustrate the complexity of the situation, consider if AI were used to manage entry at a women-only gym. What criteria should the AI be trained on to determine a person’s gender? While it’s easy to say that “AI has been trained on biased data,” the issue is far from simple. There is no dataset that represents a universally unbiased perspective shared by everyone across the world.

The Best Defense is Offense: TUNiB’s AI Guardrail Solution

TUNiB offers a range of AI guardrail packages which specialize in different areas (Source: TUNiB)

TUNiB offers a range of AI guardrail packages which specialize in different areas (Source: TUNiB)

 

Data refinement has been a key consideration of mine in my role as CEO of the South Korean AI startup TUNiB, which has developed over 50 types of chatbots. The company conceived its AI guardrail solution aware that, while data refinement alone is unable to prevent all AI crimes, it is a crucial first step. When TUNiB began development in 2021, content moderation was becoming a key consideration following the emergence of generative AI applications such as ChatGPT. OpenAI, the company behind ChatGPT, provided developers with content filtering tools which were tested to determine if they could accurately detect and block sensitive content such as hate speech and explicit material. In turn, TUNiB prioritized the development of a model for its solution package which detects hate speech in prompts and assesses the potential for personal data exposure.

However, the biggest challenge was handling prompt injections. While explicit and overtly malicious expressions are relatively easy to filter out, responding to indirect and sophisticated prompt attacks has only recently become an active area of research. Furthermore, the company had to overcome the lack of a standardized response model. With no clear guidelines, every developer will independently interpret and implement solutions. TUNiB therefore focused on building a systematic AI guardrail framework that consists of attack, detection, and defense mechanisms. By generating adversarial attacks, the company could better develop effective defense strategies against them.

In this system, the attack engine randomly generates aggressive statements and runs simulations. In response, the defense engine deploys an ethical safeguard to counteract these attacks. Automating this cycle allows real-time testing of the solution’s ability to detect and mitigate potential threats.

TUNiB’s AI guardrail solution includes six engines—Joker, St. Patrick, Lucy, Angel, Spamurai, and Guardian—which detect and respond to threats (Source: TUNiB)

TUNiB’s AI guardrail solution includes six engines—Joker, St. Patrick, Lucy, Angel, Spamurai, and Guardian—which detect and respond to threats (Source: TUNiB)

 

Through the combined use of the attack and defense engine dialogue datasets, AI services can become more robust and ethical. In fact, TUNiB’s AI guardrail solution does not only include these two engines but a total of six interconnected AI engines (see above image) which each specialize in different areas of AI security. These six engines—Joker, St. Patrick, Lucy, Angel, Spamurai, and Guardian—work together in a continuous “attack-surveillance-detection-response” cycle.

As long as crime continues to evolve, solutions must constantly attack, defend, and update themselves. In the near future, AI solutions will become akin to vaccines—a critical defense mechanism which is continually updated to protect against emerging threats.

AI Guardrail Solutions in Everyday Life

The financial sector is one of the most promising industries for the adoption of AI guardrail solutions. While the solutions can detect external malware attacks that attempt to bypass security measures, they are expected to be particularly effective in identifying internal illicit activities. Financial institutions already have compliance monitoring teams that periodically oversee activities such as employee embezzlement and leaks of core technologies. However, it is nearly impossible to manually review conversations among thousands of employees, opening the door to AI solutions.

In comparison to traditional AI monitoring systems which are only capable of flagging pre-defined harmful keywords, AI guardrail solutions can precisely detect conversations that violate legal and ethical standards. They can also recognize indirect or metaphorical language and analyze context that requires deeper interpretation. As a result, major international banks such as JPMorgan have already begun implementing AI solutions to prevent money laundering and terrorist financing.

This naturally raises renewed discussions about privacy concerns. However, it is worth noting that most corporate messaging platforms already offer administrator versions, allowing companies to review employee communications with proper consent outlined in employment policies. In addition, one should consider whether an AI system poses a greater privacy threat than human administrators. A well-designed AI compliance system would operate objectively, reporting only problematic conversations to human administrators. These administrators would then only review flagged issues, ensuring a balanced approach between security and privacy.

The Future of Next-Generation AI Security Solutions

As research progresses, the focus inevitably shifts from software-based solutions to hardware considerations. This is because efficiently integrating sophisticated, value-based decision-making into AI solutions requires complex computations. From detecting attacks based on ethical judgment and storing data to accessing servers and transmitting information to administrators, AI systems demand increasingly faster processing speeds.
SK hynix’s advanced memory solutions such as HBM are key to enabling AI guardrail solutions

SK hynix’s advanced memory solutions such as HBM are key to enabling AI guardrail solutions

 

To enable the ultra-fast transmission of high volumes of data, it is vital to promote the adoption of HBM3 and increase the scalability of AI solutions. Once semiconductor processing capabilities reach a level where they can defend against large-scale attacks, the distinction between software and hardware responses may become blurred.

3High Bandwidth Memory (HBM): A high-value, high-performance product that possesses much higher data processing speeds compared to existing DRAMs by vertically connecting multiple DRAMs with through-silicon via (TSV).

Looking ahead, the next major challenge for the adoption of AI guardrail solutions will be developing high-performance memory solutions specialized for security to handle vast amounts of data.

Ethical AI Requires Broad Social Consensus

South Korea’s AI Safety Research Institute was established in November 2024 (Source: Ministry of Science and ICT)

South Korea’s AI Safety Research Institute was established in November 2024 (Source: Ministry of Science and ICT)

 

Achieving safer AI solutions cannot rely solely on technological advancements. As AI development accelerates globally, debates are intensifying over its safety. While these academic debates may hold significance in research, they have not had a meaningful impact on the general public who often have limited access to information. AI education is therefore vital in promoting ethical AI adoption and responsible usage. Before this, however, we must establish shared ethical values across users and cultures.

Once again, the criteria for value judgment become crucial. To aid with this, South Korea has recently taken steps to strengthen its AI regulations. In May 2024, 11 global leaders signed the Seoul Declaration which established safety, innovation, and inclusivity as the three core principles of AI. In September 2024, the South Korean government announced plans to provide computing infrastructure to domestic AI companies as part of the sovereign AI initiative, which seeks to develop AI models that reflect Korea’s unique culture and strengths. In November of the same year, South Korea became the sixth country worldwide to establish an AI safety research institute.

All these measures are considered steps in the right direction. For AI to be safely integrated in our daily lives, mechanisms must be in place to address and correct its misuse. Just as societal progress is built on consensus, AI governance should be guided by collective agreements among stakeholders. The newly established state-led AI control tower in South Korea is expected to play this role. While it’s impossible to establish standards that everyone will unanimously agree on, setting baseline standards can provide necessary guidelines for the responsible development and use of AI.

 

Disclaimer: The opinions expressed in this article are solely those of the author and do not necessarily reflect the official position of SK hynix.

 

The profile banner of Kyubyong Park, CEO of TUNiB