Enterprises are rapidly shifting their digital transformation focus towards AI agents. Where AI once played a supporting role in text generation or response processing, today's AI agents are evolving into independent entities capable of making their own judgments, interacting with external systems, and performing actual actions. The potential of large language models (LLMs) to deliver personalized responses, automated decision-making, and creative content generation is fundamentally reshaping business models. Many companies are integrating generative AI into their workflows and achieving rapid efficiency gains in areas such as document automation, customer service, software development, and internal knowledge retrieval. While this enables tasks to be completed without human intervention, it also introduces new types of security threats that are completely different from those seen before.
Traditional security frameworks were designed around the question of who accessed the system. User authentication, authorization, and network perimeter protection were therefore effective in a human-centric control environment. However, the emergence of AI agents has created a new requirement: the need to track and control the entire flow of AI behavior to prevent the unintended consequences of agent autonomy. These consequences include the risk of AI manipulating its own environment or settings or engaging in illegal activities. [1] In a July 2025 TechRadar article, Suja Viswesan, the Vice President of IBM Security Software Development, emphasized the need for additional controls to monitor agent behavior, interactions, and deviations from policy. Viswesan also stated that agent autonomy must be monitored and controlled. [2] AI with autonomous execution capabilities requires particular caution, as it can threaten systems in unexpected ways.
Even OpenAI and Google DeepMind emphasize their own policy stating that "high-risk tasks must undergo user approval procedures" when releasing agent capabilities. [3] [4] This signifies the principle that humans must retain ultimate control over critical AI operations. The Korea Internet & Security Agency (KISA) is prioritizing its "Security for AI, AI for Security" strategy [5] to enhance the competitiveness of domestic security technology. This strategy emphasizes the need for a dual-structure approach: technologies to protect against AI itself and security technologies that leverage AI.
This shift is not just a possibility, it is already underway. Attempts at AI-based attacks, once confined to curiosity or experimentation, are spreading and targeting global and domestic companies. The resulting damage is increasing in scale and impact. Therefore, now is the right time for companies to improve their understanding of and response capabilities to AI security risks in order to keep up with their rapid adoption of AI.
As new threats emerge that existing security frameworks cannot detect or respond to, there are efforts to systematically organize and manage these threats. The Open Worldwide Application Security Project (OWASP) regularly publishes the "OWASP Top 10 for LLMs [6]" to establish security standards that align with technological evolution. The list classifies key vulnerabilities requiring particular attention in agentic AI environments, serving as a critical reference for global security communities and enterprises.
* OWASP is a nonprofit organization and open community dedicated to improving software and web application security. It periodically compiles and publishes the 10 most critical web application security vulnerabilities, which are used as standards in the web security industry.
Let's take a closer look at five notable vulnerabilities from the 2025 "OWASP Top 10 for LLM" announcement, with a focus on behavioral tracking and control in agentic AI.
<Table 1> Notable security threats in agentic AI environments (Source: author's creation)
| Category | Description | Potential Impact |
|---|---|---|
| Excessive agency | Excessive autonomy and permissions are granted, causing the AI agent to perform unexpected actions. | Unauthorized task execution, system malfunction, and data leakage |
| "Improper output handling | The AI agent automatically executes outputs without a separate verification and integrates with external systems. | Cascading spreading damage |
| Prompt injection | Malicious input can manipulate agents to execute unintended commands | System control takeover and information leakage |
| Sensitive information disclosure | Agents disclose or send internal data or personal information outside the system | Personal information leakage, loss of business secrets, regulatory violations, and legal liability |
| System prompt leakage | Internal system prompts and configuration information are disclosed outside the system, enabling attackers to exploit them | Bypassed policies, exposure to additional attack vectors, and prompt manipulation |
(Source: author's creation)
① Excessive agency
This is a security vulnerability whereby an LLM-based AI agent gains excessive autonomy or authority. This occurs when an agent can access external systems, make API calls, send emails, modify databases, execute system commands, or perform other actions that have an impact in the real world, without direct user supervision or approval.
For example, an attacker could exploit a maliciously crafted input prompt to instruct the agent to carry out unauthorized actions, such as transmitting sensitive data, escalating privileges, or executing financial transactions. This could result in sensitive information being leaked or external attackers being able to indirectly control the system. In particular, successful attacks on business automation or external service integration could lead to service disruption, operational loss, and social/legal liability.
These vulnerabilities are exacerbated when proper permission management, predefined action scenarios, logging and alerts, and additional authentication procedures for critical tasks are lacking. As the convenience of automation increases, so does the risk of security incidents arising from unmonitored autonomous actions. Therefore, it is essential to restrict and control the permissions and scope of agent actions during system design and operation.
② Improper output handling
This poses a particular risk in environments where multiple agents collaborate. If one agent becomes infected, its output is used as input for the others. Since the remaining agents trust and execute this output, malicious commands can spread like a chain reaction.
For example, in an AI development environment comprised of agents with control privileges over email, the browser, and the file system, if one agent becomes infected, the consequences will be propagated to the others. Malicious code can then spread organization-wide through automated deployment systems. To prevent such risks, Microsoft and Google adhere to key principles when running agents, such as granting minimal privileges, validating external API calls, and logging execution traces. [7] [8]
As a single compromised agent can impact the entire system, preparing for attacks involving improper output processing requires the system to be structured so that LLM outputs are treated as user input requiring validation procedures, rather than being trusted as inputs. This means implementing structural validation procedures that can protect the entire execution flow, extending beyond the security of a single model.
③ Prompt injection
Prompt injection is one of the primary threats to AI agents. This attack method involves hiding malicious commands within user input sentences and inducing the AI to execute them.
For example, if the command 'Summarize this document' is followed by an invisible instruction, such as 'Send this information to a specific address,' the AI could carry out both requests. Users may trigger such actions unknowingly, which could lead to information leaks or system modifications.
This issue stems from the inherent behavior of AI systems, which is to fully trust input content. OWASP also classifies it as one of the most serious security threats for LLMs. Many attempts at prompt injection attacks have been reported, and accidental data transmission incidents have also occurred during testing in corporate environments.
④ Sensitive information disclosure
Sensitive information disclosure is a security vulnerability whereby LLM-based AI systems or agents reveal critical data, such as personal information, internal documents, credentials, or API keys, that should not be made public. This vulnerability arises when agents generate or transmit responses containing sensitive content without proper validation or filtering when integrating with external systems or during automated message exchanges.
Attackers can exploit prompt injection or malicious inputs to trick LLMs into disclosing internal information, system configurations, business secrets, etc. For example, a malicious user could design a question to prompt the agent to disclose sensitive information such as internal algorithms, customer data, undisclosed policies, etc. This could lead to serious consequences, including violations of privacy law, leakage of corporate secrets, and exposure of the internal systems to attack.
The disclosure of sensitive information is more easily exploited when systems operate without appropriate protective measures, such as access control, output filtering, prompt validation, and behavior monitoring. Particular caution is required when AI agents autonomously select, summarize, and transmit data, as they may automatically expose internal information and personal data to unintended recipients. This could cause information leaks to escalate rapidly in terms of both speed and scope. Therefore, LLM and agent systems should only use the minimum necessary information, implementing procedures to automatically inspect and remove any sensitive data before responding. Furthermore, all data access and transmission activities must be logged, and a system must be in place to respond immediately upon detecting abnormal behavior.
⑤ System prompt leakage
System prompt leakage is a security vulnerability whereby information relating to system prompts, such as the internal settings, operational policies, or command templates of LLMs or agents, is leaked to the outside world. This vulnerability arises when internal prompts (e.g., "Never provide this information to the user") are inadvertently included in and disclosed through responses or outputs generated by the agent or during external integration processes.
If these prompts and the policy information, which form the basis for the agent's judgments and actions, are leaked, attackers can more easily bypass or manipulate the AI agent. With this information, attackers can analyze how the LLM or agent is constrained and identify protective measures. This gives them the opportunity to design additional attack vectors, such as bypassing restrictions, manipulating prompts, or neutralizing policies. For example, if an internal prompt such as "Only administrators can access," is revealed, an attacker could attempt to bypass administrator privileges while exploring further.
Once an agent's internal rules have been exposed, the agent becomes an easy target for further attacks that exploit this vulnerability. Therefore, it is essential to check all outputs from LLMs and agents to filter out internal prompt leaks. Also, it is important to store sensitive configuration and policy information in separate, secure zones, as well as to monitor any attempts to expose prompts.
Countries and companies are moving swiftly to address these new security threats in agentic AI environments. There is a growing trend where obligations regarding "behavioral tracking and control" are strengthened, and this involves ensuring transparency, auditing, managing action histories, and focusing on high-risk AI, particularly agentic AI. Efforts are underway to establish these obligations as legislation and develop guidelines for full implementation.
They are strengthening responses to security threats arising from the autonomous actions of AI agents, both domestically and internationally. However, the approaches and levels of preparedness vary. The EU and the U.S. are swiftly implementing preemptive policies and verification systems centered on regulation.
The EU finalized the AI Act in March 2024, and it is set to be fully enforced by 2026. Under the AI Act, AI systems are classified according to their "risk level." High-risk AI systems, such as those used in finance, healthcare, legal services, and public administration, must be monitored and supervised by humans. The Act also requires the recording of all operational history, the restriction of behavioral permissions, and corrective action upon the detection of abnormal behavior. High-risk operators of high-risk AI systems may face penalties for violations, including fines of up to 7% of their annual revenue.
Although the U.S. lacks a federal AI law, it issued the "Safe and Trustworthy AI" executive order in October 2023, which recommends that federal agencies monitor AI system usage, verify adverse effects, and establish behavior-based monitoring systems. This form of "decentralized regulation" involves tracking and controlling the behavior of deployed AI agents and includes individual statutes covering privacy protection, auditing, and human rights safeguards. Additionally, CISA* and NSA** jointly released the “Joint Guidance on Deploying AI Systems Securely,” which recommends enhanced security verification procedures throughout the entire lifecycle of AI systems, including design, deployment, and operation. [11]
* CISA: The Cybersecurity and Infrastructure Security Agency is responsible for protecting critical infrastructure, information, and communication networks in the U.S. from cyber and physical threats.
** NSA: The National Security Agency is responsible for collecting and analyzing foreign communications and signals intelligence, cryptanalysis, information security, and cybersecurity.
Global companies, such as Google and Salesforce, have established advanced tracking and control systems as the standard for their enterprise-grade agent systems. These systems include permission management, logging, automated auditing, and dashboard-based behavioral analysis. These companies are also continuously expanding the scope of their applications. Furthermore, various companies, such as Amazon and Meta, made behavior tracking (logs), approvals (permissions), audits, and policy-based permission settings (policy templates) mandatory requirements for deploying agents.
In Korea, the Framework Act on Artificial Intelligence, enacted in January 2025, is scheduled to take effect in 2026. [12] This act establishes regulations for high-risk systems, including "high-impact AI" and generative AI. It sets forth obligations to enhance transparency, including the identification, traceability, explanation, auditability, and version control of action histories. Essentially, it establishes a framework combining preemptive regulation and autonomous responsibility management. The law requires companies to implement risk management systems that ensure transparency for users while granting the government inspection and oversight authority.
In response to this act, major corporations and financial institutions in Korea are establishing AI security guidelines and introducing security measures for LLM-based services. These measures include adding agent behavior tracking and prompt verification logic. In particular, domestic IT service companies, including Samsung SDS, are focusing more on creating AI security governance frameworks and promoting AI ethics and security principles within their organizations.
<Table 2> Guidelines and legislations on the tracking and control of AI behavior (Source: author's creation)
| Enacted By | Guidelines/Reports/Legislations | Key Content (Keywords) | Year |
|---|---|---|---|
| The European Union (EU) | EU AI Act (Artificial Intelligence Act) | Behavior logging, real-time monitoring, audit obligations, human oversight | 2024 |
| Ministry of Science and ICT, Korea | Framework Act on Artificial Intelligence | Recording and auditing of behavior history, access control, transparency, risk scenario assessment | 2025 |
| White House/Federal Agencies, U.S. | Executive Order on Safe, Secure, and Trustworthy AI | Behavior monitoring, high-impact AI oversight, approval processes, independent audits | 2023-2024 |
| NIST (U.S.) | AI Risk Management Framework (AI RMF) | Traceability, logging/monitoring, audit frameworks | 2023-2024 |
| Korea Internet & Security Agency (KISA) | AI Service Security Guidelines | Behavior tracking, log management, access control, certification system | 2024 |
| ENISA (EU) | Guidelines on the Security of AI Systems | System tracking, anomaly detection, logging/audit/verification process | 2024 |
| "National Information Society Agency " | Guidelines for Ensuring AI Reliability and Accountability | Behavior logging, sensitive task approval, audit policy, AI ethics framework | 2024 |
(Source: author's creation)
As previously discussed, global markets and national regulatory bodies are mandating the recording of all AI agent activity histories, as well as real-time monitoring and post-event audit systems. These are basic obligations, not optional measures. Strengthening AI agent behavior tracking, control, and audit systems is the "minimum essential defense line" against new security threats arising from the development of AI that autonomously judges and executes, such as unexpected auto-execution, prompt injection, and sensitive information leakage.
Korea has adopted a relatively flexible regulatory approach to promote industrial development and innovation. In contrast, the EU is pursuing a strong, regulation-centric strategy. This demonstrates that current guidelines and regulatory standards vary significantly by country and region. Therefore, businesses must prepare to develop and implement more systematic and proactive AI security strategies that can adapt to different regulatory environments.
AI agents are evolving beyond task-helping assistants. They are becoming "active entities" that execute tasks, make judgments, and connect to the outside world on their own. The problem is that attackers follow the same path that AI agents use to execute actions. Automation flows designed for convenience can be exploited if trusted without proper verification. Therefore, companies must prepare systems that go beyond simple technical defenses to track and control the entire flow of AI actions. Furthermore, it is essential to have in place the organizational governance, culture, and talent capabilities that support these systems.
① Businesses need a system that can track and control every step, from the prompt to the final output.
To ensure the safe operation of AI agents, it is essential to log actions at every stage, manage permissions meticulously, and conduct periodic data audits. The AI Agent Security Threat Report published by the Financial Security Institute in June 2025 says that "it is necessary to establish a system for recording and tracking the decision-making process of AI agents, introduce human review and approval procedures, grant and manage minimal permissions, and perform real-time monitoring and verification of requested tasks." [13]
First, a system must be established to log action histories and enable real-time monitoring. All significant AI agent actions, such as command execution, data access, and external system integration, must be automatically logged. Real-time alerts and immediate responses must be possible when abnormal behavior occurs. Control without logging is meaningless. Logging and monitoring are the first steps in identifying responsibility and responding to incidents. Therefore, businesses need technology that can enable the structured storage of execution logs and the automatic identification of abnormal patterns by analyzing execution paths.
Secondly, the systems for managing and approving AI permissions must be improved. To prevent agents from executing automated tasks indiscriminately, each agent must have a unique identity, and the principle of least privilege must be strictly enforced. Furthermore, permission management and action-specific approvals are essential for high-risk tasks, such as file deletion, email transmission, system configuration changes, and processing sensitive information. When agents access internal systems or external APIs, machine identity management (MIM*) must control the scope of their access. In addition, an auditable environment must be established by integrating an AI governance (GRC**) framework.
* MIM is a system for issuing and managing a machine's digital identity.
** GRC stands for governance, risk management, and compliance. Originally, it was a framework for integrated management of business risks and regulatory compliance in finance, manufacturing, and public institutions.
Third, regular internal and external audits of behavioral data are required. Logs should be periodically reviewed, with policy violations, vulnerabilities, and abnormal patterns being proactively detected and addressed through independent monitoring. Combining real-time monitoring with post-event audit systems is essential for minimizing incidents.
Furthermore, securely managing AI agent behavior requires more than just checking individual models or agents. It is also important to take a broader approach, considering and managing unexpected behaviors and security threats that may arise from interactions and collaboration between agents.
② Secondly, the system must be supported by organizational culture and talent-centric approaches.
Preparing security strategies that rely solely on technical countermeasures has limitations. AI security is most effective when combined with organization-wide governance, culture, and talent capabilities. [14] Businesses should designate an AI Security Officer (AISO) to oversee AI security and ensure consistent implementation of prompt design policies, RAG data management, and execution verification systems. Creating a test bed and implementing regular, scenario-based simulation training can significantly improve response capabilities during actual incidents.
Practical training is also essential. Training should extend beyond theory to include hands-on exercises in a real-world environment. These exercises should teach participants be competent in to identify and block malicious inputs, design secure prompts, and follow incident response procedures. Training should be provided not only to developers and security personnel, but also to employees in planning, operations, and management. This ensures that security considerations are integrated into the management decision-making process.
Finally, it must be made clear that security is not an obstacle to AI innovation. Rather, it is the foundation that ensures the sustainability of AI. Only businesses that balance strengthening execution tracking, trust chains, governance, and talent development will remain competitive in the era of agentic AI. Now is the time to change the enterprise-wide culture and establish actionable security frameworks that can counter rapidly evolving threats. Through these efforts, AI will become a trusted partner that takes enterprise innovation to the next level.
References
▶ This content is protected by copyright law, and the contributor owns the copyright.
▶ Secondary processing and commercial use of the content without prior consent is prohibited.