Virtual Sentinels - Spectrum of AI Agents in Virtual Environments

Advancements in Artificial Intelligence (AI) and Virtual Environments (VEs) have given rise to a fascinating realm known as intelligent virtual agents. This area showcases the convergence of AI and Artificial Life (AL) with VEs, and has sparked a flurry of research activity focused on autonomous agent behavior in virtual realms. While multi-agent systems concentrate on distributed problem-solving applications, the spotlight here is on autonomous agents that operate independently in changing environments, using real-time information to make decisions.

In this article I delve into the world of intelligent virtual agents and explore their applications. I will look at the value of autonomy in virtual environments, the challenges of creating realistic physical agents with non-verbal communication capabilities, and the coupling of agents with their virtual worlds. I will also uncover the cognitive side of virtual agents, where AI-driven cognitive behavior and emotional models intertwine to create intelligent and relatable entities.

As we navigate the intersection of AI and VEs, we will unravel how the perception and action of virtual agents influence their interaction with the digital world. From handling complex physical behaviors to enriching non-verbal communication, the development of intelligent agents in virtual environments presents a new frontier for AI research and implementation.

Autonomous Agents

This is the area of agents where the intersection of AI and AL with VEs is most evident, and it has seen a flurry of research activity over the years. By contrast, multi-agent systems focus more on distributed problem-solving applications, such as network management, for which VR and virtual environments are not needed; they concentrate instead on inter-agent communication and negotiation, which are in many cases unnecessary in autonomous agent research.

This part of the series concentrates on the use of VEs for investigating and validating the trustworthiness of agents. This includes simulated agents, virtual actors, virtual humans and avatars (used as real-life representations of users) in three-dimensional multi-user web environments. The research overview explores VEs from the perspective of Artificial Intelligence (AI).

Autonomy

The value of autonomy has been thoroughly studied in terms of agents operating independently in a changing environment. An immediate question arises: is autonomy as appropriate for agents in virtual environments as it is in real life? In the real world, the environment does not depend on the activities of any individual agent: an agent can only glimpse part of it and can easily be mistaken about what it does observe. Misapprehensions about the surroundings are therefore unavoidable, and permitting an agent to make its own choices allows it to respond to the situation as it actually unfolds rather than to predicted outcomes.

In a virtual environment, the situation is very different. The designer has a 'god's-eye' view of both the environment and the agent, and does not have to distinguish between them. Additionally, the agent has access to the entire virtual world; there is no need for it to separate its model from the actual virtual world. From a practical perspective, autonomy may therefore seem unnecessary, useful only for investigating agenthood more scientifically.

The omniscient approach to virtual agents can, however, be very inefficient. To create believable virtual agents that instill a sense of presence in the user, they must resemble real-world agents with their own limitations: collecting information as it becomes available and carrying out behaviors such as noticing, manipulating and avoiding objects. Managing omniscient agents can also be overwhelming, because every detail must be tracked for each agent. To simplify this, researchers have opted for an autonomous approach in which each agent is equipped with virtual sensors and uses them to drive interaction via virtual effectors. Consequently, most of the work concerning virtual agents takes this autonomous approach.

The potential for reusing agents in different VEs and distributing them over separate processors holds numerous implementation-level benefits. In spite of this, reuse and distribution remain largely theoretical, due to the infancy of the area and the range of challenges being tackled. Moreover, autonomy may be a necessary requirement for reusable agents, but it is far from sufficient: significant work remains on the representation of agents, environments, and their interaction.

The Spectrum of Agents

To make the vast number of systems more manageable, we can visualize them on a spectrum of agency. At one end stand physical agents, which concentrate on exhibiting believable physical traits in a virtual environment. This involves simulating movement and interaction with the setting for both humans and artificial creatures, as well as body language, gestures and facial expressions. All of this is accomplished through virtual sensors that operate in a non-symbolic way.

The other end of the spectrum focuses on human cognitive behavior and cognitive interaction with the system's human users. Topics here relate to natural language and cognitive processes such as planning. It is often difficult to say how far such agents can be said to have an autonomous perceptual apparatus when they sense symbolic information directly from the VE. Rather than seeing the development of virtual agents as a choice between mutually exclusive categories, it is more sensible to describe them as spanning a spectrum: at one end, intelligent behavior necessitates physical interaction with the world; at the other, physical movements require cognitive control. Ideally, agents would have both realistic motion and human-like cognition, yet resolving all the issues needed to reach that level of sophistication presents many difficulties. Research groups therefore usually concentrate on one end of the continuum, and this is particularly evident when examining the complexities of emotion.

Emotion

Work on virtual agents has re-energized the study of motivation and emotion. This is due to the increased number of ways an embodied avatar can express emotion, from gaze and facial expressions to body language and more, which was not possible with disembodied predecessors that could rely only on verbal expression. Avatars used in multi-user networks have encouraged the use of emotions even further, as they are essential components of many virtual agent applications.

Those working at the cognitive end of the agent spectrum emphasize emotion as a cognitive state, while those at the more physical end emphasize emotion as a bodily state. By this we mean a model of emotion that is internal rather than external in nature. Cognitive modeling has the advantage of maintaining a clear link between an agent's emotional state and its external manifestation. A simpler, albeit rather crude, way to model emotion is to incorporate meters that are altered by interaction with the environment, other agents or human users. This type of emotion representation is manifested through interaction with the environment rather than being cognitively modeled.
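A minimal sketch of such a meter-based representation, where meter names and event deltas are purely illustrative and not drawn from any particular system:

```python
from dataclasses import dataclass, field

@dataclass
class EmotionMeters:
    """Crude meter-based emotion model: named scalar levels in [0, 1]
    that are nudged by events rather than cognitively derived."""
    levels: dict = field(default_factory=lambda: {"happiness": 0.5, "fear": 0.0})

    def apply_event(self, meter: str, delta: float) -> float:
        """Adjust a meter in response to an interaction, clamping to [0, 1]."""
        new_level = min(1.0, max(0.0, self.levels[meter] + delta))
        self.levels[meter] = new_level
        return new_level

agent = EmotionMeters()
agent.apply_event("fear", 0.8)   # e.g. a threatening object comes into view
agent.apply_event("fear", -0.3)  # the threat recedes
```

The meters carry no cognitive structure at all; any external manifestation (a facial expression, a posture) would have to be driven directly from these raw levels.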

Physical Agents

In this section we look at virtual agents for which physical behavior is a critical factor. We first consider the physical issues themselves, then how non-verbal communication works with physical agents, and finally the implications of their interaction with their environment.

Physical Issues

In the first instance, body movement and mobility must be handled, which often raises important issues regarding body structure. Thus, the notion of embodiment may have equal importance for virtual agents as it does for real ones. When a sophisticated physical representation can be controlled, it can be used for non-verbal communication, such as gaze, facial expression, gesture and overall body language. Next, agents ought to be able to avoid bumping into other objects in their vicinity, regardless of whether they are stationary structures like trees, buildings or furniture, or in motion, such as other agents. Introducing other moving elements accentuates the need to address complex scenarios such as herding, flocking and group movement. Moreover, there are further types of interaction with the environment that require consideration such as manipulating objects or participating in physical activities with others.
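Group movement such as flocking can be sketched with a few simple steering rules. The toy 2-D step below combines separation (avoid bumping into neighbours) and cohesion (stay with the group), in the spirit of classic boids-style systems; the point-agent representation and weighting constants are all illustrative:

```python
def flock_step(positions, velocities, dt=0.1, sep_radius=1.0):
    """One update of a toy flock: each agent steers toward the group
    centre (cohesion) and away from very close neighbours (separation)."""
    new_positions = []
    cx = sum(p[0] for p in positions) / len(positions)
    cy = sum(p[1] for p in positions) / len(positions)
    for i, (px, py) in enumerate(positions):
        # cohesion: gentle pull toward the flock centre
        ax, ay = 0.1 * (cx - px), 0.1 * (cy - py)
        for j, (qx, qy) in enumerate(positions):
            if i == j:
                continue
            dx, dy = px - qx, py - qy
            d2 = dx * dx + dy * dy
            if 0 < d2 < sep_radius ** 2:
                # separation: push away, harder when closer
                ax += dx / d2
                ay += dy / d2
        vx, vy = velocities[i]
        velocities[i] = (vx + ax * dt, vy + ay * dt)
        new_positions.append((px + velocities[i][0] * dt,
                              py + velocities[i][1] * dt))
    return new_positions

positions = [(0.0, 0.0), (10.0, 0.0)]
velocities = [(0.0, 0.0), (0.0, 0.0)]
stepped = flock_step(positions, velocities)  # the two agents drift together
```

Real systems layer more rules (velocity matching, obstacle avoidance, speed limits) on the same pattern of summed steering forces.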

Once a physical representation and set of physical behaviors are presented, various control questions arise. For instance, the extent to which control should be exercised must be determined, which can include controls at the level of individual muscles or behaviors such as walking and grasping — an issue familiar from robotics and teleoperation. Additionally, another query is how these behaviors can be combined and coordinated so that an agent can conduct multiple tasks simultaneously.

Compared with robotics, virtual agents shift where the hard problems lie. It is far more expensive and difficult to build complex machinery from physical materials, motors and gears than to construct a virtual character, which allows virtual characters to have more intricate body designs than robots. Furthermore, virtual characters' sensors do not share the limitations of their real-life counterparts: they always know their position, unlike robots, which have difficulty localizing. On the other hand, the real world provides physical properties such as gravity and inertia for free, whereas these must be deliberately added to the virtual world by its creator.

Bodies in Motion

It was hand-crafted animation that inspired the development of physical agents. Hand-crafted animation is laborious and costly, so it seemed logical to apply computing power to the process. By automatically generating the intermediate frames between hand-drawn key frames, fewer hand-drawn frames were needed; this is called procedural animation.
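One simple way to automate those intermediate frames is linear interpolation of joint angles between two hand-authored keyframes. A minimal sketch (the joint names and angles are illustrative):

```python
def inbetween(key_a, key_b, t):
    """Linearly interpolate one intermediate pose between two keyframes.
    Each pose is a dict mapping joint name to angle in degrees;
    t runs from 0.0 (pose A) to 1.0 (pose B)."""
    return {joint: (1 - t) * key_a[joint] + t * key_b[joint] for joint in key_a}

# Two hand-authored keys for a swinging arm; generate three in-between frames.
key_a = {"shoulder": 0.0, "elbow": 10.0}
key_b = {"shoulder": 90.0, "elbow": 40.0}
frames = [inbetween(key_a, key_b, t) for t in (0.25, 0.5, 0.75)]
```

Production systems use smoother curves (splines, easing functions) between keys, but the principle of computing poses rather than drawing them is the same.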

In the animation and avatar manipulation industries, performance animation is a widespread method. It involves tracking the physical movements of a real human wearing specialized markers on the joints. These markers can be reflective, tracked by a camera, or work with other technologies such as magnetic sensors. The tracked movements are then applied to the digital agent's body structure to produce motion. Its main benefit is computational feasibility; however, it needs an accurately modeled agent body with correct joint relationships. Compared to sequencing techniques, this strategy is quite rigid; nonetheless, the two approaches can be combined by creating authored segments through performance animation and then scripting them together.

Once an agent has a realistic body structure, an obvious next step is to provide autonomous control that drives its motion internally rather than animating the surface externally. This concept is known as behavioral animation. It can be highly computationally intensive, depending on the physical accuracy of the control and the structural complexity of the body.

Non-verbal Communication

The development of sophisticated body structures offers a more comprehensive way to communicate without words. This progress has largely resulted from employing avatars in shared graphical settings, the descendants of MUDs (multi-user domains or dungeons). Such settings need ways to communicate beyond written language, a requirement that has sparked research into richer non-verbal communication through gaze, facial expressions, gestures, postures and overall body language.

Facial expression is one route into non-verbal communication, and it does not necessarily need a realistic face: emoticons use standard keyboard symbols to portray emotion in writing. For other applications, such as speech generation, it has been necessary to build models of the human face that control mouth movements so that agents appear to be speaking. Gaze is a great asset when combined with facial expression. In a system with virtual sensors, even simple ones, gaze falls out naturally from the agent's own use of its sensory system. Where agents do not have a field of view determined by sensory perception (for example, when information is taken directly from the VE), gaze must be controlled more explicitly. User-driven avatars likewise require that their users direct gaze themselves.

At the opposite end of the complexity range, research into body language has been carried out with both non-human and humanoid models. The humanoid model, when used as an avatar, has enabled researchers to examine a broader range of gestures and body language, from welcoming bows to aggressive arm and hand signals. User trials with such systems demonstrated both the potency of non-verbal communication and the difficulty of managing behavior through simplistic interfaces such as mouse clicks and menu selections. Ultimately, the most realistic representation of agents or avatars in virtual worlds will come from combining multiple kinds of non-verbal communication, allowing agents to carry out conversations using synchronized speech, intonation and body language such as lip and eyebrow movement, head and eye gaze, predefined hand shapes and controlled arm motion.

The Agent-World Coupling

Although we have already touched on interaction between agents and objects in the VE, we have not yet considered the issues that arise from physical agents perceiving and affecting the world physically. At an abstract level, perception and action define an agent's coupling to its virtual world, and we examine each in turn.

The degree to which perception is replicated in a VE for different agents varies tremendously. Those versed in robotics will be well aware of how hard it is to do this properly in the real world, with all the ambiguity, noise and time-consuming processes. In virtual space though, it can be much simpler: a straightforward sensor might involve projecting a line from the agent's 'eyes', and if it intersects with an object, information about that object's identity and properties can be taken from its data structure.
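A minimal 2-D sketch of such a line-projection sensor. The circular objects and their fields are purely illustrative; the point is that a hit lets the agent read identity and properties straight from the object's data structure:

```python
import math

def ray_hit(origin, direction, objects, max_range=100.0):
    """Cast a ray from the agent's 'eyes' and return the id of the
    nearest object it intersects, or None. Objects are circles."""
    ox, oy = origin
    dx, dy = direction
    norm = math.hypot(dx, dy)
    dx, dy = dx / norm, dy / norm  # normalize the view direction
    best, best_t = None, max_range
    for obj in objects:
        cx, cy = obj["pos"]
        # project the object's centre onto the ray
        t = (cx - ox) * dx + (cy - oy) * dy
        if t < 0:
            continue  # behind the agent
        px, py = ox + t * dx, oy + t * dy
        if math.hypot(cx - px, cy - py) <= obj["radius"] and t < best_t:
            best, best_t = obj, t
    # on a hit, properties can be read directly from the data structure
    return None if best is None else best["id"]

world = [{"id": "tree", "pos": (5.0, 0.2), "radius": 1.0},
         {"id": "rock", "pos": (12.0, 0.0), "radius": 1.0}]
seen = ray_hit((0.0, 0.0), (1.0, 0.0), world)  # looking along the x-axis
```

Note how little work this takes compared with real vision: there is no noise, no occlusion model beyond nearest-hit, and no interpretation step.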

In contrast to the mentioned approach, some work has explored a biologically plausible perceptual system for virtual agents. This involves projecting the agent's field of view onto a simulated retina and employing vision algorithms to process the pixels into a usable form. Additionally, there exist virtual robot systems in which infra-red and ultrasonic sensors are modeled with some degree of realism through adding noise into the simulated signal — existing somewhere between the two extremes.

No matter the biological accuracy of a given system of virtual sensors, it is essential to recognize that perception is a dynamic between an agent and its environment: agents in a dark space, for instance, should see less than those in well-lit spaces. As with other aspects where the controller and the domain come together, agent perception raises the issue of knowledge location. The simpler an agent's sensory system, the more information must be supplied to it by the objects around it. Conversely, an accurately modeled visual system, such as that of an artificial fish, needs much less knowledge from its environment but a larger amount of processing inside the agent itself.

Perception raises questions about how much of a role an agent has in a specific world, and this is true to an even greater degree of action. Taking action changes the state of the world, and the outcome depends on both the agent's abilities and its surroundings. For instance, testing whether an object can be lifted must take into account its size and weight, and the effective weight itself varies depending on whether we are on the Earth's surface or deep underwater.
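As a toy illustration of this environment dependence, a liftability test might compare the agent's strength against the object's effective weight, which changes with gravity and any buoyant support. All names and numbers here are illustrative:

```python
def can_lift(agent_strength_n, object_mass_kg, gravity=9.81, buoyancy_n=0.0):
    """Return True if the agent's available force (newtons) exceeds the
    object's effective weight. Gravity and buoyancy belong to the
    environment, not the object, so the same object may be liftable
    underwater but not on land."""
    effective_weight = object_mass_kg * gravity - buoyancy_n
    return agent_strength_n >= effective_weight

on_land = can_lift(agent_strength_n=150.0, object_mass_kg=20.0)
underwater = can_lift(150.0, 20.0, buoyancy_n=80.0)
```

The same outcome could instead be computed inside the object or inside the environment; where the check lives is exactly the knowledge-location question raised above for perception.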

At the most basic level, a convincing grasp need only appear visually accurate: the hand should not intersect the object, and the bond between them must seem plausible. As the simulation progresses, complexity can be increased as physical factors, such as surface hardness, are integrated into the interaction. At the simplest extreme, by contrast, a grasp can be faked purely through animation that groups simple graphical objects together.

Cognitive Agents

There are three key areas in creating agents of a cognitive nature in virtual environments. The first is the need for an agent architecture capable of reasoning, decision-making, learning and more. This is a common challenge in other agent-related contexts and serves as the bedrock for related tasks; here, however, the focus is not on the architecture itself but on its relation to virtual environments.

The second area is realism, which is essential if intelligent virtual environments are to be practical. This realism must manifest not just in visualisation but in behaviour as well, making it essential for cognitive function to be intertwined with affective influences such as motivation, emotion, and personality. As a result, there has been a great deal of research into affective agent architectures and models of emotions and motivations, dubbed a 'computational theory of mind'.

The concern with expressing affective influences in intelligent virtual environments brings us to the third key area: representation, or visualisation. Here we return to the question of how cognitive and affective models can be mapped onto physical models, which completes the circle. Visualisation is important, but it can only be effective if the agent models provide adequate and appropriate information about the agents.

We examine how personality, which manifests itself in broad traits such as curiosity and wariness, is associated with different aspects of an artificial being's locomotion, such as speed and foresight. We present a basic model through which personality can influence future activities. Though the model is quite straightforward and this research is only a starting point, it does show that higher psychological elements are connected with their physical manifestation at lower levels.
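A sketch of such a trait-to-parameter mapping. The trait names, parameters and coefficients are invented for illustration; the point is only that a couple of high-level scalars can shape low-level locomotion:

```python
def locomotion_params(personality):
    """Map personality traits (each in 0..1) to locomotion parameters.
    Curious agents move faster; wary agents look further ahead along
    their path before committing to it."""
    curiosity = personality.get("curiosity", 0.5)
    wariness = personality.get("wariness", 0.5)
    return {
        "speed": 0.5 + 1.5 * curiosity,      # metres per second
        "lookahead": 1.0 + 4.0 * wariness,   # metres scanned ahead
    }

bold = locomotion_params({"curiosity": 1.0, "wariness": 0.0})
timid = locomotion_params({"curiosity": 0.1, "wariness": 0.9})
```

Even this crude linear mapping makes two agents with different personalities visibly distinct in how they move through the same environment.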

Conclusion

The world of intelligent virtual agents holds immense potential for reshaping various aspects of human-computer interaction. As we have explored, the integration of AI with virtual realms has given rise to autonomous agents that can operate independently, adapt to changing situations, and interact with their environments in a realistic and engaging manner. From physical agents with lifelike movements and non-verbal communication capabilities to cognitive agents with emotions and personality traits, the development of intelligent virtual entities is redefining the boundaries of technology and human experience.

There are still challenges in creating autonomy in virtual environments, in representing emotions and motivations, and in the interplay of perception and action. But work on each of these aspects has contributed to immersive and realistic virtual experiences that captivate users and enrich their interactions with digital worlds.

As the field evolves, the potential applications of the agents are vast and far-reaching — from enhancing the shopping experience in e-commerce with virtual customer service agents to revolutionizing the entertainment industry with lifelike virtual actors.

Intelligent agents brought to life with human-like characteristics are ultimately the next phase in the fusion of the physical and digital worlds. As we move forward, ongoing research and development in this field will yield ever more sophisticated and realistic intelligent virtual agents as tools through which we interact with technology.

Title image credits: Mathew Lucas