Operationalizing Intentionality in Primate Communication: Social and Ecological Considerations

An intentional transfer of information is central to human communication. When comparing nonhuman primate communication systems to language, a critical challenge is to determine whether a signal is used in intentional, goal-oriented ways. Because psychological states cannot be directly observed in any species, comparative researchers have inferred intentionality via behavioral markers derived from studies of prelinguistic human children. Recent efforts to increase consistency in nonhuman primate communication studies undervalue possible sources of bias: some behavioral markers are not generalizable across signal types (gestures, vocalizations, and facial expressions), contexts, settings, and species. Despite laudable attempts to operationalize first-order intentionality across signal types, a truly “multimodal” approach requires integration across sensory components (visual-silent, contact, audible), because a signal of a given type can comprise more than one sensory component. Here we discuss how the study of intentional communication in nonlinguistic systems is hampered by issues of reliability, validity, consistency, and generalizability. We then highlight future research avenues that may help us understand the use of goal-oriented communication: opting, whenever possible, for reliable, valid, and consistent behavioral markers, while also taking sampling biases into account and integrating detailed observations of intraspecific communicative interactions.

Introduction
Intentionality is a term used in philosophy to refer to the mental phenomenon of "aboutness" (i.e., beliefs, desires, or goals) that cannot be embedded in nonmental constructs, such as words and sentences (Brentano, 1924). Later, this approach was challenged by other philosophers who claimed to be able to study intentionality in actions and communication (Dennett, 1983; Grice, 1957; Searle & Willis, 1983).
Early definitions of intentionality involved complex meta-representations (representations of others' mental states), in which the signaler and the recipient take into account each other's state of mind and intentions so that the signal can be mutually understood (Grice, 1957). However, the cognitive prerequisites for meta-representation and mental-state attribution required in Gricean communication are valid only for linguistic communication and can be tested only in adult humans (Moore, 2016; Townsend et al., 2017). Although this approach emerged from linguistics and was initially operationalized for adult human behavior, the study of intentional communication in nonhuman animals built on this complex and specific scenario. A framework that allowed researchers to decompose the complexity inherent in Gricean communication into a graded system became popular in the field of animal communication (Dennett, 1983). Dennett's framework established three levels of intentionality: zero-order intentionality, signaling without any mentality involved (e.g., a grimace produced after tasting something bitter; Masi et al., 2013); first-order intentionality, an intermediate level at which the signaler intends to communicate in order to change the behavior of the recipient (e.g., the open-mouth facial expression conveying the playful purpose of an interaction; Waller et al., 2015); and second-order intentionality, at which intentions to signal are combined with mental-state attributions (e.g., hiding one's own play face with the hands to deceive the recipient; Tanner & Byrne, 1993).
Studies exploring the intentional use of signals in primate communication have mainly focused on first-order intentionality (see Graham et al., 2020 for a review). Distinguishing intentionally used signals from emotionally driven and reflexive ones was a popular approach to identifying instances in which cognitive processes would drive the use of more flexible and complex signals (e.g., Tomasello, 2008). In primate communication, this distinction led to a dichotomy associated with communicative modality, in which signals were typically divided into intentional gestures versus involuntary vocalizations and facial expressions (Arbib et al., 2008; Call & Tomasello, 2007). However, accumulating evidence suggests that vocal and facial signals can also be used in goal-directed ways (e.g., Schel et al., 2013; Waller et al., 2015) and tailored to the attentional or knowledge state of the recipient (Crockford et al., 2017; Waller et al., 2015). Moreover, voluntary control and affective processes are not mutually exclusive (Graham et al., 2020; Liebal & Oña, 2018). For instance, affective states may complement the higher cognitive processes usually associated with the intentional use of signals and influence the intensity of those signals (Liebal et al., 2014). The variability in approaches and definitions has prevented systematic progress in the study of intentionality, and researchers need suitable operationalizations that allow them to measure intentionality validly across signal types, species, and contexts.

Inferring Intentionality in Nonhuman Primates
Deriving Criteria Applied to Humans to Other Primates
When observing animal behavior, it is difficult to determine whether an individual intends a signal to attain a particular behavioral outcome in the recipient, let alone for the recipient to derive a particular meaning from the message contained in the signal. Detecting intentionality in young children raises similar problems, because they cannot yet communicate about their mental states through linguistic means. To overcome this problem, developmental psychologists identified behavioral markers that would indicate intentionality in prelinguistic children's gestures (Bates, 1979; Bates et al., 1975).
The onset of intentional communication in children has been reported to take place at around 9 mo (Bates, 1979; Lock, 2004). By this age, infants change their behavior in several ways that suggest they have the cognitive development required for the intentional use of signals. One of the commonly used criteria for intentionality in infant behavior is the change in gaze pattern that marks the transition from perlocutionary to illocutionary acts. According to Bates et al. (1975), perlocutionary acts are infant behaviors that may have an effect on another person but have no social-communicative intention (e.g., an infant cries while gazing at a box that they cannot open, which elicits a reaction: an adult approaches and opens it). In contrast, illocutionary acts comprise conventional signals that are intentionally used to carry out some socially recognized function (e.g., an infant alternates their gaze between the box and a person while crying, with the communicative intention of seeking help from that person to open the box).
In subsequent studies, further behavioral markers were identified as indicators of intentionality in preverbal communication. Bates and colleagues (1979) defined intentional communication as "signaling behavior in which the sender is a priori aware of the effect that a signal will have on his listener, and she persists in that behavior until the effect is obtained or failure is clearly indicated" (p. 36). In addition to gaze alternation, these researchers identified other behavioral markers, such as persistence and elaboration of communicative behavior (augmentations, additions, and substitutions of signals) until the goal has been obtained, and adjustment of signals, in which signal forms are abbreviated or exaggerated in ways that are appropriate only for achieving the communicative goal (Bates, 1979; Bates et al., 1975).

Behavioral Markers
Several decades ago, primatologists started to adapt the behavioral markers originally developed for the intentionality work on prelinguistic children for comparative research on primate gestures (e.g., Leavens, 2004; Leavens et al., 2005; Tomasello et al., 1994). These behavioral markers included attention-getting behaviors, sensitivity to the recipient's attentional state, audience checking, elaboration, flexibility, gaze alternation, persistence, response waiting, and social use (Table I).
Social use implies that a given signal is directed at a specific recipient. This criterion was initially assessed through the presence/absence of an audience, in which a signal is produced only when a recipient is present (Leavens, 2004). Social use has also been assessed according to the presence and behavior of specific individuals (audience composition; e.g., Schel et al., 2013; Townsend et al., 2017).
The behavior immediately before or during communicative events may also give us a clue about the intent of the signaler. Gazing at another individual can precede and accompany communicative signals and is often referred to in the literature as audience checking (e.g., Hobaiter & Byrne, 2011a; Schel et al., 2013). Gaze alternation occurs when the signaler looks back and forth between the recipient and a distant object or location (Tomasello et al., 1994).
Closely linked to social use and audience checking is the behavioral marker sensitivity to the recipient's attentional state. This criterion refers to the ability of the signaler to adjust its communicative behavior to whether or not the recipient is visually attending: when the recipient is visually oriented toward the signaler, the signaler employs visual behaviors; when the recipient is not visually attending, the signaler may use other behaviors (audible or tactile) to attract the recipient's attention (attention-getters such as "clapping," "banging," "ground slap," "poke at"; Botting & Bastian, 2019; Tomasello et al., 1994), or may move to a place from which the recipient can see the signal (Cartmill & Byrne, 2007; Liebal et al., 2004; Tomasello et al., 1994).
When the response does not match or only partly matches the signaler's goal, the signaler may produce further signals until the goal is met. For this reason, waiting for a response from the recipient is another intentionality criterion that is usually considered together with persistence and elaboration: the signaler may repeat the same signal (persistence) or modify the type of the signal (elaboration) if the apparent goal has not been met (Cartmill & Byrne, 2007; Leavens et al., 2005).
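The persistence/elaboration distinction above can be made explicit when coding signal sequences. The sketch below is a minimal illustration, assuming a hypothetical coding format in which a bout is a list of signal-type labels produced while the apparent goal remained unmet, and in which each further signal is compared with the one immediately before it. The function name and labels are illustrative only; studies that define these terms differently would need different rules.

```python
def label_followups(bout):
    """Label each signal after the first in a bout produced while the goal was unmet.

    Hypothetical coding rule: repeating the preceding signal type is coded as
    persistence; switching to a different signal type is coded as elaboration.
    """
    labels = []
    for previous, current in zip(bout, bout[1:]):
        labels.append("persistence" if current == previous else "elaboration")
    return labels


# Hypothetical bout: the signaler repeats a reach, switches to a touch, then repeats the touch.
print(label_followups(["reach", "reach", "touch", "touch"]))
```

Comparing against the immediately preceding signal (rather than the initial one) is a design choice made here for simplicity; either convention should be stated explicitly in a coding scheme.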
Finally, some researchers also take flexibility as evidence for intentional communication (Liebal et al., 2014). Flexibility within the intentionality debate refers to means-ends dissociation of signals and contexts, with the same signal being employed for multiple goals/contexts, or several signals being employed for the same goal/context (Bruner, 1981). In primate communication, this feature is often explored in gestures, is argued to indicate voluntary control, and contrasts with the "one signal, one function" approach typically found in vocalization research (Genty et al., 2009). By using the conservative behavioral markers described in the preceding text, researchers were able to operationalize and empirically assess "intentionality" as a cognitive phenomenon that is not directly accessible in primate communication. However, researchers may underestimate intentional communication in other animals because the criteria might be very obvious to humans and thus too conservative when transposed to other species. Furthermore, important features of the communicative interaction could be missed, depending on how many criteria each communicative instance must meet. For instance, within gesture research, large subsets of the data can be discarded when gestures are required to meet a rigid set of criteria.
Table I
Gaze alternation: Signaler looks back and forth between the recipient and a third entity (e.g., food item)
Persistence: Repetition of an initial signal in cases of previously failed communicative attempts
Response waiting: Pausing behavior and/or holding current position after signal production
Social use: Sensitivity to the presence and composition of an audience

Current Limitations in the Study of Intentionality
In the following, we address four major issues that currently limit our ability to further understand intentional communication in other animals. We focus on empirical examples from nonhuman primate research, but the same problems may also apply to other nonhuman species or to human children. These four main problems are reliability (reproducibility of the results under the same conditions), validity (explanations by lower-level cognitive processes), consistency (diversity of evidence and definitions), and generalizability (usability across signal types, contexts, settings, and species). Explicitly acknowledging these issues will facilitate intra- and interfield consensus in the detection of intentional signal use in primates.

Reliability: Observation Conditions and Observer Biases
The reliability of an intentionality criterion relates to whether the results can be reproduced under the same conditions. How observers assess the intentional nature of communicative signals may depend on several factors, such as previous training, the information available about the study species and subjects, and observational conditions that may affect the certainty of their coding. Unlike the classification of vocalizations, which can be made more objective via spectrograms, assessments of the intentional nature of communicative signals, and gesture analysis in particular, may be profoundly affected by the researchers' subjective perception. To minimize observer biases in gesture research, it is standard practice to collect video footage, code those recordings according to a tailored coding scheme, and have a second coder code a portion of the material to assess interobserver reliability. We argue that several behavioral markers currently in use cannot be objectively measured by individual observers across all conditions. For instance, eye gaze is difficult to track in some primate species because of their dark sclera, especially in their natural habitats when light levels are low or visibility is poor (cf. Perea-García et al., 2019). Moreover, the presence of an individual may often be perceived through sensory channels other than vision. Therefore, field conditions frequently do not allow researchers to reliably determine the presence and composition of an audience, since they cannot know whether the signaler is aware of other individuals through olfactory cues or auditory stimuli (including those produced unintentionally, such as the movement of dry leaves in the surroundings). Along the same lines, the intended audience might be difficult to determine, especially for long-range vocalizations and some audible gestures such as "drumming" (e.g., Babiszewska et al., 2015; Notman & Rendall, 2005).
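Interobserver reliability of this kind is commonly summarized with Cohen's kappa, which corrects raw percentage agreement for agreement expected by chance. The following is a minimal sketch, assuming two coders who each assign one categorical code (here, whether audience checking was present) to the same set of signal events; the codes below are invented for illustration and do not come from any study cited here.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' categorical labels on the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same, non-empty set of items")
    n = len(coder_a)
    # Observed agreement: proportion of items given the same label by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement, from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # degenerate case: both coders used the same single label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for "audience checking present?" on ten signal events.
coder_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
coder_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]
print(round(cohens_kappa(coder_a, coder_b), 3))
```

Here the coders agree on 80% of events, but because both label 60% of events "yes," chance agreement is already 52%, so kappa is noticeably lower than raw agreement. This is one reason why markers that are hard to observe (e.g., gaze in dark-sclera species under poor light) can show modest reliability even when raw agreement looks acceptable.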
For behavioral markers that involve audible information, such as certain attention-getters, a similar problem arises from uncertainty about the recipient of the signal. For instance, an attention-getter might be directed at several individuals at once, or at an individual other than the one that engages in the subsequent interaction (e.g., social play; Tomasello et al., 1994).
The reliability of behavioral markers can also vary between research settings. Given that the current criteria for intentional communication often rely on visual attention (Fröhlich & Hobaiter, 2018; Liebal et al., 2014), higher visibility in captive settings may result in the detection of many more intentionally used communicative acts than in natural environments. In the wild, visibility can be limited in some environments, such as in dense, obscuring vegetation or when animals are in the canopy (Fröhlich, Lee, et al., 2019; Knox et al., 2019). In contrast, intentional vocalizations or other audible signals may often be missed in captivity because of the glass barriers of enclosures and ambient noise.

Validity: Lower-Level Explanations
Validity refers to whether the results represent what they are supposed to measure. A considerable proportion of situations in which behavioral markers are applied may be readily explained by lower-level processes (i.e., processes that occur in an automated manner and usually involve low levels of consciousness, such as associative learning) and by less cognitively demanding communicative interactions (e.g., Graham et al., 2020; Liebal et al., 2014). For instance, a simpler explanation for gaze alternation might be that the recipient and the third entity represent two competing targets of interest for the signaler. In addition, although sensitivity to the attentional state (producing visual signals only when recipients are attending) seems to indicate visual perspective-taking, this ability may be better explained by learning to discriminate stimuli associated with particular contexts (Graham et al., 2020; Townsend et al., 2017). For example, the individual may learn that the recipient's face must be visible as a prerequisite for visual-silent signals. Although considered unlikely, the use of attention-getters could still be explained by a complex series of learned discriminations, in which the individual learns that producing an audible or contact signal followed by visual-silent signals is effective when recipients are not visually oriented toward the signaler (Liebal et al., 2014).
Emotional arousal constitutes another lower-level explanation for intentionality markers such as social use, persistence, and flexibility. The presence of particular individuals in the audience is likely to influence the signaler's level of arousal, and signals might be involuntarily produced as a consequence of the signaler's arousal state. It is also possible that heightened arousal leads to the repeated production of signals in a short period of time until the goal is met and the underlying emotion changes (Townsend et al., 2017). Although elaboration of signals occurs less frequently than repetition, it provides a more robust marker, since a change in signal type seems to require more voluntary control and is less likely to be exclusively emotionally driven. For example, in chimpanzees, repetitions of a "pant-grunt" by a subordinate individual could be explained by high arousal levels, but if the signaler "presents the genitals" to the presumed recipient after the "pant-grunt," this may indicate a higher level of cognitive control (Liebal et al., 2014; Sievers et al., 2017).
Flexibility in signal use is also open to lower-level explanations: the use of the same signal across different contexts and the use of different signals in the same context may be more parsimoniously explained by similar levels of arousal in those situations (e.g., chimpanzees may use "pant-hoots" during displays or on arrival at a food source, along with distinct signals, in these high-arousal contexts; Fedurek et al., 2014). Means-ends dissociation is useful for detecting flexibility in the use of gestures but is of limited use for detecting intentionality, since situations in which means-ends dissociation is not confirmed may still involve intentional communication (e.g., signals that are used only in one context; Cartmill & Byrne, 2010; Liebal et al., 2014). In sum, most of the behavioral markers commonly used in studies of intentional communication in primates are limited in their ability to provide information about the underlying cognitive complexity.

Consistency: Comparability Across Studies
Issues of consistency refer to variability in detecting intentional communication between and within signal types (gestures, vocalizations, and facial expressions), particularly with regard to the number and types of behavioral criteria (Graham et al., 2020;Townsend et al., 2017). This variability makes comparisons of the underlying cognitive mechanisms inherent to intentionality across signal types very difficult.
Consistency Across Signal Types
Some of the variability in applying behavioral criteria for intentional communication between signal types might be linked to different research foci. For example, striking evidence for social use has been gathered by studying the effects of audience composition on vocal signal use (e.g., the rate and structure of calls may vary with the composition of the audience; Slocombe et al., 2010; Slocombe & Zuberbühler, 2007), which would theoretically also be possible for other signal types. Gestural studies are more focused on the dyadic level, and thus on the effect on one particular recipient, while little or no information about other individuals present is collected (Genty et al., 2015).
However, the most striking difference between signal types lies in the use of intentionality criteria as a precondition for identifying the signals. For instance, intentional use is part of the definition of gestures in great apes, but not of vocalizations or facial expressions (Graham et al., 2020; Townsend et al., 2017). Gestures are usually defined as mechanically ineffective movements of the extremities, the head, or the body, or postures (the signaler does not act physically to alter behavioral outcomes in the receiver, achieving their ends indirectly; Smith, 1977), but they are classified as gestures only if these movements meet the criteria for intentional use. Thus, mechanically ineffective movements that would be physically suitable to be classified as gestures, but that do not meet intentionality criteria, are not considered gestures (Fröhlich & Hobaiter, 2018). By contrast, vocal research might classify vocalizations regardless of whether they show any evidence of being goal-directed (Liebal et al., 2014). Therefore, it should not be surprising that gestures show more indications of being intentionally produced than other signal types.
Consistency Within Signal Types
Even within the same signal type, behavioral criteria of intentional use have not been applied uniformly. In species that are more distantly related to humans than great apes are, such as Afro-Eurasian monkeys, gestural researchers may label some physical actions as gestures before checking for intentional use (Canteloup et al., 2015; Deshpande et al., 2018), whereas in great apes, gestures usually refer only to intentional communicative acts; if intentionality criteria have not been tested, researchers may use a different term (e.g., "potential gestures"; Cartmill & Byrne, 2010). Furthermore, even studies that carefully check whether gestures meet the intentionality criteria adopt different approaches during this selection process, which affects the subset of signals included in the study. Some researchers check the intentionality criteria for all gestures and report the outcome as part of their results, but use all behaviors in their subsequent analyses (e.g., Fröhlich et al., 2016); others analyze intentionality for all gestures recorded and, if intentionality criteria are met in some occurrences, include all instances of that signal type (e.g., Roberts et al., 2014); still others check intentionality criteria and select for analysis only the instances that meet them (e.g., Hobaiter & Byrne, 2011).
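The three selection approaches described above can produce very different datasets from the same coded material. The sketch below illustrates this with a hypothetical record type and invented example events; the class name, gesture labels, and data are assumptions for illustration and are not drawn from the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GestureEvent:
    gesture_type: str     # hypothetical gesture label, e.g., "arm raise"
    meets_criteria: bool  # did this instance meet the study's intentionality criteria?


def select_all(events):
    """Approach 1: criteria are checked and reported, but every instance is analyzed."""
    return list(events)


def select_by_type(events):
    """Approach 2: keep all instances of any gesture type that met the criteria at least once."""
    qualifying_types = {e.gesture_type for e in events if e.meets_criteria}
    return [e for e in events if e.gesture_type in qualifying_types]


def select_by_instance(events):
    """Approach 3: keep only the individual instances that met the criteria."""
    return [e for e in events if e.meets_criteria]


# Invented example data: six coded events of three gesture types.
events = [
    GestureEvent("arm raise", True),
    GestureEvent("arm raise", False),
    GestureEvent("arm raise", False),
    GestureEvent("ground slap", False),
    GestureEvent("ground slap", False),
    GestureEvent("touch", True),
]

for name, rule in [("all", select_all), ("by type", select_by_type), ("by instance", select_by_instance)]:
    print(name, len(rule(events)))
```

With these invented events, the analyzed sample shrinks from 6 to 4 to 2 depending only on the selection rule, which is why the rule needs to be reported alongside the results.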
Sources of variability within signal types also include the number and the type of criteria used to detect intentional communication (Graham et al., 2020; Liebal et al., 2014). Typically, this variability is more pronounced in gestural research, since efforts to check and adapt behavioral markers for primate intentional communication have been made only recently for vocalizations and facial expressions (e.g., Schel et al., 2013; Waller et al., 2015). Within the same signal type, some studies require only one out of a set of criteria to be present (e.g., Hobaiter & Byrne, 2011), other studies require several criteria to be present (e.g., Halina et al., 2013; Prieur et al., 2017), and still other studies are not clear about the criteria used (e.g., Roberts et al., 2014; Tomasello et al., 1994).
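How strongly the "one out of a set" versus "several criteria" requirement shapes results can be shown with a simple threshold rule. This is a minimal sketch, assuming a hypothetical set of accepted markers and invented coded events; real studies differ in which markers they accept and how each is operationalized.

```python
# Hypothetical set of markers a study might accept as evidence of intentional use.
ACCEPTED_MARKERS = frozenset({
    "audience_checking", "response_waiting", "persistence",
    "elaboration", "attention_getter",
})


def meets_rule(observed_markers, min_markers):
    """True if at least `min_markers` distinct accepted markers were observed."""
    return len(set(observed_markers) & ACCEPTED_MARKERS) >= min_markers


def count_intentional(events, min_markers):
    """Number of coded events classified as intentional under the rule."""
    return sum(meets_rule(markers, min_markers) for markers in events)


# Invented coded events: the markers observed for each signal instance.
events = [
    {"audience_checking"},
    {"audience_checking", "response_waiting"},
    {"response_waiting", "persistence", "elaboration"},
    set(),
]

for k in (1, 2, 3):
    print(f"at least {k} marker(s): {count_intentional(events, k)} of {len(events)} events")
```

With these invented data, requiring one, two, or three markers classifies three, two, or one of the four events as intentional, so the same observations support different conclusions depending on the rule.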
Studies are also inconsistent with regard to assessing intentional use in individual signals versus sequences of signals of the same or different types. A signal might not be considered intentional when studied in isolation but might meet the intentionality criteria when analyzed in conjunction with other types of signals in the same bout or sequence. For example, "soft hoos" were reported to meet only some intentionality criteria in chimpanzee vocal communication when these calls were combined in bouts with another type of alarm call (Schel et al., 2013).
Variation can be found even within the use of the same criterion. For instance, response waiting might be defined similarly by different researchers (the signaler pauses at the end of the signal while maintaining visual contact), but the required duration of the pause varies between studies (e.g., >1 s in Hobaiter & Byrne, 2011; ≥2 s in Fröhlich et al., 2019). So far, the duration of this pause seems to have been chosen arbitrarily by researchers rather than on the basis of biological evidence, even though the appropriate duration is likely to vary between species. While the markers persistence and elaboration seem at first glance to be used consistently across studies, some researchers use the term persistence to refer to the production of any further signal (without distinguishing whether it is of the same or a different type) after response waiting and in the absence of a response considered satisfactory (e.g., Graham et al., 2017; Hobaiter & Byrne, 2011). Other researchers distinguish between persistence and elaboration in terms of the quantity of signals produced after response waiting (persistence as the use of an additional signal, and elaboration as the use of multiple signals; Leavens et al., 2005). Since the same terms may refer to different conditions, it is imperative that researchers define the behavioral markers used in their studies explicitly and consistently.
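To illustrate how much an arbitrarily chosen threshold matters, the sketch below applies the two pause criteria mentioned above (>1 s and ≥2 s) to the same set of pauses. The pause durations are invented and the function name is hypothetical; the only point is that some pauses count as response waiting under one rule but not under the other.

```python
import operator


def shows_response_waiting(pause_s, threshold_s, inclusive):
    """True if a post-signal pause satisfies a response-waiting threshold.

    inclusive=False implements a "> threshold" rule; inclusive=True implements ">= threshold".
    """
    compare = operator.ge if inclusive else operator.gt
    return compare(pause_s, threshold_s)


# Invented pause durations (seconds) measured after five signals.
pauses = [0.8, 1.0, 1.5, 2.0, 2.5]

rule_gt_1s = [p for p in pauses if shows_response_waiting(p, 1.0, inclusive=False)]
rule_ge_2s = [p for p in pauses if shows_response_waiting(p, 2.0, inclusive=True)]
print(rule_gt_1s)
print(rule_ge_2s)
```

Under the >1 s rule, three of the five pauses qualify; under the ≥2 s rule, only two do, because the 1.5-s pause is classified differently. A species-specific, empirically derived threshold would replace the hard-coded values here.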

Generalizability: Transversality of Behavioral Markers
We refer to generalizability as the usability of behavioral markers for intentional signal use in a variety of circumstances. Variability in the use of behavioral markers between different signal types arises mainly from the fact that some of the criteria apply only to a specific sensory modality and are hence unhelpful for studying the communicative content and intent of messages conveyed through other sensory channels. The usefulness of some behavioral markers may be specific to certain social and physical environments; such markers may poorly describe the intentional use of a signal in some contexts and settings and may be applicable only in some species. We conducted a systematic search of Web of Science for research on intentional communication in primates published from 1985 to 2020. We used the search terms ("primate*" OR "great ape*") AND "intentional* communication." Although there are many other primate taxa, we specified "great apes" because research conducted on great apes may use this term as a keyword instead of the term "primate." We complemented our list with the studies reported in Graham et al. (2020) and in Liebal et al. (2014) that were not found in our initial search. Both authors evaluated the scope and relevance of the papers. We considered only empirical research, and articles needed to be peer reviewed and published in English to be included in our final list of studies. We retained a total of 85 studies, for which we examined the taxa, design, and setting (Electronic Supplementary Material).
Generalizability Across Signal Types
Traditionally, research on great ape communication focused on signal types (gestures, vocalizations, and facial expressions) in isolation, using different approaches and criteria to study their intentional use (e.g., Hobaiter & Byrne, 2011; Schel et al., 2013; Waller et al., 2015). Over the last decade, the use of paradigms including signal combinations has increased (Fröhlich et al., 2021; Fröhlich & van Schaik, 2018; Genty et al., 2014; Hobaiter et al., 2017; Slocombe et al., 2011; Wilke et al., 2017). However, including signal combinations might not be sufficient for a truly "multimodal" approach, as a signal may comprise more than one sensory component (visual-silent, contact, audible; Fröhlich & Hobaiter, 2018).
Since the criteria used to infer first-order intentionality in primate communication emerged from the study of prelinguistic children, they typically refer to the visual attention of the interactants, such as visual monitoring of the recipient by the signaler during response waiting (Bates, 1979; Bates et al., 1975). Several behavioral markers, such as sensitivity to the attentional state of the recipient and the use of attention-getters to manipulate that attentional state, are applicable only to visual signals and are therefore not relevant to vocal production (Schel et al., 2013). Nevertheless, many primate signals are perceived through sensory channels other than vision (olfaction, hearing, and touch), and even gestures that are usually investigated in the visual space might have salient audible or contact components (Hobaiter et al., 2017; Partan & Marler, 1999). Therefore, it may be more difficult to identify the intentional use of signals that do not rely on a visual component to be effectively perceived, and the current behavioral criteria might not be suitable for such signals (Fröhlich & Hobaiter, 2018). Visual signals can be very salient too, and even audible and tactile signals might be missed when the recipient is distracted. However, when an audible or tactile component is present, the signaler may not have the same need to ensure that the signal is perceived by gazing at the recipient before and/or during signaling (as required for audience checking and for the sensitivity to the attentional state criteria) or after signaling (as may occur during response waiting). Yet these behavioral markers could be applicable to all types of signals and used regardless of the availability of information given by other sensory channels.
For example, the recipient can perceive the signaler's intention of displacement through a contact gesture such as "push" without the signaler checking the recipient's behavior (audience checking), and therefore without any need for the signaler to check the recipient's attentional state or to manipulate it through attention-getting behaviors.
Similarly, vocalizations and audible gestures are mostly independent of the recipient's line of sight or gaze direction. For example, individuals may perceive a "stomp" gesture without being visually attentive. Because many of these signals have a long-range function, an additional complication arises with the definition of the audience, as some of these signals may be produced targeting a group response, a nonspecific recipient, or even an out-of-sight recipient (e.g., pant-hoots, drumming; Babiszewska et al., 2015; Fedurek et al., 2014). In these cases, it is difficult to check social use and all the criteria involving the identification of the recipient (Liebal et al., 2014). Therefore, most intentionality criteria should not be used when the audience is not present (Fig. 1). The current behavioral markers are specifically tailored to the dyadic level and are thus unhelpful for examining intentional signal use in polyadic interactions. However, studies focused on vocalizations rely on the presence/composition of the audience (social use) and on markers addressing a certain recipient through attention-getting behaviors (Fig. 2). Behavioral markers such as persistence, elaboration, or flexibility are rarely used to detect intentionality in these studies (Fig. 2), although they could theoretically be suitable (Fig. 1).
Generalizability Across Social Contexts
Some behavioral markers are applicable only to specific social contexts. Using gaze alternation to infer intentional communication, for example, is meaningful only when a third entity (individual, object, or location) is involved. For instance, both bonobos and chimpanzees produce gaze alternations to request food from a human communicative partner and do so more frequently when the partner is visually attentive (Lucca et al., 2018). However, triadic communication has rarely been observed in naturalistic interactions among conspecifics, as the majority of interactions do not involve a third entity (Tomonaga et al., 2004). For this reason, we should not use this criterion to infer intentionality for the majority of communicative interactions. Other behavioral markers, such as response waiting, persistence, and elaboration, are also restricted to relatively specific situations. For instance, if a signal is immediately followed by an "appropriate" change of behavior in the recipient (i.e., an outcome matching the signaler's presumed goal; see Cartmill &…), there is simply no need for the signaler to wait for a response, persist, or elaborate (Liebal et al., 2014; Fig. 1).
Finally, the social context of the signaler might affect the sensory channels used and introduce variation in the use of intentional markers within and between species. With decreasing physical distance and increasing familiarity and social tolerance, the communication of intentions may become much subtler or even imperceptible to the observer (Fröhlich, Wittig, & Pika, 2019). "Communicative effort," such as the detectable use of behavioral markers (e.g., persistence via elaboration), may be more likely to be observed beyond the mother-infant pair, where interaction outcomes are less predictable. For instance, in mother-infant dyads where social tolerance is high, the infant might mainly use mechanically effective actions to achieve its goals directly (e.g., taking the food) rather than produce communicative signals to achieve them indirectly (e.g., begging for food; Fröhlich et al., 2020). Proximity between communicative partners therefore seems to be an important variable to include in studies of the intentional use of signals, to avoid erroneous generalizations.

Generalizability Across Research Settings and Designs
Although research on primate vocalizations began in the wild, the intentionality underlying these signals was not initially a major focus (e.g., Brown & Waser, 1988; Cheney & Seyfarth, 1985; Lieberman, 1968). The use of intentional markers was adapted from studies of early communication in prelinguistic children and initially operationalized for the gestural communication of captive great apes (Liebal et al., 2006; Pika et al., 2003; Tomasello et al., 1994). The bias toward visual markers inherited from studies on human children was reinforced by the better observation conditions of captive environments compared to the wild, such as excellent visibility (Bard et al., 2014). Researchers may have better opportunities to observe behavioral markers related to visual attention in captivity, because individuals in these settings usually have more of the direct lines of sight needed for visual communication. For example, audience checking might be suitable for, and deployed by, a species in captive settings without visual barriers, but not in wild settings with dense vegetation (Fröhlich, Lee, et al., 2019; Fig. 1). Nevertheless, studies conducted in naturalistic settings often rely strongly on audience checking (Table II). In contrast, other markers, such as the use of attention-getters, are useful in environments with physical constraints, such as captive settings (Table II), especially in interactions between primates and their caretakers (Hopkins & Leavens, 2007; …). However, when individuals are not physically constrained in their movements, they may prefer to move in front of the recipient to communicate instead of using attention-getters, which is likely also true for intraspecific interactions (Liebal et al., 2004). Attention-getters have rarely been used to detect intentional communication in the wild (Table II). Related to these specificities of the setting are the circumstances underlying research designs.
Most experimental studies take place in captive settings that involve a physical barrier, a third entity, and a human recipient (although experimental studies may also be conducted in the wild, where these specific circumstances do not apply; e.g., Crockford et al., 2015). These experimental studies in captivity mainly explore intentionality criteria such as gaze alternation and attention-getters (e.g., Leavens et al., 1996; Poss et al., 2006; Table II). However, some observational studies also use these behavioral markers to detect intentional signal use (Table II). Although assessing behavioral markers in more controlled settings has benefits, and researchers can obtain impressive results for certain markers there (e.g., gaze alternation, persistence, and elaboration in food-begging experiments; Cartmill & Byrne, 2007; Leavens et al., 2005), we may miss information about intentional signal use in certain naturalistic contexts (e.g., joint travel, consortship, sexual solicitation). We should thus assume that experimental paradigms and observational studies differ in explanatory power and limitations, depending on the specific behavioral criteria applied.
Generalizability Across (Primate) Species
Early work on intentionality in primate signals focused mainly on the gestural communication of great apes, particularly chimpanzees (e.g., Plooij, 1978; Tomasello et al., 1985). Although a cross-species approach that includes different types of primates, as well as less dexterous nonprimate species, is important to map communicative abilities across the animal kingdom (Ben Mocha & Burkart, 2021), some characteristics may intrinsically hamper cross-species comparisons. Certain specificities of a species (including its socioecology) might have led researchers to adopt particular behavioral criteria widely, but these criteria are not necessarily applicable to other species. For example, most evidence for communicative behavior in arboreal species such as orang-utans is gathered while they are high in the canopy (Fröhlich, Lee, et al., 2019; Knox et al., 2019). Dense vegetation hampers the direct lines of sight needed for visual communication, so arboreal species may rely more on salient acoustic components than on visual ones (e.g., loud scratch in mother-infant orang-utan dyads; Fröhlich, Lee, et al., 2019). Yet, behavioral markers suited to visual signals are still used for arboreal species such as orang-utans and small apes (Table II). Beyond criteria related to visual attention, other criteria may also be inappropriate for some species. For example, because bonobos and orang-utans apparently respond much more quickly to gestures than chimpanzees do (Fröhlich et al., 2016; Knox et al., 2019), response waiting may not be an appropriate behavioral marker for those species, or at least the temporal aspect of its definition needs to be adjusted.
Similarly, persistence in communicative instances in which a social goal has not been met may be common in chimpanzees (e.g., Leavens et al., 2005; Roberts et al., 2013), but bonobos, gorillas, and orang-utans apparently do not commonly repeat the same gesture after a failed communication attempt (Fröhlich et al., 2016; Genty & Byrne, 2010; Tempelmann & Liebal, 2012). Yet, persistence is still used to infer intentionality in these species (Table II), and it is also likely that persistence is not used as a strategy for dealing with failed communication attempts in more distantly related taxa. It is currently unclear whether this difference is due to higher responsiveness in these other species or is a consequence of species-specific "communication styles" (Fröhlich et al., 2016).

Outlook
Part 1: Opting for Markers That Are Reliable, Valid, and Consistent Across Studies
To facilitate comparisons between empirical studies on primate intentional communication, we should try to reduce unnecessary sources of variation. A first step in doing so would be to identify which behavioral criteria are unreliable. If a measure involves a considerable degree of subjectivity within or between researchers, we should not use it. Video and audio recordings allow us to evaluate not only which signals are more reliable (i.e., identified consistently by different researchers under the same conditions) but also which behavioral markers we can rely on. For instance, if eye gaze is difficult to track in some primate species or under poor light conditions, it would be wiser to avoid behavioral markers involving eye gaze, such as sensitivity to the attentional state of the recipient, audience checking, and gaze alternation, when detecting intentionality in those species or conditions. Here, video recordings could help to establish whether the behavioral markers involving eye gaze are suitable for evaluating the intentionality of communicative behaviors. Interobserver reliability for intentionality markers may have a profound impact on communication studies, as it can determine which signal types are included in the final dataset. We do not argue that the assessment of gaze direction to detect intentionality should be avoided; when conditions permit, gaze direction can be assessed reliably (Perea-García et al., 2019). However, because the research setting and the visibility of particular situations can make this difficult, it may be wise not to rely on behavioral markers dependent on gaze direction for all cases of signal use.
We should opt for more reliable markers according to the general features of the study (species, environment, etc.), adjusting and complementing the subset of potentially suitable behavioral markers with those that can be coded consistently in that particular study (i.e., with good interobserver reliability). There have been several recent attempts to tackle the questionable validity of the behavioral markers used in the field of intentional communication in nonhuman animals, as well as the inconsistency of their use across studies (Ben Mocha & Burkart, 2021; Graham et al., 2020; Townsend et al., 2017). A first step toward a consistent use of behavioral markers across studies is the use of similar definitions, even when species- or setting-related adaptations are required to suit the socioecological conditions. Where similar definitions cannot be used, it is crucial that each criterion is properly defined, because researchers might use the same label to refer to different things. Many researchers now make their definitions explicit in papers and through platforms that support open science, such as GitHub and OSF. This is a considerable improvement in animal communication research, as transparency about the definitions and methods employed is crucial.
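The emphasis on interobserver reliability can be made concrete. The text does not name an agreement statistic; the sketch below assumes Cohen's kappa, a common choice when two coders score the presence (1) or absence (0) of a marker in the same communicative events. The marker labels, the example codes, and the 0.6 retention threshold are hypothetical illustrations, not values from the studies discussed here.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def cohens_kappa(coder_a: Sequence[int], coder_b: Sequence[int]) -> float:
    """Cohen's kappa for two coders scoring the same events (e.g., 1 = marker present)."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must score the same, non-empty set of events")
    n = len(coder_a)
    # Proportion of events on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's category frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        # Both coders used one identical category throughout; kappa is formally
        # undefined here, and this sketch treats it as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def reliable_markers(
    codings: Dict[str, Tuple[Sequence[int], Sequence[int]]], threshold: float = 0.6
) -> List[str]:
    """Keep markers whose two-coder kappa reaches a (hypothetical) threshold."""
    return sorted(
        marker for marker, (a, b) in codings.items() if cohens_kappa(a, b) >= threshold
    )


# Hypothetical codes for ten communicative events scored by two observers.
codings = {
    "audience_checking": ([1, 1, 0, 1, 0, 0, 1, 1, 0, 1], [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]),
    "gaze_alternation": ([0, 1, 1, 0, 1, 0, 0, 1, 1, 0], [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]),
}
print(reliable_markers(codings))  # ['audience_checking']
```

Here audience checking reaches a kappa of about 0.78, whereas gaze alternation falls below chance (kappa of -0.2), so only the former would be retained for this hypothetical study.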
Using a set of criteria instead of relying on a single criterion in isolation may help researchers to overcome the validity problem (Graham et al., 2020; Townsend et al., 2017). Lower-level cognitive processes such as associative learning and arousal might often offer a more parsimonious explanation of a communicative behavior than intentionality. However, with convergent evidence from different criteria, a single high-level cognitive process, such as first-order intentionality, may offer a more plausible explanation than a myriad of low-level mechanisms (Byrne & Bates, 2006; Liebal et al., 2014). A recent framework was proposed to address this issue, operationalizing and systematizing intentionality in a way that is conservative enough to ensure the validity of the criteria yet flexible enough to allow comparisons and facilitate consistency across species and signal types (Townsend et al., 2017). Townsend et al. (2017) proposed that a signal should meet three conditions to be classified as intentional: 1) the signaler acts with a certain goal; 2) the signaler produces voluntary, recipient-directed signals as a means to reach the represented goal; and 3) the signaler's communicative behavior changes the behavior of the recipient in ways conducive to realizing the goal. According to this framework, a signal must show at least one of the behavioral markers specified for each condition to be classified as intentional. To assess whether a signaler is acting in a goal-directed way, we may use behavioral markers such as persistence, elaboration, and/or response waiting. The second condition concerns the social and voluntary use of the signals: the presence/absence and composition of the audience, the use of attention-getters, audience checking, and sensitivity to the attentional state of the audience allow us to test whether the signal is used socially.
Finally, the third condition also relates to the social use of the signals, but its major focus is the recipient's behavior: the signaler's communicative behavior elicits a change in the recipient's behavior that is repeatable, consistent, and in line with the intentions of the signaler.
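The classification rule of the Townsend et al. (2017) framework, as summarized above, is essentially "at least one marker per condition." The minimal sketch below encodes that rule; the marker labels are hypothetical shorthand for the markers listed in the text, and deciding whether each marker was actually present in a given communicative event remains the hard, study-specific part that the sketch does not address.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical marker labels grouped under the three conditions summarized above
# (Townsend et al., 2017). Real studies define and name these markers themselves.
CONDITIONS: Dict[str, FrozenSet[str]] = {
    # 1) The signaler acts with a certain goal.
    "goal_directed": frozenset({"persistence", "elaboration", "response_waiting"}),
    # 2) Voluntary, recipient-directed (social) signal use.
    "social_voluntary": frozenset(
        {"audience_effect", "attention_getter", "audience_checking",
         "attentional_state_sensitivity"}
    ),
    # 3) The recipient's behavior changes in line with the signaler's goal.
    "recipient_change": frozenset({"goal_consistent_response"}),
}


def unmet_conditions(observed_markers: Set[str]) -> List[str]:
    """Conditions for which none of the associated markers was observed."""
    return [
        condition
        for condition, markers in CONDITIONS.items()
        if not markers & observed_markers
    ]


def is_intentional(observed_markers: Set[str]) -> bool:
    """A signal counts as intentional only if every condition has at least one marker."""
    return not unmet_conditions(observed_markers)


print(is_intentional({"persistence", "audience_checking", "goal_consistent_response"}))  # True
print(unmet_conditions({"persistence", "elaboration"}))  # ['social_voluntary', 'recipient_change']
```

Returning the unmet conditions, rather than only a yes/no verdict, makes it easier to see which part of the framework a given signal type or setting systematically fails to satisfy.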
Following the Townsend et al. (2017) framework, Ben Mocha and Burkart (2020) proposed 20 statistical operational criteria to distinguish between nonintentional and first-order intentional communication, which help to test whether signaling meets the three broader conditions proposed by Townsend et al. (2017). The authors complement Townsend et al.'s framework by detailing the behavioral markers for each condition (with emphasis on voluntary control) and propose an additional condition (a preceding step), namely inferring the signaler's goal through the detection of statistical regularities. Studying the intended meaning is only possible by analyzing the outcomes of thousands of communicative interactions. Building on studies conducted in captivity (Cartmill & Byrne, 2010; Genty et al., 2009), Hobaiter and … proposed a systematic and holistic approach to study the meaning of gestures in the wild by looking at recipients' reactions and signalers' response behavior. This approach focuses on the outcome of the interaction: when the recipient's reaction satisfies the signaler (an apparently satisfactory outcome [ASO]), the communication ceases (i.e., the signaler stops signaling), and the recipient's reaction is taken to represent a plausible desire of the signaler. The use of ASOs to infer the signaler's intention also reveals flexibility in the interactants' behavior, a feature used to detect intentionality (Sievers et al., 2017).
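The ASO logic can be illustrated as a tally over coded bouts. The sketch below is a simplified illustration, not the published coding procedure: it assumes bouts are already segmented and coded as ordered (actor, label) events, takes the recipient reaction that ends a bout (i.e., after which signaling ceased) as the candidate ASO, and attributes it to the last signal preceding it. All signal and reaction labels are hypothetical. Because a signaler may also simply give up, the final reaction is only a candidate; tallying across many bouts is what reveals the statistical regularities.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional, Tuple

# One coded event: (actor, label); actor is "signaler" or "recipient".
Event = Tuple[str, str]


def candidate_aso(bout: List[Event]) -> Optional[Tuple[str, str]]:
    """Return (last signal, final reaction) if signaling ceased after a recipient reaction."""
    if not bout or bout[-1][0] != "recipient":
        return None  # the bout ended on a signal, so no reaction stopped the signaler
    signals = [label for actor, label in bout if actor == "signaler"]
    if not signals:
        return None  # no signal at all: not a communicative bout
    return signals[-1], bout[-1][1]


def aso_profile(bouts: List[List[Event]]) -> DefaultDict[str, Counter]:
    """Tally candidate ASOs per signal type across many bouts."""
    profile: DefaultDict[str, Counter] = defaultdict(Counter)
    for bout in bouts:
        result = candidate_aso(bout)
        if result is not None:
            signal, reaction = result
            profile[signal][reaction] += 1
    return profile


# Hypothetical coded bouts.
bouts = [
    [("signaler", "arm_raise"), ("recipient", "groom")],
    # "ignore" did not satisfy the signaler, who persisted until grooming followed.
    [("signaler", "arm_raise"), ("recipient", "ignore"),
     ("signaler", "arm_raise"), ("recipient", "groom")],
    [("signaler", "arm_raise")],  # no reaction: no candidate ASO
    [("signaler", "reach"), ("recipient", "give_food")],
]
profile = aso_profile(bouts)
print(profile["arm_raise"].most_common(1))  # [('groom', 2)]
```

The second bout shows why the recipient's reaction and the signaler's subsequent response are analyzed together: the intermediate reaction is not counted because signaling continued after it.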
Finally, Graham et al. (2020) suggested that one way of dealing with the validity problem would be to assess arousal directly during communication and to focus experimentally on second-order intentionality. Accurately assessing arousal during communication would allow us to determine the degree to which high levels of arousal elicit certain signals (Graham et al., 2020). Although arousal has been a popular lower-level explanation for some behavioral markers, evidence for its causal role in signal production is scarce, not least because current techniques for measuring arousal during communicative interactions are impractical: they require expensive equipment and minimal movement of the interactants. To explore this avenue, we might need to wait until technology advances and becomes more accessible to field researchers. Furthermore, by focusing on evidence for second-order intentionality, we can assume that the first-order level of intentionality is met, thus avoiding the need to rely on criteria of debatable validity.

Part 2: Taking into Account Sampling Biases
The debate around methodological issues that hamper the detection of intentionally used signals, and consequently around possible solutions, focuses mainly on validity and consistency (Ben Mocha & Burkart, 2021; Graham et al., 2020; Townsend et al., 2017). Although it is crucial to limit unnecessary variation across studies, it is also imperative to allow some flexibility that accounts for differences in the subjects' biology, social relationships, and living environment. To avoid sampling biases, it is therefore important to consider critical factors that influence primate communicative behavior, including research setting and social background, analogous to the STRANGE framework recently proposed by Webster and Rutz (2020). This framework may allow authors to evaluate their study animals for possible sources of bias, such as social background, rearing history, genetic makeup, and experience.
Signal types may be perceived very differently by the recipient and the researcher, hampering the use of the same behavioral markers. One way to account for this variation is to use broader conditions for which only one of the associated behavioral markers needs to be verified (Ben Mocha & Burkart, 2021; Townsend et al., 2017). Much of this variation depends on the communicative channel used. Despite efforts to design a framework that would also work for auditory signals, these frameworks require an immediate audience and are not suitable for long-range signals, largely because it is difficult to test whether such a signal is recipient-directed and whether the recipient's behavior changes in line with the presumed goal. The only exceptions, recently highlighted by Ben Mocha and Burkart (2021), may be markers of the voluntary use of long-range signals: individuals may discernibly prepare for signaling (e.g., looking for materials such as buttresses, or positioning themselves in places that enhance the propagation of the signal) and may also monitor audible responses (e.g., holding their positions while waiting for long-range signals; Fig. 1). Auditory signals may have several recipients, or one or more recipients not in visual contact with the signaler. Therefore, a first step is to define the possible audience and choose the appropriate set of behavioral criteria (Fig. 1). For instance, social use, audience checking, sensitivity to the attentional state, gaze alternation, and the use of attention-getters would be difficult to assess for long-range communication. However, if a potential recipient responds to a long-range signal with a long-range signal of its own, we will at least be able to determine whether this response matched the signaler's goal.
The signaler may monitor for an audible response (the expected behavioral change) by remaining quiet or raising its head to listen (i.e., response waiting; Ben Mocha & Burkart, 2021), and if there is no response, the signaler may persist or elaborate (Fig. 1). The current challenge is to determine the possible audience for auditory signals and to detect failed communicative attempts that are not followed by persistence or elaboration. This also applies to short-range and face-to-face communication. Tactile signals also require special adaptations: because the recipient is in contact with the signaler, the signaler may have less need to check the audience and to adapt its signals to the recipient's attentional state.
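The advice to "define the possible audience and choose the appropriate set of behavioral criteria" can be sketched as a simple filter. The rules below are a coarse, hypothetical reading of the preceding paragraphs (gaze alternation requires a third entity; several attention-related markers are hard to assess for long-range signals; tactile signalers may have less need to check the audience), not a reproduction of Fig. 1, and the marker labels are invented shorthand.

```python
from typing import FrozenSet

# Hypothetical marker labels for the markers discussed in the text.
ALL_MARKERS: FrozenSet[str] = frozenset({
    "persistence", "elaboration", "response_waiting", "social_use",
    "attention_getter", "audience_checking", "attentional_state_sensitivity",
    "gaze_alternation", "goal_consistent_response",
})

# Markers the text describes as difficult to assess for long-range signals.
HARD_FOR_LONG_RANGE: FrozenSet[str] = frozenset({
    "social_use", "audience_checking", "attentional_state_sensitivity",
    "gaze_alternation", "attention_getter",
})

# Markers a signaler in physical contact with the recipient may have less need to show.
LESS_NEEDED_IN_CONTACT: FrozenSet[str] = frozenset(
    {"audience_checking", "attentional_state_sensitivity"}
)


def applicable_markers(long_range: bool, tactile: bool, third_entity: bool) -> FrozenSet[str]:
    """Return the subset of markers that plausibly apply to a given signal situation."""
    markers = set(ALL_MARKERS)
    if not third_entity:
        markers.discard("gaze_alternation")  # meaningful only in triadic interactions
    if long_range:
        # A goal-consistent response stays, e.g., a long-range reply to a long-range call.
        markers -= HARD_FOR_LONG_RANGE
    if tactile:
        markers -= LESS_NEEDED_IN_CONTACT
    return frozenset(markers)


print(sorted(applicable_markers(long_range=True, tactile=False, third_entity=False)))
# ['elaboration', 'goal_consistent_response', 'persistence', 'response_waiting']
```

Making the exclusions explicit in this way is one route to the transparency about definitions argued for earlier, since other researchers can see exactly which markers were considered unassessable in which situations.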
Another aspect that researchers should take into account when assessing the intentional use of a certain signal is its ultimate function and whether or not the signaler requires any behavioral change in the recipient. If the communicative signal serves to inform the recipient about a third entity or event (e.g., declarative communication), the signaler may intentionally produce that signal but not check the response of the recipient, as no behavioral change is expected (e.g., "travel hoos" in chimpanzees).
In these situations, researchers should focus on the social and voluntary aspects inherent to intentional communication and may use the same behavioral markers before or during the communicative act (Fig. 1). Although the use of declarative signals by primates in the wild is still debatable, progress in detecting declarative signals and their possible meanings would allow us to assess if the goal of the signaler is met without relying exclusively on the response of the recipient.
Socioecological pressures are likely to shape communication over the short term (differences between groups or settings) or the long term (differences between species). Testing for statistical differences in the behavioral patterns of species or groups will allow us to understand which behavioral markers are suitable and whether adaptation is needed (Ben Mocha & Burkart, 2021). For example, persistence does not seem to be relevant for some species (e.g., Genty & Byrne, 2010; Tempelmann & Liebal, 2012), and the time window used to measure response waiting may vary between species.

Part 3: Detailed Observations of Intraspecific Communicative Interactions
The scarcity of experimental evidence for intentional communication likely stems from the logistical and ethical challenges of applying fully controlled approaches to larger and endangered wildlife. Although often criticized for being unable to disentangle causation from correlation, observational studies are often the only practical way to address questions about naturally occurring, intraspecific communication as instances of decision-making. Cognition is both sequential and interactional: each decision within an interaction affects the probability space for the subsequent one, which may follow looser or more dominant rules (e.g., dependencies between distant elements in a sequence; Mielke et al., 2018). Hence, because each interaction comprises a sequence of such decisions, behavioral observations of intraspecific interactions in naturalistic contexts can yield many more data points than controlled laboratory experiments. By looking simultaneously at signaling, behavior, and social structure, the emerging patterns can inform our understanding of individual decision-making (including communicative acts) in naturalistic settings. With the advance of statistical techniques that control for complex confounding variables (e.g., ecological and social variables; Dingemanse & Dochtermann, 2013; Hertel et al., 2020), human control over the context has become less critical than it used to be. For example, repeated observations of the same individuals in different dyadic constellations and social contexts can be analyzed with a "behavioral reaction norm approach" (Dingemanse et al., 2010): behavioral variability can be partitioned into intrinsic among-individual variation and reversible behavioral plasticity (i.e., environmental components), provided that the numbers of individuals and samples are sufficiently large (Hertel et al., 2020).
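A full behavioral reaction norm analysis fits mixed models with individual-level intercepts and slopes across contexts, which is beyond a short example. As a simplified first step toward the same idea of variance partitioning, the sketch below computes a standard one-way ANOVA-based repeatability: the share of variance in a repeatedly measured behavior that lies among rather than within individuals. The individual IDs and measurements are hypothetical, and note that the within-individual component here lumps together plasticity and measurement error, which a reaction norm model would separate.

```python
from statistics import fmean
from typing import Dict, Sequence


def anova_repeatability(data: Dict[str, Sequence[float]]) -> float:
    """ANOVA-based repeatability R = V_among / (V_among + V_within).

    `data` maps a (hypothetical) individual ID to repeated measurements of one
    behavior, e.g., a per-session rate of persistence after failed signals.
    """
    groups = [list(values) for values in data.values() if len(values) > 0]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two individuals and some repeated measures")
    grand_mean = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]
    ss_among = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective number of measurements per individual (handles unbalanced data).
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    # Among-individual variance; negative estimates are truncated to zero.
    var_among = max((ms_among - ms_within) / n0, 0.0)
    total = var_among + ms_within
    return var_among / total if total > 0 else 0.0


# Perfectly consistent individual differences vs. none at all (hypothetical data).
print(anova_repeatability({"A": [1, 1, 1], "B": [3, 3, 3]}))  # 1.0
print(anova_repeatability({"A": [1, 3], "B": [1, 3]}))  # 0.0
```

Values near 1 suggest stable individual differences (e.g., in communication style), whereas values near 0 suggest that most variation arises within individuals across sessions or contexts; in practice, a mixed-model implementation would be preferred for real datasets.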
For instance, when taking into account the recipient's response, flexible interactions between the signaler and the recipient may also indicate intentional communication.
Although it is important to study the perspectives of the signaler and the recipient as two separate issues, it is also important to integrate both perspectives to obtain a more holistic picture of intentional communication (Graham et al., 2017; Liebal et al., 2014; Sievers et al., 2017). Flexible reactions of the recipient to the signaler's behavior may be possible only if the recipient is aware of the intentional nature of the signaler's behavior. These reactions may give rise to a turn-taking sequence of communicative and noncommunicative behaviors, as they are expected to cause further changes in the signaler's behavior, especially if the motivations of the interactants diverge (Sievers et al., 2017). Sievers et al. (2018) studied communicative disagreement during travel initiations in chimpanzees and found clear instances of back-and-forth negotiations as attempts to win over the other individual. These interactions may resemble simple forms of human conversation with regard to their turn-taking structure and their overt nature. The authors concluded that an explicit focus on recipient reactions and signaler responses is key to providing a more informative comparison of communicative abilities.

Conclusion
In this review, we discussed the behavioral markers that nonhuman primate researchers use to infer first-order intentional communication via gestures, vocalizations, and facial expressions. We emphasized four major issues that currently limit our understanding of intentional communication in nonhuman animals: reliability, validity, consistency, and generalizability. Recent work has highlighted that the inconsistent use of behavioral markers across and within signal types is at the heart of the problem in current research on nonhuman intentional communication (Ben Mocha & Burkart, 2021; Graham et al., 2020; Liebal et al., 2014; Townsend et al., 2017). Adopting a more holistic, multimodal approach in future primate communication studies will facilitate comparisons. In addition to problems of consistency and validity, we argue that the socioecological environment (e.g., arboreal vs. terrestrial lifestyle, wild vs. captive settings) and interactional context (e.g., mother-offspring vs. male-female interaction) should be considered more explicitly. Given that the majority of work has been conducted on one semi-terrestrial great ape species (chimpanzees), it is critical that the markers we use are generalizable across studies in different taxa and settings. We suggest that future research on primate intentional communication should explicitly take into account sampling biases (see also Webster & Rutz, 2020) and embrace the idea that we need more large-scale observational datasets of intraspecific communication, considering both intrapopulation and intraindividual variation, to better understand goal-oriented communication in our close relatives.
Acknowledgments We thank Carel van Schaik, Simone Pika, Cat Hobaiter and Wild Minds Lab, David Leavens, Raphaela Heesen, Christine Sievers, and Yitzchak Ben Mocha for insightful discussions on nonhuman intentional communication. We are grateful for thoughtful suggestions made by the editor and two reviewers. EDR was supported by the Portuguese national funding agency for science, research, and technology (SFRH/BD/138406/2018); MF was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation; FR 3986/1-1) and the Christiane Nüsslein-Volhard Foundation.
Author Contributions EDR and MF conceived of the study and wrote the manuscript. EDR conducted the literature research and descriptive analyses.
Funding Open Access funding provided by Universität Zürich.
Data Availability Supporting information with the list of studies included in the literature research and descriptive analysis is available online in the Electronic Supplementary Material.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.