Connecting primate gesture to the evolutionary roots of language: A systematic review

Comparative psychology provides important contributions to our understanding of the origins of human language. The presence of common features in human and nonhuman primate communication can be used to suggest the evolutionary trajectories of potential precursors to language. However, to do so effectively, our findings must be comparable across diverse species. This systematic review describes the current landscape of data available from studies of gestural communication in human and nonhuman primates that make an explicit connection to language evolution. We found a similar number of studies on human and nonhuman primates, but very few studies included data from more than one species. As a result, evolutionary inferences remain restricted to comparison across studies. We identify areas of focus, bias, and apparent gaps within the field. Different domains have been studied in human and nonhuman primates, with relatively few nonhuman primate studies of ontogeny and relatively few human studies of gesture form. Diversity in focus, methods, and socio-ecological context fills important gaps and provides nuanced understanding, but only where the source of any difference between studies is transparent. Many studies provide some definition for their use of gesture, but definitions of gesture, and in particular, criteria for intentional use, are absent in the majority of human studies. We find systematic differences between human and nonhuman primate studies in the research scope, incorporation of other modalities, research setting, and study design. We highlight eight particular areas in a call to action through which we can strengthen our ability to investigate gestural communication's contribution within the evolutionary roots of human language.

In this systematic review, we investigate those studies of spontaneous gestural communication in human and nonhuman primates that articulate an explicit connection between gesture and the evolutionary origins of modern human language. We incorporate a first use of Bourjade et al.'s conceptual framework for systematic comparison of gesture definitions (Bourjade et al., 2020) and investigate variation in different domains of research, in the study scope, in the inclusion of other signal sources (e.g., vocalizations), in the research setting, and in study design. We aim to provide an up-to-date description of the field, highlighting both what is understood and the areas in need of further research.
Language can be expressed in many forms, including spoken and signed: it is not the signal modality or channel of communication that defines human language so much as the way in which it is used. Many systems of communication across species encode sophisticated information, but nonhuman communication is typically broadcast irrespective of a recipient's attention, interest, or even presence (Rendall et al., 2009). Detecting language-like intentional communication is challenging because it depends not on the observable physical form of the signal but on the cognitive intention of the signaller. Imagine driving along a road and hearing another driver honking their horn as they approach you; there is no fixed information encoded in that signal. Unlike the acoustic structure of a monkey alarm-call (e.g., Seyfarth & Cheney, 2003a, 2003b), the uses of a car horn can mean very different things depending on what the signaller intends them to mean. Intentional use, while a fundamental property of human language, remains apparently rare in communication of other species, including in many primate vocalizations (Rendall et al., 2009; Seyfarth & Cheney, 2003a, 2003b; although see Schel et al., 2013; Townsend et al., 2017). There is an exception: Evidence for flexible intentional use is abundant in nonhuman ape (hereafter ape) gesture (Leavens, Russell, et al., 2005; Plooij, 1978; Tomasello et al., 1985), driving interest in the evolutionary connections between ape gesture and human language, and "gesture-first" hypotheses of language evolution (Corballis, 2002; Hewes et al., 1973; Rizzolatti & Arbib, 1998).
More recent articulations recognize that language, like all animal systems of communication, is multimodal and likely derived from multimodal systems (Gillespie-Lynch et al., 2014; Leavens, Russell, et al., 2010; Prieur et al., 2020; Taglialatela et al., 2011), but may have included a transition in the role of the different modalities, for example, a shift in the vocal modality from supporting to carrying information (e.g., Fröhlich et al., 2019).
Comparative studies seeking to draw specific comparisons between primate gesture and human language have explored different aspects of primate species' gesturing, including the physical form (as compared with linguistic lexicons; e.g., Brentari et al., 2012) and meaning (as compared with language-like semantics) of gestural signals (often through the study of message and context; e.g., Graham et al., 2018); the structure of gestural communication (in sequences of gestures, as compared with combinatorial structure and syntax in language; e.g., Hall et al., 2015); and the integration of gestural signals with other signal types, such as vocalizations and facial expression (combination of sources). From the perspective of the signaller and recipient, researchers have investigated how gesture develops behaviorally during ontogeny (e.g., Salo et al., 2018) and neurologically (neural processes; e.g., Biau et al., 2016), and similarities between how gesture and language are deployed (in brain or limb laterality; e.g., Meguerditchian et al., 2011).
One complication within gestural research is the fact that researchers have no direct access to cognitive states (of either nonhuman or human subjects), and instead employ visible behavioral criteria to infer signaller intentions. These behavioral criteria were first developed in studies of pre-verbal human infants' ability to capture the attention of others and manipulate their behavior (Bates et al., 1975, 1979). Today, criteria include behavior such as whether the signaller shows sensitivity to the attentional state or composition of the audience, whether they pause (wait) for a response, and if they persist or elaborate when the recipient fails to respond (Leavens, Russell, et al., 2005; Liebal et al., 2004; Tomasello & Call, 1997).
However, as the study of nonhuman primate gesture developed, there has been variation in how these criteria have been operationalized and employed (Bourjade et al., 2020; Leavens, Russell, et al., 2005).
Our ability to reliably detect patterns of similarity and distinction across modern primate species' communication is central to our ability to make inferences about the evolutionary trajectory of language. Variation in research settings, methods, or contexts can represent strength, allowing for robust exploration of a particular finding. However, for this to be the case, it is key that diverse methods are transparent about sources of variation (Bourjade et al., 2020; Fröhlich & Hobaiter, 2018). Characteristics of our study sample such as social background, responsiveness, or prior experience impact the generalizability of our findings (cf. STRANGE framework, Webster & Rutz, 2020), and the overrepresentation of particular species or populations distorts our ability to make phylogenetic comparisons (e.g., WEIRD [Western, Educated, Industrialized, Rich, and Democratic] humans, or BIZARRE [Barren Institutional Zoo And other Rare Rearing Environment] chimpanzees; Henrich et al., 2010; Leavens, Bard, et al., 2010). Previous studies have highlighted how systematic species-differences in individual history, tasks, and testing environments are confounded with apparent species-differences in communicative or other socio-cognitive abilities, such as their ability to follow gaze or produce pointing (Boesch, 2020; Leavens et al., 2019). Differences in methodology and context of the study appear particularly profound when comparing human and nonhuman primate behavior (Bard & Leavens, 2014; Leavens et al., 2019). Some of these differences may involve, for example, comparisons of institutionalized adult apes with noninstitutionalized human children, or apes in caged environments with free-roaming human children (Bard & Hopkins, 2018; Boesch, 2020; Leavens et al., 2019).
In some cases, variation in our understanding across species is limited by what is both technologically feasible and/or ethical in nonhuman species, for example, the exploration of neural processes inside of living brains (cf. Meguerditchian et al., 2010; Rizzolatti & Arbib, 1998).
To explore meaningful patterns of similarity and distinction between human language and nonhuman gestural communication, we need to address apparent discrepancies in research approach and understanding. A crucial first step in this process is to better understand where any differences currently lie. A systematic assessment of the field allows us to better gauge the impact of any biases on our ability to develop clear hypotheses about the evolutionary trajectory of gesture and language. Bourjade and colleagues (2020) recently developed a framework to allow the systematic comparison of gestural definitions across primate studies, incorporating body parts, sensory modalities, social expression, and communicative and intentional properties. We include a first use of this tool, describing how primate species and study domains of gestural research vary in their concept of gesture, and then assessing how the species and study domains are differently represented in terms of study scope, the inclusion of additional sources such as vocalization or facial expression, and in research settings and study design. With this review, we aim to (1) identify both the areas of focus and apparent gaps within the field in studies that explore the connection between gestural communication and the evolution of human language, and (2) identify to what extent useful comparison can be made across human and nonhuman studies at the present time and make recommendations for the future.

METHODS
In March 2020, we conducted a search of peer-reviewed articles and book chapters in two search engines: Web of Science and PsycINFO.
We used the SPIDER framework (Cooke et al., 2013) as the search tool to define our question scope and organize and list terms by the main concepts in the search question (Table 1).

Search
We used the Phenomenon of Interest and the Evaluation categories from the SPIDER framework (Table 1) for our search string. We employed the largest time window allowed (1900-2019) and used both Web of Science and PsycINFO databases. While our search window extended back to 1900, more recent work is more thoroughly indexed in electronic databases, and as a result, our search procedure may have failed to detect some earlier studies. Literature within Web of Science is systematically structured from the 1950s onward, and both Web of Science and PsycINFO include books and other material within the Social Sciences and Humanities that are out of copyright. We did not apply search terms related to the sample (e.g., human, nonhuman, or even primates) at this stage, because research conducted on human (as compared with nonhuman) primates does not typically specify taxonomic terms to define the sample. Similarly, the study-type terms described in the design category (e.g., observational, experimental) are also often omitted in human work, so we removed this criterion in the first selection phase.
Although no language restriction was applied at this stage, only studies with English abstracts or keywords were returned by the search because the search terms were in English. In PsycINFO, the search term "gestur*" AND ("evolutio*" OR "origin*") AND ("languag*" OR "communicat*") was used as a filter in the title (TI), abstract (AB), or keyword (KW). In Web of Science, the same search terms were used as a topic (TS) filter (equivalent to title, abstract, and keywords in PsycINFO). The final search string used in Web of Science was (TS = gestur* AND TS = (languag* OR communicat*) AND TS = (evolutio* OR origin*)).
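For readers who wish to re-apply the same logic to a locally stored set of records, the Boolean structure of the search string can be approximated as follows. This is only an illustrative sketch: the names `CONCEPTS`, `term_to_regex`, and `matches_search` are hypothetical, the truncation symbol is assumed to stand for any word ending, and the code does not reproduce how Web of Science or PsycINFO evaluate queries internally.

```python
import re

# Truncated terms from the search string, grouped by concept.
# Terms within a group are combined with OR; groups are combined with AND.
CONCEPTS = [
    ["gestur*"],
    ["languag*", "communicat*"],
    ["evolutio*", "origin*"],
]


def term_to_regex(term):
    """Turn a truncated term such as 'gestur*' into a word-prefix pattern."""
    stem = re.escape(term.rstrip("*"))
    return re.compile(r"\b" + stem + r"\w*", re.IGNORECASE)


def matches_search(text):
    """Return True if the text contains at least one term from every group."""
    return all(
        any(term_to_regex(term).search(text) for term in group)
        for group in CONCEPTS
    )
```

Applied to the concatenated title, abstract, and keywords of a record, `matches_search` returns True only when all three concepts (gesture, language or communication, evolution or origin) are present.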

Inclusion and exclusion criteria
We included publications in the review if they (a) included data from primates, (b) had gestures as the main focus, (c) made explicit the link between their study and language evolution, and (d) relied on spontaneous communication. We included both journal articles and book chapters. Whole books (as a single "unit") were not included because they typically include a range of differently structured studies, so considering them as their individual chapters was more compatible with the data extraction for journal articles. While all publications had to explore at least gestural signals, we also considered those employing a "multisource" approach (extracting the information on data from other sources, such as vocalizations, for analysis). publications without empirical data (e.g., theoretical); and (h) not about language evolution. We checked publications according to these criteria in the order described above, and excluded a publication as soon as it failed to fulfill any criterion (although in practice there may have been additional reasons for its exclusion).

Fields for data extraction
Fields for data extraction and their categories ( However, we provide an indication of local socioeconomic structure as being WEIRD (Henrich et al., 2010), Non-WEIRD industrial, or Small-scale nonindustrial. Please note that the category of subjects covered by "Rich" is a global one, and likely includes a range of economic groups in Western Industrialized Countries. These socioeconomic categories are not directly comparable to the Nonhuman primate ones of Species-Typical or Atypical.
We defined the Research Domain(s) explored, asking which area(s) of gestural communication were included in the study (e.g., form, structure, or ontogeny; see Table 3 for full list and definitions).
We asked whether there was an explicit Definition included for Gesture (see Table 2). For the publications which provide an explicit definition of gesture, we used the conceptual tool proposed by Bourjade et al. (2020) to analyze the requirements for a given behavior to be categorized as a gesture. The authors provided 22 criteria covering five main areas: the body parts used to gesture, the sensory modalities mobilized by the gesture, the characteristics of its social expression, and its communicative and intentional properties (see Supporting Information ESMS2 for full details).
Of those studies that provide an explicit definition of gesture, some include in this definition a requirement that they must fulfill certain intentionality criteria (captured by the "communicative and
In the majority of the animal communication literature, including some primate studies, the term "multi-modal" has been employed to refer to the combination of information from different sensory channels (e.g., visual, auditory, tactile; Micheletta et al., 2013; Partan, 2002; Partan & Marler, 1999). However, within nonhuman ape communication this term is sometimes used to refer to the combination of different signal sources (e.g., gesture, vocalization, and/or facial expression; Pollick & de Waal, 2007; Waller et al., 2013; Wilke et al., 2017). To avoid confusion, here we follow the wider use and employ the term "multi-modal" to refer to the combination of sensory channels, and the term "multi-source" to refer to the combination of signal types.
We took into account the Gestural scope of the study. Here, we defined studies that explored a specific context, or limited set of contexts (e.g., sexual solicitation and consortship), specific gestures, or limited types of gesture (e.g., tactile gestures, or specific gesture forms such as pointing) as Narrow. We defined studies that explored a question across contexts and gesture repertoires without further specification as Broad.
We then asked what Research setting data were collected from.
We focused on an individual's familiarity with the environment in which the study data were collected. We defined two main cate-

Data extraction
"Research domain" was the only field involving a potentially subjective judgment, so in all cases, two of the authors extracted this field, and any discrepancies were discussed until the two original raters reached consensus. A third independent opinion was solicited (38 of 221 publications) when the two original raters could not reach consensus, or when the initial disagreement between raters involved more than one domain.
res. = 3.548; Figure 3). Of the 54 studies that provided an explicit definition of gesture, all gestures necessarily included a visual component in their Sensory Modality. Gestures were also defined as compounds that, in addition to visual information, also included an auditory or tactile component in the majority of studies (N = 37/54, 69% and N = 40/54, 74% respectively; Table 6). In terms of their Social Expression, gestures were defined as produced in the presence of an audience in 89% of studies (N = 48/54), and as addressed to specific recipient(s) in 82% of studies (N = 44/54); however, gestures were defined as produced while looking at the recipient in only 6 of the 54 studies (11%;

Which Body Parts are taken into account to produce Gestures? How does it vary with species and domain?
Across all studies, we found a particular focus on Manual gestures:  Table 6).
TABLE 6 Number of studies using the criteria proposed by Bourjade et al. (2020) for defining "gesture"

Studies of gestural communication that make an explicit connection to the evolutionary origins of language exist in similar numbers for both human and nonhuman primates; however, only four of the 163 studies included in this review incorporated data from both human and nonhuman primates, and only 15 included more than one nonhuman primate species. As a result, our ability to investigate species' similarities and distinctions across primate taxa, and to infer an evolutionary trajectory for language from this field, is almost entirely dependent on comparison across studies. By conducting a systematic review, we are able to describe to what extent current methods allow us to do so reliably. We find substantial variation in the conceptual and methodological approaches used. While variation allows for a diverse and robust examination of gesture in this context, it presents particular challenges for the effective comparison across studies and species on which the evolutionary approach depends.
There were limitations to the literature returned in our search process; for example, older material (particularly pre-1950) is not systematically indexed in electronic databases. However, perhaps the most important one was in our requirement for an explicit reference to the evolution or origins of language or communication. We were initially surprised that, in employing this restriction, we excluded work we regularly cite as relevant to the evolutionary origins of human language, including our own studies. Removing the requirement for the "evolutionary" terms from our search returned around 6500 results, whereas with them, our structured search returned around a thousand items (with just over a sixth of that retained once the systematic selection criteria had been applied). One explanation for the extent of these exclusions is that many of the empirical studies on nonhuman primate gesture (including our own) avoid explicit discussion of their potential connection to the evolution of human language and, in particular, do not do so in the title, keywords, or abstract, the fields most commonly indexed across databases. In some cases, not doing so may be because the primary focus of the study was a description of the species' communication: nonhuman primate gesture is of interest in its own right, not just as a means of comparison to human communication. Nevertheless, these studies may usefully inform our understanding of the evolutionary trajectory of linguistic features. For example, some studies that explore the combination of gestures into sequences (e.g., McCarthy et al., 2013; Tempelmann & Liebal, 2012), relevant to understanding similarities and differences to human language structure, or studies on neural processes of homologous brain area activation in human and nonhuman primate signaling (e.g., Hopkins et al., 2007, 2008), were excluded because they did not make an explicit connection to the evolution of language.
Similarly, some studies of human gesture discussed its relationship to individual ontogeny of language, but did not explicitly consider the relevance of the work to the evolution of language (e.g., Bates et al., 1979; Iverson et al., 1994).
While widening our search to more broadly encompass primate gestural research would successfully retain these studies, it would also add a very large literature that provides limited insight into the evolution of language (e.g., work on leaf-clipping as a sexual solicitation in chimpanzees; Nishida, 1980; or work on big loud scratch as grooming solicitation; Nakamura et al., 2000) and such a large corpus risks diluting our ability to provide a clear overview of the field. There may also be a justifiable reluctance to engage in "just-so story-telling" in research that does not explicitly test evolutionary hypotheses.
Spurious statements about the relevance of any nonhuman primate behavior to human behavior unhelpfully reinforce human-centric approaches to the study of nonhuman behavior, which risk us overlooking extraordinary nonhuman species-specific capacities. Whereas carefully considered discussions can be helpful, they require a substantial investment that may distract from the main aim of research that did not set out to explicitly further a comparative approach.
Nevertheless, our theoretical studies often use the findings from these same empirical studies of primate gesture as the foundation for the hypotheses we develop on the evolution of human language.
While it is certainly the case that studies of nonhuman primate gesture contribute to a much wider range of questions beyond the possible evolutionary origins of human language, it may be worth reflecting on the apparent scope in our field to more explicitly test evolutionary hypotheses in a wider range of our empirical work. For example, by establishing multispecies primate data sets that employ a coherent study methodology, or allow for the extraction of like-with-like features for comparison, we can test hypotheses that address how aspects of gesture are adapted to a particular species' socio-ecology (cf. Prieur et al., 2020).
Developing hypotheses on the evolution of behavior within primate history requires that we have sufficiently rich data across, and within, primate taxa. However, we found that studies of nonhuman primate gestural communication were largely limited to chimpanzees and bonobos. A number of studies in monkeys were excluded because they involved training them to produce a particular gesture (e.g., Defolie et al., 2015; Meunier et al., 2013), leaving only a handful of studies in Afro-Eurasian monkeys, and no studies on monkeys of the Americas, small apes, or strepsirrhines. While chimpanzees and bonobos represent our closest living relatives, and these studies allow us to ask whether or not a particular feature of language is uniquely human, they provide a more limited scope for exploring the possible evolutionary trajectory of language over a longer period. In addition, implicit hierarchies exist between species in their relevance to human origins that may obscure the deeper roots of some features (Bourjade et al., 2020). We see this illustrated in our data in the very limited number of human studies that explicitly test whether the gestures explored meet criteria for intentional use: there is an assumption that human behavior always does. Our findings similarly highlight that there is often an implicit assumption of the importance of ape behavior for understanding human evolution, whereas studies of monkey behavior are required to more thoroughly establish the grounds for comparison (Bourjade et al., 2020). Data on the spontaneous gestural communication from a more diverse range of species, including direct comparisons between nonhuman primates, are necessary for a deeper and more nuanced understanding of how and when the capacities that underpin language evolved.
Our understanding of gestural signals' contribution to the evolutionary origins of human language may also be compromised by the use of specific human and nonhuman primate populations to represent species-specific characteristics. Every species' system of communication, including primate gesturing, is in some way adapted to a species' specific and distinct socio-ecological niche (Cheney & Seyfarth, 2018). The majority of studies of nonhuman primate gesture in our review were conducted on groups living in anthropogenic environments that do not reflect the socio-ecological environments to which their communication is adapted. Even among studies of wild primates, a focus on a few specific groups or populations (e.g., Hobaiter & Byrne, 2011a; Pika & Mitani, 2006; Roberts et al., 2012) likely impacts our understanding of species-typical behavior. Similarly, in human studies there was also a focus on specific groups: as in many fields of study (Henrich et al., 2010), there was a strong bias toward studies of human gesture in WEIRD socioeconomic cultures. Our understanding of the links between primate gesture and human language can be strengthened by more direct testing of the impact of species' socio-ecology and individual life-history characteristics on gestural expression (e.g., Prieur et al., 2020), although doing so will take substantial large-scale data sets.
Different environments may promote the use of certain gestures that are not expressed in other environments (see Leavens, Russell, et al., 2005 for the example of pointing). The frequency and quality (context, interaction partner, and group membership) of interactions seem to influence the frequency of gesture use and the size of gestural repertoires, for example, higher interaction rates with nonmaternal conspecifics and a larger number of previous interaction partners are both related to more frequent gesturing and the use of more gesture types (Fröhlich et al., 2017). Thus, it is particularly important to complement the detailed data from captive settings with more diverse data from primates living in their naturally structured social units (Cheney & Seyfarth, 2018; Fröhlich & Hobaiter, 2018). Given the well-established presence of rich cultural variation in behavior (Boesch et al., 1994; McGrew et al., 1997; Whiten et al., 1999), a richer understanding of the communicative abilities in diverse populations, in a range of environments (Hobaiter & Byrne, 2011b), in other great apes (gorillas, bonobos, and orang-utans; Bard, 1992; Genty et al., 2009; Knox et al., 2019; Schamberg et al., 2016), and in other primate species (e.g., Japanese macaques, mandrills, pygmy marmosets, capuchin monkeys, bonnet macaques; De La Torre & Snowdon, 2002; Gupta & Sinha, 2016; Itani, 1963; Kudo, 1987; Wheeler, 2010) is necessary to better understand the evolutionary trajectory of primate gestural communication and its relationship with language.
The steady increase across most domains highlights the increasing interest in, and relevance of gesture to, questions related to language evolution; however, research efforts remain unequally distributed across domains and between species within domains.
Meaning was the most popular domain and was similarly explored in both human and nonhuman primates. Studies of form were most biased toward nonhuman primates, and studies of ontogeny were most biased toward humans.
Studies of meaning in gestural communication are now the most common focus. Gestural research has sometimes employed the signaller's behavior or the context of the signal use as a proxy for "meaning" (Bard & Leavens, 2014;Tomasello et al., 1994). Reflecting gesture's intentional use, recent studies employ a combination of signaller and recipient behavior to take a cognitive-linguistic perspective and infer the signaller's intended meaning (e.g., Cartmill & Byrne, 2010;Genty et al., 2009;Graham et al., 2017;Hobaiter & Byrne, 2014). The large number of studies of signal form in other primates is likely because this domain includes the description of communicative repertoires, a common focus when exploring the communication of any nonhuman species (e.g., Berg, 1983;Conner, 1985;Edds-Walton & Edds-Walton, 1997). However, the lack of similar systematic descriptive studies of human gesture forms again makes comparison with nonhuman primate research challenging (Kersken et al., 2018;Müller, 2005). Gestural ontogeny has been explored in humans for decades in the context of its relevance for language development; however, it is only more recently that researchers started to frame their results within the evolutionary puzzle or explore this domain in nonhuman primates.
Both within and between human and nonhuman primates, studies used different definitions of which movements and body parts constitute a gesture, and different criteria to define their intentional use. Some studies defined gesture broadly to include body postures, while other studies employed more restrictive definitions including specific criteria, or specific body parts. Over half of the studies were restricted to manual gestures, but a considerable number of studies (~40%) were more flexible, including movements from the whole body or other body parts. None of this variation is necessarily problematic; however, a significant cause for concern, given this variation in definition, is that more than a third of the studies in this review did not provide a definition for their use of gesture at all. These differences in the fundamental basis of what "a gesture" is may have  (Bates et al., 1975, 1979), but their explicit use seems now largely restricted to nonhuman primate studies. Over half of the nonhuman primate studies reviewed provided some criteria for defining intentional gesture. However, while many required gesture cases to meet one or more criteria for intentional use from a set, they typically did not specify which were met (Genty et al., 2009; Hobaiter & Byrne, 2011b; cf. Leavens, Hopkins, et al., 2005). No one criterion is a panacea for the challenge of identifying mental states from observable behavior. Audience-checking could simply reflect a shift in attention between objects of interest. Response-waiting could reflect a brief rest in activity. Providing more detail on the frequency and distribution of the different criteria within a study would allow for a more direct comparison of intentional gesture use across studies and species of nonhuman primates and improve our ability to assess the extent to which particular criteria provide robust, reliable measures (e.g., Prieur et al., 2018).
In contrast to the relatively widespread use of criteria to define intentional gesture in nonhuman primates, just four studies in humans, and one on both human and nonhuman primates, provided any criteria for intentional gesture use. While humans are clearly capable of intentional communication, we are equally capable of producing fixed non-intentional signals (e.g., an involuntary yelp, smile, or laugh; Kawakami et al., 2007; Provine, 1992). Moreover, mechanical ineffectiveness seems to be a criterion often applied in nonhuman primate research to define gesture, but rarely seen in human gestural research. Including all human gesture-like movements, irrespective of the objective evidence for their communicative and intentional use, while limiting nonhuman primate data to only those gestures used with evidence for intentional use, again impacts our ability to make meaningful comparisons between human and nonhuman gesture.
Doing so furthers the double standard too often applied in comparative research, which sees systematic species-differences in testing conditions or criteria mistaken for species-differences in cognition (Bard & Leavens, 2014;Leavens et al., 2019).
The majority of the studies reviewed employed a narrow scope of focus, investigating specific gesture types or specific contexts.
However, studies in nonhuman primates were more likely to have a broad scope than studies with humans; for example, they more often included descriptions of gestural repertoires rather than a specific form such as pointing. Again, there is no intrinsic benefit to employing a narrow or broad scope, but both are needed across species to compare like with like.
Almost half the studies in this review included other signal sources with their gesture data; however, the integration of gestures, vocalizations, and facial expressions remains understudied in nonhumans relative to humans, despite recent calls to investigate it (e.g., Slocombe et al., 2011; Waller et al., 2013). Where signal sources are combined in communication, for example, gesture and facial expression, studying one in the absence of the other may limit our interpretation of signal function (Wilke et al., 2017). However, studying different signal types and sources in combination can be methodologically challenging. For example, studies of gesture often focus on visual information, and on the signaller's and recipient's visual attention, neither of which may be as relevant to the production or receipt of vocal signals (Schel et al., 2013). The development of methodologies that can be applied across sources will allow for more widespread multisource comparisons (Müller, 2005; Slocombe et al., 2011).
Ape cognitive and social development, including in their communicative repertoires (Boesch, 2007; Leavens et al., 2019), is sensitive to a wide range of social and environmental factors, and interactional experience has been shown to impact the development of gestural use (e.g., Fröhlich & Hobaiter, 2018). We found that nonhuman primates were more often studied in environments that were familiar to them and with observational research designs. In contrast, human research was mainly conducted in unfamiliar environments, such as research laboratories, and used experimental designs. Collecting data within a laboratory setting allows nuanced control of specific variables; however, these methods are typically challenging to replicate in an ethical manner with nonhuman primates. Slocombe et al. (2011) previously highlighted the lack of nonhuman primate gestural work in the wild (although see now, e.g., Graham et al., 2018; Hobaiter & Byrne, 2011a, 2011b; Roberts et al., 2012), but it is similarly noteworthy that very little gestural work on humans is done outside of unfamiliar laboratory settings (cf. Kersken et al., 2018).
We can summarize our findings in the following 8-point call to action for researchers interested in how gestural communication may inform our understanding of language evolution.
1. There is substantial scope in the wider gestural field to more explicitly test evolutionary hypotheses in our empirical work.
2. Data are needed on spontaneous gestural communication from a more diverse range of species, in particular, outside of Pan ape species, and including direct comparisons between nonhuman primates.
3. Data are needed from more diverse populations in diverse environments that consider the impact of socio-ecology and socioeconomy on the use of gesture.
4. There is particular scope for studies of gesture forms in humans, and studies of gesture ontogeny in nonhuman primates.
5. Given the variation across the field, it is imperative that studies define their specific usage of gesture.
6. Providing more detail on the frequency and distribution of the different criteria for intentional use, in particular in humans, will improve our ability to assess the extent to which particular criteria provide robust, reliable measures.
7. The development of methodologies that can be applied across sources will allow for more widespread multisource comparisons.
8. In addition to studies of human gesture outside WEIRD populations, there is also substantial scope for studies of natural human gesture in familiar, non-laboratory environments.
We hope that this review serves to highlight not only the challenges, but also the areas of particular promise for future research. A detailed understanding of human and nonhuman primate gesture will take more than one researcher's or research group's lifetime of study.
Diversity in our study subjects and approach will provide a more nuanced understanding, but transparency and replicability in our methods are equally crucial to our ability to draw meaningful conclusions about gestural communication's role in the evolution of human language.

ACKNOWLEDGMENTS
The structure of this study was developed during the January 2019 ISPA Advanced Course on Scientific Writing, and we would like to thank P. McGregor for his guidance. We are grateful for the thoughtful comments raised in the review process that allowed us to incorporate important new analyses and discussion. The authors gratefully acknowledge the financial support provided by the Portuguese Foundation for Science and Technology to the first author (SFRH/BD/138406/2018).

CONFLICT OF INTERESTS
The authors declare that there are no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the Supporting Information Material of this article and in a public repository at github.com/Wild-Minds/GestureStudies_SystematicReview