Butterfly data
Ten study sites (Fig. 1a) were visited biweekly (every 2 weeks) by a one of us (AMS) for between 45 and 29 years, depending on the site, and only during good “butterfly weather” when conditions were suitable for insect flight (nearly year round at low elevations, and during a more narrow period at higher elevations). At each site, a fixed route was walked and the presence and absence of all butterfly species noted. Maps of survey routes and site-specific details, as well as publically-archived data can be found at http://butterfly.ucdavis.edu/.
For most analyses, we restricted data to a common set of years, from 1988 to 2016, for which we have data from all sites (the plots in Fig. 1 that go back to 1985 are an exception, and do not include all sites in the first few years). Plots and analyses (described below) primarily involve species richness or phenological data, specifically dates of first flight (DFF) or dates of last flight (DLF). The latter two variables (DFF and DLF) involved filtering to avoid biases associated with variation in the intensity or timing of site visits. Specifically, DFF values were only used if they were proceeded by an absence; in other words, there must be a negative observation before a positive observation is taken as a DFF record. Similarly, DLF values were not used if they were not followed by an absence; so any species observed on the last visit to a site in a particular year did not have a record of DLF for that year. If a species was only observed on a single day in a particular year, then that date was used as a DFF (and only if proceeded by an absence) but not as a DLF, in order to not use the same data point twice.
For a subset of years and sites, absolute counts of individual butterflies (by species) were taken in addition to the presence/absence data; this was done at the 5 lowest elevation sites starting in 1999. These data are used here to investigate the dynamics of the low elevation butterflies during the drought years, specifically the relationship between phenological shifts, changing abundance and dependence on irrigation. For the latter (dependence on irrigation), one of us (AMS) ranked species a priori (without knowing the results of analyses) based on natural history observations. Dependency on irrigated areas was categorized as follows: 1), butterfly species that are essentially independent of irrigation; 2), species that use irrigated, non-native hosts in some areas as well as native, non-irrigated hosts in other areas; 3), species that use irrigated, non-native hosts in at least one of multiple generations; and 4), species that are completely dependent on irrigated, non-native hosts.
Weather variables
Analyses included the following weather variables: maximum and minimum daily temperatures, total precipitation, and a sea surface temperature variable associated with regional conditions [17]. Following previous analyses [15], maximum and minimum temperatures were averaged and precipitation was totaled from the start of September of the previous year through August of the current year. Previous fall through current summer is a useful climatological time period in a Mediterranean climate and captures precipitation and overwintering conditions that potentially affect butterflies through both direct effects on juvenile and adult stages, and indirect effects through host and nectar plants. Data were generated as monthly values using the PRISM system (Parameter-elevation Relationships on Independent Slopes Model, PRISM Climate Group; http://prism.oregonstate.edu) for latitude and longitude coordinates at the center of each survey route.
As a complement to the site-specific, PRISM-derived weather variables, we used an index of sea-surface temperature associated with the El Niño Southern Oscillation (ENSO). Specifically, we used the ONI (Oceanic Niño Index) values for December, January and February (a single value is reported for those winter months; http://cpc.ncep.noaa.gov) from the winter preceding butterfly observations for a given year. Higher values of this index correspond to regionally warmer and wetter conditions. We also downloaded snowfall data from the Central Sierra Snow Lab located near our Donner Pass site (station number 428, https://wcc.sc.egov.usda.gov/reportGenerator/), but preliminary investigations found that annual snowfall totals were highly correlated with annual precipitation totals. Correlation coefficients between snowfall and precipitation were between 0.80 and 0.88, and the inclusion of snowfall caused variance inflation factors from linear models (described below) to often exceed 10; thus snowfall was not included in final models. In contrast, correlations among other weather variables (maximum temperatures, minimum temperatures, precipitation, and ENSO values) were generally lower: across sites and weather variables, the mean of the absolute value of correlation coefficients was 0.31 (standard deviation = 0.23).
Weather variables that were included in models were z-standardized within sites to be in units of standard deviations. This allows variables from sites with different average conditions (e.g., mountain and valley sites) to be readily compared and, more important, it allows for slopes from multiple regression models to be compared among weather variables that are themselves measured on different scales (as is the case with precipitation and temperature).
Analyses
Statistical investigations involved two phases. First, we used plots of z-standardized data to visualize patterns in phenology (DFF and DLF) and flight days over time; the latter variable, the number of days flying, was expressed as the fraction of days that a species is observed divided by the number of visits to a site per year (this has been referred to as “day positives” in other publications using these data [17]). DFF, DLF, and flight days were z-standardized within species at individual sites and then averaged across species to facilitate comparisons of patterns across sites and years. We also used plots of species richness to explore change over time at each site. Plots of richness were accompanied by splines (with 5 degrees of freedom) and predicted values from random forest analyses [24], which both allow for visualization of non-linear relationships. The spline analysis has the advantage of producing smoothed relationships (between richness and years), while the random forest analysis, performed with the randomForest [25] package in R [26], has the complementary advantage of being able to incorporate covariates (in this case the number of visits per year) as well as the advantage of making no assumptions about the shape of the relationship (between richness and years, while controlling for sampling effort). Patterns in species richness over time were also explored using diversity indices and Hill numbers that weight rare and common species differently (at different levels of q, which determines the sensitivity of the analysis to rare species) [27,28,29], using the vegetarian (v1.2) package [30] in R [26]. In addition, we used the combination of spline and random forest analyses to investigate changes in abundance (numbers of individuals observed per year) at the low elevation sites where abundance data were available. As with other variables, abundance values were z-standardized within species and sites, and z-scores were averaged across species.
Following the visualization phase of investigation, we developed simple linear models that were focused on prediction of dates of first flight (DFF) and species richness, summarized as described above (as z-scores averaged across species within each site and year). Independent variables for both sets of models included average daily minimum temperatures, average daily maximum temperatures, total precipitation, ENSO (ONI sea surface temperatures), sampling effort (number of visits), and year. Models of species richness included date of first flight as an additional variable because we were interested in the possibility that the timing of species emergence affects butterfly populations and consequently observed species richness. These models (for both DFF and richness) were estimated for each site individually and for high and low sites as groups of sites (5 sites in each model). Additional model complexities were explored that included interactions between weather variables, time lags (effects of previous years on current year dynamics) and cumulative effects (sliding windows of averaged precipitation values). Interactions were rare, but we report interactions between weather variables that were significant at P < 0.05. Lagged and cumulative weather variables did not add to the explanatory power of models and the individual lagged and cumulative effects were rarely significant and not discussed further.
As we have found elsewhere for analyses of phenology and richness with these data [13, 31], linear models satisfied assumptions of normality and homogeneity of variance, and autocorrelation plots were examined to confirm independence of residuals. To address potential collinearity among predictor variables, variance inflation factors were investigated and found generally to be between 0 and 5, and in a few cases between 5 and 10. For instances where inflation factors approached 10, quality control was conducted by including and excluding correlated variables to verify that estimated β coefficients were not affected. Linear models were also used to test the hypothesis that phenological shifts at low elevations have demographic consequences for individual species. For each species at the lowest sites (SM, WS, NS, and RC; see Fig. 1 legend for site abbreviations), we separately regressed dates of first flight against years, and annual abundance against years. Slopes from those regressions were then compared using correlation to ask if species that shifted to an earlier flight (negative slopes for DFF versus years) were also species that became more abundant (positive slopes of abundance versus years). This was done for the years 2008–2016 to capture the transition into the millennium drought years, and only included species that were present in at least 6 of those years. As with other analyses, linear models were performed and assumptions investigated using R [26].