Wild Idea Podcast
A.Z. Andis Arietta – https://www.azandisresearch.com
https://www.azandisresearch.com/2025/07/21/wild-idea-podcast/ (Mon, 21 Jul 2025)

I recently joined my dear friend Bill Hodge on The Wild Idea Podcast for a conversation about ecological resilience, climate adaptation, and how we think about wilderness in a changing world. We covered topics such as road ecology, species adaptation, and the sometimes counterintuitive lessons that emerge when humans step back from the landscape. From wood frogs that freeze solid in winter to the 22-mile rule showing how few truly remote places remain, we explored how human systems, even unintended ones, shape the trajectories of natural systems.

Drawing on my work in evolutionary ecology, wilderness ethics, and machine learning, I reflected on the tension between our desire to intervene and our limited ability to forecast long-term ecological outcomes. Using examples like the Chernobyl exclusion zone—where many species are thriving in the absence of people despite nuclear contamination—I argued that ecological recovery is often less about precision intervention and more about restraint. We discussed how machine learning can help us simulate alternative futures and understand potential tradeoffs, but that ultimately, the most powerful conservation tool may be humility. More wilderness, not more control, might be the best way to meet the uncertainties ahead.

Listen to the episode here or wherever you get your podcasts.

Text analysis using AI in R
https://www.azandisresearch.com/2023/10/05/text-analysis-using-ai-in-r/ (Fri, 06 Oct 2023)

Introduction

Analyzing qualitative data is challenging. Such analyses are even more difficult when the topic is controversial and the results will drive important policy decisions. This post explores AI methods for qualitative research: chatGPT for categorization, LLM embeddings for finding hidden topics, and long-context summarization with Claude 2, applied to a case study of free-text public comments on a controversial Environmental Impact Statement.

Background

Quite a while ago, I detailed why replacing wolves on Isle Royale National Park was a bad policy decision backed by even worse science. Since then, the National Park Service (NPS) decided to commit to wolf replacement anyway, dropping 19 new wolves on the island in 2018 and 2019. The results were as expected. The new wolves killed the last original male wolf in 2019, almost certainly ensuring that the new wolf population will be genetically disconnected from the prior population. Of the 20 wolves that NPS attempted to relocate, one died before making it to the island, one voluntarily crossed the ice back to the mainland*, and four others died by the end of 2019. The surviving 14 wolves successfully bred, and the population now stands at 31. So, in the end, we have a new, synthetic wolf population that is entirely disjunct from a genetic and ecological perspective. As I predicted in my original post: “in reality, this is not a genetic rescue project, it is a genetic replacement project,” which violates both the scientific and management purpose of the Park.

* This contradicts one of the primary justifications for replacing the wolves. Proponents argued that the lack of ice due to climate change would make natural repopulation impossible.

But neither science nor policy drove NPS’s decision. Management of charismatic mammals, especially in a well-known National Park, is largely a matter of public sentiment. In fact, it is a codified part of the decision process. Federal managers are required to seek public comments as part of the NEPA process.

In general, I am a huge supporter of public voices in important conservation decisions (I’ve even written papers advocating for it). But, sometimes I worry about how advocacy groups can skew the perception of organic public sentiment. That’s what I’d like to analyze in this post.

All of the public comments submitted to NPS on the Isle Royale wolf-moose management plan are public record. You can download and read all 1117 pages of comments.

But 1117 pages is a lot of text to read and digest. In this post, I want to show how you can easily process lots of text using AI (both generative large-language models (LLM), like chatGPT, and LLM embeddings) to make quantitative (or semi-quantitative) analyses.

Basic analyses

Visit my GitHub repo for this project for a fully reproducible analysis.

First, we’ll set up the environment and load in necessary packages.

# Load libraries
library(pdftools) # We will use 'pdftools' to convert the pdf to plain text
library(tidyverse)
library(stringr)
library(RColorBrewer)

# Set up the directory structure:
make_new_dir <- 
     function(DIR_TO_MAKE){
          if(dir.exists(DIR_TO_MAKE) == FALSE){
               dir.create(DIR_TO_MAKE)
          }else{
               print("Directory exists")
          }
     }

make_new_dir("./data/")
make_new_dir("./figs/")

We can download the comments from the NPS website.

download.file(
     url = "https://parkplanning.nps.gov/showFile.cfm?projectID=59316&MIMEType=application%252Fpdf&filename=ISRO%5FMWVPlan%5FAllCorrespondence%5FPEPC%2Epdf&sfid=232552",
     destfile = "./data/ISRO_MWVPlan_AllCorrespondence_PEPC.pdf",
     mode = "wb"
)

The first step to analyze the public comments is to parse the pdf into text. This is a tedious process. I won’t show it here, but you can follow all of the steps on my GitHub repo for this project.

Example public comment from the downloaded pdf.
Example comment from the formatted PDF document.

You can download my pre-processed dataset to short-cut the PDF parsing steps.

download.file(
     url = "https://www.azandisresearch.com/wp-content/uploads/2023/09/EIS_comments.csv",
     destfile = "./data/EIS_comments.csv"
)

EIS_comments <- read.csv("./data/EIS_comments.csv")

The formatting follows the same structure for every comment. I’ve extracted the ‘Comment ID’, ‘Received’ date-time, ‘Correspondence Type’, and ‘Correspondence’ text into a dataframe. I’ve also truncated the longest comments (…comment 68 looks like someone copy-and-pasted their term paper) to 12,000 characters. This will be important later because the context window for chatGPT (gpt-3.5-turbo) is 4,096 tokens.
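If you parse the PDF yourself, the truncation step is a one-liner. Here is a minimal sketch (the `truncate_comments` helper is hypothetical; the 12,000-character cutoff is the one described above):

```r
# Hypothetical helper: trim each comment to at most max_chars characters
# so that prompts fit within the model's context window
truncate_comments <- function(text, max_chars = 12000) {
     substr(text, 1, max_chars)
}

# Applied to the dataframe:
# EIS_comments$Content <- truncate_comments(EIS_comments$Content)
```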

EIS_comments %>% glimpse()
Rows: 2,776
Columns: 4
$ ID             <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,…
$ Received       <dttm> 2015-07-12 20:45:30, 2015-07-14 23:18:34, 2015-07-15 12:03:55, 2015-07-15 13:14:52, 2015-07-15 13:35:47, …
$ Correspondence <chr> "Web Form Correspondence", "Web Form Correspondence", "Web Form Correspondence", "Web Form Correspondence"…
$ Content        <chr> "The alternatives are complete enough as a starting point. The issues will be related to the details. The …

We can do some basic summary analysis on these initial variables. Most comments were submitted in the week before the comment deadline on Sept 1. The vast majority of comments were received through the web form; less than 10% were physical letters, and 51 of the 2,776 comments were form cards given to Park visitors.

Area plot of the cumulative total comments over time for each correspondence type.
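A plot along these lines can be built with ggplot2 (a sketch, not the exact figure code; it assumes the `Received` and `Correspondence` columns from the `glimpse()` output above and the RColorBrewer palettes loaded earlier):

```r
library(tidyverse)
library(RColorBrewer)

EIS_comments %>%
     # Count comments per day and correspondence type
     mutate(Received = as.Date(Received)) %>%
     count(Received, Correspondence) %>%
     # Accumulate the counts within each correspondence type
     group_by(Correspondence) %>%
     arrange(Received, .by_group = TRUE) %>%
     mutate(cumulative = cumsum(n)) %>%
     ggplot(aes(Received, cumulative, fill = Correspondence)) +
     geom_area(position = "identity", alpha = 0.7) +
     scale_fill_brewer(palette = "Set2") +
     labs(x = NULL, y = "Cumulative comments", fill = "Correspondence type")
```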

Often, large influxes of web and email comments are the product of advocacy groups encouraging their members to submit pre-written comments. I’ve used this tactic myself in conservation campaigns, so I won’t cast aspersions. But I’ll also be the first to admit that a copy-and-pasted form letter is far less sincere than a uniquely crafted opinion.

After checking for matches among the comments, it is clear that there were two archetypal pre-written texts. These include 733 nearly identical comments in favor of wolf replacement (i.e., Alternative B), likely from the National Parks Conservation Association:

EIS_comments %>%
     filter(grepl("I care about the wildlife at our national parks, including the wolves and moose at Isle Royale. Right now there are only three", Content)) %>%
     group_by(Content) %>%
     tally() %>%
     arrange(desc(n)) %>%
     ungroup() %>%
     filter(row_number() == 1) %>%
     .$Content %>%
     cat()
Dear Superintendent Green, I care about the wildlife at our national parks, including the wolves and moose at Isle Royale. Right now there are only three wolves left at the park- -the lowest number of wolves in more than 50 years- -threatening the overall ecosystem health of this iconic national park. I support management Alternative B to bring new wolves to the island, but urge the Park Service to do this as needed, rather than one time only. Without wolves, the moose population on the island will continue to increase, eating until the food sources are gone. If we bring new wolves to the island, they will help keep the moose population from rapidly expanding and minimize impacts to the native vegetation. This option is much less intrusive in this wilderness park than culling moose, removing moose from the island, or having to replant native vegetation once the moose consume it. As stewards of this park, the National Park Service should take the least intrusive action that results in the biggest benefit to the island's wildlife and ecosystem. I support the Park Service taking action to bring new wolves to the park immediately, before the population vanishes altogether. Thank you for considering my concerns. Sincerely,

And 55 nearly identical comments in favor of Wilderness (i.e. Alternative A), likely from Wilderness Watch:

EIS_comments %>%
     filter(grepl("Isle Royale's wilderness designation requires that we protect the area's unmanipulated, untrammeled wilderness character. Wild", Content)) %>%
     group_by(Content) %>%
     tally() %>%
     arrange(desc(n)) %>%
     ungroup() %>%
     filter(row_number() == 1) %>%
     .$Content %>%
     cat()
Isle Royale's wilderness designation requires that we protect the area's unmanipulated, untrammeled wilderness character. Wilderness designation means we let Nature call the shots. Transplanting wolves from the mainland to Isle Royale is a major manipulation of the Isle Royale Wilderness and must not be done. Alternative Concept A, the No Action Alternative, is the best alternative to protect Isle Royale's unmanipulated, untrammeled wilderness character.

It is important to flag these duplicated comments because the methods that we will use later on will not behave correctly with nearly identical strings.

EIS_comments_deduplicated <- 
     EIS_comments %>%
     # Remove comments with no content
     filter(!is.na(Content)) %>%
     # Flag the web form duplicates
     mutate(form_duplicate = ifelse(grepl("I care about the wildlife at our national parks, including the wolves and moose at Isle Royale. Right now there are only three", Content), "for Alt B", NA)) %>%
     mutate(form_duplicate = ifelse(grepl("Isle Royale's wilderness designation requires that we protect the area's unmanipulated, untrammeled wilderness character. Wild", Content), "for Alt A", form_duplicate)) %>%
     # Form duplicates are not exact matches
     mutate(Content_dup = ifelse(is.na(form_duplicate), Content, form_duplicate)) %>%
     group_by(Content_dup) %>%
     # Retain one of the duplicate sets
     slice_sample(n = 1)

After removing the duplicates and cleaning the data, we are left with 1970 unique comments.

Text analysis with chatGPT

Now, we can start analyzing the content. There are many ways that we could do this, depending on the question we want to answer. For instance, maybe we want to see which comments naturally group together, to find common themes. Traditionally, a common way to do this type of natural language processing would be an approach like latent Dirichlet allocation (LDA) topic analysis, which groups comments by the tf-idf values of the stems of words contained in each comment. (I cover tf-idf in a previous post.) But one problem with this approach is that the context of words is lost.
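For reference, the traditional approach might look something like this. This is a simplified sketch (raw word counts rather than tf-idf of stems) and assumes the tidytext and topicmodels packages; we won’t use it in this analysis:

```r
library(tidyverse)
library(tidytext)
library(topicmodels)

# Build a document-term matrix of word counts, with stop words removed
comment_dtm <- EIS_comments %>%
     unnest_tokens(word, Content) %>%
     anti_join(stop_words, by = "word") %>%
     count(ID, word) %>%
     cast_dtm(ID, word, n)

# Fit a five-topic LDA model and inspect the top ten words per topic
comment_lda <- LDA(comment_dtm, k = 5, control = list(seed = 1234))
terms(comment_lda, 10)
```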

If we want to capture the context of the text, we might try using word embeddings from a LLM like GPT. We’ll try this approach later.
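As a preview, fetching an embedding through the API looks much like the chat calls below. A sketch using httr (the `text-embedding-ada-002` model name, the example input, and the key path are assumptions, mirroring the credentials file used later):

```r
library(httr)
library(jsonlite)

embedding_response <- POST(
     url = "https://api.openai.com/v1/embeddings",
     add_headers(Authorization = paste0("Bearer ", read_lines("../credentials/openai.key"))),
     content_type_json(),
     encode = "json",
     body = list(
          model = "text-embedding-ada-002",
          input = "Wolves should be reintroduced to Isle Royale."
     )
)

# The embedding comes back as a numeric vector (1,536 dimensions for ada-002)
embedding <- unlist(content(embedding_response, as = "parsed")$data[[1]]$embedding)
```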

In our case, maybe we just want to know how many comments support a given policy. It would be hard to answer that from the embeddings alone, but we could treat GPT as an agent that reads and categorizes comments by preferred policy alternative.

We’ll use two packages: httr to interact with the chatGPT API, and jsonlite to parse the JSON-formatted prompts and responses that the API speaks.

library(httr)
library(jsonlite)

Working with chatGPT is a lot like working with a new intern. Like a new intern, it has no prior contextual understanding of our specific task, so we have to be very explicit with our directions. On the bright side, our chatGPT intern has endless patience and never sleeps!

We will be interacting with chatGPT through the API. This differs from the conversational way that most people interact with chatGPT. We need to engineer our prompt to get a robust response in exactly the same format, every time. We can do that by passing in quite a bit of context in our prompt and giving specific directions for the output, with examples. Here is the prompt we’ll use:

You are a federal employee tasked with reading the following comment submitted by a member of the public in response to the Isle Royale National Park Moose-Wolf-Vegetation Management Plan/EIS. The Plan/EIS is a document that evaluates management alternatives for the moose and wolf populations on the island National Park land.
Management alternatives include:

- Alternative A: No Action. Continue the current management of letting nature take its course, without any intervention or manipulation of the moose or wolf populations or their habitats.
- Alternative B: Immediate Wolf Introduction. Introduce 20-30 wolves over a three-year period, starting as soon as possible to reduce the moose population and its impacts on vegetation.
- Alternative C: Wolf Introduction after Thresholds are Met. Introduce wolves if certain thresholds are met, such as the extirpation of wolves, the overabundance of moose, or the degradation of vegetation. The number and timing of wolf introductions would depend on the conditions at the time.
- Alternative D: Moose Reduction and Wolf Assessment. Reduce the moose population by lethal and non-lethal means, such as hunting, contraception, or relocation. The goal would be to lower the moose density to a level that allows vegetation recovery, and to assess introducing wolves to the island in the future.

Here is the text of the public comment: '[INSERT COMMENT TEXT]'.

State which alternative the commenter is most likely to favor (A, B, C, D).
State if the comment is 'For', 'Against', or 'Neutral' on wolf introductions.
State the strength of the commenter's opinion on a scale from 'Extremely strong', 'Very strong', 'Strong', 'Somewhat strong', or 'Mild'.

Produce the output in json format like this:
{
"favored_alternative": "",
"wolf_opinion": "",
"opinion_strength": ""
}

ChatGPT 3.5 costs $0.002 per 1,000 tokens. We can use the OpenAI tokenizer to estimate the number of tokens constituting our input prompt.

Example output from OpenAI's tokenizer for our prompt.

Our input is 420 tokens, and the output should be less than 50 tokens, so we can round up to 500 tokens per query. At that rate, it will cost about $1 to process 1,000 comments. Much cheaper than paying a human!
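The back-of-the-envelope arithmetic:

```r
price_per_1k_tokens <- 0.002   # USD, gpt-3.5-turbo pricing at the time of writing
tokens_per_query <- 500        # ~420 input + <50 output, rounded up
n_comments <- 1000

# Total cost in USD for 1,000 comments
n_comments * tokens_per_query / 1000 * price_per_1k_tokens
# About $1
```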

In the old days, you could pass a list of inputs to the chatGPT ‘completions’ model all at once. This is no longer possible. Using the ‘chat/completions’ API requires looping through the inputs and making individual requests. Unfortunately, the API often fails or hits the request rate limit, so we need to be smart about staging and error handling within this larger loop. The structure of the loop is: define the prompt, wait 18 seconds to avoid the rate limit, wrap the API call in a tryCatch block so that a failure logs the offending record and skips to the next one, and otherwise parse the response and store the output in a file.

After getting initial responses, I also want to rerun 500 randomly selected comments in order to check chatGPT’s consistency. This is a critical part of using a generative model in quantitative analysis. I’ll talk more about this later.

Here’s the loop. It will take quite a while depending on your rate limit. I’d suggest either running it overnight or putting it on a remote server. Because we write each response out to file, there’s no problem if it fails. Just note the number of the last successful iteration (which will be printed to the screen) and start back up there.

set.seed(7097)

# Randomly select 500 records to resample
IDs_to_resample <- sample(unique(EIS_comments_deduplicated$ID), 500, replace = FALSE)
ID_list <- c(unique(EIS_comments_deduplicated$ID), IDs_to_resample)

# Create a vector to store failed IDs
failed_ids <- c()

# If restarting after a failure, reassign ID_list to just the remaining IDs, e.g.:
# ID_list <- Still_need_IDs

for (i in 1:length(ID_list)) {
  ID_number = ID_list[i]
  # Define the prompt
  prompt_content <- paste0( "Here is the text of the public comment: '", EIS_comments_deduplicated %>%
        filter(ID == ID_number) %>%
        .$Content,
      "'.
    State which alternative the commenter is most likely to favor (A, B, C, D).
State if the comment is 'For', 'Against', or 'Neutral' on wolf introductions.
State the strength of the commenter's opinion on a scale from 'Extremely strong', 'Very strong', 'Strong', 'Somewhat strong', or 'Mild'.
Produce the output in json format like this:\n{\n\"favored_alternative\": \"\",\n\"wolf_opinion\": \"\",\n\"opinion_strength\": \"\"\n}"
    )
  
  # Initialize gpt_response
  gpt_response <- NULL
  
  # With my account, I can make 3 requests per minute. To avoid denied API calls, I add a 18 second pause in each loop.
  Sys.sleep(18)
  
  tryCatch({
    # Call GPT for a response
    gpt_response <- 
      POST(
        url = "https://api.openai.com/v1/chat/completions", 
        add_headers(Authorization = paste0("Bearer ", read_lines("../credentials/openai.key"))),
        content_type_json(),
        encode = "json",
        body = list(
          model = "gpt-3.5-turbo",
          messages = list(
            list(
              "role" = "system",
              "content" = "You are a federal employee tasked with reading the following comment submitted by a member of the public in response to the Isle Royale National Park Moose-Wolf-Vegetation Management Plan/EIS. The Plan/EIS is a document that evaluates management alternatives for the moose and wolf populations on the island National Park land.
Management alternatives include:
- Alternative A: No Action. Continue the current management of letting nature take its course, without any intervention or manipulation of the moose or wolf populations or their habitats.
- Alternative B: Immediate Wolf Introduction. Introduce 20-30 wolves over a three-year period, starting as soon as possible to reduce the moose population and its impacts on vegetation.
- Alternative C: Wolf Introduction after Thresholds are Met. Introduce wolves if certain thresholds are met, such as the extirpation of wolves, the overabundance of moose, or the degradation of vegetation. The number and timing of wolf introductions would depend on the conditions at the time.
- Alternative D: Moose Reduction and Wolf Assessment. Reduce the moose population by lethal and non-lethal means, such as hunting, contraception, or relocation. The goal would be to lower the moose density to a level that allows vegetation recovery, and to assess introducing wolves to the island in the future."
            ),
            list(
              "role" = "user",
              "content" = prompt_content
            )
          )
        )
      )
    print(paste0("API call successful for ID: ", ID_number, ", index: ", i))
  }, error = function(e) {
    # Handle API call errors
    cat("API call failed for ID: ", ID_number, ", index: ", i, "\n")
    # Use <<- so the assignment escapes the error handler's local scope
    failed_ids <<- c(failed_ids, ID_number)
  })
  
  # If the API call was successful, proceed with data wrangling and output
  if (!is.null(gpt_response)) {
    # parse the response object as JSON
    content <- content(gpt_response, as = "parsed")
    
    # Assign the ID to the GPT response
    gpt_response_df <- data.frame(
      response_id = ID_number,
      gpt_response = content$choices[[1]]$message$content
    )
    
    # Convert the JSON to a dataframe and join to the record data
    output <- bind_cols( EIS_comments_deduplicated %>%
        filter(ID == ID_number),
      fromJSON(gpt_response_df$gpt_response) %>% 
        as.data.frame()
    ) %>%
      mutate(response_created_time = Sys.time())
    
    # Append the data to the extant records and write the output to file.
    # (Writing inside the loop is less memory efficient, but it saves progress if the loop fails.)
    if (!file.exists("./EIS_GPT_responses.csv")) {
      write.csv(output, "./EIS_GPT_responses.csv", row.names = FALSE)
    } else {
      read.csv("./EIS_GPT_responses.csv") %>%
        mutate(across(everything(), as.character)) %>%
        bind_rows(output %>%
                    mutate(across(everything(), as.character))
        ) %>%
        write.csv("./EIS_GPT_responses.csv", row.names = FALSE)
    }
    
    print(paste0("Completed response ", i))
  }
}

# Log the failed IDs to a file
if (length(failed_ids) > 0) {
  write.csv(data.frame(ID = failed_ids), "./failed_ids.csv", row.names = FALSE)
  cat("Failed IDs logged to 'failed_ids.csv'\n")
}

ChatGPT is nondeterministic, so your responses will differ. You can download the responses I got to follow along.

download.file(
     url = "https://www.azandisresearch.com/wp-content/uploads/2023/09/Final_GPT_Responses.csv",
     destfile = "./data/GPT_output.csv"
)

GPT_output <- read.csv("./data/GPT_output.csv")
GPT_output %>% glimpse()
Rows: 2,470
Columns: 13
$ ID                    <int> 93, 440, 2164, 636, 839, 2335, 36, 487, 1268, 2303, 1781, 60, 1033, 1948, 1826, 1538, 1685, 308, 22…
$ Received              <chr> "7/29/2015 9:09", "8/9/2015 5:14", "8/27/2015 14:36", "8/18/2015", "8/25/2015", "8/28/2015 12:30", …
$ Correspondence        <chr> "Web Form Correspondence", "Web Form Correspondence", "Web Form Correspondence", "Web Form Correspo…
$ Content               <chr> "\"100% o wolves examined since 1994...have spinal anomalies.\"- -Of the six alternatives put forth…
$ form_duplicate        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Content_dup           <chr> "\"100% o wolves examined since 1994...have spinal anomalies.\"- -Of the six alternatives put forth…
$ favored_alternative   <chr> "C", "C", "Alternative D", "C", "C", "B", "C", "C", "D", "C", "Unknown", "C", "B", "A", "B", "A", "…
$ wolf_opinion          <chr> "For", "Against", "Neutral", "For", "Neutral", "For", "For", "For", "Against", "For", "Neutral", "F…
$ opinion_strength      <chr> "Very strong", "Very strong", "Strong", "Strong", "Somewhat strong", "Very strong", "Strong", "Stro…
$ response_created_time <chr> "32:19.2", "33:11.7", "33:16.9", "33:19.5", "34:35.2", "34:54.2", "34:55.4", "36:15.1", "36:16.3", …
$ Favored_alternative   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Wolf_opinion          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Opinion_strength      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

A couple of interesting things to note here. First, I apparently was not specific enough in my instructions for classifying the favored alternative, because chatGPT sometimes returns “Alternative B” instead of just “B”. This is one of the struggles with using a generative model like chatGPT in this way. It strays from instructions just as human survey respondents do when entering free-text answers. For example, common responses to the survey question, “How are you feeling on a scale from 1 (bad) to 10 (good)?” might be “I’m good” or “Okay” or “nine” or “0”. None of those answers fit the instructions, so we have to clean them up.

In the case of chatGPT, we might be able to reduce these errors with more specific prompt engineering. For now, we’ll just clean up the responses on the backend.

# Fix erroneous column names
GPT_output <-
     GPT_output %>%
     mutate(
          favored_alternative = ifelse(is.na(favored_alternative), Favored_alternative, favored_alternative),
          wolf_opinion = ifelse(is.na(wolf_opinion), Wolf_opinion, wolf_opinion),
          opinion_strength = ifelse(is.na(opinion_strength), Opinion_strength, opinion_strength)
          ) %>%
     select(
          -Wolf_opinion,
          -Favored_alternative,
          -Opinion_strength
     )

# There are probably more elegant ways to write generalized rules to classify these responses, but this does the trick
GPT_output <-
     GPT_output %>%
     # Fix 'favored alternative' responses
     mutate(
          favored_alternative_edit = case_when(
               (grepl(" and ", favored_alternative) | grepl(" or ", favored_alternative) | grepl("/", favored_alternative) | grepl("&", favored_alternative) | favored_alternative == "B, D") & !grepl(" and Wolf ", favored_alternative) & !grepl("N/A", favored_alternative) ~ "Multiple",
               grepl("\\bAlternative A\\b", favored_alternative) | favored_alternative %in% c("A", "No Action (A)") ~ "A",
               grepl("\\bAlternative B\\b", favored_alternative) | favored_alternative == "B" ~ "B",
               grepl("\\bAlternative C\\b", favored_alternative) | favored_alternative %in% c("C", "Concept C") ~ "C",
               grepl("\\bAlternative D\\b", favored_alternative) | favored_alternative == "D" ~ "D",
               TRUE ~ "Other"
          )
     ) %>%
     # Fix 'opinion strength' responses
     mutate(opinion_strength = tolower(opinion_strength)) %>%
     mutate(
          opinion_strength_edit = case_when(
               opinion_strength %in% c("strong", "very strong", "mild", "somewhat strong", "extremely strong") ~ opinion_strength,
               TRUE ~ "other"
          )
     ) %>%
     # Fix 'wolf opinion' responses
     mutate(wolf_opinion = tolower(wolf_opinion)) %>%
     mutate(
          wolf_opinion_edit = case_when(
          wolf_opinion %in% c("for", "against", "neutral") ~ wolf_opinion,
          TRUE ~ "other"
          )
     )

Let’s take a look at the results.

Bar chart of the favored alternative expressed in comments as assessed by chatGPT.

We can see that the majority of comments favor Alternative B: immediate wolf introduction. However, if we exclude the duplicated comments, our conclusion shifts to a majority in favor of the more moderate Alternative C: introduce wolves only after certain thresholds are met. Almost no one supports Alternative D: moose reduction and wolf assessment.

Bar chart of opinion strength by favored alternative.

Comments that favored Alternative A were proportionally stronger. Alternative B supporters had mostly strong opinions but very few extremely strong or mild opinions. Supporters of Alternatives C and D were the least opinionated.

Validating chatGPT responses

It is worth asking how reliable chatGPT is at classifying these responses. One way to test this is to rerun a subset of comments, as we did above, and check for agreement. This is called inter-rater reliability* (IRR).

* Although maybe it should be called intra-rater reliability in this case. I guess it depends on our definition of ‘individual’ with LLM queries, but that’s a very philosophical can of worms!

First, we need to subset our dataset to the responses that we scored twice.

IRR_comparisons <- 
     GPT_output %>%
     group_by(ID) %>%
     arrange(response_created_time) %>%
     mutate(ID_row_count = row_number()) %>%
     filter(ID_row_count <= 2) %>%
     mutate(n = n()) %>%
     filter(n > 1) %>%
     ungroup()

Then we can see how reliably the favored alternative was scored,

IRR_comparisons %>%
     select(ID, favored_alternative_edit, ID_row_count) %>%
     pivot_wider(
          id_cols = "ID",
          names_from = "ID_row_count",
          values_from = "favored_alternative_edit",
          names_prefix = "val"
     ) %>%
     group_by(val1 == val2) %>%
     tally() %>%
     mutate(
          total = sum(n),
          prop = n/total
     )
# A tibble: 2 × 4
  `val1 == val2`     n total  prop
  <lgl>          <int> <int> <dbl>
1 FALSE              2   500 0.004
2 TRUE             498   500 0.996

ChatGPT gave consistent responses in 498 out of 500 cases. That’s pretty good! Let’s look at the comments where it disagreed with itself.

IRR_comparisons %>%
     select(ID, favored_alternative_edit, ID_row_count) %>%
     pivot_wider(id_cols = "ID", names_from = "ID_row_count", values_from = "favored_alternative_edit", names_prefix = "val") %>%
     filter(val1 != val2)
# A tibble: 2 × 3
     ID val1  val2 
1   288 C     B    
2  1160 B     C    
 
EIS_comments_deduplicated %>%
     filter(ID == 288) %>%
     .$Content %>%
     cat()
There should be a balance between the wolf population and moose. When it is not balanced there is more harm than good done to the environment. Please introduce more wolves on this island instead of decreasing their population and this will keep the moose in check. Please add more wolves to contain the moose population. So many wolves are under attack in other states and decreasing their population is NOT the answer. It only creates more problems to the environment. There should be intense management of the wolf population to help it thrive and return the land back to it's natural state where there are enough moose and wolves. I think the public should be consulted as far as future plans for any culling. There should be intense management to monitor the effects of climate change as this will affect all aspects of wildlife and plant life on the island. I do not like the idea of a moose cull. I like the idea of introducing more wolves to the island so long as there is harmony with the existing wolves on the island. Maybe possibly try to introduce another type of animal that would be a good balance with the wolves and moose but only if it does not disrupt the balance and create new problems. Other states have adopted disastrous wolf culling plans that are only in the interests of farmers and ranchers. As the wolf population is dwindling, other problems will begin to develop as there is not a proper balance. Please keep wolves in mind and do your best to increase their population before it is too late and more animals will be needlessly killed without the proper balance of mother nature.
 
EIS_comments_deduplicated %>%
     filter(ID == 1160) %>%
     .$Content %>%
     cat()
I have heard both sides of this situation and I believe that new wolves should be introduced on Isle Royale. Climate change has made a large impact on the amount of ice that freezes in the Isle Royale region. Previously wolves from the mainland could cross the ice that formed and take up residence on the Isle. The ice hasn't been stable enough for these crossings in the last few years and the wolves are becoming inbred and dying off. If you will check a video that I have watched about the wolves being reintroduced to Yellowstone, you will see that the ecology of the region is benefited by the wolves being there. If enough wolves are transported to Isle Royale, the wolves will keep the moose in check and the ecology will improve. Allowing the pack to die off is really not a positive move. Introducing a new bloodline to the pack will help. I believe the wilderness designation of Isle Royale is a positive thing and that the wolves help to keep the ecosystem there in good order. Thank you for taking comments from the public.

In both cases, chatGPT vacillated between classifying the comment as favoring alternative B or C. The difference between those alternatives is admittedly nuanced. Both alternatives propose replacing wolves; the only difference is in the timing. In Alternative B, wolves would be introduced immediately, while in Alternative C wolves would be introduced “if certain thresholds are met, such as the extirpation of wolves, the overabundance of moose, or the degradation of vegetation. The number and timing of wolf introductions would depend on the conditions at the time.”

Both of the comments that made chatGPT disagree with itself focus on the environmental conditions that wolf introductions might remedy. However, these comments seem to presuppose that those conditions have already been met and suggest that immediate introduction is necessary. So, I can see why chatGPT might have a hard time solidly classifying them.

Let’s also check the IRR for chatGPT’s classification of ‘opinion strength.’ Unlike the favored alternative, where most folks explicitly stated their preference, classifying the strength of an opinion is a far more subjective task.

IRR_comparisons %>%
     select(ID, opinion_strength_edit, ID_row_count) %>%
     pivot_wider(
          id_cols = "ID",
          names_from = "ID_row_count",
          values_from = "opinion_strength_edit",
          names_prefix = "val") %>%
     group_by(val1 == val2) %>%
     tally() %>%
     mutate(
          total = sum(n),
          prop = n/total
     )
# A tibble: 2 × 4
  `val1 == val2`     n total  prop
  <lgl>          <int> <int> <dbl>
1 FALSE              5   500  0.01
2 TRUE             495   500  0.99

ChatGPT disagreed with itself in 5 cases, but gave reliable classifications 99% of the time. That’s pretty good! However, binary agreement or disagreement isn’t a strong metric for this variable. A switch from “extremely strong” to “very strong” is less of an issue than vacillating from “extremely strong” to “mild”.

Instead, we can use Krippendorff’s Alpha. This metric provides a formal way to assess the amount of inter-rater disagreement. There are multiple metrics that we could use, but Krippendorff’s Alpha is nice because it generalizes to any number of reviewers and can handle many types of data (i.e. binary, ordinal, interval, categorical, etc.). Here’s a great post for understanding Krippendorff’s Alpha. We’ll use the irr package to estimate it.

library(irr)

The irr package needs the dataset as a wide-format matrix with one row per reviewer and each record as a column (the package calls records ‘subjects’ because this metric is traditionally used in social science research). For this analysis, we’ll treat the first and second responses from chatGPT as individual reviewers. We also need to enforce the order of our opinion strength levels; otherwise, R will order them alphabetically.

IRR_comparisons %>%
     mutate(opinion_strength_edit = fct_relevel(
          opinion_strength_edit,
          c(
               "other",
               "mild",
               "somewhat strong",
               "strong",
               "very strong",
               "extremely strong"
           )
     )) %>%
     select(
          ID,
          opinion_strength_edit,
          ID_row_count
     ) %>%
     pivot_wider(
          id_cols = "ID_row_count",
          names_from = "ID",
          values_from = "opinion_strength_edit",
          names_prefix = "ID_"
     ) %>%
     select(-ID_row_count) %>%
     as.matrix() %>%
     kripp.alpha(method = "ordinal")
  
Krippendorff's alpha

 Subjects = 500 
   Raters = 2 
    alpha = 0.996 

Krippendorff’s Alpha ranges from -1 to 1, where 1 means perfect concordance, 0 means random guesses among reviewers, and -1 is perfect negative concordance. At .996, we are pretty near perfect reliability.

For many datasets, there will be a lower degree of IRR. But it is important to interpret the alpha value in context. Perfect concordance may not be realistic, especially for highly subjective classifications. In most cases our goal is not perfect concordance, but simply greater reliability than we’d get if we hired a group of humans to do the annotating. Preliminary evidence seems to indicate that even version 3.5 of chatGPT is more reliable than humans (even domain experts!) in subjective classification tasks.

In most cases, you won’t have the resources to get human annotations for an entire dataset for comparison. Instead, you could 1.) get human annotations for a small subset, 2.) use a similar benchmark dataset, or 3.) spot-check responses yourself. If you choose to spot-check, I’d suggest rerunning chatGPT multiple times (> 3) in order to estimate the variance in responses. High-variance responses indicate especially difficult classifications that you should target for spot-checks. Another tip is to ask chatGPT to return its justification with each response. Ultimately, this process will help you diagnose problematic types of responses and enable you to engineer better prompts to deal with those edge cases.
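As a sketch of that spot-checking workflow (the `GPT_repeat_runs` tibble below is hypothetical; in practice it would hold one row per repeated API call per comment), we can flag the comments whose classifications vary across reruns:

```r
library(tibble)
library(dplyr)

# Hypothetical repeated classifications: one row per chatGPT run per comment
GPT_repeat_runs <- tribble(
     ~ID, ~run, ~favored_alternative,
     "1",    1, "B",
     "1",    2, "C",
     "1",    3, "B",
     "2",    1, "A",
     "2",    2, "A",
     "2",    3, "A"
)

# Comments with more than one distinct answer are spot-check candidates
spot_check_targets <-
     GPT_repeat_runs %>%
     group_by(ID) %>%
     summarise(n_distinct_answers = n_distinct(favored_alternative)) %>%
     filter(n_distinct_answers > 1)
```

Here comment 1 would be flagged for manual review, while comment 2 would not.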

The bottom line is that working with chatGPT is less like working with a model and more like working with human raters–and all of the validation tasks that entails.

Analysis with token embeddings

Up to this point, we’ve presupposed the classifications we wanted ChatGPT to identify in our data. But, what if we wanted to uncover hidden categories in the responses? Folks could advocate for the same Alternative but for different reasons. For example, among those who favor Alternative C, some might argue from the perspective of climate change and some from the perspective of moose populations.

We can use token embeddings to uncover hidden clusters of topics in our responses. Embeddings are the way that LLMs encode free text into numeric form.  Each token or ‘unit of language’ is numerically described as a position in multidimensional language space. This is a huge advantage over more traditional language clustering methods that simply count the occurrence of certain words. Embeddings retain the context of each token as it exists in the document.

Toy example of four sentences containing the word 'train' embedded in two dimensions.
Embeddings allow us to retain the context of text by expressing tokens in multidimensional language space.

As a toy example, the word “train” in the sentences “I train a model”, “I train for a marathon”, “I rode the train”, and “I’m on the Soul Train” could be described along two dimensions: how metaphorical the usage is, and whether “train” acts as a noun or a verb. If we do this for all of the words in a document or chunk of text, we can then think of all the embeddings as a point cloud. Documents with highly overlapping point clouds are more similar than those that don’t overlap at all.
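To make the toy example concrete, here is a minimal sketch with made-up two-dimensional “embeddings” for three of those sentences. Cosine similarity is one common way to score how close two embeddings sit in language space; real embeddings from the API have far more dimensions, but the arithmetic is identical. The coordinates below are invented for illustration:

```r
# Cosine similarity: dot product scaled by the vector magnitudes
cosine_similarity <- function(a, b) {
     sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Made-up coordinates on (noun/verb, metaphorical) axes
train_model    <- c(0.9, 0.8)   # "I train a model"
train_marathon <- c(0.8, 0.2)   # "I train for a marathon"
train_rode     <- c(-0.7, 0.1)  # "I rode the train"

# The two verb senses land closer together than the verb and noun senses
cosine_similarity(train_model, train_marathon)
cosine_similarity(train_model, train_rode)
```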

We call a different OpenAI model, text-embedding-ada-002, to return the embeddings. Unlike the chat model, we can pass all of the responses as a list in a single call, instead of looping through each response. This makes embeddings much faster and cheaper than using the chatGPT API.

Prior to embedding, I like to remove non-alphanumeric characters from the text.

# Clean up the text to remove non-alphanumeric characters
input_to_embed <- 
     EIS_comments_deduplicated %>%
     mutate(Content_cleaned = str_replace_all(Content, "[^[:alnum:]]", " "))

# Call OpenAI for the embeddings (POST() comes from the httr package)
library(httr)

embeddings_return <- 
     POST(
          "https://api.openai.com/v1/embeddings",
          add_headers(Authorization = paste0(
               "Bearer ", read_lines("../credentials/openai.key"))
          ),
          body = list(
               model = "text-embedding-ada-002",
               input = input_to_embed$Content_cleaned
               ),
          encode = "json"
     )

The returned object is a bit convoluted. We can use a bit of purrr and jsonlite to extract the embeddings.

# Extract the embeddings from the API return (fromJSON() comes from jsonlite)
library(jsonlite)

embeddings_list <-
     embeddings_return %>%
     content(as = "text", encoding = "UTF-8") %>%
     fromJSON(flatten = TRUE) %>%
     pluck("data", "embedding")

Then add the embeddings back into the dataframe.

# Combine the embeddings with the original data
EIS_GPT_embeddings <- 
     EIS_comments_deduplicated %>%
     as_tibble() %>%
     mutate(
          embeddings = embeddings_list,
          ID = as.character(ID)
     ) %>%
     left_join(
          # We need only the first instance of the GPT response data (which
          # also includes the repeated reliability-test responses) to know
          # which alternative each comment favors
          GPT_output %>%
               group_by(ID) %>%
               arrange(response_created_time) %>%
               mutate(ID_row_count = row_number()) %>%
               filter(ID_row_count == 1) %>%
               ungroup() %>%
               select(
                    ID,
                    favored_alternative_edit,
                    opinion_strength_edit
               )
     )

Topical clustering from text embeddings

The problem is that those point clouds exist in extremely high dimensions. OpenAI’s text-embedding-ada-002 model returns 1536 dimensions. We need a method to reduce that complexity into something useful.

As mentioned, the embeddings allow us to see how comments relate in high-dimensional language space. We want to figure out where there are denser clusters of point clouds in that space which indicate common themes in the comments.

A couple of common ways to do this are to use a clustering algorithm (e.g. K-means) or dimension reduction (e.g. PCA). For this tutorial I want to use a bit of a hybrid approach called t-SNE (t-distributed Stochastic Neighbor Embedding) that will allow us to easily visualize clusters of common comments, which we can then explore.

We’ll use the Rtsne package, which requires that the data be in matrix form.

library(Rtsne)

# Rtsne requires the embeddings to be in matrix form, so we extract the lists of embeddings from the dataframe and convert them to matrix form.
openai_embeddings_mat <-
     matrix(
          unlist(
               EIS_GPT_embeddings %>%
               .$embeddings
               ),
          ncol = 1536,
          byrow = TRUE
     )

# Estimate tSNE coordinates
set.seed(7267158)
tsne_embeddings <-
     Rtsne(
          openai_embeddings_mat,
          pca = TRUE,
          theta = 0.5,
          perplexity = 50,
          dims = 2,
          max_iter = 10000
     )

Determining the proper theta (the speed/accuracy trade-off for the Barnes-Hut approximation) and perplexity (roughly, the effective number of neighbors each point considers) is more of an art than a science. This post does a great job of exploring choices for these parameters. By setting pca = TRUE in this case, we are first reducing the dimensionality to 50 principal components and then using tSNE to do the final reduction to two visual dimensions.

# Extract the tSNE coordinates and add them to the main dataset
EIS_GPT_embeddings <- 
     EIS_GPT_embeddings %>%
     mutate(
          tsne_dim1 = tsne_embeddings$Y[,1],
          tsne_dim2 = tsne_embeddings$Y[,2]
     )

# Visualize the tSNE plot
EIS_GPT_embeddings %>%
     ggplot(aes(x = tsne_dim1, y = tsne_dim2)) +
     geom_point(alpha = 0.5, pch = 16)
tSNE plot of the openai embeddings
The tSNE plot uncovers some weak groupings, but there are no extremely clear delineations between most comments. This is likely a symptom of low diversity among comments and the fact that most of our comments are very short, so there is less signal in the content.

The first thing to note is that we are not seeing much discrete grouping of the points. This tells us that the comments share a lot more in common across all comments than across local groups of comments. The second thing to notice is that despite the spread, we do see a handful of groups budding off along the periphery. In fact, one group in the bottom right is very distinct. It is important to remember that, unlike PCA, the axis dimensions in tSNE are meaningless. In fact, I’ll remove them from the plots for the rest of the post. Position doesn’t matter in tSNE–only relative closeness.

At this point, we might want to manually delimit groups that we want to analyze further, like pulling out all of the comments from that cluster in the top left. To make this a bit easier, I’ve opted to cluster the two-dimensional tSNE coordinates with hierarchical clustering. It is important to realize that this is purely a convenience for visualization. If we really wanted to use clustering to directly define groups (whether hierarchical, k-means, etc.), it would make much more sense to cluster directly on the first 50 principal components.

tsne_embedding_clusters <- 
     hclust(
          dist(tsne_embeddings$Y), 
          method = "average"
     )

EIS_embeddings_clustered <-
     EIS_GPT_embeddings %>%
     mutate(
          cluster = cutree(tsne_embedding_clusters, 7)
     )

Since we are clustering on tSNE dimensions where distance doesn’t really matter, deciding where to set our breakpoint is a personal choice. I’ve decided to use 7 clusters because it seemed a natural breakpoint and recovered the obvious clusters.

tSNE plot and hierarchical tree diagram displaying the data split into 7 clusters
Using hierarchical clustering, we can cluster on the tSNE coordinates. Since tSNE coordinates are mostly meaningless, deciding how many clusters to split the data into is a bit arbitrary.

Text analysis of topical clusters

Now that we have putative clusters of topics, we can perform some classic natural language processing (NLP) to illuminate the themes of those topics. We’ll use tidytext for this task.

library(tidytext)

First, we need to get the data into a long, tidy format where each word in every comment is its own row. We’ll also remove common stop words that are predefined in the tidytext library. Then, we can calculate the term frequency-inverse document frequency (TF-IDF) for the clusters. TF-IDF is basically a measure of how common a word is within a cluster, after accounting for how common that word is overall.
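Those steps might look something like the sketch below. The object and column names follow those defined earlier in the post, but this is an illustrative pipeline rather than the exact code behind the figures:

```r
library(tidytext)

cluster_tf_idf <-
     EIS_embeddings_clustered %>%
     unnest_tokens(word, Content) %>%  # one row per word per comment
     anti_join(stop_words) %>%         # drop predefined stop words
     count(cluster, word) %>%          # word counts within each cluster
     bind_tf_idf(word, cluster, n)     # TF-IDF, treating each cluster as a 'document'
```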

For example, if we take a look at the most common words in each cluster, it is unsurprising that “wolves”, “moose”, “isle” and “royale” dominate. (Although it is interesting that the top words for clusters 4 and 7 are “wilderness” and “management”… more on that later).

word frequency bar plots for each cluster
Unsurprisingly, when considering the most common words, “wolves”, “moose”, and “isle” dominate.

However, TF-IDF tells us about the relatively unique words that define a cluster of comments. Some clusters, like 1 and 2, have very even TF-IDF distributions, and the important words are mostly filler or nonsense words. This happens when clusters are saturated with common words and there is no strong theme producing uniquely important words. We could have guessed from the tSNE plot of the embeddings that the bulk of comments in the center of the plot would fall in this lexical no-man’s-land. But! Clusters 3, 4, 5, and 7 show promisingly skewed distributions.

term frequency inverse document frequency bar plots for clusters
TF-IDF is a measure of uniquely important words in a ‘document’ (or cluster, in this case) relative to common words across all documents.

Cluster 3 seems to orient towards a topic of animal welfare, with words like “contraception”, “sterilization”, “lethal”, and “culls”. I suspect that these comments come from folks concerned less about the wolf population or wilderness management and more about the ethics of any proposed action involving animals. In a similar way, it looks like Cluster 7 is more concerned with the science and measurement behind the management decision than with the decision itself, with words like “evaluating”, “approximately”, and “tools” showing high uniqueness and “management” as the most common word overall. These topics would have been completely lost if we had stopped at categorizing favored alternatives.

Meanwhile cluster 4 appears to be squarely concerned with Wilderness issues. “Wilderness” and “nature” are the most common words in this cluster and “untrammeled” and “unmanipulated” are the most uniquely important words. We might expect that most of the comments that chatGPT categorizes as favoring alternative A will fall into cluster 4.

We can also take a look at how the clusters map onto the chatGPT categorizations.

chatGPT categorized 'favored alternative' mapped to tSNE coordinates with bar plot showing favored alternative counts per cluster
Mapping the chatGPT categorized ‘favored alternative’ onto the tSNE coordinates, we can see that topical clusters mostly conform to favored alternatives.

Mapping the chatGPT categorized ‘favored alternative’ onto the tSNE coordinates, we can see that comments roughly sort by favored alternative. Cluster 6 is almost entirely defined by support for Alternative B – immediate wolf introduction. Cluster 4, which seemed to orient towards Wilderness values, is mostly composed of comments in support of Alternative A – no action.

Cluster 7 and Cluster 3 are mostly skewed to Alternative C – more monitoring – but exhibit very similar distributions. This might be a great example where even folks who tend to agree on the same Alternative do so for different reasons–a pattern we would have totally missed without text analysis.

The remaining clusters, which compose the bulk of the middle of the tSNE plot, favor a mix of Alternatives.

Chain-of-density summarization

We can learn a lot from looking at common and important words and using our human judgement to piece together the topical theme of each cluster. Ideally, we would read all of the comments in a cluster to develop a topical summary. But that would take a long time. As an alternative, we can pass all of the comments in a given cluster to an LLM and have it summarize the theme.

Currently, only a handful of models support context windows large enough to digest the entirety of the comments in our clusters. Anthropic’s Claude2 has a context window of up to 100k tokens (roughly 75,000 words), although it isn’t quite as good as chatGPT 4. To get the most out of Claude2, we can use a special type of prompting developed for summarization called “chain-of-density”. Chain-of-density prompting forces the model to recurrently check its own output to maximize the density and quality of its summarization. Research shows that people tend to like chain-of-density summaries even better than human-written summaries of news articles.

For demonstration, we’ll use chain-of-density prompting to summarize the theme of cluster 3. Here is the prompt that we will pass to Claude2:

"You will generate increasingly concise entity-dense summaries of the semicolon separated comments included below.

The comments were submitted by a member of the public in response to the The Isle Royale National Park Moose-Wolf-Vegetation Management Plan/EIS. The Plan/EIS is a document that evaluates management alternatives for the moose and wolf populations on the island National Park land.

Now that you know the context, here are the semicolon separated survey responses:

[INSERT SEMICOLON SEPARATED COMMENTS]

Instructions: You will generate increasingly concise entity-dense summaries of the above semicolon separated comments. Repeat the following 2 steps 5 times.

Step 1: Identify 1-3 informative entities (";" delimited) from the comments which are missing from the previously generated summary.
Step 2: Write a new denser summary of identical length which covers every entity and detail from the previous summary plus the missing entities.

A missing entity is
- Relevant: to the main themes of the comments.
- Specific: descriptive yet concise (5 words or fewer).
- Novel: not in the previous summary.
- Faithful: present in the comments.
- Anywhere: located in the comments.

Guidelines:
- The first summary should be long (5 - 7 sentences, ~100 words), yet highly non-specific, containing little information beyond the entities marked as missing. Use overly verbose language and fillers (e.g., 'these comments discuss') to reach ~100 words.
- Make every word count. Rewrite the previous summary to improve flow and make space for additional entities.
- Make space with fusion, compression, and removal of uninformative phrases like 'these comments discuss'.
- The summaries should become highly dense and concise, yet self-contained, e.g., easily understood without having read the comments.
- Missing entities can appear anywhere in the new summary.
- Never drop entities from the previous summary. If space cannot be made, add fewer new entities.

Remember: Use the exact same number of words for each summary.

Answer in JSON. The JSON should be a list (length 5) of dictionaries whose keys are 'missing_entities' and 'denser_summary'."

I don’t have API access to Claude2, but unlike chatGPT, Claude2 accepts files as input. So, it is easy to output our prompt with the contained comments as a .txt file and submit it to Claude2 manually.
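As a sketch of that last step: assuming the prompt above is stored as a single string, `cod_prompt_template` (a name I’ve made up here), with the placeholder left in, we can splice in the cluster 3 comments and write the result out for manual upload:

```r
library(stringr)
library(readr)

# Collapse the cluster 3 comments into one semicolon-separated string
cluster_3_comments <-
     EIS_embeddings_clustered %>%
     filter(cluster == 3) %>%
     pull(Content) %>%
     paste(collapse = "; ")

# Drop the comments into the prompt template and save as a .txt file
cod_prompt_template %>%
     str_replace(fixed("[INSERT SEMICOLON SEPARATED COMMENTS]"), cluster_3_comments) %>%
     write_lines("claude2_cluster3_prompt.txt")
```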

Claude2 gives us five increasingly dense summaries:

Summary 1: These comments provide a range of perspectives on how the National Park Service should manage the declining wolf population and increasing moose population on Isle Royale. There is discussion of various preliminary alternative concepts, including taking no action, reintroducing wolves once, maintaining both moose and wolves, reducing the moose population, and intensively managing the moose population. Many commenters express support for alternative C, which involves reintroducing and augmenting the wolf population on Isle Royale more than once if necessary. There is also frequent mention of the possibility of relocating problem wolves from elsewhere to the island. Overall, most commenters emphasize the importance of restoring the natural wolf/moose balance on Isle Royale through wolf reintroduction rather than intensively managing the moose population.
Summary 2: These comments provide perspectives on managing the declining Isle Royale wolf and increasing moose populations. Many support alternative C to reintroduce and augment wolves more than once if needed. Wolf relocation from the mainland is also mentioned. Most emphasize restoring the wolf/moose balance through wolf reintroduction rather than intensively managing moose. Climate change reducing ice bridges for wolf migration and the wilderness designation of Isle Royale are frequently discussed. Many oppose lethal moose culling, instead supporting moose contraception/sterilization. Overall there is a focus on restoring the natural predator-prey dynamic through wolf reintroduction while minimizing intensive human management due to the wilderness designation.
Summary 3: These comments discuss managing the declining Isle Royale wolf and increasing moose populations. Many support alternative C to reintroduce and augment wolves more than once if needed to restore the wolf/moose balance. Wolf relocation and genetics are often mentioned given reduced migration from climate change. The wilderness designation and opposition to lethal moose culling are frequently discussed, with support for contraception/sterilization. There is concern about moose overbrowsing without sufficient wolves. Some mention wolf tourism benefits. Overall there is emphasis on restoring the natural predator-prey dynamic and wolf genetics through reintroduction while minimizing intensive human management due to the wilderness designation.
Summary 4: These comments discuss managing the declining Isle Royale wolf and increasing moose populations, often supporting alternative C to reintroduce and augment wolves more than once to restore balance. Wolf relocation and genetics are frequently mentioned to compensate for reduced migration from climate change. Many oppose lethal moose culling, instead supporting contraception/sterilization, expressing concerns about moose overbrowsing and ticks without wolves. Some mention wolf tourism benefits and preserving the wilderness designation. There is emphasis on restoring balance through wolf reintroduction to limit moose impacts on vegetation while minimizing intensive human management.
Summary 5: These comments frequently support alternative C to reintroduce and augment Isle Royale wolves to restore wolf/moose balance and predator-prey dynamics, concerned about moose overbrowsing, ticks, and vegetation without sufficient wolves. Wolf relocation and genetics are often proposed to compensate for reduced migration from climate change. Many oppose lethal moose culling, noting wilderness values. Some mention continuing the iconic predator-prey research, wolf tourism benefits, wolf-livestock issues, and the value of public engagement. Overall there is emphasis on reestablishing a healthy Isle Royale wolf population through reintroduction to minimize intensive human management and moose impacts.

The final two summaries start to get at the nuance particular to this cluster. Summary 4 mentions, “Many oppose lethal moose culling, instead supporting contraception/sterilization”. Summary 5 mentions: “Many oppose lethal moose culling, noting wilderness values.” As expected, much of the summary is likely common across all comments. So, a human interpretation of the summaries is still necessary.

Conclusion

In this post, we explored using chatGPT, text embeddings, and Claude2 to analyze public comments on a complex environmental management decision. We learned methods to responsibly validate chatGPT output. While not perfect, chatGPT showed promising reliability at categorizing free-form opinions. The text embeddings allowed us to uncover hidden topical clusters among comments that traditional methods would have missed. Claude2’s long context window allowed us to further interpret the topical clusters. Together, these tools enabled a nuanced quantitative analysis of subjective text data that would be infeasible for a single human analyst to perform manually.

 

]]>
2301
Arctic Genes in Alaska Magazine https://www.azandisresearch.com/2022/12/10/arctic-genes-in-alaska-magazine/ Sat, 10 Dec 2022 14:14:55 +0000 https://www.azandisresearch.com/?p=2217 An article I wrote about an expedition to collect wood frogs in the Alaska Arctic is now online at Alaska Magazine. I’ve included the teaser below, but check out the whole article here.

Screenshot of the Alaska Magazine website for the article featuring a picture of Andis and Yara doing DNA extractions in a tent. Image by Kaylyn Messer.

I am deep in the Alaskan Arctic,  300 miles from the nearest road system, attempting to conduct the kind of science that usually requires a specialized laboratory. We rowed 30 miles of meandering flatwater today, bringing our total to 200 river miles in 12 days since we landed at a lonely gravel bar on the headwaters of Ambler River in Gates of the Arctic National Park.

Mosquitoes spangle the tent canopy arching over me. Backlit by summer solstice sun, the silhouettes of the insects make an inverted night sky of shifting constellations. The sun never sets on the banks of the Kobuk River this time of year. It hangs high above the horizon even now at 11 p.m., transforming my tent into a solar oven as I, ironically, work to uncover the secrets of a frog that can turn into ice.

Read the rest of the article here.

]]>
2217
Chasing Arctic Frogs https://www.azandisresearch.com/2021/08/17/chasing-arctic-frogs/ Tue, 17 Aug 2021 19:13:54 +0000 http://www.azandisresearch.com/?p=1905 A short recipe for adventurous field science

Take me to the photos!

Step 1: Come up with a harebrained scheme.

My labmate Yara and I had been dreaming up the idea of studying wood frog genomes from across the species’ range since she started her PhD. Wood frogs have the largest range of any North American amphibian. They also happen to be the only North American amphibian that can survive north of the Arctic Circle.

Our 200 mile route (in orange) from the headwaters of the Ambler River in Gates of the Arctic National Park, down the Kobuk River through Kobuk Valley National Park Wilderness, and out to the village of Noorvik where the Kobuk meets the Arctic Ocean.

Dr. Julie Lee-Yaw had done a similar study back in 2008. She embarked on a road trip from Quebec all the way up to Alaska to collect wood frog tissue. So, our first step was to ask Dr. Lee-Yaw if she would collaborate and share her samples.

Those samples gave us a solid backbone across the wood frog range, but we were missing populations in expansive regions north and west of the road systems. We worked with the Peabody Museum to search for tissue samples already housed in natural history collections around the world. We filled a few gaps, but huge portions of the range were still missing.

 

We knew that there must be samples out there sitting in freezers and labrooms that were not catalogued in museum databases. So, our next step was to begin sleuthing. We looked up author lists from papers and cold-called leads. I even reached out to friends on Facebook (…which actually turned out to be a big success. The aunt of a friend from undergrad happens to do herpetology research in Galena, Alaska and was able to collect fresh samples for us this year!). This effort greatly expanded our sample coverage with new connections (and friends) from Inuvik and Norman Wells in the Northwest Territories, Churchill on the Hudson Bay, and the Stikine River Delta in Southeast Alaska.

But as the points accumulated on the map, we noticed some glaring holes in our coverage. Most importantly, we had no samples from Northwestern Alaska. Populations in this region are the most distant from the ancestral origin of all wood frogs in the southern Great Lakes. If we wanted a truly “range-wide” representation of wood frog samples, we needed tissue from that blank spot on the map!

Step 2: Convince your advisor and funders it’s a good idea.

This might be the hardest step. In our case, Yara and I were lucky that our advisor, Dave, was immediately supportive of the project. After we made the case for the importance of these samples, funders came around to the idea as well.

Step 3: Make a plan …then remake it …then make a new plan yet again.

Once we knew where we required samples from, we needed to figure out how to get there. Alaska in general is remote, but northwestern Alaska is REALLY remote. The road system doesn’t stretch farther than the middle of the state. All of the communities–mainly small villages–are only accessible by plane, and most of them only have runways for tiny prop planes. Travelling out from the villages into the bush is another layer of difficulty. Most people here either travel by boat on the river or by snowmachine during the winter. Traveling on land, over the soggy and brush-choked permafrost, is brutal and most locals only do it when necessary, if at all.

Prior to academia, I made a career of organizing expeditions to the most remote places in the rugged southeastern archipelago of Alaska. Despite my background, the logistics of the Arctic were inscrutable even to me. Fortunately, I had a couple of friends, Nick Jans and Seth Kantner, who know the area well. In fact, Seth grew up in a cabin out on the Kobuk. (Seth and Nick are both talented authors. I suggest checking out Ordinary Wolves by Seth and The Last Light Breaking by Nick). With their help, I was able to piece together the skeleton of a trip.

After many logistic iterations, Yara and I decided to follow in the footsteps of local hunters who, for generations, have used the rivers as conduits into the heart of the wilderness. Our plan was to travel down one of the major arterial rivers and hike inland to search for frogs as we went.

Our original itinerary was to raft the 100-mile section of the Kobuk River from just north of Ambler village to the village of Kiana. But at the last minute (literally), our plans changed. As we were loading up the plane, the pilot told us that he couldn’t fly into our planned starting point. Instead, he suggested that we fly into a gravel bar 30 miles upriver in Gates of the Arctic. Those “30 miles” turned out to be AIR MILES. Following the river, it ended up adding over 60 miles to our trip.

 

We packed two inflatable oar rafts, almost 150 pounds of food, and another 300 pounds of camping, rescue, and science gear into the balloon-wheeled plane. For the next two weeks, we rowed down the swift Ambler River from the headwaters to the confluence of the Kobuk. Then, we rowed down the massively wide and meandering Kobuk River, eventually extending our trip by an additional 30 miles, bypassing Kiana, and continuing to Noorvik, the last village on the river.

Step 4: Recruit a crew.

Despite being the world’s first and only Saudi Arabian Arctic ecologist, and despite her limited camping experience, I knew Yara would be a stellar field partner. But I never like traveling in brown bear country with fewer than four people. Plus, expedition research involves too many daily chores for the two of us to manage alone. So, we recruited a team.

Sam Jordan is a dryland ecologist, but he had been willing to help me with my dissertation fieldwork in wetlands before, so I knew he would be willing to defect for a good adventure. Sam is also an exceptional whitewater paddler and all-around outdoor guru. Plus, he’s just a great guy (when he leaves his banjo at home). He and I spent two weeks floating the Grand Canyon in the dead of winter, and there are few people I would rather have along on a remote river trip.

Kaylyn Messer and I guided sea kayak expeditions in Southeast Alaska back in our youth. I am a bit particular about how I manage my camp system (read: “extremely picky and fastidious to a fault”) on big trips. Kaylyn is one of the few people as scrupulous as me, but she’s also a super amenable Midwesterner at heart. I knew she’d be a huge help out in the field.

We fell into an effective rhythm on the trip. Each morning we woke, made breakfast, broke camp, packed the boats, and launched early in the day. While one person on each boat rowed, the other checked the maps for frog surveying spots, fished, or photographed. We stopped along the way to bushwhack back into wetlands we’d identified from satellite images. We typically arrived at camp late. Yara and I would set up one tent to process the specimens from the day while Sam and Kay made camp and cooked dinner. One of the hidden disadvantages of 24-hour Arctic sunlight is that it is easy to overwork. Most nights we only managed to get sampling finished, dinner cleaned up, and camp bearproofed with enough time to crawl into tents with just eight hours till beginning again the next day.

Step 5: Do the science.

Doing science in the field is difficult. Tedious dissections seem impossible while baking in the omnipresent sun and being alternately hounded by hundreds of mosquitoes or blasted by windblown sand. Trading lab coats for rain jackets and benchtops for sleeping pads covered in trash bags compounds the trouble. Not to mention keeping tissues safe and cool. Organization and adaptability go a long way.

On remote, self-supported trips, it is inevitable that equipment fails or is lost. On one of the first days, we discovered that our formalin jar was leaking—and formalin is not something you want sloshing around! We cleaned the boats and found a creative solution to replace the offending container: a 750ml Jack Daniel’s bottle!

Planning ahead and engineering backup plans also helps. One of our main struggles was figuring out how to preserve specimens and get them home. It is illegal to ship alcohol by mail and you can’t fly with the high-proof alcohol needed for genetic samples. You can ship formalin, but it is difficult to fly with. To make matters worse, we were flying in and out of “dry” or “damp” villages where alcohol is strictly regulated or forbidden. Also, we happened to be flying out on a Sunday, making it impossible to mail samples home. The solution we arrived at was to ship RNAlater and formaldehyde to our hotel room ahead of time. Tissue would remain stable in RNAlater for a couple of weeks and we could make formalin to fix the specimens. After fixing, we cycled the specimens through water to leach out the formalin. This made it possible for me to fly with all of the tissue tubes and damp specimens in my carry on. Other than a few concerned looks from the TSA folks, all of the samples made it back without issue!

Step 6: Enjoy the adventure.

Despite the hard work, there was a lot to appreciate about the Arctic. We witnessed major changes in ecology as we travelled from the steep headwater streams in the mountains to the gigantic Kobuk. Every day was an entirely new scene.

 

Step 7: Forget the hardships.

Looking back, it is really easy to forget the sweltering heat, swarms of mosquitoes, inescapable sun, and freak lightning storms. And, it’s probably better to forget those anyway!

 

Hot competition and tadpole Olympics https://www.azandisresearch.com/2020/12/24/hot-competition-and-tadpole-olympics/ Thu, 24 Dec 2020 12:35:58 +0000 http://www.azandisresearch.com/?p=1836 Our newest paper (pdf available on my publications page), led by Kaija Gahm, is just out in the Journal of Experimental Zoology as part of a special issue on herp physiology that came out of the World Congress of Herpetology last January.

The study:

One of the most consistent findings arising from 20 years of study in our lab is that wood frogs seem to adapt to life in cold, dark ponds. In general, cold-blooded animals like reptiles and amphibians are not suited for the cold and function much better in warmer conditions. So, wood frogs that live in colder ponds should have a harder time competing against their neighbors in warmer ponds.

In response, cold-pond wood frogs seem to have developed adaptations that level the playing field. In separate experiments, we’ve found that wood frog tadpoles in cold ponds tend to seek out warmer water (like in sunflecks) and have lower tolerance to extremely warm temperatures. Most importantly, they can mature faster as eggs and larvae.

But I’ve always struggled with a lingering question: if cold-pond frogs have evolved these beneficial adaptations to compete with warm-pond frogs, what is keeping those genes out of the warm ponds? Shouldn’t cold-pond genes in a warm pond mean double the benefits? One would expect the extrinsic environmental influence and the intrinsically elevated growth rates to produce super tadpoles that metamorphose and leave the ponds long before all the others.

Kaija, who was an undergrad in the lab at the time, decided to tackle that question for her senior thesis.

We hypothesized that there might be a cost to developing too quickly. Studies in fish suggested that the trade-off could be between development and performance. The idea is that, like building Ikea furniture, if you build the tissue of a tadpole too quickly, the price is loss of performance.

Much like assembling Ikea furniture too quickly, we hypothesized that when tadpoles develop too quickly, there might be a functional cost.

So we collected eggs from 10 frog ponds that spanned the gradient from sunny and warm to dark and cold. We split clutches across two incubators that we set to bracket the warmest and coldest of the ponds.

Then we played parents to 400 tadpoles, feeding and changing water in 400 jars two to three times a week.

Half of the 400 tadpoles we reared in temperature-controlled incubators.

We reared the tadpoles to an appropriate age (Gosner stage 35ish). Those in the warm incubator developed about 68% faster than those in the cold incubator. In addition to our lab-reared tadpoles, we also captured tadpoles from the same ponds in the wild as a comparison. Development rates in the lab perfectly bounded those in the wild.

Fig. 1. from the paper: (a,b) Temperatures in incubators and natal ponds during the 2019 season. ‘High’ and ‘Low’ refer to the corresponding temperature treatments in the lab. Two‐letter codes are abbreviations for the names of individual ponds. (c,d) Development rates of warm treatment, cold treatment, and wild tadpoles

Once they reached an appropriate size, we put them to the test. We simulated a predator attack by a dragonfly naiad by poking them in the tail. Dragonfly naiads are fast, fierce, tadpole-eating machines, and a tadpole’s fast-twitch flight response is a good indicator of its chance of evading these insect hunters. It’s a measure of performance that directly relates to a tadpole’s fitness.

Above the test arenas, we positioned high-speed cameras to capture the tadpoles’ burst responses. We recorded 1245 trials, to be exact—way more than we ever wanted to track by hand. Fortunately, Kaija is a whiz at coding; and with a bit of help, she was able to write a MATLAB script that could identify the centroid of a tadpole and record its position 60 times per second.

Kaija wrote a script to automatically identify tadpoles and track their movement from the high-speed videos.
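The tracking idea itself is simple to sketch. Below is a minimal Python illustration of the approach — not Kaija's MATLAB code — where each grayscale frame is thresholded and the mean position of the dark pixels is taken as the tadpole's centroid. The frame format and threshold value here are assumptions made for the sake of the example.

```python
def frame_centroid(frame, threshold=50):
    """Return the (row, col) centroid of pixels darker than `threshold`.

    `frame` is a 2D list of grayscale values (0-255). We assume the
    tadpole is the only dark object in the arena, so the centroid of
    the dark pixels approximates the tadpole's position.
    """
    row_sum = col_sum = 0.0
    count = 0
    for r, line in enumerate(frame):
        for c, value in enumerate(line):
            if value < threshold:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        return None  # no tadpole detected in this frame
    return (row_sum / count, col_sum / count)

# A track is then one centroid per frame (60 frames per second):
# track = [frame_centroid(f) for f in video_frames]
```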

We measured the tadpoles’ speed during the first half second of their burst response and looked for an association with their developmental rates. One complicating factor is that a tadpole’s fin and body shape can influence burst speeds. So, a weak tadpole with a giant fin might have a similar burst speed to a super fit tadpole with a small fin. To account for this, we took photos of each tadpole, ran a separate analysis mapping their morphometry, and included body shape in our models.
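The speed measurement reduces to summed displacement over the first 30 frames (half a second at 60 frames per second). Here is a hedged sketch of that calculation; the frame rate, window length, and distance units are assumptions, not values taken from the paper.

```python
import math

def burst_speed(track, fps=60, window_s=0.5):
    """Mean speed over the first `window_s` seconds of a centroid track.

    `track` is a list of (x, y) positions in consistent units (e.g. mm),
    one per video frame. Returns distance traveled per second.
    """
    n_frames = int(fps * window_s)
    pts = track[: n_frames + 1]
    distance = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(pts, pts[1:])
    )
    return distance / window_s
```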

Figure 2 from the paper. Lab reared tadpoles showed very similar shapes with long, narrow tails, large tail muscles, and small bodies. Wild tadpoles had much deeper tails and larger bodies. Other folks have done extensive research on the many factors like water chemistry, food quality, and even the scent of different predators that induce different body shapes, so it is not surprising that we saw so much diversity between ponds and between lab and wild tadpoles that originated from the same pond. And props to Bayla for the painting of the tadpole!

As we had hypothesized, tadpoles reared at warmer temperatures showed much slower burst speeds than their genetic half-siblings reared in the cold incubator. We even saw a similar, but weaker, effect for the tadpoles that were allowed to develop in their natal ponds. It seems that developing too fast reduces performance.

Fig. 3 from the paper: Relationship between development rate and burst speed for (a) lab tadpoles and (b) wild tadpoles. Dots represent pond‐wise means, and in (a), lines connect means from the same pond. Marginal density plots are based on individual tadpoles rather than pond‐wise means. Orange and blue represent tadpoles reared in the high‐ and low‐temperature incubators, respectively

Thus, it certainly seems that the counter-gradient pattern we see of faster development in cold-pond populations, but not in warm-pond populations, is at least partially driven by the trade-off between development rate and performance.

In fact, it may even be the case that we’ve been viewing the pattern backwards all along. Perhaps instead we should consider whether warm-pond populations have adaptively evolved slower development rates to avoid the performance cost. This especially makes sense given the range of wood frogs. Our populations are at the warm, southern end of the range. Maybe this tradeoff is also a factor constraining wood frogs to the cold north of the continent?

Range map of wood frogs (Rana sylvatica).

If warm weather and faster development are a real liability for wood frogs, it is only going to get worse in the future. We know from another recent study that our ponds have been warming quickly, especially during the late spring and early summer months. But climate change is also causing snow to fall later in the winter, forcing frogs to breed later. The net result is that wood frogs may be forced to evolve fast intrinsic developmental rates in response to a contracting developmental window, while at the same time, extrinsic forces drive development even faster. That’s a double whammy in the trade-off with performance. And it might lead to too many “Ikea furniture mistakes” at the cellular level.

As a separate part of this study, we also measured metabolic rates in our tadpoles in hopes of understanding the relationship between developmental rates, performance, and cellular respiration. I’m in the process of analyzing those data, so stay tuned for more!

Smartphone hemispherical photography https://www.azandisresearch.com/2020/12/16/smartphone-hemispherical-photography/ Wed, 16 Dec 2020 23:41:13 +0000 http://www.azandisresearch.com/?p=1809 Hemispherical photography is one of those tasks often prefaced by the statement, “How hard could it be?” I’m pretty sure I said something like this at the beginning of my PhD when we wanted to ask how the canopy over wood frog ponds influences their larval ecology.

Now, five years, four blog posts, and uncountable hours later, I can say that measuring canopy structure with hemispherical photos is surprisingly difficult.

One of the biggest hurdles is understanding the equipment and deploying it properly in the field. For me, nothing is more tedious than standing waist deep in a mucky, mosquito-infested pond while I fiddle around with camera exposure settings and fine-tune my leveling device. Add to that the constant fear of dropping a couple thousand dollars of camera and specialized lens into the water, and you get a good sense of my summers.

So, it is with great pleasure that I offer an alternative method for capturing canopy photos that requires nothing but a cell phone, yet produces higher quality images than a traditional DSLR setup. This new method exploits the spherical panorama function available on most cameras (or in the free Google Street View app). Hemispherical photos can then be easily extracted and remapped from the spheres. You can check out the manuscript at Forestry here (a PDF is available on my Publications page) or continue reading while I walk through the paper below.

From Figure 2 of the paper: Comparisons of smartphone spherical panorama hemispherical photographs (SSP HP) (right, B and C) to traditional DSLR hemispherical photographs (DSLR HP) (left, B and C) captured at the same site. Details of the same subsection of the canopy, indicated by orange boxes, are expanded in C. Binarized images are shown below color images in B and C.

The old way

The most common way to measure canopy structure these days is with the use of hemispherical photographs. These images capture the entire canopy and sky from horizon to horizon. Assuming proper exposure, we can categorize individual pixels as either sky or canopy and run simple statistics to count the number of sky pixels versus canopy pixels or the number and size of the gaps between canopy pixels. We can also plot a sun path onto the image and estimate the amount of direct and indirect light that penetrates through the canopy. (You can follow my analysis pipeline in this post).
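Once an image is binarized, the counting step is trivial. Here is a toy Python sketch of the simplest such statistic, the fraction of sky pixels. The fixed threshold and unweighted fraction are simplifications of my own: real pipelines mask the area outside the circular image and weight rings by zenith angle.

```python
def sky_fraction(image, sky_threshold=128):
    """Fraction of pixels classified as sky in a grayscale canopy photo.

    `image` is a 2D list of 0-255 values; bright pixels are treated as
    sky and dark pixels as canopy. A real analysis would also mask the
    corners outside the circular hemisphere and weight by solid angle.
    """
    sky = total = 0
    for row in image:
        for value in row:
            total += 1
            if value >= sky_threshold:
                sky += 1
    return sky / total
```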

All of this analysis relies on good hemispherical images. But the problem is that there are many things that can go wrong when taking canopy photos, including poor lighting conditions, bad exposure settings, improperly oriented camera, etc. Another problem is that capturing images of high-enough quality requires a camera with a large sensor, typically a DSLR, a specialized lens, and a leveling device, which can cost a lot of money. Most labs only have one hemispherical photography setup (if any), which means that we sacrifice the number of photos we can take in favor of high-quality images.

The new way

In the past few years, researchers have tried to figure out ways to get around this equipment barrier. Folks have tried eschewing the leveling device, using clip-on hemispherical lenses for smartphones, or using non-hemispherical smartphone images. I even tested using a hemispherical lens attachment on a GoPro.

But, none of these methods really produce images that are comparable to the images from DSLRs, for three reasons:

  1. Smartphone sensors are tiny compared to DSLR sensors, so there is a huge reduction in quality.
  2. Clip-on smartphone lenses are tiny compared to DSLR lenses, so again, there is a huge reduction in optical quality.
  3. Canopy estimates are sensitive to exposure settings, and DSLRs allow for more control over exposure.

The method I developed gets around all of these issues by using multiple, individual cell phone images to stitch together a single hemispherical image. Thus, instead of relying on one tiny cell phone sensor, we are effectively using many tiny cell phone sensors to make up the difference.

Another advantage of creating a hemispherical image out of many images is that each individual image only has to be exposed for a portion of the sky. This avoids the problems of glare and variable sky conditions that plague traditional systems. An added benefit is that smartphone cameras operate in a completely different way than DSLRs, so they are much less sensitive to exposure issues in general.

Smartphones are less sensitive to exposure issues because, unlike DSLRs that capture a single instance on the sensor when you hit the shutter button, smartphone cameras use computational photography techniques that blend the best parts of many images taken in short succession. You may not realize it, but your smartphone is constantly taking photos as soon as you turn it on (which makes sense since you can see the scene from the camera on your screen). The phone stores about 15 images at a time, constantly dumping the older versions out of temporary memory as updated images pour in. When you hit the button to take a picture, your phone then automatically blends the last few images with the next few images. The phone’s software selects the sharpest pixels with the most even contrast and color from each image and then composites those into the picture presented to you. With every new software update, the algorithms for processing images get better and better. That’s why modern cell phones are able to take photos that can compete with mid-range DSLRs despite the limitations of their tiny sensors.

So, if each phone photo is essentially a composite of 15 images, and we then take 18 of those composite images and stitch them into a hemispherical image, we are effectively comparing a sensor the size of 270 individual phone camera sensors to the DSLR sensor.

The best part is that there is already software that can do this for us via the spherical panorama feature included with most contemporary smartphone cameras. This feature was introduced in the Google Camera app back in 2012 and iOS users can access the feature via the Google StreetView app. It is incredibly simple to use.

Update: Check out my post on tips for taking spherical panoramas

Once you’ve taken a spherical panorama, it is stored in your phone as a 2D JPEG in equirectangular format. The best part about the photo sphere software is that it utilizes your phone’s gyroscope and spatial mapping abilities to automatically level the horizon. This is advantageous for two reasons. First, it means we can ditch the tedious leveling devices. Second, it means that the equirectangular image can be perfectly split between the upper and lower hemisphere. We simply have to crop the top half of the rectangular image and remap it to polar coordinates to get a circular hemispherical image.

Figure 1 from Arietta (2020): Spherical panoramas (A) are stored and output from smartphones as 2D images with equirectangular projection (B). Because spherical panoramas are automatically leveled using the phone gyroscope, the top half of the equirectangular image corresponds to the upper hemisphere of the spherical panorama. The top portion of the equirectangular image (B) can then be remapped onto the polar coordinate plane to create a circular hemispherical photo (C). In all images, zenith and azimuth are indicated by Θ and Φ, respectively.

How to extract hemispherical images from spherical panoramas

UPDATE: Please see my latest post to process spherical images with R.

Command line instructions

If you are proficient with the command line, the easiest way to extract hemispherical images from photo spheres is to use ImageMagick. After you download and install the program you can run the script below to convert all of your images with just a couple lines of code.

cd "YOUR_IMAGE_DIR"

magick mogrify -level 2%,98% -crop 8704x2176+0+0 -resize "8704x8704!" -virtual-pixel horizontal-tile -background black +distort Polar 0 -flop -flip *jpg

You may need to make a few modifications to the script for your own images. The -crop 8704x2176+0+0 flag crops the top half of the image (i.e., the upper hemisphere). Be sure to adjust this to the full width of your panorama by one quarter of that width (the top half of a 2:1 equirectangular image). The -resize "8704x8704!" flag resizes the image into a square in order to apply a polar transformation. Be sure to adjust this to the full width of your panorama in both dimensions.

Note that the code above will convert and overwrite all of the .jpg files in your folder to hemispherical images. I suggest that you practice on a folder of test images or a folder of duplicates to avoid any mishaps.
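Since the crop and resize geometries change with every camera, it can help to derive them from the panorama width rather than edit the numbers by hand. Below is a small hedged helper, assuming a 2:1 equirectangular panorama; `magick_flags` is a made-up name for this post, not part of ImageMagick.

```python
def magick_flags(width):
    """Return the -crop and -resize arguments for a 2:1 equirectangular
    panorama `width` pixels wide (so `width // 2` pixels tall).

    The crop keeps the top half of the image (the upper hemisphere):
    full width by one quarter of the width. The resize stretches that
    strip into a square so the polar distort produces a full circle.
    """
    crop = f"{width}x{width // 4}+0+0"
    resize = f"{width}x{width}!"
    return crop, resize

# For an 8704 x 4352 panorama:
# magick_flags(8704) -> ("8704x2176+0+0", "8704x8704!")
```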

GUI instructions

If you are intimidated by the command line, extracting hemispherical images from photo spheres is also easy with GIMP (I used GIMP because it is free, but you can follow the same steps in Photoshop).

Update: You can also try out this cool web app developed by researchers in Helsinki which allows you to upload spherical panoramas from your computer or phone and automatically converts them to hemispherical images that you can download. However, I would not suggest using this tool for research purposes because the app fixes the output resolution at 1000p, so you lose all of the benefits of high-resolution spherical images.

Spherical panoramas are stored as 2D equirectangular projections from which hemispherical images can be extracted in GIMP.

First, crop the top half of the rectangular photo sphere.

Crop the top half of the panorama.

Second, scale the image into a square. I do this by stretching the image so that the height is the same size as the width. I go into why I do this below.

Scale the image into a square.

Third, remap the image to a polar projection. Go to Filters > Distorts > Polar Coordinates.

Settings for remapping the panorama into a polar projection.
Once, mapped onto polar coordinates, the image is now a circular hemispherical image.

Fourth, I found that increasing the contrast slightly helps the binarization algorithms find the correct threshold.

All of these steps can be automated in batch with the BIMP plugin (a BIMP recipe is available in the supplemental files of the paper). This can also be automated from the command line with ImageMagick (see scripts above and in the supplemental materials of the paper).

The result is a large image with a diameter equal to the width of the equirectangular sphere. Because we are essentially taking columns of pixels from the rectangular image and mapping them into “wedges” of the circular image, we will always need to downsample pixels toward the center of the circular image. Remember that each step out from the center of the image is the same as each step down the rows of the rectangular image. So, the circumference of every ring of the circular image is generated from a row of pixels that is the width of the rectangular image.

With a bit of geometry, we can see that the circumference matches the width of our rectangular image (i.e. 1:1 resolution) at zenith 57.3 degrees. Zenith rings below 57.3° will be downsampled and those above will be scaled up, with new pixels interpolated into the gaps. Conveniently, 57.3 degrees is 1 radian. The area within 1 rad, from zenith 0° to 57°, is important for canopy estimates because gap fraction measurements in this portion of the hemisphere are insensitive to leaf inclination angle, allowing for estimates of leaf area index without accounting for leaf orientation.

Thus, we retain most of our original pixel information within this critical portion of the image, but it does mean that we are expanding the pixels (increasing the resolution) closer to the horizon. I tested the impact of resolution directly in my paper and found almost no difference in canopy estimates; so, it is probably okay to downscale images for ease of processing if high resolution is not needed.
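The column-to-wedge remapping described above can be made explicit. This is a hedged sketch of the inverse mapping — for each pixel of the circular output, which pixel of the equirectangular strip supplies it — using nearest-neighbor lookup; ImageMagick and GIMP implement the same idea with proper interpolation.

```python
import math

def polar_to_equirect(x, y, size, src_w, src_h):
    """Map pixel (x, y) in a `size` x `size` circular hemispherical image
    back to (row, col) in the src_w x src_h equirectangular strip.

    Distance from the center gives the zenith angle (the source row);
    the azimuth around the center gives the source column. Returns
    None for pixels outside the circle.
    """
    cx = cy = size / 2.0
    dx, dy = x - cx, y - cy
    radius = math.hypot(dx, dy)
    if radius > cx:
        return None  # outside the hemisphere
    row = min(int(radius / cx * src_h), src_h - 1)  # zenith 0-90 degrees
    azimuth = (math.atan2(dy, dx) + 2 * math.pi) % (2 * math.pi)
    col = min(int(azimuth / (2 * math.pi) * src_w), src_w - 1)
    return row, col
```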

The hemispherical image produced can now be analyzed in any pipeline used to analyze DSLR hemispherical images. You can see the pipeline I use in this post.

How do images from smartphone panoramas compare to DSLR?

In my paper, I compared hemispherical photos taken with a DSLR against those extracted from a spherical panorama. I took consecutive photos at 72 sites. Overall, I found close concordance between measures of canopy structure (canopy openness) and light transmittance (global site factor) between the methods (R2 > 0.9). However, the smartphone images were of much greater clarity and therefore retained more detailed canopy structure that was lost in the DSLR images.

Figure 4 from the paper: Difference in canopy structure and light environment estimates between reference (standard DSLR HP) and full resolution SSP HP (dark orange), low resolution SSP HP downsampled to match the standard DSLR resolution (light orange), fisheye HP (blue), and DSLR HP with exposure adjusted from +5 to -5 (light to dark). SSP HP images were generated from spherical panoramas taken with Google Pixel 4a and Google Camera. Fisheye HP images were simulated from smartphone HP for two intersecting 150° FOV images from a Pixel 4a. DSLR HP were captured with Canon 60D and Sigma 4.5mm f2.8 hemispherical lens.

Although the stitching process occasionally produces artifacts in the image, the benefits of this method far outweigh the minor problems. Care when taking the panorama images, as well as ever-improving software will help to minimize imperfect stitching.

Figure 2 from the paper: Comparisons of smartphone spherical panorama hemispherical photographs (SSP HP) (right B and C) to traditional DSLR hemispherical photographs (DSLR HP) (left B and C) captured at the same site. Details of the same subsection of the canopy, indicated by orange boxes, are expanded in C. Binarized images are shown below color images in B and C. Image histograms differ in the distribution of luminance values in the blue color plane (A). In panel E, a section of the canopy from full resolution SSP HP (left), downsampled SSP HP (middle), and DSLR HP (right) is further expanded to demonstrate the effect of image clarity on pixel classification. An example of an incongruous artifact resulting from misalignment in the spherical panorama is outlined in blue in A and expanded in D.

Overall, this method is not only a good alternative, it is probably even more accurate than traditional methods because of the greater clarity and robustness to variable exposure. My hope is that this paper will help drive more studies in the use of smartphone spheres for forest research. For instance, 360° horizontal panoramas could be extracted for basal area measurements, or entire spheres could be used to spatially map tree stands. The lower hemisphere could also be extracted and used to assess understory plant communities or leaf litter composition. Researchers could even enter the sphere with a virtual reality headset in order to identify tree species at their field sites from the comfort of their home.

Mostly, I’m hopeful that the ease of this method will allow more citizen scientists and non-experts to collect data for large-scale projects. After all, this method requires no exposure settings, no additional lenses, and is automatically levelled. The geolocation and compass heading can even be extracted from the image metadata to automatically orient the hemispherical image and set the location parameters in analysis software. Really, anyone with a cell phone can capture research-grade spherical images!

 

Be sure to check out my other posts about canopy research that cover the theory, hardware, field sampling, and analysis pipelines for hemispherical photos, and my tips for taking spherical panoramas.

Frogs in the Feral Atlas https://www.azandisresearch.com/2020/11/17/frogs-in-the-feral-atlas/ Tue, 17 Nov 2020 15:20:47 +0000 http://www.azandisresearch.com/?p=1779 “Every event in human history has been a more-than-human event.” This is the first line from the introduction of Feral Atlas: the more-than-human Anthropocene, a new book out from Stanford Univ. Press that compiles examples of how the natural world enables us to be modern humans. Over our history as a species, we have been a part of reciprocal domestication as we shape our environment and our environment shapes us. Because this process is ongoing and messy, most of our world occupies the feral space between wild and domestic.

Our chapter tells the story of green frogs and the feral condition of their life with suburban human neighbors. We especially highlight the way that the human-built environment of lawns, pavement, sewers, and septic systems is infused into the biology of green frogs (a topic that Max and Dave have studied in depth). As a counterexample, I told the story of the wood frog, a species that has escaped a feral fate by clinging to the remnants of wild space away from humans (a topic I study in depth).

Bayla painted the featured image for our chapter. It depicts a green frog in front of a gradient from rural to urbanized environment. Endocrine disrupting chemicals (EDCs) that alter green frog biology leach in from the urban and suburban zones.

I rarely get to write about wood frogs outside of academic articles, so it was a pleasure to contribute to this piece. I think it is some of my best natural history writing. I’ve excerpted my section below (or, read the chapter):

“To better understand why our housing patterns influence frogs, it is worth taking a frog’s-eye-view of suburbanization. Most frogs exhibit distinct life-stages. Like humans, frogs begin development as shell-less and fragile eggs, but while human embryos float within the protection of a womb, frog embryos are buoyed among the vegetation and flotsam of ponds. The embryos have an umbilical relationship to the water that surrounds them. Nutrients and oxygen easily pass through the transparent jelly and are consumed through delicately branching gills. Any contaminants in the water also suffuse the embryos.
Even before their eyes or mouths have formed, the developed embryos hatch as free-swimming larvae not much larger than a grain of rice. Hatchlings are vulnerable. Thus, frogs hedge their bets by producing hundreds of eggs per clutch, hoping that at least a few will win the lottery of life. Some species, like wood frogs, additionally safeguard their offspring by choosing impermanent pools that are devoid of fish as relatively safe nurseries.

Those hatchlings that survive develop into recognizable tadpoles with bulbous bodies and slender tails. A pond’s version of cows, tadpoles graze along the bottom with scraper-like teeth. They consume algae and detritus along with any solid matter that washes into the pond basin. A long digestive tract allows the tadpoles to incorporate nutrients into a growing body. Where ponds neighbor septic systems, this means that human waste makes up a prodigious portion of a tadpole’s body.

The transition from a tadpole to a frog is a remarkable change. It makes the squeaking voice and acne of human puberty seem like a blessing. Every system in the tadpole’s body transforms. The tail gives way to bony limbs. The narrow, disc-shaped mouth morphs into a wide, insect-capturing, gape. The goggle eyes, so fine-tuned to underwater vision, mutate into something much like our own. Even the long and coiled digestive tract shortens and distends. At the end of this metamorphosis, the aquatic vegetarian leaves the water’s edge and becomes a terrestrial carnivore.

Green frogs are parochial and prefer a pond-side life. For a short time as juveniles they might range far and occupy any standing water from lakes and ponds to swimming pools and tire ruts. Upon adulthood though, they settle along freshwater shores where they patiently wait for dragonflies and other insects to approach within range of a lunging gulp. Since green frogs inhabit permanent ponds, they can breed throughout the summer, and without the threat of the pond drying out from beneath them, their tadpoles can be leisurely in development. When snow falls and the pond freezes, both adults and overwintering tadpoles take refuge deep in the insulating layer of pond muck. Because a green frog’s life is so reliant on a pond, they can survive in just about any permanent water with at least a narrow perimeter of vegetation. As long as a homeowner neglects the tufts of grass along the bank, green frogs are more than happy to remain neighbors.

Unlike green frogs, wood frogs become sylvan nomads after metamorphosis. As their home ponds dry up in the summer and fall, they wander the forest floor hunting among the leaves, only briefly returning to recently filled pools in early spring to breed. During the winter months, wood frogs burrow just under the blanket of leaves dropped in fall. This enables them to be the first out of hibernation as the forest thaws in spring. For these reasons, wood frogs rely on leaf-littered landscapes. Manicured lawns where leaves are regularly raked and bagged make inhospitable places for them. Where the balance of forest gives way to lawns, wood frogs disappear…”

Overall, this was a really fun project to work on that gave me a chance to switch up my writing style. It was also a lot of fun to be able to collaborate with my partner, Bayla, who painted the featured image for the chapter.

The online version of the book is a little counter-intuitive to navigate (I think this was intentionally designed as a rhetorical device), but if you can figure it out, it is worth checking out some of the other cool stories of our feral world!

]]>
1779
Phenology in a warming world https://www.azandisresearch.com/2020/07/30/phenology-in-a-warming-world/ Thu, 30 Jul 2020 22:45:02 +0000 http://www.azandisresearch.com/?p=1719 I’m thrilled to announce that the first of my dissertation chapters has just been published in Ecography.

Update (Nov. 2020): And, I’m especially thrilled that our piece will be the cover article for the journal, featuring a pair of breeding wood frogs from our population! My hands nearly froze trying to get this underwater shot, so I’m glad it was worth the effort!

Over the past 20+ years, our lab has been monitoring over 50 populations of wood frogs at Yale Myers Forest. Each year in early spring, we listen for the duck-like clucks of the male frogs, which signal that they have emerged from under the snow and moved into the breeding ponds. Shortly afterward, we head out into the freezing ponds to count the egg masses as a way to monitor population density over time.

Here, I am wading into one of the ponds to count egg masses. Wood frogs are remarkable for the cold temperatures at which they can function.

 

In this study, we looked at how the oviposition date (the day on which frogs deposited eggs) has changed over time. As climates warm, we usually expect the timing of life-history events (like oviposition, emergence from hibernation, or flowering time), collectively called ‘phenology’, to advance in the year as winters get shorter. That’s just what most species do, and the trend of advancing phenology is strongest in amphibians.

This slide from my presentation at the World Herpetological Congress shows that, in three major meta-analyses, amphibians show some of the strongest advances in phenology compared to other species.

Given that annual temperatures at our field site have increased by almost 0.6 °C in the past two decades, we expected frogs to breed and lay eggs earlier. If our frogs were like other amphibians, we might expect oviposition to advance by around six days.

Surprisingly, we found the opposite. Our frogs seem to be breeding 3 days LATER.

To figure out what might be going on with our frogs, we decided to look more closely at climate across the season, not just annual averages. It turns out that most of the increase in annual temperature is felt later in the summer, with relatively little change when frogs are breeding. Snowpack, on the other hand, is actually accumulating later and lasting longer. You can see these trends in the figure below (Fig. 3 from the paper). On the left are comparisons of temperature, precipitation, and snowpack between 1980 and 2018. On the right, we plot only the difference in trends over time. At the top-right, we plot the oviposition dates to show how seasonal changes in climate line up with frog breeding.

Figure 3 from the paper. Seasonal trends in daily temperature (a), precipitation (square root scale) (b), and snow water equivalent (c) from 1980 (blue) to 2018 (red) as predicted by a generalized additive model with an interaction between year and a penalized spline smooth on day-of-year, with 95% confidence intervals. Points represent daily values (N = 13,869 for all models). Annual mean oviposition dates (2000–2019) (d) in comparison to relative, seasonal change in temperature (e), precipitation (f), and snow water equivalent (SWE) (g) between 1980 and 2018. Seasonal change is the difference in daily values fit by generalized additive models between 1980 and 2018. All differences are scaled to the standard deviation between annual averages for each variable in order to compare the relative magnitude of change that coincides with the oviposition window (dotted lines). Dark bands indicate significant difference between 95% confidence intervals. Light bands indicate total difference. All meteorological observations are from Daymet data between 1980 and 2018.

 

We also looked at how the timing of oviposition correlated with climate across the season. We found that breeding occurs later when there is more snow at the beginning of the breeding window. Colder temperatures just before breeding also correlate with delayed oviposition (which makes sense, since colder temperatures mean more snow and less melting).

So, we think that frogs may be kind of stuck. Persistent snowpack might be keeping them from breeding earlier. But at the same time, warmer summer temperatures might be drying up their ponds faster. If so, this could be a big problem for tadpoles that need to maximize their time for development. The figure below shows that frogs tend to breed earlier when winter and early spring air temperatures are high. As we’d expect, more snowpack correlates with later breeding. High precipitation during the spring delays breeding (probably because it is falling as snow).

Figure S2 from the paper’s supporting information. The correlation between 10-, 20-, 30-, and 40-day rolling averages of daily mean temperature (b), precipitation (c), and snow water equivalent (d) between 2000 and 2018 with oviposition timing (annual averages 2000-2019, 3-day bin width)(a). Dotted lines indicate 95% confidence interval (+/- 0.45) for Pearson’s correlation for n = 20 pairs and 18 degrees of freedom. Light grey bands indicate non-overlapping windows of greatest correlation.
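The windowed-correlation idea behind this figure is straightforward to sketch. Below is a minimal, self-contained Python illustration with made-up numbers (the paper used Daymet climate records and our field observations, not these data): for several trailing windows, average the daily climate series up to a fixed day, then correlate those per-year averages with the annual oviposition dates.

```python
import statistics

def pearson(x, y):
    # Pearson correlation coefficient for paired samples
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def window_mean(daily, end_doy, window):
    # Trailing average of a daily series over the `window` days ending at end_doy
    return statistics.fmean(daily[end_doy - window:end_doy])

# Made-up example: 20 years of daily spring temperatures, constructed so that
# warmer pre-breeding windows coincide with earlier oviposition.
years = range(20)
daily_temp = [[5 + 0.1 * y + 0.05 * d for d in range(120)] for y in years]
ovi_doy = [100 - 0.5 * y for y in years]

for window in (10, 20, 30, 40):
    pre_breeding = [window_mean(daily_temp[y], 90, window) for y in years]
    print(window, round(pearson(pre_breeding, ovi_doy), 2))  # -1.0 for every window (by construction)
```

With real data, the window length and end day that maximize the (absolute) correlation point to the climate cue the frogs may be responding to; here the synthetic data are perfectly correlated by design.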

 

Twenty years is a long time to be collecting ecological data, but it is a pretty short window into the evolutionary history of wood frogs. And, we don’t know how long snow and temperatures may have been working against these frogs. So, as a final piece of our analysis, we used a machine learning technique called a random forest to predict oviposition dates backwards in time an additional 20 years. It doesn’t seem like much has changed over the past half-century or so. In one way, that could be good news in that at least things don’t seem to be getting any worse.
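To give a flavor of how that works (this is not the model from the paper, which was trained on full seasonal climate covariates): a random forest averages many decision trees, each fit to a bootstrap resample of the data. A toy Python sketch with a single made-up predictor and depth-one trees ("stumps"):

```python
import random
import statistics

def fit_stump(xs, ys):
    # Pick the split on x that minimizes squared error,
    # predicting the mean of y on each side of the threshold.
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def bagged_forest(xs, ys, n_trees=50, seed=1):
    # Fit each stump to a bootstrap resample, then predict by averaging.
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: statistics.fmean(t(x) for t in trees)

# Hypothetical relationship: deeper snowpack, later oviposition (higher DOY)
rng = random.Random(0)
snow = [rng.uniform(0, 30) for _ in range(40)]
doy = [95 + 0.4 * s for s in snow]
model = bagged_forest(snow, doy)
```

A real random forest also grows deeper trees and subsamples predictors at each split; the principle of back-casting is the same: train on years with both climate and oviposition data, then feed in historical climate alone to predict unobserved oviposition dates.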

The big question is, how will frogs cope with these climate changes? If tadpoles are faced with an ever-shrinking window of time to develop into frogs, will they be able to keep up? Or, will they lose the race and end up as tadpole-shaped raisins in our ponds?

I won’t give away any spoilers, but I’m looking at our long-term larval datasets to ask that question next.

This male wood frog is learning why it doesn’t pay to get to the breeding ponds too early. His pond is still frozen and he is waiting for the ice to, literally, thaw out from under him.
]]>
1719
Julian Date vs Day of the Year https://www.azandisresearch.com/2020/01/27/julian-date-vs-day-of-the-year/ Mon, 27 Jan 2020 11:40:50 +0000 http://www.azandisresearch.com/?p=1634 Julian day and Day of Year (DOY) are NOT the same thing

I recently wrote a paper looking at how frog breeding timing is impacted by climate change. So, I’ve been reading lots of ecological studies of phenology (more on phenology later). One thing that struck me is how almost everyone in ecology misuses the term “Julian Day” when they mean Day-of-Year.

Day-of-Year (DOY), as the name suggests, is the count number of a given day in the year. So, Jan 25 is DOY 25 and March 1 is either DOY 60 or DOY 61, depending on whether it is a leap year. And we can express the time of day as a decimal, so that 3pm on January 1 is DOY 1.625.

Julian day is a completely different way to measure time. It was defined by an astronomer named Joseph Scaliger back in 1583 (and so takes serious precedence over contemporary ecologists trying to hijack the term).


Unlike DOY, which starts counting on January 1st of any given year, the Julian day count starts on January 1, 4713 BC. There is a complicated historical reason that Scaliger chose 4713 BC as the starting date, which had to do with reconciling the Julian and Gregorian calendars during the calendar reform (read all about that here), but the point is, DOY and Julian day/date are wildly different things designed to measure wildly different phenomena.

For instance, I’m writing this blog on the 25th of January 2020.

The DOY today is: 25

The Julian day today is: 2458873

But it gets even crazier, because unlike the DOY count, which starts at midnight, Julian days start counting at noon. So, right now, at 10:30 am, the Julian day is 2458873, but after lunch it will be 2458874.
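The two counts are easy to compute side by side. A minimal Python sketch (an illustration, not part of the original post; it relies on `datetime.toordinal()`, which counts days from 0001-01-01 in the proleptic Gregorian calendar, offset by 1721424.5 to reach the Julian-date epoch and its noon rollover):

```python
from datetime import datetime

def day_of_year(dt):
    # DOY resets to 1 every January 1; the fraction encodes time since midnight
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second
    return dt.timetuple().tm_yday + seconds / 86400

def julian_date(dt):
    # Julian date: continuous day count since noon, January 1, 4713 BC.
    # toordinal() counts days from 0001-01-01; the 1721424.5 offset shifts
    # that origin to the Julian-date epoch (the .5 puts the rollover at noon).
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second
    return dt.toordinal() + 1721424.5 + seconds / 86400

print(day_of_year(datetime(2020, 1, 25, 10, 30)))       # 25.4375
print(int(julian_date(datetime(2020, 1, 25, 10, 30))))  # 2458873
print(int(julian_date(datetime(2020, 1, 25, 13, 0))))   # 2458874 -- after lunch
```

Note that the integer Julian day ticks over between the 10:30 am and 1:00 pm calls, while the DOY stays 25 all day.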

The Julian day metric is essentially worthless for comparing seasons because it never resets with the year. There is no ecologist who uses true Julian days; so, please, ecologists, don’t say Julian Day when you mean Day-of-Year.

As Gernot Winkler, former director of the USNO Time Service, notes:

“[Mixing Julian Day and DOY] is a grossly misleading practice that was introduced by some who were simply ignorant and too careless to learn the proper terminology. It creates a confusion which should not be taken lightly. Moreover, a continuation of the use of expressions “Julian” or “J” day in the sense of a Gregorian Date will make matters even worse. It will inevitably lead to dangerous mistakes, increased confusion, and it will eventually destroy whatever standard practices exist.”

So why does everyone misuse Julian Day? My hunch is that Julian Day sounds more technical than DOY, so folks gravitate toward it and others follow suit without ever questioning what it means.

Why do we care about studying seasonal change across years?

Phenology is the study of seasonal cycles of life history, like when bears go into hibernation, when flowers open, or when geese migrate. Phenology is a hot topic these days because climate change is causing wild populations to change their seasonal timing (Thackeray et al. 2016). For instance, frogs increasingly start calling and breeding earlier (Li et al. 2013) and forests green up earlier (Cleland et al. 2007).

On one hand, shifts in life-history timing might be a good way to cope with climate change, but it can be bad news if a shift in one species causes a misalignment in an ecological relationship (Miller-Rushing et al. 2010; Visser & Gienapp 2019). For example, European pied flycatcher migration generally coincides with a boom in caterpillars that feed on oaks. However, climate change drives oaks to bud earlier, which means that all the juicy caterpillars turn into chrysalises before the birds show up (Both & Visser 2001; Both et al. 2006). Similarly, snowshoe hares evolved to change coat color from brown to white in winter, but as snow melts earlier and earlier each year, hares are stuck with white coats for too long and become easy targets for predators (Mills et al. 2018).

Needless to say, it is important for us to be able to compare when in the season these critical phenomena take place and how that timing changes across years. When we do so, we are using DOY to align datasets across years, not Julian day; so, ecologists, let’s stop using the wrong term.


References:

Both, C., Bouwhuis, S., Lessells, C. M., and Visser, M. E. (2006). Climate change and population declines in a long-distance migratory bird. Nature 441, 81–83. 

Both, C., and Visser, M. E. (2001). Adjustment to climate change is constrained by arrival date in a long-distance migrant bird. Nature 411, 296–298. 

Cleland, E. E., Chuine, I., Menzel, A., Mooney, H. A., and Schwartz, M. D. (2007). Shifting plant phenology in response to global change. Trends Ecol. Evol. 22, 357–365. 

Li, Y., Cohen, J. M., and Rohr, J. R. (2013). Review and synthesis of the effects of climate change on amphibians. Integr. Zool. 8, 145–161. 

Miller-Rushing, A. J., Høye, T. T., Inouye, D. W., and Post, E. (2010). The effects of phenological mismatches on demography. Philos. Trans. R. Soc. Lond. B Biol. Sci. 365, 3177–3186. 

Mills, L. S., Bragina, E. V., Kumar, A. V., Zimova, M., Lafferty, D. J. R., Feltner, J., et al. (2018). Winter color polymorphisms identify global hot spots for evolutionary rescue from climate change. Science 359, 1033–1036. 

Thackeray, S. J., Henrys, P. A., Hemming, D., Bell, J. R., Botham, M. S., Burthe, S., et al. (2016). Phenological sensitivity to climate across taxa and trophic levels. Nature 535, 241–245. 

Visser, M. E., and Gienapp, P. (2019). Evolutionary and demographic consequences of phenological mismatches. Nat Ecol Evol 3, 879–885. 

The featured image of this post is from joiseyshowaa under creative commons usage.

 

]]>
1634
Evolution of Intrinsic Rates at the Evolution Conference 2019 https://www.azandisresearch.com/2019/09/03/evolution-of-intrinsic-rates-at-the-evolution-conference-2019/ Tue, 03 Sep 2019 13:13:38 +0000 http://www.azandisresearch.com/?p=1548 At this year’s Evolution Conference in Providence Road island, the organizers managed to recruit volunteers to film most of the talks. This is such a great opportunity for folks who cannot attend the meeting in person to stay up to date in the field. It’s also a useful chance for those of us who presented to critically review our talks.

Here’s my talk from the conference, “Evolution of Intrinsic Rates: Can adaptation counteract environmental change?“:

]]>
1548