packages <- c("rgbif", "raster", "maps", "rgdal", "viridis", "picante", "rinat", "sp", "broom", "maptools", "ape", "tidyverse", "magrittr", "sf", "mapview")
#lapply(packages, install.packages)
sapply(packages, library, character.only = T)
To query for specific species, use the rinat::get_inat_obs
function.
OCC <- get_inat_obs(taxon_name = "Lontra canadensis", maxresults = 25)
You can query by taxon name, taxon id (from iNat), or from a general query. The max number of results is set to 100 by default. Here I set it to 25 just to show a quick example.
Take a look at the data.
class(OCC)
## [1] "data.frame"
head(OCC)
summary(OCC)
#OR
str(OCC) #gives structure of object, can look messy at times but does give the class of each column
Queries can be further filtered by year (can only give one year, not a range, but there is a way around this for enterprising individuals…), month(s), status of georeferenced coordinates, and quality. Here I set quality to only those observations deemed research quality (determined when the wider iNaturalist community agrees on a species level ID). If you are using iNat to determine where a species is, you should probably only look at these results.
OCC <- get_inat_obs(taxon_name = "Lontra canadensis", year = 2018, maxresults = 1000, quality = "research", geo = T)
Try plotting the coordinates of 2018 research grade otter observations. The maps::map
function can prove helpful if you need quick boundaries and don’t want to fuss with making a ggplot2 map. The database argument is set to “world” by default, but you can choose a few US options by changing it to “state” or “county.”
plot(OCC[, c("longitude", "latitude")], pch = 19, col = "red", cex = 0.5) #plots points
maps::map(add = T) #plots world boundaries
maps::map(add = T, "state") #plots US state boundaries
You can apply a bounding box to filter out observations (S-LAT, W-LON, N-LAT, E-LON). I created a bounding box below that essentially covers Texas.
OCC <- get_inat_obs(taxon_name = "Lontra canadensis", geo = T, bounds = c(25, -106, 37, -94), maxresults = 1000)
Now you can see a general plot of otter observations in/around Texas.
plot(OCC[, c("longitude", "latitude")], pch = 19, col = "red")
maps::map(add = T, database = "county", "texas") #plots the county boundaries for the great Republic of Texas
Besides querying for a specific species or taxonomic group, one can query for a specific iNaturalist group project (e.g., Herps of Texas or Mammals of Texas).
#DON'T run the below code, it can take a while
MAMM <- get_inat_obs_project(grpid = "mammals-of-texas", type = "observations")
#DO RUN the below line to read a .csv file with the Mammals of Texas Project information
MAMM <- read.csv("iNatMammals.csv", header = T)
summary(MAMM)
Check summary for User.login, Quality.grade, Scientific.name
plot(MAMM[, c("Longitude", "Latitude")], pch = 19, col = "red")
Well that seems to look like Texas.
Let’s take a look at trends by each county. iNat doesn’t have explicit county data for each observation, so we’ll create them. First, take out all those records without coordinates to make things easier on us. We’ll use the dplyr::filter
function for this.
MAMM <- filter(MAMM, !is.na(Longitude))
The above line checks for which rows have NA values (this the is.na
function) for longitude and filters them out to only select those rows that are the opposite of it. The !
symbol acts as a negator for logical values (the result of is.na
), so it grabs those with coordinate data.
The raster::getData
function will grab the R equivalent of a shapefile of US county data (level 0 = country, level 1 = stateProvince, level 2 = county/municipality)
CNTY <- raster::getData(country = "USA", level = 2)
class(CNTY)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
A SpatialPointsDataFrame has an associated data.frame (you can access the data.frame only by using CNTY@data if you wish). Take a look at the NAME_1 and NAME_2 columns.
head(CNTY)
## OBJECTID ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 NAME_2 HASC_2 CCN_2
## 1 1 244 USA United States 1 Alabama 1 Autauga US.AL.AU NA
## 2 2 244 USA United States 1 Alabama 2 Baldwin US.AL.BD NA
## 3 3 244 USA United States 1 Alabama 3 Barbour US.AL.BR NA
## 4 4 244 USA United States 1 Alabama 4 Bibb US.AL.BI NA
## 5 5 244 USA United States 1 Alabama 5 Blount US.AL.BU NA
## 6 6 244 USA United States 1 Alabama 6 Bullock US.AL.BL NA
## CCA_2 TYPE_2 ENGTYPE_2 NL_NAME_2 VARNAME_2
## 1 County County
## 2 County County
## 3 County County
## 4 County County
## 5 County County
## 6 County County
head(CNTY$NAME_1) #state names
## [1] "Alabama" "Alabama" "Alabama" "Alabama" "Alabama" "Alabama"
head(CNTY$NAME_2) #county names
## [1] "Autauga" "Baldwin" "Barbour" "Bibb" "Blount" "Bullock"
The below subsets the polygons to just Texas *ComeAndTakeItFlagWaving.gif*
CNTY <- CNTY[CNTY$NAME_1 %in% "Texas", ]
When subsetting a data.frame, you have a few options:
base::subset
functionsubset(CNTY, CNTY$NAME_1 == "Texas")
dplyr::filter
filter(CNTY, NAME_1 == "Texas")
CNTY[CNTY$NAME_1 == "Texas", ] #(or how I did it above, the %in% symbol is necessary when you want to subset by multiple things, e.g., %in% c("Texas", "Louisiana"))
CNTY$NAME_2
## [1] "Anderson" "Andrews" "Angelina" "Aransas"
## [5] "Archer" "Armstrong" "Atascosa" "Austin"
## [9] "Bailey" "Bandera" "Bastrop" "Baylor"
## [13] "Bee" "Bell" "Bexar" "Blanco"
## [17] "Borden" "Bosque" "Bowie" "Brazoria"
## [21] "Brazos" "Brewster" "Briscoe" "Brooks"
## [25] "Brown" "Burleson" "Burnet" "Caldwell"
## [29] "Calhoun" "Callahan" "Cameron" "Camp"
## [33] "Carson" "Cass" "Castro" "Chambers"
## [37] "Cherokee" "Childress" "Clay" "Cochran"
## [41] "Coke" "Coleman" "Collin" "Collingsworth"
## [45] "Colorado" "Comal" "Comanche" "Concho"
## [49] "Cooke" "Coryell" "Cottle" "Crane"
## [53] "Crockett" "Crosby" "Culberson" "Dallam"
## [57] "Dallas" "Dawson" "Deaf Smith" "Delta"
## [61] "Denton" "Dewitt" "Dickens" "Dimmit"
## [65] "Donley" "Duval" "Eastland" "Ector"
## [69] "Edwards" "El Paso" "Ellis" "Erath"
## [73] "Falls" "Fannin" "Fayette" "Fisher"
## [77] "Floyd" "Foard" "Fort Bend" "Franklin"
## [81] "Freestone" "Frio" "Gaines" "Galveston"
## [85] "Garza" "Gillespie" "Glasscock" "Goliad"
## [89] "Gonzales" "Gray" "Grayson" "Gregg"
## [93] "Grimes" "Guadalupe" "Hale" "Hall"
## [97] "Hamilton" "Hansford" "Hardeman" "Hardin"
## [101] "Harris" "Harrison" "Hartley" "Haskell"
## [105] "Hays" "Hemphill" "Henderson" "Hidalgo"
## [109] "Hill" "Hockley" "Hood" "Hopkins"
## [113] "Houston" "Howard" "Hudspeth" "Hunt"
## [117] "Hutchinson" "Irion" "Jack" "Jackson"
## [121] "Jasper" "Jeff Davis" "Jefferson" "Jim Hogg"
## [125] "Jim Wells" "Johnson" "Jones" "Karnes"
## [129] "Kaufman" "Kendall" "Kenedy" "Kent"
## [133] "Kerr" "Kimble" "King" "Kinney"
## [137] "Kleberg" "Knox" "La Salle" "Lamar"
## [141] "Lamb" "Lampasas" "Lavaca" "Lee"
## [145] "Leon" "Liberty" "Limestone" "Lipscomb"
## [149] "Live Oak" "Llano" "Loving" "Lubbock"
## [153] "Lynn" "Madison" "Marion" "Martin"
## [157] "Mason" "Matagorda" "Maverick" "McCulloch"
## [161] "McLennan" "McMullen" "Medina" "Menard"
## [165] "Midland" "Milam" "Mills" "Mitchell"
## [169] "Montague" "Montgomery" "Moore" "Morris"
## [173] "Motley" "Nacogdoches" "Navarro" "Newton"
## [177] "Nolan" "Nueces" "Ochiltree" "Oldham"
## [181] "Orange" "Palo Pinto" "Panola" "Parker"
## [185] "Parmer" "Pecos" "Polk" "Potter"
## [189] "Presidio" "Rains" "Randall" "Reagan"
## [193] "Real" "Red River" "Reeves" "Refugio"
## [197] "Roberts" "Robertson" "Rockwall" "Runnels"
## [201] "Rusk" "Sabine" "San Augustine" "San Jacinto"
## [205] "San Patricio" "San Saba" "Schleicher" "Scurry"
## [209] "Shackelford" "Shelby" "Sherman" "Smith"
## [213] "Somervell" "Starr" "Stephens" "Sterling"
## [217] "Stonewall" "Sutton" "Swisher" "Tarrant"
## [221] "Taylor" "Terrell" "Terry" "Throckmorton"
## [225] "Titus" "Tom Green" "Travis" "Trinity"
## [229] "Tyler" "Upshur" "Upton" "Uvalde"
## [233] "Val Verde" "Van Zandt" "Victoria" "Walker"
## [237] "Waller" "Ward" "Washington" "Webb"
## [241] "Wharton" "Wheeler" "Wichita" "Wilbarger"
## [245] "Willacy" "Williamson" "Wilson" "Winkler"
## [249] "Wise" "Wood" "Yoakum" "Young"
## [253] "Zapata" "Zavala"
Oh, wow! A vector of Texas county names
Turn the MAMM data.frame into a SpatialPointsDataFrame by specifying the coordinate columns. I found that this is easier than using sp::SpatialPointsDataFrame
.
coordinates(MAMM) <- ~Longitude + Latitude
All coordinates from iNat are stored in unprojected lat/lon coordinates using the WGS84 datum, so you can grab the coordinate reference system from the county data.
crs(MAMM) <- crs(CNTY)
MAMM <- spTransform(MAMM, crs(CNTY))
The raster::over
function is useful for determining what specific polygons (y argument, shown here as CNTY) a point in the x argument (here shown as MAMM) fall into.
OVR <- over(MAMM, CNTY)
dim(OVR)
## [1] 8563 15
head(OVR)
## OBJECTID ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 NAME_2 HASC_2 CCN_2
## 1 2705 244 USA United States 44 Texas 2705 Nueces US.TX.NU NA
## 2 2747 244 USA United States 44 Texas 2747 Tarrant US.TX.TN NA
## 3 2754 244 USA United States 44 Texas 2754 Travis US.TX.TV NA
## 4 2543 244 USA United States 44 Texas 2543 Blanco US.TX.BC NA
## 5 2628 244 USA United States 44 Texas 2628 Harris US.TX.RR NA
## 6 2584 244 USA United States 44 Texas 2584 Dallas US.TX.DS NA
## CCA_2 TYPE_2 ENGTYPE_2 NL_NAME_2 VARNAME_2
## 1 County County
## 2 County County
## 3 County County
## 4 County County
## 5 County County
## 6 County County
This means you now have county information for each iNat record by simply writing out OVR$NAME_2
Let’s add that county information to our SpatialPointsDataFrame.
MAMM@data <- cbind(MAMM@data, county = OVR$NAME_2)
Grab the number of observations by county.
Hoo boy, time for an apply function explanation. For the below line, we are grabbing the vector of Texas county names and indexing the iNat mammal data by each county (this is what everything after function(x) is concerned with). sapply
(results in a vector) and lapply
(results in a list) have the following syntax: sapply(x, function(x) whatever function you can think up).
So this is going county by county and counting up the number of observations.
OBS <- sapply(CNTY$NAME_2, function(x) nrow(filter(MAMM@data, county == x)))
Grabs the number of species found in each county. This is similar to the line written to count observations, but instead further indexes by only grabbing the Scientific.name column and then grabs only the number of unique names in this column.
Soooo, you may be wondering what kind of mystical arts I’m attempting with %$%
and %>%
. I’m lazy, so I constructed a pipeline. Pipelines in R are created in using the handy %>%
batch of symbols from the magrittr package. Essentially what it does is take the result from the left and use it as the first argument in whatever is after %>%
(%$%
is a special usage that only grabs a specific column or index from the left).
SR <- sapply(CNTY$NAME_2, function(x) MAMM@data %>% filter(county == x) %$% Scientific.name %>% unique %>% length)
So essentially, we are doing length(unique(filter(MAMM\@data, county = x)$Scientific.name))
but in an easier to read way (you can actually understand it from left to right).
Ta-dah(rida)! Species richness
One more thing…lets document how many unique users there are recording observations in each county.
USER <- sapply(CNTY$NAME_2, function(x) MAMM@data %>% filter(county == x) %$% User.login %>% unique %>% length)
Place our observation, species richness, and user number vectors as columns within our SpatialPolygonsDataFrame of Texas counties.
CNTY$inat.obs <- OBS
CNTY$inat.sr <- SR
CNTY$inat.user <- USER
How about we visualize what we just wrought? ggplot2 is a fickle companion, so we will use the broom::tidy
function to break down our observations and create a sort of choropleth map. broom::tidy
will turn our SpatialPointsDataFrame into a regular data.frame and ID each row according to the region specified. This will allow the fill argument of the ggplot2::aes
function to fill each county appropriately.
OBSR <- tidy(CNTY, region = "inat.obs")
class(OBSR$id)
## [1] "character"
OBSR$id <- as.numeric(OBSR$id)
We don’t want these numbers to be characters when filling in each polygon, so change them to numeric
Those polygons with no observations will be set to NA so as to not confuse them with counties that only have a handful of observations
OBSR[which(OBSR$id == 0), "id"] <- NA
SRR <- tidy(CNTY, region = "inat.sr")
SRR$id <- as.numeric(SRR$id)
SRR[which(SRR$id == 0), "id"] <- NA
USR <- tidy(CNTY, region = "inat.user")
USR$id <- as.numeric(USR$id)
USR[which(USR$id == 0), "id"] <- NA
Lists are great classes to store things in. Here we have the data.frames featuring county observation, species richness, and user numbers and can access each one by using [[]]
.
INAT <- list(OBSR, SRR, USR)
ggplot(data = INAT[[2]], aes(x = long, y = lat, fill = id, group = group)) + geom_polygon(color = "black", size = 0.2) + scale_fill_viridis(name = "iNaturalist Species Richness", option = "D", na.value = NA)
Now that we have county data associated with each record, you can subset the records by a specific county (such as Brazos Co.). You can use the base::unique
function to determine which species are found in Brazos Co.
filter(MAMM@data, county == "Brazos")
MAMM@data %>% filter(county == "Brazos") %$% Scientific.name %>% unique
## [1] Tadarida brasiliensis Procyon lotor
## [3] Canis latrans Dasypus novemcinctus
## [5] Baiomys taylori Sylvilagus floridanus
## [7] Didelphis virginiana Sciurus niger
## [9] Sigmodon hispidus Peromyscus leucopus
## [11] Lontra canadensis Lynx rufus
## [13] Peromyscus Castor canadensis
## [15] Cryptotis parva Chaetodipus hispidus
## [17] Reithrodontomys fulvescens Urocyon cinereoargenteus
## [19] Odocoileus virginianus
## 229 Levels: Ammospermophilus interpres Ammotragus lervia ... Xerospermophilus spilosoma
CHAET <- occ_search(scientificName = "Chaetodipus hispidus", limit = 10)
class(CHAET)
## [1] "gbif"
Hmmm, it is a gbif class. Can’t really review the information we want (presumably you want info regarding the record(s)).
attributes(CHAET)
## $names
## [1] "meta" "hierarchy" "data" "media" "facets"
##
## $class
## [1] "gbif"
##
## $type
## [1] "single"
##
## $return
## [1] "all"
##
## $args
## $args$limit
## [1] 10
##
## $args$offset
## [1] 0
##
## $args$scientificName
## [1] "Chaetodipus hispidus"
##
## $args$fields
## [1] "all"
CHAET$data
## # A tibble: 10 x 98
## name key decimalLatitude decimalLongitude issues datasetKey
## <chr> <int> <dbl> <dbl> <chr> <chr>
## 1 Chaetod… 1.83e9 40.5 -103. gass84 b15d4952-7d20…
## 2 Chaetod… 1.80e9 34.8 -99.4 gass84 71e82020-f762…
## 3 Chaetod… 1.81e9 40.5 -103. gass84 b15d4952-7d20…
## 4 Chaetod… 1.81e9 40.5 -103. gass84 b15d4952-7d20…
## 5 Chaetod… 1.81e9 40.5 -103. gass84 b15d4952-7d20…
## 6 Chaetod… 1.84e9 35.4 -99.1 gass84 b15d4952-7d20…
## 7 Chaetod… 1.81e9 40.5 -103. gass84 b15d4952-7d20…
## 8 Chaetod… 1.84e9 35.4 -99.1 gass84 b15d4952-7d20…
## 9 Chaetod… 1.81e9 40.5 -103. gass84 b15d4952-7d20…
## 10 Chaetod… 1.84e9 35.4 -99.1 gass84 b15d4952-7d20…
## # ... with 92 more variables: publishingOrgKey <chr>, networkKeys <chr>,
## # installationKey <chr>, publishingCountry <chr>, protocol <chr>,
## # lastCrawled <chr>, lastParsed <chr>, crawlId <int>, extensions <chr>,
## # basisOfRecord <chr>, individualCount <int>, taxonKey <int>,
## # kingdomKey <int>, phylumKey <int>, classKey <int>, orderKey <int>,
## # familyKey <int>, genusKey <int>, speciesKey <int>,
## # scientificName <chr>, kingdom <chr>, phylum <chr>, order <chr>,
## # family <chr>, genus <chr>, species <chr>, genericName <chr>,
## # specificEpithet <chr>, taxonRank <chr>, dateIdentified <chr>,
## # coordinateUncertaintyInMeters <dbl>, continent <chr>,
## # stateProvince <chr>, year <int>, month <int>, day <int>,
## # eventDate <chr>, modified <chr>, lastInterpreted <chr>,
## # references <chr>, license <chr>, identifiers <chr>, facts <chr>,
## # relations <chr>, geodeticDatum <chr>, class <chr>, countryCode <chr>,
## # country <chr>, institutionID <chr>, dynamicProperties <chr>,
## # county <chr>, identificationVerificationStatus <chr>,
## # eventRemarks <chr>, gbifID <chr>, language <chr>, type <chr>,
## # preparations <chr>, locationAccordingTo <chr>, catalogNumber <chr>,
## # institutionCode <chr>, identifiedBy <chr>, georeferencedDate <chr>,
## # identifier <chr>, nomenclaturalCode <chr>, verbatimEventDate <chr>,
## # higherGeography <chr>, georeferencedBy <chr>,
## # georeferenceProtocol <chr>, georeferenceVerificationStatus <chr>,
## # endDayOfYear <chr>, locality <chr>, collectionCode <chr>,
## # verbatimLocality <chr>, occurrenceID <chr>, recordedBy <chr>,
## # otherCatalogNumbers <chr>,
## # http...unknown.org.http_..rs.tdwg.org.dwc.terms.ResourceRelationship <chr>,
## # organismID <chr>, previousIdentifications <chr>,
## # identificationQualifier <chr>, samplingProtocol <chr>,
## # accessRights <chr>, higherClassification <chr>,
## # georeferenceSources <chr>, sex <chr>, lifeStage <chr>,
## # recordNumber <chr>, datasetName <chr>,
## # http...unknown.org.http_..rs.tdwg.org.dwc.terms.MeasurementOrFact <chr>,
## # vernacularName <chr>, verbatimElevation <chr>,
## # reproductiveCondition <chr>
What is a tibble? It is a trimmed down data.frame. You can always change it to something that you are more familiar or comfortable with.
CHAET <- data.frame(CHAET$data)
head(CHAET)
Well, that is a lot of data. Do we really need that much data? Do you REALLY want the vernacularName, license, and publishingCountry, for example?
The summary of a vector of factors will give you a handy breakdown of what the vector is comprised of. Neat!
summary(as.factor(CHAET$county))
## Greer Logan County Washita County
## 1 6 3
Here we not only listed the set of fields that we wanted, but also only grabbed the $data portion and turned it into a data.frame in one line.
CHAET <- occ_search(scientificName = "Chaetodipus hispidus", fields = c('name', 'infraspecificEpithet', 'stateProvince', 'country', 'county', 'locality', 'decimalLongitude', 'decimalLatitude', 'geodeticDatum', 'datasetName', 'institutionCode', 'recordNumber', 'catalogNumber', 'year', 'preparations'), limit = 10) %$% data %>% data.frame
But what if you wanted more than 200,000 records (the limit imposed by the draconian creators of the package)? *thinkingfaceemoji* This lapply function will call the rgbif::occ_search
function twice, each with the country specified as each element of the CNTRY vector.
CNTRY <- c("US", "MX")
CHAET <- lapply(1:2, function(x) as.data.frame(occ_search(scientificName = "Chaetodipus hispidus", country = CNTRY[x], fields = c('name', 'stateProvince', 'country', 'county', 'locality', 'decimalLongitude', 'decimalLatitude', 'geodeticDatum', 'datasetName', 'institutionCode', 'recordNumber', 'catalogNumber', 'year', 'preparations'), limit = 15)$data))
length(CHAET)
## [1] 2
CHAET
This results in a list composed of Chaetodipus hispidus records from the US and from Mexico
“But I don’t want a list”
The base::do.call
function will apply the function specified in the first part of the argument to each portion of the list, thus binding the rows of each list element together. There are other methods to use a function like base::rbind
or base::cbind
on each element of a list (e.g., plyr::rbind.fill
) but base::do.call
is accessible to everyone and is simple to use.
CHAET <- do.call(rbind, CHAET)
dim(CHAET)
## [1] 30 14
Searching using the taxonKey (always found in the url of the GBIF page of the group you are looking at) can be very helpful as well
HMYD <- occ_search(taxonKey = 5504, limit = 20)
HMYD
## Records found [283584]
## Records returned [20]
## No. unique hierarchies [7]
## No. media records [16]
## No. facets [0]
## Args [limit=20, offset=0, taxonKey=5504, fields=all]
## # A tibble: 20 x 108
## name key decimalLatitude decimalLongitude issues datasetKey
## <chr> <int> <dbl> <dbl> <chr> <chr>
## 1 Dipodom… 1.81e9 33.4 -116. cdroun… 50c9509d-22c…
## 2 Dipodom… 1.80e9 33.3 -116. cdroun… 50c9509d-22c…
## 3 Dipodom… 1.81e9 36.7 -122. cdroun… 50c9509d-22c…
## 4 Dipodom… 1.81e9 36.5 -121. cdroun… 50c9509d-22c…
## 5 Dipodom… 1.81e9 36.7 -122. cdroun… 50c9509d-22c…
## 6 Chaetod… 1.83e9 33.1 -116. cdroun… 50c9509d-22c…
## 7 Dipodom… 1.81e9 32.8 -116. cdroun… 50c9509d-22c…
## 8 Perogna… 1.88e9 34.8 -108. cdround b15d4952-7d2…
## 9 Dipodom… 1.81e9 33.3 -116. cdroun… 50c9509d-22c…
## 10 Dipodom… 1.81e9 33.3 -116. cdroun… 50c9509d-22c…
## 11 Chaetod… 1.84e9 36.7 -122. cdroun… 50c9509d-22c…
## 12 Dipodom… 1.80e9 33.3 -116. cdroun… 50c9509d-22c…
## 13 Dipodom… 1.81e9 36.7 -122. cdroun… 50c9509d-22c…
## 14 Dipodom… 1.84e9 32.2 -114. cdroun… 50c9509d-22c…
## 15 Dipodom… 1.83e9 36.5 -117. cdroun… 50c9509d-22c…
## 16 Dipodom… 1.88e9 34.8 -108. cdround b15d4952-7d2…
## 17 Dipodom… 1.88e9 34.8 -108. "" b15d4952-7d2…
## 18 Chaetod… 1.84e9 30.0 -113. cdroun… 50c9509d-22c…
## 19 Dipodom… 1.88e9 34.8 -108. cdround b15d4952-7d2…
## 20 Dipodom… 1.83e9 40.9 -113. "" 06a00852-f76…
## # ... with 102 more variables: publishingOrgKey <chr>, networkKeys <chr>,
## # installationKey <chr>, publishingCountry <chr>, protocol <chr>,
## # lastCrawled <chr>, lastParsed <chr>, crawlId <int>, extensions <chr>,
## # basisOfRecord <chr>, taxonKey <int>, kingdomKey <int>,
## # phylumKey <int>, classKey <int>, orderKey <int>, familyKey <int>,
## # genusKey <int>, speciesKey <int>, scientificName <chr>, kingdom <chr>,
## # phylum <chr>, order <chr>, family <chr>, genus <chr>, species <chr>,
## # genericName <chr>, specificEpithet <chr>, taxonRank <chr>,
## # dateIdentified <chr>, coordinateUncertaintyInMeters <dbl>, year <int>,
## # month <int>, day <int>, eventDate <chr>, modified <chr>,
## # lastInterpreted <chr>, references <chr>, license <chr>,
## # identifiers <chr>, facts <chr>, relations <chr>, geodeticDatum <chr>,
## # class <chr>, countryCode <chr>, country <chr>, rightsHolder <chr>,
## # identifier <chr>, verbatimEventDate <chr>, datasetName <chr>,
## # collectionCode <chr>, verbatimLocality <chr>, gbifID <chr>,
## # occurrenceID <chr>, taxonID <chr>, catalogNumber <chr>,
## # recordedBy <chr>, http...unknown.org.occurrenceDetails <chr>,
## # institutionCode <chr>, rights <chr>, eventTime <chr>,
## # http...unknown.org.http_..rs.gbif.org.terms.1.0.Multimedia <chr>,
## # identificationID <chr>, occurrenceRemarks <chr>,
## # informationWithheld <chr>, individualCount <int>, sex <chr>,
## # elevation <dbl>, elevationAccuracy <dbl>, continent <chr>,
## # stateProvince <chr>, institutionID <chr>, dynamicProperties <chr>,
## # county <chr>, identificationVerificationStatus <chr>,
## # eventRemarks <chr>, language <chr>, type <chr>, preparations <chr>,
## # locationAccordingTo <chr>, identifiedBy <chr>,
## # georeferencedDate <chr>, nomenclaturalCode <chr>,
## # higherGeography <chr>, georeferencedBy <chr>,
## # georeferenceProtocol <chr>, georeferenceVerificationStatus <chr>,
## # endDayOfYear <chr>, verbatimCoordinateSystem <chr>, locality <chr>,
## # otherCatalogNumbers <chr>,
## # http...unknown.org.http_..rs.tdwg.org.dwc.terms.ResourceRelationship <chr>,
## # organismID <chr>, previousIdentifications <chr>,
## # identificationQualifier <chr>, accessRights <chr>,
## # higherClassification <chr>, georeferenceSources <chr>,
## # identificationRemarks <chr>, lifeStage <chr>,
## # infraspecificEpithet <chr>, …
There are over 200,000 heteromyid records on GBIF. Look at all those cool species of Dipodomys, Perognathus, and Chaetodipus!
What if you weren’t interested in a particular species or family, but wanted to know what was in a particular institution? rgbif has you covered there as well
###DO NOT RUN the below code
BRTC <- occ_search(taxonKey = 359, basisOfRecord = "PRESERVED_SPECIMEN", institutionCode = "TCWC", fields = c('name', 'stateProvince', 'country', 'county', 'locality', 'decimalLongitude', 'decimalLatitude', 'geodeticDatum', 'datasetName', 'institutionCode', 'recordNumber', 'catalogNumber', 'year', 'preparations'), limit = 200000)$data
The above code searches for taxonKey 359 (Mammalia) of PRESERVED_SPECIMENS (so not videos, pictures, or hearsay of a species) at the institutionCode formerly known as the TCWC (our Biodiversity Research and Teaching Collections is still considered the TCWC by GBIF)
###DO RUN the below code instead
BRTC <- read.csv("BRTCMammals.csv", header = T)
dim(BRTC)
## [1] 62415 13
summary(BRTC)
## name stateProvince year
## Peromyscus maniculatus: 3282 Texas :18054 Min. :1929
## Tadarida brasiliensis : 3115 Chiapas : 2866 1st Qu.:1984
## Peromyscus leucopus : 2130 Tamaulipas: 1720 Median :1991
## Artibeus jamaicensis : 2019 Veracruz : 1540 Mean :1991
## Sigmodon hispidus : 1845 Guerrero : 1409 3rd Qu.:1995
## Peromyscus levipes : 1541 (Other) :35661 Max. :2015
## (Other) :48483 NA's : 1165 NA's :58150
## country recordNumber county
## United States:28364 1 : 319 Brewster : 1886
## Mexico :17733 2 : 304 Brazos : 1193
## Honduras : 5417 4 : 284 Hill : 682
## Canada : 1680 3 : 278 Tyler : 587
## Guatemala : 1628 5 : 272 Culberson: 584
## (Other) : 7515 (Other):58570 (Other) :22855
## NA's : 78 NA's : 2388 NA's :34628
## locality
## No specific locality : 661
## 7.4 mi SSW San Lorenzo : 365
## Lancetilla, 40 m : 305
## 10.8 mi S, 2.6 mi W Jicaro Galan: 301
## 9.5 mi NW Melaque : 299
## (Other) :60089
## NA's : 395
## preparations catalogNumber institutionCode
## skin | skull :35050 1 : 1 TCWC:62415
## alcoholic : 9092 10 : 1
## skeleton : 5253 100 : 1
## skull : 5107 1000 : 1
## skin in alcohol | body skeleton: 3336 10000 : 1
## (Other) : 4554 10001 : 1
## NA's : 23 (Other):62409
## decimalLongitude decimalLatitude geodeticDatum
## Min. :-148.21 Min. :-28.42 WGS84:15475
## 1st Qu.:-104.56 1st Qu.: 29.93 NA's :46940
## Median : -98.70 Median : 30.88
## Mean :-100.13 Mean : 32.64
## 3rd Qu.: -96.30 3rd Qu.: 34.95
## Max. : 61.81 Max. : 66.40
## NA's :46940 NA's :46940
Look at all those Peromyscus! These data aren’t up to date, but do include much of the BRTC Mammal collection.
summary(BRTC$country)
## Argentina Australia
## 56 2
## Azerbaijan Belize
## 2 21
## Bermuda Botswana
## 1 12
## Brazil Cameroon
## 51 6
## Canada Chile
## 1680 8
## China Colombia
## 12 173
## Costa Rica Cuba
## 910 54
## Czechia Dominica
## 23 12
## Dominican Republic Ecuador
## 23 513
## Egypt El Salvador
## 5 850
## Ethiopia France
## 224 1
## French Guiana Germany
## 122 2
## Guatemala Guyana
## 1628 27
## Honduras India
## 5417 37
## Indonesia Iraq
## 46 19
## Israel Italy
## 19 4
## Jamaica Japan
## 28 3
## Kenya Korea, Republic of
## 42 3
## Madagascar Malawi
## 2 1
## Malaysia Mexico
## 1 17733
## Morocco Mozambique
## 6 2
## Myanmar Namibia
## 1 52
## Nepal New Caledonia
## 1 11
## Nicaragua Niger
## 1292 27
## Pakistan Panama
## 5 332
## Paraguay Peru
## 2 1242
## Philippines Puerto Rico
## 10 11
## Sierra Leone Somalia
## 1 11
## South Africa Spain
## 77 6
## Sudan Suriname
## 1 24
## Swaziland Switzerland
## 1 12
## Tanzania, United Republic of Thailand
## 6 33
## Trinidad and Tobago Uganda
## 348 3
## United States Venezuela, Bolivarian Republic of
## 28364 639
## Viet Nam Virgin Islands, U.S.
## 14 6
## Zambia Zimbabwe
## 18 6
## NA's
## 78
length(unique(BRTC$country))
## [1] 73
The BRTC Mammal collection houses specimens from over 70 countries.
plot(BRTC[, c("decimalLongitude", "decimalLatitude")])
maps::map(add = T)
Specimens with coordinate information are very US biased.
Index the BRTC records to only pick those from Texas. You can use dplyr::filter
, base::subset
, or %in%
to do this.
TEX <- filter(BRTC, stateProvince == "Texas")
summary(TEX)
## name stateProvince year
## Sigmodon hispidus : 1623 Texas :18054 Min. :1936
## Peromyscus leucopus : 1404 Aguascalientes : 0 1st Qu.:1984
## Geomys breviceps : 883 Ahuachapán : 0 Median :1991
## Tadarida brasiliensis: 745 Al Hillah, Liwa: 0 Mean :1993
## Peromyscus gossypinus: 670 Alabama : 0 3rd Qu.:2004
## Lynx rufus : 664 Alajuela : 0 Max. :2015
## (Other) :12065 (Other) : 0 NA's :16672
## country recordNumber county
## United States:18054 1 : 220 Brewster : 1886
## Argentina : 0 2 : 212 Brazos : 1193
## Australia : 0 5 : 204 Hill : 682
## Azerbaijan : 0 4 : 201 Tyler : 587
## Belize : 0 3 : 198 Culberson: 584
## Bermuda : 0 (Other):16312 (Other) :12811
## (Other) : 0 NA's : 707 NA's : 311
## locality preparations
## No specific locality : 641 skin | skull :8997
## Edwards Plateau : 252 skull :3329
## El Paso : 203 skeleton :2973
## 14 mi SW Hunt, 2200 ft: 197 alcoholic :1187
## College Station : 169 skin in alcohol | body skeleton: 583
## (Other) :16343 (Other) : 975
## NA's : 249 NA's : 10
## catalogNumber institutionCode decimalLongitude decimalLatitude
## 101 : 1 TCWC:18054 Min. :-106.63 Min. :20.56
## 1015 : 1 1st Qu.: -99.57 1st Qu.:29.32
## 1016 : 1 Median : -97.61 Median :30.31
## 1017 : 1 Mean : -98.19 Mean :30.20
## 1018 : 1 3rd Qu.: -96.20 3rd Qu.:31.02
## 1019 : 1 Max. : 61.81 Max. :66.40
## (Other):18048 NA's :7953 NA's :7953
## geodeticDatum
## WGS84:10101
## NA's : 7953
##
##
##
##
##