This example has been created alongside the GBIF Guide
for publishing biological survey and monitoring data to GBIF to
illustrate how data from the National Ecological Observatory
Network (NEON) can be expressed using Darwin Core and the Humboldt Extension for Ecological
Inventories. The workflow produces five Darwin Core tables: Event,
HumboldtEcologicalInventory, Occurrence, ResourceRelationship, and
ExtendedMeasurementOrFact.
The dataset is structured hierarchically, with events defined at
multiple levels (project, domain, site, plot, and plot visit).
Occurrences represent collected ticks as well as subsequent laboratory
results derived from them. Contextual information on sampling design,
effort, and scope is expressed through Humboldt Extension terms.
The U.S. National Science Foundation’s National Ecological Observatory Network (NEON) carries out long-term, standardized monitoring of tick populations and tick-borne pathogens across the United States. Sampling is conducted only at terrestrial sites, where field staff survey designated plots every three to six weeks during the active tick season. Surveys are performed by dragging or flagging along the 160 m perimeter of each 40 x 40 m plot, and ticks of all life stages are collected and preserved in ethanol.
In the laboratory, specimens are identified to species, life stage, and sex when possible. A subset of nymphal ticks is further tested for common pathogens of public health and veterinary concern, including Borrelia burgdorferi (Lyme disease), Anaplasma phagocytophilum (anaplasmosis), Babesia microti (babesiosis), Ehrlichia spp., and several Rickettsia species. Only morphologically identified nymphs are selected for pathogen testing.
More details on NEON’s tick monitoring program are available from the NEON tick data resource page. Protocols referenced in the data can be found in the Document Library.
NEON biological datasets are organized as deeply nested structures that reflect the standardized design of the survey program (Thorpe et al. 2016). At the highest level is the project, which defines the scope of the observatory-wide program. Beneath this, the locality component is divided into three spatial scales: domain, site, and sampling unit (plot).
Locality information at the domain, site, and sampling unit levels is captured at the upper Event levels of the hierarchy. Information specific to individual visits is reported at the lowest Event level. Sampling context is described using the Darwin Core Event Core and Humboldt Extension, while additional measurements are recorded using the Extended Measurement or Fact (eMoF) Extension. Depending on the data collected, additional extensions can be incorporated, such as Multimedia or DNA-derived data.
Occurrence records are linked to the appropriate Event levels to capture what organisms were observed or collected during each visit. In this example, tick specimens collected in the field are represented as preserved specimen occurrences, while pathogen testing results are represented as separate occurrence records. To maintain associations with the original tick specimens, these records are linked back to the host occurrences using the Resource Relationship Extension.
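To illustrate the linkage described above, the sketch below builds ResourceRelationship rows that point each pathogen-test occurrence back to the tick occurrence it was derived from. The occurrence IDs, the `parentTickOccurrenceID` column, the `rr-` ID prefix, and the relationship wording are hypothetical placeholders, not values from the NEON data or from the final mapping.

```r
# Hypothetical pathogen-result occurrences, each carrying the ID of the
# tick occurrence (host specimen) it was derived from
pathogen_occ <- data.frame(
  occurrenceID           = c("path-001", "path-002", "path-003"),
  parentTickOccurrenceID = c("tick-001", "tick-001", "tick-002"),
  stringsAsFactors = FALSE
)

# Build one ResourceRelationship row per pathogen occurrence:
# resourceID is the pathogen record, relatedResourceID the host tick record
build_resource_relationships <- function(occ) {
  data.frame(
    resourceRelationshipID = paste0("rr-", seq_len(nrow(occ))),
    resourceID             = occ$occurrenceID,
    relatedResourceID      = occ$parentTickOccurrenceID,
    relationshipOfResource = "derived from",  # illustrative wording only
    stringsAsFactors = FALSE
  )
}

rr <- build_resource_relationships(pathogen_occ)
print(rr)
```

Several pathogen records can point at the same tick, which is why the relationship lives in its own table rather than in a single column of the Occurrence table.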
See also Box 1 of the Guide.
The code examples in this document use R functions provided by the
neonUtilities package together with dplyr for
data manipulation. These libraries allow automated download and
processing of NEON data products to prepare them for mapping to Darwin
Core and the Humboldt Extension.
library(neonUtilities)
library(dplyr)
library(stringr)
library(httr2)
library(purrr)
library(tibble)
The data used in this example come from two NEON data products: tick pathogen status (DP1.10092.001) and ticks sampled using drag cloths (DP1.10093.001). This workflow uses data from RELEASE-2026.
A NEON token was used to download the original data for this workflow. Tokens are optional but recommended, as they allow authenticated access and help avoid API rate limits when downloading large datasets. Users preparing their own mappings may wish to generate a personal token. For more information, see the Using an API Token when Accessing NEON Data with neonUtilities tutorial and the NEON API token documentation.
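The download code below reads the token with `Sys.getenv("NEON_TOKEN")`, which returns an empty string when the variable is not set. A minimal sketch of making that case explicit is shown here; `resolve_neon_token()` is a hypothetical helper, and it assumes (check the documentation of your installed neonUtilities version) that `loadByProduct()` treats `NA` as "no token". The variable itself can be set, for example, with a line such as `NEON_TOKEN=your-token` in `~/.Renviron`, followed by restarting R.

```r
# Hypothetical helper: return the NEON API token from an environment
# variable, or NA_character_ when the variable is unset or empty.
resolve_neon_token <- function(var = "NEON_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (nzchar(token)) token else NA_character_
}

token <- resolve_neon_token()
if (is.na(token)) {
  message("NEON_TOKEN is not set; downloads will run without a token and may be rate limited.")
}
```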
# Create the working directories used below (no-op if they already exist);
# recursive = TRUE also creates the parent "outputs" directory
for (d in c("data", "outputs/full", "outputs/subset", "outputs/zipped")) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}
if (file.exists("data/tick.pathogen.RData")) {
load("data/tick.pathogen.RData")
} else {
tick.pathogen <- loadByProduct(dpID="DP1.10092.001",
package = "basic",
release = "RELEASE-2026",
token = Sys.getenv("NEON_TOKEN"),
check.size = FALSE)
save(tick.pathogen, file = "data/tick.pathogen.RData")
}
if (file.exists("data/tick.occurrence.RData")) {
load("data/tick.occurrence.RData")
} else {
tick.occurrence <- loadByProduct(dpID="DP1.10093.001",
package = "basic",
release = "RELEASE-2026",
token = Sys.getenv("NEON_TOKEN"),
check.size = FALSE)
save(tick.occurrence, file = "data/tick.occurrence.RData")
}
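The two download blocks above repeat the same "load from disk if cached, otherwise download and save" pattern. A minimal sketch of factoring it into one function is shown below; `load_or_fetch()` is a hypothetical helper, and it uses `saveRDS()`/`readRDS()` instead of `save()`/`load()` so that the cached object is returned as a value rather than restored under a fixed variable name.

```r
# Hypothetical helper: return the cached object if `path` exists,
# otherwise compute it with fetch_fun(), cache it as .rds, and return it.
load_or_fetch <- function(path, fetch_fun) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch_fun()
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, path)
  result
}

# Example usage (not run here, since it needs neonUtilities and network access):
# tick.pathogen <- load_or_fetch("data/tick.pathogen.rds", function() {
#   loadByProduct(dpID = "DP1.10092.001", package = "basic",
#                 release = "RELEASE-2026", token = Sys.getenv("NEON_TOKEN"),
#                 check.size = FALSE)
# })
```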
Catalog numbers for ticks archived at the NEON Biorepository are not available through the data downloads above, so we accessed these directly through the NEON Data API.
Only a subset of ticks is currently housed at the Biorepository; most specimens are held at the U.S. National Tick Collection at Georgia Southern University. We identified archived specimens by their identification protocol version, keeping records whose identificationProtocolVersion begins with "NEON".
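The selection step in the code below (the `filter()` on `identificationProtocolVersion` and `subsampleID`) can be sketched in base R on a few hypothetical rows. The protocol version strings here are invented for illustration and are not real NEON values. Note that `startsWith()` returns `NA` for missing input, so missing values are excluded explicitly, mirroring how `filter()` drops rows whose condition is `NA`.

```r
# Hypothetical rows mimicking tck_taxonomyProcessed (values are invented)
taxonomy <- data.frame(
  subsampleID = c("S1", "S2", "S3", NA, "S2"),
  identificationProtocolVersion = c("NEON.protocol.vA", "USNTC.sop.v3",
                                    "NEON.protocol.vB", "NEON.protocol.vA",
                                    "NEON.protocol.vA"),
  stringsAsFactors = FALSE
)

# Unique subsample IDs identified under a protocol whose version starts with "NEON"
select_biorepository_subsamples <- function(df) {
  keep <- !is.na(df$subsampleID) &
    !is.na(df$identificationProtocolVersion) &
    startsWith(df$identificationProtocolVersion, "NEON")
  unique(df$subsampleID[keep])
}

ids <- select_biorepository_subsamples(taxonomy)
ids  # returns c("S1", "S3", "S2")
```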
if (file.exists("data/catalognumbers.ticks.RData")) {
load("data/catalognumbers.ticks.RData")
} else {
sample_ids_ticks <- tick.occurrence$tck_taxonomyProcessed %>%
filter(
str_starts(identificationProtocolVersion, "NEON"),
!is.na(subsampleID)
) %>%
distinct(subsampleID) %>%
pull(subsampleID)
get_catalog_number_ticks <- function(sub_id) {
resp <- tryCatch({
request("https://data.neonscience.org/api/v0/samples/view") %>%
req_url_query(
apiToken = Sys.getenv("NEON_TOKEN"),
sampleTag = sub_id,
sampleClass = "tck_identification_in.subsampleID"
) %>%
req_retry(max_tries = 3) %>% # retry transient failures (up to 3 attempts in total)
req_timeout(30) %>%
req_perform()
}, error = function(e) NULL)
if (is.null(resp)) {
return(tibble(subsampleID = sub_id, archiveGuid = NA_character_))
}
js <- resp_body_json(resp, simplifyVector = TRUE)
guid <- tryCatch(
js$data$sampleViews$archiveGuid,
error = function(e) NA_character_
)
tibble(
subsampleID = sub_id,
archiveGuid = guid %||% NA_character_
)
}
catalognumbers.ticks <- map_dfr(seq_along(sample_ids_ticks), function(i) {
id <- sample_ids_ticks[i]
if (i %% 100 == 0) message("Processed ", i, " of ", length(sample_ids_ticks))
Sys.sleep(0.26) #rate limiting is 4 requests a second
get_catalog_number_ticks(id)
})
catalognumbers.ticks <- setNames(
catalognumbers.ticks$archiveGuid,
catalognumbers.ticks$subsampleID
)
save(catalognumbers.ticks, file = "data/catalognumbers.ticks.RData")
}
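In the lookup functions, `js$data$sampleViews$archiveGuid` can in principle hold zero, one, or several values. With several, `tibble()` recycles `subsampleID` and the named lookup gains duplicate names. Whether the API ever returns more than one sample view per tag is not established here, so the hypothetical helper below is only a defensive sketch: it always returns exactly one value and could replace the `tryCatch()`/`%||%` step. The mocked response structure mimics what `resp_body_json(simplifyVector = TRUE)` is assumed to produce.

```r
# Hypothetical helper: return the first non-missing archiveGuid from a parsed
# samples/view response, or NA_character_ if there is none.
extract_archive_guid <- function(js) {
  guid <- js$data$sampleViews$archiveGuid
  if (is.null(guid)) return(NA_character_)
  guid <- as.character(guid)
  guid <- guid[!is.na(guid) & nzchar(guid)]
  if (length(guid) == 0) NA_character_ else guid[[1]]
}

# Example with a mocked response (values are hypothetical)
mock <- list(data = list(sampleViews = data.frame(archiveGuid = c(NA, "guid-123"),
                                                  stringsAsFactors = FALSE)))
extract_archive_guid(mock)  # returns "guid-123"
```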
All remaining pathogen genomic extracts from testing are stored at the Biorepository, so we retrieve catalog numbers for every available testing ID.
if (file.exists("data/catalognumbers.pathogens.RData")) {
load("data/catalognumbers.pathogens.RData")
} else {
testing_ids <- tick.pathogen$tck_pathogen %>%
filter(!is.na(testingID)) %>%
distinct(testingID) %>%
pull(testingID)
get_catalog_number_pathogens <- function(test_id) {
resp <- tryCatch({
request("https://data.neonscience.org/api/v0/samples/view") %>%
req_url_query(
apiToken = Sys.getenv("NEON_TOKEN"),
sampleTag = test_id,
sampleClass = "tck_pathogenresults_in.testingID"
) %>%
req_retry(max_tries = 3) %>% # retry transient failures (up to 3 attempts in total)
req_timeout(30) %>%
req_perform()
}, error = function(e) NULL)
if (is.null(resp)) {
return(tibble(testingID = test_id, archiveGuid = NA_character_))
}
js <- resp_body_json(resp, simplifyVector = TRUE)
guid <- tryCatch(
js$data$sampleViews$archiveGuid,
error = function(e) NA_character_
)
tibble(
testingID = test_id,
archiveGuid = guid %||% NA_character_
)
}
catalognumbers.pathogens <- map_dfr(seq_along(testing_ids), function(i) {
id <- testing_ids[i]
if (i %% 100 == 0) message("Processed ", i, " of ", length(testing_ids))
Sys.sleep(0.26) #rate limiting is 4 requests a second
get_catalog_number_pathogens(id)
})
catalognumbers.pathogens <- setNames(
catalognumbers.pathogens$archiveGuid,
catalognumbers.pathogens$testingID
)
save(catalognumbers.pathogens, file = "data/catalognumbers.pathogens.RData")
}
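Both lookups are named character vectors: the names are subsampleID or testingID values, and the elements are archive GUIDs. A minimal sketch of attaching them to occurrence rows, using a hypothetical lookup and hypothetical IDs, is to subset by name. Row order is preserved, and IDs that are absent from the lookup come back as `NA` rather than raising an error.

```r
# Hypothetical lookup with the same shape as catalognumbers.pathogens
catalog_lookup <- c(T1 = "guid-1", T2 = NA, T3 = "guid-3")

# Hypothetical testing IDs in occurrence-row order; "T4" is not in the lookup
occurrence_testing_ids <- c("T3", "T1", "T4", "T2")

# Subsetting by name keeps row order; unmatched names yield NA
catalogNumber <- unname(catalog_lookup[occurrence_testing_ids])
catalogNumber  # returns c("guid-3", "guid-1", NA, NA)
```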
The event hierarchy presented here follows the structure of NEON’s sampling design, with events defined at project, domain, site, sampling unit (plot), and individual sampling event (plot visit) levels. Some event-level information, such as details describing the overall project, individual domains, and sites, is not provided directly in the NEON data download. To capture the full hierarchy of the sampling design, these additional levels are reconstructed and included in the event table. The values are hard-coded from the latest NEON information available at the time of writing (2025-09-24).
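Because the upper levels are reconstructed by hand, it can be worth checking that every parentEventID points at an eventID that exists in the combined Event table. The sketch below uses hypothetical event IDs (the plot-visit ID format in particular is invented) and two hypothetical helpers: `find_orphans()` flags events whose parent is missing, and `ancestry()` walks one event up to the project root.

```r
# Hypothetical rows spanning the five event levels
events <- data.frame(
  eventID       = c("NEON", "D01", "HARV", "HARV_001", "HARV_001_visit1"),
  parentEventID = c(NA, "NEON", "D01", "HARV", "HARV_001"),
  stringsAsFactors = FALSE
)

# eventIDs whose parentEventID is set but does not exist as an eventID
find_orphans <- function(ev) {
  has_parent <- !is.na(ev$parentEventID)
  ev$eventID[has_parent & !(ev$parentEventID %in% ev$eventID)]
}

# Chain of eventIDs from `id` up to the root; stops when a parent is
# missing and errors if the hierarchy contains a cycle
ancestry <- function(ev, id) {
  chain <- id
  parent <- ev$parentEventID[match(id, ev$eventID)]
  while (!is.na(parent)) {
    if (parent %in% chain) stop("Cycle detected at ", parent, call. = FALSE)
    chain <- c(chain, parent)
    parent <- ev$parentEventID[match(parent, ev$eventID)]
  }
  chain
}

find_orphans(events)                 # returns character(0)
ancestry(events, "HARV_001_visit1")  # returns c("HARV_001_visit1", "HARV_001", "HARV", "D01", "NEON")
```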
Domain-level metadata was compiled from NEON documentation and geospatial files. A polygon shapefile defining NEON’s domain boundaries was downloaded from NEONDomains_2024.zip (last updated October 2024, CRS: Geographic WGS 84). Domain centroids (latitude and longitude) and associated uncertainty values were calculated from this shapefile using ArcGIS.
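The exact ArcGIS procedure used for the domain centroids and uncertainties is not documented here. As a rough, hypothetical illustration of the same idea in base R, the sketch below computes an area-weighted centroid with the planar shoelace formula on longitude/latitude degrees and takes the uncertainty as the great-circle distance from the centroid to the farthest vertex, similar in spirit to the point-radius georeferencing method. This is not necessarily what ArcGIS computed: the planar formula ignores Earth curvature, multipart domains would need each part handled, and working in a projected CRS (for example with the sf package) would be more accurate.

```r
# Area-weighted centroid of a simple polygon from vertex coordinates
# (planar shoelace formula applied to degrees; an approximation)
polygon_centroid <- function(lon, lat) {
  x2 <- c(lon[-1], lon[1])
  y2 <- c(lat[-1], lat[1])
  cross <- lon * y2 - x2 * lat
  area <- sum(cross) / 2
  c(lon = sum((lon + x2) * cross) / (6 * area),
    lat = sum((lat + y2) * cross) / (6 * area))
}

# Great-circle (haversine) distance in meters on a spherical Earth
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Uncertainty: distance from the centroid to the farthest polygon vertex
centroid_uncertainty_m <- function(lon, lat) {
  ctr <- polygon_centroid(lon, lat)
  max(haversine_m(ctr[["lon"]], ctr[["lat"]], lon, lat))
}

# Example: a 2 x 2 degree square near the equator (hypothetical polygon)
sq_lon <- c(0, 0, 2, 2)
sq_lat <- c(0, 2, 2, 0)
polygon_centroid(sq_lon, sq_lat)  # returns c(lon = 1, lat = 1)
centroid_uncertainty_m(sq_lon, sq_lat)
```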
Site-level metadata was retrieved directly from NEON’s public API on September 24, 2025 using the field site metadata download (NEON_Field_Site_Metadata_20250924). This download represents a flattened subset of the full JSON objects returned by the locations/sites API endpoint.
All fields from the survey event table template are retained for completeness, even when particular values are not applicable for this dataset.
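Keeping a common set of fields also matters when the levels are combined into one Event table. In this workflow, the site-level table adds county and stateProvince, and the site- and plot-level tables add elevation fields that the project- and domain-level tables lack, so base `rbind()` cannot stack them directly. `dplyr::bind_rows()` fills missing columns with `NA`; the sketch below shows the same idea in base R with a hypothetical helper and hypothetical rows.

```r
# Hypothetical helper: stack data frames with different column sets,
# filling absent columns with NA and keeping the first-seen column order
stack_event_levels <- function(...) {
  tables <- list(...)
  all_cols <- unique(unlist(lapply(tables, names)))
  filled <- lapply(tables, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))
    }
    df[, all_cols, drop = FALSE]
  })
  out <- do.call(rbind, filled)
  rownames(out) <- NULL
  out
}

# Hypothetical project- and site-level rows
project <- data.frame(eventID = "NEON", eventType = "project", stringsAsFactors = FALSE)
site <- data.frame(eventID = "HARV", parentEventID = "D01",
                   minimumElevationInMeters = 340, stringsAsFactors = FALSE)
stack_event_levels(project, site)
```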
The project event describes the overall tick monitoring program within NEON. It defines the scope of the survey (taxonomic, geographic, temporal), the general sampling protocols, and whether absence, abundance, and material samples are reported. The data for the Event Core and Humboldt extension are split into two separate files.
Mapping Event core fields for the project event level:
NEON_project_event_core <- data.frame(eventID = "NEON",
parentEventID = NA,
fieldNumber = NA,
habitat = NA,
locationID = NA,
countryCode = "US",
decimalLatitude = NA,
decimalLongitude = NA,
coordinateUncertaintyInMeters = NA,
geodeticDatum = NA,
locality = NA,
sampleSizeValue = NA,
sampleSizeUnit = NA,
footprintWKT = NA,
footprintSRS = NA,
eventDate = paste(
min(tick.occurrence$tck_fielddata$collectDate, na.rm = TRUE),
max(tick.occurrence$tck_fielddata$collectDate, na.rm = TRUE),
sep = "/"
),
eventTime = NA,
eventType = "project",
samplingProtocol = "Drag sampling | Flagging",
dataGeneralizations = NA,
informationWithheld = NA,
fieldNotes = NA,
eventRemarks = NA,
identifiedBy = paste(unique(tick.occurrence$tck_taxonomyProcessed$laboratoryName), collapse = " | ")
)
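The eventDate ranges in this workflow are built by pasting the minimum and maximum collectDate values. Assuming collectDate is stored as a POSIXct date-time (neonUtilities normally converts date-time fields this way), `paste()` renders each end point with R's default character format, which separates date and time with a space rather than the ISO 8601 "T". If a date-only ISO 8601 interval is preferred, a hypothetical helper like the one below could replace those expressions. Taking dates in UTC is an assumption: a collection late in the local day could fall on the next UTC date.

```r
# Hypothetical helper: earliest/latest dates as an ISO 8601 date interval
# (YYYY-MM-DD/YYYY-MM-DD), or a single date if both ends fall on the same day
iso_date_interval <- function(x, tz = "UTC") {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)
  ends <- format(range(x), "%Y-%m-%d", tz = tz)
  if (ends[1] == ends[2]) ends[1] else paste(ends, collapse = "/")
}

# Example with hypothetical collection times
collect <- as.POSIXct(c("2014-04-08 14:00:00", NA, "2024-09-30 09:30:00"), tz = "UTC")
iso_date_interval(collect)  # returns "2014-04-08/2024-09-30"
```

The same function can be passed to `tapply()` in place of the anonymous `paste()` function used at the domain, site, and plot levels.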
Mapping Humboldt extension fields for the project event level:
NEON_project_event_humboldt <- data.frame(eventID = "NEON",
siteNestingDescription = "46 terrestrial sites each with at least six 40x40 m plots designated for tick sampling (plots may be decommissioned and reassigned as necessary, but six plots are required for sampling) located across 19 ecoclimatic domains",
siteCount = length(unique(tick.occurrence$tck_fielddata$plotID)),
verbatimSiteNames = paste(sort(unique(tick.occurrence$tck_fielddata$plotID)), collapse = " | "),
verbatimSiteDescriptions = NA,
reportedWeather = NA,
reportedExtremeConditions = NA,
totalAreaSampledValue = NA,
totalAreaSampledUnit = NA,
geospatialScopeAreaValue = 9428288, #full geographic extent of USA-NEON boundaries
geospatialScopeAreaUnit = "km²",
isVegetationCoverReported = "false",
eventDurationValue = NA,
eventDurationUnit = NA,
inventoryTypes = NA,
compilationTypes = NA,
compilationSourceTypes = NA,
protocolNames = "Drag sampling | Flagging",
protocolDescriptions = "Ticks are sampled at six plots per site every three to six weeks, depending on prior detections. Sampling follows the 160 m perimeter of each 40 x 40 m plot, using drag cloths (or flagging in dense vegetation). Ticks from all life stages are collected every few meters and preserved in ethanol, with bouts scheduled between green-up and dormancy. Samples are sent to external labs, where they are sorted and counted, and a subset of identified nymphs is tested for pathogens.",
protocolReferences = "Paull, S. 2022. TOS Protocol and Procedure: TCK - Tick and Tick-Borne Pathogen Sampling. NEON.DOC.014045. NEON (National Ecological Observatory Network). | Laboratory of Medical Zoology (LMZ). 2023. NEON Tick Pathogen Testing SOP, V4.01. University of Massachusetts, Amherst. | Beati, L. 2021. Tick Identification Instructions, USNTC Standard Operating Procedure (SOP). V3. Georgia Southern University, US National Tick Collection (USNTC).",
isAbsenceReported = NA,
absentTaxa = NA,
isAbundanceReported = "true",
isAbundanceCapReported = "false",
abundanceCap = NA,
hasMaterialSamples = "true",
materialSampleTypes = "wholeOrganism",
hasVouchers = "true",
voucherInstitutions = "US National Tick Collection, Georgia Southern University | NEON Biorepository, Arizona State University",
isLeastSpecificTargetCategoryQuantityInclusive = "false",
verbatimTargetScope = "ticks | selected tick pathogens",
targetTaxonomicScope = "Ixodidae | Borrelia burgdorferi | Borrelia miyamotoi | Anaplasma phagocytophilum | Rickettsia rickettsii | Ehrlichia chaffeensis | Borrelia lonestari | Babesia microti | Ehrlichia ewingii | Borrelia burgdorferi sensu lato | Ehrlichia muris-like | Borrelia mayonii | Francisella tularensis | Borrelia sp. | Rickettsia parkeri | Rickettsia philipii",
excludedTaxonomicScope = NA,
isTaxonomicScopeFullyReported = "true",
taxonCompletenessReported = "notReported",
taxonCompletenessProtocols = "Tick drag/flag methods are standardized to maximize detection of all life stages of Ixodidae during the active season.",
hasNonTargetTaxa = "false",
areNonTargetTaxaFullyReported = NA,
nonTargetTaxa = NA,
targetLifeStageScope = "adult | nymph | larvae",
excludedLifeStageScope = "egg",
isLifeStageScopeFullyReported = "true",
targetDegreeOfEstablishmentScope = NA,
excludedDegreeOfEstablishmentScope = NA,
isDegreeOfEstablishmentScopeFullyReported = NA,
targetGrowthFormScope = NA,
excludedGrowthFormScope = NA,
isGrowthFormScopeFullyReported = NA,
hasNonTargetOrganisms = NA,
targetHabitatScope = NA,
excludedHabitatScope = NA,
samplingEffort = "6 plots per bout, 160 m perimeter sampled per plot",
isSamplingEffortReported = "true",
samplingEffortProtocol = "Ticks are sampled at six plots per site, with bouts every 3 or 6 weeks depending on intensity. Each bout consists of dragging (or flagging if needed) along the full 160 m perimeter of each 40 x 40 m plot. Ticks of all life stages are collected at intervals and preserved in ethanol.",
samplingEffortValue = "6, 160",
samplingEffortUnit = "plots per bout, meters per plot circuit",
samplingPerformedBy = "NEON Field Staff"
)
Domains represent NEON’s ecoclimatic regions. Each domain includes one or more terrestrial sites where tick sampling occurs. Tick data are available for 19 of the 20 NEON domains, as no tick sampling is conducted in Domain 20 (Hawai‘i).
Setup:
NEON_domain_metadata <- read.csv("data/NEON_Domain_Metadata_20250924.csv", stringsAsFactors = FALSE)
NEON_domain_list <- unique(tick.occurrence$tck_fielddata[, c("domainID")])
# position of each sampled domain in the domain metadata table
domain_idx <- match(NEON_domain_list, NEON_domain_metadata$domain_id)
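`match()` returns `NA` for any domain code that has no row in the metadata CSV, and subsetting a column with an `NA` index silently returns `NA` instead of failing. A domain missing from the CSV would therefore produce an event row with an `NA` eventID and a locality of "NA (NA)". A hypothetical wrapper that fails loudly instead is sketched below with made-up metadata rows; the same check applies to `site_idx` in the site-level setup.

```r
# Made-up stand-in for the domain metadata table
demo_metadata <- data.frame(
  domain_id   = c("D01", "D02", "D03"),
  domain_name = c("Northeast", "Mid-Atlantic", "Southeast"),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: like match(), but stops if any key is unmatched
match_or_stop <- function(keys, table, what = "key") {
  idx <- match(keys, table)
  if (anyNA(idx)) {
    stop("No metadata row for ", what, "(s): ",
         paste(unique(keys[is.na(idx)]), collapse = ", "), call. = FALSE)
  }
  idx
}

domain_idx_demo <- match_or_stop(c("D01", "D03"), demo_metadata$domain_id, "domain")
demo_metadata$domain_name[domain_idx_demo]  # returns c("Northeast", "Southeast")
```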
Mapping Event core fields for the domain event level:
added_domain_data_core <- data.frame(eventID = NEON_domain_metadata$domain_id[domain_idx],
parentEventID = "NEON",
fieldNumber = NA,
habitat = NA,
locationID = NEON_domain_metadata$domain_id[domain_idx],
countryCode = "US",
decimalLatitude = NEON_domain_metadata$decimalLatitude[domain_idx],
decimalLongitude = NEON_domain_metadata$decimalLongitude[domain_idx],
coordinateUncertaintyInMeters = NEON_domain_metadata$coordinateUncertaintyinMeters[domain_idx],
geodeticDatum = "WGS84",
locality = paste(NEON_domain_metadata$domain_name[domain_idx], " (", NEON_domain_metadata$domain_id[domain_idx], ")", sep = ""),
sampleSizeValue = NA,
sampleSizeUnit = NA,
footprintWKT = NA,
footprintSRS = NA,
eventDate = tapply(
tick.occurrence$tck_fielddata$collectDate,
tick.occurrence$tck_fielddata$domainID,
function(x) paste(min(x, na.rm = TRUE), max(x, na.rm = TRUE), sep = "/")
)[NEON_domain_list],
eventTime = NA,
eventType = "survey", #eventType describes the event, not the internal vocabulary
samplingProtocol = "Drag sampling | Flagging",
dataGeneralizations = NA,
informationWithheld = NA,
fieldNotes = NA,
eventRemarks = ifelse(!(NEON_domain_list %in% (
tick.occurrence$tck_taxonomyProcessed %>%
left_join(tick.occurrence$tck_fielddata %>% dplyr::select(sampleID, domainID), by = c("sampleID", "domainID")) %>%
dplyr::distinct(domainID) %>%
dplyr::pull(domainID)
)),
"identifiedBy is NA because there are no occurrence records for this event",
NA),
identifiedBy = tick.occurrence$tck_taxonomyProcessed %>%
left_join(tick.occurrence$tck_fielddata %>% select(sampleID, domainID),
by = c("sampleID", "domainID")) %>%
group_by(domainID) %>%
summarise(identifiedBy = paste(sort(unique(laboratoryName)), collapse = " | "),
.groups = "drop") %>%
right_join(data.frame(domainID = NEON_domain_list), by = "domainID") %>%
arrange(match(domainID, NEON_domain_list)) %>%
mutate(identifiedBy = ifelse(is.na(identifiedBy), NA, identifiedBy)) %>%
pull(identifiedBy)
)
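The identifiedBy pipeline above joins, summarises, and right-joins back onto the domain list so that every domain receives a value, with `NA` where no identifications exist. A compact base R sketch of the same aggregation, on hypothetical records, is shown below; it also derives eventRemarks from the resulting vector, so the two fields cannot disagree.

```r
# Hypothetical identification records and the full list of domains
id_records <- data.frame(
  domainID       = c("D01", "D01", "D03", "D01"),
  laboratoryName = c("Lab B", "Lab A", "Lab A", "Lab B"),
  stringsAsFactors = FALSE
)
domain_list_demo <- c("D01", "D02", "D03")

# Sorted, de-duplicated, pipe-separated lab names per domain
labs <- tapply(id_records$laboratoryName, id_records$domainID,
               function(x) paste(sort(unique(x)), collapse = " | "))
# Convert the 1-d array returned by tapply() to a plain named vector
labs <- setNames(as.vector(labs), names(labs))

# Domains without records get NA, and the remark follows from that NA
identifiedBy <- unname(labs[domain_list_demo])
eventRemarks <- ifelse(is.na(identifiedBy),
                       "identifiedBy is NA because there are no occurrence records for this event",
                       NA)
identifiedBy  # returns c("Lab A | Lab B", NA, "Lab A")
```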
Mapping Humboldt extension fields for the domain event level:
added_domain_data_humboldt <- data.frame(eventID = NEON_domain_metadata$domain_id[domain_idx],
siteNestingDescription = "Each domain contains 1-3 terrestrial field sites, each with at least six 40x40 m plots designated for tick sampling (plots may be decommissioned and reassigned as necessary, but six plots are required for sampling)",
siteCount = tapply(tick.occurrence$tck_fielddata$plotID,
tick.occurrence$tck_fielddata$domainID,
function(x) length(unique(x)))[NEON_domain_list],
verbatimSiteNames = tapply(tick.occurrence$tck_fielddata$plotID,
tick.occurrence$tck_fielddata$domainID,
function(x) paste(sort(unique(x)), collapse = " | ")
)[NEON_domain_list],
verbatimSiteDescriptions = NA,
reportedWeather = NA,
reportedExtremeConditions = NA,
totalAreaSampledValue = NA,
totalAreaSampledUnit = NA,
geospatialScopeAreaValue = NEON_domain_metadata$sq_km[domain_idx],
geospatialScopeAreaUnit = "km²",
isVegetationCoverReported = "false",
eventDurationValue = NA,
eventDurationUnit = NA,
inventoryTypes = NA,
compilationTypes = NA,
compilationSourceTypes = NA,
protocolNames = "Drag sampling | Flagging",
protocolDescriptions = "Ticks are sampled at six plots per site every three to six weeks, depending on prior detections. Sampling follows the 160 m perimeter of each 40 x 40 m plot, using drag cloths (or flagging in dense vegetation). Ticks from all life stages are collected every few meters and preserved in ethanol, with bouts scheduled between green-up and dormancy. Samples are sent to external labs, where they are sorted and counted, and a subset of identified nymphs is tested for pathogens.",
protocolReferences = "Paull, S. 2022. TOS Protocol and Procedure: TCK - Tick and Tick-Borne Pathogen Sampling. NEON.DOC.014045. NEON (National Ecological Observatory Network). | Laboratory of Medical Zoology (LMZ). 2023. NEON Tick Pathogen Testing SOP, V4.01. University of Massachusetts, Amherst. | Beati, L. 2021. Tick Identification Instructions, USNTC Standard Operating Procedure (SOP). V3. Georgia Southern University, US National Tick Collection (USNTC).",
isAbsenceReported = NA,
absentTaxa = NA,
isAbundanceReported = "true",
isAbundanceCapReported = "false",
abundanceCap = NA,
hasMaterialSamples = "true",
materialSampleTypes = "wholeOrganism",
hasVouchers = "true",
voucherInstitutions = "US National Tick Collection, Georgia Southern University | NEON Biorepository, Arizona State University",
isLeastSpecificTargetCategoryQuantityInclusive = "false",
verbatimTargetScope = "ticks | selected tick pathogens",
targetTaxonomicScope = "Ixodidae | Borrelia burgdorferi | Borrelia miyamotoi | Anaplasma phagocytophilum | Rickettsia rickettsii | Ehrlichia chaffeensis | Borrelia lonestari | Babesia microti | Ehrlichia ewingii | Borrelia burgdorferi sensu lato | Ehrlichia muris-like | Borrelia mayonii | Francisella tularensis | Borrelia sp. | Rickettsia parkeri | Rickettsia philipii",
excludedTaxonomicScope = NA,
isTaxonomicScopeFullyReported = "true",
taxonCompletenessReported = "notReported",
taxonCompletenessProtocols = "Tick drag/flag methods are standardized to maximize detection of all life stages of Ixodidae during the active season.",
hasNonTargetTaxa = "false",
areNonTargetTaxaFullyReported = NA,
nonTargetTaxa = NA,
targetLifeStageScope = "adult | nymph | larvae",
excludedLifeStageScope = "egg",
isLifeStageScopeFullyReported = "true",
targetDegreeOfEstablishmentScope = NA,
excludedDegreeOfEstablishmentScope = NA,
isDegreeOfEstablishmentScopeFullyReported = NA,
targetGrowthFormScope = NA,
excludedGrowthFormScope = NA,
isGrowthFormScopeFullyReported = NA,
hasNonTargetOrganisms = NA,
targetHabitatScope = NA,
excludedHabitatScope = NA,
samplingEffort = "6 plots per bout, 160 m perimeter sampled per plot",
isSamplingEffortReported = "true",
samplingEffortProtocol = "Ticks are sampled at six plots per site, with bouts every 3 or 6 weeks depending on intensity. Each bout consists of dragging (or flagging if needed) along the full 160 m perimeter of each 40 x 40 m plot. Ticks of all life stages are collected at intervals and preserved in ethanol.",
samplingEffortValue = "6, 160",
samplingEffortUnit = "plots per bout, meters per plot circuit",
samplingPerformedBy = "NEON Field Staff"
)
Each NEON terrestrial site where ticks are sampled contains at least six 40x40 m plots designated for tick sampling. Tick sampling is conducted only at terrestrial sites, of which NEON has 47; because no tick sampling is performed at PUUM in Hawai‘i (Domain 20), only 46 sites are included here. Site events record location metadata, habitat information, and elevation.
Setup:
NEON_site_metadata <- read.csv(
"https://www.neonscience.org/field-sites/exports/NEON_Field_Site_Metadata_20250924",
stringsAsFactors = FALSE
)
NEON_site_list <- unique(tick.occurrence$tck_fielddata[, c("siteID")])
# position of each sampled site in the site metadata table
site_idx <- match(NEON_site_list, NEON_site_metadata$site_id)
Mapping Event core fields for the site event level:
added_site_data_core <- data.frame(eventID = NEON_site_metadata$site_id[site_idx],
parentEventID = NEON_site_metadata$domain_id[site_idx],
fieldNumber = NA,
habitat = gsub("\\|", " | ", NEON_site_metadata$dominant_nlcd_classes[site_idx]),
locationID = NEON_site_metadata$site_id[site_idx],
countryCode = "US",
county = NEON_site_metadata$site_county[site_idx],
stateProvince = NEON_site_metadata$site_state[site_idx],
decimalLatitude = NEON_site_metadata$latitude[site_idx],
decimalLongitude = NEON_site_metadata$longitude[site_idx],
coordinateUncertaintyInMeters = NA,
geodeticDatum = "WGS84",
locality = paste(NEON_site_metadata$site_name[site_idx], " (", NEON_site_metadata$site_id[site_idx], ")", sep = ""),
minimumElevationInMeters = NEON_site_metadata$minimum_elevation_m[site_idx],
maximumElevationInMeters = NEON_site_metadata$maximum_elevation_m[site_idx],
verbatimElevation = paste(NEON_site_metadata$mean_evelation_m[site_idx], "m", sep=""),
sampleSizeValue = NA,
sampleSizeUnit = NA,
footprintWKT = NA,
footprintSRS = NA,
eventDate = tapply(
tick.occurrence$tck_fielddata$collectDate,
tick.occurrence$tck_fielddata$siteID,
function(x) paste(min(x, na.rm = TRUE), max(x, na.rm = TRUE), sep = "/")
)[NEON_site_list],
eventTime = NA,
eventType = "survey",
samplingProtocol = "Drag sampling | Flagging",
dataGeneralizations = NA,
informationWithheld = NA,
fieldNotes = NA,
eventRemarks = ifelse(!(NEON_site_list %in% (
tick.occurrence$tck_taxonomyProcessed %>%
left_join(tick.occurrence$tck_fielddata %>% dplyr::select(sampleID, siteID), by = c("sampleID", "siteID")) %>%
dplyr::distinct(siteID) %>%
dplyr::pull(siteID)
)),
"identifiedBy is NA because there are no occurrence records for this event",
NA),
identifiedBy = tick.occurrence$tck_taxonomyProcessed %>%
left_join(tick.occurrence$tck_fielddata %>% select(sampleID, siteID),
by = c("sampleID", "siteID")) %>%
group_by(siteID) %>%
summarise(identifiedBy = paste(sort(unique(laboratoryName)), collapse = " | "),
.groups = "drop") %>%
right_join(data.frame(siteID = NEON_site_list), by = "siteID") %>%
arrange(match(siteID, NEON_site_list)) %>%
mutate(identifiedBy = ifelse(is.na(identifiedBy), NA, identifiedBy)) %>%
pull(identifiedBy)
)
Mapping Humboldt extension fields for the site event level:
added_site_data_humboldt <- data.frame(eventID = NEON_site_metadata$site_id[site_idx],
siteNestingDescription = "Each site contains at least six 40x40 m plots designated for tick sampling (plots may be decommissioned and reassigned as necessary, but six plots are required for sampling)",
siteCount = tapply(tick.occurrence$tck_fielddata$plotID,
tick.occurrence$tck_fielddata$siteID,
function(x) length(unique(x)))[NEON_site_list],
verbatimSiteNames = tapply(tick.occurrence$tck_fielddata$plotID,
tick.occurrence$tck_fielddata$siteID,
function(x) paste(sort(unique(x)), collapse = " | ")
)[NEON_site_list],
verbatimSiteDescriptions = NEON_site_metadata$site_type[site_idx],
reportedWeather = NA,
reportedExtremeConditions = NA,
totalAreaSampledValue = NA,
totalAreaSampledUnit = NA,
geospatialScopeAreaValue = NEON_site_metadata$terrestrial_sampling_boundary_size_km2[site_idx],
geospatialScopeAreaUnit = "km²",
isVegetationCoverReported = "false",
eventDurationValue = NA,
eventDurationUnit = NA,
inventoryTypes = NA,
compilationTypes = NA,
compilationSourceTypes = NA,
protocolNames = "Drag sampling | Flagging",
protocolDescriptions = "Ticks are sampled at six plots per site every three to six weeks, depending on prior detections. Sampling follows the 160 m perimeter of each 40 x 40 m plot, using drag cloths (or flagging in dense vegetation). Ticks from all life stages are collected every few meters and preserved in ethanol, with bouts scheduled between green-up and dormancy. Samples are sent to external labs, where they are sorted and counted, and a subset of identified nymphs is tested for pathogens.",
protocolReferences = "Paull, S. 2022. TOS Protocol and Procedure: TCK - Tick and Tick-Borne Pathogen Sampling. NEON.DOC.014045. NEON (National Ecological Observatory Network). | Laboratory of Medical Zoology (LMZ). 2023. NEON Tick Pathogen Testing SOP, V4.01. University of Massachusetts, Amherst. | Beati, L. 2021. Tick Identification Instructions, USNTC Standard Operating Procedure (SOP). V3. Georgia Southern University, US National Tick Collection (USNTC).",
isAbsenceReported = NA,
absentTaxa = NA,
isAbundanceReported = "true",
isAbundanceCapReported = "false",
abundanceCap = NA,
hasMaterialSamples = "true",
materialSampleTypes = "wholeOrganism",
hasVouchers = "true",
voucherInstitutions = "US National Tick Collection, Georgia Southern University | NEON Biorepository, Arizona State University",
isLeastSpecificTargetCategoryQuantityInclusive = "false",
verbatimTargetScope = "ticks | selected tick pathogens",
targetTaxonomicScope = "Ixodidae | Borrelia burgdorferi | Borrelia miyamotoi | Anaplasma phagocytophilum | Rickettsia rickettsii | Ehrlichia chaffeensis | Borrelia lonestari | Babesia microti | Ehrlichia ewingii | Borrelia burgdorferi sensu lato | Ehrlichia muris-like | Borrelia mayonii | Francisella tularensis | Borrelia sp. | Rickettsia parkeri | Rickettsia philipii",
excludedTaxonomicScope = NA,
isTaxonomicScopeFullyReported = "true",
taxonCompletenessReported = "notReported",
taxonCompletenessProtocols = "Tick drag/flag methods are standardized to maximize detection of all life stages of Ixodidae during the active season.",
hasNonTargetTaxa = "false",
areNonTargetTaxaFullyReported = NA,
nonTargetTaxa = NA,
targetLifeStageScope = "adult | nymph | larvae",
excludedLifeStageScope = "egg",
isLifeStageScopeFullyReported = "true",
targetDegreeOfEstablishmentScope = NA,
excludedDegreeOfEstablishmentScope = NA,
isDegreeOfEstablishmentScopeFullyReported = NA,
targetGrowthFormScope = NA,
excludedGrowthFormScope = NA,
isGrowthFormScopeFullyReported = NA,
hasNonTargetOrganisms = NA,
targetHabitatScope = NA,
excludedHabitatScope = NA,
samplingEffort = "6 plots per bout, 160 m perimeter sampled per plot",
isSamplingEffortReported = "true",
samplingEffortProtocol = "Ticks are sampled at six plots per site, with bouts every 3 or 6 weeks depending on intensity. Each bout consists of dragging (or flagging if needed) along the full 160 m perimeter of each 40 x 40 m plot. Ticks of all life stages are collected at intervals and preserved in ethanol.",
samplingEffortValue = "6, 160",
samplingEffortUnit = "plots per bout, meters per plot circuit",
samplingPerformedBy = "NEON Field Staff"
)
Plot events represent the smallest fixed spatial unit within NEON’s tick monitoring design. Each plot is a 40x40 m area where drag or flag sampling is carried out. Plot-level metadata include plot identifiers, habitat classification, and elevation.
Setup:
NEON_plot_metadata <- unique(tick.occurrence$tck_fielddata[, c("namedLocation",
"siteID",
"plotID",
"plotType",
"nlcdClass",
"decimalLatitude",
"decimalLongitude",
"geodeticDatum",
"coordinateUncertainty",
"elevation",
"elevationUncertainty")])
NEON_plot_metadata <- NEON_plot_metadata %>% # a few plots have two recorded elevations and therefore appear twice; keep only the first row for each plot
distinct(namedLocation, plotID, .keep_all = TRUE)
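`distinct(namedLocation, plotID, .keep_all = TRUE)` keeps the first row it meets for each plot, so which elevation survives depends on the row order in `tck_fielddata`. The sketch below, on hypothetical rows, shows the base R equivalent and an alternative that averages the duplicate elevations so the result does not depend on row order. Averaging is only one option; whether it is appropriate depends on why the recorded elevations differ.

```r
# Hypothetical plot rows; HARV_001 has two recorded elevations
plots <- data.frame(
  namedLocation = c("HARV_001.tickPlot.tck", "HARV_001.tickPlot.tck", "HARV_002.tickPlot.tck"),
  plotID        = c("HARV_001", "HARV_001", "HARV_002"),
  elevation     = c(350.1, 351.4, 340.0),
  stringsAsFactors = FALSE
)

# Keep the first row per plot (the behaviour of distinct(.keep_all = TRUE))
first_per_plot <- plots[!duplicated(plots[, c("namedLocation", "plotID")]), ]

# Alternative: one row per plot with the mean of the recorded elevations
mean_per_plot <- aggregate(elevation ~ namedLocation + plotID, data = plots, FUN = mean)
```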
Mapping Event core fields for the plot event level:
added_plot_data_core <- data.frame(eventID = NEON_plot_metadata$plotID,
parentEventID = NEON_plot_metadata$siteID,
fieldNumber = NA,
habitat = NEON_plot_metadata$nlcdClass,
locationID = NEON_plot_metadata$plotID,
countryCode = "US",
decimalLatitude = NEON_plot_metadata$decimalLatitude,
decimalLongitude = NEON_plot_metadata$decimalLongitude,
coordinateUncertaintyInMeters = NEON_plot_metadata$coordinateUncertainty,
geodeticDatum = "WGS84",
locality = NEON_plot_metadata$plotID,
minimumElevationInMeters = NEON_plot_metadata$elevation - NEON_plot_metadata$elevationUncertainty,
maximumElevationInMeters = NEON_plot_metadata$elevation + NEON_plot_metadata$elevationUncertainty,
verbatimElevation = paste(NEON_plot_metadata$elevation, "m", sep=""),
sampleSizeValue = 40 * 4,
sampleSizeUnit = "m",
footprintWKT = NA,
footprintSRS = NA,
eventDate = tapply(
tick.occurrence$tck_fielddata$collectDate,
tick.occurrence$tck_fielddata$plotID,
function(x) paste(min(x, na.rm = TRUE), max(x, na.rm = TRUE), sep = "/")
)[NEON_plot_metadata$plotID],
eventTime = NA,
eventType = "survey",
samplingProtocol = "Drag sampling | Flagging",
dataGeneralizations = NA,
informationWithheld = NA,
fieldNotes = NA,
eventRemarks = ifelse(!(NEON_plot_metadata$plotID %in% names(tapply(
tick.occurrence$tck_taxonomyProcessed$laboratoryName,
tick.occurrence$tck_taxonomyProcessed$plotID,
function(x) paste(sort(unique(x)), collapse = " | ")
))),
"identifiedBy is NA because there are no occurrence records for this event",
NA
),
identifiedBy = tapply(
tick.occurrence$tck_taxonomyProcessed$laboratoryName,
tick.occurrence$tck_taxonomyProcessed$plotID,
function(x) paste(sort(unique(x)), collapse = " | ")
)[NEON_plot_metadata$plotID]
)
Mapping Humboldt extension fields for the plot event level:
added_plot_data_humboldt <- data.frame(eventID = NEON_plot_metadata$plotID,
siteNestingDescription = "One 40x40 m plot designated for tick sampling",
siteCount = 1,
verbatimSiteNames = NEON_plot_metadata$plotID,
verbatimSiteDescriptions = NA,
reportedWeather = NA,
reportedExtremeConditions = NA,
totalAreaSampledValue = NA,
totalAreaSampledUnit = NA,
geospatialScopeAreaValue = NA,
geospatialScopeAreaUnit = NA,
isVegetationCoverReported = "false",
eventDurationValue = NA,
eventDurationUnit = NA,
inventoryTypes = NA,
compilationTypes = NA,
compilationSourceTypes = NA,
protocolNames = "Drag sampling | Flagging",
protocolDescriptions = "Ticks are sampled at six plots per site every three to six weeks, depending on prior detections. Sampling follows the 160 m perimeter of each 40 x 40 m plot, using drag cloths (or flagging in dense vegetation). Ticks from all life stages are collected every few meters and preserved in ethanol, with bouts scheduled between green-up and dormancy. Samples are sent to external labs where they are sorted, counted and a subset of identified nymphs are tested for pathogens.",
protocolReferences = "Paull, S. 2022. TOS Protocol and Procedure: TCK - Tick and Tick-Borne Pathogen Sampling. NEON.DOC.014045. NEON (National Ecological Observatory Network). | Laboratory of Medical Zoology (LMZ). 2023. NEON Tick Pathogen Testing SOP, V4.01. University of Massachusetts, Amherst. | Beati, L. 2021. Tick Identification Instructions, USNTC Standard Operating Procedure (SOP). V3. Georgia Southern University, US National Tick Collection (USNTC).",
isAbsenceReported = NA,
absentTaxa = NA,
isAbundanceReported = "true",
isAbundanceCapReported = "false",
abundanceCap = NA,
hasMaterialSamples = "true",
materialSampleTypes = "wholeOrganism",
hasVouchers = "true",
voucherInstitutions = "US National Tick Collection, Georgia Southern University | NEON Biorepository, Arizona State University",
isLeastSpecificTargetCategoryQuantityInclusive = "false",
verbatimTargetScope = "ticks | selected tick pathogens",
targetTaxonomicScope = "Ixodidae | Borrelia burgdorferi | Borrelia miyamotoi | Anaplasma phagocytophilum | Rickettsia rickettsii | Ehrlichia chaffeensis | Borrelia lonestari | Babesia microti | Ehrlichia ewingii | Borrelia burgdorferi sensu lato | Ehrlichia muris-like | Borrelia mayonii | Francisella tularensis | Borrelia sp. | Rickettsia parkeri | Rickettsia philipii",
excludedTaxonomicScope = NA,
isTaxonomicScopeFullyReported = "true",
taxonCompletenessReported = "notReported",
taxonCompletenessProtocols = "Tick drag/flag methods are standardized to maximize detection of all life stages of Ixodidae during the active season.",
hasNonTargetTaxa = "false",
areNonTargetTaxaFullyReported = NA,
nonTargetTaxa = NA,
targetLifeStageScope = "adult | nymph | larvae",
excludedLifeStageScope = "egg",
isLifeStageScopeFullyReported = "true",
targetDegreeOfEstablishmentScope = NA,
excludedDegreeOfEstablishmentScope = NA,
isDegreeOfEstablishmentScopeFullyReported = NA,
targetGrowthFormScope = NA,
excludedGrowthFormScope = NA,
isGrowthFormScopeFullyReported = NA,
hasNonTargetOrganisms = NA,
targetHabitatScope = NA,
excludedHabitatScope = NA,
samplingEffort = "160 m perimeter sampled per bout",
isSamplingEffortReported = "true",
samplingEffortProtocol = "Ticks are sampled at six plots per site, with bouts every 3 or 6 weeks depending on intensity. Each bout consists of dragging (or flagging if needed) along the full 160 m perimeter of each 40 x 40 m plot. Ticks of all life stages are collected at intervals and preserved in ethanol.",
samplingEffortValue = "6, 160",
samplingEffortUnit = "plots per bout, meters per plot circuit",
samplingPerformedBy = "NEON Field Staff"
)
Plot visit events describe the actual sampling bouts, which take place every three to six weeks during the active tick season, between green-up and dormancy. At this level, we record the sampling date, method, conditions, and any remarks made during fieldwork. Tick presence or absence is also recorded here: if no ticks are collected during a visit, the absence is captured at the event level. These events link the hierarchical survey design to the occurrence records derived from individual tick and pathogen samples.
Setup:
NEON_plot_visit_metadata <- tick.occurrence$tck_fielddata
NEON_tick_occurrence_data <- tick.occurrence$tck_taxonomyProcessed
#Add identification protocol to determine voucher location
NEON_plot_visit_metadata$identificationProtocolVersion <-
NEON_tick_occurrence_data$identificationProtocolVersion[
match(NEON_plot_visit_metadata$sampleID,
NEON_tick_occurrence_data$sampleID)
]
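match() returns only the first matching row, so when a sample has several identified subsamples, the visit inherits the protocol version of whichever taxonomy row comes first. A minimal sketch on hypothetical data (the version strings are invented, shaped only to match the "GSU"/"NEON" prefixes tested further down) shows the lookup, plus a simple check that each sample carries a single version, which is the assumption that makes the first match safe.

```r
# Toy stand-ins for the field and taxonomy tables (hypothetical values)
visits <- data.frame(sampleID = c("AAAA_001.20190501", "AAAA_001.20190715", NA),
                     stringsAsFactors = FALSE)
taxonomy <- data.frame(
  sampleID = c("AAAA_001.20190501", "AAAA_001.20190501", "AAAA_001.20190715"),
  identificationProtocolVersion = c("GSU-v1", "GSU-v1", "NEON-v2"),
  stringsAsFactors = FALSE
)

# match() gives the position of the FIRST taxonomy row for each sampleID
# (NA when the visit has no sampleID or no identified ticks)
visits$identificationProtocolVersion <-
  taxonomy$identificationProtocolVersion[match(visits$sampleID, taxonomy$sampleID)]

# Because only the first match is used, confirm each sample has one version
versions_per_sample <- tapply(taxonomy$identificationProtocolVersion,
                              taxonomy$sampleID,
                              function(x) length(unique(x)))
stopifnot(all(versions_per_sample == 1))
print(visits)
```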
Mapping Event core fields for the plot visit event level:
added_plot_visit_data_core <- data.frame(eventID = ifelse(is.na(NEON_plot_visit_metadata$sampleID),
paste0(
NEON_plot_visit_metadata$plotID, ".",
format(as.Date(NEON_plot_visit_metadata$collectDate), "%Y%m%d")),
NEON_plot_visit_metadata$sampleID
),
parentEventID = NEON_plot_visit_metadata$plotID,
fieldNumber = NA,
habitat = NEON_plot_visit_metadata$nlcdClass,
locationID = NEON_plot_visit_metadata$namedLocation,
countryCode = "US",
decimalLatitude = NEON_plot_visit_metadata$decimalLatitude,
decimalLongitude = NEON_plot_visit_metadata$decimalLongitude,
coordinateUncertaintyInMeters = NEON_plot_visit_metadata$coordinateUncertainty,
geodeticDatum = "WGS84",
locality = NEON_plot_visit_metadata$plotID,
minimumElevationInMeters = NEON_plot_visit_metadata$elevation - NEON_plot_visit_metadata$elevationUncertainty,
maximumElevationInMeters = NEON_plot_visit_metadata$elevation + NEON_plot_visit_metadata$elevationUncertainty,
verbatimElevation = paste(NEON_plot_visit_metadata$elevation, "m", sep=""),
sampleSizeValue = NEON_plot_visit_metadata$totalSampledArea,
sampleSizeUnit = ifelse(
is.na(NEON_plot_visit_metadata$totalSampledArea), NA, "m2" #totalSampledArea is an area in square meters
),
footprintWKT = NA,
footprintSRS = NA,
eventDate = paste(NEON_plot_visit_metadata$collectDate),
eventTime = NA,
eventType = "site visit",
samplingProtocol = NEON_plot_visit_metadata$samplingMethod,
dataGeneralizations = NA,
informationWithheld = NA,
fieldNotes = NA,
eventRemarks = local({
base <- ifelse(
is.na(NEON_plot_visit_metadata$targetTaxaPresent),
ifelse(
is.na(NEON_plot_visit_metadata$remarks),
"No survey conducted",
paste(NEON_plot_visit_metadata$remarks, "No survey conducted", sep = "; ")
),
NEON_plot_visit_metadata$remarks)
miss_id <- !(tick.occurrence$tck_fielddata$sampleID %in%
tick.occurrence$tck_taxonomyProcessed$sampleID)
ifelse(
miss_id,
ifelse(
is.na(base),
"identifiedBy is NA because there are no occurrence records for this event",
paste(
base,
"identifiedBy is NA because there are no occurrence records for this event",
sep = "; "
)
),
base
)
}),
identifiedBy = tick.occurrence$tck_fielddata %>%
left_join(
tick.occurrence$tck_taxonomyProcessed %>%
group_by(sampleID) %>%
summarise(
identifiedBy = paste(sort(unique(identifiedBy)), collapse = " | "),
.groups = "drop"
),
by = "sampleID"
) %>%
pull(identifiedBy)
)
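The same eventID rule (keep the NEON sampleID, otherwise build plotID.YYYYMMDD from the collection date) is written out in full for the Event table, the Humboldt table, and each eMoF count table. A hypothetical helper such as make_visit_event_id() below, which is not part of the original workflow, would keep those copies from drifting apart; it is a sketch of the rule as written above, not a NEON convention.

```r
# Hypothetical helper reproducing the plot-visit eventID rule:
# visits that produced a sample keep their sampleID; the others get a
# synthetic ID in the same "plotID.YYYYMMDD" format.
make_visit_event_id <- function(sampleID, plotID, collectDate) {
  fallback <- paste0(plotID, ".", format(as.Date(collectDate), "%Y%m%d"))
  ifelse(is.na(sampleID), fallback, sampleID)
}

# Toy inputs (hypothetical values)
ids <- make_visit_event_id(
  sampleID    = c("AAAA_001.20190501", NA),
  plotID      = c("AAAA_001", "AAAA_001"),
  collectDate = c("2019-05-01", "2019-07-15")
)
print(ids)
```

In the workflow this would be called as make_visit_event_id(NEON_plot_visit_metadata$sampleID, NEON_plot_visit_metadata$plotID, NEON_plot_visit_metadata$collectDate).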
Mapping Humboldt extension fields for the plot visit event level:
added_plot_visit_data_humboldt <- data.frame(eventID = ifelse(is.na(NEON_plot_visit_metadata$sampleID),
paste0(
NEON_plot_visit_metadata$plotID, ".",
format(as.Date(NEON_plot_visit_metadata$collectDate), "%Y%m%d")),
NEON_plot_visit_metadata$sampleID
),
siteNestingDescription = "One visit to a 40x40 m plot designated for tick sampling",
siteCount = 1,
verbatimSiteNames = NEON_plot_visit_metadata$plotID,
verbatimSiteDescriptions = NA,
reportedWeather = NA,
reportedExtremeConditions = ifelse(
is.na(NEON_plot_visit_metadata$samplingImpractical) | NEON_plot_visit_metadata$samplingImpractical == "OK", NA,
paste0(NEON_plot_visit_metadata$samplingImpractical, ", sampling impractical")
),
totalAreaSampledValue = NA,
totalAreaSampledUnit = NA,
geospatialScopeAreaValue = NA,
geospatialScopeAreaUnit = NA,
isVegetationCoverReported = "false",
eventDurationValue = NA,
eventDurationUnit = NA,
inventoryTypes = NA,
compilationTypes = NA,
compilationSourceTypes = NA,
protocolNames = NEON_plot_visit_metadata$samplingMethod,
protocolDescriptions = "Ticks are sampled at six plots per site every three to six weeks, depending on prior detections. Sampling follows the 160 m perimeter of each 40 x 40 m plot, using drag cloths (or flagging in dense vegetation). Ticks from all life stages are collected every few meters and preserved in ethanol, with bouts scheduled between green-up and dormancy. Samples are sent to external labs where they are sorted, counted and a subset of identified nymphs are tested for pathogens.",
protocolReferences = local({
refs <- "Paull, S. 2022. TOS Protocol and Procedure: TCK - Tick and Tick-Borne Pathogen Sampling. NEON.DOC.014045. NEON (National Ecological Observatory Network). | Laboratory of Medical Zoology (LMZ). 2023. NEON Tick Pathogen Testing SOP, V4.01. University of Massachusetts, Amherst. | Beati, L. 2021. Tick Identification Instructions, USNTC Standard Operating Procedure (SOP). V3. Georgia Southern University, US National Tick Collection (USNTC)."
#Prepend the protocol version only when it is recorded, to avoid a literal "NA | " prefix
ifelse(is.na(NEON_plot_visit_metadata$samplingProtocolVersion),
refs,
paste(NEON_plot_visit_metadata$samplingProtocolVersion, refs, sep = " | "))
}),
isAbsenceReported = NA,
absentTaxa = NA,
isAbundanceReported = "true",
isAbundanceCapReported = "false",
abundanceCap = NA,
hasMaterialSamples = ifelse(
!is.na(NEON_plot_visit_metadata$sampleID) &
NEON_plot_visit_metadata$sampleID %in% NEON_tick_occurrence_data$sampleID,
"true",
"false"
),
materialSampleTypes = "wholeOrganism",
hasVouchers = ifelse(
!is.na(NEON_plot_visit_metadata$sampleID) &
NEON_plot_visit_metadata$sampleID %in% NEON_tick_occurrence_data$sampleID,
"true",
"false"
),
voucherInstitutions = ifelse(
is.na(NEON_plot_visit_metadata$identificationProtocolVersion),
NA,
ifelse(
grepl("^GSU", NEON_plot_visit_metadata$identificationProtocolVersion),
"US National Tick Collection, Georgia Southern University",
ifelse(
grepl("^NEON", NEON_plot_visit_metadata$identificationProtocolVersion),
"NEON Biorepository, Arizona State University",
NA
)
)
),
isLeastSpecificTargetCategoryQuantityInclusive = "false",
verbatimTargetScope = "ticks | selected tick pathogens",
targetTaxonomicScope = "Ixodidae | Borrelia burgdorferi | Borrelia miyamotoi | Anaplasma phagocytophilum | Rickettsia rickettsii | Ehrlichia chaffeensis | Borrelia lonestari | Babesia microti | Ehrlichia ewingii | Borrelia burgdorferi sensu lato | Ehrlichia muris-like | Borrelia mayonii | Francisella tularensis | Borrelia sp. | Rickettsia parkeri | Rickettsia philipii",
excludedTaxonomicScope = NA,
isTaxonomicScopeFullyReported = "true",
taxonCompletenessReported = "notReported",
taxonCompletenessProtocols = "Tick drag/flag methods are standardized to maximize detection of all life stages of Ixodidae during the active season.",
hasNonTargetTaxa = "false",
areNonTargetTaxaFullyReported = NA,
nonTargetTaxa = NA,
targetLifeStageScope = "adult | nymph | larvae",
excludedLifeStageScope = "egg",
isLifeStageScopeFullyReported = "true",
targetDegreeOfEstablishmentScope = NA,
excludedDegreeOfEstablishmentScope = NA,
isDegreeOfEstablishmentScopeFullyReported = NA,
targetGrowthFormScope = NA,
excludedGrowthFormScope = NA,
isGrowthFormScopeFullyReported = NA,
hasNonTargetOrganisms = NA,
targetHabitatScope = NA,
excludedHabitatScope = NA,
samplingEffort = "160 m perimeter sampled per bout",
isSamplingEffortReported = "true",
samplingEffortProtocol = "Ticks are sampled at six plots per site, with bouts every 3 or 6 weeks depending on intensity. Each bout consists of dragging (or flagging if needed) along the full 160 m perimeter of each 40 x 40 m plot. Ticks of all life stages are collected at intervals and preserved in ethanol.",
samplingEffortValue = "6, 160",
samplingEffortUnit = "plots per bout, meters per plot circuit",
samplingPerformedBy = NEON_plot_visit_metadata$measuredBy
)
event_data_core <- bind_rows(
NEON_project_event_core,
added_domain_data_core,
added_site_data_core,
added_plot_data_core,
added_plot_visit_data_core
)
event_data_humboldt <- bind_rows(
NEON_project_event_humboldt,
added_domain_data_humboldt,
added_site_data_humboldt,
added_plot_data_humboldt,
added_plot_visit_data_humboldt
)
rownames(event_data_core) <- NULL
event_data_core <- event_data_core %>% select(where(~ !all(is.na(.)))) #Remove columns that are entirely NA
rownames(event_data_humboldt) <- NULL
event_data_humboldt <- event_data_humboldt %>% select(where(~ !all(is.na(.)))) #Remove columns that are entirely NA
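Before export it can help to confirm that the combined Event table forms a clean hierarchy. A minimal sketch, using a hypothetical check_event_hierarchy() helper and toy event IDs (the project, domain, and site identifiers below are invented for illustration), flags duplicated eventIDs and any parentEventID that does not resolve to an event in the same table.

```r
# Hypothetical helper: basic consistency checks on a combined Event table
check_event_hierarchy <- function(events) {
  problems <- character(0)
  dup <- unique(events$eventID[duplicated(events$eventID)])
  if (length(dup) > 0)
    problems <- c(problems, paste("duplicated eventID:", dup))
  # Every non-missing parentEventID must point at an eventID in the same table
  orphans <- setdiff(na.omit(events$parentEventID), events$eventID)
  if (length(orphans) > 0)
    problems <- c(problems, paste("parentEventID without matching eventID:", orphans))
  problems
}

# Toy hierarchy: project > domain > site > plot > plot visit (hypothetical IDs)
events <- data.frame(
  eventID       = c("NEON", "D02", "SCBI", "SCBI_014", "SCBI_014.20190610"),
  parentEventID = c(NA, "NEON", "D02", "SCBI", "SCBI_014"),
  stringsAsFactors = FALSE
)
print(check_event_hierarchy(events))  # character(0) means no problems found
```

On the real data it would be run as check_event_hierarchy(event_data_core).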
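Occurrence data represent the biological entities observed or derived within events. In this example, two kinds of occurrences are included: tick specimens collected in the field, and pathogen test results derived in the laboratory from a subset of those specimens.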
To preserve the link between the tested specimens and the pathogens
detected (or not detected), pathogen occurrences are connected back to
the corresponding tick occurrences using the Darwin Core Resource
Relationship extension. All occurrence records are associated with their
sampling context through the eventID field.
Tick occurrence records correspond to individual specimens collected in the field. Attributes such as scientific name, life stage, sex, preparation, and voucher institution are included where available. These records are treated as preserved specimens and are linked to vouchers curated in external collections. All ticks collected before 2024 were identified and archived at the U.S. National Tick Collection at Georgia Southern University and tested for pathogens at the Laboratory of Medical Zoology (LMZ) at the University of Massachusetts Amherst. As of 2026, ticks are both identified and tested at LMZ and archived at the NEON Biorepository at Arizona State University.
NEON_tick_occurrence_data <- tick.occurrence$tck_taxonomyProcessed
NEON_tick_catalognumbers <- catalognumbers.ticks
added_tick_occurrence_data <- data.frame(eventID = NEON_tick_occurrence_data$sampleID,
occurrenceID = NEON_tick_occurrence_data$subsampleID,
basisOfRecord = "PreservedSpecimen",
catalogNumber = NEON_tick_catalognumbers[NEON_tick_occurrence_data$subsampleID],
scientificName = NEON_tick_occurrence_data$scientificName,
scientificNameAuthorship = NEON_tick_occurrence_data$scientificNameAuthorship,
taxonRank = NEON_tick_occurrence_data$taxonRank,
kingdom = "Animalia",
family = NEON_tick_occurrence_data$family,
subfamily = NEON_tick_occurrence_data$subfamily,
tribe = NEON_tick_occurrence_data$tribe,
subtribe = NEON_tick_occurrence_data$subtribe,
genus = NEON_tick_occurrence_data$genus,
subgenus = NEON_tick_occurrence_data$subgenus,
specificEpithet = NEON_tick_occurrence_data$specificEpithet,
infraspecificEpithet = NEON_tick_occurrence_data$infraspecificEpithet,
identificationQualifier = NEON_tick_occurrence_data$identificationQualifier,
identificationReferences = NEON_tick_occurrence_data$identificationReferences,
identificationRemarks = NEON_tick_occurrence_data$identificationProtocolVersion,
occurrenceStatus = "present",
organismQuantity = NA, #individualCount is used instead
organismQuantityType = NA, #individualCount is used instead
individualCount = NEON_tick_occurrence_data$individualCount,
recordedByID = NA, #This information is in the plot visit event
vernacularName = c("Dermacentor variabilis" = "American dog tick",
"Ixodes scapularis" = "Blacklegged tick / Deer tick",
"Haemaphysalis leporispalustris" = "Rabbit tick",
"Amblyomma americanum" = "Lone star tick",
"Haemaphysalis longicornis" = "Asian longhorned tick",
"Ixodes muris" = "Mouse tick",
"Ixodes dentatus" = "Rabbit tick",
"Amblyomma maculatum" = "Gulf Coast tick",
"Ixodes marxi" = "Squirrel tick",
"Ixodes brunneus" = "Bird tick",
"Dermacentor andersoni" = "Rocky Mountain wood tick",
"Dermacentor parumapertus" = "Desert wood tick",
"Ixodes pacificus" = "Western blacklegged tick",
"Ixodes angustus" = "Mouse tick",
"Dermacentor occidentalis" = "Pacific Coast tick"
)[NEON_tick_occurrence_data$scientificName],
sex = ifelse(NEON_tick_occurrence_data$sexOrAge %in% c("Male", "Female"),
NEON_tick_occurrence_data$sexOrAge,
NA),
lifeStage = ifelse(NEON_tick_occurrence_data$sexOrAge %in% c("Male", "Female"),
"Adult",
NEON_tick_occurrence_data$sexOrAge),
establishmentMeans = NA,
degreeOfEstablishment = NA,
pathway = NA,
vitality = "alive",
preparations = NEON_tick_occurrence_data$archiveMedium,
institutionID = NEON_tick_occurrence_data$archiveFacilityID,
identifiedBy = paste(NEON_tick_occurrence_data$identifiedBy, NEON_tick_occurrence_data$laboratoryName, sep=", "),
occurrenceRemarks = NEON_tick_occurrence_data$remarks
)
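NEON records sex and life stage together in a single sexOrAge field, so the mapping above splits it: adults are reported by sex with lifeStage "Adult", and every other value is passed through as the life stage. A toy run of the same two ifelse() calls (the input values are illustrative):

```r
# Toy sexOrAge values (illustrative)
sexOrAge <- c("Male", "Female", "Nymph", "Larva", NA)

# Adults are recorded by sex; any other value is treated as a life stage
is_adult  <- sexOrAge %in% c("Male", "Female")
sex       <- ifelse(is_adult, sexOrAge, NA)
lifeStage <- ifelse(is_adult, "Adult", sexOrAge)

print(data.frame(sexOrAge, sex, lifeStage))
```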
A subset of the collected ticks is subsequently tested for pathogens.
The results of these tests are represented as separate occurrence
records with the basisOfRecord MaterialSample. Each
record includes details of the test method, result, and pathogen name,
and is linked back to the originating tick specimen through the Darwin
Core Resource Relationship extension.
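The pathogen records below derive their eventID from subsampleID by keeping its first two dot-separated parts, which assumes every subsampleID starts with plotID.YYYYMMDD and that plot IDs contain no dots. That sapply(strsplit(...)) expression is repeated in several tables; a hypothetical helper like event_from_subsample() could replace it, and it returns NA for a missing subsampleID instead of pasting "NA.NA".

```r
# Hypothetical helper: plot-visit eventID ("plotID.YYYYMMDD") from a subsampleID.
# Assumes the subsampleID starts with plotID.YYYYMMDD and plot IDs contain no "."
event_from_subsample <- function(subsampleID) {
  vapply(
    strsplit(subsampleID, ".", fixed = TRUE),
    function(parts) {
      if (length(parts) < 2 || anyNA(parts)) NA_character_ else paste(parts[1:2], collapse = ".")
    },
    character(1)
  )
}

# Toy subsampleIDs (hypothetical format)
print(event_from_subsample(c("AAAA_001.20190501.XXXX.NYMPH", NA)))
```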
NEON_pathogen_occurrence_data <- tick.pathogen$tck_pathogen %>%
filter(
!is.na(testResult) &
!is.na(testPathogenName) &
testPathogenName != "HardTick DNA Quality" & #Remove DNA Quality test
!grepl("^(Ixodes|Dermacentor|Amblyomma|Haemaphysalis)", testPathogenName) #Remove molecular identification tests
)
NEON_pathogens_catalognumbers <- catalognumbers.pathogens
added_pathogen_occurrence_data <- data.frame(eventID = sapply(strsplit(NEON_pathogen_occurrence_data$subsampleID, "\\."),
function(x) paste(x[1:2], collapse = ".")),
occurrenceID = NEON_pathogen_occurrence_data$uid,
basisOfRecord = "MaterialSample",
catalogNumber = NEON_pathogens_catalognumbers[NEON_pathogen_occurrence_data$testingID],
scientificName = NEON_pathogen_occurrence_data$testPathogenName,
taxonRank = NA,
kingdom = c("Borrelia burgdorferi" = "Bacteria",
"Borrelia miyamotoi" = "Bacteria",
"Anaplasma phagocytophilum" = "Bacteria",
"Rickettsia rickettsii" = "Bacteria",
"Ehrlichia chaffeensis" = "Bacteria",
"Borrelia lonestari" = "Bacteria",
"Babesia microti" = "Protista",
"Ehrlichia ewingii" = "Bacteria",
"Borrelia burgdorferi sensu lato" = "Bacteria",
"Ehrlichia muris-like" = "Bacteria",
"Borrelia mayonii" = "Bacteria",
"Francisella tularensis" = "Bacteria",
"Borrelia sp." = "Bacteria",
"Rickettsia parkeri" = "Bacteria",
"Rickettsia philipii" = "Bacteria"
)[NEON_pathogen_occurrence_data$testPathogenName],
occurrenceStatus = ifelse(NEON_pathogen_occurrence_data$testResult == "Negative", "absent",
ifelse(NEON_pathogen_occurrence_data$testResult == "Positive", "present", NA)
),
organismQuantity = NA,
organismQuantityType = NA,
individualCount = NA,
recordedByID = NA, #This information is in the plot visit event
vernacularName = c("Borrelia burgdorferi" = "Lyme disease agent",
"Borrelia miyamotoi" = "Borrelia miyamotoi disease (BMD) agent",
"Anaplasma phagocytophilum" = "Human granulocytic anaplasmosis (HGA) agent",
"Rickettsia rickettsii" = "Rocky Mountain spotted fever agent",
"Ehrlichia chaffeensis" = "Human monocytic ehrlichiosis agent",
"Borrelia lonestari" = "Southern tick-associated rash illness (STARI) agent",
"Babesia microti" = "Babesiosis parasite",
"Ehrlichia ewingii" = "Human ehrlichiosis agent",
"Borrelia burgdorferi sensu lato" = "Lyme disease agent",
"Ehrlichia muris-like" = "Human ehrlichiosis agent",
"Borrelia mayonii" = "Lyme disease agent",
"Francisella tularensis" = "Tularemia agent",
"Borrelia sp." = "Lyme disease or relapsing fever agent",
"Rickettsia parkeri" = "Rickettsia parkeri rickettsiosis agent",
"Rickettsia philipii" = "Pacific coast tick fever agent"
)[NEON_pathogen_occurrence_data$testPathogenName],
sex = NA,
lifeStage = NA,
establishmentMeans = NA,
degreeOfEstablishment = NA,
pathway = NA,
vitality = NA,
identificationRemarks = paste0(NEON_pathogen_occurrence_data$testProtocolVersion,
ifelse(is.na(NEON_pathogen_occurrence_data$remarks), "",
paste0("; ", NEON_pathogen_occurrence_data$remarks)) #"version; remark", without stray spaces
),
identifiedBy = paste(NEON_pathogen_occurrence_data$testedBy, NEON_pathogen_occurrence_data$laboratoryName, sep=", "),
dateIdentified = NEON_pathogen_occurrence_data$testedDate,
materialSampleID = NEON_pathogen_occurrence_data$testingID,
materialEntityRemarks = "materialSampleID corresponds to the NEON testingID, which uniquely identifies the group of specimens for testing. The remaining genomic extracts are archived with corresponding catalogNumber."
)
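The export steps write occurrence_data, which is not assembled in the code shown in this section. A minimal sketch, assuming it is simply the tick and pathogen tables stacked together with the same all-NA column cleanup applied to the event tables, wrapped in a hypothetical assemble_occurrences() helper:

```r
library(dplyr)

# Hypothetical helper: stack the tick and pathogen occurrence tables and drop
# columns that are entirely NA. bind_rows() fills columns missing from one
# table with NA and keeps the column order of its first argument.
assemble_occurrences <- function(tick_df, pathogen_df) {
  occ <- bind_rows(tick_df, pathogen_df)
  rownames(occ) <- NULL
  occ %>% select(where(~ !all(is.na(.))))
}

# Toy inputs (hypothetical values)
ticks <- data.frame(eventID = "E1", occurrenceID = "T1", individualCount = 2,
                    organismQuantity = NA, stringsAsFactors = FALSE)
pathogens <- data.frame(eventID = "E1", occurrenceID = "P1", organismQuantity = NA,
                        materialSampleID = "M1", stringsAsFactors = FALSE)
occ <- assemble_occurrences(ticks, pathogens)
print(occ)
```

In the workflow this would be occurrence_data <- assemble_occurrences(added_tick_occurrence_data, added_pathogen_occurrence_data). Because the tick table comes first, eventID stays the first column (index 0 in meta.xml), which the coreid declaration relies on.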
Resource relationships describe links between different occurrences.
In this dataset, pathogen detections are linked to the ticks from which
they were derived using
relationshipOfResource = "pathogen of".
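resource_relationship_data <- data.frame(eventID = sapply(strsplit(NEON_pathogen_occurrence_data$subsampleID, "\\."),
function(x) paste(x[1:2], collapse = ".")), #plot visit eventID; meta.xml uses column 0 as coreid
resourceID = NEON_pathogen_occurrence_data$uid, #occurrenceID (pathogen)
relatedResourceID = NEON_pathogen_occurrence_data$subsampleID, #occurrenceID (tick)
relationshipOfResource = "pathogen of"
)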
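Because relatedResourceID points at tick subsampleIDs, a pathogen test on a tick that is missing from the taxonomy table would leave a dangling link. A hypothetical check_relationships() helper, sketched below on toy IDs, reports any resourceID or relatedResourceID that does not appear among the occurrence IDs; in the workflow the second argument would presumably be occurrence_data$occurrenceID.

```r
# Hypothetical check: both ends of every relationship should be known occurrences
check_relationships <- function(rr, occurrenceIDs) {
  list(
    missing_resource = setdiff(rr$resourceID, occurrenceIDs),
    missing_related  = setdiff(rr$relatedResourceID, occurrenceIDs)
  )
}

# Toy relationships (hypothetical IDs); "T9" has no matching occurrence
rr <- data.frame(
  resourceID        = c("P1", "P2"),
  relatedResourceID = c("T1", "T9"),
  stringsAsFactors  = FALSE
)
res <- check_relationships(rr, c("T1", "P1", "P2"))
str(res)
```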
Certain details do not fit into the core event or occurrence tables. These are represented in the ExtendedMeasurementOrFact (eMoF) table. Examples include plot type, tick counts by life stage, sample barcodes, DNA identifiers, and sample conditions.
emof_plot_data <- data.frame(eventID = NEON_plot_metadata$plotID,
occurrenceID = NA, # This is just for readability of the final table
measurementType = "Plot Type",
measurementValue = NEON_plot_metadata$plotType
)
emof_counts_adult_data <- data.frame( #only create a row if adultCount is not NA
eventID = ifelse(
is.na(NEON_plot_visit_metadata$sampleID), #eventID is the sampleID if a sample is taken, if not, we create one with the same format (plotID.date)
paste0(NEON_plot_visit_metadata$plotID, ".",
format(as.Date(NEON_plot_visit_metadata$collectDate), "%Y%m%d")
),
NEON_plot_visit_metadata$sampleID
)[!is.na(NEON_plot_visit_metadata$adultCount)],
occurrenceID = NA, # This is just for readability of the final table
measurementType = "Adult Count",
measurementValue = as.character(NEON_plot_visit_metadata$adultCount[!is.na(NEON_plot_visit_metadata$adultCount)]),
measurementRemarks = "Excludes records where adultCount is NA"
)
emof_counts_nymph_data <- data.frame( #only create a row if nymphCount is not NA
eventID = ifelse(
is.na(NEON_plot_visit_metadata$sampleID), #eventID is the sampleID if a sample is taken, if not, we create one with the same format (plotID.date)
paste0(NEON_plot_visit_metadata$plotID, ".",
format(as.Date(NEON_plot_visit_metadata$collectDate), "%Y%m%d")
),
NEON_plot_visit_metadata$sampleID
)[!is.na(NEON_plot_visit_metadata$nymphCount)],
occurrenceID = NA, # This is just for readability of the final table
measurementType = "Nymph Count",
measurementValue = as.character(NEON_plot_visit_metadata$nymphCount[!is.na(NEON_plot_visit_metadata$nymphCount)]),
measurementRemarks = "Excludes records where nymphCount is NA"
)
emof_counts_larva_data <- data.frame( #only create a row if larvaCount is not NA
eventID = ifelse(
is.na(NEON_plot_visit_metadata$sampleID), #eventID is the sampleID if a sample is taken, if not, we create one with the same format (plotID.date)
paste0(NEON_plot_visit_metadata$plotID, ".",
format(as.Date(NEON_plot_visit_metadata$collectDate), "%Y%m%d")
),
NEON_plot_visit_metadata$sampleID
)[!is.na(NEON_plot_visit_metadata$larvaCount)],
occurrenceID = NA, # This is just for readability of the final table
measurementType = "Larva Count",
measurementValue = as.character(NEON_plot_visit_metadata$larvaCount[!is.na(NEON_plot_visit_metadata$larvaCount)]),
measurementRemarks = "Excludes records where larvaCount is NA"
)
emof_sample_code_data <- data.frame( #only create a row if sampleCode is not NA
eventID = ifelse(
is.na(NEON_plot_visit_metadata$sampleID), #eventID is the sampleID if a sample is taken, if not, we create one with the same format (plotID.date)
paste0(NEON_plot_visit_metadata$plotID, ".",
format(as.Date(NEON_plot_visit_metadata$collectDate), "%Y%m%d")
),
NEON_plot_visit_metadata$sampleID
)[!is.na(NEON_plot_visit_metadata$sampleCode)],
occurrenceID = NA, # This is just for readability of the final table
measurementType = "Sample Barcode",
measurementValue = NEON_plot_visit_metadata$sampleCode[!is.na(NEON_plot_visit_metadata$sampleCode)]
)
emof_tick_sample_condition_data <- data.frame(eventID = NEON_tick_occurrence_data$sampleID,
occurrenceID = NEON_tick_occurrence_data$subsampleID,
measurementType = "Tick Sample Condition",
measurementValue = NEON_tick_occurrence_data$sampleCondition
)
emof_deprecatedVialID_data <- data.frame(eventID = NEON_tick_occurrence_data$sampleID,
occurrenceID = NEON_tick_occurrence_data$subsampleID,
measurementType = "Deprecated Vial ID",
measurementValue = NEON_tick_occurrence_data$deprecatedVialID
)
emof_pathogen_sample_condition_data <- data.frame(eventID = sapply(strsplit(NEON_pathogen_occurrence_data$subsampleID, "\\."),
function(x) paste(x[1:2], collapse = ".")),
occurrenceID = NEON_pathogen_occurrence_data$uid,
measurementType = "Tick Pathogen Sample Condition",
measurementValue = NEON_pathogen_occurrence_data$sampleCondition
)
emof_cqValue_data <- data.frame(
eventID = sapply(strsplit(NEON_pathogen_occurrence_data$subsampleID[!is.na(NEON_pathogen_occurrence_data$cqValue)], "\\."),
function(x) paste(x[1:2], collapse = ".")),
occurrenceID = NEON_pathogen_occurrence_data$uid[!is.na(NEON_pathogen_occurrence_data$cqValue)],
measurementType = "CQ (Quantification Cycle or Threshold Number) Value",
measurementValue = as.character(NEON_pathogen_occurrence_data$cqValue[!is.na(NEON_pathogen_occurrence_data$cqValue)])
)
emof_dnaSampleID_data <- data.frame(
eventID = sapply(strsplit(NEON_pathogen_occurrence_data$subsampleID[!is.na(NEON_pathogen_occurrence_data$dnaSampleID)], "\\."),
function(x) paste(x[1:2], collapse = ".")),
occurrenceID = NEON_pathogen_occurrence_data$uid[!is.na(NEON_pathogen_occurrence_data$dnaSampleID)],
measurementType = "Identifier for DNA sample",
measurementValue = NEON_pathogen_occurrence_data$dnaSampleID[!is.na(NEON_pathogen_occurrence_data$dnaSampleID)]
)
emof_dnaSampleCode_data <- data.frame(
eventID = sapply(strsplit(NEON_pathogen_occurrence_data$subsampleID[!is.na(NEON_pathogen_occurrence_data$dnaSampleCode)], "\\."),
function(x) paste(x[1:2], collapse = ".")),
occurrenceID = NEON_pathogen_occurrence_data$uid[!is.na(NEON_pathogen_occurrence_data$dnaSampleCode)],
measurementType = "Barcode of a DNA sample",
measurementValue = NEON_pathogen_occurrence_data$dnaSampleCode[!is.na(NEON_pathogen_occurrence_data$dnaSampleCode)]
)
emof_pathogenIndividualCount_data <- data.frame(
eventID = sapply(strsplit(NEON_pathogen_occurrence_data$subsampleID[!is.na(NEON_pathogen_occurrence_data$individualCount)], "\\."),
function(x) paste(x[1:2], collapse = ".")),
occurrenceID = NEON_pathogen_occurrence_data$uid[!is.na(NEON_pathogen_occurrence_data$individualCount)],
measurementType = "Number of Ticks Used for Testing",
measurementValue = as.character(NEON_pathogen_occurrence_data$individualCount[!is.na(NEON_pathogen_occurrence_data$individualCount)])
)
emof_data <- bind_rows(emof_plot_data,
emof_counts_adult_data,
emof_counts_nymph_data,
emof_counts_larva_data,
emof_sample_code_data,
emof_tick_sample_condition_data,
emof_deprecatedVialID_data,
emof_pathogen_sample_condition_data,
emof_cqValue_data,
emof_dnaSampleID_data,
emof_dnaSampleCode_data,
emof_pathogenIndividualCount_data
)
rownames(emof_data) <- NULL
emof_data <- emof_data %>%
filter(!is.na(measurementValue)) %>% #Drop measurement rows that have no recorded value
select(where(~ !all(is.na(.)))) #Remove columns that are entirely NA
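The adult, nymph, and larva count tables above differ only in the column they read and the label they write. A hypothetical count_rows() helper, not part of the original workflow, could build all three in one pass; it also returns a zero-row table when a count column is entirely NA, whereas a data.frame() call that mixes zero-length and length-one columns stops with an error.

```r
# Hypothetical helper: eMoF rows for one count column, keeping only visits where
# the count was recorded (same filtering as the adult, nymph, and larva blocks)
count_rows <- function(visits, eventIDs, count_col, label) {
  keep <- !is.na(visits[[count_col]])
  n <- sum(keep)
  data.frame(
    eventID            = eventIDs[keep],
    occurrenceID       = rep(NA_character_, n),
    measurementType    = rep(label, n),
    measurementValue   = as.character(visits[[count_col]][keep]),
    measurementRemarks = rep(paste("Excludes records where", count_col, "is NA"), n),
    stringsAsFactors   = FALSE
  )
}

# Toy plot-visit data and eventIDs (hypothetical values)
visits <- data.frame(adultCount = c(1, NA), nymphCount = c(0, 3), larvaCount = c(NA, NA))
ids <- c("AAAA_001.20190501", "AAAA_001.20190715")

counts <- do.call(rbind, Map(
  function(col, label) count_rows(visits, ids, col, label),
  c("adultCount", "nymphCount", "larvaCount"),
  c("Adult Count", "Nymph Count", "Larva Count")
))
rownames(counts) <- NULL
print(counts)
```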
The full set of processed tables generated by this workflow is available in the project repository under NEONTickstoHumboldt/outputs/full.
dir.create("outputs/full", recursive = TRUE, showWarnings = FALSE)
#na = "" writes missing values as empty fields rather than the string "NA"
write.csv(event_data_core, "outputs/full/event.csv", row.names = FALSE, na = "")
write.csv(event_data_humboldt, "outputs/full/humboldtecologicalinventory.csv", row.names = FALSE, na = "")
write.csv(occurrence_data, "outputs/full/occurrence.csv", row.names = FALSE, na = "")
write.csv(resource_relationship_data, "outputs/full/resourceRelationship.csv", row.names = FALSE, na = "")
write.csv(emof_data, "outputs/full/extendedMeasurementOrFact.csv", row.names = FALSE, na = "")
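By default write.csv() writes missing values as the text NA, which a Darwin Core Archive consumer would most likely read as a literal value. The hypothetical write_dwc_tables() helper below writes a named list of tables to one directory with na = "", creating the directory first; it is one way to avoid repeating the five write.csv() calls for each output folder.

```r
# Hypothetical helper: write each table in a named list to <dir>/<name>,
# creating the directory if needed and leaving missing values empty
write_dwc_tables <- function(tables, dir) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  for (name in names(tables)) {
    write.csv(tables[[name]], file.path(dir, name), row.names = FALSE, na = "")
  }
  invisible(file.path(dir, names(tables)))
}

# Toy usage in a temporary directory (hypothetical table)
d <- file.path(tempdir(), "dwca_test")
write_dwc_tables(list("event.csv" = data.frame(eventID = "E1", eventRemarks = NA)), d)
print(readLines(file.path(d, "event.csv")))
```

In the workflow, write_dwc_tables(list("event.csv" = event_data_core, "occurrence.csv" = occurrence_data), "outputs/zipped") would write two of the archive files, taking the file names from the list names.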
A subset containing only the data from plot SCBI_014 is also available under NEONTickstoHumboldt/outputs/subset.
event_data_humboldt_subset <- event_data_humboldt %>%
filter(
grepl("SCBI_014", verbatimSiteNames)
)
event_data_core_subset <- event_data_core %>%
filter(eventID %in% event_data_humboldt_subset$eventID)
occurrence_data_subset <- occurrence_data %>%
filter(
grepl("SCBI_014", eventID)
)
resource_relationship_data_subset <- resource_relationship_data %>%
filter(
grepl("SCBI_014", relatedResourceID)
)
emof_data_subset <- emof_data %>%
filter(
grepl("SCBI_014", eventID)
)
dir.create("outputs/subset", recursive = TRUE, showWarnings = FALSE)
#na = "" writes missing values as empty fields rather than the string "NA"
write.csv(event_data_core_subset, "outputs/subset/event.csv", row.names = FALSE, na = "")
write.csv(event_data_humboldt_subset, "outputs/subset/humboldtecologicalinventory.csv", row.names = FALSE, na = "")
write.csv(occurrence_data_subset, "outputs/subset/occurrence.csv", row.names = FALSE, na = "")
write.csv(resource_relationship_data_subset, "outputs/subset/resourceRelationship.csv", row.names = FALSE, na = "")
write.csv(emof_data_subset, "outputs/subset/extendedMeasurementOrFact.csv", row.names = FALSE, na = "")
We also save the full dataset as a Darwin Core Archive (DwC-A) for publishing.
Write data files:
dir.create("outputs/zipped", recursive = TRUE, showWarnings = FALSE)
#na = "" writes missing values as empty fields rather than the string "NA"
write.csv(event_data_core, "outputs/zipped/event.csv", row.names = FALSE, na = "")
write.csv(event_data_humboldt, "outputs/zipped/humboldtecologicalinventory.csv", row.names = FALSE, na = "")
write.csv(occurrence_data, "outputs/zipped/occurrence.csv", row.names = FALSE, na = "")
write.csv(resource_relationship_data, "outputs/zipped/resourceRelationship.csv", row.names = FALSE, na = "")
write.csv(emof_data, "outputs/zipped/extendedMeasurementOrFact.csv", row.names = FALSE, na = "")
Create the meta.xml file, which describes how the data files are structured:
# Build one <field> element per column, skipping the first column (index 0), which is declared as <id> or <coreid> instead
make_fields_xml <- function(df, prefix) {
paste0(
' <field index="', seq_along(colnames(df))[-1] - 1,
'" term="', paste0(prefix, colnames(df)[-1]), '"/>',
collapse = "\n"
)
}
# Event Core
core_xml <- paste0(
'<core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"
fieldsEnclosedBy="&quot;" ignoreHeaderLines="1"
rowType="http://rs.tdwg.org/dwc/terms/Event">
<files>
<location>event.csv</location>
</files>
<id index="0"/>\n',
make_fields_xml(event_data_core, "http://rs.tdwg.org/dwc/terms/"),
'\n</core>\n'
)
# Humboldt Extension
humboldt_xml <- paste0(
'<extension encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"
fieldsEnclosedBy="&quot;" ignoreHeaderLines="1"
rowType="http://rs.tdwg.org/eco/terms/Event">
<files>
<location>humboldtecologicalinventory.csv</location>
</files>
<coreid index="0"/>\n',
make_fields_xml(event_data_humboldt, "http://rs.tdwg.org/eco/terms/"),
'\n</extension>\n'
)
# Occurrence Extension
occ_xml <- paste0(
'<extension encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"
fieldsEnclosedBy="&quot;" ignoreHeaderLines="1"
rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
<files>
<location>occurrence.csv</location>
</files>
<coreid index="0"/>\n',
make_fields_xml(occurrence_data, "http://rs.tdwg.org/dwc/terms/"),
'\n</extension>\n'
)
# ResourceRelationship Extension
rr_xml <- paste0(
'<extension encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"
fieldsEnclosedBy="&quot;" ignoreHeaderLines="1"
rowType="http://rs.tdwg.org/dwc/terms/ResourceRelationship">
<files>
<location>resourceRelationship.csv</location>
</files>
<coreid index="0"/>\n',
make_fields_xml(resource_relationship_data, "http://rs.tdwg.org/dwc/terms/"),
'\n</extension>\n'
)
# extendedMeasurementOrFact Extension
emof_xml <- paste0(
'<extension encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"
fieldsEnclosedBy="&quot;" ignoreHeaderLines="1"
rowType="http://rs.iobis.org/obis/terms/ExtendedMeasurementOrFact">
<files>
<location>extendedMeasurementOrFact.csv</location>
</files>
<coreid index="0"/>\n',
make_fields_xml(emof_data, "http://rs.tdwg.org/dwc/terms/"),
'\n</extension>\n'
)
# Put everything together
meta_xml <- paste0(
'<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
',
core_xml,
humboldt_xml,
occ_xml,
rr_xml,
emof_xml,
'</archive>'
)
writeLines(meta_xml, "outputs/zipped/meta.xml")
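make_fields_xml() declares every column except the first, because the first column (index 0) is referenced separately as <id> in the core and <coreid> in each extension. That means every table passed to it needs eventID as its first column. Running a copy of the same helper on a toy three-column table (hypothetical values) shows the elements it generates:

```r
# Copy of the helper above, repeated so this block runs on its own
make_fields_xml <- function(df, prefix) {
  paste0(
    '  <field index="', seq_along(colnames(df))[-1] - 1,
    '" term="', paste0(prefix, colnames(df)[-1]), '"/>',
    collapse = "\n"
  )
}

# Toy table: eventID is column 0 and is therefore not emitted as a <field>
toy <- data.frame(eventID = "E1", occurrenceID = "O1", basisOfRecord = "PreservedSpecimen")
x <- make_fields_xml(toy, "http://rs.tdwg.org/dwc/terms/")
cat(x, "\n")
```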
Create the eml.xml file (dataset metadata):
library(uuid)
package_id <- UUIDgenerate()
alternate_id_block <- paste0(
' <alternateIdentifier>10.15468/b52b9z</alternateIdentifier>\n',
' <alternateIdentifier>12315bb8-8ab3-446a-b5a4-2be93aade242</alternateIdentifier>\n'
)
title <- "NEON ticks sampled using drag cloths and tick pathogen status"
pub_date <- Sys.Date()
language <- "eng"
creator_block <- paste0(
' <creator id="5b77bf2d-3bb2-4cb3-9f9d-50c20357f4cb">
<organizationName>NEON Biorepository Portal</organizationName>
<electronicMailAddress>biorepo@asu.edu</electronicMailAddress>
</creator>\n'
)
metadata_provider_block <- paste0(
' <metadataProvider>
<organizationName>NEON Biorepository Portal</organizationName>
<electronicMailAddress>biorepo@asu.edu</electronicMailAddress>
</metadataProvider>\n'
)
abstract_text <- paste(
"This dataset contains tick survey data collected by the U.S. National Ecological Observatory Network (NEON) across terrestrial sites in the United States as part of a long-term ecological monitoring program.",
"Ticks are sampled at standardized plots every three to six weeks during the active season using drag and flag methods along the 160 m perimeter of 40 × 40 m plots. Specimens from all life stages are collected, preserved in ethanol, and identified to species, life stage, and sex where possible.",
"A subset of ticks is tested for pathogens of public health and veterinary importance, including Borrelia, Anaplasma, Babesia, Ehrlichia, and Rickettsia species. Pathogen test results are represented as linked records derived from individual tick specimens.",
"The dataset is structured as a hierarchical survey using the Darwin Core Event Core and the Humboldt Extension for Ecological Inventories, with events defined at project, domain, site, plot, and sampling visit levels. Occurrence records represent both collected tick specimens and laboratory-derived pathogen observations, and are linked through ResourceRelationship records.",
"For additional details, see https://data.neonscience.org/data-products/DP1.10093.001 and https://data.neonscience.org/data-products/DP1.10092.001."
)
abstract_block <- paste0(
' <abstract>
<para>', abstract_text, '</para>
</abstract>\n'
)
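The free-text strings above are pasted into eml.xml as-is. The current text contains no `&` or `<`, but if an edit introduces one (for example "Borrelia & Anaplasma"), the resulting eml.xml is no longer well-formed. A minimal sketch of an escaping step follows; `xml_escape()` is hypothetical and covers only element content, not attribute values.

```r
# Hypothetical helper (not part of the original workflow): escape the
# characters that are special in XML element content. "&" is replaced first
# so the "&" inside "&lt;" and "&gt;" is not escaped a second time.
xml_escape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  gsub(">", "&gt;", x, fixed = TRUE)
}

# Usage: wrap free text before interpolating it, e.g.
# paste0("<para>", xml_escape(abstract_text), "</para>")
xml_escape("Ticks & pathogens < 40 m")
# returns "Ticks &amp; pathogens &lt; 40 m"
```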
contact_block <- paste0(
' <contact>
<individualName>
<givenName>Chandra</givenName>
<surName>Earl</surName>
</individualName>
<organizationName>National Ecological Observatory Network Biorepository</organizationName>
<positionName>Biodiversity Informatician</positionName>
<address>
<deliveryPoint>734 W. Alameda Drive Suite 158, DC 4108</deliveryPoint>
<city>Tempe</city>
<administrativeArea>AZ</administrativeArea>
<postalCode>85282</postalCode>
<country>US</country>
</address>
<electronicMailAddress>biorepo@asu.edu</electronicMailAddress>
</contact>\n'
)
methods_block <- paste0(
' <methods>
<methodStep>
<description>
<para>
Data were compiled from the National Ecological Observatory Network (NEON) data portal and associated biorepository resources. Source data included field sampling records, taxonomic identifications, and pathogen testing results. Additional domain- and site-level metadata were obtained from NEON geospatial products and the NEON API, including domain boundary shapefiles and site metadata downloads.
</para>
</description>
</methodStep>
<methodStep>
<description>
<para>
Event-level data were constructed to represent a hierarchical sampling design, with events defined at project, domain, site, plot, and sampling visit levels. Event records were generated by combining NEON metadata with field sampling data and aggregating temporal information to define eventDate ranges at each level.
</para>
</description>
</methodStep>
<sampling>
<studyExtent>
<description>
<para>
This dataset includes observations from NEON terrestrial field sites distributed across multiple ecoclimatic domains in the United States, including Alaska and Puerto Rico. Sampling occurs at fixed plots within each site as part of NEON’s long-term ecological monitoring program.
</para>
</description>
</studyExtent>
<samplingDescription>
<para>
Ticks are collected using standardized drag and/or flag sampling methods along the perimeter of 40 m × 40 m plots. Multiple plots are sampled at each site at regular intervals during the growing season. Samples are preserved in ethanol and processed by external laboratories, where ticks are enumerated and identified to species, life stage, and sex where possible. A subset of specimens is tested for pathogens.
</para>
</samplingDescription>
</sampling>
<methodStep>
<description>
<para>
The dataset is structured as a hierarchical survey using the Darwin Core Event Core, with events defined at project, domain, site, plot, and sampling visit levels. Event-level data were split into two components: a Darwin Core Event Core table and a Humboldt Extension table describing ecological inventory context, including sampling effort, taxonomic scope, and protocol details.
</para>
</description>
</methodStep>
<methodStep>
<description>
<para>
Occurrence records include both physical tick specimens and derived pathogen observations. Pathogen detections are represented as separate occurrence records and are linked to their source specimens using the ResourceRelationship extension. Additional contextual data, including plot characteristics, tick counts by life stage, sample identifiers, and laboratory metadata, were mapped to the Extended Measurement or Fact (eMoF) extension. All data were formatted according to Darwin Core standards for publication as a Darwin Core Archive.
</para>
</description>
</methodStep>
<qualityControl>
<description>
<para>
Quality control procedures were implemented during data collection, ingestion, and processing within NEON systems. Data values were validated through controlled vocabularies and standardized formats, and additional transformations were applied during dataset preparation to ensure consistency, completeness, and correct linkage between events, occurrences, and derived records.
</para>
</description>
</qualityControl>
</methods>\n'
)
rights_block <- paste0(
' <intellectualRights>
<para>
To the extent possible under law, the publisher has waived all rights to these data
and has dedicated them to the
<ulink url="https://creativecommons.org/publicdomain/zero/1.0/">
<citetitle>Public Domain (CC0 1.0)</citetitle>
</ulink>.
</para>
</intellectualRights>\n'
)
maintenance_block <- paste0(
' <maintenance>
<description>
<para>NEON datasets are updated annually with each NEON data release.</para>
</description>
<maintenanceUpdateFrequency>annually</maintenanceUpdateFrequency>
</maintenance>\n'
)
end_date <- max(as.Date(event_data_core$eventDate), na.rm = TRUE)
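`as.Date()` reads only the leading date of a string such as `2014-04-02/2023-10-30`, because `strptime()` ignores trailing characters. If higher-level events store eventDate as ranges, the line above therefore reads their start dates; the maximum is still correct as long as some event (for example, a plot visit) carries the latest date on its own. A more explicit sketch takes the end of each interval; `event_end_date()` is hypothetical.

```r
# Hypothetical helper (not part of the original workflow): return the last
# calendar date of each eventDate value. For an ISO 8601 interval such as
# "2014-04-02/2023-10-30" the part after "/" is used; for a single value
# such as "2022-06-01T10:00Z" the leading YYYY-MM-DD is used.
# Reduced-precision values (e.g. "2023-10") and abbreviated interval ends
# (e.g. "2014-04-02/10-30") are not handled and become NA.
event_end_date <- function(event_date) {
  end_part <- sub("^.*/", "", event_date)
  as.Date(substr(end_part, 1, 10), format = "%Y-%m-%d")
}

# Usage: end_date <- max(event_end_date(event_data_core$eventDate), na.rm = TRUE)
```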
coverage_block <- paste0(
' <coverage>
<geographicCoverage>
<geographicDescription>
Data are collected at 46 terrestrial NEON sites across the United States including Alaska and Puerto Rico, but excluding Hawaii.
</geographicDescription>
<boundingCoordinates>
<westBoundingCoordinate>-168.75</westBoundingCoordinate>
<eastBoundingCoordinate>-65.391</eastBoundingCoordinate>
<northBoundingCoordinate>71.301</northBoundingCoordinate>
<southBoundingCoordinate>15.623</southBoundingCoordinate>
</boundingCoordinates>
</geographicCoverage>
<temporalCoverage>
<rangeOfDates>
<beginDate>
<calendarDate>2014-04-02</calendarDate>
</beginDate>
<endDate>
<calendarDate>', end_date, '</calendarDate>
</endDate>
</rangeOfDates>
</temporalCoverage>
<taxonomicCoverage>
<generalTaxonomicCoverage>
This dataset covers hard ticks collected using the drag/flag method including: Ixodes pacificus, Amblyomma americanum, Amblyomma maculatum, Dermacentor andersoni, Dermacentor occidentalis, Dermacentor variabilis, Haemaphysalis leporispalustris, Haemaphysalis longicornis, Ixodes affinis, Ixodes angustus, Ixodes dentatus, Ixodes marxi, Ixodes muris, and Ixodes scapularis. This dataset also covers testing results for several tick-borne pathogens including: Anaplasma phagocytophilum, Babesia microti, Borrelia burgdorferi sensu lato, Borrelia lonestari, Borrelia mayonii, Borrelia miyamotoi, Ehrlichia chaffeensis, Ehrlichia ewingii, Ehrlichia muris-like, Francisella tularensis, and Rickettsia rickettsii.
</generalTaxonomicCoverage>
<taxonomicClassification>
<taxonRankName>Family</taxonRankName>
<taxonRankValue>Ixodidae</taxonRankValue>
<commonName>hard ticks</commonName>
</taxonomicClassification>
<taxonomicClassification>
<taxonRankName>Genus</taxonRankName>
<taxonRankValue>Ixodes</taxonRankValue>
</taxonomicClassification>
<taxonomicClassification>
<taxonRankName>Genus</taxonRankName>
<taxonRankValue>Dermacentor</taxonRankValue>
</taxonomicClassification>
<taxonomicClassification>
<taxonRankName>Genus</taxonRankName>
<taxonRankValue>Haemaphysalis</taxonRankValue>
</taxonomicClassification>
<taxonomicClassification>
<taxonRankName>Genus</taxonRankName>
<taxonRankValue>Amblyomma</taxonRankValue>
</taxonomicClassification>
<taxonomicClassification>
<taxonRankName>Genus</taxonRankName>
<taxonRankValue>Borrelia</taxonRankValue>
</taxonomicClassification>
<taxonomicClassification>
<taxonRankName>Species</taxonRankName>
<taxonRankValue>Anaplasma phagocytophilum</taxonRankValue>
</taxonomicClassification>
<taxonomicClassification>
<taxonRankName>Genus</taxonRankName>
<taxonRankValue>Ehrlichia</taxonRankValue>
</taxonomicClassification>
<taxonomicClassification>
<taxonRankName>Species</taxonRankName>
<taxonRankValue>Babesia microti</taxonRankValue>
</taxonomicClassification>
<taxonomicClassification>
<taxonRankName>Species</taxonRankName>
<taxonRankValue>Francisella tularensis</taxonRankValue>
</taxonomicClassification>
<taxonomicClassification>
<taxonRankName>Species</taxonRankName>
<taxonRankValue>Rickettsia rickettsii</taxonRankValue>
</taxonomicClassification>
</taxonomicCoverage>
</coverage>\n'
)
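The `<taxonomicClassification>` elements above follow one fixed pattern, so they could also be generated from a small table, which keeps the coverage block in step with the taxon list. A minimal sketch follows; `classification_xml()` and the `taxa` table are hypothetical, and the values are not XML-escaped (the taxon names here contain no special characters).

```r
# Hypothetical helper (not part of the original workflow): build the
# <taxonomicClassification> elements from a table, one element per row,
# adding <commonName> only where one is given.
classification_xml <- function(taxa) {
  blocks <- vapply(seq_len(nrow(taxa)), function(i) {
    common <- if (is.na(taxa$common[i])) "" else
      paste0("\n<commonName>", taxa$common[i], "</commonName>")
    paste0("<taxonomicClassification>\n",
           "<taxonRankName>", taxa$rank[i], "</taxonRankName>\n",
           "<taxonRankValue>", taxa$value[i], "</taxonRankValue>",
           common,
           "\n</taxonomicClassification>")
  }, character(1))
  paste(blocks, collapse = "\n")
}

# Usage with a few of the taxa listed in the coverage block
taxa <- data.frame(
  rank = c("Family", "Genus", "Species"),
  value = c("Ixodidae", "Amblyomma", "Babesia microti"),
  common = c("hard ticks", NA, NA),
  stringsAsFactors = FALSE
)
cat(classification_xml(taxa), "\n")
```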
project_block <- paste0(
" <project>
<title>National Ecological Observatory Network</title>
<personnel>
<individualName>
<givenName>Kate</givenName>
<surName>Thibault</surName>
</individualName>
<userId directory=\"http://orcid.org/\">0000-0003-3477-6424</userId>
<role>ADMINISTRATIVE_POINT_OF_CONTACT</role>
</personnel>
<abstract>
<para>
The National Science Foundation's National Ecological Observatory Network (NEON) is a continental-scale ecological observation facility, fully funded by NSF and operated by Battelle. NEON collects and provides open data from 81 field sites across the United States that characterize and quantify how our nation's ecosystems are changing. Each year, NEON collects and archives over 100,000 specimens and samples that complement the field observations and automated measurements collected at field sites. These samples represent a rich resource unique among natural history collections due to NEON's utility for continental- and decadal-scale ecology. NEON's archival samples are available upon request to support research studies and analyses. The comprehensive data, spatial extent and remote sensing technology provided by the NEON project will contribute to a better understanding and more accurate forecasting of how human activities impact ecology and how our society can more effectively address critical ecological questions and issues.
</para>
</abstract>
<funding>
<para>National Science Foundation, Award DBI-2217817</para>
</funding>
<studyAreaDescription>
<descriptor name=\"generic\" citableClassificationSystem=\"false\">
<descriptorValue>United States including Alaska, Hawaii and Puerto Rico</descriptorValue>
</descriptor>
</studyAreaDescription>
</project>\n"
)
additional_description_block <- paste0(
' <purpose>
<para>
This dataset was developed alongside the GBIF Guide for publishing biological survey and monitoring data to GBIF
(<ulink url="https://docs.gbif.org/guide-publishing-survey-data/en/"><citetitle>GBIF survey data guide</citetitle></ulink>)
to illustrate how ecological monitoring data from the National Ecological Observatory Network (NEON) can be structured for publication using the Darwin Core standard and the Humboldt Extension for Ecological Inventories
(<ulink url="https://eco.tdwg.org/"><citetitle>Humboldt Extension</citetitle></ulink>).
The workflow demonstrates how complex survey data can be organized into a Darwin Core Archive that supports reuse, integration, and ecological analysis. Code used to transform the original NEON data into this Darwin Core Archive, including all data processing and mapping steps, is available at
<ulink url="https://sunray1.github.io/NEONTickstoHumboldt/"><citetitle>NEON Ticks to Humboldt workflow</citetitle></ulink>.
</para>
</purpose>
<introduction>
<para>
The workflow produces five Darwin Core tables: Event, HumboldtEcologicalInventory, Occurrence, ResourceRelationship, and ExtendedMeasurementOrFact. The dataset is structured hierarchically, with events defined at multiple levels (project, domain, site, plot, and plot visit). Occurrences represent collected ticks as well as subsequent laboratory results derived from them. Contextual information on sampling design, effort, and scope is expressed through Humboldt Extension terms.
</para>
<para>
The U.S. National Science Foundation’s National Ecological Observatory Network (NEON) carries out long-term, standardized monitoring of tick populations and tick-borne pathogens across the United States. Sampling is conducted at terrestrial sites, where field staff survey designated plots every three to six weeks during the active tick season using drag and flag methods along the perimeter of 40 x 40 m plots. Ticks of all life stages are collected, preserved in ethanol, and later identified to species, life stage, and sex when possible.
</para>
<para>
A subset of nymphal ticks is tested for pathogens of public health and veterinary concern, including Borrelia burgdorferi, Anaplasma phagocytophilum, Babesia microti, Ehrlichia species, and several Rickettsia species. Additional details on NEON’s tick monitoring program are available from the NEON tick data resource page
(<ulink url="https://www.neonscience.org/data-collection/ticks"><citetitle>NEON tick data resource</citetitle></ulink>)
and associated protocols in the NEON Document Library
(<ulink url="https://data.neonscience.org/documents"><citetitle>NEON Document Library</citetitle></ulink>).
</para>
<para>
NEON biological datasets are organized as hierarchical structures that reflect the standardized design of the survey program. Locality information is captured at multiple spatial scales, including domains, sites, and sampling units such as plots. Sampling context is described using the Darwin Core Event Core and Humboldt Extension, while additional measurements are recorded using the Extended Measurement or Fact (eMoF) extension. Occurrence records are linked to sampling events to represent collected organisms, and relationships among records are expressed using the ResourceRelationship extension.
</para>
</introduction>
<gettingStarted>
<para>
The dataset is provided as a Darwin Core Archive consisting of an Event Core file and four extension files: HumboldtEcologicalInventory (the Humboldt Extension table describing ecological inventory context), Occurrence, ResourceRelationship, and ExtendedMeasurementOrFact. These components are linked through shared identifiers such as eventID and occurrenceID.
</para>
</gettingStarted>\n'
)
dataset_block <- paste0(
' <dataset>\n\n',
alternate_id_block,
' <title xml:lang="eng">', title, '</title>\n\n',
creator_block,
metadata_provider_block,
' <pubDate>', pub_date, '</pubDate>\n',
' <language>', language, '</language>\n\n',
abstract_block,
rights_block,
coverage_block,
additional_description_block,
maintenance_block,
contact_block,
methods_block,
project_block,
' </dataset>\n'
)
date_stamp <- format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
additional_metadata_block <- paste0(
' <additionalMetadata>
<metadata>
<gbif>
<dateStamp>', date_stamp, '</dateStamp>
<hierarchyLevel>dataset</hierarchyLevel>
<physical>
<objectName>Ticks sampled using drag cloths</objectName>
<characterEncoding>UTF-8</characterEncoding>
<dataFormat>
<externallyDefinedFormat>
<formatName>CSV</formatName>
</externallyDefinedFormat>
</dataFormat>
<distribution>
<online>
<url function="download">https://data.neonscience.org/data-products/DP1.10093.001</url>
</online>
</distribution>
</physical>
<physical>
<objectName>Tick pathogen status</objectName>
<characterEncoding>UTF-8</characterEncoding>
<dataFormat>
<externallyDefinedFormat>
<formatName>CSV</formatName>
</externallyDefinedFormat>
</dataFormat>
<distribution>
<online>
<url function="download">https://data.neonscience.org/data-products/DP1.10092.001</url>
</online>
</distribution>
</physical>
<physical>
<objectName>Tick Collection (Vouchers)</objectName>
<characterEncoding>UTF-8</characterEncoding>
<dataFormat>
<externallyDefinedFormat>
<formatName>CSV</formatName>
</externallyDefinedFormat>
</dataFormat>
<distribution>
<online>
<url function="download">https://biorepo.neonscience.org/portal/collections/misc/neoncollprofiles.php?collid=116</url>
</online>
</distribution>
</physical>
<physical>
<objectName>Tick Collection (Pathogen Extract)</objectName>
<characterEncoding>UTF-8</characterEncoding>
<dataFormat>
<externallyDefinedFormat>
<formatName>CSV</formatName>
</externallyDefinedFormat>
</dataFormat>
<distribution>
<online>
<url function="download">https://biorepo.neonscience.org/portal/collections/misc/neoncollprofiles.php?collid=75</url>
</online>
</distribution>
</physical>
<physical>
<objectName>U.S. National Tick Collection at Georgia Southern University</objectName>
<characterEncoding>UTF-8</characterEncoding>
<dataFormat>
<externallyDefinedFormat>
<formatName>CSV</formatName>
</externallyDefinedFormat>
</dataFormat>
<distribution>
<online>
<url function="download">https://biorepo.neonscience.org/portal/collections/misc/neoncollprofiles.php?collid=83</url>
</online>
</distribution>
</physical>
<collection>
<collectionIdentifier>https://www.georgiasouthern.edu/research/centers/us-national-tick-collection</collectionIdentifier>
<collectionName>U.S. National Tick Collection at Georgia Southern University</collectionName>
</collection>
<collection>
<collectionIdentifier>https://biorepo.neonscience.org/portal/collections/misc/neoncollprofiles.php?collid=83</collectionIdentifier>
<collectionName>U.S. National Tick Collection at Georgia Southern University</collectionName>
</collection>
<collection>
<collectionIdentifier>https://biorepo.neonscience.org/portal/collections/misc/collprofiles.php?collid=116</collectionIdentifier>
<collectionName>NEON Biorepository Tick Collection (Vouchers)</collectionName>
</collection>
<collection>
<collectionIdentifier>https://biorepo.neonscience.org/portal/collections/misc/collprofiles.php?collid=75</collectionIdentifier>
<collectionName>NEON Biorepository Tick Collection (Pathogen Extracts)</collectionName>
</collection>
</gbif>
</metadata>
</additionalMetadata>\n'
)
header_block <- paste0(
'<?xml version="1.0" encoding="UTF-8"?>
<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
xmlns:dc="http://purl.org/dc/terms/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://rs.gbif.org/schema/eml-gbif-profile/1.3/eml.xsd"
packageId="', package_id, '" system="http://gbif.org" scope="system"
xml:lang="eng">\n\n'
)
footer_block <- '</eml:eml>'
eml_xml <- paste0(
header_block,
dataset_block,
additional_metadata_block,
footer_block
)
writeLines(eml_xml, "outputs/zipped/eml.xml")
Zip the files into a Darwin Core Archive (DwC-A):
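output_zip <- "outputs/zipped/DwC-A.zip"
# Remove any archive left over from a previous run
if (file.exists(output_zip)) {
unlink(output_zip)
}
# Zip every file in the folder; "-j" stores the files without their directory path
file_paths <- list.files("outputs/zipped", full.names = TRUE)
zip_status <- zip(zipfile = output_zip, files = file_paths, flags = "-j")
# Delete the loose files only if zip reported success and the archive exists
if (zip_status == 0 && file.exists(output_zip)) {
unlink(file_paths)
}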
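Because the loose files are deleted once the archive exists, it can be worth confirming what actually went into it. `utils::unzip(list = TRUE)` lists an archive's entries without extracting them. A minimal sketch of a check on that listing follows; `missing_from_dwca()` is hypothetical, and its default file list contains only the names visible in this section, so the Event core and Humboldt file names declared earlier in `core_xml` and `humboldt_xml` would need to be added.

```r
# Hypothetical check (not part of the original workflow): report expected
# DwC-A files that are absent from a list of archive entry names.
missing_from_dwca <- function(archived,
                              expected = c("meta.xml", "eml.xml",
                                           "occurrence.csv",
                                           "resourceRelationship.csv",
                                           "extendedMeasurementOrFact.csv")) {
  # "-j" stores files without folders, but basename() also copes with
  # archives that were built with paths
  setdiff(expected, basename(archived))
}

# Usage (assumes the archive created above exists):
# archived <- utils::unzip("outputs/zipped/DwC-A.zip", list = TRUE)$Name
# stopifnot(length(missing_from_dwca(archived)) == 0)
```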