This returns a parameter list as used by training+validation and related functions. If no arguments are set then the default values are returned. It is primarily for internal use from other functions.

Usage

set_pipeline_params(
  species = character(0),
  gpu_ram = 10,
  res = 150,
  suffix = as.character(Sys.Date()),
  grid_search_type = "new",
  grid_search_list = list(de_ratio = c(2, 4, 6, 8, 10), obs_prop = c(0.95, 0.975, 0.99,
    0.999, 0.9999), dist_pow = seq(from = 0.1, to = 0.9, by = 0.1), dist_weight =
    NA_real_, ent_weight = NA_real_),
  hdf_path = the$hdf_path,
  hdf_dir = NULL,
  base_output_path = the$output_path,
  output_fullname = NULL,
  output_path = NULL,
  season = "prebreeding",
  truncate_season = FALSE,
  model_selection = "distance_metric",
  clip = "default",
  crs = NULL,
  skip_quality_checks = FALSE,
  fit_only = FALSE,
  ebirdst_year = NULL,
  trim_quantile = 0.99,
  min_season_quality = 3,
  force_refit = FALSE
)

Arguments

species

The species to fit. It should be set by the user or the calling function.

gpu_ram

The GPU RAM allocation, in GB, to use when fitting the models. It is also passed to BirdFlowR::preprocess_species(); if res is NULL, that function sets res to the smallest round number that can be fit with the allocated RAM.

res

The resolution of the model, specifically the height and width of the model cells in km. Set to NULL to fit the finest resolution possible with the allocated gpu_ram.

suffix

Characters to append to the output directory and model file names.

grid_search_type

Either "old" to specify all the Python hyperparameters (dist_pow, dist_weight, and ent_weight) directly, or "new" to use the alternative parameterization of dist_pow, obs_prop, and de_ratio, which are passed to the refactor_hyperparameters function to generate the Python parameters. In either case there is a hidden Python parameter, obs_weight, which is always set to 1 when called from BirdFlowPipeline. "new" is the current default because the ratio between the distance weight and the entropy weight heavily determines the behavior of the model. These hyperparameters are highly sensitive; refactoring them in this way makes them easier to reason about and appears to give better coverage and performance in the "grid" search.

grid_search_list

These parameters are used in the loss function by BirdFlowPy. They are also used in the grid search when running training+validation. All five possible parameters are in the list even though two of them won't be used; which two depends on grid_search_type. Each element holds the set of values that will be combined factorially in the grid search. The default values represent our currently preferred approach.

New and Old below indicate which setting of grid_search_type they are used with.

de_ratio

New The ratio of the distance weight to the entropy weight in the loss function that is optimized while fitting the models.

obs_prop

New The proportion of the total weight (among observation, distance, and entropy weights), that is assigned to the observation weight in the loss function.

dist_pow

New and Old The power used to transform movement distances prior to weighting them in the loss function.

dist_weight

Old The weight assigned to distances in the loss function.

ent_weight

Old The weight assigned to entropy in the loss function.
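
The relationship between the two parameterizations can be sketched from the definitions above. This is an illustration only, not the package's refactor_hyperparameters() code: sketch_refactor() is a hypothetical helper, and it assumes obs_weight = 1, de_ratio = dist_weight / ent_weight, and obs_prop = obs_weight / (obs_weight + dist_weight + ent_weight). The grid is built with expand.grid() to show the factorial combination; the pipeline may construct it differently.

```r
# Hypothetical helper derived from the documented definitions; the real
# refactor_hyperparameters() may differ (for example in rounding).
sketch_refactor <- function(de_ratio, obs_prop, obs_weight = 1) {
  # Weight left for distance + entropy once obs_prop is fixed
  rest <- obs_weight * (1 - obs_prop) / obs_prop
  ent_weight <- rest / (1 + de_ratio)
  dist_weight <- de_ratio * ent_weight
  data.frame(de_ratio, obs_prop, dist_weight, ent_weight)
}

# Factorial combination of the default "new" grid values
new_grid <- expand.grid(
  de_ratio = c(2, 4, 6, 8, 10),
  obs_prop = c(0.95, 0.975, 0.99, 0.999, 0.9999),
  dist_pow = seq(from = 0.1, to = 0.9, by = 0.1)
)
nrow(new_grid)  # 5 * 5 * 9 combinations

# Convert each row to the "old" style weights used by the loss function
weights <- sketch_refactor(new_grid$de_ratio, new_grid$obs_prop)
python_grid <- cbind(new_grid["dist_pow"],
                     weights[c("dist_weight", "ent_weight")])
head(python_grid, 3)
```

Under these assumptions, a larger obs_prop shrinks both dist_weight and ent_weight together, while de_ratio only controls how the remainder is split between them.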

hdf_path

The path to the base directory in which hdf5 preprocessed and fitted models are stored.

hdf_dir

The path to the directory in which this run's hdf5 preprocessed and fit models are stored. Set by preprocess_species_wrapper() to <hdf_path>/<species>_<res>km/.

base_output_path

The base path for fitted models, model reports, and other output.

output_fullname

Initially NULL, this is set by preprocess_species_wrapper() to "<species>_<res>km<suffix>".

output_path

Initially NULL, this is set by preprocess_species_wrapper() to "<base_output_path>_<output_fullname>".

season

This is used in two ways: (1) to filter movement data (possibly only tracking data) before evaluating the model, and (2) if truncate_season is TRUE, it is passed to BirdFlowR::preprocess_species(), producing a model that is truncated to just that season.

truncate_season

If TRUE, the model will be truncated to season, and marginals for transitions outside of the season won't be fit or included.

model_selection

Set how model selection is performed within rank_models() as called from training+validation. It should be one of:

"str_etc"

Straightness and traverse correlation only.

"pit_etc"

PIT metrics and traverse correlation only.

"real_tracking"

Tracking-focused model selection. This includes traverse correlation, PIT scores, and straightness and n_stopovers targeted to observed values from real tracking data.

"real_tracking_no_cal"

Tracking-focused model selection. This includes traverse correlation, and straightness and n_stopovers targeted to observed values from real tracking data, but no PIT scores.

"averaged_parameters"

This is a placeholder for when the loss function parameters are fixed, typically to average values, in which case no model selection is performed.

clip

This is passed to BirdFlowR::preprocess_species() to define a clipping polygon to use while preprocessing; only areas within the polygon are included in the model. If clip is set to "default", the clip defined in the package's internal object the is used, which is currently americas_clip.

crs

Passed to BirdFlowR::preprocess_species() to define the coordinate reference system for the model. With the default of NULL, the CRS set in the package's internal object the is used, which is currently americas_crs.

skip_quality_checks

Passed to BirdFlowR::preprocess_species(). If FALSE (the default), an error will be thrown if season quality values in ebirdst::ebirdst_runs are below three. If TRUE, the model is fit regardless of quality.

fit_only

Set to TRUE to fit a model without a grid search, model evaluation, model ranking, or model reports.

ebirdst_year

The version year of the ebirdst package. This shouldn't be set by users and any supplied value will be ignored.

trim_quantile

Passed to BirdFlowR::preprocess_species().

min_season_quality

The minimum acceptable quality score for any season. If the score of any season in the eBird Status and Trends product is below this value, the model is not fit.
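
A rough sketch of how the quality threshold might be applied is shown below. The scores are made-up values, can_fit() is a hypothetical helper, and the way it combines min_season_quality with skip_quality_checks is an assumption rather than the package's actual logic.

```r
# Hypothetical quality scores for one species; in the pipeline these
# would come from ebirdst::ebirdst_runs rather than being typed in.
season_quality <- c(breeding = 3, nonbreeding = 3,
                    prebreeding = 2, postbreeding = 3)

# Assumed decision rule (not the package's code): fit only if every
# season meets the threshold, unless the checks are skipped.
can_fit <- function(quality, min_season_quality = 3,
                    skip_quality_checks = FALSE) {
  skip_quality_checks || all(quality >= min_season_quality)
}

can_fit(season_quality)                              # FALSE: prebreeding is 2
can_fit(season_quality, min_season_quality = 2)      # TRUE
can_fit(season_quality, skip_quality_checks = TRUE)  # TRUE
```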

force_refit

Force refitting the models. Defaults to FALSE.

Value

A parameter list to be used by batch_flow() and related functions.

Details

It's possible to set res to NULL in which case the realized resolution won't be known until after preprocess_species_wrapper() is called. Therefore in a standard run (regardless of whether res is set), parameters that depend on res are set by preprocess_species_wrapper(). These are hdf_dir, output_path, and output_fullname.
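
The deferred assignment described above could look roughly like the following. fill_res_dependent() is a hypothetical stand-in for the relevant part of preprocess_species_wrapper(), the species code, suffix, and paths are example values, and the string patterns are copied literally from the argument descriptions; the real function may join paths differently.

```r
# Hypothetical stand-in, not the package's code. It fills in the three
# res-dependent entries once the realized resolution is known.
fill_res_dependent <- function(params, realized_res = params$res) {
  stem <- paste0(params$species, "_", realized_res, "km")
  params$res <- realized_res
  params$hdf_dir <- file.path(params$hdf_path, stem)
  params$output_fullname <- paste0(stem, params$suffix)
  # Pattern as documented for output_path
  params$output_path <- paste0(params$base_output_path, "_",
                               params$output_fullname)
  params
}

# Example values only; res = NULL means the resolution is chosen later
params <- list(species = "amewoo", res = NULL, suffix = "_demo",
               hdf_path = "hdf", base_output_path = "output")

# Pretend preprocessing settled on 100 km cells
filled <- fill_res_dependent(params, realized_res = 100)
filled[c("res", "hdf_dir", "output_fullname", "output_path")]
```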

Examples

set_pipeline_params()  # default values (new grid search)
#> $species
#> character(0)
#> 
#> $gpu_ram
#> [1] 10
#> 
#> $res
#> [1] 150
#> 
#> $suffix
#> [1] "2026-04-23"
#> 
#> $grid_search_type
#> [1] "new"
#> 
#> $grid_search_list
#> $grid_search_list$de_ratio
#> [1]  2  4  6  8 10
#> 
#> $grid_search_list$obs_prop
#> [1] 0.9500 0.9750 0.9900 0.9990 0.9999
#> 
#> $grid_search_list$dist_pow
#> [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
#> 
#> $grid_search_list$dist_weight
#> [1] NA
#> 
#> $grid_search_list$ent_weight
#> [1] NA
#> 
#> 
#> $hdf_path
#> [1] "/work/pi_drsheldon_umass_edu/birdflow_modeling/pipeline/hdf"
#> 
#> $hdf_dir
#> NULL
#> 
#> $base_output_path
#> [1] "/work/pi_drsheldon_umass_edu/birdflow_modeling/pipeline/output"
#> 
#> $output_fullname
#> NULL
#> 
#> $output_path
#> NULL
#> 
#> $season
#> [1] "prebreeding"
#> 
#> $truncate_season
#> [1] FALSE
#> 
#> $model_selection
#> [1] "distance_metric"
#> 
#> $clip
#> Geometry set for 1 feature 
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 491379.6 ymin: 596053.8 xmax: 12447280 ymax: 16025860
#> Projected CRS: Custom_Lambert_Azimuthal
#> MULTIPOLYGON (((505777.3 14848232, 506295.8 148...
#> 
#> $crs
#> [1] "PROJCRS[\"Custom_Lambert_Azimuthal\",\n    BASEGEOGCRS[\"WGS 84\",\n        DATUM[\"World Geodetic System 1984\",\n            ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n                LENGTHUNIT[\"metre\",1]],\n            ID[\"EPSG\",6326]],\n        PRIMEM[\"Greenwich\",0,\n            ANGLEUNIT[\"Degree\",0.0174532925199433]]],\n    CONVERSION[\"unnamed\",\n        METHOD[\"Lambert Azimuthal Equal Area\",\n            ID[\"EPSG\",9820]],\n        PARAMETER[\"Latitude of natural origin\",30,\n            ANGLEUNIT[\"Degree\",0.0174532925199433],\n            ID[\"EPSG\",8801]],\n        PARAMETER[\"Longitude of natural origin\",-95,\n            ANGLEUNIT[\"Degree\",0.0174532925199433],\n            ID[\"EPSG\",8802]],\n        PARAMETER[\"False easting\",5500000,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8806]],\n        PARAMETER[\"False northing\",9500000,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8807]]],\n    CS[Cartesian,2],\n        AXIS[\"(E)\",east,\n            ORDER[1],\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]],\n        AXIS[\"(N)\",north,\n            ORDER[2],\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]]]"
#> 
#> $skip_quality_checks
#> [1] FALSE
#> 
#> $fit_only
#> [1] FALSE
#> 
#> $ebirdst_year
#> [1] 2023
#> 
#> $trim_quantile
#> [1] 0.99
#> 
#> $min_season_quality
#> [1] 3
#> 
#> $force_refit
#> [1] FALSE
#> 

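# Old grid search values. Note that grid_search_type is left at its
# default ("new") in this call; set grid_search_type = "old" to use them.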
set_pipeline_params(grid_search_list = list(
  de_ratio = NA,
  obs_prop = NA,
  dist_weight = seq(from = 0.0008, to = 0.0018, length.out = 5),
  ent_weight = seq(from = 0.00015, to = 0.0004, length.out = 5),
  dist_pow = seq(from = 0.1, to = .9, length.out = 5)))
#> $species
#> character(0)
#> 
#> $gpu_ram
#> [1] 10
#> 
#> $res
#> [1] 150
#> 
#> $suffix
#> [1] "2026-04-23"
#> 
#> $grid_search_type
#> [1] "new"
#> 
#> $grid_search_list
#> $grid_search_list$de_ratio
#> [1] NA
#> 
#> $grid_search_list$obs_prop
#> [1] NA
#> 
#> $grid_search_list$dist_weight
#> [1] 0.00080 0.00105 0.00130 0.00155 0.00180
#> 
#> $grid_search_list$ent_weight
#> [1] 0.0001500 0.0002125 0.0002750 0.0003375 0.0004000
#> 
#> $grid_search_list$dist_pow
#> [1] 0.1 0.3 0.5 0.7 0.9
#> 
#> 
#> $hdf_path
#> [1] "/work/pi_drsheldon_umass_edu/birdflow_modeling/pipeline/hdf"
#> 
#> $hdf_dir
#> NULL
#> 
#> $base_output_path
#> [1] "/work/pi_drsheldon_umass_edu/birdflow_modeling/pipeline/output"
#> 
#> $output_fullname
#> NULL
#> 
#> $output_path
#> NULL
#> 
#> $season
#> [1] "prebreeding"
#> 
#> $truncate_season
#> [1] FALSE
#> 
#> $model_selection
#> [1] "distance_metric"
#> 
#> $clip
#> Geometry set for 1 feature 
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 491379.6 ymin: 596053.8 xmax: 12447280 ymax: 16025860
#> Projected CRS: Custom_Lambert_Azimuthal
#> MULTIPOLYGON (((505777.3 14848232, 506295.8 148...
#> 
#> $crs
#> [1] "PROJCRS[\"Custom_Lambert_Azimuthal\",\n    BASEGEOGCRS[\"WGS 84\",\n        DATUM[\"World Geodetic System 1984\",\n            ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n                LENGTHUNIT[\"metre\",1]],\n            ID[\"EPSG\",6326]],\n        PRIMEM[\"Greenwich\",0,\n            ANGLEUNIT[\"Degree\",0.0174532925199433]]],\n    CONVERSION[\"unnamed\",\n        METHOD[\"Lambert Azimuthal Equal Area\",\n            ID[\"EPSG\",9820]],\n        PARAMETER[\"Latitude of natural origin\",30,\n            ANGLEUNIT[\"Degree\",0.0174532925199433],\n            ID[\"EPSG\",8801]],\n        PARAMETER[\"Longitude of natural origin\",-95,\n            ANGLEUNIT[\"Degree\",0.0174532925199433],\n            ID[\"EPSG\",8802]],\n        PARAMETER[\"False easting\",5500000,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8806]],\n        PARAMETER[\"False northing\",9500000,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8807]]],\n    CS[Cartesian,2],\n        AXIS[\"(E)\",east,\n            ORDER[1],\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]],\n        AXIS[\"(N)\",north,\n            ORDER[2],\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]]]"
#> 
#> $skip_quality_checks
#> [1] FALSE
#> 
#> $fit_only
#> [1] FALSE
#> 
#> $ebirdst_year
#> [1] 2023
#> 
#> $trim_quantile
#> [1] 0.99
#> 
#> $min_season_quality
#> [1] 3
#> 
#> $force_refit
#> [1] FALSE
#>