Batch fit many species with fixed loss function parameters without tuning, evaluation, or reports.

Usage

batch_species(
  species,
  grid_search_type = "old",
  de_ratio = NA,
  obs_prop = NA,
  dist_pow = 0.4167,
  dist_weight = 0.008177,
  ent_weight = 0.001924,
  show_progress = TRUE,
  ...
)

Arguments

species

A vector of species codes or common names to fit. Each value must be resolvable by ebirdst::get_species().

grid_search_type

Either "old" to specify all the Python hyperparameters (dist_pow, dist_weight, and ent_weight) directly, or "new" to use the alternative parameterization of dist_pow, obs_prop, and de_ratio, which are passed to the refactor_hyperparameters function to generate the Python parameters. In either case there is a hidden Python parameter, obs_weight, which is always set to 1 when called from BirdFlowPipeline. Outside of batch_species(), which defaults to "old" as shown in Usage, "new" is the current default behavior because the ratio between the distance weight and the entropy weight heavily determines the behavior of the model. These hyperparameters are highly sensitive; refactoring them in this way makes them easier to reason about and appears to improve the coverage and performance of the "grid" search.
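The relationship between the two parameterizations can be sketched from the definitions on this page: obs_weight is fixed at 1, obs_prop is the observation weight's share of the total weight, and de_ratio is dist_weight divided by ent_weight. The helper below, new_to_old(), is a hypothetical name written for illustration. It is not the package's refactor_hyperparameters function, whose actual implementation may differ.

```r
# Hypothetical helper (not part of BirdFlowPipeline): derive the "old"
# weights from the "new" parameters, assuming obs_weight is fixed at 1.
new_to_old <- function(obs_prop, de_ratio, obs_weight = 1) {
  stopifnot(obs_prop > 0, obs_prop <= 1, de_ratio >= 0)
  # obs_prop = obs_weight / (obs_weight + dist_weight + ent_weight),
  # so the weight left for distance and entropy together is:
  remaining <- obs_weight / obs_prop - obs_weight
  # de_ratio = dist_weight / ent_weight splits that remainder
  ent_weight <- remaining / (de_ratio + 1)
  dist_weight <- de_ratio * ent_weight
  list(dist_weight = dist_weight, ent_weight = ent_weight)
}

w <- new_to_old(obs_prop = 0.99, de_ratio = 4.25)
signif(unlist(w), 4)
```

Under these assumptions, obs_prop = 0.99 and de_ratio = 4.25 reproduce, to four significant figures, the default dist_weight and ent_weight shown in Usage.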

de_ratio, obs_prop, dist_pow, dist_weight, ent_weight

Loss function parameters. See their definitions under grid_search_list below, in the ... section.

show_progress

If TRUE, batch_species() will display a progress bar and wait for all the launched jobs to finish. If FALSE, batch_species() will exit immediately after launching the jobs.

...

Arguments passed on to set_pipeline_params

gpu_ram

Set the GPU RAM allocation in GB when fitting the models. It is also passed to BirdFlowR::preprocess_species(); if res is set to NULL, that function will set res to the smallest round number that can be fit with the allocated RAM.

res

The resolution of the model, specifically the height and width of the model cells in km. Set to NULL to fit the finest resolution possible with the allocated gpu_ram.

suffix

Characters to append to the output directory and model file names.

grid_search_list

These parameters are used in the loss function by BirdFlowPy. They are also used in the grid search when running training+validation. All six possible parameters are in the list even though two of them won't be used - which two depends on grid_search_type. They are the set of values that will be combined factorially in the grid search. The default values represent our currently preferred approach.

New and Old below indicate which setting of grid_search_type they are used with.

de_ratio

New: The ratio of the distance weight to the entropy weight in the loss function that is optimized while fitting the models.

obs_prop

New: The proportion of the total weight (among observation, distance, and entropy weights) that is assigned to the observation weight in the loss function.

dist_pow

New and Old: The power used to transform movement distances prior to weighting them in the loss function.

dist_weight

Old: The weight assigned to distances in the loss function.

ent_weight

Old: The weight assigned to entropy in the loss function.
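As a rough illustration of what "combined factorially" means for grid_search_list, base R's expand.grid() builds one row per combination of candidate values. The candidate values below are invented for the example and are not the package defaults. How BirdFlowPipeline builds its grid internally is not shown here.

```r
# Illustrative candidate values only; these are NOT the package defaults.
grid_search_list <- list(
  de_ratio = c(2, 4, 8),
  obs_prop = c(0.95, 0.99),
  dist_pow = c(0.2, 0.4, 0.6)
)

# One row per combination of the candidate values
grid <- expand.grid(grid_search_list)
nrow(grid)  # 3 * 2 * 3 = 18 combinations
```

Each row of grid would correspond to one candidate model in a "new"-style search.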

hdf_path

The path to the base directory in which hdf5 preprocessed and fitted models are stored.

hdf_dir

The path to the directory in which this run's hdf5 preprocessed and fit models are stored. Set by preprocess_species_wrapper() to <hdf_path>/<species>_<res>km/.

base_output_path

The base path for fitted models, model reports, and other output.

output_fullname

Initially NULL; this is set by preprocess_species_wrapper() to "<species>_<res>km<suffix>".

output_path

Initially NULL; this is set by preprocess_species_wrapper() to "<base_output_dir>_<output_fullname>".
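The naming patterns given for hdf_dir and output_fullname can be sketched in base R as follows. The species code, resolution, suffix, and base directory are made-up values for illustration, and the code only mimics the documented patterns rather than reproducing what preprocess_species_wrapper() does internally.

```r
species <- "amewoo"   # illustrative species code
res <- 150            # illustrative resolution in km
suffix <- "_test"     # illustrative suffix
hdf_path <- file.path("path", "to", "hdf")  # hypothetical base directory

# <hdf_path>/<species>_<res>km
hdf_dir <- file.path(hdf_path, paste0(species, "_", res, "km"))

# <species>_<res>km<suffix>
output_fullname <- paste0(species, "_", res, "km", suffix)
```

Note that the suffix is appended directly, so any separator such as an underscore needs to be part of the suffix itself.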

season

This is used in two ways: (1) to filter movement data (possibly only tracks) before evaluating the model, and (2) if truncate_season is TRUE, it is passed to BirdFlowR::preprocess_species(), producing a model that is truncated to just that season.

truncate_season

If TRUE, the model will be truncated to season, and marginals for transitions outside of the season won't be fit or included.

model_selection

Set how model selection is performed within rank_models() as called from training+validation. It should be one of:

"str_etc"

Straightness and traverse correlation only.

"pit_etc"

PIT metrics and traverse correlation only.

"real_tracking"

Tracking-focused model selection. This includes traverse correlation, PIT scores, and straightness and n_stopovers targeted to observed values from real tracking data.

"real_tracking_no_cal"

Tracking-focused model selection. This includes traverse correlation, and straightness and n_stopovers targeted to observed values from real tracking data, but no PIT scores.

"averaged_parameters"

This is a placeholder for when the loss function parameters are fixed, typically to average values, in which case no model selection is performed.
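A minimal sketch of validating a model_selection value against the five options listed above. check_model_selection() is a hypothetical helper written for this example. The package may validate the argument differently, or not at all.

```r
# The five values documented for model_selection
model_selection_choices <- c(
  "str_etc", "pit_etc", "real_tracking",
  "real_tracking_no_cal", "averaged_parameters"
)

# Hypothetical helper: require exactly one of the documented values
check_model_selection <- function(x) {
  if (!is.character(x) || length(x) != 1 || !x %in% model_selection_choices) {
    stop("model_selection must be one of: ",
         paste(model_selection_choices, collapse = ", "))
  }
  invisible(x)
}

check_model_selection("real_tracking")  # passes silently
```

Exact matching is used here rather than match.arg() so that an abbreviation such as "real" is rejected instead of being partially matched.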

clip

This is passed to BirdFlowR::preprocess_species() to define a clipping polygon to use while preprocessing; only areas within the polygon are included in the model. If clip is set to "default", the default clip is used, which is currently americas_clip.

crs

Passed to BirdFlowR::preprocess_species() to define the coordinate reference system for the model. With the default of NULL, the default CRS is used, which is currently americas_crs.

skip_quality_checks

Passed to BirdFlowR::preprocess_species(). If FALSE, an error will be thrown if season quality values in ebirdst::ebirdst_runs are below three. If TRUE, the function will attempt to fit the model regardless of quality.

fit_only

Set to TRUE to fit a model without a grid search, model evaluation, model ranking, or model reports.

ebirdst_year

The version year of the ebirdst package. This shouldn't be set by users and any supplied value will be ignored.

trim_quantile

Passed to BirdFlowR::preprocess_species().

min_season_quality

The minimum acceptable quality score for any season. If the score of any season in the eBird Status and Trends product falls below this value, the model is not fit.
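The check described for min_season_quality can be sketched as below. The data frame is a hand-made stand-in for one row of ebirdst::ebirdst_runs with invented scores, the *_quality column names are assumed to match that table, and passes_quality() is a hypothetical helper rather than a function from BirdFlowPipeline.

```r
# Hand-made stand-in for one row of ebirdst::ebirdst_runs (invented scores)
run <- data.frame(
  species_code = "amewoo",
  breeding_quality = 3,
  nonbreeding_quality = 3,
  prebreeding_migration_quality = 2,
  postbreeding_migration_quality = 3
)

# Hypothetical helper: TRUE only if every season meets the minimum score
passes_quality <- function(run, min_season_quality) {
  quality_cols <- grep("_quality$", names(run), value = TRUE)
  all(unlist(run[1, quality_cols]) >= min_season_quality)
}

passes_quality(run, min_season_quality = 3)  # one season scores 2, so FALSE
passes_quality(run, min_season_quality = 2)  # all seasons are >= 2, so TRUE
```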

force_refit

Force refitting of the models. Defaults to FALSE.

Value

Invisibly returns the parameter list, including a newly added item, batch_tools_registry, which is the path to the batch tools registry for the job.
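A hedged sketch of a call, based only on the Usage block and argument descriptions on this page. The species values and res are illustrative, and it is assumed that batch_species() is exported from BirdFlowPipeline and that a suitable compute environment is already configured. The call is guarded so that nothing is launched unless run_fit is switched on.

```r
# Arguments mirror the Usage block; species and res are illustrative.
fit_args <- list(
  species = c("amewoo", "Wood Thrush"),
  grid_search_type = "old",
  dist_pow = 0.4167,
  dist_weight = 0.008177,
  ent_weight = 0.001924,
  show_progress = FALSE,  # return right after launching the jobs
  res = 150               # passed through ... to set_pipeline_params
)

# Safety switch: leave FALSE unless the jobs should really be launched
run_fit <- FALSE

if (run_fit && requireNamespace("BirdFlowPipeline", quietly = TRUE)) {
  # Assumes batch_species() is exported by the package
  params <- do.call(BirdFlowPipeline::batch_species, fit_args)
  # The result is returned invisibly; the registry path is one of its items
  print(params$batch_tools_registry)
}
```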