Fit many species with identical parameters
batch_species.Rd
Batch fit many species with fixed loss function parameters, without tuning, evaluation, or reports.
Usage
batch_species(
  species,
  grid_search_type = "old",
  de_ratio = NA,
  obs_prop = NA,
  dist_pow = 0.4167,
  dist_weight = 0.008177,
  ent_weight = 0.001924,
  show_progress = TRUE,
  ...
)
Arguments
- species
A vector of species codes or common names to fit. Must work with ebirdst::get_species().
- grid_search_type
Either "old" to specify all the Python hyperparameters (dist_pow, dist_weight, and ent_weight) directly; or "new" to use the alternative parameterization of dist_pow, obs_prop, and de_ratio, which are passed to the refactor_hyperparameters function to generate the Python parameters. In either case there is a hidden Python parameter obs_weight, which is always set to 1 when called from BirdFlowPipeline. "new" is the current default behavior because the ratio between distance weight and entropy weight heavily determines the behavior of the model. These hyperparameters are highly sensitive, and refactoring them in this way makes them easier to reason about and appears to improve the coverage and performance of the "grid" search.
- de_ratio, obs_prop, dist_pow, dist_weight, ent_weight
Loss function parameters; see their definitions in grid_search_list below, under the ... argument.
- show_progress
If TRUE, batch_species() will display a progress bar and wait for all the launched jobs to finish. If FALSE, batch_species() will exit immediately after launching.
- ...
Arguments passed on to set_pipeline_params
  - gpu_ram
    Set the GPU RAM allocation in GB when fitting the models. It is also passed to BirdFlowR::preprocess_species(), and if res is set to NULL that function will set res to the smallest round number that can be fit with the allocated RAM.
  - res
    The resolution of the model; specifically, the height and width of the model cells in km. Set to NULL to fit the finest resolution possible with the allocated gpu_ram.
  - suffix
    Characters to append to the output directory and model file names.
  - grid_search_list
    These parameters are used in the loss function by BirdFlowPy. They are also used in the grid search when running training+validation. All six possible parameters are in the list even though two of them won't be used; which two depends on grid_search_type. They are the set of values that will be combined factorially in the grid search. The default values represent our currently preferred approach. New and Old below indicate which setting of grid_search_type they are used with.
    - de_ratio
      New: The ratio of the distance weight to the entropy weight in the loss function that is optimized while fitting the models.
    - obs_prop
      New: The proportion of the total weight (among observation, distance, and entropy weights) that is assigned to the observation weight in the loss function.
    - dist_pow
      New and Old: The power used to transform movement distances prior to weighting them in the loss function.
    - dist_weight
      Old: The weight assigned to distances in the loss function.
    - ent_weight
      Old: The weight assigned to entropy in the loss function.
  - hdf_path
    The path to the base directory in which hdf5 preprocessed and fitted models are stored.
  - hdf_dir
    The path to the directory in which this run's hdf5 preprocessed and fitted models are stored. Set by preprocess_species_wrapper() to <hdf_path>/<species>_<res>km/.
  - base_output_path
    The base path for fitted models, model reports, and other output.
  - output_fullname
    Initially NULL; this is set by preprocess_species_wrapper() to "<species>_<res>km<suffix>".
  - output_path
    Initially NULL; this is set by preprocess_species_wrapper() to "<base_output_dir>_<output_fullname>".
  - season
    This is used in two ways: (1) to filter movement data (I think just tracks? - ebp) before evaluating the model, and (2) if truncate_season is TRUE it is passed to BirdFlowR::preprocess_species(), producing a model that is truncated to just that season.
  - truncate_season
    If TRUE, the model will be truncated to season, and marginals for transitions outside of the season won't be fit or included.
  - model_selection
    Set how model selection is performed within rank_models() as called from training+validation. It should be one of:
    - "str_etc"
      Straightness and traverse correlation only.
    - "pit_etc"
      PIT metrics and traverse correlation only.
    - "real_tracking"
      Tracking-focused model selection. This includes traverse correlation, PIT scores, and straightness and n_stopovers targeted to observed values from real tracking data.
    - "real_tracking_no_cal"
      Tracking-focused model selection. This includes traverse correlation, and straightness and n_stopovers targeted to observed values from real tracking data, but no PIT scores.
    - "averaged_parameters"
      A placeholder for when the loss function parameters are fixed, typically to average values, in which case no model selection is performed.
  - clip
    This is passed to BirdFlowR::preprocess_species() to define a clipping polygon to use while preprocessing; only areas within the polygon are included in the model. If clip is set to "default", the clip defined in the object the will be used, which is currently americas_clip.
  - crs
    Passed to BirdFlowR::preprocess_species() to define the coordinate reference system for the model. With the default of NULL, the CRS set in the object the is used, which is currently americas_crs.
  - skip_quality_checks
    Passed to
    BirdFlowR::preprocess_species(). If FALSE, an error will be thrown if season quality values in ebirdst::ebirdst_runs are below three. If TRUE, attempt to fit the model regardless of quality.
  - fit_only
    Set to TRUE to fit a model without a grid search, model evaluation, model ranking, or model reports.
  - ebirdst_year
    The version year of the ebirdst package. This shouldn't be set by users, and any supplied value will be ignored.
  - trim_quantile
    Passed to BirdFlowR::preprocess_species().
  - min_season_quality
    The minimum acceptable quality score for any season. If any season's score in the eBird Status and Trends product falls below this value, the model is not fit.
  - force_refit
    Force refitting the models. Defaults to FALSE.
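Examples
The following is a minimal sketch, not tested package code. It relies only on the definitions given under Arguments: obs_weight is fixed at 1, de_ratio is the ratio of dist_weight to ent_weight, and obs_prop is the observation weight's share of the total weight. The helper new_to_old_weights() is hypothetical and exists only to show how the "new" parameters imply "old" weights under those definitions; it is not the package's refactor_hyperparameters function, whose behavior may differ. The species code "amewoo" is an illustrative choice, and the batch_species() call is wrapped in if (FALSE) because it requires BirdFlowPipeline and launches model-fitting jobs.

```r
# Hypothetical helper (not part of BirdFlowPipeline): derive the "old"
# weights implied by the "new" parameters, assuming
#   obs_weight = 1,
#   de_ratio   = dist_weight / ent_weight,
#   obs_prop   = obs_weight / (obs_weight + dist_weight + ent_weight).
new_to_old_weights <- function(obs_prop, de_ratio, obs_weight = 1) {
  stopifnot(obs_prop > 0, obs_prop < 1, de_ratio > 0)
  # Weight left over for distance and entropy combined
  remainder <- obs_weight * (1 - obs_prop) / obs_prop
  # Split the remainder so that dist_weight / ent_weight equals de_ratio
  ent_weight <- remainder / (1 + de_ratio)
  dist_weight <- de_ratio * ent_weight
  list(dist_weight = dist_weight, ent_weight = ent_weight)
}

# Round trip starting from the default "old" weights shown under Usage
dist_weight <- 0.008177
ent_weight <- 0.001924
de_ratio <- dist_weight / ent_weight
obs_prop <- 1 / (1 + dist_weight + ent_weight)

w <- new_to_old_weights(obs_prop, de_ratio)
all.equal(w$dist_weight, dist_weight)
all.equal(w$ent_weight, ent_weight)

if (FALSE) {
  # Not run: requires BirdFlowPipeline and launches fitting jobs.
  library(BirdFlowPipeline)
  batch_species(
    species = c("amewoo"),       # illustrative species code
    grid_search_type = "old",    # pass the Python weights directly
    dist_pow = 0.4167,
    dist_weight = 0.008177,
    ent_weight = 0.001924,
    show_progress = TRUE         # wait for jobs and show a progress bar
  )
}
```

The round trip is meant only as a consistency check on the stated definitions: converting the default "old" weights to de_ratio and obs_prop and back should recover the same dist_weight and ent_weight up to floating-point tolerance.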