Changelog • BirdFlowPipeline

BirdFlowPipeline 0.0.0.9015

2026-04-21

Add optional credit section to footer for sources of tracking data
Move species reports from BirdFlowWork to BirdFlowPipeline invoke with make_new_report().
Add plot_grid_search_metrics() and plot_tuning to plot grid search metrics from the parameter tuning along with some helper functions and objects to set symbology.

BirdFlowPipeline 0.0.0.9014

2026-04-16

Restore batch_species() to package.
Add americas_crs and americas_clip data objects to package and use them as the default CRS and clip in the and set_pipeline_params().
Fix errors and warnings produced by packagedown on Github.

BirdFlowPipeline 2025-05 to 2026-01

Pipeline & orchestration - BatchBirdFlowTrainer / fit.BatchBirdFlowTrainer: builds grid of fits, auto-calculates GPU RAM, supports test_one_fit to run one local fit instead of submitting to Slurm. - batch_modelfit_wrapper & birdflow_modelfit_args_df: construct grid, skip/clean existing HDF5s, submit batchtools jobs, resubmit failures.

Model fitting & GPU/job handling - birdflow_modelfit: calls Python update_hdf.py to run a single fit. - GPU logic: small (<10GB) requests favor preferred GPUs; large requests clear constraints to use high-memory GPUs; resource fields (constraint.gpu/prefer.gpu) propagated.

Robust HDF5 handling - identify_hdf5_model & loaders: added try/catch, remove unreadable HDF5s, compact() previous-results, and tolerate broken files when evaluating. - batch_evaluator / evaluate.BatchBirdFlowEvaluator: test_one_evaluate flag, job-failure counting, reduceResultsList use.

Data loading & splits - TransitionsLoader (new S3), new_transitions_loader, load.TransitionsLoader, split.TransitionsLoader, train_test_split: centralize combining banding/Motus/tracking into transitions, produce training/test (including one-week) bundles.

Validation & API cleanup - validate_* functions renamed/strengthened (validate_BatchBirdFlowTrainer, validate_TransitionsLoader, validate_split_data, validate_BatchBirdFlowEvaluator). - NAMESPACE updated: removed deprecated exports (batch_flow, batch_species, multiple_species_batch).

Utilities & helpers - refactor_hyperparams: convert de_ratio/obs_prop → dist_weight/ent_weight. - identify_hdf5_model/birdflow_modelfit_args_df: numeric-signif fixes to avoid FP mismatch and remove extra hdf5s.

Docs, tests, vignettes - Many man pages, vignettes, and example Rmd/html added/updated; tests adjusted to renamed methods (load_data, split_data) and paths.

BirdFlowPipeline 0.0.0.9013

2025-05-23

Update rts_stats() it now works with BirdFlowRoutes objects as well as the data frame precursor to those objects.

Update functions and tests for ebirdst 2.2023.0. Tests that depend on the ebirdst package version are also now skipped if an older version is installed.

Clean up some other testing issues.

Resolve warnings and notes from package check. * Cross references missing package anchors * Functions not imported from Utils

BirdFlowPipeline 0.0.0.9012

2024-10-03

Add license to footer. Three variables optionally defined in the parent .Rmd affect the licence. * is_html controls whether the license is defined as html or markdown, html version is fancier. It will default to TRUE if not defined. * focal_species For species specific version of the license this should be the common name of the species. * doc_url The URL (in final location) of the current document.

BirdFlowPipeline 0.0.0.9011

2024-10-02

Add title argument to build_collection_index() and drop the default description from the document when a custom description is supplied. Default description is now in rmd/collection_description.Rmd.

BirdFlowPipeline 0.0.0.9010

2024-09-26

Add build_collection_index()

Move build_collection_index() from BirdFlowR to BirdFlowPipeline.
Add description argument to allow inserting collection specific markdown.
Switch to the standard footer: rmd/footer.Rmd

BirdFlowPipeline 0.0.0.9009

2024-09-25

Minor updates to model reports. This was in an attempt to adapt the model reports for tuned models to work for models fit with averaged hyperparameters. Ultimately these changes weren’t used but I cleaned up some documentation and added comments in the process so decided to keep the changes.

BirdFlowPipeline 0.0.0.9008

2024-08-15

Add show_progress argument to batch_species() and batch_flux() if FALSE the functions execute immediately after launching. This facilitates programmatically calling them in a loop to adjust computing resources to varied run times and memory requirements.
New inst/rmd/footer.Rmd has links to BirdFlow and eBird and the eBird disclaimer in a standard format. It is inserted in the end of
flux_index.Rmd, flux_report.Rmd, andmodel_report.Rmd. Most of these documents had the links, none previously had the disclaimer.

BirdFlowPipeline 0.0.0.9007

2024-08-14

Add handling of trim_quantile argument to BirdFlowR::preprocess_species() to the pipeline functions. The default of NULL (no trimming) is set in set_pipeline_parameters() but can be overridden by setting another value when calling batch_species() or batch_flow()

BirdFlowPipeline 0.0.0.9006

2024-06-12

Update how resources are requested on slurm to fit large GPU RAM models

Add #SBATCH --constraint vram[X] where [X] is various levels of GB GPU RAM, to slurm.tmpl. The vram[X] constraint is only included when running with GPUs and the level of [X] is automatically set to the lowest level that is higher than the memory requested for the job in modelfit_resources. For a list of the supported levels see the unity constraint list..
If resources$prefer.gpu or resources$constraint.gpu are set to NULL or NA than the corresponding lines are dropped in slurm.tmpl, these constraints were taking preference over the vram[X] constraints causing jobs to run on nodes that didn’t support the requested GPU RAM.
batch_species() now sets prefer.gpu and constraint.gpu to NULL. The other batch_ functions are not changed but in the future setting modelfitresources$ contraint.gpu and prefer.gpu to NULL would allow for fitting larger models with those functions, but lose the GPU preferences Dave had set.
batch_flux() now will request the cpu-preemt and cpu-long partitions if the walltime requested exceeds 24 hours. Previously you could request more time with walltime but if the job ended up on the cpu queue it would still timeout after 24 hours.

BirdFlowPipeline 0.0.0.9005

2024-06-07

Add prev and next buttons to flux model reports.

BirdFlowPipeline 0.0.0.9004

2024-05-31

Adapt to Unity updates. Changed the slurm template file sligtly to avoid triggering an erronious error “Request a CPU Partition for CPU jobs”.
Switch report logo source to github so it doesn’t depend on a local file - which was deleted.
Rollback change in batch_flow() so that hdf files are copied and then deleted. Moving doesn’t work across drives and casused an error in the unit tests.
Export`batch_flux()

BirdFlowPipeline 0.0.0.9003

Batch Flux

New functions * batch_flux() Launch flux on slurm in parallel via batchtools * process_flux() Wrap BirdFlowR::flux() and add importing the model from a file and writing the result to a file. * make_flux_index() Create an html index to a directory of flux reports. * make_flux_report() Create an html flux report.
* get_job_efficiency() Get the efficiency of a completed slurm job.

BirdFlowPipeline 0.0.0.9002

Note for existing old output to work with the changes below we’ll have to update file names:
* “ll_df.rds” to “eval_metrics.rds”

Add batch_species() and preprocess_and_fit() for fitting species to fixed hyperparameters.

Vectorize refactor_hyperparams() and add test for it.

Add set_pipeline_params() function to set parameters for batch_flow() and multi_species_batch(). This achieves two things: 1. It makes it easier to set just one parameter to a non-default value while leaving the rest at the default. 2. Adds documentation once in set_pipeline_parameters() that is inherited by batch_flow() and multispecies_batch().

Standardized and updated naming: * “ll_df” -> “eval_metrics” * “my_sp”“,”my_species”“, and”one_species” -> “species” * “my_res” -> “res” * “output_nickname” -> “suffix” * “mydir”, “my_dir” -> “dir” – in the context of calls to python

In params object. * break output_dir into two: * base_output_path the parent directory to output_dir * output_path the output directory where results should be written for the model currently being turned; set by preprocess_species_wrapper(). Previously output_dir was equivalent to base_output_dir until preprocess_species_wrapper() is called and output_dir after. * add output_full_name= NULL at start - final value is set by preprocess_species_wrapper()

Vectorize birdflow_modelfit_args_df() and add test to make sure end result is the same.

BirdFlowPipeline 0.0.0.9001

Feb 9, 2024

Add NEWS.md
Add schema outlining major package functions, how they call each other, and what files they write.
Add gpu_ram() to calculate the GB of GPU ram needed to fit a model