Guide to working with subgrids¶
Concept¶
The idea of this approach is that we split up the model grid into non-overlapping subgrids, and are then able to run each step of a full BEAST run for each grid individually. The calculations for the individual subgrids can then be run in parallel, for a speed boost without memory overhead, or sequentially, to use less memory. Of course, a combination of the two is also possible, splitting a grid into many subgrids, and running a couple of subgrid calculations at the same time.
At the end of the calculation, we will possess partial PDFs, statistics, and log likelihoods of each subgrid, which can be merged into a single 1D PDF file, a single stats file, and a single likelihood file. By taking into account the weights of each subgrid correctly, the resulting files will be equivalent to the result of a BEAST run on the full grid.
Workflow¶
Functions that implement subgrids can be found in beast.tools.subgridding_tools
.
To make use of this functionality, simply set n_subgrid
in beast_settings
to the desired number of subgrids. An example workflow script that utilizes the
subgridding tools can be found in
BEAST examples.
We will now give a summary of the steps in a subgrid run and how they differ from a non-subgridded run.
Physics model¶
First the spectral (stellar) grid is created, using make_iso_table
,
make_spectral_grid
and add_stellar_priors
. Then, the extinction
parameters are applied to this grid, and an extinguished SED grid is obtained,
using make_extinguished_sed_grid
.
The splitting of the grids has to happen somewhere in this function.
Technically, split_grid
can be either after obtaining the spectral grid with
prior weights, or after obtaining the complete SED grid. The former makes more
sense however, because then make_extinguished_sed_grid
can be run for
individual spectral subgrids, which avoids the memory impact of creating the
complete SED grid. This choice also allows the user to run the construction of
the grid in parallel.
Tip
The split_grid
function returns the file names of the newly created
subgrids. It is very useful to save these to a text file, so that they can be
used in the other steps.
AST input list¶
This is the only step where the complete SED grid is needed. The subgrids can be
merged into a single file using merge_grids
. Just provide an output name,
and a list of file names pointing to all the subgrids. The rest of the AST input
list generation needs no changes once the full grid file is available.
Observations/Noise models¶
Here we will create separate noise model files, one for each subgrid. Nothing
special happens here, e.g. just call make_toothpick_noise_model
for each
subgrid using the same AST results file, providing adequate output names for the
resulting noise models. It is safe to run this in parallel.
Trimming of the physics and noise models¶
The same as the above applies here. Just make sure that the subgrid/subnoisemodel files are paired correctly.
Fitting & merging the results¶
Compatibility¶
To make sure that the results of the fitting routine for the individual grids are compatible, there are several subtleties which come into play here. Firstly, it needs to be made sure that the 1D PDFs are compatible: their number of bins and the values for the bin centers need to be exactly the same. To ensure this, we need to fix three values for each quantity:
the minimum value
the maximum value
the number of unique values
This is why a new optional argument is provided in the main fitting function,
summary_table_memory
, which allows the user to override the min, max and
number of unique values for all of the quantities.
The option is called grid_info_dict
, and needs to be a nested dictionary of
a certain format. subgridding_tools
contains a function called
reduce_grid_info
which will generate this dictionary for you. Just provide
the filenames to all the (trimmed) subgrids and their (trimmed) noisemodels.
This dictionary has an entry like this for each quantity (Av
in this example):
grid_info_dict['Av'] = {'min': 0, 'max': 10, 'num_unique': 20}
Fit¶
When the info described above has been collected, you can start calling
summary_table_memory
for each of the subgrids, each time providing a trimmed
subgrid/trimmed subnoisemodel pair, and adequate filenames for the output. The
rest of the arguments can be identical the fit on each subgrid. However, be sure
to set do_not_normalize
to True
, see note below.
Merge¶
When all the subgrid fits have been successfully completed, the merge step can be
started. To do this, pass the 1D PDF and stats file names to
merge_pdf1d_stats
and the log likelihood file names to merge_lnp
.
Note
The main fitting function needed to be modified so that the Pmax
values
that it stores (which are the maximum log likelihood, needed to calculate the
Best
values) are compatible between subgrids. This meant getting rid of
some forms of normalization (specifically, the prior weight normalization
needed to be disabled). Setting do_not_normalize
should have no effect on
the result actually, so we might remove this option altogether and make it
the default behavior.
Note
To calculate the expectation values, another modification to the same function
has been done. It now stores a measure for the total weight of the subgrid,
total_log_norm
. This value is equal to log(sum(exp(lnp)))
, and is
calculated by taking the log of the normalization factor used in the code
(because sum(exp(lnp)) / normalization = 1
). By comparing this value
between subgrids, we are able to calculate a weighted average for each
expectation value, which should be close to the one that would be obtained
by fitting over the whole grid at once.