# Guide to working with subgrids

## Concept

The idea of this approach is to split the model grid into non-overlapping subgrids, so that each step of a full BEAST run can be performed for each subgrid individually. The calculations for the individual subgrids can then be run in parallel, for a speed-up without extra memory overhead, or sequentially, to reduce memory usage. A combination of the two is also possible: split the grid into many subgrids, and run a few subgrid calculations at the same time.
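
A minimal sketch of the two execution modes, using only the Python standard library. `process_subgrid` and `run_all` are hypothetical names: `process_subgrid` stands in for whatever BEAST step is being applied to one subgrid, and the executor class is a parameter so that either a process pool (the usual choice for CPU-bound work) or a thread pool can be plugged in.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def process_subgrid(subgrid_index):
    """Hypothetical stand-in for one BEAST step applied to one subgrid.

    In a real run script this would read the files belonging to
    `subgrid_index` and run e.g. the SED grid or fitting step on them.
    Here it just returns a toy value so the control flow can be checked.
    """
    return subgrid_index ** 2


def run_all(n_subgrids, n_workers=1, executor_cls=ProcessPoolExecutor):
    """Run process_subgrid for every subgrid.

    n_workers == 1: subgrids are processed one after the other, so only
    one subgrid is being worked on at a time (lowest memory use).
    n_workers > 1: up to n_workers subgrids are processed at the same time.
    """
    indices = range(n_subgrids)
    if n_workers == 1:
        return [process_subgrid(i) for i in indices]
    with executor_cls(max_workers=n_workers) as pool:
        # map() returns results in input order, so results[i] is subgrid i
        return list(pool.map(process_subgrid, indices))
```

With the default `ProcessPoolExecutor`, call `run_all` from under an `if __name__ == "__main__":` guard in the run script, as process pools require on platforms that start workers by spawning a new interpreter.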

At the end of the calculation, we have partial PDFs and statistics for each subgrid, which can be merged into a single 1D PDF file and a single stats file. If the weight of each subgrid is taken into account correctly, the resulting files should be equivalent to the result of a BEAST run on the full grid.
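
As a toy illustration of why the subgrid weights matter (not BEAST's own merge code, which is provided by `merge_pdf1d_stats`), the sketch below builds a 1D PDF as a histogram weighted by `exp(lnp)`, splits the grid into four subgrids, and recombines the normalized partial PDFs weighted by each subgrid's total weight. The function names and the random data are made up for the example, and it assumes every subgrid uses identical bins.

```python
import numpy as np


def partial_pdf1d(values, lnp, bin_edges):
    """Normalized 1D PDF of one (sub)grid, plus its total histogrammed weight."""
    hist, _ = np.histogram(values, bins=bin_edges, weights=np.exp(lnp))
    total = hist.sum()
    return hist / total, total


def merge_pdf1d(pdfs, totals):
    """Weight each normalized partial PDF by its subgrid's total weight."""
    totals = np.asarray(totals)
    merged = np.sum(totals[:, None] * np.asarray(pdfs), axis=0)
    return merged / merged.sum()


rng = np.random.default_rng(0)
values = rng.uniform(0.0, 10.0, 1000)    # toy quantity, e.g. Rv
lnp = rng.normal(-5.0, 1.0, 1000)        # toy log probabilities
bin_edges = np.linspace(0.0, 10.0, 21)   # identical bins for every subgrid

full_pdf, _ = partial_pdf1d(values, lnp, bin_edges)
parts = [partial_pdf1d(v, l, bin_edges)
         for v, l in zip(np.array_split(values, 4), np.array_split(lnp, 4))]
merged_pdf = merge_pdf1d(*zip(*parts))
print(np.allclose(full_pdf, merged_pdf))  # True
```

Averaging the normalized partial PDFs without the weights would in general not reproduce the full-grid PDF, because each subgrid would then count equally regardless of how much probability it holds.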

## Workflow

To use this functionality, no extra data or changes to the datamodel file are needed, but you will need a custom run script that uses a set of newly implemented functions, as well as new options for existing functions. Below we summarize what such a run script has to take care of at each step of a BEAST run. (An example script might be provided later.)

Most of the new functions can be found in `beast.tools.subgridding_tools`.

Please refer to the regular example code (`beast/examples/phat_small`) for the single-grid implementation of these steps.

### Physics model

First, the spectral (stellar) grid is created, using `make_iso_table`, `make_spectral_grid` and `add_stellar_priors`. Then, the extinction parameters are applied to this grid, and an extinguished SED grid is obtained using `make_extinguished_sed_grid`.

The splitting of the grid has to happen somewhere in this step. Technically, `split_grid` can be called either after obtaining the spectral grid with prior weights, or after obtaining the complete SED grid. The former is preferable, however, because `make_extinguished_sed_grid` can then be run on the individual spectral subgrids, which avoids the memory cost of creating the complete SED grid. This choice also allows the grid construction to be run in parallel.
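
The essential property of the split is that every model lands in exactly one subgrid. A minimal sketch with NumPy, assuming (hypothetically) a simple contiguous split of the grid rows; the strategy actually used by `split_grid` may differ.

```python
import numpy as np


def split_indices(n_rows, n_subgrids):
    """Split row indices 0..n_rows-1 into n_subgrids contiguous, non-overlapping chunks."""
    return np.array_split(np.arange(n_rows), n_subgrids)


# toy stand-in for a spectral grid with prior weights: one entry per model
prior_weight = np.linspace(0.1, 1.0, 10)

chunks = split_indices(len(prior_weight), 3)
subgrid_weights = [prior_weight[idx] for idx in chunks]

print([len(idx) for idx in chunks])  # [4, 3, 3]
# every row appears exactly once across the subgrids
print(np.array_equal(np.concatenate(chunks), np.arange(10)))  # True
```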

Tip

The `split_grid` function returns the file names of the newly created subgrids. It is very useful to save these to a text file, so that they can be reused in the other steps.
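
A minimal sketch of this bookkeeping with `pathlib`. The helper names are hypothetical, and `filenames` would be the list returned by `split_grid`.

```python
from pathlib import Path


def save_filenames(filenames, list_file):
    """Write one file name per line to list_file."""
    Path(list_file).write_text("\n".join(filenames) + "\n")


def load_filenames(list_file):
    """Read the file names back, ignoring blank lines and surrounding whitespace."""
    return [line.strip() for line in Path(list_file).read_text().splitlines()
            if line.strip()]
```

For example, the grid-building script could call `save_filenames(subgrid_names, "subgrids.txt")` and the later scripts `load_filenames("subgrids.txt")`; the name `subgrids.txt` is only an illustration.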

### AST input list

This is the only step where the complete SED grid is needed. The subgrids can be merged into a single file using `merge_grids`: just provide an output name and a list of file names pointing to all the subgrids. Once the full grid file is available, the rest of the AST input list generation needs no changes.
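
Conceptually, merging non-overlapping subgrids is a concatenation of their columns. The sketch below uses plain dictionaries of NumPy arrays as a hypothetical stand-in for the grid files; in practice `merge_grids` takes care of reading and writing the actual files.

```python
import numpy as np


def merge_columns(subgrids):
    """Concatenate the column arrays of several subgrids into one grid.

    Each subgrid is a dict {column name: 1D array}; all subgrids must
    have the same set of columns.
    """
    columns = subgrids[0].keys()
    for sg in subgrids[1:]:
        if sg.keys() != columns:
            raise ValueError("subgrids have different columns")
    # rows keep the order of the subgrid list
    return {c: np.concatenate([sg[c] for sg in subgrids]) for c in columns}
```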

### Observations/Noise models

Here we create a separate noise model file for each subgrid. Nothing special is required: simply call `make_toothpick_noise_model` for each subgrid, using the same AST results file each time and giving each resulting noise model its own output name. It is safe to run these calls in parallel.

### Trimming of the physics and noise models

The same applies here. Just make sure that each subgrid file is paired with the noise model file of the same subgrid.
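
One way to guard against mismatched pairs is to extract the subgrid index from each file name instead of relying on list order. This sketch assumes a hypothetical naming scheme in which every file name contains `sub<N>`; adapt the regular expression to the names you actually use.

```python
import os
import re


def pair_by_index(grid_files, noise_files, pattern=r"sub(\d+)"):
    """Pair grid and noise model files that share the same subgrid index.

    Assumes every file name contains the index, e.g. 'proj_sub3_trim.grid.hd5'.
    """
    def index_map(files):
        out = {}
        for f in files:
            m = re.search(pattern, os.path.basename(f))
            if m is None:
                raise ValueError(f"no subgrid index in {f!r}")
            idx = int(m.group(1))
            if idx in out:
                raise ValueError(f"duplicate subgrid index {idx}")
            out[idx] = f
        return out

    grids, noises = index_map(grid_files), index_map(noise_files)
    if grids.keys() != noises.keys():
        raise ValueError("grid and noise model files do not match up")
    # sort numerically so that sub10 comes after sub9, not after sub1
    return [(grids[i], noises[i]) for i in sorted(grids)]
```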

### Fitting & merging the results

#### Compatibility

To make sure that the fitting results for the individual subgrids are compatible, several subtleties come into play. First, the 1D PDFs need to be compatible: their number of bins and their bin centers need to be exactly the same. To ensure this, three values have to be fixed for each quantity:

1. the minimum value
2. the maximum value
3. the number of unique values
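
Why these three values suffice can be seen with a toy example. The sketch assumes, hypothetically, that bin centers are evenly spaced between the minimum and maximum with one center per unique value; the binning in the BEAST fitting code may be constructed differently, but the point is the same: bins derived from each subgrid's own values differ, while bins derived from shared values agree.

```python
import numpy as np


def bin_centers(vmin, vmax, n_unique):
    """Hypothetical binning: n_unique evenly spaced centers from vmin to vmax."""
    return np.linspace(vmin, vmax, n_unique)


sub_a = np.array([2.0, 3.0, 4.0])  # values of one quantity in subgrid A
sub_b = np.array([4.0, 5.0, 6.0])  # values of the same quantity in subgrid B

# bins derived from each subgrid alone do not match
local_a = bin_centers(sub_a.min(), sub_a.max(), len(np.unique(sub_a)))
local_b = bin_centers(sub_b.min(), sub_b.max(), len(np.unique(sub_b)))

# bins derived from the shared min, max and number of unique values do
all_vals = np.concatenate([sub_a, sub_b])
shared = (float(all_vals.min()), float(all_vals.max()), int(len(np.unique(all_vals))))
print(bin_centers(*shared))  # [2. 3. 4. 5. 6.]
```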

This is why a new optional argument has been added to the main fitting function, `summary_table_memory`, which allows the user to override the minimum, maximum and number of unique values for all of the quantities.

This option is called `grid_info_dict`, and needs to be a nested dictionary of a specific format. `subgridding_tools` contains a function called `reduce_grid_info` which generates this dictionary for you: just provide the file names of all the (trimmed) subgrids and their (trimmed) noise models.
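
A sketch of what such a reduction has to do, assuming (hypothetically) that the unique values of each quantity are already available per subgrid. This is not the code of `reduce_grid_info`; it illustrates that `num_unique` must be counted on the union of the subgrids' values, since the same value can occur in several subgrids.

```python
import numpy as np


def reduce_info(per_subgrid):
    """Combine per-subgrid unique values into a grid_info_dict-like structure.

    per_subgrid: list of dicts {quantity name: array of values in that subgrid}.
    Returns {quantity: {'min': ..., 'max': ..., 'num_unique': ...}}.
    """
    result = {}
    for q in per_subgrid[0]:
        # union of the values over all subgrids, duplicates removed
        union = np.unique(np.concatenate([d[q] for d in per_subgrid]))
        result[q] = {"min": float(union.min()),
                     "max": float(union.max()),
                     "num_unique": int(len(union))}
    return result
```

For example, two subgrids with Rv values `[2.0, 3.1]` and `[3.1, 5.0]` give `num_unique` 3, not 4.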

This dictionary has an entry like this for each quantity (Rv in this example):

```python
grid_info_dict['Rv'] = {'min': 0, 'max': 10, 'num_unique': 20}
```

#### Fit

Once the information described above has been collected, you can call `summary_table_memory` for each of the subgrids, each time providing a trimmed subgrid/trimmed noise model pair and suitable file names for the output. The remaining arguments can be identical for the fit of each subgrid. However, be sure to set `do_not_normalize` to True (see the note below).
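
The loop over subgrids might look like the sketch below. `fit_subgrid` is a hypothetical stand-in that only records its arguments; it does not reproduce the signature of `summary_table_memory`, and the output file naming scheme is also an assumption. The shared arguments are fixed once with `functools.partial`, so that only the input pair and the output names vary between subgrids.

```python
from functools import partial


def fit_subgrid(grid_file, noise_file, stats_file, pdf1d_file,
                grid_info_dict=None, do_not_normalize=False):
    """Hypothetical stand-in for one call to the fitting function.

    It only records its arguments so the driver loop can be checked.
    """
    return {"grid": grid_file, "noise": noise_file, "stats": stats_file,
            "pdf1d": pdf1d_file, "grid_info_dict": grid_info_dict,
            "do_not_normalize": do_not_normalize}


def run_fits(pairs, grid_info_dict, prefix="fit"):
    """Fit every (trimmed subgrid, trimmed noise model) pair with shared settings."""
    fit = partial(fit_subgrid, grid_info_dict=grid_info_dict,
                  do_not_normalize=True)
    results = []
    for i, (grid_file, noise_file) in enumerate(pairs):
        # only the inputs and the output names change between subgrids
        results.append(fit(grid_file, noise_file,
                           stats_file=f"{prefix}_sub{i}_stats.fits",
                           pdf1d_file=f"{prefix}_sub{i}_pdf1d.fits"))
    return results
```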

#### Merge

When all the subgrid fits have completed successfully, the merge step can be started. To do this, gather all the file names of the pdf1d and stats files, and pass them to `merge_pdf1d_stats`.
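
Since the merge should only start once every fit has finished, it can help to check first that all expected output files exist. A sketch with `pathlib`, assuming a hypothetical naming scheme `<prefix>_sub<N>_pdf1d.fits` / `<prefix>_sub<N>_stats.fits`; the two returned lists are what would be handed to `merge_pdf1d_stats`.

```python
from pathlib import Path


def collect_outputs(n_subgrids, prefix="fit"):
    """Return (pdf1d_files, stats_files) for all subgrids, or raise if any is missing."""
    pdf1d = [Path(f"{prefix}_sub{i}_pdf1d.fits") for i in range(n_subgrids)]
    stats = [Path(f"{prefix}_sub{i}_stats.fits") for i in range(n_subgrids)]
    missing = [str(p) for p in pdf1d + stats if not p.exists()]
    if missing:
        raise FileNotFoundError(f"fits not finished for: {missing}")
    return [str(p) for p in pdf1d], [str(p) for p in stats]
```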

Note

The main fitting function had to be modified so that the Pmax values it stores (the maximum log likelihood, needed to calculate the Best values) are comparable between subgrids. This meant removing some forms of normalization (specifically, the prior weight normalization had to be disabled). Setting `do_not_normalize` should not actually change the result, so this option might be removed altogether and made the default behavior.
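
A toy example of why per-subgrid normalization breaks the comparison of maxima. The numbers are made up, and normalizing each subgrid by its own `log(sum(exp(lnp)))` stands in for any per-subgrid constant offset, such as normalizing the prior weights of each subgrid separately.

```python
import numpy as np


def logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def best_subgrid(parts):
    """Index of the subgrid holding the largest lnp value."""
    return int(np.argmax([np.max(p) for p in parts]))


lnp_a = np.zeros(10)       # subgrid A: ten equally good models, max lnp = 0
lnp_b = np.array([-0.5])   # subgrid B: one slightly worse model, max lnp = -0.5

# on a common scale, the overall best model is correctly found in A
print(best_subgrid([lnp_a, lnp_b]))  # 0

# normalizing each subgrid separately shifts A by -log(10) and B by +0.5,
# so the maxima are no longer comparable and B wins
normalized = [p - logsumexp(p) for p in (lnp_a, lnp_b)]
print(best_subgrid(normalized))  # 1
```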

Note

To calculate the expectation values, the same function was modified further. It now also stores a measure of the total weight of the subgrid, `total_log_norm`. This value is equal to `log(sum(exp(lnp)))`, and is calculated by taking the log of the normalization factor used in the code (because `sum(exp(lnp)) / normalization = 1`). By comparing this value between subgrids, a weighted average can be calculated for each expectation value, which should be close to the one obtained by fitting the whole grid at once.
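
A toy check of this weighting scheme in plain NumPy (function names and data are hypothetical, not BEAST's implementation): each subgrid reports its normalized expectation value together with `total_log_norm = log(sum(exp(lnp)))`, and the merged value weights each subgrid by `exp(total_log_norm)`, computed relative to the largest value to avoid overflow. For a plain weighted mean, as here, the merged value equals the full-grid result up to floating-point error.

```python
import numpy as np


def logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def subgrid_summary(values, lnp):
    """Per-subgrid expectation value and total_log_norm = log(sum(exp(lnp)))."""
    total_log_norm = logsumexp(lnp)
    weights = np.exp(lnp - total_log_norm)  # sums to 1 within the subgrid
    return np.sum(weights * values), total_log_norm


def merge_expectations(expectations, total_log_norms):
    """Weight each subgrid's expectation by its share of the total probability."""
    t = np.asarray(total_log_norms)
    w = np.exp(t - t.max())  # subtract the maximum to avoid overflow
    return np.sum(w * np.asarray(expectations)) / np.sum(w)


rng = np.random.default_rng(2)
values = rng.uniform(2.0, 6.0, 900)   # toy quantity, e.g. Rv
lnp = rng.normal(-200.0, 3.0, 900)    # toy log probabilities

full_expectation, _ = subgrid_summary(values, lnp)
parts = [subgrid_summary(v, l)
         for v, l in zip(np.array_split(values, 3), np.array_split(lnp, 3))]
merged = merge_expectations(*zip(*parts))
print(np.isclose(full_expectation, merged))  # True
```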