############################## Guide to working with subgrids ############################## Concept ======= The idea of this approach is that we split up the model grid into non-overlapping subgrids, and are then able to run each step of a full BEAST run for each grid individually. The calculations for the individual subgrids can then be run in parallel, for a speed boost without memory overhead, or sequentially, to use less memory. Of course, a combination of the two is also possible, splitting a grid into many subgrids, and running a couple of subgrid calculations at the same time. At the end of the calculation, we will possess partial PDFs and statistics of each subgrid, which can be merged into a single 1dpdf file and a single stats file. By taking into account the weights of each subgrid correctly, the resulting file should be equivalent to the result of a BEAST run on the full grid. Workflow ======== To make use of this functionality, no extra data or changes to the datamodel file are needed, but you will need a custom run script that makes use of a set of newly implemented functions, and new options for existing functions. We will now give a summary of what such a run script has to pay attention to, for each of the steps in a BEAST run. (An example script might be provided later). Most the new functions can be found in ``beast.tools.subgridding_tools``. Please refer to the regular example code for the single grid implementation of these steps (``beast/examples/phat_small``). Physics model ------------- First the spectral (stellar) grid is created, using ``make_iso_table``, ``make_spectral_grid`` and ``add_stellar_priors``. Then, the extinction parameters are applied to this grid, and an extinguished SED grid is obtained, using ``make_extinguished_sed_grid``. The splitting of the grids has to happen somewhere in this function. Technically, ``split_grid`` can be either after obtaining the spectral grid with prior weights, or after obtaining the complete SED grid. The former makes more sense however, because then ``make_extinguished_sed_grid`` can be run for individual spectral subgrids, which avoids the memory impact of creating the complete SED grid. This choice also allows the user to run the construction of the grid in parallel. .. tip:: The ``split_grid`` function returns the file names of the newly created subgrids. It is very useful to save these to a text file, so that they can be used in the other steps. AST input list -------------- This is the only step where the complete SED grid is needed. The subgrids can be merged into a single file using ``merge_grids``. Just provide an output name, and a list of file names pointing to all the subgrids. The rest of the AST input list generation needs no changes once the full grid file is available. Observations/Noise models ------------------------- Here we will create separate noise model files, one for each subgrid. Nothing special happens here, e.g. just call ``make_toothpick_noise_model`` for each subgrid using the same AST results file, providing adequate output names for the resulting noise models. It is safe to run this in parallel. Trimming of the physics and noise models ---------------------------------------- The same as the above applies here. Just make sure that the subgrid/subnoisemodel files are paired correctly. Fitting & merging the results ----------------------------- Compatibility ~~~~~~~~~~~~~ To make sure that the results of the fitting routine for the individual grids are compatible, there are several subtleties which come into play here. Firstly, it needs to be made sure that the 1dpdfs are compatible: their number of bins and the values for the bin centers need to be exactly the same. To ensure this, we need to fix three values for each quantity: 1) the minimum value 2) the maximum value 3) the number of unique values This is why a new optional argument is provided in the main fitting function, ``summary_table_memory``, which allows the user to override the min, max and number of unique values for all of the quantities. The option is called ``grid_info_dict``, and needs to be a nested dictionary of a certain format. ``subgridding_tools`` contains a function called ``reduce_grid_info`` which will generate this dictionary for you. Just provide the filenames to all the (trimmed) subgrids and their (trimmed) noisemodels. This dictionary has an entry like this for each quantity (``Rv`` in this example): .. code:: python grid_info_dict['Rv'] = {'min': 0, 'max': 10, 'num_unique': 20} Fit ~~~ When the info described above has been collected, you can start calling ``summary_table_memory`` for each of the subgrids, each time providing a trimmed subgrid/trimmed subnoisemodel pair, and adequate filenames for the output. The rest of the arguments can be identical the fit on each subgrid. However, be sure to set ``do_not_normalize`` to ``True``, see note below. Merge ~~~~~ When all the subgrid fits have been successfully completed, the merge step can be started. To do this, just gather all the filenames for the pdf1d and stats files, and pass them to ``merge_pdf1d_stats``. .. note:: The main fitting function needed to be modified so that the `Pmax` values that it stores (which are the maximum log likelihood, needed to calculate the `Best` values) are compatible between subgrids. This meant getting rid of some forms of normalization (specifically, the prior weight normalization needed to be disabled). Setting ``do_not_normalize`` should have no effect on the result actually, so we might remove this option altogether and make it the default behavior. .. note:: To calculate the expectation values, another modification to the same function has been done. It now stores a measure for the total weight of the subgrid, `total_log_norm`. This value is equal to ``log(sum(exp(lnp)))``, and is calculated by taking the log of the normalization factor used in the code (because ``sum(exp(lnp)) / normalization = 1``). By comparing this value between subgrids, we are able to calculate a weighted average for each expectation value, which should be close to the one that would be obtained by fitting over the whole grid at once.