Pipeline Cookbook
The heuristics adopted in the pipeline appear to work well in the majority of cases, but there are invariably cases that are not adequately treated by the defaults and thus require custom processing. To aid in the processing of a FAUST target, the pipeline provides helper classes that abstract away the book-keeping aspects into individual tasks to be configured depending on the requirements of a particular field and SPW.
Note that because the pipeline is computationally intensive, even imaging a single SPW can take a day or more when CASA is run using single-threaded execution, and still a substantial amount of time when executed in parallel.
Running the default pipeline
The faust_imaging.ImageConfig class provides the primary interface to tclean within CASA and encapsulates properties specific to a field, SPW, array configuration, and desired tclean parameters. Please refer to the API Documentation and the docstring for additional information on the calling convention of this class.
To run all tasks of the pipeline with default parameters, first create an instance of the faust_imaging.ImageConfig class and call its faust_imaging.ImageConfig.run_pipeline() method.
config = ImageConfig.from_name('CB68', '244.936GHz_CS', weighting=0.5)
config.run_pipeline()
The full list of SPW labels may be found in the ALL_SPW_LABELS variable.
The above command will generate the default pipeline image products for the target field CB68, for the CS (5-4) line of Setup 2, using a Briggs robust factor of 0.5. The full calling convention is:
ImageConfig.from_name(
'<FIELD>', # field name, e.g., "CB68". See the global var `ALL_FIELDS`
'<LABEL>', # SPW label, e.g., "244.936GHz_CS"
weighting=0.5,
# ^ Use a number for Briggs robust or a string recognized by tclean,
# e.g., 'uniform' or 'natural'. The default is 0.5.
fullcube=True,
# ^ Image the full bandwidth of the SPW or a narrow 20km/s window
# centered on the primary line of interest. The default is True.
kind='joint',
# ^ What combination of array configurations to use. Possible values
# include ('joint', '12m', '7m'). The default is 'joint'.
)
Pipeline task description
The faust_imaging.ImageConfig.run_pipeline() method discussed above is primarily a wrapper that calls all of the pipeline tasks in sequence using the default parameters. For custom imaging, it is recommended that users create a recipe using the underlying tasks. As an example, this is the code that is executed by default:
config = ImageConfig.from_name(...)
ext = 'clean' # extension name for final cleaned products
config.make_dirty_cube()
config.clean_line_nomask(sigma=4.5)
config.make_seed_mask(sigma=5.0)
config.clean_line(mask_method='seed+multithresh', ext=ext)
config.postprocess(ext=ext, make_fits=True)
The following sections describe the function of each pipeline step in detail. The start of each sub-section includes a link to the API documentation where a detailed description of the parameters may be found.
make_dirty_cube: Creating the dirty cube
faust_imaging.ImageConfig.make_dirty_cube(). A call to tclean is first made with niter=0 to grid and transform the data. These data products have the default extension name of “dirty”, which may be changed by modifying the global variable DIRTY_EXT. The dirty image products are used for computing the RMS and the common frequency coverage between different array configurations.
clean_line_nomask: Initial un-masked cleaning
faust_imaging.ImageConfig.clean_line_nomask(). The “auto-multithresh” method for automated clean-mask generation in tclean determines noise characteristics per plane, along with other heuristics based on the maximum negative residual, which can make cleaning extended emission problematic where substantial filtering is present. For this reason, an initial deconvolution is performed without masking to a relatively shallow depth set by the parameter sigma, with a default value of 4.5 times the RMS. Files are generated using the default extension name “nomask”, which can be set using the global variable NOMSK_EXT. To avoid divergence in the unrestricted clean, large angular scales may be excluded from multiscale clean using the scale_upper_limit parameter.
make_seed_mask: Creating the threshold mask
faust_imaging.ImageConfig.make_seed_mask(). A threshold is applied to the restored image generated by clean_line_nomask to create an initial input mask to “seed” the mask created with auto-multithresh. This ensures that all channels, even those with strong spatial filtering, are masked. The threshold to apply may be set with the sigma parameter; the default is 5.0 times the RMS.
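Conceptually, the seed mask is a per-pixel threshold on the restored image. The following numpy sketch illustrates the operation only; the values are made up, and the real task operates on CASA images rather than arrays:

```python
import numpy as np

rms = 0.002   # Jy/beam; example value for the global RMS from the dirty cube
sigma = 5.0   # threshold in multiples of the RMS

# A fake 2x2 restored-image plane; the real mask spans the whole cube.
image = np.array([[0.001, 0.015],
                  [0.003, -0.002]])

# Pixels brighter than sigma * rms are included in the seed mask.
seed_mask = image > sigma * rms
```

Here only the 0.015 Jy/beam pixel exceeds the 0.01 Jy/beam threshold and enters the mask.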
clean_line: Second, masked cleaning
faust_imaging.ImageConfig.clean_line(). A second round of deconvolution is performed using the thresholded mask generated in the previous steps as a “seed” for auto-multithresh. This combination of the seed mask and auto-multithresh is the recommended masking method and is set as the default through the parameter mask_method="seed+multithresh". If mask_method="auto-multithresh" is used, then the prior two pipeline steps used to generate the “seed” mask are not required. The global clean threshold can be set with the sigma parameter.
The parameter ext sets the string appended to the filename. The standard convention is to use “clean” for these final products, but other names may be used when experimenting with different parameters to avoid over-writing existing runs.
Runs can be restarted and cleaned interactively using the restart and interactive keyword arguments. See Restarting for further detail.
postprocess: Image cube postprocessing
faust_imaging.ImageConfig.postprocess(). Lastly, a few minor post-processing steps are applied to the image products generated by clean_line. These include (1) checking whether the maximum value of the residual image exceeds 5.5-sigma, (2) correcting the restored image for the primary beam attenuation, (3) smoothing the image to a common-beam angular resolution, and (4) exporting the CASA image to a FITS image. The final FITS image names will be of the form:
# template form:
<PROD_DIR>/<FIELD>/<FIELD>_<LABEL>_<KIND>_<WEIGHT>_<EXT>.image.pbcor.common.fits
# example:
images/CB68/CB68_244.936GHz_CS_joint_0.5_clean.image.pbcor.common.fits
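The template can be illustrated with a short snippet that assembles the name from its components. This helper is hypothetical (the pipeline builds names internally); it only demonstrates how the fields map onto the filename:

```python
# Illustrative helper (not part of faust_imaging): assemble a final FITS
# filename from its components following the template shown above.
def fits_image_name(prod_dir, field, label, kind, weight, ext):
    stem = f"{field}_{label}_{kind}_{weight}_{ext}"
    return f"{prod_dir}/{field}/{stem}.image.pbcor.common.fits"

name = fits_image_name("images", "CB68", "244.936GHz_CS", "joint", 0.5, "clean")
# -> "images/CB68/CB68_244.936GHz_CS_joint_0.5_clean.image.pbcor.common.fits"
```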
As an optional final step, quality assurance plots can be generated. These plots are useful for assessing whether further deconvolution is required. Making these plots is described in QA Plots.
Quality assurance plots
Quality Assurance (QA) plots are useful for quickly obtaining an overview of whether the deconvolved products are satisfactory. The function faust_imaging.make_all_qa_plots() can be used to generate channel maps of all restored images and residual images for a field where the peak restored-image intensity exceeds 6-sigma. PDF and PNG files are written to the directory specified in PLOT_DIR (by default <PROD_DIR>/plots).
# to overwrite all existing plots, use the default overwrite=True
make_all_qa_plots('CB68', ext='clean')
# to skip plots that have already been made, set overwrite=False
make_all_qa_plots('CB68', ext='clean', overwrite=False)
To make an individual QA plot from an image path name, use faust_imaging.make_qa_plots_from_image():
# example filename for CB68 CS (5-4)
imagename = 'images/CB68/CB68_244.936GHz_CS_joint_0.5_clean.image'
make_qa_plots_from_image(imagename)
For developing custom plotting routines, the faust_imaging.CubeSet class may be of use.
Cleaning up and removing files
The pipeline produces a large number of intermediate image products. These products can be kept in order to restart unsatisfactory imaging results or removed if the final products are suitable.
config = ImageConfig.from_name('CB68', '244.936GHz_CS', weighting='natural',
kind='12m')
# The helper method ".remove_all_files()" will identify all files/images with
# names matching the configuration file stem, in this example:
# images/CB68/CB68_244.936GHz_CS_12m_natural*
# but not other runs imaged with joint 12/7m data or other uv-weightings.
print(config.get_imagebase())
config.remove_all_files()
Intermediate image products that are unnecessary for the functioning of the pipeline or for restarting imaging processes are removed during the run. If you wish to preserve these image products, such as, for example, the .model image from the un-masked clean run, then all intermediate products can be kept by setting the faust_imaging.ImageConfig.preserve_all_intermediate_products attribute:
config = ImageConfig.from_name(...)
config.preserve_all_intermediate_products = True
config.run_pipeline()
Imaging cut-out velocity windows
The default pipeline setting of fullcube=True will image the entire bandwidth of the SPW. These cubes can be rather large, typically more than 470 channels or 90 km/s. If only a particular line is of interest, then a cut-out in frequency may be imaged to reduce run-time cost and disk space requirements.
If the line to be imaged was the primary target of the SPW, then no changes need to be made, e.g., Setup 1 C18O J=2-1 near 219.56 GHz. The default velocity bandwidth is 20 km/s (+/- 10 km/s about the system velocity) but may be set with the attribute faust_imaging.ImageConfig.line_vwin.
config = ImageConfig.from_name(..., fullcube=False)
# Set the window for +/- 2.5 km/s (full-width of 5 km/s) centered on the
# system velocity.
config.line_vwin = '5km/s'
config.run_pipeline()
The primary transition targeted by the SPW can be determined by comparing the value of spw.mol_name (primary molecule) to spw.name (full SPW name with transitions from the correlator configuration).
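A quick sketch of this check, using a hypothetical stand-in for the two Spw attributes (real instances come from the ALL_SPWS dictionary, and the name string shown is illustrative, not the actual correlator label):

```python
from collections import namedtuple

# Hypothetical stand-in for the relevant Spw attributes.
Spw = namedtuple("Spw", ["mol_name", "name"])
spw = Spw(mol_name="NH2D",
          name="NH2D_322-321;CH3CHO_11110-1019")  # illustrative value

# If the transition of interest is not the primary molecule but is listed
# in the full SPW name, a custom Spw instance is needed (see below).
target = "CH3CHO"
needs_custom_spw = target != spw.mol_name and target in spw.name
print(needs_custom_spw)  # -> True
```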
Imaging cut-outs that were not the primary transitions targeted by an SPW requires creating new instances of the classes faust_imaging.Spw and faust_imaging.DataSet. An instance of ImageConfig must then be created directly. This can be particularly useful for the continuum windows, which are resource intensive to process with full bandwidth cubes.
# We wish to image the acetaldehyde CH3CHO 11(1,10)-10(1,9) transition
# also found in the Setup 1 SPW ID 27. The primary targeted line was
# deuterated ammonia NH2D 3(2,2)s-3(2,1)a. Create a copy of this window
# and change the molecule name (for files and paths) and the line
# rest frequency of the new transition.
spw = ALL_SPWS['216.563GHz_NH2D'].copy()
spw.mol_name = 'CH3CHO'
spw.restfreq = '216.58193040GHz' # from SLAIM
# Initialize the DataSet class, which contains information such as image
# and cell size, etc., with the same Setup as our SPW (Setup 1 in this
# example).
dset = DataSet('CB68', setup=spw.setup, kind='joint')
# Initialize the `ImageConfig` class directly from the instances and
# run the pipeline.
config = ImageConfig(dset, spw, fullcube=False)
config.line_vwin = '5km/s' # default 20 km/s
config.run_pipeline(ext='5kms_clean')
# The final image products will be named as:
# CB68_216.582GHz_CH3CHO_joint_0.5_5kms_clean.*
Modifying multi-scale parameters
To adjust the default parameters used by multi-scale CLEAN in the pipeline, modify the attributes listed in the example below:
config = ImageConfig.from_name('CB68', '244.936GHz_CS')
# The default scales, in pixels, are (0, 15, 45, 135) corresponding to
# approximately 0, 0.45, 1.35, 4.05 arcsec for a typical HPBW of 0.3as
# and an over-sampling factor of 10 (cell size of 0.03as). Here, we
# replace the 45 pixel scale with two new scales: 30 and 60 pixels.
config.scales = [0, 15, 30, 60, 135] # pix
# We also increase the small-scale bias parameter from the pipeline
# default of -1.0 to -0.5.
config.smallscalebias = -0.5
config.run_pipeline()
Restarting tclean
and manual masking
Some datasets can be difficult to clean satisfactorily with the default pipeline settings, particularly those with extended emission that suffers heavy spatial filtering (e.g., C18O, c-C3H2). Good results on these cubes may require manual intervention in defining the clean masks. After inspecting the results of the pipeline products, a run can be restarted using the same settings but with the restart and interactive keyword arguments of faust_imaging.ImageConfig.clean_line():
config = ImageConfig(...)
# The pipeline should already have been run previously and for this example
# there should exist images with names ending in the extension "clean".
# Now restart the deconvolution using the existing files and run it in
# interactive mode. It's likely a good idea to backup the `.model` and
# `.mask` files in the event the deconvolution fares poorly. Also, if you
# wish to clean more deeply, the `sigma` argument can be set to a lower
# value here.
config.clean_line(ext='clean', restart=True, interactive=True)
# Note that an alias is included for this use, identical in arguments and
# calling convention as above, named:
# config.clean_line_interactive_restart(ext='clean')
# Now, if the results are satisfactory, apply the post-processing steps
# to finish the pipeline.
config.postprocess(ext='clean')
The above commands will pull up the CASA interactive viewer for creating the clean masks manually. It is recommended to set the cycleniter in the upper left to a low value, perhaps 100 iterations, and to use the green “recycling” arrow to execute a few iterations. The un-cleaned emission of interest is often present in channels with extended emission near the edge of the primary beam mask. These channels have a high probability of diverging if left to finish the run using the blue “right arrow”. Once the extended emission appears to have been cleaned satisfactorily, finish the run by clicking the red “stop sign” button.
Restarting tclean can also be performed without using the interactive mode. One example usage may be cleaning to a shallow depth, inspecting the results or applying a few tweaks, and then cleaning more deeply.
config = ImageConfig(...)
# ... pipeline has been run up to `.clean_line`
# First, clean relatively shallowly to 3.5-sigma and let it run
# automatically.
config.clean_line(ext='clean', sigma=3.5)
# Inspect the resulting cubes to see if the results are satisfactory. The
# QA plots could be made for the target, for example.
# Now, continue the deconvolution more deeply, to a threshold of 2.0-sigma.
config.clean_line(ext='clean', restart=True, sigma=2.0)
config.postprocess(ext='clean')
Creating moment maps
Moment maps and other point estimators (e.g., maximum) may be generated from the data using the faust_imaging.make_all_moment_maps() function for all of the SPWs of a target or faust_imaging.make_moments_from_image() for a single SPW of a target.
# Create moment maps for all SPWs. The `vwin` parameter sets the velocity
# window half-width in km/s to calculate the moment over.
make_all_moment_maps('CB68', ext='clean', vwin=5)
# Generate a single set of moment maps
imagename = 'images/CB68/CB68_244.936GHz_CS_joint_0.5_clean.image'
make_moments_from_image(imagename, vwin=5)
The moments are calculated by masking pixels that are either not in the clean mask or do not meet a significance cut on a Hanning-smoothed cube. The moments are computed using the unsmoothed data.
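A conceptual moment-0 calculation can be sketched in pure numpy. The values below are synthetic and the combined mask is handed in directly; the pipeline itself derives it from the clean mask and the significance cut on the Hanning-smoothed cube:

```python
import numpy as np

dv = 0.2  # channel width in km/s (example value)

# Fake spectral cube with shape (nchan, ny, nx) and a boolean mask of the
# same shape representing the combined clean-mask/significance criterion.
cube = np.array([[[0.0, 1.0]],
                 [[0.5, 2.0]],
                 [[0.1, 1.5]]])
mask = np.array([[[False, True]],
                 [[False, True]],
                 [[False, True]]])

# Moment 0: integrate the *unsmoothed* data over velocity where the mask
# holds; masked-out pixels contribute zero.
mom0 = np.sum(np.where(mask, cube, 0.0), axis=0) * dv
# mom0[0, 1] is approximately 0.9 ((1.0 + 2.0 + 1.5) * 0.2); mom0[0, 0] is 0.
```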
Manually setting the RMS
By default, the global RMS used for deriving thresholds is computed from the full dirty cube. For small windows where >50% of the channels may contain significant emission, this global RMS value may not be appropriate. If the desired RMS value is known, the faust_imaging.ImageConfig.rms attribute may be set manually.
config = ImageConfig(...)
config.rms = 0.001 # in Jy
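If a reliable value is not known in advance, one option is to compute a robust estimate yourself from (mostly) line-free data, e.g., 1.4826 times the median absolute deviation, which is insensitive to a modest fraction of emission. This is a generic numpy sketch on synthetic values, not a pipeline function:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic data: Gaussian noise of sigma = 0.001 Jy/beam plus a small
# number of bright "emission" samples that would bias a plain std().
data = rng.normal(0.0, 0.001, size=100_000)
data[:100] += 0.05

# Robust RMS: 1.4826 * median absolute deviation about the median.
mad = np.median(np.abs(data - np.median(data)))
robust_rms = 1.4826 * mad
print(robust_rms)  # close to the true noise level of 0.001

# config.rms = robust_rms  # then assign it to the ImageConfig instance
```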
Frequency-chunked image processing
The memory requirements for imaging the full spectral windows with fullcube=True are demanding, requiring several hundred gigabytes of RAM. In principle, running tclean with the parameter chanchunks=-1 serially processes individual frequency ranges to conserve memory, but in practice problems persist. The most serious issues observed are that the final concatenation step in tclean can segfault and that copying the internal mask files using makemask also frequently fails for large image cubes.
To relieve memory requirements, the pipeline may be run on individual frequency intervals or “chunks”. By default the pipeline procedure faust_imaging.ImageConfig.run_pipeline() will chunk and concatenate the results:
config = ImageConfig(...) # or `.from_name(...)`
config.run_pipeline(ext='clean')
# or explicitly
config.run_pipeline(ext='clean', use_chunking=True, nchunks=4)
The number of chunks can be controlled with the nchunks parameter. If left unset, then the number of chunks is chosen heuristically. The chunked configs may also be created from a normal instance using faust_imaging.ImageConfig.duplicate_into_chunks() and treated individually for more customized processing. This creates an instance of faust_imaging.ChunkedConfigSet, which both encapsulates the chunked images and provides several helper methods.
# Initialize an image configuration instance with the desired properties.
full_config = ImageConfig(...)
# Create 4 chunks with properties inherited from the above, full instance.
# Note that if a ".sumwt" file does not exist, a dirty image will be
# made of a small field in order to calculate it first.
chunked_configs = full_config.duplicate_into_chunks(nchunks=4)
# Standard pipeline processing may now proceed on each chunked config.
for config in chunked_configs:
config.run_pipeline(ext='clean')
# Post-process each chunk individually but using information from
# all of the runs (such as for determining the common beam).
chunked_configs.postprocess(ext='clean')
By default, most image extensions (e.g., ‘.image’, etc.) are concatenated by faust_imaging.ChunkedConfigSet.postprocess(). Images may also be explicitly concatenated with faust_imaging.ChunkedConfigSet.concat_cubes().
The pipeline procedures may also be run in different CASA instances to process parts of the image in parallel. To do so, simply ensure that the same configuration options are applied in order to reproduce equivalent ImageConfig instances, as below:
# In CASA instance 1, process chunks 0 and 1. Note that Python syntax for
# slicing (i.e., ":2") is inclusive on the lower limit but exclusive on
# the upper limit.
full_config = ImageConfig(...)
chunked_configs = full_config.duplicate_into_chunks(nchunks=4)
first_half = chunked_configs[:2]
for config in first_half:
config.run_pipeline(ext='clean')
# In CASA instance 2, process chunks 2 and 3. Ensure that the same
# configuration options and modifications are also applied here as well!
full_config = ImageConfig(...)
chunked_configs = full_config.duplicate_into_chunks(nchunks=4)
second_half = chunked_configs[2:]
for config in second_half:
config.run_pipeline(ext='clean')
# Now, in any CASA instance after the above two have finished running,
# merge the image products.
full_config = ImageConfig(...)
chunked_configs = full_config.duplicate_into_chunks(nchunks=4)
# Post-process each chunk separately and then concatenate the results.
chunked_configs.postprocess(ext='clean')
For specific example recipes, please refer to the Parallel CASA section.
Imaging Setup 3 SPWs with small chunk sizes
Due to the large field-of-view, small beam size, and large number of channels, imaging the Setup 3 SPWs poses a formidable data processing challenge (typical image dimensions of 3500x3500x1000). Individual Setup 3 cubes can also be about 10 to 40 times larger in file size (~40 to 200 GB for a single cube), which can cause significant issues in CASA even on machines with 500 GB RAM.
To effectively process these Setup 3 SPWs, processing in small frequency interval “chunks” is required (see Chunking). Chunked images of only ~10 channels, however, may report biased image RMS values if emission is present over most channels (see Set Rms). Below is a recipe for creating a few dirty cubes, calculating their RMS values, and then using that RMS value for all chunks.
# First calculate the frequency intervals of the chunks. For N2H+ with ~950
# channels and 100 chunks, a typical chunked image has 9 channels.
full_config = ImageConfig.from_name('CB68', '93.181GHz_N2Hp')
chunked_configs = full_config.duplicate_into_chunks(nchunks=100)
# Create dirty cubes and check the RMS values at the low-end, middle, and
# high-end of the band. The little helper function below is only for
# brevity. Note that computing the RMS at the middle of the band is
# safe for N2H+ since the band center frequency is offset! This is not
# always the case.
def get_chunk_rms(config):
config.make_dirty_cube()
return config.rms
lo_rms = get_chunk_rms(chunked_configs[ 2])
md_rms = get_chunk_rms(chunked_configs[49])
hi_rms = get_chunk_rms(chunked_configs[97])
# Check the RMS values here to see that they are pretty similar, if within
# ~10%, then it is reasonable to simply set it for all chunks.
# Now remake the config and run all chunks. By setting the RMS of the
# prototype instance it is propagated to all of the duplicate instances.
full_config = ImageConfig.from_name('CB68', '93.181GHz_N2Hp')
full_config.rms = md_rms # or whatever
chunked_configs = full_config.duplicate_into_chunks(nchunks=100)
for config in chunked_configs:
config.run_pipeline()
chunked_configs.postprocess(ext='clean')
Ozone lines are present near several SPWs and raise the RMS values appreciably (>30%) close to the band edge (e.g., “231.221GHz_13CS” and “231.322GHz_N2Dp”). For these SPWs setting a uniform RMS is not appropriate. It is straightforward, however, to increase the RMS for certain chunks or even set the value for each chunk individually using an interpolation function.
# Set the top 15 chunks (85-99) to have double the RMS.
rms = 0.003
full_config = ImageConfig.from_name('CB68', '93.181GHz_N2Hp')
full_config.rms = rms
chunked_configs = full_config.duplicate_into_chunks(nchunks=100)
for config in chunked_configs:
if config.chunk.index > 84:
config.rms = 2 * rms
config.run_pipeline()
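The interpolation approach can be sketched with numpy.interp. The measured chunk indices and RMS values below are illustrative; in practice they would come from dirty cubes of a few representative chunks, as in the recipe above:

```python
import numpy as np

nchunks = 100
# RMS values measured from dirty cubes of a few representative chunks
# (illustrative numbers; rising toward the high band edge).
measured_chunks = [2, 49, 97]
measured_rms = [0.0030, 0.0031, 0.0052]  # Jy/beam

# Linearly interpolate an RMS value for every chunk index.
all_rms = np.interp(np.arange(nchunks), measured_chunks, measured_rms)

# Then, when looping over the chunked configs:
# for config in chunked_configs:
#     config.rms = all_rms[config.chunk.index]
#     config.run_pipeline()
print(all_rms[49])  # -> 0.0031
```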
Restarting and manually cleaning a single chunk
Absorption and strong spatial filtering occasionally yield unsatisfactory results using the default heuristics of the pipeline. To manually restart and clean problematic chunks without re-running the full pipeline, one can deconvolve and post-process chunks individually. For example, one can retrieve a specific image chunk by its channel number and then restart the deconvolution using interactive masking (as described in the Restarting section).
The post-processing can be run individually for a chunk that is re-imaged, without repeating the post-processing steps for all of the other chunks, by using the use_existing_except keyword parameter in faust_imaging.ChunkedConfigSet.postprocess(). The original concatenated products will then be replaced by the new ones containing the modified chunk(s).
# Create the configuration instances in the same way as they were set in the
# original run. For CB68 in CS (5-4), the default is 4 chunks:
full_config = ImageConfig.from_name('CB68', '244.936GHz_CS')
chunked_configs = full_config.duplicate_into_chunks()
# After reviewing the full, concatenated cubes of the pipeline run, it
# is found that channel index number 255 (i.e., indexed from 0, which is
# also the convention of the `casaviewer`) is insufficiently masked. For
# a cube with 477 channels in 4 chunks, this corresponds to chunk index 2.
# The chunk indices are also reflected in the image file names ("_chunk2_").
# Here we retrieve the configuration containing the desired channel index:
config_with_issues = chunked_configs.get_chunk_from_channel(255)
# Clean the line interactively, restarting using the model and mask on disk.
config_with_issues.clean_line(ext='clean', interactive=True, restart=True,
sigma=3)
# Remake the products for chunk2, but use the existing products from the
# other chunks. Concatenate all results together into new final cubes.
chunk_index = config_with_issues.chunk.index
chunked_configs.postprocess(ext='clean', use_existing_except=[chunk_index])
# Alternatively, just re-make everything.
#chunked_configs.postprocess(ext='clean')
The QA plots and final concatenated image products report channel indices for the full cube, and these differ from the channel indices of the chunks (which start again from 0). To inspect or retrieve the corresponding channel indices from the full cube, one can access the faust_imaging.ChunkConfig class instance faust_imaging.ImageConfig.chunk to convert between channels of the full cube and channels of the chunk.
# Continuing from the previous example but renaming for brevity's sake:
config = config_with_issues
# For reference, print the chunk index number:
print(config.chunk.index) # -> 2
# The channel index of interest is 255 in the full cube and corresponds
# to channel 17 in chunk 2. Note that these values are indexed from zero.
# One can calculate these values using the following methods:
print(config.chunk.convert_full_chan_to_chunk(255)) # -> 17
# Or do the reverse
print(config.chunk.convert_chunk_chan_to_full(17)) # -> 255
# A full list of which channels in the full cube the chunk channels
# correspond to can be found in:
print(config.chunk.fullcube_chan_indices) # -> [238, 239, ..., 356]
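The mapping itself is simple arithmetic over the chunk starting channels. The pure-Python sketch below assumes the starting channel of each chunk is known (the values match the 4-chunk, 477-channel example above; the real ones come from ChunkConfig):

```python
# Starting channel of each chunk in the full 477-channel cube
# (illustrative values consistent with the example above).
chunk_starts = [0, 119, 238, 357]

def full_to_chunk(full_chan, chunk_index):
    """Convert a full-cube channel index to a chunk-local index."""
    return full_chan - chunk_starts[chunk_index]

def chunk_to_full(chunk_chan, chunk_index):
    """Convert a chunk-local channel index back to the full cube."""
    return chunk_chan + chunk_starts[chunk_index]

print(full_to_chunk(255, 2))  # -> 17
print(chunk_to_full(17, 2))   # -> 255
```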
Parallelized computation with multiple CASA processes
By default the pipeline will chunk the images and process each chunk in serial. Because this proceeds quickly for chunks that are mostly noise and reduces the gridding overhead on chunks with emission, this alone reduces the run-time. Helper scripts/templates are provided however to improve the performance further by running multiple instances of CASA simultaneously to process the image chunks in parallel. Example scripts may be found in the “pipe_scripts” directory in the pipeline GitHub repository. The shell files “run_pipe.sh” and “qsub_run_pipe.sh” may be lightly modified for job name and resources requested. The CASA Python files are tailored to specific example usages, such as parallelizing over SPWs (one SPW per job) or parallelizing over the chunks of a single SPW (multiple jobs per SPW). Recipes are included for:
run_pipe_cb68_setup1_continuum.py
Chunk the Setup 1 continuum SPW into 40 chunks (the default) and process batches of chunks in parallel. This considerably improves run-time performance compared to processing each chunk sequentially.
run_pipe_cb68_cs.py
Chunk the CS (5-4) SPW into 50 chunks (the default is 4) and process the chunks in parallel. This is useful for quickly processing one window that is well-characterized by a single RMS (i.e., not those near telluric lines).
run_pipe_cb68_all_setup1_cont_parallel.py
Process all of the narrow-band SPWs serially by chunk but distribute the chunks of the continuum SPW among the CASA instances for parallel processing. This more efficiently balances the work-load among the CASA instances because the continuum SPW is substantially more computationally intensive to image. For 13 narrow-band SPWs and 8 CASA instances, each instance will process 1-2 narrow-band SPWs and ~5 chunks of the continuum SPW. This script is useful for processing a complete Setup in a straightforward way.
For correct usage with the given shell scripts, each Python script must implement the top-level functions _preprocess(), _run_subset(<INDEX>), and _postprocess(). If they are not required, the _preprocess and _postprocess functions can simply execute a pass statement as a no-op.
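A minimal skeleton satisfying this contract might look like the following. The bodies are placeholders, and the commented calls are only illustrative of what each step typically does:

```python
# Minimal skeleton of a "run_pipe.py" implementing the three required
# top-level functions. Adapt the bodies to the target being processed.

def _preprocess():
    # Work shared by all batches, run once before the parallel jobs;
    # a bare `pass` suffices if nothing is needed.
    pass

def _run_subset(index):
    # Process the subset of chunks (or SPWs) assigned to batch `index`,
    # e.g.:
    #     for config in chunked_configs[index::NBATCHES]:
    #         config.run_pipeline(ext='clean')
    pass

def _postprocess():
    # Work run once after all batches finish, e.g., concatenating the
    # chunked products and exporting FITS files.
    pass
```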
Personal machine
To run the pipeline in parallel on a user’s personal machine or an interactive
node on nmpost, first copy the “run_pipe.py” and “run_pipe.sh” files from the
GitHub repository and place them in the directory specified by PROD_DIR
(i.e., where you run CASA). The template files can be renamed to aid
organization, simply ensure that the name of the Python script (“run_pipe.py”)
matches what is referenced in the shell script (“run_pipe.sh”).
In the shell script, set the NBATCHES variable for the number of CASA processes/sessions used. From a limited number of experiments, dividing the number of available CPU cores by 2-3 provides reasonably good CPU utilization. The number of jobs set by this variable must not exceed the number of chunks or targets set in the Python script (each job must have at least one thing to process!).
Next, in the Python script, modify the template function _get_config for the desired field, transition, and number of chunks. Global configuration settings can also be set here by modifying the attributes of full_config (such as faust_imaging.ImageConfig.rms, for example, or the auto-multithresh parameters). Note that the _RUN_EXT global variable may be changed to distinguish different runs, but intermediate products such as the “nomask” and “dirty” products will be over-written. For science cases that require per-chunk configuration, program logic may be added to _run_subset.
Finally, execute the shell script (e.g., ./run_pipe.sh). Files for both the CASA log and the STDOUT terminal output will be created for each CASA process (one can monitor progress in real time with tail -f file.log).
If issues are discovered at certain frequencies/channels, such as divergences or insufficient masking, individual chunks may be restarted without re-running all of the others following the description in the Restarting One Chunk section.
Torque job submission
To run the pipeline in parallel using Torque on the NM or CV post-processing
computing clusters (“nmpost”), copy the “run_pipe.py” and “qsub_run_pipe.sh”
templates from the GitHub repository and place them in the PROD_DIR
directory. Change the global variables as described above. The number of CPUs
requested on the line:
#PBS -l nodes=1:ppn=16
should be more than the number of CASA instances/processes set by NBATCHES.
Applying a uv-taper
Source “IRAS_15398-3359” happens to have higher-than-requested angular resolution. To match the requested resolution (0.32 arcsec), we can apply a \(1100\, \mathrm{k}\lambda\) uv-taper:
config = ImageConfig.from_name('IRAS_15398-3359', '244.936GHz_CS')
# Use a 1100 kilo-lambda (wavelength) circular taper.
config.uvtaper = ['1100klambda']
config.run_pipeline()