v0.3.0

@github-actions github-actions released this 23 Jun 22:14
730893f

Major Updates

  • Ark-Analysis now uses Python 3.7.
  • Added FlowSOM implementation and pipeline.

🚀 Features

remove external manual clustering @camisowers (#615)

If you haven't already, please read through our contributing guidelines before opening your PR

What is the purpose of this PR?

Closes #574.
Removes the example_manually_adjust_metaclusters notebook and the metaclustering folder from ark-analysis/data/example_dataset.

Run expensive per-FOV pixel clustering processes in parallel @alex-l-kong (#596)

What is the purpose of this PR?

Closes #532. There are many processes for pixel clustering that run on a one-per-FOV basis that can be optimized with parallelization.

How did you implement your changes

We target four functions:

  1. create_pixel_matrix
  2. run_pixel_som
  3. pixel_consensus_cluster
  4. apply_pixel_meta_cluster_remapping

1 and 4 are done in Python and rely on the multiprocessing module to generate separate processes in parallel. 2 and 3 are done in R, which provides a much cleaner solution using parallel, doParallel, and foreach.

Added Release Drafter @srivarra (#598)

What is the purpose of this PR?

Closes #597.

Adds Release Drafter using GitHub Actions.

How did you implement your changes

Added:

  1. workflows/release-drafter.yaml: Uses GitHub Actions Workflows to build the release notes.
  2. release-drafter.yaml: Organizes the version resolver, and organizes labeled PRs into the following categories:
    1. Features
    2. Bug Fixes
    3. Maintenance
    4. Documentation

Remaining issues

We can also add more categories using labels and projects.

Remove intermediate `_clustered`/`_consensus` directories/files from pixel and cell clustering pipeline @alex-l-kong (#586)

What is the purpose of this PR?

Closes #571. There is no need to create a separate pixel_mat_clustered directory/cell_mat_clustered file as the data is just the channel expression/pixel cluster count expression values with an additional pixel_som_cluster/cell_som_cluster attached. Ditto for the pixel_mat_consensus directory/cell_mat_consensus file. These should be removed from the pixel and cell clustering processes.

How did you implement your changes

We overwrite the original data with the newly created data in cluster_pixels/pixel_consensus_cluster (and similarly for the cell clustering functions).

Remaining issues

There are a few extra checks we need to include for validation if the user decides to rerun the clustering pipeline in the same session.

Pix preprocessing @ngreenwald (#579)

What is the purpose of this PR?
Adds functionality to normalize each channel of image data separately prior to pixel clustering. This helps ensure that markers with different intensity values are treated equally from the beginning of the clustering process.

In addition, it changes from removing pixels that have 0 total counts to removing pixels in the bottom 5% of total counts from the image. This better matches the format of the data following rosetta, where there are very few true zeros.

Remaining issues
The testing I put together is very basic. @alex-l-kong, if you could go in and double check that everything is working as intended, and add more testing if needed, that would be great. Also feel free to change the organization/saving structure if you think it would be better some other way. I didn't add any new tests for create_pixel_matrix, so that will likely need to be checked as well.

Added `sub_dir` option for `save_fov_images` @srivarra (#580)

What is the purpose of this PR?

Closes #533.

Allows fov_image masks to be saved in a separate subfolder instead of at the top level.

How did you implement your changes

Added a new subdirectory parameter in save_fov_images. Added another test as well.

Remaining issues

None atm.

Custom channel smoothing @ngreenwald (#581)

What is the purpose of this PR?
Adds in functionality for different smoothing of individual channels in pixel clustering

How did you implement your changes
Provides the user a separate step to specify individual channels for increased smoothing, along with custom smoothing coefficients.

Also adds in an error check to create_pixel_matrix which identifies cases where the user provided the original channel name, but a modified channel name exists. This will be useful for other pre-processing steps we introduce which produce modified channels

Spatial lda @bcollica (#437)

What is the purpose of this PR?

Creates the directories and some files for the spatial-LDA pipeline and gives an initial example of code for preprocessing steps.

How did you implement your changes

Added a function for formatting a cell table to be compatible with the existing spatial_lda library. Also added a function for checking valid arguments along with a test.

Remaining issues

Fixing code format to be pep8 compliant so builds can pass.

Update FlowSOM pipeline with updated `MapDataToCodes` optional flag in `SOM` function @alex-l-kong (#570)

What is the purpose of this PR?

Closes #568. This addresses the issue caused by the current functionality of SOM in the FlowSOM library which causes MapDataToCodes to run by default.

How did you implement your changes

  1. We've created a fork of the main FlowSOM repo and added an optional flag map to the SOM function which allows us to prevent the call to MapDataToCodes. This parameter is also added to our SOM calls in create_pixel_som.R and create_cell_som.R.
  2. For the time being, we add an explicit command to overwrite the default FlowSOM installation with our forked FlowSOM package. This will be removed whenever the devs of the main FlowSOM repo review the PR I've opened addressing the map parameter.

Remaining issues

  1. It is possible there may be other memory-related issues we'll run into that will need further addressing.
  2. It is possible the main FlowSOM devs do not approve our PR. In that case, we'll have to continue piggybacking on the forked FlowSOM repo for the foreseeable future. That might be a good idea considering we may want to make more optimizations to the C code and whatnot.

Append cell meta cluster labels to cell table @alex-l-kong (#567)

What is the purpose of this PR?

Closes #551 and closes #523. The cell table is a requirement for several downstream analysis scripts, not just pixel clustering. To ensure analysis can be performed based on the cell meta cluster labels, these will need to be appended to the cell table as well.

How did you implement your changes

The add_consensus_labels_cell_table function runs this appending process; the call is placed at the end of example_cell_clustering.ipynb. The file at cell_table_name is overwritten with the same data plus the 'cell_meta_cluster_rename' column from cell_consensus_name (named simply 'cell_meta_cluster' in the cell table).

Users will be able to save multiple times and can also re-run the cell SOM training with the newly-saved cell table if desired.

Additionally, we now allow more flexible locations for the cell table. Initially, it was forced to be in base_dir. Now, we default the location to segmentation_dir, with options to change the root. Note that the updated cell table will be saved in the same root directory as the original cell table.

Delete references to Docker locally rebuilding in `README.md` @alex-l-kong (#560)

What is the purpose of this PR?

There shouldn't be any more references to Docker needing to rebuild locally now that we have Dockerhub integration.

How did you implement your changes

This necessitates removing the line indicating that Docker would rebuild when the -u or --update flag is passed to start_docker.sh.

Speed up generation of pixel cluster masks overlay @alex-l-kong (#546)

What is the purpose of this PR?

Closes #535. generate_pixel_cluster_masks in data_utils.py runs very slowly and needs to be optimized.

How did you implement your changes

The main issue is how the mask for each FOV is indexed. Currently, each FOV image is accessed in its original 2D format. Because of how 2D arrays are laid out in memory, this causes many cache misses, which add a significant bottleneck.

By flattening the array using .ravel and converting the coordinates so we can do 1D indexing, significant time is saved as 1D arrays are stored contiguously in memory. We get about a 3x speedup per FOV.
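
A minimal sketch of the flattened-indexing idea described above, assuming NumPy and a C-contiguous (row-major) mask. The function name `assign_by_flat_index` and the sample coordinates are hypothetical and are not taken from `data_utils.py`:

```python
import numpy as np


def assign_by_flat_index(shape, x_coords, y_coords, labels):
    """Write one label per (row, column) pair through a 1D view of the mask."""
    mask = np.zeros(shape, dtype=np.int64)
    flat = mask.ravel()  # a view, not a copy, because the array is C-contiguous
    # Row-major layout: each row spans shape[1] elements.
    flat[x_coords * shape[1] + y_coords] = labels
    return mask


x_coords = np.array([0, 1, 2])
y_coords = np.array([3, 0, 1])
labels = np.array([7, 8, 9])
mask = assign_by_flat_index((3, 4), x_coords, y_coords, labels)
```

Because `flat` is a view, writes through it land in the original 2D mask, so no reshaping step is needed afterwards.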

Remaining issues

If this implementation is still too slow, we should examine a different option. Unfortunately, the per-FOV iteration is a bottleneck we can't eliminate.

Remove extraneous output printed by `ConsensusClusterPlus` @alex-l-kong (#545)

What is the purpose of this PR?

Closes #524. The call to ConsensusClusterPlus currently prints out a lot of logging output that's confusing and takes up a lot of space. We don't need to display this to the user and should suppress it.

How did you implement your changes

These outputs are generated using the message function. To fix this, enclose each line containing ConsensusClusterPlus with the suppressMessages function to remove them from the printout.

This change is included in both pixel_consensus_cluster.R and cell_consensus_cluster.R.

Verify warnings @camisowers (#539)

What is the purpose of this PR?

Closes #539
Adds optional warnings to verify functions, as well as the appropriate tests.

How did you implement your changes

Edited verify_in_list and verify_same_elements to have new argument warn that defaults to False if left unspecified. This should not change the existing implementation of these functions in the rest of the repo.
Also fixed a logic error in the error message for list one in verify_same_elements.
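
As a rough illustration of a `warn` flag that defaults to False, the sketch below raises by default and warns on request. The function name `verify_in_allowed`, its signature, and its return value are assumptions made for this example, not the actual `verify_in_list` API:

```python
import warnings


def verify_in_allowed(values, allowed, warn=False):
    """Check that every value is in `allowed`; warn instead of raising if asked.

    Returns True when all values are valid, False when a warning was issued.
    """
    invalid = [v for v in values if v not in allowed]
    if not invalid:
        return True
    message = f"Invalid values found: {invalid}"
    if warn:
        warnings.warn(message)
        return False
    raise ValueError(message)
```

Keeping `warn=False` as the default means existing callers that rely on the ValueError see no change in behavior.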

Update pixel clustering pipeline @cliu72 (#530)

What is the purpose of this PR?

Update pixel clustering pipeline

How did you implement your changes

  • Set the random seed for subsetting globally (instead of for each fov)
  • Get 99.9% normalization values for each fov during preprocessing, average these values
  • Use these values to do 99.9% normalization before training and clustering
  • Read in fovs in batches during training to help with RAM issue
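
The averaging step in the list above could look roughly like the following. This is a sketch under stated assumptions: NumPy is used, each FOV is an array of shape (pixels, channels), and the function names and toy data are hypothetical rather than the pipeline's own:

```python
import numpy as np


def average_percentile_norm(fov_arrays, percentile=99.9):
    """Average the per-FOV, per-channel percentile values across all FOVs."""
    per_fov = [np.percentile(arr, percentile, axis=0) for arr in fov_arrays]
    return np.mean(per_fov, axis=0)


def normalize(arr, norm_vals):
    """Divide each channel (column) by its averaged normalization value."""
    return arr / norm_vals


# Toy data: two FOVs, 1000 pixels each, 2 channels.
fov_a = np.column_stack([np.linspace(0, 1, 1000), np.linspace(0, 10, 1000)])
fov_b = np.column_stack([np.linspace(0, 3, 1000), np.linspace(0, 30, 1000)])
norm_vals = average_percentile_norm([fov_a, fov_b])
normalized_a = normalize(fov_a, norm_vals)
```

Using one averaged value per channel keeps the scaling identical across FOVs, which is what allows the normalized data to be pooled for training.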

Remaining issues

Need to change test functions

Don't check contrast for skimage save function. @srivarra (#513)
  • add an ignore_warnings function for imageio.core.util
  • added reference

What is the purpose of this PR?

Closes #511. Removes the low contrast warnings for images.

How did you implement your changes

Manually added check_contrast=False to all instances of skimage.io.imsave in ark.

In addition, fixed a small typo and added a line to the .gitignore to ignore .python-version files (pyenv).

Reference

Remaining issues

None at the moment.

Add SOM implementation @alex-l-kong (#374)

What is the purpose of this PR?

Had to reopen this PR because the previous one's commit history got massively messed up.

Addresses and closes #325. Candace's pixel clustering algorithm needs to be integrated with ark-analysis now. This PR begins to add this functionality in, starting with a basic SOM implementation. We need to verify that this PR produces the same or at least very similar results.

How did you implement your changes

We piggyback off of MiniSOM (https://github.com/JustGlowing/minisom), a Python-based SOM implementation that should work for our purposes. It shouldn't be hard to translate most of the code into our project, but optimizing the process may be difficult.

Once we've verified our outputs, we then need to implement downsampling. A SOM would take a very long time to train on hundreds of thousands of pixel data from various images. We don't need all of these pixel entries.

We also need to implement

Remaining issues

Preprocessing takes a very long time. Unfortunately, it seems like apply_along_axis doesn't offer much of a speedup. Alternatives would be greatly welcomed!

Added option to ignore hidden files and directories @srivarra (#510)

What is the purpose of this PR?

Closes #506. The purpose of this PR is to add an option for the user to ignore hidden directories in io_utils::list_folders and hidden files in io_utils::list_files.

How did you implement your changes

When ignore_hidden == True, the functions ignore all files and folders whose names start with a period (.). Defaults to True.
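
A small sketch of this behavior using only the standard library. The helper `list_visible_files` is a hypothetical stand-in, not the actual `io_utils.list_files` signature:

```python
import os
import tempfile


def list_visible_files(dir_name, ignore_hidden=True):
    """List file names in dir_name, optionally skipping names starting with '.'."""
    names = [
        name for name in os.listdir(dir_name)
        if os.path.isfile(os.path.join(dir_name, name))
    ]
    if ignore_hidden:
        names = [name for name in names if not name.startswith(".")]
    return sorted(names)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("fov1.tiff", ".DS_Store"):
        open(os.path.join(tmp, name), "w").close()
    visible = list_visible_files(tmp)
    everything = list_visible_files(tmp, ignore_hidden=False)
```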

Remaining issues

None.

Improved ValueError statements for `verify_in_list` and `verify_same_elements` @srivarra (#509)

What is the purpose of this PR?

Closes #494. The purpose of this PR is to improve the user readability of error messages in misc_utils::verify_in_list and misc_utils::verify_same_elements.

How did you implement your changes

Adjusted misc_utils::verify_in_list and misc_utils::verify_same_elements by allowing up to 10 invalid values to be displayed in the ValueError print statement. In addition created a new function misc_utils::create_invalid_data_str which returns a formatted string for printing these invalid values. There is also a test for this function.
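
One way to cap the number of displayed values is sketched below. The name `format_invalid_values` and the exact output format are assumptions for illustration; the real `create_invalid_data_str` may format its string differently:

```python
def format_invalid_values(invalid_values, limit=10):
    """Return a short, printable summary showing at most `limit` invalid values."""
    shown = [str(v) for v in invalid_values[:limit]]
    summary = ", ".join(shown)
    hidden = len(invalid_values) - len(shown)
    if hidden > 0:
        summary += f" (and {hidden} more)"
    return summary


message = format_invalid_values(list(range(12)))
```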

Remaining issues

None.

Removed mibi directory @srivarra (#508)
  • deleted the mibi folder
  • adjusted templates_qc notebooks

What is the purpose of this PR?

Closes angelolab/toffy#20. It removes the mibi directory as it is now in toffy.

How did you implement your changes

Removed the ark/mibi and templates_qc folders along with their contents.

Remaining issues

None.

🐛 Bug Fixes

Allow segmentation pipeline to handle blank images @ngreenwald (#607)

What is the purpose of this PR?
Updates the segmentation pipeline to work with images without any cells.

How did you implement your changes
Checks the number of unique objects in the segmentation mask. If there aren't any cells, skips the regionprops extraction and just returns an empty df. Also adds tests to prevent future regressions.

Change default zip size @ngreenwald (#601)

What is the purpose of this PR?
Closes #587. Changes the default zip file size so that large cohorts don't cause an error.

How did you implement your changes
Changed default from 100 to 10

Singleton list of dtype fix @srivarra (#603)

What is the purpose of this PR?

Closes angelolab/toffy#120. Fixes a dtype iteration issue.

How did you implement your changes

Added a check to allow str to become an iterable as [str]. Added several tests for the following objects: list, np.array, and the following types str, bool, int.
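
The check can be pictured with the following sketch. The helper name `as_list` is hypothetical, and the handling of non-string scalars is an assumption based on the types listed in the tests above:

```python
def as_list(value):
    """Wrap a string or non-iterable scalar in a list; pass other iterables through."""
    if isinstance(value, str):
        # A str is iterable, but iterating it yields characters, not names.
        return [value]
    try:
        iter(value)
    except TypeError:
        return [value]
    return value
```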

Remaining issues

None atm.

Remove `channel_norm.feather` and `pixel_norm.feather` from `cluster_pixels` verification @alex-l-kong (#595)

What is the purpose of this PR?

Because the channel_norm and pixel_norm files were changed from .json to .feather format in pixel_mat_dir, they now get read in along with the fov files when retrieving them for verification in this line:

data_files = io_utils.list_files(data_path, substrs='.feather')

In some cases, this can cause channel_norm.feather to be read in as a FOV file by mistake. This needs to be prevented.

How did you implement your changes

Add two .remove statements after the aforementioned line to prevent channel_norm.feather and pixel_norm.feather from interfering. Because these are hard-coded names set in create_pixel_matrix, the naming will not cause any issues.
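
For illustration, the same effect can be sketched with a filter instead of two `.remove` calls. Note the swap: unlike `.remove`, a filter does not raise if a name is absent. The helper name and the sample file list are hypothetical:

```python
def fov_feather_files(file_names):
    """Drop the normalization files so only per-FOV .feather files remain."""
    non_fov = {"channel_norm.feather", "pixel_norm.feather"}
    return [name for name in file_names if name not in non_fov]


data_files = [
    "fov0.feather",
    "channel_norm.feather",
    "fov1.feather",
    "pixel_norm.feather",
]
fov_files = fov_feather_files(data_files)
```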

Adds np array support to verify in list @ackagel (#584)

Addresses issue #583 . Also adds test case for array compliance. Closes #583, and closes #585.

`remove_file_extensions` is more robust to various fov naming schemes @srivarra (#577)

What is the purpose of this PR?

Closes #537.

Allows more complex fov name parsing in remove_file_extensions.
Removes the extensions correctly for files named such as:

"fov.1" -> "fov.1"
"fov.2.tiff" -> "fov.2"
"fov3" -> "fov3"
"fov4.tar.gz" -> "fov4.tar"

How did you implement your changes

Added explicit checks for the following file extensions:

"tiff", "tif", "png", "jpg", "jpeg", "tar", "gz", "csv", "feather".

Added an extra test for more complicated fov filenames.
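
A minimal sketch that reproduces the four examples listed above, assuming that only the final extension is stripped and only when it appears in the explicit list. The function name `strip_known_extension` is hypothetical and this is not the actual `remove_file_extensions` implementation:

```python
import os

KNOWN_EXTENSIONS = {"tiff", "tif", "png", "jpg", "jpeg", "tar", "gz", "csv", "feather"}


def strip_known_extension(file_name):
    """Remove only the final extension, and only if it is a known one."""
    stem, ext = os.path.splitext(file_name)
    if ext[1:].lower() in KNOWN_EXTENSIONS:
        return stem
    return file_name


names = ["fov.1", "fov.2.tiff", "fov3", "fov4.tar.gz"]
stripped = [strip_known_extension(name) for name in names]
```

Checking against an explicit list is what keeps a name such as "fov.1" intact, since ".1" is not treated as a file extension.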

Remaining issues

Should we account for more file extensions?

Address metacluster GUI height and width issues @alex-l-kong (#565)

What is the purpose of this PR?

Closes #536 and closes #550. There are issues with the current hard-coded width and height settings for the visualization that cause the channel names to overlap and/or the metacluster heatmap to become small and illegible. For now, we hard code new values. Depending on how we move forward, we may also add a programmatic way to compute these.

How did you implement your changes

The issue lies with:

width_ratios = [
    int(self.mcd.cluster_count / 7),
    self.mcd.cluster_count,
    self.mcd.metacluster_count,
]
height_ratios = [6, self.mcd.marker_count, 3, 3]
subplots = plt.subplots(
    4, 3,
    gridspec_kw={
    'width_ratios': width_ratios,
    'height_ratios': height_ratios},
    figsize=(self.width, 6),
)

We double the width of self.mcd.metacluster_count, double the height of self.mcd.marker_count, and double the height of the entire figure in the figsize param of plt.subplots.

Check subprocess call to R scripts for pixel/cell clustering process for memory errors @alex-l-kong (#547)

What is the purpose of this PR?

Related to #518. @ngreenwald has been running into an I/O error in cluster_pixels, which could be masking a memory-related error in prior function calls.

The way subprocess currently makes calls to the R scripts responsible for running the FlowSOM pipeline does not provide a way to check the return code. This led to several processes being killed due to memory-related issues without the user knowing of them. There needs to be a way for users to know if this has happened so they can make adjustments without strange errors being thrown later on.

How did you implement your changes

Because we're running the R script directly from the command line, we need to check the return code of the process. If the process is killed due to memory errors, the return code will be -9: SIGKILL is signal 9, and subprocess reports termination by a signal as the negated signal number.

For simplicity's sake, we assume that any non-zero return code indicates a memory-related error. We add a conditional that checks for a non-zero return value and raises a MemoryError if one is found.
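
A sketch of the return-code check, using the Python interpreter as a stand-in for the R script call. The helper name `run_checked` and the error message are assumptions, not the code in the clustering utilities:

```python
import subprocess
import sys


def run_checked(command):
    """Run a command and raise MemoryError on any non-zero return code."""
    process = subprocess.run(command)
    if process.returncode != 0:
        raise MemoryError(
            f"Process exited with return code {process.returncode}; "
            "it may have been killed for using too much memory."
        )
    return process.returncode


# Stand-in command: the Python interpreter replaces the real Rscript call.
ok_code = run_checked([sys.executable, "-c", "raise SystemExit(0)"])
```

Treating every non-zero code as a memory error is a simplification, so including the actual return code in the message helps distinguish other failures.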

Remaining issues

There may still be underlying issues with Noah's particular dataset. However, this PR should at least clarify whether there is a memory-related issue first.

raise error if wrong channel is specified: `load_imgs_from_tree` @srivarra (#543)

What is the purpose of this PR?

Closes #517. Provided channels that do not exist are silently skipped. In addition, if a single fov name is passed as a string instead of a list, the string itself is iterated over.

How did you implement your changes

Added a check to see if fovs is a string; if so, it is wrapped in a list of fovs.
To counter the silent skipping, used misc_utils.verify_same_elements to compare the input channels against the channels actually found.

Remaining issues

None atm.

Put call to `update_notebooks.sh` back into `start_docker.sh` @alex-l-kong (#555)

What is the purpose of this PR?

When start_docker.sh was modified to handle multiple flag cases, the control statement to run update_notebooks.sh was removed. This needs to be added back in.

How did you implement your changes

See above.

Revert index back to 1 when getting shape of `pixel_cluster_mask` @alex-l-kong (#557)

What is the purpose of this PR?

Closes #556. There was an issue merging the indexing PR into master which reverted:

coordinates = x_coords * img_data.shape[1] + y_coords

to

coordinates = x_coords * img_data.shape[0] + y_coords

We should revert this back.
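
To see why the multiplier must be `shape[1]`, here is a pure-Python sketch with a non-square toy image. The helper `flat_index` and the toy values are hypothetical:

```python
def flat_index(row, col, shape):
    """Row-major offset of (row, col) in an array with shape (rows, cols)."""
    return row * shape[1] + col


shape = (2, 5)  # 2 rows, 5 columns: a non-square image exposes the bug
image = [[r * 10 + c for c in range(shape[1])] for r in range(shape[0])]
flat = [value for row in image for value in row]

correct = flat[flat_index(1, 3, shape)]
wrong_offset = 1 * shape[0] + 3  # multiplying by the row count instead
```

For square images both formulas give the same offset, which is how this kind of mistake can go unnoticed in testing.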

How did you implement your changes

See above.

Ensure interactive visualization can be run iteratively @alex-l-kong (#541)

What is the purpose of this PR?

Closes #520. Users may want to run the interactive visualization in the same session multiple times if they determine the original meta cluster remapping needs changes. Currently, there is no way to do this.

  1. If the user doesn't change the names of the meta clusters (or has a purely numeric renaming scheme), the overlay will pull extra columns and doesn't execute all the components correctly
  2. If the user changes the names of the meta clusters to string values, the visualization will throw an error midway due to currently assuming all metacluster values are integers

How did you implement your changes

Flexibility has been added to account for the additional rename problem which solves both issues 1 and 2 listed above:

  1. metacluster_from_files in file_reader.py now handles the renaming of the {pixel/cell}_meta_cluster_rename column to metacluster_rename. This provides a backbone that we can work off of in metaclusterdata.py.
  2. _metacluster_displaynames_map in MetaClusterData gets prepopulated to ensure that the axes labels contain the renamed meta clusters where applicable.
  3. Ensure the clusters_from_metaclusters property in MetaClusterData also takes into account the metacluster_rename column when reindexing with _marker_order. Previously, it was assumed that there would just be the metacluster column.
  4. Ensure the clusters property in MetaClusterData also drops the metacluster_rename column if necessary to prevent clashes (for example, with computing linkage_matrix).

Remaining issues

The visualization has not been tested on a variety of datasets as I have only 1 on hand.

Enforce stricter indexing of colormap used for pixel and cell cluster overlays @alex-l-kong (#542)

What is the purpose of this PR?

NOTE: this PR is high priority and should be done by Thursday at latest

The current scheme of defining the color bar for the pixel and cell cluster overlays (plot_pixel_cell_cluster_overlay) is not robust to heavy remapping which scrambles the indices of the colors significantly. In certain situations, it will cause the color map indices to become misaligned, mapping colors to wrong meta cluster names.

This should not happen even if the user does significant remapping and renaming.

How did you implement your changes

Fixing this requires explicitly setting an index value for each metacluster that corresponds to their position in mc_colors, which defines the colors to use. We can use the index defined in metacluster_id_to_name as a baseline.
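
The idea of an explicit per-metacluster index can be sketched as follows. This is only an illustration: the data structures are assumed to be a plain dict and list, the ordering by sorted ID is an assumption, and `build_color_lookup` is a hypothetical helper rather than the code in `plot_pixel_cell_cluster_overlay`:

```python
def build_color_lookup(metacluster_id_to_name, mc_colors):
    """Map each metacluster name to a color through an explicit per-ID index."""
    # Fix one position per metacluster ID, independent of any later renaming.
    id_to_index = {
        mc_id: i for i, mc_id in enumerate(sorted(metacluster_id_to_name))
    }
    return {
        metacluster_id_to_name[mc_id]: mc_colors[index]
        for mc_id, index in id_to_index.items()
    }


# Heavily remapped IDs with renamed metaclusters (toy values).
id_to_name = {7: "tumor", 2: "immune", 11: "stroma"}
colors = ["red", "green", "blue"]
lookup = build_color_lookup(id_to_name, colors)
```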

Remaining issues

This issue has only been significantly tested on Marte's dataset. Please post an issue if a different indexing issue pops up for a different dataset.

Fixing Docker Startup Directory @srivarra (#529)
  • added scripts in --notebook-dir argument in Dockerfile
  • fixed jupyterlab startup directory
  • start_docker.sh formatting

What is the purpose of this PR?

Closes #526. When Jupyter Lab is opened, it will open the scripts directory now instead of the base directory.

How did you implement your changes

Had to reintroduce the parameter -e JUPYTER_DIR=$JUPYTER_DIR in the start_docker.sh file.

Remaining issues

None atm.

Patch channel file path creation for pixel cluster overlay @alex-l-kong (#514)

What is the purpose of this PR?

The order of the file path creation for channel files contained in FOV folder subdirectories got swapped due to merging issues. The line:

chan_file = os.path.join(
    fovs[0], img_sub_folder, io_utils.list_files(os.path.join(tiff_dir, img_sub_folder, fovs[0]), substrs=['.tif', '.tiff'])[0]
)

should be

chan_file = os.path.join(
    fovs[0], img_sub_folder, io_utils.list_files(os.path.join(tiff_dir, fovs[0], img_sub_folder), substrs=['.tif', '.tiff'])[0]
)

How did you implement your changes

See above

Added `exact_match` to `list_folders` @srivarra (#507)

What is the purpose of this PR?

This PR closes #360. It adds an exact_match function parameter to the list_folders function. In addition, it adds specific tests for the case where exact_match is set to True.

How did you implement your changes

  1. The same case from list_files was used.
  2. Instances of using .sort() within assert statements were replaced with sorted(), as .sort() returns None.

Remaining issues

There could be more instances of using asserts with .sort() instead of sorted().

For example:

a = [6,5,4,3]

assert a.sort() == [3,4,5,6]

Would not pass as a.sort() returns None.

However:

a = [6,5,4,3]

assert sorted(a) == [3,4,5,6]

Would pass as sorted() returns the sorted iterable.

🧰 Maintenance

Upgraded to Python 3.7 @srivarra (#592)

What is the purpose of this PR?

Updated the Python version of ark-analysis to 3.7.

How did you implement your changes

Had to adjust some Pandas and xarray indexing. Had to update the following packages:

Requirements

| Package | Python 3.6 Version | Python 3.7 Version |
| --- | --- | --- |
| jupyterlab | 3.1.5 | 3.4.3 |
| numpy | 1.16.3 | 1.21.6 |
| pandas | 0.23.3 | 1.3.5 |
| requests | 2.25.1 | 2.27.6 |
| scikit-image | 1.14.3 | 1.16.2 |
| scikit-learn | 0.19.1 | 0.24.2 |
| scipy | 1.1.0 | 1.7.3 |
| seaborn | 0.10.1 | 0.11.2 |
| statsmodels | 0.11.1 | 0.13.2 |
| tables | 3.6.1 | 3.7.0 |
| umap-learn | 0.4.6 | 0.5.3 |
| xarray | 0.12.3 | 0.17.0 |
| tqdm | 4.54.1 | 4.64.0 |

Requirements Test

| Package | Python 3.6 Version | Python 3.7 Version |
| --- | --- | --- |
| attrs | 20.3.0 | 21.4.0 |
| coveralls | 1.8.2 | 1.11.1 |
| pytest | 6.0.0 | 7.1.2 |
| pytest-asyncio | 0.15.1 | 0.18.1 |
| six | 1.13.0 | 1.16.0 |
| testbook | 0.2.3 | 0.4.2 |

Travis CI

Added a line to install the latest versions of setuptools and importlib-metadata. Travis-CI Python 3.7 environments contain a very old version of importlib-metadata which is incompatible with the majority of libraries for 3.7+.

Remaining issues

[ ] Docker builds successfully

Updating other libraries

In order for ark-analysis to work with toffy, both toffy and mibi-bin-tools need a couple libraries to be updated.

Order of Updates:

  1. Merge #592 (this one)
  2. Merge angelolab/mibi-bin-tools#36
  3. Merge angelolab/toffy#132

Remove GoogleDrivePath Dead Code @srivarra (#528)

What is the purpose of this PR?

Closes #512. Removes all references to GoogleDrivePath.

How did you implement your changes

Removed all use cases of the Google Drive API in ark-analysis:

  • Google Python packages in requirements.txt and in setup.py
  • data_utils.py
    • Imports GoogleDrivePath, drive_write_out, path_join
  • deepcell_service_utils.py
    • Imports GoogleDrivePath, drive_write_out, path_join, DriveOpen
  • google_drive_utils_test.py
    • Delete it
  • google_drive_utils.py
    • Delete it
  • io_utils
  • load_utils.py
  • segmentation_utils.py
    • drive_write_out and path_join
  • conf.py
    • Remove Google packages
  • index.rst
    • Remove GoogleDrive references
  • google_drive_usage.md
    • Remove file
  • Segment_Image_Data.ipynb
    • Uses deepcell_service_utils.py.
  • Remove Google API tokens
  • Successfully building the Docker Image

Remaining issues

None ATM.

📚️ Documentation

Update `example_pixel_clustering.ipynb` @ngreenwald (#608)

What is the purpose of this PR?
Updates the description of additional blurring to more clearly indicate the purpose. Closes #604

Added documentation for setting up development on M1 @srivarra (#594)

What is the purpose of this PR?

Closes #593.

How did you implement your changes

Added instructions on getting ark-analysis and running on M1 for development.

Remaining issues

This runs under the Intel-to-Arm transition layer. As we update to Python 3.8, we might want to look at updating libraries to get a native Arm implementation.

Update contributing.md @ngreenwald (#582)

What is the purpose of this PR?
Updates the contributing docs. I noticed that this is now a bit out of date when I sent it to a new collaborator.

How did you implement your changes
Updated it to reflect our design doc workflow

adding windows external docs @srivarra (#564)

What is the purpose of this PR?

Closes #562.

Adding an external drive on WSL is a bit ambiguous, so there is now documentation on how to specifically mount an external drive.

How did you implement your changes

Added brief documentation based on @ackagel 's issue.

Remaining issues

None.

Docker Image build instructions, removed rebuild logic @srivarra (#538)

What is the purpose of this PR?

Closes #516.

Placed instructions for building the Docker Image locally in docs/_rtd/development.md. Removed the logic which rebuilds the image if changes are detected in start_docker.sh.

How did you implement your changes

Added instructions based off Installation instructions in the README.md. Deleted the logic directly.

Remaining issues

None atm.

Documentation for how to run ark-analysis on Windows @alex-l-kong (#492)

What is the purpose of this PR?

Closes #474 and closes #475. Running ark-analysis on Windows requires several extra steps that are not totally intuitive. To provide some hand-holding, create a document on RTD specifically for Windows configuration.

How did you implement your changes

Add an RTD page for Windows configuration, which includes WSL, Git, Docker, and carriage-return fixing using dos2unix.

Remaining issues

If there are any additional issues that other users, including yourself, have encountered on Windows, please let me know.

@ackagel, @alex-l-kong, @bcollica, @camisowers, @cliu72, @ngreenwald and @srivarra