Merge pull request #10 from smaht-dac/rclone-support-fresh-20240423
Mostly rclone related work
Showing 120 changed files with 258,226 additions and 4,034 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
# Build for submitr | ||
|
||
name: INTEGRATION TESTS | ||
|
||
# Controls when the action will run. | ||
on: | ||
  # Triggers the workflow on push or pull request events but only for the master branch | ||
  push: | ||
    branches: [ master ] | ||
  pull_request: | ||
    branches: [ master ] | ||
|
||
  # Allows you to run this workflow manually from the Actions tab | ||
  workflow_dispatch: | ||
|
||
# A workflow run is made up of one or more jobs that can run sequentially or in parallel | ||
jobs: | ||
  # This workflow contains a single job called "build" | ||
  build: | ||
    name: TEST INTEGRATION WITH PYTHON ${{ matrix.python_version }} | ||
|
||
    # The type of runner that the job will run on | ||
    runs-on: ubuntu-22.04 | ||
    strategy: | ||
      matrix: | ||
        python_version: [3.11] | ||
|
||
    # Steps represent a sequence of tasks that will be executed as part of the job | ||
    steps: | ||
      # Checks out your repository under $GITHUB_WORKSPACE, so your job can access it | ||
      - uses: actions/checkout@v3 | ||
      - uses: actions/setup-python@v3 | ||
        with: | ||
          python-version: ${{ matrix.python_version }} | ||
|
||
      - name: BUILD | ||
        run: | | ||
          make build | ||
      # The integration tests actually talk to AWS S3 and Google Cloud Storage (GCS), | ||
      # both directly (via Python boto3 and google.cloud.storage) and via rclone. | ||
      # The access credentials are defined by the environment variables described below. | ||
      - name: INTEGRATION TESTS | ||
        env: | ||
          # These are set up in GitHub as "secrets". The AWS access key values are currently | ||
          # (May 2024) for the special user test-integration-user in the smaht-wolf account; | ||
          # the access key was created on 2024-05-15. The Google value is the JSON from the | ||
          # service account file exported from the HMS Google account for the smaht-dac project; | ||
          # the service account email is ga4-service-account@smaht-dac.iam.gserviceaccount.com; | ||
          # its key ID is b488dd9cfde6b59b1aa347aabd9add86c7ff9057; it was created on 2024-04-28. | ||
          AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }} | ||
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} | ||
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} | ||
          GOOGLE_CLOUD_SERVICE_ACCOUNT_JSON: ${{ secrets.GOOGLE_CLOUD_SERVICE_ACCOUNT_JSON }} | ||
        run: | | ||
          make test-integration |
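The workflow above passes its credentials to the tests through four environment variables. As a rough sketch of how a test helper might consume them, the function below reads those variables and parses the Google service-account JSON. The helper name `load_integration_credentials` and the shape of the returned dictionary are hypothetical, not taken from the repository; only the variable names come from the workflow.

```python
import json
import os
from typing import Mapping, Optional

# Environment variable names taken from the workflow's env block.
AWS_VARS = ("AWS_DEFAULT_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
GCS_VAR = "GOOGLE_CLOUD_SERVICE_ACCOUNT_JSON"


def load_integration_credentials(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Hypothetical helper: return the credentials, or None if any are missing or malformed."""
    env = os.environ if env is None else env
    # Every variable must be present and non-empty.
    if any(not env.get(name) for name in (*AWS_VARS, GCS_VAR)):
        return None
    # The Google secret holds the service-account file's JSON content, not a path.
    try:
        service_account = json.loads(env[GCS_VAR])
    except json.JSONDecodeError:
        return None
    if not isinstance(service_account, dict):
        return None
    return {
        "aws": {name: env[name] for name in AWS_VARS},
        "gcs_service_account": service_account,
    }
```

A test suite could call this once at start-up and skip the integration tests when it returns `None`, for example on a fork where the secrets are not available.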
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -145,3 +145,7 @@ docs/build | |
|
||
# Vim | ||
*.swp | ||
|
||
# Junk Python files. | ||
?.py | ||
.tmp/ |
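The `?.py` pattern added above ignores Python files whose base name is a single character. The sketch below illustrates which names match, using Python's `fnmatch` as an approximation of gitignore globbing. For a plain file name without directory separators the two should agree on this pattern, but that equivalence is an assumption of the example, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def is_junk_python_file(filename: str) -> bool:
    """Return True if a bare file name matches the '?.py' pattern."""
    # '?' matches exactly one character, so only names such as 'a.py' qualify.
    return fnmatchcase(filename, "?.py")


if __name__ == "__main__":
    for name in ("a.py", "x.py", "ab.py", "setup.py", ".py"):
        print(f"{name}: {is_junk_python_file(name)}")
```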
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
Notes on demo file (bcm_formatted_hapmapmix.xlsx) for Annual Meeting in St. Louis, June 2024 | ||
|
||
- Some commands: | ||
|
||
- Submit metadata: | ||
> submit-metadata-bundle --env smaht-local --submit --directory files bcm_formatted_hapmapmix.xlsx | ||
|
||
- Validate metadata: | ||
> submit-metadata-bundle --env smaht-local --validate --directory files bcm_formatted_hapmapmix.xlsx | ||
|
||
- Submit metadata with rclone support, transferring a file to S3 from Google Cloud Storage (GCS) if the file to upload is found there: | ||
> submit-metadata-bundle --env smaht-local --submit --directory files \ | ||
--rclone-google-source smaht-submitr-rclone-testing/demo \ | ||
--rclone-google-credentials ~/.config/google-cloud/smaht-dac-617e0480d8e2.json bcm_formatted_hapmapmix.xlsx | ||
> Note these files are currently (2024-05-31) in GCS: | ||
- gs://smaht-submitr-rclone-testing/demo/222TWJLT4-1-IDUDI0056v2_S2_L001_R2_001.fastq.gz | ||
- gs://smaht-submitr-rclone-testing/demo/222TWJLT4-1-IDUDI0055v2_S1_L001_R2_001.fastq.gz | ||
|
||
- Resume upload: | ||
> resume-uploads --env smaht-local --directory files <submission-uuid-or-upload-file-uuid-or-accession> | ||
|
||
- Get info (only - no submit or validate) related to metadata file: | ||
> submit-metadata-bundle --env smaht-local --info --refs --files --directory files bcm_formatted_hapmapmix.xlsx | ||
|
||
- Dump metadata (only - no submit or validate) as JSON: | ||
> submit-metadata-bundle --env smaht-local --json-only bcm_formatted_hapmapmix.xlsx | ||
|
||
- View known submission-centers/consortia: | ||
> submit-metadata-bundle --env smaht-local --submission-centers --consortia | ||
|
||
- List recent submissions (add --mine to see only yours): | ||
> list-submissions --env smaht-local | ||
|
||
- Get info on a submission, optionally continuing on to submission if the submission-uuid is for a validation: | ||
> check-submission --env smaht-local <submission-uuid> | ||
|
||
- Download latest HMS metadata template: | ||
> get-metadata-template <file-name-with-dot-xlsx-suffix> | ||
|
||
- View arbitrary portal object (for troubleshooting): | ||
> view-portal-object --env smaht-local <uuid-or-object-path> | ||
|
||
- Use rclone to copy smaht-local file to Google (for testing/troubleshooting): | ||
> rcloner copy <your-file> gs://smaht-submitr-rclone-testing/demo -gcs ~/.config/google-cloud/smaht-dac-617e0480d8e2.json | ||
|
||
- Use rclone to copy file from Google to local current directory (for testing/troubleshooting): | ||
> rcloner copy gs://smaht-submitr-rclone-testing/demo/<your-file> . -gcs ~/.config/google-cloud/smaht-dac-617e0480d8e2.json | ||
|
||
- Use rclone to get info about a file in Google (for testing/troubleshooting): | ||
> rcloner info gs://smaht-submitr-rclone-testing/demo/<some-file> -gcs ~/.config/google-cloud/smaht-dac-617e0480d8e2.json | ||
|
||
- File bcm_formatted_hapmapmix.xlsx from William Feng on 2024-05-21 | ||
https://docs.google.com/spreadsheets/d/1qCm0bY-vG4a9uiaOvmKHZ12MvhmMKKRfEpgAm-7Hsh4/edit#gid=1645623888 | ||
https://hms-dbmi.slack.com/archives/D05LSGRQYV7/p1716239277185859 | ||
|
||
- Made some minor corrections to this spreadsheet locally | ||
- Removed blank row #3 in Sequencing sheet | ||
- Changed values of target_read_length in Sequencing tab from '25-30 kb' and '15-20 kb' to 27500 and 17500 | ||
- Changed all submission-center prefixes in submitted_id values to be DAC (previously mixture of BCM, MAYO, WASHU, USWC) | ||
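The replacement values for target_read_length noted above (27500 and 17500) are the midpoints of the original ranges, expressed in bases. A small sketch of that arithmetic follows. That the values were chosen as midpoints is an inference from the numbers, and the function name is hypothetical.

```python
import re


def range_midpoint_in_bases(text: str) -> int:
    """Convert a range such as '25-30 kb' to its midpoint in bases."""
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*kb\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a '<low>-<high> kb' range: {text!r}")
    low_kb, high_kb = int(match.group(1)), int(match.group(2))
    # Scale to bases before halving so that odd sums stay exact.
    return (low_kb + high_kb) * 1000 // 2


if __name__ == "__main__":
    for value in ("25-30 kb", "15-20 kb"):
        print(value, "->", range_midpoint_in_bases(value))
```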
|
||
- Dependencies for this spreadsheet are in smaht-portal/src/encoded/tests/data/demo_inserts | ||
and also in the dependencies directory here; load/upsert them manually with create-dependencies.sh: | ||
/Assay/bulk_rna_seq | ||
/Assay/bulk_wgs_pcr_free | ||
/FileFormat/bam | ||
/FileFormat/bam_bai | ||
/FileFormat/bam_pbi | ||
/FileFormat/fastq_gz | ||
/Sequencer/illumina_novaseq_6000 | ||
/Sequencer/illumina_novaseq_x | ||
/Sequencer/ont_promethion_24 | ||
/Sequencer/pacbio_revio_hifi | ||
/Sequencing/BCM_SEQUENCING_NOVASEQX-400X | ||
/Sequencing/BCM_SEQUENCING_ONT-100X | ||
/Sequencing/BCM_SEQUENCING_PACBIO-100X | ||
/Software/BCM_SOFTWARE_BCL2FASTQ2 | ||
/Software/BCM_SOFTWARE_DORADO | ||
/Software/BCM_SOFTWARE_LIMA | ||
/Software/BCM_SOFTWARE_MINKNOW | ||
/Software/BCM_SOFTWARE_REVIO-ICS | ||
/Software/BCM_SOFTWARE_SAMTOOLS | ||
/Software/BCM_SOFTWARE_SMRTLINK | ||
/SubmissionCenter/bcm_gcc | ||
/SubmissionCenter/mayo_tdd | ||
/SubmissionCenter/washu_gcc | ||
|
||
- Referenced files to upload in the files directory here (these have dummy/random content): | ||
222TWJLT4-1-IDUDI0055V2_S1_L001_R1_001.FASTQ.GZ | ||
222TWJLT4-1-IDUDI0056V2_S2_L001_R1_001.FASTQ.GZ | ||
222TWJLT4-1-IDUDI0057_S3_L001_R1_001.FASTQ.GZ | ||
222TWJLT4-1-IDUDI0055V2_S1_L001_R2_001.FASTQ.GZ | ||
222TWJLT4-1-IDUDI0056V2_S2_L001_R2_001.FASTQ.GZ | ||
> THIS ONE IS CURRENTLY (2024-05-29) in GCS: | ||
> gs://smaht-submitr-rclone-testing/demo/222TWJLT4-1-IDUDI0056v2_S2_L001_R2_001.fastq.gz |
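The submit and validate commands in these notes differ only in a few options, so a script driving the demo could assemble them programmatically. The sketch below builds the argument list for `submit-metadata-bundle` using only the options shown in the notes. The function name and its parameters are hypothetical, and nothing is executed here.

```python
import os
import shlex
from typing import List, Optional


def build_submit_command(
    metadata_file: str,
    env: str,
    directory: str,
    google_source: Optional[str] = None,
    google_credentials: Optional[str] = None,
    validate_only: bool = False,
) -> List[str]:
    """Hypothetical helper: assemble a submit-metadata-bundle argument list."""
    command = [
        "submit-metadata-bundle",
        "--env", env,
        "--validate" if validate_only else "--submit",
        "--directory", directory,
    ]
    if google_source:
        command += ["--rclone-google-source", google_source]
    if google_credentials:
        # Expand '~' here: an argument list run without a shell gets no tilde expansion.
        command += ["--rclone-google-credentials", os.path.expanduser(google_credentials)]
    command.append(metadata_file)
    return command


if __name__ == "__main__":
    args = build_submit_command(
        "bcm_formatted_hapmapmix.xlsx",
        env="smaht-local",
        directory="files",
        google_source="smaht-submitr-rclone-testing/demo",
        google_credentials="~/.config/google-cloud/smaht-dac-617e0480d8e2.json",
    )
    print(shlex.join(args))
```

The resulting list could be passed to `subprocess.run(args, check=True)` once the command-line tools are installed.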
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
update-portal-object --env smaht-local --upsert dependencies |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
[ | ||
{ | ||
"code": "101", | ||
"title": "RNA-Seq", | ||
"status": "released", | ||
"accession": "SMAASDZA8VHK", | ||
"consortia": [ | ||
"358aed10-9b9d-4e26-ab84-4bd162da182b" | ||
], | ||
"identifier": "bulk_rna_seq", | ||
"submission_centers": [ | ||
"9626d82e-8110-4213-ac75-0a50adf890ff" | ||
], | ||
"uuid": "beb12f96-624b-4fb8-afd5-8c637f5c0b97" | ||
}, | ||
{ | ||
"code": "001", | ||
"title": "WGS, PCR-free", | ||
"status": "released", | ||
"accession": "SMAASOMSCCDC", | ||
"consortia": [ | ||
"358aed10-9b9d-4e26-ab84-4bd162da182b" | ||
], | ||
"identifier": "bulk_wgs_pcr_free", | ||
"submission_centers": [ | ||
"9626d82e-8110-4213-ac75-0a50adf890ff" | ||
], | ||
"uuid": "87fbe483-31c6-4ff7-8abd-043d185150af" | ||
} | ||
] |
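The two records above carry the identifiers `bulk_rna_seq` and `bulk_wgs_pcr_free`, which match the `/Assay/...` entries in the dependency list of the demo notes. As a sketch, the function below derives such paths from a list of insert records and checks that each record has the fields it needs. Treating the path as `/<type>/<identifier>` and this file as the Assay inserts are assumptions based on that match, and the function name is hypothetical.

```python
import json
from typing import Iterable, List, Mapping

# Abbreviated copies of the two records shown above.
ASSAY_INSERTS = json.loads("""
[
  {"identifier": "bulk_rna_seq", "uuid": "beb12f96-624b-4fb8-afd5-8c637f5c0b97"},
  {"identifier": "bulk_wgs_pcr_free", "uuid": "87fbe483-31c6-4ff7-8abd-043d185150af"}
]
""")


def portal_paths(item_type: str, items: Iterable[Mapping[str, object]]) -> List[str]:
    """Hypothetical helper: build '/<type>/<identifier>' paths for insert records."""
    paths = []
    for item in items:
        missing = {"identifier", "uuid"} - set(item)
        if missing:
            raise ValueError(f"insert is missing fields: {sorted(missing)}")
        paths.append(f"/{item_type}/{item['identifier']}")
    return paths


if __name__ == "__main__":
    for path in portal_paths("Assay", ASSAY_INSERTS):
        print(path)
```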