Fixed TRANSREL-66. #56

Open
wants to merge 1 commit into
base: master

xuatiknc

Fix for https://jira.transmartfoundation.org/browse/TRANSREL-66

There is also an Oracle-related part in transmart-data repository.

@forus

forus commented Oct 18, 2016

@xuatiknc Using temporary tables could be a way to go.
Wouldn't using subqueries be a better fix for this issue?
Please see my comments here #49

@xuatiknc
Author

xuatiknc commented Oct 18, 2016

@forus Yes, if the sets of ids, paths, and codes are already in the database, then in theory there is no point in extracting the data and inserting it back into the database.

In practice it depends:

> select count(*) from qt_patient_set_collection;
1222512

The study I used for testing has a little over 1000 patients and more than 100K paths. Patient ids are extracted at line 91 of ClinicalDataResourceService.groovy anyway. So if you use where ... patient_num in (select patient_num from qt_patient_set_collection where result_instance_id = 100), that is an extra full scan of a table with more than 1M rows. The question is which is faster: a full scan of a table with more than 1M rows, or inserting ~1K rows into a temporary table plus a full scan of a 100K-row temporary table? I do not know the answer, but my point is that the benefit of temporary tables is that you only work with the data you actually need. You do not have to worry about millions of rows of legacy data accumulated over the years.
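To make the two options concrete, here is a minimal sketch that runs both query shapes against an in-memory SQLite database. It is not the code from this pull request: the table `observation_demo`, the temporary table `tmp_patient_nums`, the simplified two-column schema, and the row counts are all assumptions made up for illustration, and SQLite will not reproduce Oracle or PostgreSQL query plans. It only shows that both approaches select the same rows, which is the precondition for comparing their speed on a real instance.

```python
import sqlite3


def build_demo_db():
    """Create a toy database; schema and data are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE qt_patient_set_collection ("
        "result_instance_id INTEGER, patient_num INTEGER)"
    )
    # Hypothetical stand-in for the table holding the clinical data.
    conn.execute(
        "CREATE TABLE observation_demo (patient_num INTEGER, concept_cd TEXT)"
    )
    # "Legacy" patient sets accumulated from earlier queries.
    conn.executemany(
        "INSERT INTO qt_patient_set_collection VALUES (?, ?)",
        [(rid, p) for rid in range(1, 100) for p in range(1, 51)],
    )
    # The patient set of interest: result_instance_id = 100, patients 1..10.
    conn.executemany(
        "INSERT INTO qt_patient_set_collection VALUES (?, ?)",
        [(100, p) for p in range(1, 11)],
    )
    # One observation row per patient, patients 1..50.
    conn.executemany(
        "INSERT INTO observation_demo VALUES (?, ?)",
        [(p, f"CODE{p % 3}") for p in range(1, 51)],
    )
    conn.commit()
    return conn


def count_with_subquery(conn, result_instance_id):
    """Filter through a subquery on the large patient set table."""
    row = conn.execute(
        "SELECT count(*) FROM observation_demo "
        "WHERE patient_num IN ("
        "  SELECT patient_num FROM qt_patient_set_collection "
        "  WHERE result_instance_id = ?)",
        (result_instance_id,),
    ).fetchone()
    return row[0]


def count_with_temp_table(conn, patient_nums):
    """Load already-extracted ids into a temporary table, then join."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS tmp_patient_nums ("
        "patient_num INTEGER PRIMARY KEY)"
    )
    conn.execute("DELETE FROM tmp_patient_nums")
    conn.executemany(
        "INSERT INTO tmp_patient_nums VALUES (?)",
        [(p,) for p in patient_nums],
    )
    row = conn.execute(
        "SELECT count(*) FROM observation_demo o "
        "JOIN tmp_patient_nums t ON t.patient_num = o.patient_num"
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = build_demo_db()
    print(count_with_subquery(conn, 100))
    print(count_with_temp_table(conn, range(1, 11)))
```

Which variant wins on a production database depends on indexes, statistics, and the optimizer, so the actual comparison would need timing or an execution plan on the real schema.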

By the way, shouldn't tables like qt_patient_set_collection be cleaned up somehow on a regular basis? That 1.2M number is the actual number of rows from one of the PROD instances running tranSMART version 1.2.4. Are there any standard database clean-up procedures for tranSMART? I could not find this in the documentation, so I would appreciate any input.

@forus

forus commented Dec 2, 2016

@xuatiknc Fair enough.
qt_patient_set_collection is indeed never cleaned automatically, although those patient set records are one-time use only, AFAIK.
