Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Legacy DB] Reduce the size of pattern_matching_rois table #25

Open
2 tasks
Tracked by #2
rassokhina-e opened this issue Sep 30, 2024 · 1 comment
Open
2 tasks
Tracked by #2

[Legacy DB] Reduce the size of pattern_matching_rois table #25

rassokhina-e opened this issue Sep 30, 2024 · 1 comment
Assignees

Comments

@rassokhina-e
Copy link
Contributor

rassokhina-e commented Sep 30, 2024

  • create new pattern_matching_rois_new table, see attached script
  • update lambda logic to put correct data to legacy db

Ant feedback:

FYI, there are a lot of data type changes in this new table to reduce the table size.

create table arbimon2.pattern_matching_rois_new (
    pattern_matching_roi_id   int unsigned auto_increment primary key,
    pattern_matching_id       mediumint unsigned not null,
    recording_id              int unsigned not null,
    species_id                mediumint unsigned not null,
    songtype_id               tinyint unsigned not null,
    x1                        float not null,
    y1                        smallint unsigned not null,
    x2                        float not null,
    y2                        smallint unsigned not null,
    uri_param1                tinyint unsigned not null,
    uri_param2                tinyint unsigned null,
    score                     float null,
    validated                 tinyint(1) null,
    cs_val_present            tinyint default 0 not null,
    cs_val_not_present        tinyint default 0 not null,
    consensus_validated       tinyint(1) null,
    expert_validated          tinyint(1) null,
    expert_validation_user_id smallint unsigned null,
    denorm_site_id            mediumint unsigned null,
    denorm_recording_datetime datetime null,
    denorm_recording_date     date null,
    constraint fk_pattern_matching_matches_1
        foreign key (pattern_matching_id) references arbimon2.pattern_matchings (pattern_matching_id),
    constraint fk_pattern_matching_matches_2
        foreign key (recording_id) references arbimon2.recordings (recording_id) on delete cascade,
    constraint fk_pattern_matching_matches_3
        foreign key (species_id) references arbimon2.species (species_id),
    constraint fk_pattern_matching_matches_4
        foreign key (songtype_id) references arbimon2.songtypes (songtype_id),
    constraint fk_pattern_matching_rois_1
        foreign key (denorm_site_id) references arbimon2.sites (site_id)
);

insert into pattern_matching_rois_new
select pattern_matching_roi_id, pattern_matching_id, recording_id, species_id, songtype_id, x1, y1, x2, y2, uri,
       cast(substring_index(substring_index(uri, '/', -1), '_', 1) as unsigned) uri_param1,
       case locate('_', substring_index(uri, '/', -1)) when 0 then null else cast(substring_index(substring_index(uri, '_', -1), '.', 1) as unsigned) end uri_param2,
       score, validated, cs_val_present, cs_val_not_present, consensus_validated, expert_validated, expert_validation_user_id,
       denorm_site_id, denorm_recording_datetime, denorm_recording_date
from pattern_matching_rois;
create table arbimon2.pattern_matching_rois_new (
    pattern_matching_roi_id   int unsigned auto_increment primary key,
    pattern_matching_id       int not null,
    recording_id              bigint unsigned not null,
    species_id                int not null,
    songtype_id               int not null,
    x1                        float not null,
    y1                        smallint unsigned not null,
    x2                        float not null,
    y2                        smallint unsigned not null,
    uri_param1                mediumint unsigned not null,
    uri_param2                tinyint unsigned null,
    score                     float null,
    validated                 tinyint(1) null,
    cs_val_present            tinyint default 0 not null,
    cs_val_not_present        tinyint default 0 not null,
    consensus_validated       tinyint(1) null,
    expert_validated          tinyint(1) null,
    expert_validation_user_id smallint unsigned null,
    denorm_site_id            int unsigned null,
    denorm_recording_datetime datetime null,
    denorm_recording_date     date null,
    constraint fk_pattern_matching_rois_2
        foreign key (pattern_matching_id) references arbimon2.pattern_matchings (pattern_matching_id),
    constraint fk_pattern_matching_rois_3
        foreign key (recording_id) references arbimon2.recordings (recording_id) on delete cascade,
    constraint fk_pattern_matching_rois_4
        foreign key (species_id) references arbimon2.species (species_id),
    constraint fk_pattern_matching_rois_5
        foreign key (songtype_id) references arbimon2.songtypes (songtype_id),
    constraint fk_pattern_matching_rois_6
        foreign key (denorm_site_id) references arbimon2.sites (site_id)
);
insert into pattern_matching_rois_new
select pattern_matching_roi_id, pattern_matching_id, recording_id, species_id, songtype_id, x1, y1, x2, y2,
       case locate('_', substring_index(uri, '/', -1)) when 0 then cast(substring_index(substring_index(uri, '/', -1), '.', 1) as unsigned) else cast(substring_index(substring_index(uri, '/', -1), '_', 1) as unsigned) end uri_param1,
       case locate('_', substring_index(uri, '/', -1)) when 0 then null else cast(substring_index(substring_index(uri, '_', -1), '.', 1) as unsigned) end uri_param2,
       score, validated, cs_val_present, cs_val_not_present, consensus_validated, expert_validated, expert_validation_user_id,
       denorm_site_id, denorm_recording_datetime, denorm_recording_date
from pattern_matching_rois;

@rassokhina-e rassokhina-e changed the title Reduce the size of legacy db [Legacy DB] Reduce the size of pattern_matching_rois table Sep 30, 2024
@rassokhina-e rassokhina-e transferred this issue from rfcx/arbimon-legacy Oct 1, 2024
@rassokhina-e
Copy link
Contributor Author

rassokhina-e commented Oct 4, 2024

Frongs:

here is the plan for pattern_matching_rois table

  1. investigate all the indexes, and note on which ones that we are using (we will need it after moving all the data)
  2. we will test removing indexes in test db to check if the size matter that much
  3. if the size is reduced pretty much then we will try Ant’s sql script to move data into new table.
  4. if it is still not work, we have to come up with the script that cloning the data in batches.

all indexes:
Image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants