Commit

Fix error in some cases of the union-find algorithm
animan01 committed Jun 28, 2020
1 parent 3adb611 commit 1adfd71
Showing 4 changed files with 56 additions and 28 deletions.
10 changes: 7 additions & 3 deletions README.md
@@ -14,17 +14,21 @@ Example of input data (based on the csv file):
 ```
 ID,PARENT_ID,EMAIL,CARD,PHONE,TMP
 1,NULL,email1,card1,phone1,
-2,NULL,email2,card2,phone2,
+2,NULL,email2,card1,phone2,
 3,NULL,email3,card3,phone3,
 4,NULL,email1,card2,phone4,
 5,NULL,email5,card5,phone2,
 6,NULL,email6,card6,phone6,
 7,NULL,email3,card9,phone7,
 8,NULL,email8,card10,phone8,
-9,NULL,email9,card9,phone3,
-10,NULL,email10,card10,phone10,
+9,NULL,email9,card9,phone3,
+10,NULL,email2,card10,phone10,
 ```

+Take the element with **ID 10** as an example: it is linked to IDs 2, 8, 4, and 1, and the original record of this duplicate group is ID 1. A brief visualization of the dependencies:
+- **ID1 => ID2 => ID10 => ID8**
+
+
 Require
 --
 - php
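
The dependency chain in the README text can be reproduced with a small sketch. This is not the repository's code: `shared_links` is a hypothetical helper that, for each sample row, lists the earlier rows sharing an EMAIL, CARD, or PHONE value with it. Like the commit's `$mapping_fields`, it keeps all field values in one lookup table, which assumes values of different fields never collide.

```php
<?php
// Hypothetical helper (not part of the commit): for every row, collect the
// IDs of earlier rows that first used one of its field values.
function shared_links(array $rows): array {
  $seen = [];   // field value => ID of the first row that used it
  $links = [];  // row ID => IDs of earlier rows it is directly linked to
  foreach ($rows as [$id, $email, $card, $phone]) {
    $links[$id] = [];
    foreach ([$email, $card, $phone] as $value) {
      if (isset($seen[$value])) {
        $links[$id][] = $seen[$value];
      }
      else {
        $seen[$value] = $id;
      }
    }
  }
  return $links;
}

// Sample rows from the updated README: ID, EMAIL, CARD, PHONE.
$rows = [
  [1, 'email1', 'card1', 'phone1'],
  [2, 'email2', 'card1', 'phone2'],
  [3, 'email3', 'card3', 'phone3'],
  [4, 'email1', 'card2', 'phone4'],
  [5, 'email5', 'card5', 'phone2'],
  [6, 'email6', 'card6', 'phone6'],
  [7, 'email3', 'card9', 'phone7'],
  [8, 'email8', 'card10', 'phone8'],
  [9, 'email9', 'card9', 'phone3'],
  [10, 'email2', 'card10', 'phone10'],
];

$links = shared_links($rows);
echo 'ID 10 links to: ' . implode(', ', $links[10]) . PHP_EOL;
echo 'ID 2 links to: ' . implode(', ', $links[2]) . PHP_EOL;
```

With the sample data, ID 10 links directly to IDs 2 (shared `email2`) and 8 (shared `card10`), and ID 2 links to ID 1 (shared `card1`), which is the chain ID1 => ID2 => ID10 => ID8 described above.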
9 changes: 6 additions & 3 deletions README_RU.md
@@ -14,17 +14,20 @@ Problem/Motivation
 ```
 ID,PARENT_ID,EMAIL,CARD,PHONE,TMP
 1,NULL,email1,card1,phone1,
-2,NULL,email2,card2,phone2,
+2,NULL,email2,card1,phone2,
 3,NULL,email3,card3,phone3,
 4,NULL,email1,card2,phone4,
 5,NULL,email5,card5,phone2,
 6,NULL,email6,card6,phone6,
 7,NULL,email3,card9,phone7,
 8,NULL,email8,card10,phone8,
-9,NULL,email9,card9,phone3,
-10,NULL,email10,card10,phone10,
+9,NULL,email9,card9,phone3,
+10,NULL,email2,card10,phone10,
 ```

+Take the element with **ID 10** as an example: it is linked to IDs 2, 8, 4, and 1, and the original record of this duplicate group is ID 1. A brief visualization of the dependencies:
+- **ID1 => ID2 => ID10 => ID8**
+
 Require
 --
 - php
12 changes: 9 additions & 3 deletions README_UA.md
@@ -14,17 +14,23 @@ Problem/Motivation
 ```
 ID,PARENT_ID,EMAIL,CARD,PHONE,TMP
 1,NULL,email1,card1,phone1,
-2,NULL,email2,card2,phone2,
+2,NULL,email2,card1,phone2,
 3,NULL,email3,card3,phone3,
 4,NULL,email1,card2,phone4,
 5,NULL,email5,card5,phone2,
 6,NULL,email6,card6,phone6,
 7,NULL,email3,card9,phone7,
 8,NULL,email8,card10,phone8,
-9,NULL,email9,card9,phone3,
-10,NULL,email10,card10,phone10,
+9,NULL,email9,card9,phone3,
+10,NULL,email2,card10,phone10,
 ```

+Take the element with **ID 10** as an example: it is linked to IDs 2, 8, 4, and 1, and the original record of this duplicate group is ID 1. A brief visualization of the dependencies:
+ID1 > ID2> ID10 > ID8
+- **ID1 => ID2 => ID10 => ID8**
+
+
+
 Require
 --
 - php
53 changes: 34 additions & 19 deletions index.php
@@ -7,22 +7,23 @@
  */

 // Define constants.
-define('FIELDS', ['EMAIL', 'CARD', 'PHONE']);
+define('DUPLICATES_FIELDS', ['EMAIL', 'CARD', 'PHONE']);

 // Default example data.
 $csv = 'ID,PARENT_ID,EMAIL,CARD,PHONE,TMP
 1,NULL,email1,card1,phone1,
 2,NULL,email2,card1,phone2,
 3,NULL,email3,card3,phone3,
-4,NULL,email1,card2,phone4,
+4,NULL,email1,card2,phone4,
 5,NULL,email5,card5,phone2,
 6,NULL,email6,card6,phone6,
 7,NULL,email3,card9,phone7,
 8,NULL,email8,card10,phone8,
-9,NULL,email9,card9,phone3,
-10,NULL,email10,card10,phone10,';
+9,NULL,email9,card9,phone3,
+10,NULL,email2,card10,phone10,';

 $rows = explode(PHP_EOL, $csv);
+$fields_array = [];

 // Prepare array data.
 foreach ($rows as $key => $row) {
@@ -41,6 +42,7 @@
 $csv_string = 'ID,PARENT_ID' . PHP_EOL;

 $mapping_fields = [];
+$grouping_key = [];

 // Find duplicates and save to mapping.
 foreach ($fields_array as $key => $array) {
@@ -51,32 +53,40 @@
   }

   // Set default value for each iteration.
-  $group = NULL;
+  $group = $group_key = NULL;
   $group_to_merge = [];

   // Grouping by fields.
-  foreach (FIELDS as $field) {
+  foreach (DUPLICATES_FIELDS as $field) {
     $field_value = $array[$field];
     if (array_key_exists($array[$field], $mapping_fields)) {
-      $group = $mapping_fields[$field_value];
-      $group_to_merge[] = $group;
+      $group_key = $mapping_fields[$field_value];
+      $group_to_merge[] = $group_key;
     }
   }

-  // Setting minimal group if have more one group ID.
-  if (count($group_to_merge) > 1) {
-    $group = min($group_to_merge);
+  // Setting group if do not have any duplicates.
+  if ($group_key === NULL) {
+    $grouping_key[] = $array['ID'];
+    $group_key = array_search($array['ID'], $grouping_key);
   }
+  $group = $grouping_key[$group_key];

-  // Setting group if do not have any duplicates.
-  if ($group === NULL) {
-    $group = $array['ID'];
+  // Setting minimal group if have more one group ID.
+  if (count($group_to_merge) > 1) {
+    for ($i = 0; $i < count($group_to_merge); $i++) {
+      $merging_array[] = $grouping_key[$group_to_merge[$i]];
+    }
+    if (!empty($merging_array)) {
+      $group = min($merging_array);
+      $group_key = array_search($group, $grouping_key);
+    }
   }

   // Save fields to mapping.
-  $mapping_fields[$array['EMAIL']] = $group;
-  $mapping_fields[$array['CARD']] = $group;
-  $mapping_fields[$array['PHONE']] = $group;
+  $mapping_fields[$array['EMAIL']] = $group_key;
+  $mapping_fields[$array['CARD']] = $group_key;
+  $mapping_fields[$array['PHONE']] = $group_key;

 }

@@ -85,8 +95,13 @@
   if ($key === 0) {
     continue;
   }
-  // Searching PARENT_ID by email field. May be any field (like: CARD, PHONE).
-  $fields_array[$key]['PARENT_ID'] = $mapping_fields[$array['EMAIL']];
+
+  $parent_ids = NULL;
+  // Searching PARENT_ID by fields.
+  foreach (DUPLICATES_FIELDS as $field) {
+    $parent_ids[] = $grouping_key[$mapping_fields[$array[$field]]];
+  }
+  $fields_array[$key]['PARENT_ID'] = min($parent_ids);

   // Prepare data from csv.
   if ($key !== 0) {
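
The commit title mentions union-find, and the patched loop tracks groups through `$grouping_key` and `$mapping_fields`. For comparison, here is a minimal sketch of a textbook disjoint-set (union-find) applied to the same sample data. It is not the repository's implementation: `find_root`, `union_ids`, and `assign_parents` are hypothetical names, and keying the lookup on `FIELD:value` rather than on the bare value is an assumption made here so that equal strings in different columns are not treated as a match.

```php
<?php
// Illustrative disjoint-set over record IDs (names are hypothetical).
function find_root(array &$parent, int $id): int {
  // Walk up to the representative of the set.
  $root = $id;
  while ($parent[$root] !== $root) {
    $root = $parent[$root];
  }
  // Path compression: point every node on the walked path at the root.
  while ($parent[$id] !== $root) {
    $next = $parent[$id];
    $parent[$id] = $root;
    $id = $next;
  }
  return $root;
}

function union_ids(array &$parent, int $a, int $b): void {
  $ra = find_root($parent, $a);
  $rb = find_root($parent, $b);
  if ($ra !== $rb) {
    // Keep the smaller ID as root so PARENT_ID is the earliest record.
    $parent[max($ra, $rb)] = min($ra, $rb);
  }
}

function assign_parents(array $rows, array $fields): array {
  $parent = [];  // record ID => parent record ID
  $owner = [];   // "FIELD:value" => ID of the first row holding that value
  foreach ($rows as $row) {
    $id = (int) $row['ID'];
    $parent[$id] = $id;
    foreach ($fields as $field) {
      $key = $field . ':' . $row[$field];
      if (isset($owner[$key])) {
        union_ids($parent, $id, $owner[$key]);
      }
      else {
        $owner[$key] = $id;
      }
    }
  }
  $result = [];
  foreach (array_keys($parent) as $id) {
    $result[$id] = find_root($parent, $id);
  }
  return $result;
}

// Sample rows from the commit (PARENT_ID and TMP columns omitted).
$csv = [
  '1,email1,card1,phone1', '2,email2,card1,phone2',
  '3,email3,card3,phone3', '4,email1,card2,phone4',
  '5,email5,card5,phone2', '6,email6,card6,phone6',
  '7,email3,card9,phone7', '8,email8,card10,phone8',
  '9,email9,card9,phone3', '10,email2,card10,phone10',
];
$rows = [];
foreach ($csv as $line) {
  [$id, $email, $card, $phone] = explode(',', $line);
  $rows[] = ['ID' => $id, 'EMAIL' => $email, 'CARD' => $card, 'PHONE' => $phone];
}

$parents = assign_parents($rows, ['EMAIL', 'CARD', 'PHONE']);
foreach ($parents as $id => $root) {
  echo $id . ',' . $root . PHP_EOL;
}
```

On the sample rows this resolves IDs 1, 2, 4, 5, 8, and 10 to root 1, IDs 3, 7, and 9 to root 3, and leaves ID 6 on its own. Because every merge goes through `find_root`, a late row such as ID 10 that bridges two existing groups (via `email2` and `card10`) re-parents the whole second group in one step, without a separate pass that takes the minimum over each field.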