ZIP Changes - Draft PR #983

Merged on Jan 10, 2025 (80 commits)

Commits
4f25d72
adding features
sania-16 Dec 4, 2024
da9b54b
Merge branch 'zinggAI:main' into workingdocs
sania-16 Dec 5, 2024
a111a3d
Merge branch 'main' of https://github.com/sania-16/zingg into working
sania-16 Dec 5, 2024
9a25242
Merge branch 'zinggAI:main' into workingdocs
sania-16 Dec 6, 2024
b3b297a
Merge branch 'main' of https://github.com/sania-16/zingg into working
sania-16 Dec 6, 2024
aba0c96
match type changes
sania-16 Dec 8, 2024
afdb198
refactoring
sania-16 Dec 8, 2024
ae76694
code refactoring
sania-16 Dec 9, 2024
4a64918
working changes
sania-16 Dec 10, 2024
7ad79c4
added check before setting checkpoint directory
Nitish1814 Dec 11, 2024
bc85251
working tests
sania-16 Dec 11, 2024
b9e72f2
fixing junits
sania-16 Dec 12, 2024
c192629
fixing junits
sania-16 Dec 12, 2024
9309e37
refactoring
sania-16 Dec 12, 2024
18452e3
Create Sample
Arjun-Zingg Dec 13, 2024
ef4e2db
Add files via upload
Arjun-Zingg Dec 13, 2024
f2a2625
Delete examples/Fabric/Sample
Arjun-Zingg Dec 13, 2024
2c923d2
fabric
Arjun-Zingg Dec 13, 2024
a357bf5
Delete examples/fabric
Arjun-Zingg Dec 13, 2024
88d7a68
Create fabric
Arjun-Zingg Dec 13, 2024
6d35db4
Delete examples/Fabric directory
Arjun-Zingg Dec 13, 2024
23b0be5
Add files via upload
Arjun-Zingg Dec 13, 2024
ca4c7e7
Merge pull request #988 from zinggAI/Arjun-Zingg-patch-1
sonalgoyal Dec 13, 2024
dcc7a91
refactoring changes
sania-16 Dec 13, 2024
10e9bae
Merge branch 'zinggAI:main' into working
sania-16 Dec 15, 2024
059e94e
Merge branch 'zinggAI:main' into workingdocs
sania-16 Dec 15, 2024
2f57793
test changes
sania-16 Dec 16, 2024
e6726ba
Merge branch 'working' of https://github.com/sania-16/zingg into working
sania-16 Dec 16, 2024
d8fc1f6
Update perfTestRunner.py
Arjun-Zingg Dec 16, 2024
9859f0e
Update load-test.yml
Arjun-Zingg Dec 16, 2024
8c11982
Update perfTestRunner.py
Arjun-Zingg Dec 16, 2024
e09c74b
Update perfTestRunner.py
Arjun-Zingg Dec 16, 2024
195340e
report generated
Dec 16, 2024
2425639
Update perfTestRunner.py
Arjun-Zingg Dec 16, 2024
f11f77b
Update load-test.yml
Arjun-Zingg Dec 16, 2024
bb70609
refactoring
sania-16 Dec 17, 2024
176804a
Merge pull request #989 from zinggAI/load_test
sonalgoyal Dec 17, 2024
1f2d833
Merge pull request #986 from Nitish1814/fabric-fix
sonalgoyal Dec 17, 2024
b92f07d
report generated
Dec 19, 2024
11a5fcc
working changes
sania-16 Dec 20, 2024
f224752
working changes
sania-16 Dec 20, 2024
5eb9598
report generated
Dec 22, 2024
f3bcc46
Merge branch 'zinggAI:main' into working
sania-16 Dec 24, 2024
07d1ac5
Merge branch 'zinggAI:main' into workingdocs
sania-16 Dec 24, 2024
3c548db
report generated
Dec 25, 2024
af7bde3
Merge branch 'zinggAI:main' into workingdocs
sania-16 Dec 27, 2024
6ecf80b
code clean
sania-16 Dec 27, 2024
159f428
Merge branch 'working' of https://github.com/sania-16/zingg into working
sania-16 Dec 27, 2024
a458d8a
Merge branch 'zinggAI:main' into working
sania-16 Dec 27, 2024
4e6d947
report generated
Dec 28, 2024
647390e
code cleanup
sania-16 Dec 28, 2024
2f9b57c
Merge branch 'working' of https://github.com/sania-16/zingg into working
sania-16 Dec 28, 2024
013eeeb
Merge pull request #991 from sania-16/workingdocs
sonalgoyal Dec 29, 2024
328d414
Merge branch 'zinggAI:main' into working
sania-16 Dec 30, 2024
b1e73a0
report generated
Dec 31, 2024
bfffee2
refactoring code
sania-16 Dec 31, 2024
d387f7b
Merge branch 'working' of https://github.com/sania-16/zingg into working
sania-16 Dec 31, 2024
d75daca
report generated
Jan 1, 2025
fb0f503
changes for preprocess
sania-16 Jan 2, 2025
d342f48
working changes
sania-16 Jan 2, 2025
22368da
preprocessor changes
sania-16 Jan 3, 2025
c69adbe
blocking tree changes (#924)
Nitish1814 Jan 3, 2025
a7837b3
Merge branch 'zinggAI:main' into working
sania-16 Jan 3, 2025
77fdb33
Ftd optimization (#994)
Nitish1814 Jan 3, 2025
cd50ead
report generated
Jan 4, 2025
58846a1
fix telemetry with wrong metric names
sonalgoyal Jan 5, 2025
76a7a0d
Merge branch 'zinggAI:main' into working
sania-16 Jan 6, 2025
fb45d94
fixing junits
sania-16 Jan 6, 2025
b09b31b
refactoring junits
sania-16 Jan 6, 2025
3202412
fixing junits
sania-16 Jan 6, 2025
31d7580
report generated
Jan 7, 2025
434e551
working changes
sania-16 Jan 8, 2025
48e1464
GITBOOK-2: No subject
sonalgoyal Jan 8, 2025
c922b09
GITBOOK-4: No subject
sonalgoyal Jan 8, 2025
5ee5de6
Merge branch 'zinggAI:main' into working
sania-16 Jan 8, 2025
0753628
refactoring
sania-16 Jan 8, 2025
276e4d3
Merge branch 'working' of https://github.com/sania-16/zingg into working
sania-16 Jan 8, 2025
8604da8
updating stopword docs
sania-16 Jan 9, 2025
58bac3a
code cleanup
sania-16 Jan 9, 2025
4f3b065
stopwords junit
sania-16 Jan 9, 2025

32 changes: 26 additions & 6 deletions assembly/dependency-reduced-pom.xml
@@ -65,6 +65,32 @@
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-inline</artifactId>
<version>5.2.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<version>5.2.0</version>
<scope>test</scope>
<exclusions>
<exclusion>
<artifactId>byte-buddy</artifactId>
<groupId>net.bytebuddy</groupId>
</exclusion>
<exclusion>
<artifactId>byte-buddy-agent</artifactId>
<groupId>net.bytebuddy</groupId>
</exclusion>
<exclusion>
<artifactId>objenesis</artifactId>
<groupId>org.objenesis</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
@@ -113,12 +139,6 @@
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-all</artifactId>
<version>1.8.4</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-all</artifactId>
@@ -163,7 +163,7 @@ public void setLabelDataSampleSize(float labelDataSampleSize) throws ZinggClient
*/
@Override
public List<? extends FieldDefinition> getFieldDefinition() {
return fieldDefinition;
return this.fieldDefinition;
}

/**
@@ -15,13 +15,13 @@ public class FieldDefUtil implements Serializable{

public List<? extends FieldDefinition> getFieldDefinitionDontUse(List<? extends FieldDefinition> fieldDefinition) {
return fieldDefinition.stream()
.filter(x->x.matchType.contains(MatchType.DONT_USE))
.filter(x->x.matchType.contains(MatchTypes.DONT_USE))
.collect(Collectors.toList());
}

public List<? extends FieldDefinition> getFieldDefinitionToUse(List<? extends FieldDefinition> fieldDefinition) {
return fieldDefinition.stream()
.filter(x->!x.matchType.contains(MatchType.DONT_USE))
.filter(x->!x.matchType.contains(MatchTypes.DONT_USE))
.collect(Collectors.toList());
}
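
The two filters above split field definitions on whether their match type list contains `DONT_USE`. A minimal sketch of that partition follows; `Field` is a hypothetical stand-in for `FieldDefinition`, and match types are modelled as plain strings rather than `IMatchType` instances, so this illustrates the stream logic only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: Field is a hypothetical stand-in for FieldDefinition,
// and match types are plain strings instead of IMatchType instances.
public class FieldFilterSketch {
    static final String DONT_USE = "DONT_USE";

    static final class Field {
        final String name;
        final List<String> matchTypes;

        Field(String name, String... matchTypes) {
            this.name = name;
            this.matchTypes = Arrays.asList(matchTypes);
        }
    }

    // Names of fields whose match types do NOT include DONT_USE.
    static List<String> namesToUse(List<Field> fields) {
        return fields.stream()
                .filter(f -> !f.matchTypes.contains(DONT_USE))
                .map(f -> f.name)
                .collect(Collectors.toList());
    }

    // Names of fields explicitly marked DONT_USE.
    static List<String> namesDontUse(List<Field> fields) {
        return fields.stream()
                .filter(f -> f.matchTypes.contains(DONT_USE))
                .map(f -> f.name)
                .collect(Collectors.toList());
    }

    public static String demo() {
        List<Field> fields = Arrays.asList(
                new Field("id", DONT_USE),
                new Field("fname", "FUZZY"),
                new Field("email", "EMAIL", "EXACT"));
        return namesToUse(fields) + " / " + namesDontUse(fields);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [fname, email] / [id]
    }
}
```

Because `contains` relies on `equals`, the real filters depend on `MatchTypes.DONT_USE` comparing equal to whatever instance was deserialized from the config.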

@@ -31,16 +31,15 @@
* @author sgoyal
*
*/
public class FieldDefinition implements Named,
Serializable {
public class FieldDefinition implements Named, Serializable {

private static final long serialVersionUID = 1L;

public static final Log LOG = LogFactory.getLog(FieldDefinition.class);

@JsonDeserialize(using = MatchTypeDeserializer.class)
@JsonSerialize(using = MatchTypeSerializer.class)
public List<MatchType> matchType;
public List<? extends IMatchType> matchType;

//@JsonSerialize(using = DataTypeSerializer.class)
public String dataType;
@@ -52,17 +51,21 @@
public FieldDefinition() {
}

public String getFields() { return fields; }
public String getFields() {
return fields;
}

public void setFields(String fields) { this.fields = fields;}
public void setFields(String fields) {
this.fields = fields;
}

/**
* Get the field type of the class
*
* @return the type
*/
public List<MatchType> getMatchType() {
return matchType;
public List<? extends IMatchType> getMatchType() {
return this.matchType;
}

/**
@@ -73,7 +76,7 @@
* the type to set
*/
@JsonDeserialize(using = MatchTypeDeserializer.class)
public void setMatchType(List<MatchType> type) {
public void setMatchType(List<? extends IMatchType> type) {
this.matchType = type; //MatchTypeDeserializer.getMatchTypeFromString(type);
}

@@ -113,16 +116,16 @@
}

public String getFieldName() {
return fieldName;
return this.fieldName;
}

public void setFieldName(String fieldName) {
this.fieldName = fieldName;
}

@JsonIgnore
public boolean isDontUse() {
return (matchType != null && matchType.contains(MatchType.DONT_USE));
return (matchType != null && matchType.contains(MatchTypes.DONT_USE));

Code scanning / PMD warning: Useless parentheses.
}

@Override
@@ -185,34 +188,34 @@
}
}*/

public static class MatchTypeSerializer extends StdSerializer<List<MatchType>> {
public static class MatchTypeSerializer extends StdSerializer<List<IMatchType>> {
public MatchTypeSerializer() {
this(null);
}

public MatchTypeSerializer(Class<List<MatchType>> t) {
public MatchTypeSerializer(Class<List<IMatchType>> t) {
super(t);
}

@Override
public void serialize(List<MatchType> matchType, JsonGenerator jsonGen, SerializerProvider provider)
public void serialize(List<IMatchType> matchType, JsonGenerator jsonGen, SerializerProvider provider)
throws IOException, JsonProcessingException {
try {
jsonGen.writeObject(getStringFromMatchType(matchType));
jsonGen.writeObject(getStringFromMatchType((List<IMatchType>) matchType));
LOG.debug("Serializing custom type");
} catch (ZinggClientException e) {
throw new IOException(e);
}
}

public static String getStringFromMatchType(List<MatchType> matchType) throws ZinggClientException {
public static String getStringFromMatchType(List<IMatchType> matchType) throws ZinggClientException {
return String.join(",", matchType.stream()
.map(p -> p.value())
.map(p -> p.getName())
.collect(Collectors.toList()));
}
}

public static class MatchTypeDeserializer extends StdDeserializer<List<MatchType>> {
public static class MatchTypeDeserializer extends StdDeserializer<List<IMatchType>> {
private static final long serialVersionUID = 1L;

public MatchTypeDeserializer() {
@@ -222,24 +225,24 @@
super(t);
}
@Override
public List<MatchType> deserialize(JsonParser parser, DeserializationContext context)
public List<IMatchType> deserialize(JsonParser parser, DeserializationContext context)
throws IOException, JsonProcessingException {
ObjectMapper mapper = new ObjectMapper();
try{
mapper.enable(DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY);
LOG.debug("Deserializing custom type");
return getMatchTypeFromString(mapper.readValue(parser, String.class));
}
catch(ZinggClientException e) {
catch(Exception | ZinggClientException e) {
throw new IOException(e);
}
}

public static List<MatchType> getMatchTypeFromString(String m) throws ZinggClientException{
List<MatchType> matchTypes = new ArrayList<MatchType>();
public static List<IMatchType> getMatchTypeFromString(String m) throws ZinggClientException, Exception{
List<IMatchType> matchTypes = new ArrayList<IMatchType>();
String[] matchTypeFromConfig = m.split(",");
for (String s: matchTypeFromConfig) {
MatchType mt = MatchType.getMatchType(s);
IMatchType mt = MatchTypes.getByName(s);
matchTypes.add(mt);
}
return matchTypes;
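
The serializer and deserializer above treat the `matchType` config value as a comma-separated list of names: parsing splits on commas and looks each name up, and serializing joins the names back. The round trip can be sketched as follows. `SketchMatchType` and `SketchRegistry` are hypothetical stand-ins for `IMatchType` and `MatchTypes` (whose source is not shown in this diff), and the trim-and-upper-case key normalization is an assumption carried over from the removed enum's `getMatchType`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch only: SketchMatchType and SketchRegistry are hypothetical
// stand-ins for IMatchType and MatchTypes, which are not shown in this diff.
public class MatchTypeRoundTripSketch {

    /** Minimal named match type. */
    static final class SketchMatchType {
        private final String name;
        SketchMatchType(String name) { this.name = name; }
        String getName() { return name; }
    }

    /** Name-keyed registry; keys are trimmed and upper-cased (an assumption). */
    static final class SketchRegistry {
        private static final Map<String, SketchMatchType> TYPES = new LinkedHashMap<>();

        static SketchMatchType register(String name) {
            SketchMatchType t = new SketchMatchType(name);
            TYPES.put(key(name), t);
            return t;
        }

        static SketchMatchType getByName(String name) {
            SketchMatchType t = TYPES.get(key(name));
            if (t == null) {
                throw new IllegalArgumentException("Unsupported Match Type: " + name);
            }
            return t;
        }

        private static String key(String name) {
            return name.trim().toUpperCase(Locale.ROOT);
        }
    }

    static {
        SketchRegistry.register("FUZZY");
        SketchRegistry.register("EXACT");
        SketchRegistry.register("DONT_USE");
    }

    /** Split on commas and look each name up in the registry. */
    static List<SketchMatchType> fromString(String csv) {
        List<SketchMatchType> result = new ArrayList<>();
        for (String s : csv.split(",")) {
            result.add(SketchRegistry.getByName(s));
        }
        return result;
    }

    /** Join the registered names with commas. */
    static String joinNames(List<SketchMatchType> types) {
        return types.stream().map(SketchMatchType::getName).collect(Collectors.joining(","));
    }

    /** Parse, then re-serialize, a comma-separated match type string. */
    public static String roundTrip(String csv) {
        return joinNames(fromString(csv));
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("fuzzy, exact")); // prints FUZZY,EXACT
    }
}
```

Unknown names fail fast in this sketch, mirroring the "Unsupported Match Type" error of the removed enum; whether `MatchTypes.getByName` behaves the same way is not visible here.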
@@ -0,0 +1,8 @@
package zingg.common.client;

public interface IMatchType extends Named {

public String toString();

Code scanning / PMD warning: The method 'toString()' is missing an @Override annotation.
Code scanning / PMD warning: Unnecessary modifier 'public' on method 'toString': the method is declared in an interface type

}

106 changes: 44 additions & 62 deletions common/client/src/main/java/zingg/common/client/MatchType.java
@@ -1,86 +1,68 @@
package zingg.common.client;

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonValue;

/**
* Field types used in defining the types of fields for matching. See the field
* definitions and the user guide for more details
*/

public enum MatchType implements Serializable {
/**
* Short words like first names and organizations with focus on first
* characters matching
*/
FUZZY("FUZZY"),

/**
* Fields needing exact matches
*/
EXACT("EXACT"),
public class MatchType implements IMatchType, Serializable{


/**
* Many times pin code is xxxxx-xxxx and has to be matched with xxxxx.
*/
PINCODE("PINCODE"),
private static final long serialVersionUID = 1L;
public String name;

/**
* an email type which is supposed to look at only the first part of the email and ignore the domain.
*/
EMAIL("EMAIL"),

/**
* Long descriptive text, usually more than a couple of words for example
* product descriptions
*/
TEXT("TEXT"),
public MatchType(){

Code scanning / PMD warning: Document empty constructor
}

/**
* Strings containing numbers which need to be same. Example in addresses,
* we dont want 4th street to match 5th street
* Matching numbers with deviations
*/
NUMERIC("NUMERIC"),
/*eg P301d, P00231*/
NUMERIC_WITH_UNITS("NUMBER_WITH_UNITS"),
NULL_OR_BLANK("NULL_OR_BLANK"),
ONLY_ALPHABETS_EXACT("ONLY_ALPHABETS_EXACT"),
ONLY_ALPHABETS_FUZZY("ONLY_ALPHABETS_FUZZY"),
DONT_USE("DONT_USE");
public MatchType(String n){
this.name = n;
MatchTypes.put(this);
}

private String value;
private static Map<String, MatchType> types;
@Override
public String getName() {
return this.name;
}

MatchType(String type) {
this.value = type;
@Override
public void setName(String name) {
this.name = name;
}

private static void init() {
types = new HashMap<String, MatchType>();
for (MatchType f : MatchType.values()) {
types.put(f.value, f);
}

@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((name == null) ? 0 : name.hashCode());
return result;
}

@JsonCreator
public static MatchType getMatchType(String t) throws ZinggClientException{
if (types == null) {
init();
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;

Code scanning / PMD warning: This statement should have braces
if (obj == null)
return false;

Code scanning / PMD warning: This statement should have braces
if (getClass() != obj.getClass())
return false;

Code scanning / PMD warning: This statement should have braces
MatchType other = (MatchType) obj;
if (name == null) {
if (other.name != null){
return false;
}
}
else if (!name.equalsIgnoreCase(other.name)){
return false;
}
MatchType type = types.get(t.trim().toUpperCase());
if (type == null) throw new ZinggClientException("Unsupported Match Type: " + t);
return type;
return true;
}

@JsonValue
public String value() {
return value;
@Override
public String toString() {
return name;
}

}
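
One point worth checking in the new `MatchType` class: `equals` compares names with `equalsIgnoreCase`, while `hashCode` hashes the raw `name`. Two instances whose names differ only in case would then compare equal but could land in different hash buckets, which can surprise `HashMap` or `HashSet` lookups. A minimal sketch of one way to keep the two methods consistent is below; `CaseInsensitiveName` is a hypothetical class and not part of this PR.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Objects;
import java.util.Set;

// Illustrative sketch only: CaseInsensitiveName is a hypothetical class, not part of this PR.
public class CaseInsensitiveName {
    private final String name;

    public CaseInsensitiveName(String name) {
        this.name = name;
    }

    // Normalize once so that equals and hashCode agree on the same key.
    private String key() {
        return name == null ? null : name.toUpperCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        return Objects.equals(key(), ((CaseInsensitiveName) obj).key());
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(key());
    }

    // Counts distinct names when they are stored in a HashSet.
    public static int distinctCount(String... names) {
        Set<CaseInsensitiveName> set = new HashSet<>();
        for (String n : names) {
            set.add(new CaseInsensitiveName(n));
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount("fuzzy", "FUZZY", "exact")); // prints 2
    }
}
```

Deriving both methods from the same normalized key keeps the `Object` contract intact. If names are always stored upper-cased by the registry, the mismatch in the PR may never surface in practice, but that invariant is not visible in this diff.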