I've pushed modified source (but no tests yet) to https://github.com/beargiles/jsoup/blob/jsoup-2224-wildcards/src/main/java/org/jsoup/safety/Safelist.java. The key additions (no javadoc) are:

// tag name -> (wildcard source -> compiled pattern)
private Map<String, Map<String, Pattern>> attributeWildcards = new LinkedHashMap<>();
public Safelist addAttributeWildcard(String tag, String wildcard) {
    if (!attributeWildcards.containsKey(tag)) {
        attributeWildcards.put(tag, new LinkedHashMap<>());
    }
    // the wildcard is compiled as a regular expression, e.g., "data-.+"
    attributeWildcards.get(tag).put(wildcard, Pattern.compile("^" + wildcard + "$",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE));
    return this;
}
public Safelist removeAttributeWildcard(String tag, String wildcard) {
    if (attributeWildcards.containsKey(tag)) {
        // remove from the per-tag map (not the outer map); no-op if absent
        attributeWildcards.get(tag).remove(wildcard);
        // remove any empty entries
        if (attributeWildcards.get(tag).isEmpty()) {
            attributeWildcards.remove(tag);
        }
    }
    return this;
}
public Safelist addGlobalAttributeWildcard(String wildcard) {
    return addAttributeWildcard(All, wildcard);
}
public Safelist removeGlobalAttributeWildcard(String wildcard) {
    return removeAttributeWildcard(All, wildcard);
}
public boolean isSafeAttribute(String tagName, Element el, Attribute attr) {
    TagName tag = TagName.valueOf(tagName);
    AttributeKey key = AttributeKey.valueOf(attr.getKey());
    // skipped...
    // might be a wildcard, e.g., "data-.+"?
    // (the wildcard map is keyed by the String tag name, not by TagName)
    if (attributeWildcards.containsKey(tagName)) {
        for (Map.Entry<String, Pattern> entry : attributeWildcards.get(tagName).entrySet()) {
            if (entry.getValue().matcher(attr.getKey()).matches()) {
                return true;
            }
        }
    }
    // might be a global wildcard, e.g., "data-.+"?
    if (attributeWildcards.containsKey(All)) {
        for (Map.Entry<String, Pattern> entry : attributeWildcards.get(All).entrySet()) {
            if (entry.getValue().matcher(attr.getKey()).matches()) {
                return true;
            }
        }
    }
    // no attributes defined for tag, try :all tag
    return !tagName.equals(All) && isSafeAttribute(All, el, attr);
}
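To try the matching logic without building jsoup, here is a minimal standalone sketch of the same idea. `WildcardSketch`, its method names, and the `ALL` constant are hypothetical stand-ins (the `":all"` key is assumed to mirror the key `Safelist` uses for global attributes); none of this is jsoup API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for the wildcard bookkeeping; not part of jsoup.
class WildcardSketch {
    // Assumed to mirror the key Safelist uses for "applies to every tag".
    static final String ALL = ":all";

    // tag name -> (regex source -> compiled pattern)
    private final Map<String, Map<String, Pattern>> wildcards = new LinkedHashMap<>();

    WildcardSketch add(String tag, String regex) {
        wildcards.computeIfAbsent(tag, k -> new LinkedHashMap<>())
                 .put(regex, Pattern.compile(regex,
                         Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE));
        return this;
    }

    WildcardSketch remove(String tag, String regex) {
        Map<String, Pattern> forTag = wildcards.get(tag);
        if (forTag != null) {
            forTag.remove(regex);
            // drop the per-tag map once it is empty
            if (forTag.isEmpty()) {
                wildcards.remove(tag);
            }
        }
        return this;
    }

    // True if the attribute name matches a wildcard for this tag or a global one.
    boolean isAllowed(String tag, String attrName) {
        return matchesAny(wildcards.get(tag), attrName)
            || matchesAny(wildcards.get(ALL), attrName);
    }

    private static boolean matchesAny(Map<String, Pattern> patterns, String attrName) {
        if (patterns == null) {
            return false;
        }
        for (Pattern p : patterns.values()) {
            // Matcher.matches() requires the whole input to match
            if (p.matcher(attrName).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        WildcardSketch s = new WildcardSketch()
                .add("div", "data-.+")
                .add(ALL, "aria-.+");
        System.out.println(s.isAllowed("div", "data-id"));     // true
        System.out.println(s.isAllowed("span", "data-id"));    // false: wildcard is div-only
        System.out.println(s.isAllowed("span", "ARIA-label")); // true: global, case-insensitive
        System.out.println(s.isAllowed("div", "data-"));       // false: ".+" needs one character
    }
}
```

Because `Matcher.matches()` already requires the entire attribute name to match, the explicit `^` and `$` anchors are not strictly needed; the sketch leaves them out.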
-
I have tests now... For completeness I should mention that another possibility is to quietly add this functionality to the existing method. This approach can use the same implementation as I'm submitting - we would just hide it in the existing method. One benefit to this is that it could be easily extended to support tag wildcards... but I think tag wildcards should be more functional. E.g., we could have
-
Applications have defined custom attributes for a long time, e.g., aria-*, and with HTML5 (iirc) there's now an official standard to recognize data-* as a valid attribute. For example, aria-xyz should become data-aria-xyz.

It's not practical to add all possible attributes - it's an undefined list - and we don't want to specify :all since that will include everything. The best solution appears to be adding an optional list of java.util.regex.Pattern objects to the Safelist and adding a check for matches.

I propose adding both per-tag and global attribute wildcards.
Code to follow.
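One detail worth pinning down: data-* is glob-style notation, while java.util.regex.Pattern expects regular-expression syntax (data-.+). If the API were to accept globs, one possible translation is sketched below. `GlobToPattern` is a hypothetical helper, and the choice that `*` must match at least one character is an assumption, not something settled in this proposal.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of jsoup.
class GlobToPattern {
    // Turns a glob such as "data-*" into a case-insensitive Pattern.
    // Everything except '*' is quoted so it is matched literally.
    static Pattern compile(String glob) {
        StringBuilder regex = new StringBuilder();
        int start = 0;
        for (int i = 0; i < glob.length(); i++) {
            if (glob.charAt(i) == '*') {
                if (i > start) {
                    regex.append(Pattern.quote(glob.substring(start, i)));
                }
                regex.append(".+"); // assumption: '*' needs at least one character
                start = i + 1;
            }
        }
        if (start < glob.length()) {
            regex.append(Pattern.quote(glob.substring(start)));
        }
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    public static void main(String[] args) {
        Pattern p = compile("data-*");
        System.out.println(p.matcher("data-user-id").matches()); // true
        System.out.println(p.matcher("data-").matches());        // false
        System.out.println(p.matcher("onclick").matches());      // false
    }
}
```

Quoting the literal parts means characters such as `.` in an attribute name are not interpreted as regex metacharacters. Accepting raw regular expressions instead is more flexible but puts the escaping burden on the caller.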