Add postconditions for variables and contexts: @post-condition #73

Open
rjelliffe opened this issue May 9, 2024 · 14 comments
Labels
deferred Deferred until a future revision

Comments

@rjelliffe
Member

rjelliffe commented May 9, 2024

(Added: In my Schematron users meeting presentation [Prague 2024] I identified this proposal as one of the most important, IMHO.)

It can be hard, especially for newcomers or infrequent users, to have confidence that a complex XPath is working the way it should. Indeed, as a matter of good software engineering, the more important ("risky") some code is, the more you want some independent (i.e., redundant) check of it. This has of course been well known since Bertrand Meyer, and is a rationale for Schematron itself.

So would Schematron be better if it allowed internal assertions on its own XPaths? I think so, and I think it can be trivially implemented (over XSLT) without defeating optimized lazy evaluation. It would complement e.g. sch:let/@as, which allows a level of typing.

In concrete terms, the proposal is that sch:let and sch:rule allow another attribute, @post-condition, which takes an XPath expression that evaluates to a boolean. The context for this XPath is the variable value or the rule context.

The evaluation of the @post-condition would not (necessarily) go into the SVRL: the document's validity result is unchanged whether or not these post-conditions are enabled. It is intended for developer information, confidence and debugging, not for the end-user of the schema. It would generate implementation-dependent information, e.g. on Standard Error output (e.g. xsl:message), to a log file, or for an IDE.

Here are two examples:

<sch:rule context="*[@id]"   post-condition="string-length(normalize-space(@id)) ne 0"  > ...

In this example, the rule selects all elements that have an @id attribute. However, the developer expects that these all contain non-empty values: the post-condition makes this explicit. We don't want to use sch:assert for this, because it is a programmer-world thing, not a user-world thing.
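
A minimal sketch of how a Schematron-to-XSLT compiler might translate the rule-level @post-condition above. The mode name, priority and message text are assumptions for illustration and are not taken from any existing implementation; the fragment also assumes the xsl prefix is bound to the XSLT namespace and an XSLT 2.0 (or later) processor.

```xml
<!-- Hypothetical compiled form of the rule; mode name and priority are illustrative. -->
<xsl:template match="*[@id]" mode="schematron-pattern-1" priority="1000">
  <!-- Developer-facing check, evaluated with the rule context as the context item.
       It only emits a message and never contributes to the SVRL output. -->
  <xsl:if test="not(string-length(normalize-space(@id)) ne 0)">
    <xsl:message>Post-condition failed for rule context *[@id]: @id is empty</xsl:message>
  </xsl:if>
  <!-- The rule's sch:assert and sch:report tests would be compiled here as usual. -->
</xsl:template>
```

Because the check writes only to xsl:message, switching it off (for example with a compile-time parameter) would leave the validation result untouched.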

<sch:let name="post-code-list" value="document('post-codes.xml')"  post-condition="/post-codes[@version='2024']"  />

In this, the document is read in. (And any exceptions are swallowed, or logged.) Then the condition is tested. If there was no document or the wrong one, the post-condition will fail and the failure logged. The implementation can warn the user there has been this problem (e.g. in this pattern) and not produce a result of "valid".

This is a partial fix for the problem that XPath functions can generate exceptions, but Schematron has no mechanism to cope. For example, if we are trying to parse a number and the value is not a number, we put the code into a variable first. The parse fails and generates an exception, which is either swallowed or causes a failure. Then we check the value using @post-condition, so that we are not beholden to the way the engine implements exception handling.

<sch:let name="my-safe-number"  value="number(/*/@some-code)"  post-condition="number(.)" />

Another example: for helping with complex chains of variables:

<sch:let name="var1" value="//thing" />
<sch:let name="var2" value="$var1/child::*[1]" />
<sch:let name="var3" value="$var2/child::*[1]"  post-condition="count(.) = count($var1)"   />

which might be implemented as:

<xsl:variable name="var1" select="//thing" />
<xsl:variable name="var2" select="$var1/child::*[1]" />
<xsl:variable name="var3-23423423420" select="$var2/child::*[1]" />
<xsl:variable name="var3" as="node()*">
    <!-- Emit the message only when the post-condition does NOT hold -->
    <xsl:if test="not(count($var3-23423423420) = count($var1))">
        <xsl:message>Post-condition failed: .... </xsl:message>
    </xsl:if>
    <!-- Return the original nodes rather than copies of them -->
    <xsl:sequence select="$var3-23423423420" />
</xsl:variable>

(Not debugged. You get the idea. The double handling of var3 is to maintain lazy-evaluation.)

In this case, the developer believes it to be the case that every "thing" has a grandchild element, which simplifies the cases they need to make assertions for. But the developer wants to be able to check this during testing, and not make it something that invades the user's diagnostics. (They could do this using a dedicated phase too, if they wanted full diagnostics, but they might find that bad separation of concerns in their specific scenario.)

Regards
Rick

@rjelliffe
Member Author

rjelliffe commented May 12, 2024

Alternative names for @post-condition might be @confirm, @expect or @assume.

@AndrewSales
Collaborator

I see this proposal as relating to two slightly different, but related, things: unit testing and exception handling.

There are mature testing frameworks, such as XSpec, where this kind of thing can already be accommodated. It's good practice and probably better for the programmer to amass a set of test cases that cause exceptions to be raised.

If you are worried about the exception handling provided by the implementation you are using, you can write your own function and handle exceptions (differently - perhaps more gracefully) there. XSpec can also test if your functions are working correctly, of course.
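
A minimal sketch of the "write your own function" approach, assuming an XSLT 3.0 processor (for xsl:try) and a Schematron implementation that passes embedded xsl:function elements through to the compiled stylesheet. The ex: namespace URI and the function name are hypothetical.

```xml
<sch:ns prefix="ex" uri="http://example.com/ns/schematron-helpers"/>

<!-- Hypothetical helper: returns the value as xs:decimal, or the empty sequence if it cannot be parsed. -->
<xsl:function name="ex:safe-decimal" as="xs:decimal?"
              xmlns:ex="http://example.com/ns/schematron-helpers"
              xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:param name="raw" as="xs:string?"/>
  <xsl:try select="xs:decimal(normalize-space($raw))">
    <!-- Any dynamic error (e.g. an invalid lexical form) yields the empty sequence. -->
    <xsl:catch errors="*" select="()"/>
  </xsl:try>
</xsl:function>

<!-- The variable is then empty, rather than an error, when @some-code is not a decimal. -->
<sch:let name="my-safe-number" value="ex:safe-decimal(/*/@some-code)"/>
```

An ordinary sch:assert such as test="exists($my-safe-number)" can then report the bad value in user terms, and the function itself can be covered by XSpec tests.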

@rjelliffe
Member Author

rjelliffe commented May 14, 2024 via email

@AndrewSales
Collaborator

Then we disagree fundamentally about the purpose of unit testing.
The idea of disabling the tests once the system is mature alarms me, since in my day job I am dealing with unpredictable, human-authored input that can vary greatly. We write our last test when we have fixed the last bug.

I have worked on several systems where we had to test against all previous inputs (tens of thousands of documents) and even then would find uncoped-with scenarios in the next incoming set of documents.

Me too, and I continue to. It is the way of things, which this proposal can't change.

I don't think what you propose is a bad idea, just that it doesn't solve the problem. I think it is a problem in any case that can only be mitigated.
You open with the challenges of a complex XPath, but that post-condition XPath is only going to get more complex as it needs to accommodate more scenarios. It would provide a sense of security no more true than corresponding unit tests would.
I would, as I say, address this with additional test cases to describe unforeseen scenarios as they arise, and amend the schema to reflect them as needed.

This is a partial fix for the problem that XPath functions can generate exceptions, but Schematron has no mechanism to cope.

Well, there is the if...then...else... error(...) approach, but the standard discourages the use of error(). Perhaps we need some runtime linkage that does allow user-defined exceptions to be handled by the implementation and consistently reported as SVRL...
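
For reference, a minimal sketch of the if...then...else... error(...) approach mentioned above, assuming an XPath 2.0 (or later) query binding. The error namespace URI, the error name and the variable name are hypothetical.

```xml
<sch:ns prefix="xs" uri="http://www.w3.org/2001/XMLSchema"/>

<!-- Raises a user-defined dynamic error when @some-code is absent or not a decimal. -->
<sch:let name="checked-code"
         value="if (/*/@some-code castable as xs:decimal)
                then xs:decimal(/*/@some-code)
                else error(QName('http://example.com/ns/errors', 'ex:not-a-number'),
                           'some-code is not a decimal')"/>
```

How the raised error surfaces (halted validation, a processor message, or something else) is left to the implementation, which is the gap the comment above points at.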

@rjelliffe
Member Author

rjelliffe commented May 14, 2024 via email

@AndrewSales
Collaborator

AndrewSales commented May 15, 2024

A unit test says "given some specific input X expect specific output Y". An assertion says "for every possible A, some invariant B should hold." Not the same things.

I'm well aware of the difference.

As I said above, I don't think this kind of assertion addresses the issue of unpredictable input.

Assertions in other languages can typically be enabled or disabled at execution time, and if enabled, will often halt processing. If we do have assertions, I think implementations ought to be configurable in this respect.

A common case I've come across is a runtime error where an atomic value was expected by a function, but a sequence was passed instead. This can occur also e.g. in message construction, with <value-of/>. Would we want assertions in such places too?

I think it would be good to refine the expected behaviour and prospective reporting of errors, if this is to be standardised.

I'd be interested in input from the wider community about this as a feature. XML Prague and the Schematron Users Meetup are around the corner, which is one suitable forum.

@rjelliffe
Member Author

rjelliffe commented May 15, 2024 via email

@rjelliffe
Member Author

rjelliffe commented May 15, 2024 via email

@AndrewSales
Collaborator

AndrewSales commented May 16, 2024

But I still don't understand Andrew's point, sorry, unless he is saying a developer using this may not cover all cases, or be a matter of discipline: that's life, isn't it?

I mean that the perceived utility of an assert will run out quickly for all but the most simple cases and most predictable input.

A real example from just the other day. I was testing out a new rule which worked in isolation but threw a divide-by-zero error when incorporated into the target schema. The cause was my test cases were toys that omitted otherwise required structures. The IDE was able to take me to the point of failure in the XSLT for the compiled schema.

Would an assert have helped me? Possibly, but I found the cause from my IDE anyway. Would I have wanted to put one everywhere in a sizeable schema where division was used? Probably not. If real-world input had caused this, I'd've added an extra condition to the relevant XPath and moved on. Here, I adjusted my test cases and moved on.

Static type checking of signatures within XPaths, like OxygenXml does, is a different issue, I think. That is where @as connects better.

Not static: dynamic. Using string() or concat() for example, where an argument is a sequence because there is unexpectedly more than one of something that needs reporting in the message generated.
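
A small sketch of that dynamic case, assuming an XPath 2.0 (or later) query binding. The element and attribute names (order, item, @sku) are hypothetical.

```xml
<sch:rule context="order">
  <sch:report test="count(item/@sku) gt 1">
    <!-- concat('SKU: ', item/@sku) would raise a type error here, because an argument
         is a sequence of more than one item; string-join() accepts a sequence. -->
    More than one SKU found: <sch:value-of select="string-join(item/@sku, ', ')"/>
  </sch:report>
</sch:rule>
```

The failure only appears at run time, and only for documents where the sequence really does contain more than one item, which is why toy test cases tend to miss it.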

To an extent, yes.

It would be critically important to be clear how this feature would affect validity, if at all. I'm not asking for all of this information here and now, I am just noting the need.

@tgraham-antenna
Member

A unit test says "given some specific input X expect specific output Y". An assertion says "for every possible A, some invariant B should hold." Not the same things.

I'm well aware of the difference.

As I said above, I don't think this kind of assertion addresses the issue of unpredictable input.

I suggest that using 'assertion' as an unqualified term here risks confusion with sch:assert (at least for me).

Assertions in other languages can typically be enabled or disabled at execution time, and if enabled, will often halt processing. If we do have assertions, I think implementations ought to be configurable in this respect.

IME, the [programming language] assertions that can be disabled at execution time tend to be put just before the ordinary code that tries to do something reasonable with the same invalid value (e.g., return early with a null value) so that the developer gets the rude shock where the problem occurs and the user gets let down gently.

(I'm not sure how well the reasonable return value idiom translates to Schematron, where there aren't visible return values as such, just the presence or absence of messages from sch:assert and sch:report.)

It seems to me that @rjelliffe wants the Schematron to (also) be able to deliver the rude shocks, while @AndrewSales would (mostly) leave it to the unit tests. Plus, I think there's general agreement that there will always be one more bug when some user somewhere tries something unexpected. (Antenna House Formatter once had a bug with Latin superscripts in Bulgarian text. Who could have predicted that?)

It might be that myriad unit tests could all fail to exercise something that could be caught by checking a value that is calculated within the sch:rule. (At this point I don't know why you would do anything other than <sch:assert role="debug">, or similar, for it.)

It might also be that a check within the sch:rule never fails anyway, maybe because the checked condition also fails earlier structural validation so the Schematron never sees those documents or because there's an error in the XPaths used in the sch:rule.

So there might be a place for both (though still not seeing the need for a lot of extra machinery for programming language-style debugging assertions).
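
A minimal sketch of the sch:assert role="debug" alternative, using phases to keep developer checks out of normal runs. The phase ids, pattern ids and the example business rule are assumptions for illustration.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Normal runs activate only the user-facing pattern. -->
  <sch:phase id="production">
    <sch:active pattern="business-rules"/>
  </sch:phase>
  <!-- Developer runs additionally activate the internal checks. -->
  <sch:phase id="debug">
    <sch:active pattern="business-rules"/>
    <sch:active pattern="developer-checks"/>
  </sch:phase>

  <sch:pattern id="business-rules">
    <sch:rule context="*[@id]">
      <sch:assert test="matches(@id, '^[A-Za-z]')">An id must start with a letter.</sch:assert>
    </sch:rule>
  </sch:pattern>

  <sch:pattern id="developer-checks">
    <sch:rule context="*[@id]">
      <sch:assert role="debug" test="string-length(normalize-space(@id)) ne 0">Developer check: @id is unexpectedly empty.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Unlike the proposed @post-condition, these checks do appear in the SVRL when the debug phase is selected, so a consumer would need to filter on the role value if they should not count towards validity.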

A common case I've come across is a runtime error where an atomic value was expected by a function, but a sequence was passed instead. This can occur also e.g. in message construction, with <value-of/>. Would we want assertions in such places too?

I think it would be good to refine the expected behaviour and prospective reporting of errors, if this is to be standardised.

True, although the other approach is to let implementers try things and then standardise what succeeds.

I'd be interested in input from the wider community about this as a feature. XML Prague and the Schematron Users Meetup are around the corner, which is one suitable forum.

Indeed.

@rjelliffe
Member Author

rjelliffe commented May 17, 2024 via email

@rjelliffe
Member Author

rjelliffe commented May 17, 2024 via email

@rjelliffe
Member Author

rjelliffe commented May 17, 2024 via email

@AndrewSales AndrewSales added the 2025 A change made in preparing the 2025 edition label Jul 2, 2024
@AndrewSales AndrewSales removed the 2025 A change made in preparing the 2025 edition label Aug 19, 2024
@AndrewSales
Collaborator

Removing the 2025 label.
@rjelliffe, if you could see your way to replacing all the instances here of ***@***.*** with what you intended before the next edition is in preparation, it would be appreciated.

@AndrewSales AndrewSales added the deferred Deferred until a future revision label Aug 19, 2024