Property-Based Testing –– Call for Opinions, Comments

After checking out the tests in our code repository, I noticed an exclusive reliance on UnitTests. This led me to consider the potential benefits of incorporating property-based testing into the testing framework.

I’d like your opinion!

Property-based testing offers a different paradigm compared to traditional unit testing, which typically focuses on verifying specific input-output scenarios. It entails specifying general properties that our code should always satisfy, regardless of the input. Tools like QuickCheck can then generate a broad range of arbitrary inputs to thoroughly validate these properties, ensuring our code adheres to the defined principles consistently.
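To make the idea concrete, here is a minimal sketch in Python using only the standard library. It is not QuickCheck and not an existing sclang API; `gen_int_list`, `check_property`, and the two example properties are hypothetical names chosen for illustration. A property is just a function returning true or false, and the runner feeds it many generated inputs until one falsifies it.

```python
import random

def gen_int_list(rng, max_len=20):
    """Hypothetical generator: a random-length list of random integers."""
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, max_len))]

def check_property(prop, gen, runs=200, seed=0):
    """Run prop on `runs` generated inputs.

    Returns the first counterexample found, or None if every input passed.
    """
    rng = random.Random(seed)
    for _ in range(runs):
        value = gen(rng)
        if not prop(value):
            return value
    return None

def sort_is_idempotent(xs):
    """A property that should hold for every list."""
    return sorted(sorted(xs)) == sorted(xs)

def reverse_is_identity(xs):
    """A deliberately false property, to show a counterexample being found."""
    return list(reversed(xs)) == xs

print(check_property(sort_is_idempotent, gen_int_list))
print(check_property(reverse_is_identity, gen_int_list))
```

The first call finds nothing to report, while the second returns some list that is not its own reverse. The fixed default seed is an assumption of this sketch that keeps runs repeatable.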

In some quarks, I used a form of property-based testing with a very simple approach, before I discovered QuickCheck. That experience suggests that UnitTests could also serve as a framework for property-based testing, especially if we first develop custom generators and shrinking strategies for our data types.

I’m curious to hear from the community. Is there a strong preference for UnitTests, or has anyone experimented with property-based testing before? Is there some code done already?

What do you think about exploring this direction? I will leave some materials for evaluation. Please leave your comments and opinions below.

Consider reviewing John Hughes’s paper and these informative video presentations:

  1. John Hughes paper
  2. Code Checking Automation
  3. Introduction to Property-Based Testing - YouTube
  4. Advanced Property-Based Testing Techniques - YouTube
  5. Property-Based Testing in Practice - YouTube

Key takeaways from the QuickCheck research include:

  • The ‘Arbitrary’ class plays a critical role in selecting test-data generators based on type.
  • The ‘Coarbitrary’ class is pivotal for generating random functions.
  • Designing generators for structured inputs, such as substitution maps, requires a deliberate strategy to ensure the generated values are not only legitimate but also capable of uncovering significant cases.
  • The ease of use and approachability of QuickCheck significantly contributes to its appeal and potential for broad application.
  • The concept of shrinking is vital for refining test data values, and enhancing the effectiveness of property-based testing.
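The last point, shrinking, can be sketched in a few lines. This is a simplified greedy strategy written in Python for illustration, not QuickCheck's actual algorithm; `shrink_candidates`, `shrink`, and `all_below_100` are hypothetical names. Given an input that falsifies a property, it keeps trying simpler variants (one element dropped, or one integer halved toward zero) and accepts any variant that still fails.

```python
def shrink_candidates(xs):
    """Yield simpler variants of a list of ints."""
    # First try dropping each element in turn.
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    # Then try halving each non-zero element toward zero.
    for i, x in enumerate(xs):
        if x != 0:
            smaller = x // 2 if x > 0 else -((-x) // 2)
            yield xs[:i] + [smaller] + xs[i + 1:]

def shrink(failing, prop):
    """Greedily replace a failing input with a simpler one that still fails."""
    current = failing
    improved = True
    while improved:
        improved = False
        for candidate in shrink_candidates(current):
            if not prop(candidate):
                current = candidate
                improved = True
                break
    return current

def all_below_100(xs):
    """A deliberately false property: no element is 100 or more."""
    return all(x < 100 for x in xs)

print(shrink([3, -7, 512, 40, 250], all_below_100))
```

The result is a one-element list whose value is still at least 100. It is minimal only with respect to these particular shrink steps, which is why real libraries invest in better shrinking strategies per data type.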

Around 5 years ago there was some pushback against any randomness in the SC test suite. That pushback was very forceful but limited to 1 or 2 voices, iirc. Those people are no longer contributors to the project. So I think it’s fair game again!

Personally, I think it’s a great idea. QuickCheck-style testing was the future 10 years ago and now it’s just the present. (Wait, 10 years ago? 20? Nearly 25!) We can have much more robust tests and at the same time declare our understanding of code’s behavior in a machine-checkable way.

Thanks! Those Lambda folks are all on top of their game.

Hey, could you jog my memory on what the beef with property-based testing was? Curious to get the specifics.

The obvious one is that testing is not deterministic anymore which makes a testing-suite unreliable, which is exactly what you don’t want from a testing suite running in CI.

Property based testing seems to favor writing less testing code in trade for writing less explicit tests. Although writing tests is often not the most fun part about writing code, it is one of the most sustainable ways of writing code (and forces you to write less code/functionality because it needs to be tested). For me, the motivation of writing less test code does not outweigh the benefit of having isolated and explicit tests in the long term, which is the current focus of the lib.

Additionally, instead of lightweight primitives we just throw objects around everywhere which additionally impose lots of different states (this could be seen as a flaw in construction, most likely because the code was not written with having in mind that it needs to be tested which leads to different designs).
The construction of such complex objects in an automatic and mutating manner is non-trivial and creates more complexity and implicitness within the testing suite, violating the separation of concerns approach of a unit test.

Additionally, within sclang we don’t know which branches were covered by tests and which were not - without this tracing property testing becomes even more opaque because we don’t know which branches have run and which not. This can create a false sense of security.

I use property testing a lot when having to store primitive data in databases where it is very helpful to check an arbitrary kind of input, but in sclang this seems less of a problem and the complexity arises from the different states of objects that need to be managed and passed around, which property testing would not solve.

edit: I was not around for objecting to this idea some years ago, this is only derived from my personal experience with property testing in other projects.

You’ve touched on some interesting points.

First off, there’s no need to pit unit tests against property tests; they’re not mutually exclusive and can work together seamlessly.

The essence of property testing isn’t about coding less. If anything, it demands more effort. This is because you need to meticulously define the Arbitrary Data Type instance to match its corresponding type and properties accurately.

The key perspective here is not to see it as merely “generating data randomly.” It’s about clearly defining your proposition (your type and your properties), where the function and tests act as your proof, capable of handling any given value.

If it is correct, it will not give a different result each time you run the CI. Quite the contrary, it will be consistent.

On the other hand, unit tests tend to require less code, but they come with a catch: there’s no assurance that the scenarios they cover are the most critical ones.

Firstly, the classes and types must align with their definitions. Without verifying this foundation, debugging the side effects generated by the user becomes significantly more challenging. (You accidentally just made a case against side effects? :) )

One thing that property tests can help with, which UnitTests can’t, is to help you shrink your data and identify exactly the buggy cases, in whatever environment. Don’t think just about the CI, it can be a tool to bisect difficult-to-catch errors once you have the proper definitions.

The obvious one is that testing is not deterministic anymore which makes a testing-suite unreliable, which is exactly what you don’t want from a testing suite running in CI.

The method I generally use is that any failure case found with randomness becomes a unit test. Therefore property tests are exploring new parts of the space, and unit tests check any previously-known problems.

Property based testing seems to favor writing less testing code in trade for writing less explicit tests.

I don’t think this is usually true. Typically a test suite will have many unit tests and then a few property tests generalizing the unit tests. For example, writing a parser you might have several unit tests of tricky inputs and their correct outputs, and then a test that for any given value x, (parse (prettyPrint x)) == x.
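A rough sketch of that mix, in Python with the standard library only: the tiny S-expression format and the names `pretty_print`, `parse`, `gen_value`, and `roundtrip_counterexample` are made up for this illustration. A couple of explicit unit tests pin down tricky inputs, and one property generalizes them.

```python
import random

def pretty_print(value):
    """Render an int or a nested list of ints as an S-expression string."""
    if isinstance(value, int):
        return str(value)
    return "(" + " ".join(pretty_print(v) for v in value) + ")"

def parse(text):
    """Parse the format produced by pretty_print back into ints and lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_expr():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(parse_expr())
            pos += 1  # consume the closing ")"
            return items
        return int(token)

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return result

def gen_value(rng, depth=3):
    """Generate a random int or a nested list of ints, at most `depth` deep."""
    if depth == 0 or rng.random() < 0.4:
        return rng.randint(-99, 99)
    return [gen_value(rng, depth - 1) for _ in range(rng.randint(0, 4))]

def roundtrip_counterexample(runs=500, seed=1):
    """Property: parse(pretty_print(x)) == x. Returns a failing x or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        x = gen_value(rng)
        if parse(pretty_print(x)) != x:
            return x
    return None

# Explicit unit tests for known tricky inputs...
assert parse("()") == []
assert parse("(1 (2 -3) ())") == [1, [2, -3], []]
# ...and one property covering the general case.
assert roundtrip_counterexample() is None
```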

One might think that the test results are “unpredictable” and therefore unusable in CI, but that’s not the case. That view assumes these tests are not well organized. In fact, the goal is to check carefully what needs to be checked, generating only values appropriate for the specific property. This helps find issues that might not be obvious at first. When a failure is found, the failing input is then reduced, or “shrunk,” which leads to a better and deeper understanding of the problem. Shrinking is also a form of “debugging.”

I’d agree that when testing parsing property testing has benefits, but how much parsing or data sanitation is done in sclang?

We already have an unstable CI (I think b/c of timing issues on the CI server?) and I think adding more unknown variables through mutating tests will not result in more confident builds, which just leads to a halt in development.
Even if we keep the random seed fixed it could lead to problems as the internal test state can change through change in code invocation.

Additionally, a good property testing library is a big endeavor, it adds more complexity to maintain as it would become a core component of the language as tests are relying on it. Python’s property testing lib hypothesis has 14k commits.

Maybe an alternative approach could be to create a Quark which can generate random objects for local testing, which then can be used to explore edge cases which then can be turned into unit tests.

I wonder if the primary focus of your concerns might be the Continuous Integration (CI) process here. It’s possible that the test isn’t the root cause of the issues we’re observing there. We don’t need to discuss this here.

It’s worth noting that other Python libraries often consist of approximately 300 lines. QuickCheck, in particular, was designed to be a lightweight tool, as highlighted in the article mentioned in the first post. Considering its size, a rewrite of this Python implementation could be a simple option. Alternatively, it’s also possible to use QuickCheck itself (from Haskell) for API testing, although setting this up might be slightly more complex and not practical for us at all.

Our conversation (at least right now) is about incorporating property-based testing and ensuring the long-term quality of our project. Let’s focus on the topic, please.

I think the CI process is one of the most crucial parts of the stability of any bigger project. If 3 years down the road some test starts to break because of RNG or testing-library changes, nobody will bother to fix it and will instead delete it in favor of a passing CI, which in the end creates more holes in the test coverage. A deterministic unit test will not break because of RNG changes, nor does it need a library to run. Such tests are isolated and atomic, which are really good properties for long-term maintenance. If for some reason the property testing library breaks, all tests which rely on this library will also break, making it one of the most crucial parts of the whole library code.

In your post I only see QuickCheck, so I am not sure about Quicktest, but QuickCheck itself is a massive Haskell library (4k loc?) which is not trivial to port to sclang. I think adding Haskell as a dev dependency will also not be a viable solution, as even moving an existing Python script to an external Python dependency is currently blocked in a PR because of dependency concerns - and Haskell is much harder to set up, and anybody who wants to contribute a small change to the lib would need to set it up.

I think the most interesting question is where the sclang lib would benefit from a property test that currently cannot be covered by unit tests. I don’t see such cases at the moment, but I would be really curious to see them.

I use QuickCheck daily using Haskell. The ideas in the article are very clear, it doesn’t need to be massive. QuickCheck can be used in this way I described, but I just mentioned I didn’t consider it practical for us. Please, let’s focus on the topic, not everything else. Mixing different topics will just bring up an old problem that will block other discussions.

This discussion is particularly relevant for some test cases in SuperCollider, sometimes in situations where, at first, we didn’t even consider that testing was possible:

The best case scenario when testing a piece of software is when we have a reference implementation to compare against. Often however such a reference is not available, begging the question how to test a function if we cannot verify what that function computes exactly. In this episode we will consider how to define properties to verify the implementation of Dijkstra’s shortest path algorithm we discussed in Episode 20; you may wish to watch that episode first, but it’s not required: we will mostly treat the algorithm as a black box for the sake of testing it.

I did a simple proof-of-concept.

Also implemented property-based tests with signals and a basic implementation for any class. I already implemented property combinators (different ways to compose a property). I can show more if there is some interest at all.

So, the basic concepts are not as complex. Many things can be improved and implemented (strategies for shrinking for example), easier ways to derive properties from types and objects, other ways to create property definitions, and testing of stateful systems (testing a stateful system compared to a model, very useful for some things in sc).
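For the stateful case mentioned last, the general shape is roughly the following. This is a sketch in Python rather than the proof-of-concept sclang code, and `RingBuffer` and `run_against_model` are hypothetical names: a random sequence of commands is applied both to the system under test and to an obviously correct model, and any disagreement is reported.

```python
import random
from collections import deque

class RingBuffer:
    """System under test: a fixed-capacity FIFO queue (illustrative only)."""

    def __init__(self, capacity):
        self.data = [None] * capacity
        self.capacity = capacity
        self.start = 0
        self.size = 0

    def push(self, x):
        """Append x; return False if the buffer is full."""
        if self.size == self.capacity:
            return False
        self.data[(self.start + self.size) % self.capacity] = x
        self.size += 1
        return True

    def pop(self):
        """Remove and return the oldest element, or None if empty."""
        if self.size == 0:
            return None
        x = self.data[self.start]
        self.start = (self.start + 1) % self.capacity
        self.size -= 1
        return x

def run_against_model(seed, steps=200, capacity=4):
    """Apply random commands to the buffer and to a deque model.

    Returns (command, step) at the first disagreement, or None.
    """
    rng = random.Random(seed)
    sut = RingBuffer(capacity)
    model = deque()
    for step in range(steps):
        if rng.random() < 0.6:
            x = rng.randint(0, 99)
            expected = len(model) < capacity
            if expected:
                model.append(x)
            if sut.push(x) != expected:
                return ("push", step)
        else:
            expected = model.popleft() if model else None
            if sut.pop() != expected:
                return ("pop", step)
    return None

print(all(run_against_model(seed) is None for seed in range(20)))
```

The model here is a plain deque with a length check, so it is easy to trust. The value of the approach depends on keeping the model much simpler than the thing being tested.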

The classic list reversibility property (verbose with 10k tests):

// Define a property for testing list reversibility
~reversibilityProperty = Property.new({ |list| list == list.reverse.reverse }, "Reversibility of list");

// Define a generator for lists of integers
~listGenerator = PListGen.new(PRandInt.new(-2147483646, 2147483645, inf), PRandInt.new(1, 1000, inf), inf);

// Create and run the property-based test
~testRunner = PropertyBasedTest.new(~reversibilityProperty, ~listGenerator, 10000, verbose: true);

~testRunner.run

Just wondering: should the random stuff use a seed (and the seed be logged) so that a sudden failure can reliably be reproduced?

No question about it. It can easily be done.
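One possible shape for this, sketched in Python rather than sclang (`run_property` and the example generator and property are hypothetical names): the runner picks a seed if none is given, uses it for all generation, and includes it in the failure report so the same run can be replayed.

```python
import random
import time

def run_property(prop, gen, runs=100, seed=None):
    """Run a property on generated inputs and report the seed used.

    If no seed is given, one is derived from the clock. Passing the reported
    seed back in replays exactly the same sequence of inputs.
    """
    if seed is None:
        seed = int(time.time() * 1000) % (2 ** 32)
    rng = random.Random(seed)
    for i in range(runs):
        value = gen(rng)
        if not prop(value):
            return {"ok": False, "seed": seed, "run": i, "input": value}
    return {"ok": True, "seed": seed}

def gen_list(rng):
    """Generate a list of 0 to 10 small integers."""
    return [rng.randint(0, 9) for _ in range(rng.randint(0, 10))]

def shorter_than_five(xs):
    """A deliberately false property, used to trigger a failure."""
    return len(xs) < 5

first = run_property(shorter_than_five, gen_list)
replay = run_property(shorter_than_five, gen_list, seed=first["seed"])
print(first)
print(first == replay)
```

Logging `first["seed"]` in the CI output would be enough to reproduce a failure locally, provided the generators draw all their randomness from the runner's generator and not from global state.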