The idea I’m describing here is not new; most programmers are familiar with the concept, but it may be unfamiliar to beginners. The plan is to compare our program against a known-correct program for the same problem and search for a test case on which the two give different outputs, a form of black-box testing. You may already have several questions in mind:
- How would we get our hands on a correct program for the problem at hand?
- If we already had a correct program, why would we even bother to debug an incorrect program?
- How do we generate inputs that are likely to produce different outputs from the two programs, while still being able to understand why the outputs differ?
The answers to the first two questions are related. There are several ways we can get a correct program for a given problem:
- We may be able to write a correct program ourselves using a different (perhaps less efficient) approach, but that program may be too slow to be accepted by the judge, or perhaps we simply want to learn the new approach.
- We may be able to ask a friend for their accepted solution, or find one on the web, but we may not want to ruin the fun of solving the problem ourselves by reading it, or the solution may not be written clearly enough to understand easily.
- If nothing else, we can implement a brute-force solution to the problem so that we can at least test the correctness of our program on small test cases.
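To make this concrete, here is a sketch using a hypothetical example problem (maximum subarray sum), which is not from the original post: the brute-force version is slow but easy to trust and serves as the reference program, while the efficient version (Kadane’s algorithm) is the one we would actually submit.

```python
def brute_force(a):
    # Try every contiguous subarray: O(n^2), but obviously correct,
    # which is exactly what we want from a reference solution.
    best = a[0]
    for i in range(len(a)):
        total = 0
        for j in range(i, len(a)):
            total += a[j]
            best = max(best, total)
    return best

def fast(a):
    # Kadane's algorithm: O(n), the version we actually want to submit.
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best
```

If `fast` has a bug, comparing it against `brute_force` on many small random arrays will usually expose it quickly.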
Whatever the case, let’s assume we have a correct solution to the problem to compare our program against. The next question is how to generate test cases:
- We don’t want the test cases to be huge, because then we won’t be able to understand why our program gives a different output, and we won’t be able to debug it. We want test cases small enough to solve by hand. That’s also why a brute-force solution may be just what we need as the correct program.
- We obviously want test cases consistent with the problem description, but we don’t want to write them all ourselves, since we have probably already checked our program on every case we could think of. We’d like a program to generate them for us.
- We’d like the generated test cases to be the kind that are most likely to make our program fail.
We can write a program that generates small random test cases consistent with the problem description, with the probabilities skewed (if necessary) toward the kinds of inputs most likely to make our program fail. This description is probably sufficient for an experienced programmer, but for the sake of completeness, I’ll present a Python script as a template for writing such a test oracle.
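As a sketch of such a generator, assuming a hypothetical input format of a count n on one line followed by n integers on the next (adjust to match your actual problem statement):

```python
import random

def gen_test(max_n=8, max_val=10):
    # Keep everything tiny so a failing case can be traced by hand.
    n = random.randint(1, max_n)
    # Skew the distribution: half the time draw values only from
    # {-1, 0, 1}, where sign and off-by-one bugs tend to surface;
    # otherwise use the full range.
    if random.random() < 0.5:
        vals = [random.randint(-1, 1) for _ in range(n)]
    else:
        vals = [random.randint(-max_val, max_val) for _ in range(n)]
    # Hypothetical format: n on the first line, the values on the second.
    return f"{n}\n{' '.join(map(str, vals))}\n"
```

The skewing step is optional; a uniform generator is often enough to start with, and you can bias it later toward whatever inputs you suspect your program mishandles.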
This script assumes that we have two executables, one for the correct program and one for the incorrect one, and that each executable reads its input from a file and writes its output to another file. Programs submitted to online judges usually read from the standard input rather than from files, but we can redirect the standard IO to files for this script to work. In C/C++, we can redirect the standard IO to files by writing these lines:
freopen("input.txt", "r", stdin);   // redirects standard input to input.txt
freopen("output.txt", "w", stdout); // redirects standard output to output.txt
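Under those assumptions, a minimal stress-testing driver might look like the following sketch. The executable names `./brute` and `./fast`, and the placeholder input format in `gen_test`, are illustrative assumptions rather than fixed parts of the setup:

```python
import random
import subprocess

def gen_test():
    # Placeholder generator; hypothetical input format of n followed
    # by n integers. Tailor this to the actual problem statement.
    n = random.randint(1, 8)
    vals = [random.randint(-10, 10) for _ in range(n)]
    return f"{n}\n{' '.join(map(str, vals))}\n"

def run(exe):
    # Each executable reads input.txt and writes output.txt,
    # e.g. via the freopen calls shown above.
    subprocess.run([exe], check=True)
    with open("output.txt") as f:
        return f.read().strip()

def stress(correct_exe, candidate_exe, rounds=1000):
    for i in range(rounds):
        with open("input.txt", "w") as f:
            f.write(gen_test())
        expected = run(correct_exe)
        actual = run(candidate_exe)
        if expected != actual:
            # Found a small failing case: print it and stop so we
            # can debug by hand.
            print("Mismatch on test", i)
            with open("input.txt") as f:
                print(f.read())
            print("expected:", expected)
            print("got:     ", actual)
            return False
    print(rounds, "tests passed")
    return True

# Usage (assumed executable names): stress("./brute", "./fast")
```

When `stress` reports a mismatch, `input.txt` holds a small failing case, which is exactly the kind of input we wanted: small enough to trace through both programs by hand.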
I hope you found this post useful. Please share your thoughts in comments.