The Programming Contest
To date software engineering research has been based on relatively small samples of programs; at most a few tens of programs have been used in controlled experiments to test hypotheses. Ideally far more programs---written to a common specification---are needed to undertake statistical analyses, and many different specifications are needed to demonstrate results are generally applicable.
The UVa Online Judge Website is an initiative of Miguel Revilla of the University of Valladolid. It contains problems to which everyone can submit solutions. The solutions are programs written in C, C++, Java or Pascal. The correctness of the programs is automatically judged by the ``Online Judge". Most authors submit solutions until their solution is judged as being correct. There are many thousands of authors and together they have produced more than 3,000,000 solutions to the approximately 1500 problems on the website.
From the perspective of algorithm design, the programming contest is a treasure trove. There appear to be numerous ways to solve the same problem. But also for software reliability engineers this is the case: there are even more ways to not solve the problem. Most authors' first submission is incorrect. They take some trials to---in most cases---finally arrive at the correct solution.
The millions of submissions give us the opportunity to research various topics in software reliability engineering. To date, most energy has been invested in the analysis of the effectiveness of diversity in some of its varieties.
Problems Analysed
The following list shows a subset of the problems that have been analysed, together with the inputs to these analyses and some of the results.- 00100 - 3n+1.
- 00102 - Binpacking.
- 00106 - Fermat.
- 00108 - Maximum Sum.
- 00116 - Unidirectional TSP.
- 00145 - Gondwanaland Telecom.
- 00147 - Dollars.
- 00149 - Forests.
- 00191 - Intersection.
- 00160 - Factors and Factorials.
- 00701 - Archeologist.
- 10003 - Cutting Sticks.
- 10035 - Primary Arithmetic.
- 10038 - Jolly Jumpers.
- 10083 - Division.
- 10139 - Factovisors.
- 10154 - Weights and Measures.
- 10182 - Bee Maja.
- 10183 - Fibs.
- 10200 - Prime Time.
- 10271 - Chopsticks.
- 10453 - MakePalindrome.
Research Topics
The programming contest gives the opportunity research many topics of interest to software reliability engineerings. We have done work/are doing work on the following topics:- The effectiveness of language diversity.
- The shape of difficulty functions.
- The effectiveness of plausibility checks.
- Shapes of failure regions.
- The relation between personality and programming behaviour.
- Optimal testing strategies.
- The application of the Eckhardt and Lee model.
- The application of the Littlewood and Miller model.
Conjectures
The Programming Contest also gives the opportunity to test conjectures. Many of those conjectures are normally taken for granted, because they seem to be self evident. But why not challenge them? At the moment we have done work/are doing work on the following conjectures:- A Pascal/C pair is more reliable than a C/C or a Pascal/Pascal Pair. TRUE.
- More reliable programs are longer. FALSE, there is a large variation in length between programs, but there is no difference between correct and incorrect programs, nor is there a correlation with PFD.
- Reliable programs benefit from plausibility checks. TRUE, reliable programs profit as much from plausibility checks as unreliable programs.
- Software metrics like McCabe's Cyclomatic Complexity and Halstead's Volume are predictors of software dependability for a given specification. FALSE. Our research shows no correlation at all between these metrics and defect count or PFD. These metrics correlate with Lines of Code.
- Lines of Code is a predictor of the defect count, when the specification is not given. TRUE. There is a correlation, albeit not very strong. Longer programs have more defects, but the more important factor is the complexity inherent to the specification.
The Test Harness
A test harness has been devoloped to systematically analyse problems in the Programming Contest. The test harness contains programs for the following tasks:- Extraction of source codes from e-mails.
- Compilation of the source codes.
- Running the source codes.
- Determining equivalence classes and score functions.
- A range of statistical functions in R for analysis of the data.
Publications
Meulen, M.J.P. van der, Revilla, M., "Experiences with the Design of a Run-Time Check", Lecture Notes in Computer Science, vol. 4166, pp. 302-15, Computer Safety, Reliability, and Security, 25th International Conference, Safecomp 2006, Gdansk, Poland, September 27-29, 2006.
Meulen, M.J.P. van der, Revilla, M., "The effectiveness of Choice of Programming Language as a Diversity Seeking Decision", 5th European Dependable Computing Conference (EDDC-5), Budapest, Hungary, April 2005, (Dal Cin, M., Kaaniche, M., Pataricza, A., Eds.), pp. 199-209, Springer, Lecture Notes on Computer Science, 2005.
Meulen, M.J.P. van der, Strigini, L., Revilla, M., "On the Effectiveness of Run-Time Checks", Lecture Notes in Computer Science, Safecomp, 28-30 September 2005, Halden, Norway, (Gran, B.A., Winther, R., Eds.), vol. 3866, pp. 151-64, Springer, 2005.
Bentley, J.G.W., Bishop, P.G., Meulen, M.J.P. van der, "An Empirical Exploration of the Difficulty Function", Safecomp, Potsdam, 21-24 Sep. 2004, Potsdam, Germany, pp. 60-71, Springer Verlag, 2004.
Meulen, M.J.P. van der, Bishop, P.G., Revilla, M., "An Exploration of Software Faults and Failure Behaviour in a Large Population of Programs", ISSRE 2004, Rennes, France, pp. 101-12, 2004.
Last updated: 20 February 2007
