The advent of variability management and generator technology enables users to derive individual system variants from a configurable code base by selecting desired configuration options. This approach gives rise to the generation of possibly billions of variants, which, however, cannot be efficiently analyzed for bugs and other properties with classic analysis techniques. To address this issue, researchers and practitioners developed sampling heuristics and, recently, variability-aware analysis techniques. While sampling reduces the analysis effort significantly, the information obtained is necessarily incomplete, and it is unknown whether state-of-the-art sampling techniques scale to billions of variants. Variability-aware analysis techniques process the configurable code base directly, exploiting similarities among individual variants with the goal of reducing analysis effort. However, while promising, variability-aware analysis techniques have been applied mostly to small academic examples. To learn about the mutual strengths and weaknesses of variability-aware and sample-based analysis techniques, we compared the two by means of seven concrete control-flow and data-flow analyses, applied to five real-world subject systems: Busybox, OpenSSL, SQLite, Linux, and uClibc. In particular, we compared the efficiency (analysis execution time) of the analyses and their effectiveness (potential bugs found). Overall, variability-aware analysis outperforms most sampling techniques with respect to both efficiency and effectiveness.
This paper presents a systematic literature review that investigates the maturity of FLT evaluation in terms of baseline comparison, homogeneity of empirical designs, and the reproducibility of FLTs and their evaluations. It identifies several issues that substantially hamper researchers and practitioners in identifying the best-of-breed FLT or, in the case of researchers, in replicating existing FLT-evaluation studies. The results show that 95% of the existing research in this field presents novel FLTs, and that only just over half of the examined FLTs have been evaluated through formal empirical methods. In addition, only 8% of the studies compared the FLT to openly available, baseline techniques. Another characteristic of the reviewed literature is the use of 255 different subject systems, 60 metrics, 210 benchmarks, and a plethora of user-input formats in FLT evaluations, which also negatively affects the comparability of FLTs. Finally, there is a lack of reproducible FLTs in the field, preventing researchers from re-creating FLTs for comparison studies. Cumulatively, these conditions make it difficult to answer questions such as "which are the best FLTs?". The paper concludes by providing guidelines for the empirical evaluation of FLTs that may help towards empirical standardization.
Uncertainty in timing properties (e.g., detection time of external events) is a common reality in embedded software systems since these systems interact with complex physical environments. Such time uncertainty leads to non-determinism. For example, as a result of time uncertainty, time-triggered operations may either generate different valid outputs across different executions, or experience failures (e.g., results not being generated in the expected time window) that occur only occasionally over many executions. For these reasons, time uncertainty makes the generation of effective test oracles for timing requirements a challenging task. To address the above challenge, we propose STUIOS (Stochastic Testing with Unique Input Output Sequences), an approach for the automated generation of stochastic oracles that verify the capability of a software system to fulfill timing constraints in the presence of time uncertainty. Such stochastic oracles entail the statistical analysis of repeated test case executions based on test output probabilities predicted by means of statistical model checking. Results from two industrial case studies in the automotive domain demonstrate that this approach improves the fault detection effectiveness of test suites derived from timed automata, compared to traditional approaches.
Multi-level modelling promotes flexibility in modelling by enabling the use of several meta-levels instead of just two, as is the case in mainstream two-level modelling approaches. While this approach leads to simpler models for some scenarios, it introduces an additional degree of freedom, as designers are able to decide the level where an element should reside and must ascertain the suitability of such decisions. In this respect, model refactorings have been successfully applied in the context of two-level modelling to rearrange the elements of a model while preserving its meaning. Thus, in this paper, we propose their extension to the refactoring of multi-level models in order to help designers rearrange elements across and within levels and explore the consequences. We present a classification and catalogue of multi-level refactorings, and provide support for them in our MetaDepth tool. Finally, we present an experiment based on model mutation that validates the predicted semantic side effects of the refactorings on the basis of more than 210,000 refactoring applications.