Towards Better Evolutionary Program Repair: An Integrated Approach
Distributed systems pose unique challenges for software developers. Reasoning about concurrent activities, and even understanding the system's communication topology, can be difficult. Three key tasks frequently performed during distributed system analysis but poorly supported by current tools are: (1) understanding the relative ordering of events, (2) searching for specific patterns of interaction between hosts, and (3) identifying structural similarities and differences between pairs of executions. This paper presents a new method, consisting of XVector and ShiViz, that supports the analysis of distributed systems. XVector instruments systems to capture the happens-before relation between events. ShiViz visualizes distributed system executions as interactive time-space diagrams to support the three tasks above. We evaluated how ShiViz aids developers in performing the three tasks through a controlled experiment and two case studies. Participants using ShiViz answered statistically significantly more system comprehension questions correctly than control groups, with a very large effect size, and all participants found ShiViz helpful in their analyses of complex distributed system executions.
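The happens-before relation mentioned above is commonly captured with vector clocks. The abstract does not say how XVector records it, so the following is only a minimal sketch under that assumption; the function names (`tick`, `merge`, `happens_before`, `concurrent`) and the host names are hypothetical and not taken from the tool.

```python
from typing import Dict

# Assumption: a vector clock maps each host name to a counter of events seen.
VectorClock = Dict[str, int]


def tick(clock: VectorClock, host: str) -> VectorClock:
    """Return a copy of `clock` with `host`'s own counter advanced by one."""
    updated = dict(clock)
    updated[host] = updated.get(host, 0) + 1
    return updated


def merge(local: VectorClock, received: VectorClock, host: str) -> VectorClock:
    """Receive event: element-wise maximum of both clocks, then a local tick."""
    hosts = set(local) | set(received)
    merged = {h: max(local.get(h, 0), received.get(h, 0)) for h in hosts}
    return tick(merged, host)


def _leq(a: VectorClock, b: VectorClock) -> bool:
    """True if every component of `a` is <= the same component of `b`."""
    return all(a.get(h, 0) <= b.get(h, 0) for h in set(a) | set(b))


def happens_before(a: VectorClock, b: VectorClock) -> bool:
    """`a` happened before `b`: a <= b component-wise and the two differ."""
    return _leq(a, b) and not _leq(b, a)


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Neither event is ordered before the other."""
    return not _leq(a, b) and not _leq(b, a)


# Hypothetical three-host execution.
send = tick({}, "alice")            # alice sends a message
recv = merge({}, send, "bob")       # bob receives it
other = tick({}, "carol")           # carol does unrelated local work

print(happens_before(send, recv))   # send is ordered before recv
print(concurrent(send, other))      # send and carol's event are unordered
```

Events that are incomparable under this ordering are the ones a time-space diagram would draw without a connecting path between them.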
Search-Based Software Engineering (SBSE) researchers who apply multi-objective search algorithms (MOSAs) often assess the quality of solutions produced by MOSAs with one or more quality indicators (QIs). However, SBSE lacks evidence providing insights on commonly used QIs, especially about agreements among them and their relations with SBSE problems and applied MOSAs. To this end, we conducted an extensive empirical evaluation to provide insights on commonly used QIs in the context of SBSE, by studying agreements among QIs with and without considering differences of SBSE problems and MOSAs. In addition, by defining a systematic process based on three common ways of comparing MOSAs in SBSE, we present additional observations that were automatically produced based on the results of our empirical evaluation. These observations can be used by SBSE researchers to gain a better understanding of the commonly used QIs in SBSE and their agreements, and also be useful for QI designers to design new QIs with such a comprehensive view of agreements among the studied QIs.
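To make the idea of agreement between quality indicators concrete, the sketch below compares two solution sets with two indicators and checks whether both prefer the same set. The choice of indicators (two-objective hypervolume and Pareto front size), the minimisation setting, the reference point, and all function names are assumptions made for illustration; the abstract does not state which QIs were studied or how agreement was measured.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumption: two objectives, both minimised


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """`a` is no worse than `b` everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(front: List[Point], ref: Point) -> float:
    """Area dominated by `front` and bounded by the reference point `ref`."""
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:  # this point adds a new horizontal strip
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv


def front_size(points: List[Point]) -> int:
    """Number of distinct non-dominated solutions."""
    return len(set(nondominated(points)))


def preference(score_a: float, score_b: float) -> int:
    """+1 if the first set scores higher, -1 if lower, 0 on a tie."""
    return (score_a > score_b) - (score_a < score_b)


def agree(set_a: List[Point], set_b: List[Point], ref: Point) -> bool:
    """True if both indicators prefer the same solution set (or both tie)."""
    by_hv = preference(hypervolume_2d(set_a, ref), hypervolume_2d(set_b, ref))
    by_size = preference(front_size(set_a), front_size(set_b))
    return by_hv == by_size


# Hypothetical outputs of two search algorithms.
set_a = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
set_b = [(1.0, 1.0)]
print(agree(set_a, set_b, ref=(4.0, 4.0)))
```

In this made-up case the two indicators disagree: the single point in `set_b` dominates a larger area, while `set_a` offers more non-dominated solutions.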
A Software Product Line (SPL) is a set of products that are built from a number of features, the set of valid products being defined by a feature model. Typically, it does not make sense to test all of the products defined by an SPL and so in testing one needs to choose a set of products to test (test selection) and, ideally, derive a good order in which to test them (test prioritisation). This paper introduces a new technique for solving the test selection and prioritisation problems. The approach, the grid-based evolution strategy (GrES), considers a number of fitness functions that assess how good a selection or prioritisation is and aims to optimise on all of these. The problem tackled is thus a many-objective optimisation problem. We introduce a new approach, in which all of the fitness functions are considered but one (pairwise coverage) is seen as the most important. We also derive a novel evolution strategy on the basis of domain knowledge. The results of the evaluation, on randomly generated and realistic feature models, were promising, with GrES outperforming previously proposed techniques and a range of many-objective optimisation algorithms.
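Pairwise coverage, the fitness function singled out above, can be sketched as follows. This is not GrES: it is a plain greedy ordering used only to illustrate how pairwise coverage rewards a selection and an ordering of products. The representation of a product as a frozenset of selected features, the tiny product list, and all function names are assumptions.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple

Product = FrozenSet[str]  # assumption: a product is the set of selected features
Pair = Tuple[Tuple[str, bool], Tuple[str, bool]]


def covered_pairs(product: Product, features: Sequence[str]) -> Set[Pair]:
    """All pairs of (feature, selected?) assignments exhibited by one product."""
    assignment = [(f, f in product) for f in features]
    return set(combinations(assignment, 2))


def pairwise_coverage(selection: List[Product], all_products: List[Product],
                      features: Sequence[str]) -> float:
    """Fraction of the valid pairs that the selected products cover."""
    valid: Set[Pair] = set()
    for p in all_products:
        valid |= covered_pairs(p, features)
    covered: Set[Pair] = set()
    for p in selection:
        covered |= covered_pairs(p, features)
    return len(covered & valid) / len(valid) if valid else 1.0


def greedy_prioritise(products: List[Product],
                      features: Sequence[str]) -> List[Product]:
    """Order products so that each next one adds the most uncovered pairs."""
    remaining, seen, order = list(products), set(), []
    while remaining:
        best = max(remaining, key=lambda p: len(covered_pairs(p, features) - seen))
        order.append(best)
        seen |= covered_pairs(best, features)
        remaining.remove(best)
    return order


# Hypothetical feature model whose valid products are listed explicitly.
features = ["A", "B", "C"]
all_products = [frozenset(), frozenset({"A"}),
                frozenset({"A", "B"}), frozenset({"A", "B", "C"})]

selection = [all_products[0], all_products[3]]
print(pairwise_coverage(selection, all_products, features))
print(greedy_prioritise(all_products, features))
```

Coverage is measured against the pairs that occur in at least one valid product, since a feature model typically rules some combinations out.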
Grown software systems often contain code that is not necessary anymore. Unnecessary code wastes resources during development and maintenance, for example, when preparing code for migration or certification. Running a profiler may reveal code that is not used in production, but it is often time-consuming to obtain representative data this way. We investigate to what extent a static analysis approach, which is based on code stability and code centrality, is able to identify unnecessary code and whether its recommendations are relevant in practice. To study the feasibility and usefulness of our static approach, we conducted a study involving 14 open-source and closed-source software systems. As there is no perfect oracle for unnecessary code, we compared recommendations of our approach with historical cleanup actions, runtime usage data, and feedback from 25 developers of 5 software projects. Our study shows that recommendations generated from stability and centrality information point to unnecessary code. Developers confirmed that 34% of recommendations were indeed unnecessary and deleted 20% of the recommendations shortly after our interviews. Overall, our results suggest that static analysis can provide quick feedback on unnecessary code and is useful in practice.
Learning-based classification dominates malware detectors for Android. However, due to the evolution of the Android ecosystem, existing techniques of this kind are limited by their reliance on new malware samples, which may not be available in a timely manner, and on constant retraining, which is often costly. A practical detector needs not only to be accurate on particular datasets but, more critically, to be able to sustain its capabilities over time without frequent retraining. We propose and study the sustainability problem for learning-based app classifiers. We define sustainability metrics and compare them among five state-of-the-art malware detectors. We further developed DroidSpan, a novel classification system based on a new behavioral profile that captures sensitive access distribution. We evaluated the sustainability of DroidSpan versus the five detectors on longitudinal datasets across eight years, which include 13,627 benign apps and 12,755 malware samples. We showed that DroidSpan significantly outperformed these baselines in sustainability at reasonable costs, by 6–32% for same-period detection and 21–37% for over-time detection. The main takeaway, which also explains the superiority of DroidSpan, is that the use of features consistently differentiating malware from benign apps over time is essential for sustainable learning-based malware detection, and that these features can be learned from app evolution studies.
We define two complementary approaches to monitor decentralized systems. The first relies on systems with a centralized specification, i.e., a specification written for the behavior of the entire system. To do so, our approach introduces a data structure that (i) keeps track of the execution of an automaton, (ii) has predictable parameters and size, and (iii) guarantees strong eventual consistency. The second approach defines decentralized specifications, wherein multiple specifications are provided for separate parts of the system. We study two properties of decentralized specifications pertaining to monitorability and compatibility between specification and architecture. We also present a general algorithm for monitoring decentralized specifications. We map three existing algorithms to our approaches and provide a framework for analyzing their behavior. Furthermore, we introduce THEMIS, a framework for designing such decentralized algorithms and simulating their behavior. We show how THEMIS can be used to compare multiple algorithms and verify the trends predicted by the analysis by studying two scenarios: a synthetic benchmark and a real example.
We address the problem of engineering a sociotechnical system (STS) with respect to its stakeholders' requirements. We motivate a two-tier conception of an STS comprising (i) a technical tier that provides control mechanisms and describes what actions are allowed by the software components; and (ii) a social tier that characterizes the stakeholders' expectations of each other in terms of norms. Specifically, we adopt agents as computational entities, each representing a different stakeholder. Unlike previous approaches, our framework, Desen, incorporates the social dimension into the formal verification process. Thus, Desen supports agents potentially violating applicable norms, a consequence of their autonomy. In addition to formal requirements verification via model checking, Desen supports refinement of system specifications via design patterns to meet stated (and changing) requirements. We demonstrate how Desen carries out refinement on a scenario involving information sharing in a hospital during an emergency. We show via a human-subject study that a design process based on our patterns is helpful for participants who are inexperienced in conceptual modeling and norms.
In the model-driven design of embedded systems, generating code from high-level control models seamlessly and correctly is challenging: control models are normally modeled as hybrid systems, which involve continuous evolution, discrete jumps, and a complicated entanglement between the two, whereas code contains only discrete actions. In this paper, we investigate code generation from a formal control model, given in Hybrid CSP (HCSP), to SystemC. We first introduce the notion of approximate bisimulation, which is used as a criterion to check the consistency between two different systems, in particular between the original control model and the final generated code. We prove that it is decidable whether two HCSP processes are approximately bisimilar in bounded time and unbounded time, respectively. For both cases, we define the discretization of HCSP processes and prove that the original HCSP model and its discretization are approximately bisimilar. Furthermore, based on the discretization, we define a transfer function that maps a discretized HCSP model into SystemC code such that the two are bisimilar. Finally, we implement a tool that automatically translates HCSP processes into SystemC code, and illustrate our approach with case studies.
Generic programming is a key paradigm for developing reusable software components. Inherent support for generic constructs is therefore important in programming languages. In C++, the generic construct, templates, has been supported since the language was released. However, little is currently known about how C++ templates are actually used in developing real software. In this study, we conduct an experiment to investigate the use of templates in practice. First, we conduct a survey to understand developers' perception of templates. Then, we analyze 1267 historical revisions of 50 open-source software systems, consisting of 566 million lines of C++ code, to collect data on the practical use of templates. Finally, we perform statistical analysis on the collected data. We uncover the following important findings: (1) the new template features are not used more often than their old substitutes; (2) user-defined templates do not play a significant role in reducing code replication in client code; and (3) freestanding function templates in most software systems do not, in practice, play a role in reducing generic function-like macros. These findings should be helpful for practitioners to understand and use template