Title Year Published by Document Type

UIED: A hybrid tool for GUI element detection

2020 ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering Conference Paper
Graphical User Interface (GUI) element detection is critical for many GUI automation and GUI testing tasks. Acquiring the accurate positions and classes of GUI elements is also the very first step in GUI reverse engineering or GUI testing. In this paper, we present User Interface Element Detection (UIED), a toolkit designed to provide users with a simple and easy-to-use platform for accurate GUI element detection. UIED integrates multiple detection methods, including traditional computer vision (CV) approaches and deep learning models, to handle diverse and complicated GUI images. In addition, it is equipped with a novel, customized GUI element detection method that produces state-of-the-art detection results. Our tool enables the user to change and edit the detection result in an interactive dashboard. Finally, it exports the detected UI elements in the GUI image to design files that can be further edited in popular UI design tools such as Sketch and Photoshop. UIED is evaluated to be capable of accurate detection and useful for downstream tasks. Tool URL: http://uied.online Github Link: https://github.com/MulongXie/UIED © 2020 ACM.
10.1145/3368089.3417940
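
As an illustration of the traditional-CV side of such a hybrid pipeline, the following minimal Python sketch detects candidate non-text elements with OpenCV edge detection and contour grouping. The file name, thresholds, and size filter are illustrative assumptions, not UIED's actual parameters.

import cv2

def detect_elements(image_path, min_area=500):
    # Edge detection + contour extraction in the spirit of a traditional-CV
    # GUI element detector; thresholds are illustrative.
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # GUI widgets tend to have crisp edges against flat backgrounds.
    edges = cv2.Canny(gray, 50, 150)
    # Dilate so the borders of a widget merge into one connected region.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    edges = cv2.dilate(edges, kernel)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h >= min_area:          # drop noise-sized regions
            boxes.append((x, y, w, h))
    return boxes

if __name__ == "__main__":
    for box in detect_elements("screenshot.png"):
        print("candidate element at", box)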

Navigation and exploration in 3D-game automated play testing

2020 A-TEST 2020 - Proceedings of the 11th ACM SIGSOFT International Workshop on Automating TEST Case Design, Selection, and Evaluation, Co-located with ESEC/FSE 2020 Conference Paper
To enable automated software testing, the ability to automatically navigate to a state of interest and to explore all, or at least a sufficient number of, instances of such a state is fundamental. When testing a computer game, the problem has an extra dimension, namely the virtual world in which the game is played. This world often plays a dominant role in constraining which logical states are reachable and how to reach them. So, any automated testing algorithm for computer games will inevitably need a layer that deals with navigation in a virtual world. Unlike, e.g., navigating through the GUI of a typical web-based application, navigating a virtual world is much more challenging. This paper discusses how concepts from geometry and graph-based path finding can be applied in the context of game testing to solve the problem of automated navigation and exploration. As a proof of concept, the paper also briefly discusses the implementation of the proposed approach. © 2020 ACM.
10.1145/3412452.3423570
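
To illustrate the graph-based path finding the paper builds on, here is a generic A* sketch over a waypoint graph, the kind of navigation layer an automated game-testing agent needs; the graph representation and heuristic are generic assumptions, not the paper's implementation.

import heapq
import math

def astar(graph, pos, start, goal):
    """graph: node -> iterable of neighbor nodes; pos: node -> (x, y, z)."""
    def h(n):  # straight-line distance is an admissible heuristic
        return math.dist(pos[n], pos[goal])
    frontier = [(h(start), start)]
    came_from, g = {start: None}, {start: 0.0}
    while frontier:
        _, current = heapq.heappop(frontier)
        if current == goal:
            path = []
            while current is not None:   # walk back to the start
                path.append(current)
                current = came_from[current]
            return path[::-1]
        for nxt in graph[current]:
            new_g = g[current] + math.dist(pos[current], pos[nxt])
            if nxt not in g or new_g < g[nxt]:
                g[nxt], came_from[nxt] = new_g, current
                heapq.heappush(frontier, (new_g + h(nxt), nxt))
    return None  # goal unreachable from start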

FrUITeR: A framework for evaluating UI test reuse

2020 ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering Conference Paper
UI testing is tedious and time-consuming due to the manual effort required. Recent research has explored opportunities for reusing existing UI tests from an app to automatically generate new tests for other apps. However, the evaluation of such techniques currently remains manual, unscalable, and unreproducible, which can waste effort and impede progress in this emerging area. We introduce FrUITeR, a framework that automatically evaluates UI test reuse in a reproducible way. We apply FrUITeR to existing test-reuse techniques on a uniform benchmark we established, resulting in 11,917 test reuse cases from 20 apps. We report several key findings aimed at improving UI test reuse that are missed by existing work. © 2020 ACM.
10.1145/3368089.3409708

An empirical analysis of test input generation tools for Android apps through a sequence of events

2020 Symmetry Article
Graphical User Interface (GUI) testing of Android apps has gained considerable interest from industry and the research community due to its excellent capability to verify the operational requirements of GUI components. To date, most existing GUI testing tools for Android apps can generate test inputs using different approaches and improve the apps' code coverage and fault-detection performance. Many previous studies have evaluated the code coverage and crash-detection performance of GUI testing tools. However, very few studies have investigated the effectiveness of test input generation tools with respect to event-sequence length and its effect on overall test coverage and crash detection. The event-sequence length generally indicates the number of steps a test input generation tool requires to detect a crash. Highlighting its effect is critical because of its significant impact on time, testing effort, and computational cost. Thus, this study evaluated the effectiveness of six test input generation tools for Android apps that support system event generation on 50 Android apps. The tools were evaluated and compared based on activity coverage, method coverage, and capability in detecting crashes. Through a critical analysis of the results, this study identifies the diversity and similarity of test input generation tools for Android apps to provide a clear picture of the current state of the art. The results revealed that long event sequences performed better than shorter ones; however, the positive effect on coverage and crash detection was minor. Moreover, the study showed that the tools achieved less than 40% method coverage and 67% activity coverage. © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
10.3390/sym12111894

PLATOOL: A Functional Test Generation Tool for Mobile Applications

2020 ACM International Conference Proceeding Series Conference Paper
Mobile applications are ubiquitous nowadays, and testing them is a central activity for quality assurance. Application testers face several classes of events in this domain, including GUI events and system events, such as sensor-related events. While GUI events have been systematically explored in the mobile application testing literature, system events have received less attention. A particular difficulty faced by mobile application testers is the identification and generation of input data for system events. This paper presents PLATOOL, a tool that assists mobile application testers in handling common events of mobile applications during the automation of functional tests. Our preliminary results indicate that PLATOOL is able to generate and execute useful functional tests to support the testing of mobile applications. © 2020 ACM.
10.1145/3422392.3422508

Models in Graphical User Interface Testing: Study Design

2020 2020 Turkish National Software Engineering Symposium, UYMS 2020 - Proceedings Conference Paper
Model-based GUI testing is an important concept in software GUI testing, since manual testing is time-consuming and heavily error-prone. The software testing community has been developing and contributing to several well-accepted models for many years. This paper reviews different models used in model-based GUI testing and presents a case study with a proposed approach for converting several well-accepted models to Event Sequence Graphs (ESGs) in order to generate and execute test cases, with the aim of consolidating past and future work in a single model. © 2020 IEEE.
10.1109/UYMS50627.2020.9247072

Testing apps with real-world inputs

2020 Proceedings - 2020 IEEE/ACM 1st International Conference on Automation of Software Test, AST 2020 Conference Paper
To test mobile apps, one requires realistic and coherent test inputs. The Link approach for Web testing has shown that knowledge bases such as DBpedia can be a reliable source of semantically coherent inputs. In this paper, we adapt and extend the Link approach towards test generation for mobile applications: (1) We identify and match descriptive labels with input fields, based on the Gestalt principles of human perception; (2) We then use natural language processing techniques to extract the concept associated with the label; (3) We use this concept to query a knowledge base for candidate input values; (4) We cluster the UI elements according to their functionality into inputs and actions, filling the input elements first and then interacting with the actions. Our evaluation shows that leveraging knowledge bases for testing mobile apps with realistic inputs is effective. On average, our approach covered 9% more statements than randomly generated text inputs. © 2020 Association for Computing Machinery.
10.1145/3387903.3389310
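
As a concrete illustration of step (3), the sketch below queries the public DBpedia SPARQL endpoint for candidate values of a concept (here, cities). The query shape and result handling are illustrative assumptions rather than the paper's actual code.

import requests

def candidate_values(dbpedia_class="dbo:City", limit=10):
    # Ask the knowledge base for English labels of entities of the given class.
    query = f"""
        SELECT ?label WHERE {{
            ?entity a {dbpedia_class} ; rdfs:label ?label .
            FILTER (lang(?label) = "en")
        }} LIMIT {limit}
    """
    resp = requests.get(
        "https://dbpedia.org/sparql",
        params={"query": query, "format": "application/sparql-results+json"},
        timeout=30,
    )
    resp.raise_for_status()
    bindings = resp.json()["results"]["bindings"]
    return [b["label"]["value"] for b in bindings]

if __name__ == "__main__":
    # Realistic city names to type into a field labeled "City".
    print(candidate_values())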

Bug! Falha! Bachi! Fallo! Défaut!! What about internationalization testing in the software industry?

2020 International Symposium on Empirical Software Engineering and Measurement Conference Paper
Background. Testing is an essential activity in the software development life cycle. Nowadays, testing activities are widely spread along the software development process, since software products are continuously tested to meet users' expectations and to compete in global markets. In this context, internationalization testing is defined as the practice focused on determining that software works properly in a specific language and in a particular region. Aims. This study aims to explore the particularities of internationalization testing in the software industry and discuss the importance of this practice from the point of view of professionals working in this context. Method. We developed an exploratory qualitative study and conducted interviews with professionals from an international software company, in order to understand three aspects of internationalization testing: general characteristics and importance of this practice, particularities of the process, and the role of test automation in this context. Results. A total of 13 professionals participated in this study. Results demonstrated that internationalization testing is mostly related to aspects of graphical user interfaces. In this context, truncation and mistranslations are the main faults observed, and test automation might be difficult to implement and maintain due to the number of validations that depend on human judgment. Conclusion. Internationalization testing is an important practice to guarantee the quality of software products developed for global markets. However, this aspect of software testing remains unpopular or unfamiliar among professionals. This study is a step forward in the process of informing and enlightening academic researchers and practitioners in industry about this theme. © 2020 IEEE Computer Society. All rights reserved.
10.1145/3382494.3422167

AppTestMigrator: A Tool for Automated Test Migration for Android Apps

2020 Proceedings - 2020 ACM/IEEE 42nd International Conference on Software Engineering: Companion, ICSE-Companion 2020 Conference Paper
The use of mobile apps is increasingly widespread, and much effort is put into testing these apps to make sure they behave as intended. In this demo, we present AppTestMigrator, a technique and tool for migrating test cases between apps with similar functionality. The intuition behind AppTestMigrator is that many apps share similarities in their functionality, and these similarities often result in conceptually similar user interfaces (through which that functionality is accessed). AppTestMigrator attempts to automatically transform the sequence of events and oracles in a test case for an app (source app) into events and oracles for another app (target app). The results of our preliminary evaluation show the effectiveness of AppTestMigrator in migrating test cases between mobile apps with similar functionality. Video URL: https://youtu.be/WQnfEcwYqa4 © 2020 ACM.
10.1145/3377812.3382149

On the Industrial Applicability of Augmented Testing: An Empirical Study

2020 Proceedings - 2020 IEEE 13th International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2020 Conference Paper
Testing applications with graphical user interfaces (GUIs) is an important but also time-consuming task in practice. Tools and frameworks for GUI test automation can make test execution more efficient and lower the manual labor required for regression testing. However, the test scripts used for automated GUI-based testing still require a substantial development effort and are often reported as sensitive to change, leading to frequent and costly maintenance. The efficiency of development, maintenance, and evolution of such tests thereby depends on the readability of the scripts and the ease of use of the test tools/frameworks in which they are defined. To address these shortcomings of existing state-of-practice techniques, a novel technique referred to as Augmented Testing (AT) has been proposed. AT is defined as testing the System Under Test (SUT) through an Augmented GUI that superimposes information on top of the SUT GUI. The Augmented GUI can provide the user with hints, test data, or other support while also observing and recording the tester's interactions. For this study, a prototype tool called Scout, which adheres to the AT concept, is evaluated in an industrial empirical study. In the evaluation, quasi-experiments and questionnaire surveys were performed in two workshops with 12 practitioners from two Swedish companies (Ericsson and Inceptive). Results show that Scout can be used to create equivalent test cases faster, with statistical significance, than creating automated scripts in two popular state-of-practice tools. The study concludes that AT has cost-value benefits, applies to industrial-grade software, and overcomes several deficiencies of state-of-practice GUI testing technologies in terms of ease of use. © 2020 IEEE.
10.1109/ICSTW50294.2020.00065

Session-Based Recommender Systems for Action Selection in GUI Test Generation

2020 Proceedings - 2020 IEEE 13th International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2020 Conference Paper
Test generation at the graphical user interface (GUI) level has proven to be an effective method to reveal faults. When doing so, a test generator has to repeatedly decide what action to execute given the current state of the system under test (SUT). This problem of action selection usually involves random choice, which is often referred to as monkey testing. Some approaches leverage other techniques to improve the overall effectiveness, but only a few try to create human-like actions - or even entire action sequences. We have built a novel session-based recommender system that can guide test generation. This allows us to mimic past user behavior, reaching states that require complex interactions. We present preliminary results from an empirical study, where we use GitHub as the SUT. These results show that recommender systems appear to be well-suited for action selection, and that the approach can significantly contribute to the improvement of GUI-based test generation. © 2020 IEEE.
10.1109/ICSTW50294.2020.00066
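
The core idea, recommending the next GUI action from recorded sessions, can be illustrated with a minimal first-order Markov sketch; the real system uses a session-based recommender model, and the sessions and action names below are mock data.

from collections import Counter, defaultdict
import random

sessions = [  # recorded user sessions (sequences of abstract GUI actions)
    ["open_repo", "click_issues", "new_issue", "submit"],
    ["open_repo", "click_issues", "filter", "new_issue", "submit"],
    ["open_repo", "click_pulls", "open_pr"],
]

# Learn "which action tends to follow which" from the sessions.
follows = defaultdict(Counter)
for s in sessions:
    for a, b in zip(s, s[1:]):
        follows[a][b] += 1

def recommend(last_action, available):
    """Prefer the historically most likely follow-up among available actions."""
    ranked = [a for a, _ in follows[last_action].most_common() if a in available]
    return ranked[0] if ranked else random.choice(available)

print(recommend("click_issues", ["filter", "new_issue", "logout"]))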

Model-Based Testing of GUI Applications Featuring Dynamic Instantiation of Widgets

2020 Proceedings - 2020 IEEE 13th International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2020 Conference Paper
The testing of applications with a Graphical User Interface (GUI) is a complex activity because of the infinite number of possible event sequences. In the field of GUI testing, model-based approaches based on reverse engineering of GUI applications have been proposed to generate test cases. Unfortunately, evidence shows that these techniques do not support some features of modern GUI applications, such as dynamic widget instantiation or advanced interaction techniques (e.g., multitouch). In this paper, we propose to build models of the applications from requirements, as is standard practice in Model-Based Testing. To do so, we identified ICO (Interactive Cooperative Object) as one of the modelling techniques allowing the description of complex GUI behavior. We demonstrate that this notation is suitable for generating test cases targeting complex GUI applications in a process derived from the standard Model-Based Testing process. © 2020 IEEE.
10.1109/ICSTW50294.2020.00029

Translation from Visual to Layout-based Android Test Cases: A Proof of Concept

2020 Proceedings - 2020 IEEE 13th International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2020 Conference Paper
Context: 2nd generation (layout-based) and 3rd generation (visual) GUI testing are two approaches for testing mobile GUIs, each with individual benefits and drawbacks. Previous research has presented approaches to translate 2nd generation scripts to 3rd generation scripts, but not vice versa. Goal: The objective of this work is to provide a proof of concept of the effectiveness of automatic translation from existing 3rd generation test scripts to 2nd generation test scripts. Method: A tool architecture is presented and implemented in a tool capable of translating most 3rd generation interactions with the GUI of an Android app into 2nd generation instructions and oracles for the Espresso testing tool. Results: We validate our approach on two test suites of our own creation, consisting of 30 test cases each. The measured success rate of the translation is 96.7% (58 working test cases out of 60 applications of the translator). Conclusion: The study provides support for the feasibility of a translation-based approach from 3rd to 2nd generation test cases. However, additional work is needed to make the approach applicable in real-world scenarios or larger open-source test suites. © 2020 IEEE.
10.1109/ICSTW50294.2020.00027

Speeding up GUI Testing by On-Device Test Generation

2020 Proceedings - 2020 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020 Conference Paper
When generating GUI tests for Android apps, interactions are typically generated by a separate test computer and then executed on an actual Android device. While this approach is efficient in the sense that apps and interactions execute quickly, the communication overhead between the test computer and the device slows down testing considerably. In this work, we present DD-2, a test generator for Android that tests other apps on the device itself using Android accessibility services. In our experiments, DD-2 was shown to be 3.2 times faster than its computer-device counterpart, while sharing the same source code. © 2020 ACM.
10.1145/3324884.3415302

Seven Reasons Why: An In-Depth Study of the Limitations of Random Test Input Generation for Android

2020 Proceedings - 2020 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020 Conference Paper
Experience paper: Testing of mobile apps is time-consuming and requires a great deal of manual effort. For this reason, industry and academic researchers have proposed a number of test input generation techniques for automating app testing. Although useful, these techniques have weaknesses and limitations that often prevent them from achieving high coverage. We believe that one of the reasons for these limitations is that tool developers tend to focus mainly on improving the strategy the techniques employ to explore app behavior, whereas limited effort has been put into investigating other ways to improve the performance of these techniques. To address this problem, and to get a better understanding of the limitations of input-generation techniques for mobile apps, we conducted an in-depth study of the limitations of Monkey, arguably the most widely used tool for automated testing of Android apps. Specifically, in our study, we manually analyzed Monkey's performance on a benchmark of 64 apps to identify the common limitations that prevent the tool from achieving better coverage results. We then assessed the coverage improvement that Monkey could achieve if these limitations were eliminated. In our analysis of the results, we also discuss whether other existing test input generation tools suffer from these common limitations and provide insights on how they could address them. © 2020 ACM.
10.1145/3324884.3416567

Plug the Database & Play with Automatic Testing: Improving System Testing by Exploiting Persistent Data

2020 Proceedings - 2020 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020 Conference Paper
A key challenge in automatic Web testing is the generation of syntactically and semantically valid input values that can exercise the many functionalities that impose constraints on the validity of the inputs. Existing test case generation techniques either rely on manually curated catalogs of values or extract values from external data sources, such as the Web or publicly available knowledge bases. Unfortunately, relying on manual effort is generally too expensive for most practical applications, while domain-specific and application-specific data can hardly be found either on the Web or in general-purpose knowledge bases. This paper proposes DBInputs, a novel approach that reuses the data in the database of the target Web application to automatically identify domain-specific and application-specific inputs and effectively fulfill the validity constraints present in the tested Web pages. DBInputs can properly cope with system testing and maintenance testing efforts, since databases are naturally and inexpensively available in those phases. To extract valid inputs from the application databases, DBInputs exploits the syntactic and semantic similarity between the identifiers of the input fields and those in the tables of the database, automatically resolving the mismatch between the user interface and the schema of the database. Our experiments provide initial evidence that DBInputs can outperform both random input selection and Link, a competing approach that searches for inputs in knowledge bases. © 2020 ACM.
10.1145/3324884.3416561
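
The matching step at the heart of the approach, pairing an input field with the most similar database column and reusing its stored values, can be sketched as follows; only syntactic similarity is shown (the paper also exploits semantic similarity), and the field and column names are illustrative.

from difflib import SequenceMatcher

def similarity(a, b):
    # Plain string similarity between an input-field id and a column name.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_column(field_id, columns):
    return max(columns, key=lambda col: similarity(field_id, col))

db_columns = {  # column -> values already stored in the application database
    "users.email": ["alice@example.com", "bob@example.com"],
    "users.full_name": ["Alice Smith", "Bob Jones"],
    "orders.order_date": ["2020-01-15"],
}

field = "customer_email"
col = best_column(field, db_columns)
print(f"fill '{field}' with values from {col}: {db_columns[col]}")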

Mobile testing: New challenges and perceived difficulties from developers of the Italian industry

2020 IT Professional Article
Automated Graphical User Interface (GUI) testing is a fundamental part of the verification and validation process of any software product, but it is often linked to notable maintenance costs, especially for mobile applications. The literature reports a general lack of automated testing adoption among mobile developers in industry. In this article, we present the outcome of seven interviews centered on how companies automate the testing process of mobile applications. The interviews confirmed that automated testing is still not widely adopted and rarely formalized by industry, with manual testing still being the primary form of testing. Test fragility and evolution of the user interface are seen as relevant issues by developers, with a cost of around 30% of the overall maintenance performed on test suites. Some clearly shared needs emerged during our interviews that can be read as hints for added research effort from academia in meeting the needs of industry. © 1999-2012 IEEE.
10.1109/MITP.2019.2942810

Automating GUI testing with image-based deep reinforcement learning

2020 Proceedings - 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems, ACSOS 2020 Conference Paper
Users interact with modern applications and devices through graphical user interfaces (GUIs). To ensure intuitive and easy usability, GUIs need to be tested, with developers aiming to find possible bugs and inconsistent functionality. Manual GUI testing requires time and effort, so its efficiency can be improved with automation. Conventional automation tools for GUI testing reduce the burden of manual testing but also introduce challenges in the maintenance of test cases. To overcome these issues, we propose a deep-reinforcement-learning-based (DRL) solution for automated and adaptive GUI testing. Specifically, we propose and evaluate the performance of an image-based DRL solution. We adapt the asynchronous advantage actor-critic (A3C) algorithm to GUI testing, inspired by how a human uses a GUI. We feed screenshots of the GUI as the input and let the algorithm decide how to interact with GUI components. We observe that our solution can achieve up to six times higher exploration efficiency compared to selected baseline algorithms. Moreover, our solution is more efficient than inexperienced human users and almost as efficient as an experienced human user in our experimental GUI testing scenario. For these reasons, image-based DRL exploration can be considered a viable GUI testing method. © 2020 IEEE.
10.1109/ACSOS49614.2020.00038
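
The interaction loop the paper describes, screenshots in and GUI actions out with a learning signal for discovering new screens, can be sketched as follows. The env and policy interfaces are assumptions; the paper trains an A3C actor-critic network rather than the placeholder epsilon-greedy selection shown here.

import hashlib
import random

def state_id(screenshot: bytes) -> str:
    """Coarse state identity: hash of the raw screenshot bytes."""
    return hashlib.md5(screenshot).hexdigest()

def explore(env, policy, steps=200, eps=0.2):
    """env: assumed wrapper exposing screenshot()/actions()/perform(a);
    policy: callable (screenshot, action) -> estimated action value."""
    seen, total_reward = set(), 0.0
    for _ in range(steps):
        shot, acts = env.screenshot(), env.actions()
        if random.random() < eps:            # keep exploring
            action = random.choice(acts)
        else:                                # exploit the learned policy
            action = max(acts, key=lambda a: policy(shot, a))
        env.perform(action)
        new_id = state_id(env.screenshot())
        reward = 1.0 if new_id not in seen else 0.0   # reward unseen screens
        seen.add(new_id)
        total_reward += reward
        # a real A3C agent would update its actor-critic network here
    return total_reward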

Automated classification of actions in bug reports of mobile apps

2020 ISSTA 2020 - Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis Conference Paper
When users encounter problems with mobile apps, they may submit such problems to developers as bug reports. To facilitate the processing of bug reports, researchers have proposed approaches to validate the reported issues automatically according to the steps to reproduce specified in bug reports. Although such approaches have achieved a high success rate in reproducing the reported issues, they often rely on a predefined vocabulary to identify and classify actions in bug reports. However, such manually constructed vocabularies and classifications have significant limitations. It is challenging for the vocabulary to cover all potential action words because users may describe the same action with different words. Besides that, classification of actions solely based on the action words can be inaccurate because the same action word, appearing in different contexts, may have different meanings and thus belong to different action categories. To this end, in this paper we propose an automated approach, called MaCa, to identify and classify action words in mobile apps' bug reports. For a given bug report, it first identifies action words based on natural language processing. For each of the resulting action words, MaCa extracts its contexts, i.e., its enclosing segment, the associated UI target, and the type of its target element, by both natural language processing and static analysis of the associated app. The action word and its contexts are then fed into a machine-learning-based classifier that predicts the category of the given action word in the given context. To train the classifier, we manually labelled 1,202 action words from 525 bug reports associated with 207 apps. Our evaluation results on manually labelled data suggest that MaCa is accurate, with accuracy varying from 95% to 96.7%. We also investigated to what extent MaCa could further improve existing approaches (i.e., Yakusu and ReCDroid) in reproducing bug reports. Our evaluation results suggest that integrating MaCa into existing approaches significantly improved the success rates of ReCDroid and Yakusu by 22.7% ((69.2% - 56.4%)/56.4%) and 22.9% ((62.7% - 51%)/51%), respectively. © 2020 ACM.
10.1145/3395363.3397355
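
The final classification step, predicting an action category from the action word plus its context features, can be sketched with a standard text classifier; the features, labels, and training examples below are illustrative, whereas the paper trains on 1,202 manually labelled action words.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each example concatenates the enclosing segment with context features
# (here only the UI target type); categories are illustrative.
train = [
    ("press the login button | target=button", "click"),
    ("enter your name in the field | target=edittext", "input"),
    ("swipe the photo to the left | target=imageview", "gesture"),
    ("tap settings icon | target=imageview", "click"),
]
texts, labels = zip(*train)

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["type the password | target=edittext"])[0])  # -> "input"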

Crowdsourced requirements generation for automatic testing via knowledge graph

2020 ISSTA 2020 - Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis Conference Paper
Crowdsourced testing provides an effective way to deal with the problem of Android system fragmentation, as well as the application scenario diversity faced by Android testing. The generation of test requirements is a significant part of crowdsourced testing. However, manually generating crowdsourced testing requirements is tedious and requires the issuers to have domain knowledge of the Android application under test. To solve these problems, we have developed a tool named KARA, short for Knowledge Graph Aided Crowdsourced Requirements Generation for Android Testing. KARA first analyzes the result of automatic testing on the Android application, from which the operation sequences can be obtained. Then, the knowledge graph of the target application is constructed in a pay-as-you-go manner. Finally, KARA utilizes the knowledge graph and the automatic testing result to generate crowdsourced testing requirements with domain knowledge. Experiments show that the test requirements generated by KARA are easy to understand, and that KARA can improve the quality of crowdsourced testing. The demo video can be found at https://youtu.be/kE-dOiekWWM. © 2020 ACM.
10.1145/3395363.3404363

Reinforcement learning based curiosity-driven testing of Android applications

2020 ISSTA 2020 - Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis Conference Paper
Mobile applications play an important role in our daily life, yet it remains a challenge to guarantee their correctness. Model-based and systematic approaches have been applied to Android GUI testing. However, they do not show significant advantages over random approaches because of limitations such as imprecise models and poor scalability. In this paper, we propose Q-testing, a reinforcement-learning-based approach that benefits from both random and model-based approaches to the automated testing of Android applications. Q-testing explores Android apps with a curiosity-driven strategy that utilizes a memory set to record part of the previously visited states and guides the testing towards unfamiliar functionalities. A state comparison module, a neural network trained on plenty of collected samples, is employed to distinguish different states at the granularity of functional scenarios. It determines the reinforcement learning reward in Q-testing and helps the curiosity-driven strategy explore different functionalities efficiently. We conduct experiments on 50 open-source applications where Q-testing outperforms state-of-the-art and state-of-practice Android GUI testing tools in terms of code coverage and fault detection. So far, 22 of our reported faults have been confirmed, among which 7 have been fixed. © 2020 ACM.
10.1145/3395363.3397354
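
The curiosity signal can be sketched as a memory set of visited state vectors that rewards transitions into unfamiliar states; the cosine-similarity comparison and threshold below are placeholders for the paper's trained neural state-comparison module.

import numpy as np

memory = []          # feature vectors of remembered GUI states
THRESHOLD = 0.3      # illustrative dissimilarity threshold

def curiosity_reward(state_vec):
    if not memory:
        memory.append(state_vec)
        return 1.0
    sims = [
        float(np.dot(state_vec, m) / (np.linalg.norm(state_vec) * np.linalg.norm(m)))
        for m in memory
    ]
    if max(sims) < 1.0 - THRESHOLD:   # unfamiliar: remember it, high reward
        memory.append(state_vec)
        return 1.0
    return 0.0                        # familiar territory, no reward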

Making Android apps monkey-friendly

2020 Proceedings - 2020 IEEE/ACM 7th International Conference on Mobile Software Engineering and Systems, MOBILESoft 2020 Conference Paper
Monkey testing is a random testing technique in which a stream of pseudo-random events is automatically fired on the GUI of the application under test, usually for robustness testing or responsiveness analysis. A line of research is dedicated to addressing the limitations of monkey testing for Android apps. However, all existing works try to improve the underlying algorithms or techniques used by the monkey testing tools. In this vision paper, we propose the idea of improving the effectiveness of monkey testing by automatically refactoring the application under test. We provide two sample scenarios in which this idea can be used to address limitations of monkey testing for Android applications. © 2020 ACM.
10.1145/3387905.3388609

Multiple-entry testing of Android applications by constructing activity launching contexts

2020 Proceedings - International Conference on Software Engineering Conference Paper
Existing GUI testing approaches for Android apps usually test apps from a single entry. In this way, marginal activities far away from the default entry are difficult to cover. Marginal activities may fail to be launched because they require a great number of activity transitions or involve complex user operations, leading to uneven coverage of activity components. Besides, since the test space of GUI programs is infinite, it is difficult to test activities under complete launching contexts using single-entry testing approaches. In this paper, we address these issues by constructing activity launching contexts and proposing a multiple-entry testing framework. We perform an inter-procedural, flow-, context- and path-sensitive analysis to build activity launching models and generate complete launching contexts. Through activity exposing and static analysis, we can launch activities directly under various contexts without performing long event sequences on the GUI. Besides, to achieve in-depth exploration, we design an adaptive exploration framework which supports multiple-entry exploration and dynamically assigns weights to entries in each turn. Our approach is implemented in a tool called Fax, with an activity launching strategy Faxla and an exploration strategy Faxex. The experiments on 20 real-world apps show that Faxla can cover 96.4% of activities and successfully launch 60.6% of them, based on which Faxex further achieves a relative 19.7% improvement in method coverage compared with the most popular tool, Monkey. Our tool also behaves well in revealing hidden bugs: Fax can trigger over seven hundred unique crashes, including 180 Errors and 539 Warnings, significantly more than other tools. Among the 46 bugs reported to developers on GitHub, 33 have been fixed so far. © 2020 Association for Computing Machinery.
10.1145/3377811.3380347
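
The multiple-entry idea, starting a marginal activity directly under a synthesized launching context instead of replaying a long GUI event chain, can be sketched with standard adb intent flags; the package, activity, and extras below are hypothetical, and Fax additionally exposes non-exported activities and derives extras by static analysis.

import subprocess

def launch_activity(package, activity, extras=None):
    # `am start -n package/activity` launches the component directly;
    # `--es` passes string extras (am also supports --ei, --ez, ...).
    cmd = ["adb", "shell", "am", "start", "-n", f"{package}/{activity}"]
    for key, value in (extras or {}).items():
        cmd += ["--es", key, value]
    subprocess.run(cmd, check=True)

launch_activity(
    "com.example.notes",                 # hypothetical app
    ".EditNoteActivity",                 # marginal activity far from the entry
    extras={"note_id": "42", "mode": "edit"},
)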

Automated GUI Layout Refactoring to Improve Monkey Testing of Android Applications

2020 Proceedings of RTEST 2020 - 3rd CSI/CPSSI International Symposium on Real-Time and Embedded Systems and Technologies Conference Paper
A line of research in the software testing community is dedicated to proposing effective testing techniques for finding defects in Android applications. Monkey testing is one of the promising techniques, mainly because of its low setup cost, good reusability across different applications, and success in challenging the application under test with corner cases. Despite its benefits, monkey testing suffers from well-known weaknesses, including widget obliviousness and state obliviousness. The former means that the monkey does not utilize specific knowledge about the behavior of the elements in the GUI of the application under test. The latter means that the monkey simply generates random events without taking into consideration the current state of the application and whether those events are helpful in that state, from the point of view of revealing potential faults. As a result, different methods have been presented in the literature to improve monkey testing of Android applications. In this paper, we propose a novel technique to improve the effectiveness of monkey testing by alleviating the widget obliviousness issue. The technique is based on the idea of automated GUI layout refactoring. We have implemented the proposed technique and used it to conduct a case study on a real-world Android application. The results demonstrate that the proposed technique is promising in improving the effectiveness of monkey testing by increasing the interaction of the monkey with the GUI elements that are associated with the more complex functionalities of the application. © 2020 IEEE.
10.1109/RTEST49666.2020.9140106

Configuring Appium for iOS Applications and Test Automation in Multiple Devices

2020 ACM International Conference Proceeding Series Conference Paper
With the ever-expanding reach of mobile technologies, maintaining software quality becomes a challenging job, as a high volume of scenarios and a wide arrangement of features ought to be tested. Today, organizations invest an increasing amount of energy and resources in ensuring that an application is fully tested for the best user experience and optimal performance. Automation in testing can be a great solution in this regard. Though there are few tools for testing iOS applications, the open-source mobile testing tool Appium is one of them. The purpose of this study is to discuss the detailed configuration of Appium for testing iOS applications and to address one of the major limitations of testing iOS applications using Appium: testing on multiple iOS devices using one Mac machine. This will support the iOS mobile industry in improving the quality of the user experience by guiding the step-by-step setup of Appium for testing at a commercial level and making it more cost-effective. © 2020 ACM.
10.1145/3399871.3399883
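
The essence of the multi-device setup is one Appium server per device, with each session pinned to a distinct udid and WebDriverAgent port. A sketch in the Appium 1.x desired-capabilities style follows; the ports, udids, and app path are placeholders.

from appium import webdriver

def ios_session(server_port, udid, wda_port):
    caps = {
        "platformName": "iOS",
        "automationName": "XCUITest",
        "deviceName": "iPhone",
        "udid": udid,                 # pins the session to one physical device
        "wdaLocalPort": wda_port,     # must differ per device on the same Mac
        "app": "/path/to/Example.app",
    }
    return webdriver.Remote(f"http://localhost:{server_port}/wd/hub", caps)

# e.g. with `appium -p 4723` and `appium -p 4725` running in parallel:
d1 = ios_session(4723, "udid-of-device-1", 8100)
d2 = ios_session(4725, "udid-of-device-2", 8101)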

Unsupervised detection of changes in usage-phases of a mobile app

2020 Applied Sciences (Switzerland) Article
Under fierce competition and budget constraints, most mobile apps are launched without sufficient testing. Thus, there exists a great demand for automated app testing. Recent developments in various machine learning techniques have made automated app testing a promising alternative to manual testing. This work proposes novel approaches for one of the core functionalities of automated app testing: the detection of changes in the usage-phases of a mobile app. Because of the flexibility of app development languages and the lack of standards, each mobile app is very different from other apps. Furthermore, the graphical user interfaces for similar functionalities are rarely consistent or similar. Thus, we propose methods for detecting usage-phase changes through object recognition and metrics utilizing graphs and generative models. Contrary to existing change detection methods that require learning models, the proposed methods eliminate the burden of training models. This elimination of training is suitable for mobile app testing, whose typical usage-phase is composed of fewer than 10 screenshots. Our experimental results on commercial mobile apps show promising improvement over the state-of-the-practice method based on SIFT (scale-invariant feature transform). © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
10.3390/app10103656

Human-like UI Automation through Automatic Exploration

2020 ACM International Conference Proceeding Series Conference Paper
Most UI testing tools for mobile games are designed to help us create and run test cases with scripts. However, these scripts must be manually updated for new test cases, which increases the test cost. In this paper, we propose a method to implement human-like UI automation through automatic exploration in mobile games. Our method can automatically explore most UIs by recognizing and operating the UI elements, similar to manual UI testing. First, we design a lightweight convolutional neural network to detect the buttons in the UI image captured from the mobile phone. Next, we build a directed graph model to store the visited UIs during automatic exploration. Finally, according to our exploration strategy, we choose one button from the UI image and send a click action to the mobile phone. Our method obtains over 85% UI and button coverage rates on three popular mobile games. © 2020 ACM.
10.1145/3436286.3436297
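
The exploration strategy over the directed graph model can be sketched as preferring buttons whose outcome has not yet been observed; button detection (the CNN) is abstracted away, and all names are illustrative.

import random

graph = {}   # ui_id -> {button_id: destination ui_id, or None if untried}

def choose_button(ui_id, buttons):
    edges = graph.setdefault(ui_id, {b: None for b in buttons})
    untried = [b for b, dest in edges.items() if dest is None]
    # Prefer buttons never clicked on this screen; otherwise revisit randomly.
    return random.choice(untried) if untried else random.choice(buttons)

def record_transition(src_ui, button, dst_ui):
    # Called after the click, once the new screen has been recognized.
    graph.setdefault(src_ui, {})[button] = dst_ui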

Sentinel: generating GUI tests for sensor leaks in Android and Android wear apps

2020 Software Quality Journal Article
Due to the widespread use of Android devices and apps, it is important to develop tools and techniques to improve app quality and performance. Our work focuses on a problem related to hardware sensors on Android devices: the failure to disable unneeded sensors, which leads to sensor leaks and thus battery drain. We propose the Sentinel testing tool to uncover such leaks. The tool performs static analysis of app code and produces a model which maps GUI events to callback methods that affect sensor behavior. Edges in the model are labeled with symbols representing the acquiring/releasing of sensors and the opening/closing of UI windows. The model is traversed to identify paths that are likely to exhibit sensor leaks during run-time execution, based on two context-free languages over the symbol alphabet. The reported paths are then used to generate test cases. The execution of each test case tracks the run-time behavior of sensors and reports observed leaks. This approach has been applied to both open-source and closed-source regular Android applications, as well as watch faces for Android Wear smartwatches. Our experimental results indicate that Sentinel effectively detects sensor leaks, while focusing the testing efforts on a very small subset of possible GUI event sequences. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
10.1007/s11219-019-09484-z

Automation of selection of a pool of graphical interface regression tests for multi-module information systems

2020 CEUR Workshop Proceedings Conference Paper
This paper considers the use of a regression test selection method for automated testing of the graphical user interface during the development of information systems that consist of a set of modules. It identifies why additional test environments are needed when developing multi-module information systems that use databases. The three most popular approaches to organizing test environments - copying, scaling, and scaling with synthetic data generation - are considered, along with their positive and negative sides in terms of implementation, use, and the resources spent on creating and maintaining them, as well as the reliability of the results obtained when testing models created with these approaches. The paper presents the benefits of checking the quality of complex multi-module information systems at the graphical-user-interface level with various testing methods, in particular regression testing, and the benefits of automating regression testing under resource constraints using various software platforms. It also discusses the advantages of dynamic selection of regression tests for automated testing and gives recommendations for implementing the selection method in both existing and newly started projects. © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

A Technique for Parallel GUI Testing of Android Applications

2020 Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Conference Paper
There is a large need for effective and efficient testing processes and tools for mobile applications, due to their continuous evolution and to the sensitivity of their users to failures. Industry and researchers focus their efforts on the realization of effective, fully automatic testing techniques for mobile applications. Many of the proposed testing techniques lack efficiency because their algorithms cannot be executed in parallel. In particular, active learning testing techniques usually rely on sequential algorithms. In this paper we propose an active learning technique for the fully automatic exploration and testing of Android applications that parallelizes and improves a general algorithm proposed in the literature. The novel parallel algorithm has been implemented in a prototype tool exploiting a component-based architecture and has been experimentally evaluated on 3 open-source Android applications by varying the deployment configuration. The measured results show the feasibility of the proposed technique and an average saving in testing time between 33% (deploying two testing resources) and about 80% (deploying 12 testing resources). © 2020, IFIP International Federation for Information Processing.
10.1007/978-3-030-64881-7_11

Deploying TESTAR to Enable Remote Testing in an Industrial CI Pipeline: A Case-Based Evaluation

2020 Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Conference Paper
Companies are facing constant pressure towards shorter release cycles while still maintaining a high level of quality. Agile development, continuous integration, and testing are commonly used quality assurance techniques applied in industry. Increasing the level of test automation is a key ingredient in addressing short release cycles. Testing at the graphical user interface (GUI) level is challenging to automate, and therefore many companies still do it manually. To help find solutions for better GUI test automation, academics are researching scriptless GUI testing to complement the script-based approach. In order to better match industrial problems with academic results, more academia-industry collaborations for case-based evaluations are needed. This paper describes such an initiative to improve, transfer, and integrate the academic scriptless GUI testing tool TESTAR into the CI pipeline of the Spanish company Prodevelop. The paper describes the steps taken, the outcome, the challenges, and some lessons learned for successful industry-academia collaboration. © 2020, Springer Nature Switzerland AG.
10.1007/978-3-030-61362-4_31

Sequence Mining for Automatic Generation of Software Tests from GUI Event Traces

2020 Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Conference Paper
In today's software industry, systems are constantly changing. Maintaining their quality and preventing failures at controlled cost is a challenge. One way to foster quality is through thorough and systematic testing; therefore, the definition of adequate tests is crucial for saving time, cost, and effort. This paper presents a framework that generates software test cases automatically based on user interaction data. We propose a data-driven software test generation solution that combines frequent sequence mining and Markov chain modeling. We assess the quality of the generated test cases by empirically evaluating their coverage with respect to observed user interactions and code. We also measure the plausibility of the distribution of the events in the generated test sets using the Kullback-Leibler divergence. © 2020, Springer Nature Switzerland AG.
10.1007/978-3-030-62365-4_49
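
The generation step, estimating a Markov chain over GUI events from mined traces and sampling event sequences as tests, can be sketched as follows; the traces and event names are mock data.

from collections import Counter, defaultdict
import random

traces = [  # mined GUI event traces
    ["login", "search", "open_item", "logout"],
    ["login", "search", "search", "open_item", "add_to_cart", "logout"],
]

# Count transitions, with virtual start/end events delimiting each trace.
transitions = defaultdict(Counter)
for t in traces:
    for a, b in zip(["<start>"] + t, t + ["<end>"]):
        transitions[a][b] += 1

def sample_test(max_len=20):
    event, test = "<start>", []
    while len(test) < max_len:
        nxt = transitions[event]
        event = random.choices(list(nxt), weights=nxt.values())[0]
        if event == "<end>":
            break
        test.append(event)
    return test

print(sample_test())  # e.g. ['login', 'search', 'open_item', 'logout']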

Automated Test Selection for Android Apps Based on APK and Activity Classification

2020 IEEE Access Article
Several techniques exist for mobile test automation, from script-based techniques to automated test generation based on GUI models. Most techniques fall short of extensive adoption by practitioners because of the very costly definition (and maintenance) of test cases. We present a novel testing framework for Android apps that allows a developer to write effective test scripts without having to know the implementation details and the user interface of the app under test. The main goal of the framework is to generate adaptive tests that can be executed on a significant number of apps, or different releases of the same app, without manual editing of the tests. The framework consists of: (1) a Test Scripting Language, which allows the tester to write generic test scripts tailored to activity and app categories; (2) a State Graph Modeler, which creates a model of the app's GUI, identifying activities (i.e., screens) and widgets; (3) an app classifier, which determines the type of application under test; (4) an activity classifier, which determines the purpose of each screen; (5) a test adapter, which executes test scripts that are compatible with the specific app and activity, automatically tailoring the test scripts to the classes of the app and the activities under test. We evaluated the components of our testing framework empirically. The classifiers were able to outperform available approaches in the literature. The developed testing framework was able to correctly adapt high-level test cases to 28 out of 32 applications, and to reduce the LOCs of the test scripts by around 90%. We conclude that machine learning can be fruitfully applied to the creation of high-level, adaptive test cases for Android apps. Our framework is modular in nature and allows expansion through the addition of new commands to be executed on the classified apps and activities.
10.1109/ACCESS.2020.3029735

Comparing the effectiveness of capture and replay against automatic input generation for Android graphical user interface testing

2020 Software Testing Verification and Reliability Conference Paper
Exploratory testing and fully automated testing tools represent two viable and cheap alternatives to traditional test-case-based approaches for graphical user interface (GUI) testing of Android apps. The former can be executed with capture and replay tools that directly translate execution scenarios registered by testers into test cases, without requiring preliminary test-case design or advanced programming/testing skills. The latter tools are able to test Android GUIs without tester intervention. Even though these two strategies are widely employed, to the best of our knowledge no empirical investigation has been performed to compare their performance and obtain useful insights for a project manager to establish an effective testing strategy. In this paper, we present two experiments we carried out to compare the effectiveness of exploratory testing approaches using a capture and replay tool (Robotium Recorder) against three freely available automatic testing tools (AndroidRipper, Sapienz, and Google Robo). The first experiment involved 20 computer engineering students who were asked to record testing executions under strict temporal limits and without access to the source code. Their results were slightly better than those of the fully automated tools, but not in a conclusive way. In the second experiment, the same students were asked to improve the achieved testing coverage by exploiting the source code and the coverage obtained in the previous tests, without strict temporal constraints. The results of this second experiment showed that students outperformed the automated tools, especially for long/complex execution scenarios. The obtained findings provide useful indications for deciding on testing strategies that combine manual exploratory testing and automated testing. © 2020 John Wiley & Sons, Ltd.
10.1002/stvr.1754

Functional test generation from UI test scenarios using reinforcement learning for Android applications

2020 Software Testing Verification and Reliability Conference Paper
With the ever-growing Android graphical user interface (GUI) application market, there have been many studies on automated test generation for Android GUI applications. These studies successfully demonstrate how to detect fatal exceptions and achieve high coverage with fully automated test generation engines. However, it is unclear how many GUI functions these engines manage to test. The current best practice for the functional testing of Android GUI applications is to design user interface (UI) test scenarios in a non-technical, human-readable language such as Gherkin and implement Java/Kotlin methods for every statement of each UI test scenario. Writing tests for UI test scenarios is hard, especially when some scenario statements are high-level and declarative, so it is not clear what actions the generated test should perform. We propose the Fully Automated Reinforcement LEArning-Driven specification-based test generator for Android (FARLEAD-Android). FARLEAD-Android first translates the UI test scenario to a GUI-level formal specification as a linear-time temporal logic (LTL) formula. The LTL formula guides the test generation and acts as a specified test oracle. By dynamically executing the application under test (AUT) and monitoring the LTL formula, FARLEAD-Android learns how to produce a witness for the UI test scenario using reinforcement learning (RL). Our evaluation shows that FARLEAD-Android is more effective and achieves higher performance in generating tests for UI test scenarios than three known engines: Random, Monkey, and QBEa. To the best of our knowledge, FARLEAD-Android is the first fully automated mobile GUI testing engine that uses formal specifications. © 2020 John Wiley & Sons, Ltd.
10.1002/stvr.1752

A GA-based approach to automatic test data generation for ASP.NET web applications

2020 IAENG International Journal of Computer Science Article
One of the major challenges in software testing is automatically generating test data that satisfy a specified adequacy criterion. This paper presents a GA-based approach and a supporting tool for data-flow test data generation for ASP.NET web applications. The proposed tool accepts as input the web application under test, instruments it, and performs static analysis to compute the definition-use pairs. The proposed GA conducts its search by constructing new test data from previously generated test data that were evaluated as effective. In this GA, the chromosome is a collection of user interface control objects, where each control is considered a gene. Therefore, novel crossover and mutation operators are developed to manipulate the chromosome, called the block crossover and control-based mutation operators. The proposed GA accepts as input the instrumented version, the list of definition-use pairs to be covered, and information related to the input controls. The tool produces a set of test cases, the set of definition-use pairs covered by each test case, and a list of uncovered definition-use pairs, if any. The paper also presents a case study to illustrate how the tool works. Finally, it presents the results of an empirical evaluation of the effectiveness of the generated test data in exposing web application errors. © International Association of Engineers.
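
The GA encoding described above can be sketched as follows: a chromosome assigns one value per UI control, block crossover swaps a contiguous block of genes, and control-based mutation re-randomizes a control's value from its domain; the control domains are illustrative.

import random

domains = {           # gene -> admissible values for that UI control
    "username": ["", "alice", "a" * 64],
    "age": ["-1", "0", "30", "200"],
    "country": ["US", "DE", "JP"],
}
GENES = list(domains)

def random_chromosome():
    return {g: random.choice(domains[g]) for g in GENES}

def block_crossover(p1, p2):
    i, j = sorted(random.sample(range(len(GENES)), 2))
    child = dict(p1)
    for g in GENES[i:j + 1]:      # copy a contiguous block from the 2nd parent
        child[g] = p2[g]
    return child

def control_mutation(chrom, rate=0.1):
    for g in GENES:
        if random.random() < rate:
            chrom[g] = random.choice(domains[g])
    return chrom

child = control_mutation(block_crossover(random_chromosome(), random_chromosome()))
print(child)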

Scriptless Testing at the GUI Level in an Industrial Setting

2020 Lecture Notes in Business Information Processing Conference Paper
TESTAR is a traversal-based, scriptless tool for test automation at the Graphical User Interface (GUI) level. It differs from existing test approaches in that no test cases need to be defined before testing. Instead, the tests are generated on-the-fly during execution. This paper presents an empirical case study in a realistic industrial context where we compare TESTAR to a manual test approach for a web-based application in the rail sector. Both qualitative and quantitative research methods are used to investigate learnability, effectiveness, efficiency, and satisfaction. The results show that TESTAR was able to detect more faults and achieve higher functional test coverage than the manual test approach in use. As far as efficiency is concerned, the preparation time of both test approaches is identical, but TESTAR can realize test execution without the use of human resources. Finally, TESTAR turns out to be a learnable test approach. As a result of the study described in this paper, TESTAR technology was successfully transferred, and the company will use both test approaches in a complementary way in the future. © 2020, Springer Nature Switzerland AG.
10.1007/978-3-030-50316-1_16

Artificial intelligence in automated system for web-interfaces visual testing

2020 CEUR Workshop Proceedings Conference Paper
In this paper, the authors consider an artificial intelligence technique for visual testing, together with a developed system that is integrated into functional automated test suites and that monitors and analyzes visual changes in the graphical interface of the application under test. The proposed tool is intended to resolve the existing problems of traditional snapshot-based visual testing. Graphical user interface (GUI) testing is a very important step in the quality control of software applications. The GUI is the central node of the application under test, from which all functions are accessed. It is therefore difficult to test programs thoroughly through their graphical interface, especially because GUIs are designed to work with humans, not machines, and are inherently non-static, prone to constant changes caused by functionality upgrades, improved usability, changing requirements, or changed contexts. This complicates the development and maintenance of test cases without resorting to time-consuming and costly manual testing. The proposed automated system for web-interface visual testing uses computer vision technology as an artificial intelligence technique for visual comparison. A comparative analysis is carried out between the developed interface under test (in particular, a web page) and the expected mockup with the locations of the visual elements on the page (for example, an interface provided by the customer). In designing the automated system for web-interface visual testing, the Python and JavaScript programming languages, the TensorFlow library, the Cypress testing framework, and the MySQL database were used. © 2020 for this paper by its authors.
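
The visual-comparison core, diffing a rendered page against its expected mockup and flagging regions that changed beyond a threshold, can be sketched with OpenCV; file names and thresholds are illustrative, and the described system layers computer-vision models on top of such a comparison.

import cv2

def visual_diff(actual_path, expected_path, thresh=30, min_area=100):
    # Assumes both images have the same resolution.
    actual = cv2.imread(actual_path)
    expected = cv2.imread(expected_path)
    diff = cv2.absdiff(cv2.cvtColor(actual, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(expected, cv2.COLOR_BGR2GRAY))
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Return bounding boxes of regions whose pixels differ noticeably.
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]

for region in visual_diff("page.png", "mockup.png"):
    print("visual regression at", region)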

Automated testing in robotic process automation projects

2020 Journal of Software: Evolution and Process Article
Robotic process automation (RPA) has received increasing attention in recent years. It enables task automation by software components, which interact with user interfaces in a similar way to humans. An RPA project life cycle closely resembles that of a software project. However, in certain contexts (e.g., business process outsourcing), a testing environment is not always available, so deploying the robots in the production environment entails high risk. To mitigate this risk, an innovative approach to automatically generating a testing environment and a test suite for an RPA project is presented. The activities of the humans whose processes are to be robotized are monitored, and a UI log is recorded. On one side, the test environment is generated as a fake application that mimics the real environment by leveraging the UI log information. The control flow of the application is governed by an invisible control layer that decides which image to show depending on the interface actions it receives. On the other side, the test case checks whether the robot can reproduce the behaviour of the UI log. Promising results were obtained, and a number of limitations were identified that, once addressed, would allow the approach to be applied in more realistic domains. © 2020 John Wiley & Sons, Ltd.
10.1002/smr.2259
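
A minimal sketch of the described control layer follows: transitions mined from a hypothetical UI log decide which screenshot the fake application shows next, and the test case checks that the robot reproduces the logged behaviour. Names and log format are invented for illustration:

```python
from collections import defaultdict

# Hypothetical UI log: (screen_before, action, screen_after) triples, where
# the screens are screenshot identifiers captured while monitoring the worker.
UI_LOG = [
    ("login.png", "click_user_field", "login_focus.png"),
    ("login_focus.png", "type_credentials", "login_filled.png"),
    ("login_filled.png", "click_submit", "dashboard.png"),
]

def build_fake_app(log):
    """Invisible control layer: decides which image to show next for each
    (screen, action) pair observed in the log."""
    transitions = defaultdict(dict)
    for before, action, after in log:
        transitions[before][action] = after
    return transitions

def replay_robot(transitions, screen, actions):
    """Test case: check the robot's action sequence reproduces the UI log."""
    for action in actions:
        if action not in transitions.get(screen, {}):
            return ("fail", screen, action)   # robot deviated from logged behaviour
        screen = transitions[screen][action]
    return ("pass", screen, None)

fake_app = build_fake_app(UI_LOG)
print(replay_robot(fake_app, "login.png",
                   ["click_user_field", "type_credentials", "click_submit"]))
```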

TEGDroid: Test case generation approach for android apps considering context and GUI events

2020 International Journal on Advanced Science, Engineering and Information Technology Article
The advancement of mobile technologies has led to the production of mobile devices (e.g., smartphones) with rich, innovative features. This has enabled the development of mobile applications that offer users advanced and highly localized context-aware content. People's growing dependence on mobile applications for various computational needs raises significant concerns about the quality of those applications. To build high-quality and more reliable applications, effective testing techniques are needed. Most existing testing techniques focus on GUI events only, without sufficient support for context events, which makes it difficult to identify defects in behaviour that can be influenced by the context in which an application runs. This paper presents an approach named TEGDroid for generating test cases for Android apps considering both context and GUI events. The GUI and context events are identified through static analysis of the bytecode and analysis of the app's permissions from the XML manifest file. An experiment was performed on real-world mobile apps to evaluate TEGDroid. Our experimental results show that TEGDroid is effective in identifying context events and achieved 65%-91% coverage across the eight selected applications. To evaluate the fault detection capability of the approach, mutation testing was performed by introducing mutants into the applications. The results of the mutation analysis show that 100% of the mutants were killed, indicating that TEGDroid has the capability to detect faults in mobile apps. © 2020 Insight Society.
10.18517/ijaseit.10.1.10194
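
The permission-based identification of context events might look roughly like the sketch below, which reads <uses-permission> entries from an AndroidManifest.xml and maps them to candidate context events. The permission-to-event mapping is hypothetical, not the paper's actual table:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Android permissions to the context events they imply.
PERMISSION_EVENTS = {
    "android.permission.ACCESS_FINE_LOCATION": ["location_change"],
    "android.permission.INTERNET": ["connectivity_loss", "connectivity_restore"],
    "android.permission.CAMERA": ["camera_availability_change"],
}
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def context_events(manifest_root):
    """Map <uses-permission> entries to candidate context events to inject
    alongside GUI events."""
    events = []
    for perm in manifest_root.iter("uses-permission"):
        name = perm.get(ANDROID_NS + "name")
        events.extend(PERMISSION_EVENTS.get(name, []))
    return events

MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.CAMERA"/>
</manifest>"""
print(context_events(ET.fromstring(MANIFEST)))
```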

Maintainability of Automatic Acceptance Tests for Web Applications—A Case Study Comparing Two Approaches to Organizing Code of Test Cases

2020 Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Conference Paper
[Context] Agile software development calls for test automation, since it is critical for continuous development and delivery. However, automation is a challenging task, especially for tests of the user interface, which can be very expensive. [Problem] There are two extreme approaches to structuring the code of test cases for web applications, i.e., the linear scripting technique and the keyword-driven scripting technique employing the page object pattern. The goal of this research is to compare them with a focus on maintainability. [Method] We develop and maintain two automatic test suites implementing the same test cases for a mature open-source system using these two approaches. For each approach, we measure the size of the testing codebase and the number of lines of code that need to be modified to keep the test suites passing and valid through five releases of the system. [Results] We observed that the total number of physical lines was higher for the keyword-driven approach than for the linear-scripting one; however, the number of programmatic lines of code was smaller for the former. The number of lines of code that had to be modified to maintain the tests was lower for the keyword-driven test suite than for the linear-scripting one. We found the linear-scripting technique more difficult to maintain because the scripts consist only of low-level code directly interacting with a web browser, making it hard to understand the purpose and broader context of the interaction they implement. [Conclusions] We conclude that test suites created using the keyword-driven approach are easier to maintain and more suitable for most projects. However, the results show that the linear-scripting approach could be considered a less expensive alternative for small projects that are not likely to be frequently modified in the future. © 2020, Springer Nature Switzerland AG.
10.1007/978-3-030-38919-2_37
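
The contrast between the two approaches is easiest to see side by side. The sketch below uses Selenium WebDriver with hypothetical URLs and locators; in the keyword-driven variant, a GUI change only requires updating the page object, not every test:

```python
from selenium.webdriver.common.by import By

# --- Linear scripting: low-level browser calls inline in the test ---
def test_login_linear(driver):
    driver.get("https://example.org/login")          # hypothetical URL/locators
    driver.find_element(By.ID, "user").send_keys("alice")
    driver.find_element(By.ID, "pass").send_keys("secret")
    driver.find_element(By.ID, "submit").click()
    assert "Dashboard" in driver.title

# --- Keyword-driven with a page object: intent-level keyword, locators in one place ---
class LoginPage:
    USER, PASS, SUBMIT = (By.ID, "user"), (By.ID, "pass"), (By.ID, "submit")

    def __init__(self, driver):
        self.driver = driver

    def login(self, user, password):                 # the "keyword"
        self.driver.find_element(*self.USER).send_keys(user)
        self.driver.find_element(*self.PASS).send_keys(password)
        self.driver.find_element(*self.SUBMIT).click()

def test_login_pageobject(driver):
    driver.get("https://example.org/login")
    LoginPage(driver).login("alice", "secret")       # GUI changes touch only LoginPage
    assert "Dashboard" in driver.title
```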

Machine learning-based dynamic analysis of Android apps with improved code coverage

2019 Eurasip Journal on Information Security Article
This paper investigates the impact of code coverage on machine learning-based dynamic analysis of Android malware. In order to maximize the code coverage, dynamic analysis on Android typically requires the generation of events to trigger the user interface and maximize the discovery of the run-time behavioral features. The commonly used event generation approach in most existing Android dynamic analysis systems is the random-based approach implemented with the Monkey tool that comes with the Android SDK. Monkey is utilized in popular dynamic analysis platforms like AASandbox, vetDroid, MobileSandbox, TraceDroid, Andrubis, ANANAS, DynaLog, and HADM. In this paper, we propose and investigate approaches based on stateful event generation and compare their code coverage capabilities with the state-of-the-practice random-based Monkey approach. The two proposed approaches are the state-based method (implemented with DroidBot) and a hybrid approach that combines the state-based and random-based methods. We compare the three different input generation methods on real devices, in terms of their ability to log dynamic behavior features and the impact on various machine learning algorithms that utilize the behavioral features for malware detection. Experiments performed using 17,444 applications show that overall, the proposed methods provide much better code coverage which in turn leads to more accurate machine learning-based malware detection compared to the state-of-the-art approach. © 2019, The Author(s).
10.1186/s13635-019-0087-1
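
A hybrid of the two event generation strategies can be sketched as follows: with some probability fire a Monkey-style random event, otherwise act on a GUI state model as a state-based tool such as DroidBot would. The app model and probability are toy assumptions:

```python
import random

# Toy app model: screens mapping action labels to follow-up screens; it stands
# in for the device under test.
APP = {"A": {"tap1": "B", "tap2": "A"},
       "B": {"back": "A", "tap3": "C"},
       "C": {"back": "B"}}

def hybrid_explore(steps=200, p_random=0.3, seed=0):
    """With probability p_random fire a random event (the Monkey-style
    component); otherwise prefer actions reaching unseen screens (the
    state-based component)."""
    rng, screen, visited = random.Random(seed), "A", {"A"}
    for _ in range(steps):
        actions = APP[screen]
        if rng.random() < p_random:
            action = rng.choice(list(actions))            # random-based
        else:
            unseen = [a for a, t in actions.items() if t not in visited]
            action = rng.choice(unseen or list(actions))  # state-based
        screen = actions[action]
        visited.add(screen)
    return visited

print(hybrid_explore())   # screens covered by the hybrid strategy
```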

Machine Learning and Constraint Solving for Automated Form Testing

2019 Proceedings - International Symposium on Software Reliability Engineering, ISSRE Conference Paper
In recent years there has been a focus on the automatic generation of test cases using white-box testing techniques; however, the same cannot be said for the generation of test cases at the system level from natural-language system requirements. White-box techniques include the use of constraint solvers for the automatic generation of test inputs; the use of control-flow graphs generated from code; and the use of path generation and symbolic execution to generate test inputs and test for path feasibility. Techniques such as boundary value analysis (BVA) may also be used to generate stronger test suites. For black-box testing, however, we rely on specifications or implicit requirements and spend considerable time and effort designing and executing test cases. This paper presents an approach that leverages natural language processing and machine learning techniques to capture black-box system behavior in the form of constraints. Constraint solvers are then used to generate test cases using BVA and equivalence class partitioning. We also conduct a proof of concept that applies this approach to a simplified task management application and an enterprise job-recruiting application. © 2019 IEEE.
10.1109/ISSRE.2019.00030
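
As a rough illustration of the BVA step, the sketch below expands hypothetical numeric field constraints (as might be extracted from requirements) into boundary-value test cases with an expected-validity oracle; the paper's pipeline additionally uses NLP, machine learning, and constraint solvers:

```python
from itertools import product

# Hypothetical field constraints extracted from requirements text, e.g.
# "age must be between 18 and 65" -> ("age", 18, 65).
FIELDS = [("age", 18, 65), ("team_size", 1, 10)]

def boundary_values(lo, hi):
    """Classic BVA points: just outside, on, and just inside each boundary,
    plus a nominal value."""
    return sorted({lo - 1, lo, lo + 1, (lo + hi) // 2, hi - 1, hi, hi + 1})

def bva_test_cases(fields):
    names = [f[0] for f in fields]
    value_sets = [boundary_values(lo, hi) for _, lo, hi in fields]
    for combo in product(*value_sets):
        values = dict(zip(names, combo))
        expected = all(lo <= values[n] <= hi for n, lo, hi in fields)
        yield values, expected    # expected validity serves as the test oracle

for case in list(bva_test_cases(FIELDS))[:3]:
    print(case)
```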

Textout: Detecting Text-Layout Bugs in Mobile Apps via Visualization-Oriented Learning

2019 Proceedings - International Symposium on Software Reliability Engineering, ISSRE Conference Paper
Layout bugs commonly exist in mobile apps. Due to the fragmentation issues of smartphones, a layout bug may occur only on particular versions of smartphones. It is quite challenging for state-of-the-art commercial automated testing platforms to detect such bugs, although they can test an app on thousands of different smartphones in parallel. The main reason is that typical layout bugs neither crash an app nor generate any error messages. In this paper, we present our work for detecting text-layout bugs, which account for a large portion of layout bugs. We model text-layout bug detection as a classification problem. This then allows us to address it with sophisticated image processing and machine learning techniques. To this end, we propose an approach which we call Textout. Textout takes screenshots as its input and adopts a specifically-tailored text detection method and a convolutional neural network (CNN) classifier to perform automatic text-layout bug detection. We collect 33,102 text-region images as our training dataset and verify the effectiveness of our tool with 1,481 text-region images collected from real-world apps. Textout achieves an AUC (area under the curve) of 0.956 on the test dataset and shows an acceptable overhead. The dataset is released as open source for follow-up research. © 2019 IEEE.
10.1109/ISSRE.2019.00032
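
A minimal stand-in for the described classifier, assuming fixed-size text-region crops and using tf.keras; the paper's actual architecture, input size, and preprocessing are not reproduced here:

```python
import tensorflow as tf

def build_textlayout_classifier(height=64, width=128):
    """Binary classifier over text-region crops: buggy layout vs. clean."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(height, width, 3)),
        tf.keras.layers.Rescaling(1.0 / 255),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

model = build_textlayout_classifier()
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])  # AUC, the metric the paper reports
```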

Systematically Testing and Diagnosing Responsiveness for Android Apps

2019 Proceedings - 2019 IEEE International Conference on Software Maintenance and Evolution, ICSME 2019 Conference Paper
App responsiveness is the most intuitive interpretation of app performance from the user's perspective. Traditional performance profilers focus on only one kind of program activity (e.g., CPU profiling), whereas the cause of slow responsiveness is diverse, or even the joint effect of multiple kinds. Moreover, test configurations such as device hardware and wireless connectivity can have a dramatic impact on particular program activities and indirectly affect app responsiveness, and conventional mobile testing lacks mechanisms to reveal such configuration-sensitive bugs. In this paper, we propose AppSPIN, a tool to automatically diagnose app responsiveness bugs and systematically explore configuration-sensitive bugs. AppSPIN instruments the app to collect program events and UI responsiveness. The instrumented app is exercised with automated monkey testers, and AppSPIN correlates excessive and lengthy program events with the bad responsiveness detected at runtime. The diagnosis process also synthesizes the app's major resource bottleneck. After one test run, AppSPIN automatically alters the test configuration with respect to the most bottlenecked resource to further explore responsiveness bugs that occur only under particular test configurations. Our preliminary experiments with 30 real-world apps show that AppSPIN can detect 123 responsiveness bugs and successfully diagnose the cause in 87% of cases, within an average of 15 minutes of test time. With altered test configurations, AppSPIN also uncovers a notable number of new bugs within four extra test runs. © 2019 IEEE.
10.1109/ICSME.2019.00077
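
The correlation step can be approximated as overlapping lengthy program events with windows of bad responsiveness. This is a simplified sketch with invented event names and thresholds, not AppSPIN's actual diagnosis algorithm:

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str       # program activity, e.g. "db_query", "bitmap_decode"
    start: float    # seconds
    end: float

def blame_events(events, lag_windows, min_overlap=0.05):
    """Correlate program events with windows of bad UI responsiveness: an
    event is suspicious if it overlaps a lag window for at least min_overlap
    seconds."""
    suspects = []
    for lo, hi in lag_windows:
        for ev in events:
            overlap = min(ev.end, hi) - max(ev.start, lo)
            if overlap >= min_overlap:
                suspects.append((ev.name, (lo, hi), round(overlap, 3)))
    return suspects

events = [Event("db_query", 1.0, 1.8), Event("render", 1.9, 1.95)]
print(blame_events(events, [(1.2, 1.7)]))   # db_query overlaps the lag window
```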

Fragility of layout-based and visual gui test scripts: An assessment study on a hybrid mobile application

2019 A-TEST 2019 - Proceedings of the 10th ACM SIGSOFT International Workshop on Automating TEST Case Design, Selection, and Evaluation, co-located with ESEC/FSE 2019 Conference Paper
Although different approaches exist for automated GUI testing of hybrid mobile applications, the practice appears not to be commonly adopted by developers. A possible reason for this low diffusion is the fragility of the techniques, i.e., the frequent need to maintain test cases when the GUI of the app changes. In this paper, we assess the maintenance needed by test cases for a hybrid mobile app and the related fragility causes. We evaluated a small test suite with a layout-based testing tool (Appium) and a visual one (EyeAutomate) and observed the changes needed by the tests during co-evolution with the GUI of the app. We found that 20% of the layout-based test methods and 30% of the visual test methods had to be modified at least once, and that each release induced fragilities in 3-4% of the test methods. The fragility of GUI tests can induce considerable maintenance effort in the test suites of large applications. Several principal causes of fragility were identified for the tested hybrid application, and guidelines for developers are derived from them. © 2019 ACM. All rights reserved.
10.1145/3340433.3342824

Event trace reduction for effective bug replay of Android apps via differential GUI state analysis

2019 ESEC/FSE 2019 - Proceedings of the 2019 27th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering Conference Paper
Existing Android testing tools, such as Monkey, generate a large quantity and a wide variety of user events to expose latent GUI bugs in Android apps. However, even if a bug is found, a majority of the events thus generated are often redundant and bug-irrelevant. In addition, it is also time-consuming for developers to localize and replay the bug given a long and tedious event sequence (trace). This paper presents ECHO, an event trace reduction tool for effective bug replay by using a new differential GUI state analysis. Given a sequence of events (trace), ECHO aims at removing bug-irrelevant events by exploiting the differential behavior between the GUI states collected when their corresponding events are triggered. During dynamic testing, ECHO injects at most one lightweight inspection event after every event to collect its corresponding GUI state. A new adaptive model is proposed to selectively inject inspection events based on sliding windows to differentiate the GUI states on-the-fly in a single testing process. The experimental results show that ECHO improves the effectiveness of bug replay by removing 85.11% redundant events on average while also revealing the same bugs as those detected when full event sequences are used. © 2019 ACM.
10.1145/3338906.3341183
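
Stripped of the adaptive sliding-window machinery, the core differential idea reduces to dropping events that leave the GUI state unchanged, as in this toy sketch with invented event and state names:

```python
def reduce_trace(events, states):
    """Differential GUI-state reduction: drop events whose GUI state after
    execution equals the state before it (bug-irrelevant no-ops). `states`
    has len(events) + 1 entries, collected via inspection events."""
    return [ev for ev, before, after in zip(events, states, states[1:])
            if before != after]

events = ["tap_menu", "scroll_noop", "tap_item", "rotate_noop", "tap_crash"]
states = ["s0", "s1", "s1", "s2", "s2", "CRASH"]
print(reduce_trace(events, states))   # -> ['tap_menu', 'tap_item', 'tap_crash']
```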

Improving random GUI testing with image-based widget detection

2019 ISSTA 2019 - Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis Conference Paper
Graphical User Interfaces (GUIs) are amongst the most common user interfaces, enabling interaction with applications through mouse movements and key presses. Tools for automated testing of programs through their GUI exist; however, they usually rely on operating-system- or framework-specific knowledge to interact with an application. Due to frequent operating system updates, which can remove required information, and the large variety of GUI frameworks using unique underlying data structures, such tools rapidly become obsolete. Consequently, supporting many frameworks and operating systems is impractical for an automated GUI test generation tool. We propose a technique for improving GUI testing by automatically identifying GUI widgets in screenshots using machine learning techniques. As training data, we generate randomized GUIs from which widget information is extracted automatically. The resulting model provides guidance to GUI testing tools in environments that are not currently supported, by deriving GUI widget information from screenshots only. In our experiments, we found that identifying GUI widgets in screenshots and using this information to guide random testing achieved significantly higher branch coverage in 18 of 20 applications, with an average increase of 42.5% compared to conventional random testing. © 2019 Association for Computing Machinery.
10.1145/3293882.3330551
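
A sketch of how such a model's output could guide random testing: sample a detected widget and act inside its bounding box instead of clicking uniformly at random. The detector stub and widget classes are hypothetical:

```python
import random

def detect_widgets(screenshot):
    """Stub for a trained detector: returns bounding boxes with predicted
    widget classes for a screenshot (here, fixed toy output)."""
    return [{"class": "button",     "box": (10, 10, 80, 40)},
            {"class": "text_field", "box": (10, 60, 200, 90)}]

def guided_random_action(screenshot, rng=random.Random(0)):
    """Pick a detected widget and click inside its box; type into text fields.
    Falls back to pure random clicks when nothing is detected."""
    widgets = detect_widgets(screenshot)
    if not widgets:
        return ("click", rng.randrange(800), rng.randrange(600))
    w = rng.choice(widgets)
    x1, y1, x2, y2 = w["box"]
    x, y = rng.randrange(x1, x2), rng.randrange(y1, y2)
    return ("type", x, y, "hello") if w["class"] == "text_field" else ("click", x, y)

print(guided_random_action(screenshot=None))
```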

Learning user interface element interactions

2019 ISSTA 2019 - Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis Conference Paper
When generating tests for graphical user interfaces, one central problem is to identify how individual UI elements can be interacted with: clicking, long- or right-clicking, swiping, dragging, typing, or more. We present an approach based on reinforcement learning that automatically learns which interactions can be used for which elements, and uses this information to guide test generation. We model the problem as an instance of the multi-armed bandit (MAB) problem from probability theory, and show how its traditional solutions work for test generation, both with and without relying on previous knowledge. © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
10.1145/3293882.3330569
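
A standard UCB1 solution to the MAB formulation might look like the sketch below, with one learner per element (or element class) and a reward of 1 when an interaction changes the GUI state. The interaction set and reward signal are assumptions, not the paper's exact setup:

```python
import math, random

class UCBActionLearner:
    """UCB1 bandit over interactions for one UI element (class): learn which
    interactions tend to succeed, i.e. change the GUI state."""
    def __init__(self, interactions=("click", "long_click", "swipe", "type")):
        self.counts = {a: 0 for a in interactions}
        self.rewards = {a: 0.0 for a in interactions}
        self.total = 0

    def choose(self):
        self.total += 1
        for a, n in self.counts.items():
            if n == 0:
                return a                      # try every arm once first
        return max(self.counts, key=lambda a:
                   self.rewards[a] / self.counts[a]
                   + math.sqrt(2 * math.log(self.total) / self.counts[a]))

    def record(self, action, state_changed):
        self.counts[action] += 1
        self.rewards[action] += 1.0 if state_changed else 0.0

# Toy environment: a button-like element responds to clicks 90% of the time,
# rarely to anything else.
rng = random.Random(1)
learner = UCBActionLearner()
for _ in range(200):
    a = learner.choose()
    learner.record(a, rng.random() < {"click": 0.9}.get(a, 0.05))
print(max(learner.counts, key=learner.counts.get))   # -> 'click'
```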

Understanding ineffective events and reducing test sequences for android applications

2019 Proceedings - 2019 13th International Symposium on Theoretical Aspects of Software Engineering, TASE 2019 Conference Paper
Monkey, which is integrated with the Android system, has become the most widely used test input generation tool owing to its simplicity, effectiveness, and good compatibility. However, Monkey operates on screen coordinates and is oblivious to widgets and GUI states, which results in a great many ineffective events that contribute nothing to the test. To address these drawbacks, this paper parses the events of 200 test sequences generated by Monkey into human-readable scripts and manually investigates the effects of these events. We find three types of patterns in the ineffective events, including no-ops and single and combined effect-free events, and summarize them into ten rules for sequence reduction. We then implement a tool, CHARD, to match these patterns in real-world traces and prune the redundant events. An evaluation on 923 traces from various apps covering 16 categories shows that CHARD can process 1,000 events in a few seconds and identifies 41.3% of events as ineffective. Meanwhile, the reduced sequence retains the same functionality as the original one in that it triggers the same behaviors. Our work can be applied to lessen the diagnosis effort for record-and-replay, and as a preprocessing step for other work on analyzing sequences. For instance, CHARD removed 72.6% of ineffective events and saved 67.6% of the time of delta debugging in our experiments. © 2019 IEEE.
10.1109/TASE.2019.00012
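
Two of the described pattern types (no-ops and effect-free combinations) can be captured with simple rewrite rules, as in this toy sketch; the rule sets here are invented, whereas the paper derives ten rules from its manual investigation:

```python
# Hypothetical rules: single events with no effect, and adjacent event pairs
# that cancel each other out.
NO_OPS = {"tap_outside", "move_without_release"}
CANCELLING_PAIRS = {("open_menu", "press_back"), ("scroll_down", "scroll_up")}

def reduce_sequence(events):
    """Drop no-op events, then collapse adjacent cancelling pairs, repeating
    until the sequence is stable."""
    events = [e for e in events if e not in NO_OPS]
    changed = True
    while changed:
        changed, out, i = False, [], 0
        while i < len(events):
            if i + 1 < len(events) and (events[i], events[i+1]) in CANCELLING_PAIRS:
                i += 2                      # remove the effect-free combination
                changed = True
            else:
                out.append(events[i])
                i += 1
        events = out
    return events

trace = ["tap_outside", "open_menu", "press_back", "tap_login",
         "scroll_down", "scroll_up", "tap_submit"]
print(reduce_sequence(trace))   # -> ['tap_login', 'tap_submit']
```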