Test exploratory testing!

Exploratory testing, which was promoted from the early 1990s onward by the Context-Driven School, has become a widely used term. However, beyond the buzzword, the methodology is sometimes misunderstood.
How should it be defined and implemented? What are its advantages and disadvantages? When should it be used?

Blog Henix
Mar 30, 2022

The often-mistaken idea of exploratory testing

During the last Squash Users Club, we presented some features we could add to the Squash suite (the presentation is available here). Among them, we talked about supporting exploratory test management: defining the charter, planning the test session, recording its results and reporting on it. The feedback was very mixed: some attendees felt that exploratory testing was a return to testing “in a hurry”. But no, exploratory testing is not chaotic, ad hoc or mere monkey testing (even if the latter is part of the exploratory tester’s toolbox). Exploratory tests are structured tests that contribute to the improvement of the product. This post tries to explain what exploratory testing is, how to organize it, and what its benefits and limitations are.

What is exploratory testing

Exploratory testing consists of discovering the behavior of the application, designing the next tests to run based on that behavior, and executing them.

There is a permanent loop between discovering the application’s behavior, designing the next test, and executing it.

A personal example

I was working on an ERP. A developer on the team was tasked with reimplementing the page that displays the logs of all the recently executed crons (background tasks). I knew that, given the volume of data, there was a risk of performance problems, so I decided to do some exploratory testing.

I discover the page. I know what has been specified: a specification that roughly defines the layout of the page and the information to be displayed. I check that the page respects it.

I click on a cron in the list on the left. Its logs are displayed on the right.

I wonder what happens to a cron that generates a lot of logs. I select one of them. The logs are well displayed: scrollbar present, no problem, I can read everything.

But while scrolling through the table to find this cron, I hear my laptop’s fan spin up. This intrigues me, so I test the browser load: I bring up the Windows resource monitor and scroll through the list. Chrome consumes a lot of CPU, and the load drops a few seconds after I stop scrolling.

I find this suspicious, so I test a long scroll by going through all the crons of the morning. After about 40 seconds of scrolling, JavaScript crashes with an out-of-memory error. I record the anomaly.

I resume the test by analyzing the sorting, the filtering… and their performance. I find no other problems.

This real example is simple: the tests designed were quick to perform, and the session was short (about 20 minutes in all) and solo. But it shows the cycle of behavior discovery → next test design → test execution → behavior discovery… which is the principle of exploratory testing.
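As a side note, once such a performance issue has been found by exploring, the check can be scripted so it does not have to be redone by hand. Here is a minimal sketch, assuming Selenium with Chrome and the browser’s non-standard performance.memory API; the URL is a placeholder, not the actual application:

```python
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com/cron-logs")  # placeholder URL

# Scroll for a while and watch the JS heap grow; the manual test crashed after ~40 s.
for step in range(200):
    driver.execute_script("window.scrollBy(0, 1000);")
    heap = driver.execute_script(
        "return performance.memory ? performance.memory.usedJSHeapSize : null"
    )
    if heap is not None:
        print(f"scroll step {step}: JS heap ≈ {heap / 1024 / 1024:.0f} MB")
    time.sleep(0.2)

driver.quit()
```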

How to do exploratory testing

A frequently used way to organize exploratory testing is to work in sessions.

Definition of the charter

Before the session, the test team leader defines a charter for the session. This charter specifies:

  • the purpose of the session: a sentence summarizing the mission of the people participating in the session
  • the date and time slot of the session
  • the people participating
  • the scope of the session
    - What are the features and/or non-functional requirements to be tested
    - Are there any specific points to be tested? If so, define them in the form of a checklist.
    - Are there any points that are outside the scope of the session and therefore not to be looked at?
  • testing tactics: how should the tests be distributed among the participants? Each participant can play a different persona, or take a different usage workflow and study its variations. Some teams use the Six Thinking Hats method.
  • the test environment: which version of the SUT is deployed on which instance? which browsers to use? on which OS?

It is possible, and recommended, to add any information that can help in the design of the tests during the session: pointers to sources of information (user documentation, specifications or user stories, ISO or RFC standards, etc.), anomalies found in the past that affect the scope of the session, feedback from developers that certain functions are not working properly.

Sample Charter

Friday 17/07 morning session 8am-12:30pm.

Objective:
Test the app on different browsers to verify internationalization mechanisms.

Scope:
We will work from the requirements. As they are short, I have copied them here.
As a reminder, the only locales currently implemented are listed in the requirement UI_L10N: The software must support the following locales: en-US, en-GB, es-ES and fr-FR.

You must test:
* the offline pages, whose behavior is defined in the requirement UI_I18N_OFF_LINE: For out of session pages (login, password recovery, credits…), the chosen locale must be the implemented locale that has the highest priority in the browser’s prioritized list of locales. If the browser’s list does not contain any locale supported by the application, the application must use en-US.
* the pages in session, whose behavior is defined in the requirement UI_I18N_SESSION: When the user is logged into the application, the texts must be in the locale defined by the user’s preferences.

You must check:
* the language
* the amount format
* the time format
* the date format
* the first day of the week
* the first week of the year

Each of the supported locales should have been tested at least once both in and out of session.

Feel free to check other things depending on the locale or language. We recently fixed some spacing issues before punctuation (French uses a space before some punctuation marks, English does not) in dynamically generated messages; there is probably more to discover along the way.

Participants:
The whole team. Mathilde (the PO) is there as support in case of questions.

Dispatch:
Andrew: Chrome + Firefox in English GB and US
Arnold: Chrome + Edge in Spanish
Léa: Safari in all languages (try to check that locale changes are ok)
Théo: Chrome in all languages (focus on admin pages)
Me: Chrome + Firefox + Edge in French and English GB/US

Test environment:
Tests to be done on the P3 instance (check that it is the end of sprint 19 version).
Test with Chrome latest, Firefox latest, Edge latest and Safari latest.
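To make requirement UI_I18N_OFF_LINE concrete, here is a minimal sketch of the locale negotiation it describes, assuming the browser’s preferences arrive as an ordered list of locale tags; the function name and data are illustrative, not the application’s actual code:

```python
SUPPORTED_LOCALES = ["en-US", "en-GB", "es-ES", "fr-FR"]  # from requirement UI_L10N
DEFAULT_LOCALE = "en-US"                                  # fallback required by UI_I18N_OFF_LINE

def negotiate_offline_locale(browser_locales):
    """Return the highest-priority browser locale that the application implements."""
    supported = {loc.lower(): loc for loc in SUPPORTED_LOCALES}
    for candidate in browser_locales:
        match = supported.get(candidate.lower())
        if match:
            return match
    return DEFAULT_LOCALE

# Quick checks a tester might keep next to the session notes:
assert negotiate_offline_locale(["fr-FR", "en-GB"]) == "fr-FR"
assert negotiate_offline_locale(["de-DE", "it-IT"]) == "en-US"
```

Writing the expected behavior down this way also gives the session a clear oracle: for any browser configuration, the tester knows which locale the offline pages should display.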

During the session

The charter is communicated to the participants before the session. However, it is often useful for the test team leader to give a briefing at the beginning of the session to make sure everyone understands its purpose and to answer any questions. Some teams define the testing tactics during this briefing (rather than having the leader define them in the charter).

It is preferable for the team to be together during the session, either in the same room or on a chat channel. This way, everyone can share their progress: “I think I’ll check that the units are in imperial on the US version, has anyone already done it?” “I have this error message in the JS console, does it tell you anything?”…

In order to be able to exploit the results of the session, each participant must record a description of what they are testing, with the appropriate level of detail. It is not necessary to systematically detail every action performed, otherwise the tester spends most of the time writing down what is being tested instead of testing. The goal is to have an idea of what was tested and what was left out. In some cases, the precise details of the actions will need to be reported (e.g. if the testing tactic is to scan a subset of the combinatorial possibilities of a feature, the details of the combinations tested will need to be recorded).
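For that combinatorial case, the tested subset can even be generated and logged by a short script so that the notes are exact. A rough sketch, with invented parameter names purely for illustration:

```python
import itertools
import random

# Hypothetical test parameters of the feature under exploration (illustrative only).
browsers = ["Chrome", "Firefox", "Edge", "Safari"]
locales = ["en-US", "en-GB", "es-ES", "fr-FR"]
logged_in = [True, False]

all_combinations = list(itertools.product(browsers, locales, logged_in))
random.seed(19)  # fixed seed so the selection can be reproduced from the notes
tested_subset = random.sample(all_combinations, k=10)

for browser, locale, in_session in tested_subset:
    state = "in session" if in_session else "out of session"
    print(f"tested: {browser} / {locale} / {state}")
```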

When anomalies are found, they are qualified and recorded.

It is recommended that a person responsible for the features is available during the session (e.g. the Product Owner for teams working in Scrum). Sometimes the test team sees “weird” things and does not know if they should be considered as anomalies. If such a person is not available, the question can be noted in the session report to be asked later by the tester or recorded in the bug tracking system for projects that accept such practices.

After the session

After the session, the team can do a quick review and build an overview of which parts of the software were correct, which parts had problems, what types of problems were found, which testing techniques uncovered them, what other techniques could have been used… This will help design the tests for the next session.

It may happen that the team thinks that the session did not cover enough use cases because the scope was too large for the time allocated, because some features were found to be defective and should be tested more… Also, when reviewing the notes from the session, the test team leader may feel that some points were not explored. If scheduling and priorities allow, a second session can be arranged.

The questions recorded during the session should be managed as soon as the people who can answer them are available.

The notes should be archived. They can be useful for future analysis: a newly reported bug is on a workflow that I think was tested two months ago, so I can re-read the notes to check and quickly determine that it was not present in the previous version.

Many teams collect metrics (e.g. number of anomalies found per hour of testing, number of anomalies per severity…). These metrics can be used to provide management with a vision of the efficiency, or inefficiency, of exploratory testing. They can also be used to evaluate the evolution of the software quality, but this is often difficult: does the number of bugs found per hour increase because the quality deteriorates, because the testers are more efficient at finding anomalies or because the last sessions tested more risky features?
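A minimal sketch of how such session metrics might be computed; the figures and field names are invented for illustration:

```python
from collections import Counter

# Hypothetical anomalies recorded during a 4-hour session.
anomalies = [
    {"id": "BUG-101", "severity": "major"},
    {"id": "BUG-102", "severity": "minor"},
    {"id": "BUG-103", "severity": "minor"},
    {"id": "BUG-104", "severity": "critical"},
]
session_hours = 4.0

per_hour = len(anomalies) / session_hours
per_severity = Counter(a["severity"] for a in anomalies)

print(f"anomalies found per hour of testing: {per_hour:.1f}")
print(f"anomalies per severity: {dict(per_severity)}")
```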

The benefits of exploratory testing

After an initial phase where a tester who has never done exploratory testing feels a bit lost, I have always seen test teams very motivated by this type of activity: first, the intellectual challenge is much greater than acting as a human robot replaying the same scenario for the nth time; second, testers feel empowered by the fact that they are trusted to design the tests themselves.

Exploratory testing allows a much wider range of use cases to be covered than if the same tests based on specific scenarios are repeated for each testing session.

Exploratory tests allow you to get a first idea of the quality level of a new feature before launching the execution of the entire test plan. If the quality level is too low, serious bugs will have to be fixed first (some teams use a notion of “smoke tests” for the same purpose).

Exploratory testing allows you to test certain aspects of the software without having to pay the cost of writing manual scenarios or automated tests. A common situation (e.g. for a startup betting on the advantage of being first in the field) is that a team does not have enough time to test everything it would or should like to in a structured way, but still does not want to skip some non-priority items entirely; these can be tested at a lower cost using exploratory testing.

On big applications where some testers specialize in certain functional areas, exploratory testing allows them to keep an overview of the product. During the session, they get the chance to see features they are less familiar with, and the testers specializing in those features can share some of their expertise.

For teams with a very high functional coverage rate thanks to their automated tests, there is still a need for humans to look at the product. Automated test oracles don’t detect everything; for example, white text on a white background will probably slip through (because the Selenium test checks that the text is displayed, but not its color). Exploratory tests are a way to organize this human view.
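To illustrate the gap, here is a minimal sketch with the Selenium Python bindings (the URL and CSS selector are hypothetical): the usual assertion passes even when the text is unreadable, and only an explicit CSS check exposes the colors for a human, or an extra check, to judge.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/app")  # placeholder URL

message = driver.find_element(By.CSS_SELECTOR, ".confirmation-message")  # hypothetical selector

# The typical automated oracle: the element exists and is "displayed"...
assert message.is_displayed()

# ...but white text on a white background still passes, unless the colors are checked explicitly.
text_color = message.value_of_css_property("color")
background = message.value_of_css_property("background-color")
print(text_color, background)

driver.quit()
```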

The disadvantages of exploratory testing

Testers must be precise when reporting anomalies. They cannot refer to an existing test case, so the usage workflow must be detailed enough to be reproducible by the other team members (project manager, product owner, developers, testers). Writing bug reports is therefore time consuming.

Whatever the test methodology — exploratory or otherwise — testers must qualify each anomaly: In which scenarios is it present? Is the impact always the same? Are there frequent scenarios and/or severe impacts that show that this anomaly must be corrected quickly?

This qualification of anomalies is particularly important during exploratory testing, where it is common for the tester to find something “odd” that looks like a small anomaly and then, while qualifying and exploring it, realize that the problem may be serious. So this qualification is also time consuming, but it is necessary for the team to manage the fixes with the right levels of priority.

The tests conducted and, consequently, the anomalies found depend on each individual in the team. Depending on his or her culture, experience, knowledge of the functional domain, etc., everyone, and therefore the team as a whole, will perform different tests. If you do an exploratory testing session with other people, you will not get the same results. There is no repeatability.

It is important that team members understand as much as possible how the software under test is used by real users. Otherwise, they will be less critical of its usability.

Experienced testers are generally better and faster at qualifying anomalies. They also have more methodological knowledge for imagining the next tests to run.

Exploratory testing does not impose a list of scenarios to execute. It is possible that a basic functional scenario was not run during the session. Therefore, you cannot rely on exploratory testing alone to be sure that the software under test supports all the required basic workflows, and exploratory sessions cannot be the only criterion for deciding on a production release.

Some testers tend to focus on technical problems: they get caught up in the fun of crashing the software, even if it means using convoluted scenarios (if I enter a string of more than 512 emoji characters in the address field, I generate an ArrayIndexOutOfBoundsException), instead of seeing serious usability problems (if I fill my cart without having an account and then create my account at checkout time, I end up with an empty cart). If you have the right resources, it is more efficient to detect user input validation issues through automated fuzz testing or a scan performed by the security team. If not, user input validation can be part of exploratory testing, but without spending most of the session on it.
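When automated fuzzing is an option, a property-based test can hammer input validation far more cheaply than a human. A minimal sketch using the Hypothesis library against a hypothetical validation function (the function and the 512-character limit are stand-ins for the real rules):

```python
from hypothesis import given, strategies as st

def validate_address(address: str) -> bool:
    """Hypothetical input-validation function of the system under test."""
    return 0 < len(address) <= 512

# Feed arbitrary Unicode strings (including emoji) well past the limit and check
# that validation only accepts or rejects, and never raises an exception.
@given(st.text(min_size=0, max_size=2000))
def test_address_validation_never_crashes(address):
    assert isinstance(validate_address(address), bool)
```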

Conclusion

Exploratory testing is not a miracle solution. But it is an interesting instrument in the methodological toolbox of a test team. I can only encourage you to do a few sessions. Define the framework in which you want to run them and a first version of the process, depending on the type of software you are testing, its life cycle (agile or waterfall), the set of tests (manual and automated) you already have in place (re-testing the same things in exploratory sessions may add little compared to filling gaps in what you are not testing), the experience of the team members… Do a few retrospectives after the sessions to adapt your process based on your team’s feedback. Then, after several sessions, decide whether this testing technique seems a good way to improve the team’s work and whether you should adopt it permanently.

Laurent Mazuré

Project Director

Translation: Thibaut Lefaucheur
