AI in Squash?

Blog Henix
7 min read · Nov 26, 2024


A look back at the presentation given at the Club Qualité Logicielle on 5 November 2024

For the 31st edition of the Club Qualité Logicielle and its Club Utilisateurs Squash, nearly 80 participants gathered at Voie 15 (Paris) to attend the conferences and presentations scheduled throughout the day. This was an opportunity to discuss DevSecOps, best practices and test tools. The afternoon session closed with a discussion of the integration of generative AI into Squash.

AI in Squash: why bring it in?

The digital age has seen a succession of revolutions, each following hard on the heels of the last:

  • 1990s: thanks to the advent of the Internet, information spread far and wide. E-commerce and web client applications took hold, pushing back geographical boundaries.
  • 2000s: the arrival of social networks pushed information and knowledge towards interactivity. This was also the beginning of the era of cloud computing and Software as a Service (SaaS) offerings.
  • 2010s: new collaborative modes emerged. Collaborative commerce, blockchain, bitcoin… Intelligence became collective, with information sharing fostering knowledge sharing. And agility spread.
  • 2020s: generative artificial intelligence [1] [2] and natural language processing techniques are becoming widespread, in all sectors and at high speed. These technologies enable the search, analysis, synthesis and generation of information from a gigantic mass of data, thanks to semantics (the creation of meaningful links between documents).

To help testers, it seemed essential to us at Henix to study this new paradigm and its uses in order to integrate AI into Squash. All that remained was to define how, and for which use cases.

Mapping of relevant use cases

We began by mapping the relevant use cases for this integration, i.e. those where generative AI would bring immediate or cumulative benefit.

We were particularly careful not to limit ourselves to test automation use cases. Generative AI is an accelerator: it is the cumulative effect of gains across all testing stages and practices that will ultimately lead to more test automation. All the more so as certain automation use cases (such as code generation) will be addressed at the level of the information system for other practices (development, for example).

This study phase involved mapping and selecting the first use case, based on:

  • Criticality of the use case
  • Potential gains
  • Complexity of implementation (dataset generation, the need to manage data correctly)

Here is the result of this mapping:

First use case: identifying and generating test cases with AI

The main reason for choosing this first use case was the desire to quickly deliver an easy-to-implement first step, in agile Minimum Viable Product (MVP) fashion, so as to:

  • Experiment with the technical integration of these new technologies; in this case, Large Language Models (LLMs);
  • Obtain rapid feedback from our users.

In addition, the choice of this use case also offered the following benefits:

  • Identifying and writing the test cases needed to cover a requirement is at the heart of the tester’s job. Helping testers with this task directly facilitates their day-to-day work, and the benefit is immediate: the productivity gained through AI translates into a direct return on investment.
  • Ensuring the consistency of test assets and their structure is a major challenge for the testing business. Promoting the standardization of test cases with generative tools, which produce test cases according to a given framework, is an “upstream” way of addressing this issue; the “downstream” way is to support the review of test assets (see the mapping above).

The aim is to generate test cases (TC) covering a requirement, and for each TC the corresponding scenario.

The principle: set up an interface through which a user can request the generation of TCs from a requirement in the requirements repository.
By querying the language model (using a pre-configured prompt), the interface proposes test cases to the user.
The user can then validate any or all of the generated tests, which are then stored in the test repository.
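
As an illustration, here is a minimal sketch of what such a facade could look like. It is not Squash’s actual implementation: it assumes the OpenAI Python SDK, a “gpt-4o” model and a hypothetical JSON schema for the returned test cases.

```python
# Minimal sketch of a generation facade (illustrative, not Squash's actual code).
# Assumptions: OpenAI Python SDK, "gpt-4o" model, hypothetical JSON schema.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a software test designer. Given a requirement, identify the test "
    "cases needed to cover it. Answer with a JSON object of the form "
    '{"test_cases": [{"title": "...", "steps": [{"action": "...", '
    '"expected_result": "..."}]}]}. '
    "Use only the requirement text; do not invent unspecified behaviour."
)

def generate_test_cases(requirement_text: str) -> list[dict]:
    """Query the model with the pre-configured prompt and return candidate TCs."""
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Requirement:\n{requirement_text}"},
        ],
    )
    return json.loads(response.choices[0].message.content)["test_cases"]
```

The generated test cases are only proposals: nothing reaches the test repository until the user has validated them.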

Advantages: the interface is a facade for the language model, masking the complexity of interacting with the model and making it easier for users to get to grips with.
It remains agnostic and works with all models (GPT-4, Claude 3.5 Sonnet, Llama, Mistral Large…) and hosting providers (OpenAI, Anthropic, Together, Hugging Face…).

Limitations: the requirement used for generation must be self-contained, or the language model will be subject to “hallucinations” to compensate for the missing information. Answers from a generic model cannot take into account the company’s “context” (existing test cases, related documents and requirements).

Putting it into practice in Squash

Squash version 7.0, released in June 2024, now offers:

  • Compatibility with all models (AI server declaration and configuration in Squash)
  • Project-based activation
  • AI-based identification and generation of test cases from a requirement
  • Searching for AI-generated test cases

Here is an example of the kind of prompt that generates test cases using the GPT-4 model:
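
(An illustrative reconstruction: the wording, requirement and identifier below are hypothetical, not the exact prompt used in Squash.)

```
You are a software test designer.

From the requirement below, identify the test cases needed to cover it.
For each test case, write the corresponding scenario as numbered steps,
each step consisting of an action and an expected result.
Use only the information given in the requirement.

Requirement REQ-042: "A registered user can reset their password by
email. The reset link expires after 24 hours."

Return each test case with a title, a priority and its steps.
```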

An initial assessment of the experiment

We need to go beyond the limits of generic models and the constraints we face as a software publisher.

The major limitation of generating test cases from a requirement comes from the hallucinations of generic language models when the requirement is not self-contained. The missing information is often carried by other requirements and/or other documents at company level.

The challenge is therefore to refine the results produced by generative AI:

  • Managing test case generation
    By allowing users to modify their prompt or provide optional instructions for TC generation;
    By offering a chat through which users can guide the model in refining the result (as sketched after this list).
  • Completing the information provided to the model
    By providing the model with additional elements: images contained in the requirement; peripheral requirements, or even texts, images, appendices… supplied either manually by the user or automatically (e.g. via RAG); or by enabling the model to “request” more information (using its tool invocation capabilities).
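
To make the chat-based refinement concrete, here is a minimal sketch, assuming the OpenAI Python SDK and a hypothetical requirement; the point is that the conversation history carries the draft, so each instruction refines it.

```python
# Sketch of chat-guided refinement (illustrative, not Squash's actual code).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The history accumulates the whole conversation, drafts included.
history = [{"role": "system", "content": "You are a software test designer."}]

def chat(message: str) -> str:
    """Append a user turn, query the model, and keep its answer in the history."""
    history.append({"role": "user", "content": message})
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

# Hypothetical session: generate a first draft, then guide its refinement.
draft = chat("Requirement: a user can reset their password by email; the link "
             "expires after 24 hours. Propose test cases covering it.")
refined = chat("Add a negative test for an expired link and merge any duplicates.")
```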

As far as the context is concerned, we are currently exploring the following avenues to overcome these limitations:

  • Retrieval-Augmented Generation (RAG)
    This advanced AI approach combines information retrieval and text generation in two stages (a minimal sketch follows this list):
    - Retrieval: dynamic search for relevant, contextual information in a vast knowledge base (data, documents…)
    - Generation: using the retrieved snippets as additional context to refine the answer.
  • Context caching
    Its principle: take advantage of the fact that the number of tokens a model can cache is no longer really a limiting factor. The interface then caches a selection of the requirements, or even all of them. For more information: https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview?hl=en
    Cost: related to setting up and/or refreshing the cache, but it can be shared across requests.
  • Tools
    Its principle: provide hooks so that the Large Language Model (LLM) can “request” the information it is missing. For more information: https://python.langchain.com/docs/how_to/#tools
    Cost: linked to implementing the endpoint: how do you provide the relevant information when the LLM asks for it?
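
To make the retrieval/generation split concrete, here is a minimal RAG sketch. Everything in it is a stand-in: a real deployment would index the company’s requirements in a vector store rather than an in-memory list, and the model names are just examples.

```python
# Minimal RAG sketch (illustrative, not Squash's actual code).
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stand-in "knowledge base": in practice, the company's requirement repository.
documents = [
    "REQ-12: a registered user can reset their password by email.",
    "REQ-13: the password reset link expires after 24 hours.",
    "REQ-27: three failed logins lock the account for 15 minutes.",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(documents)

def generate_with_context(requirement: str, k: int = 2) -> str:
    # 1. Retrieval: rank stored requirements by cosine similarity to the query.
    q = embed([requirement])[0]
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(documents[i] for i in np.argsort(scores)[::-1][:k])
    # 2. Generation: pass the retrieved snippets as additional context.
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Design test cases for the requirement, "
             "using the related requirements as context. Do not invent behaviour."},
            {"role": "user", "content": f"Related requirements:\n{context}\n\n"
             f"Requirement to cover:\n{requirement}"},
        ],
    )
    return reply.choices[0].message.content
```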

One difficulty that we face as a software publisher, and that other consumers of generative AI do not necessarily have, lies in building up substantial test assets that can serve as viable test data for these AI features. Similarly, feedback from “real” users is crucial in assessing the relevance of the proposed solutions, which are only viable if they bring value and gains to testers.

The integration of AI into Squash is in its infancy, and there’s still a long way to go! In particular, we are currently working on facilitating the writing of test cases in semi-structured language (Gherkin) along two axes:

  • Intelligent auto-completion via semantic search in the action library;
  • Maintenance of this library and detection of potential duplicates within it (see the sketch below).
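
As a sketch of what this semantic search could look like (embedding-based similarity; the action library, model and threshold are hypothetical):

```python
# Sketch of semantic search over a Gherkin action library (illustrative).
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical action library; in practice, the project's shared Gherkin steps.
actions = [
    "the user logs in with valid credentials",
    "the user signs in with a correct login and password",
    "the user adds an item to the cart",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vectors = np.array([d.embedding for d in resp.data])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

library = embed(actions)

def suggest(fragment: str, threshold: float = 0.8) -> list[tuple[str, float]]:
    """Rank library actions by semantic closeness to what the user is typing.
    Near-identical pairs above the threshold are also duplicate candidates."""
    q = embed([fragment])[0]
    scores = library @ q
    order = np.argsort(scores)[::-1]
    return [(actions[i], float(scores[i])) for i in order if scores[i] >= threshold]

# e.g. suggest("user signs in") would surface both login phrasings,
# hinting that they are potential duplicates in the library.
```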

We will publish a new article as soon as the next step has been taken.

Don’t hesitate to subscribe to the Squash newsletter for the latest news on the progress of this major project.

To find out more:

  • Discover Squash
  • Consult Squash’s online documentation

For the Squash Team

Florian GAUTIER
Automation & DevOps Practice manager

[1] In this text, for ease of reading, we use the terms Generative Artificial Intelligence (GAI), Artificial Intelligence (AI) and Large Language Model (LLM) interchangeably, and sometimes a little loosely. Strictly speaking, generative AI refers to the tools used to generate a new document (text, image, video, etc.) from an existing one, while the term Large Language Model (LLM) should be reserved for the algorithmic engine (deep neural networks coupled with embedding engines, i.e. vector representations of the words making up a document).

[2] The aim of this document is not to provide an overview of all Artificial Intelligence (AI) methods, but rather to focus on what the sub-field known as generative AI brings to testers.


Written by Blog Henix

Henix is an independent IT services company and a pure player in software quality, with three areas of expertise: consulting and delivery, training, and the publishing of Squash. henix.com
