MASTEST: A LLM-Based Multi-Agent System For RESTful API Tests

arXiv — cs.LG · Tuesday, November 25, 2025 at 5:00:00 AM
  • MASTEST, a multi-agent system built on large language models (LLMs), has been developed to improve the testing of RESTful APIs, a task central to quality assurance for cloud-native applications. The system automates the full API testing workflow: generating test scenarios from OpenAPI specifications, executing the tests, and analyzing responses for correctness and coverage (a rough workflow sketch follows below).
  • The approach is significant because it combines LLM-driven and programmed agents, which can improve the efficiency and accuracy of API testing. It also keeps human testers in the loop to review and refine the generated test artifacts, helping ensure high-quality outcomes.
  • The use of LLMs for testing and evaluation reflects a broader trend toward AI-driven automation. Some studies report that LLMs perform well in grading and assessment tasks, while others question their reliability in certain contexts, suggesting that AI tools should be integrated carefully into critical testing environments.
— via World Pulse Now AI Editorial System
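
The workflow described above can be illustrated at a high level. The fragment below is a minimal sketch, not the paper's implementation: `call_llm` is a hypothetical stand-in for any chat-completion client, and the test-case format and endpoint handling are invented for the example.

```python
import json
import requests  # used only to execute the generated HTTP calls

def call_llm(prompt: str) -> str:
    """Hypothetical LLM client; swap in any chat-completion API."""
    raise NotImplementedError

def generate_scenarios(openapi_spec: dict) -> list[dict]:
    # Ask the LLM to derive concrete test cases from the documented operations.
    prompt = (
        "From this OpenAPI specification, produce test cases as a JSON list of "
        "{method, path, params, expected_status} objects:\n" + json.dumps(openapi_spec)
    )
    return json.loads(call_llm(prompt))

def execute_and_check(base_url: str, scenarios: list[dict]) -> list[dict]:
    results = []
    for case in scenarios:
        resp = requests.request(case["method"], base_url + case["path"],
                                params=case.get("params"))
        results.append({
            "case": case,
            "status_ok": resp.status_code == case["expected_status"],
            "body_snippet": resp.text[:200],  # kept short for a later review pass
        })
    return results
```

A human tester, or a further LLM pass, would then review `results` for correctness and coverage, matching the review-and-refine step the summary mentions.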

Continue Reading
Alternating Perception-Reasoning for Hallucination-Resistant Video Understanding
Positive · Artificial Intelligence
A new framework called Perception Loop Reasoning (PLR) has been introduced to enhance video understanding by addressing the limitations of existing Video Reasoning LLMs, which often rely on a flawed single-step perception paradigm. This framework integrates a loop-based approach with an anti-hallucination reward system to improve the accuracy and reliability of video analysis.
Toward Trustworthy Difficulty Assessments: Large Language Models as Judges in Programming and Synthetic Tasks
Negative · Artificial Intelligence
Large Language Models (LLMs) like GPT-4o have been evaluated for their effectiveness in assessing the difficulty of programming tasks, specifically through a comparison with a Light-GBM ensemble model. The study revealed that Light-GBM achieved 86% accuracy in classifying LeetCode problems, while GPT-4o only reached 37.75%, indicating significant limitations in LLMs for structured assessments.
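
The blurb does not say which features the Light-GBM model used; the snippet below only shows, under assumed hand-crafted features, how a three-class difficulty classifier of this kind is typically set up, not the study's actual pipeline.

```python
from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def train_difficulty_classifier(X, y):
    # X: numeric features per problem (e.g. statement length, constraint count,
    # acceptance rate) -- these example features are assumptions, not the paper's.
    # y: difficulty labels, e.g. 0=easy, 1=medium, 2=hard.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = LGBMClassifier(n_estimators=300, learning_rate=0.05)
    model.fit(X_tr, y_tr)
    print("held-out accuracy:", accuracy_score(y_te, model.predict(X_te)))
    return model
```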
Bridging Symbolic Control and Neural Reasoning in LLM Agents: The Structured Cognitive Loop
Positive · Artificial Intelligence
A new architecture called Structured Cognitive Loop (SCL) has been introduced to address fundamental issues in large language model agents, such as entangled reasoning and memory volatility. SCL separates cognition into five distinct phases: Retrieval, Cognition, Control, Action, and Memory, while employing Soft Symbolic Control to enhance explainability and controllability. Empirical tests show SCL achieves zero policy violations and maintains decision traceability.
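
The summary names the five phases but not how they connect; the loop below is one plausible reading, with every helper invented for illustration rather than taken from the SCL paper.

```python
# All helpers below are invented stubs, not the SCL paper's components.
def retrieve(memory, goal):            return {"goal": goal, "history": memory}
def reason(context):                   return {"action": "respond", "args": context["goal"]}
def enforce_policies(plan):            return plan   # Soft Symbolic Control would veto/adjust here
def execute(plan):                     return {"done": True, "output": plan["args"]}
def update_memory(memory, plan, obs):  return memory + [(plan, obs)]

def structured_cognitive_loop(goal, max_steps=10):
    """One plausible reading of the Retrieval -> Cognition -> Control -> Action -> Memory loop."""
    memory = []
    for _ in range(max_steps):
        context = retrieve(memory, goal)                    # Retrieval
        plan = enforce_policies(reason(context))            # Cognition, then Control
        observation = execute(plan)                         # Action
        memory = update_memory(memory, plan, observation)   # Memory
        if observation.get("done"):
            break
    return memory
```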
Lessons from Studying Two-Hop Latent Reasoning
Neutral · Artificial Intelligence
Recent research has focused on the latent reasoning capabilities of large language models (LLMs), specifically through a study on two-hop question answering. The investigation revealed that LLMs, including Llama 3 and GPT-4o, struggle with this basic reasoning task without employing chain-of-thought (CoT) techniques, which are essential for complex agentic tasks.
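
To make "two-hop" concrete: the question composes two facts the model must recall from its parameters rather than from the prompt. The prompts below are invented examples, not taken from the study.

```python
# Invented two-hop query: hop 1 = identify the author of 'Novel X',
# hop 2 = recall that author's country of birth. 'Novel X' is a placeholder title.
direct_prompt = (
    "In which country was the author of 'Novel X' born? Answer with one word."
)

# Chain-of-thought variant that makes the intermediate hop explicit.
cot_prompt = (
    "In which country was the author of 'Novel X' born? "
    "First name the author, then name that author's country of birth."
)
```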
TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation
Positive · Artificial Intelligence
A new approach called TRIM has been introduced to address the high inference costs associated with Large Language Models (LLMs). This method optimizes language generation by allowing LLMs to omit semantically irrelevant words during inference, followed by reconstruction of the output using a smaller, cost-effective model. Experimental results indicate an average token saving of 19.4% for GPT-4o with minimal impact on evaluation metrics.
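
The summary gives the idea only at a high level; the two-stage pipeline below is a rough interpretation of it, with both model calls left as hypothetical placeholders rather than TRIM's actual method.

```python
def large_model(prompt: str) -> str:
    """Hypothetical call to the expensive LLM (e.g. a GPT-4o-class model)."""
    raise NotImplementedError

def small_model(prompt: str) -> str:
    """Hypothetical call to a cheap model used only to restore fluent output."""
    raise NotImplementedError

def trim_style_generate(task: str) -> str:
    # Stage 1: the large model answers tersely, dropping semantically irrelevant words,
    # which reduces the number of output tokens billed at the expensive rate.
    compressed = large_model(
        task + "\nAnswer as compactly as possible; omit words that carry no meaning."
    )
    # Stage 2: the small model reconstructs fluent text from the compressed draft.
    return small_model("Rewrite these terse notes as fluent prose:\n" + compressed)
```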
Red Teaming Multimodal Language Models: Evaluating Harm Across Prompt Modalities and Models
Neutral · Artificial Intelligence
A recent study evaluated the safety of four leading multimodal large language models (MLLMs) under adversarial conditions, revealing significant differences in their vulnerability to harmful prompts. The models tested included GPT-4o, Claude Sonnet 3.5, Pixtral 12B, and Qwen VL Plus, with Pixtral 12B showing a harmful response rate of approximately 62%, while Claude Sonnet 3.5 demonstrated the highest resistance at around 10%.
SciEducator: Scientific Video Understanding and Educating via Deming-Cycle Multi-Agent System
Positive · Artificial Intelligence
Recent advancements in multimodal large language models (MLLMs) and video agent systems have led to the development of SciEducator, an innovative multi-agent system designed for scientific video comprehension and education. This system utilizes the Deming Cycle's iterative approach to enhance the understanding of complex scientific processes through tailored multimodal educational content.
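
The blurb invokes the Deming Cycle (Plan-Do-Check-Act) without detailing how agents map onto it; the skeleton below is a generic PDCA loop with all agent roles stubbed out for illustration, not SciEducator's design.

```python
# Generic Plan-Do-Check-Act skeleton; every agent here is an invented stub.
def plan_agent(segment, previous):    return {"outline": ["concept", "example"], "prev": previous}
def do_agent(plan):                   return {"lesson": plan["outline"]}
def check_agent(segment, lesson):     return {"acceptable": True, "notes": []}
def act_agent(lesson, critique):      return lesson

def explain_video_segment(segment, max_rounds=3):
    lesson = None
    for _ in range(max_rounds):
        plan = plan_agent(segment, previous=lesson)       # Plan: outline the explanation
        lesson = do_agent(plan)                           # Do: draft the educational content
        critique = check_agent(segment, lesson)           # Check: verify against the video
        if critique["acceptable"]:
            break
        lesson = act_agent(lesson, critique)              # Act: revise before the next round
    return lesson
```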
Are Large Vision Language Models Truly Grounded in Medical Images? Evidence from Italian Clinical Visual Question Answering
Neutral · Artificial Intelligence
Recent research has evaluated the performance of large vision language models (VLMs) in answering medical questions based on visual information, specifically using the EuropeMedQA Italian dataset. Four models were tested: Claude Sonnet 4.5, GPT-4o, GPT-5-mini, and Gemini 2.0 flash exp. The findings indicate varying degrees of visual grounding, with GPT-4o showing the most significant drop in accuracy when visual information was altered.