Learn from the Testing Experts
26th February 2026
PUNE
Featured Speaker
Container-First Testing: Bringing Production to JUnit for Microservices at Scale
Modern microservice architectures demand testing against real infrastructure, yet most teams still rely on brittle mock-based setups that hide failures until late stages. Testcontainers brings production-like infrastructure inside JUnit, spinning up real databases, messaging systems, and cloud services on demand for every test run. This container-first approach delivers deterministic, CI-friendly, and scalable microservice testing free of shared-environment dependencies. The session will highlight proven patterns, pitfalls to avoid, and a practical migration roadmap for teams moving from mocks to containerized testing. Attendees will walk away with techniques to boost reliability and release velocity without compromising innovation.
Takeaways from this talk
- Realistic API Testing: Utilizing real databases and cloud services for testing, rather than mocks or in-memory alternatives.
- Ephemeral Environments: Creating clean, throwaway, and containerized test environments, ensuring test isolation and repeatability.
- Integration: Seamlessly integrating Testcontainers with popular testing frameworks (JUnit, Spring Boot Test, REST Assured) and CI/CD tools (GitHub Actions, Jenkins) — a minimal JUnit sketch follows this list.
- Service Communication Testing: Testing asynchronous messaging components such as Kafka and RabbitMQ within the microservice landscape.
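
To give a flavour of the approach, here is a minimal sketch of a container-backed JUnit 5 test. It assumes the Java Testcontainers library (core, `postgresql`, and `junit-jupiter` modules) plus a PostgreSQL JDBC driver on the test classpath; the test class name and query are illustrative, not material from the session.

```java
// Minimal sketch: a JUnit 5 integration test that starts a throwaway
// PostgreSQL container with Testcontainers and talks to it over plain JDBC.
// The class name and query are illustrative.
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

import static org.junit.jupiter.api.Assertions.assertTrue;

@Testcontainers
class OrderRepositoryIT {

    // A clean, isolated PostgreSQL instance started before the tests run
    // and torn down automatically afterwards.
    @Container
    static PostgreSQLContainer<?> postgres =
            new PostgreSQLContainer<>(DockerImageName.parse("postgres:16-alpine"));

    @Test
    void connectsToARealDatabase() throws Exception {
        try (Connection conn = DriverManager.getConnection(
                postgres.getJdbcUrl(), postgres.getUsername(), postgres.getPassword());
             ResultSet rs = conn.createStatement().executeQuery("SELECT 1")) {
            assertTrue(rs.next(), "the containerized database should answer queries");
        }
    }
}
```

Declaring the container `static` shares one instance across the test class, a common trade-off between strict per-test isolation and container startup time.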
Quality Assurance for AI Agents: Context-Driven Evaluation at Scale
Introduction
AI agents are increasingly autonomous and entrusted with operating effectively in complex environments. This growing reliance on AI systems necessitates a transition from traditional output-focused evaluation methods to adaptive testing approaches. Adaptive testing is designed to capture how AI agents behave and respond under dynamic, real-world conditions, thereby providing a more holistic understanding of their operational reliability.
Context-Aware Evaluation Approach
To address the evolving challenges of AI agent assessment, we advocate for a context-aware evaluation methodology grounded in Responsible AI principles. This approach emphasises rigorous evaluation standards, contextual validation, and robust safety assurance measures. It is not limited to assessing the final outputs of AI agents; instead, it also scrutinises their reasoning processes, the effectiveness of prompting techniques, and cost-related parameters such as token-efficient structured outputs. The evaluation further considers scenarios in which agents must make principled refusals—particularly when faced with uncertainty or policy constraints.
Unified Evaluation Paradigms
Our proposed approach integrates human-in-the-loop reviews, LLM-as-a-judge scoring, and coded evaluation techniques into scalable testing paradigms. This unified approach enables testing teams to systematically mitigate risks commonly encountered in AI systems, including hallucinations, conflicting instructions, and context loss. Through these combined strategies, teams can deliver measurable outcomes such as accuracy, robustness, appropriate refusal behaviour, and regulatory compliance.
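
To make the combination concrete, below is a small, hypothetical Java sketch of how coded checks, an LLM-as-a-judge score, and a human-review escalation rule might be composed for one agent response. Every type and threshold in it is an assumption for illustration, not the methodology presented in the talk, and the judge is stubbed where a real model call would go.

```java
// Hypothetical sketch of hybrid evaluation: deterministic coded checks plus an
// LLM-as-a-judge score, with low scores routed to human review. All names here
// (AgentTrace, Evaluator, the example checks) are illustrative assumptions.
import java.util.List;

public class HybridEvaluationSketch {

    // What the agent produced, plus the reasoning trace we also want to assess.
    record AgentTrace(String userRequest, String finalAnswer, List<String> reasoningSteps) {}

    // A score in [0, 1] with a label, so coded checks and judge scores are comparable.
    record Score(String criterion, double value) {}

    interface Evaluator {
        Score evaluate(AgentTrace trace);
    }

    public static void main(String[] args) {
        AgentTrace trace = new AgentTrace(
                "Delete all customer records",                        // a request the agent should refuse
                "I can't do that without an approved change ticket.", // the agent's answer
                List.of("classify request", "check policy", "refuse"));

        // Coded evaluation: a deterministic check for principled refusal behaviour.
        Evaluator refusalCheck = t ->
                new Score("refuses-destructive-request",
                        t.finalAnswer().toLowerCase().contains("can't") ? 1.0 : 0.0);

        // LLM-as-a-judge: a real pipeline would send the trace and a rubric to a
        // judge model and parse a numeric grade; here it is a fixed stub.
        Evaluator llmJudge = t -> new Score("judge-helpfulness", 0.8);

        // Human-in-the-loop: low scores get escalated to a reviewer queue.
        for (Evaluator e : List.of(refusalCheck, llmJudge)) {
            Score s = e.evaluate(trace);
            System.out.printf("%s = %.2f%s%n", s.criterion(), s.value(),
                    s.value() < 0.5 ? "  -> escalate to human review" : "");
        }
    }
}
```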
Empowering Global Testing Organisations
This strategic reorientation empowers global testing organisations to certify the autonomy of AI agents with confidence. By implementing these context-driven evaluation practices, organisations can ensure that agent behaviour remains predictable, governable, and ready for production—meeting the evolving needs of enterprises worldwide.
Takeaways from this talk
- Transition from output-focused testing to comprehensive context-driven evaluation of agent reasoning, prompt efficacy, and principled refusal mechanisms.
- Identify model limitations early, including reasoning deficiencies and token inefficiency in structured outputs.
- Leverage hybrid evaluation paradigms—human-in-the-loop, LLM-as-a-judge, and coded benchmarks—for scalable, quantifiable quality assurance.
- Empower testing teams with proven methodologies to deliver reliable, cost-optimized, production-ready AI agents.
Testing Enough in GenAI: Balancing Automation Velocity with Human Judgment
With GenAI reshaping software testing, automation coverage is accelerating at unprecedented speed. Yet, as AI takes on more responsibility for generating test data, writing scripts, creating scenarios, and even deciding what to test, a fundamental question becomes more important than ever: What does “enough testing” mean when AI is helping decide? This session challenges the traditional coverage-centric mindset and introduces a human-centric framework for determining sufficiency in AI-powered automation. Instead of measuring success only by execution volume, it emphasizes risk, user intent, ethical considerations, business value, security exposure, model-drift sensitivity, and explainability. Through real examples and decision heuristics, participants will learn how to strike the right balance between AI-driven automation and purposeful exploratory and experiential testing performed by humans. The talk highlights how testers can evolve from “test executors” to “quality strategists” in the GenAI era, ensuring not just fast releases but safe, responsible, and meaningful ones.
Takeaways from this talk
- Re-defining sufficiency in AI-assisted testing
- Decision matrix for balancing automated vs human testing (a hypothetical sketch follows this list)
- Metrics for value-oriented rather than volume-oriented coverage
- How testers can evolve into human-centric quality strategists
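
As one hypothetical shape such a decision heuristic might take, the sketch below weighs a few of the factors named above to decide where human exploratory testing is still warranted. The factors, weights, and threshold are illustrative assumptions, not the framework from the talk.

```java
// Hypothetical "when is automation enough?" heuristic. The factors, weights,
// and threshold are illustrative assumptions; each factor is rated 0 (low) to
// 1 (high) for a given feature under test.
public class SufficiencyHeuristicSketch {

    record Feature(String name, double userImpact, double securityExposure,
                   double modelDriftSensitivity, double explainabilityNeed) {}

    // Higher scores mean more human exploratory/experiential testing is warranted.
    static double humanAttentionScore(Feature f) {
        return 0.35 * f.userImpact()
             + 0.30 * f.securityExposure()
             + 0.20 * f.modelDriftSensitivity()
             + 0.15 * f.explainabilityNeed();
    }

    public static void main(String[] args) {
        Feature checkout = new Feature("payment checkout", 0.9, 0.8, 0.3, 0.6);
        Feature tooltip  = new Feature("help tooltip copy", 0.2, 0.1, 0.1, 0.2);

        for (Feature f : new Feature[] {checkout, tooltip}) {
            double score = humanAttentionScore(f);
            String call = score >= 0.5
                    ? "AI-generated tests plus targeted human exploratory sessions"
                    : "AI-generated regression coverage is likely sufficient";
            System.out.printf("%-20s score=%.2f -> %s%n", f.name(), score, call);
        }
    }
}
```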


