Developer
Ankur Sharma
ankusharma@google.com
Performance
Key patterns and highlights from this developer's activity.
Breakdown of growth, maintenance, and fixes effort over time.
Bugs introduced vs. fixed over time.
Reclassifies engineering effort based on bug attribution. Commits that introduced bugs are retrospectively counted as poor investments.
Investment Quality reclassifies engineering effort based on bug attribution data. Commits identified as buggy origins (those that introduced bugs later fixed by someone) have their grow and maintenance time moved into the Wasted Time category; any fix time they contain remains counted as productive. All other commits retain their standard classification: grow counts as productive, maintenance as maintenance, and waste (fixes) as productive.
The standard model classifies commits as Growth, Maintenance, or Fixes. Investment Quality adds a quality lens: a commit that introduced a bug is retrospectively counted as a poor investment — the engineering time spent on it was wasted because it ultimately required additional fix work. Fix commits (Fixes in the standard model) are reframed as productive, because fixing bugs is valuable work.
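To make the reclassification concrete, here is a minimal sketch of the client-side computation. The commit fields (`hash`, `category`, `hours`) and the `buggy_origins` set are illustrative assumptions about the data model, not the product's actual schema.

```python
# Minimal sketch of the Investment Quality reclassification, under assumed
# input shapes: each commit dict carries a standard-model category
# ("grow" | "maintenance" | "waste") and an effort estimate in hours;
# buggy_origins is the set of commit hashes that bug attribution flagged
# as having introduced a bug.

def investment_quality(commits, buggy_origins):
    totals = {"productive": 0.0, "maintenance": 0.0, "wasted": 0.0}
    for commit in commits:
        hours = commit["hours"]
        if commit["category"] == "waste":
            # Fix commits always count as productive: fixing bugs is valuable.
            totals["productive"] += hours
        elif commit["hash"] in buggy_origins:
            # Grow/maintenance effort on a buggy-origin commit is
            # retrospectively reclassified as wasted.
            totals["wasted"] += hours
        elif commit["category"] == "grow":
            totals["productive"] += hours
        else:  # maintenance
            totals["maintenance"] += hours
    total = sum(totals.values()) or 1.0
    return {k: round(100 * v / total) for k, v in totals.items()}
```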
Currently computed client-side from commit and bug attribution data. Ideal server-side endpoint:
POST /v1/organizations/{orgId}/investment-quality
Content-Type: application/json
Request:
{
  "startTime": "2025-01-01T00:00:00Z",
  "endTime": "2025-12-31T23:59:59Z",
  "bucketSize": "BUCKET_SIZE_MONTH",
  "groupBy": ["repository_id" | "deliverer_email"]
}
Response:
{
  "productivePct": 74,
  "maintenancePct": 18,
  "wastedPct": 8,
  "buckets": [
    {
      "bucketStart": "2025-01-01T00:00:00Z",
      "productive": 4.2,
      "maintenance": 1.8,
      "wasted": 0.6
    }
  ]
}
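For reference, a call to the proposed endpoint might look like the sketch below. Since the endpoint does not exist yet, the base URL and organization ID are placeholders; only the request and response shapes above are assumed.

```python
# Hypothetical client for the proposed (not yet implemented) endpoint.
import requests

BASE_URL = "https://api.example.com"  # placeholder
ORG_ID = "example-org"                # placeholder

resp = requests.post(
    f"{BASE_URL}/v1/organizations/{ORG_ID}/investment-quality",
    json={
        "startTime": "2025-01-01T00:00:00Z",
        "endTime": "2025-12-31T23:59:59Z",
        "bucketSize": "BUCKET_SIZE_MONTH",
        "groupBy": ["deliverer_email"],  # or ["repository_id"]
    },
    timeout=30,
)
resp.raise_for_status()
data = resp.json()
print(f"productive {data['productivePct']}%, wasted {data['wastedPct']}%")
```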
Commit activity distribution by hour and day of week. Shows when this developer is most active.
Developers who frequently work on the same files and symbols. Higher score means stronger code collaboration.
Latest analyzed commits from this developer.
| Hash | Message | Date | Files | Effort |
|---|---|---|---|---|
| f75de593 | This commit provides a **bug fix** for the **in-memory session service** by resolving an intermittent `RuntimeError: dictionary changed size during iteration`. Specifically, the `_list_sessions_impl` function within `src/google/adk/sessions/in_memory_session_service.py` is modified to copy dictionary keys and values into an isolated snapshot sequence before iterating. This means concurrent mutations can no longer invalidate the loop, eliminating the runtime error. The fix significantly improves the stability and reliability of the **session management subsystem** when listing active sessions. A minimal reconstruction of this pattern follows this table. | Mar 31 | 1 | waste |
| 38bfb447 | This commit introduces **new evaluation metrics** for multi-turn agent interactions: **Multi-Turn Trajectory Quality V1** and **Multi-Turn Tool Use Quality V1**. These **new capabilities** enhance the `google.adk.evaluation` framework by providing a more granular assessment of agent performance. The **Multi-Turn Trajectory Quality V1** metric evaluates the *path* an agent takes to achieve a goal, distinct from just overall task success, while the **Multi-Turn Tool Use Quality V1** metric assesses the effectiveness and appropriateness of tool usage, delegating to the Vertex Gen AI Eval SDK. This integration provides a deeper understanding of agent behavior in complex, multi-step scenarios. | Mar 16 | 7 | grow |
| 9a75c068 | This commit introduces a **new capability** to evaluate **multi-turn task success** within the `google.adk.evaluation` module. A new `MultiTurnTaskSuccessV1Evaluator` is added, which delegates the evaluation responsibility to the **Vertex Gen AI Eval SDK** for assessing multi-turn conversations. This involves updating the metric registry and info providers, and implementing a dedicated `_MultiTurnVertexAiEvalFacade` to manage the interaction with the external SDK for multi-turn scenarios. This enhancement provides a robust method for evaluating the end-to-end success of conversational AI agents across multiple turns. | Mar 16 | 8 | grow |
| 8374d9bf | This commit **refactors** the **Vertex AI evaluation facade** within the `google.adk.evaluation` module to improve its extensibility and organization. The existing `_VertexAiEvalFacade` class is transformed into an abstract base class, centralizing common client initialization and helper methods for various evaluation types. A new concrete class, `_SingleTurnVertexAiEvalFacade`, is introduced to specifically handle single-turn evaluations, inheriting from the new abstract parent. This **refactoring** allows for easier implementation of future specialized Vertex AI evaluation scenarios, such as multi-turn conversations. Downstream modules like `response_evaluator.py` and `safety_evaluator.py` are updated to instantiate the new `_SingleTurnVertexAiEvalFacade`, ensuring continued functionality. | Mar 7 | 4 | maint |
| 43d6075e | This commit introduces a **new capability** to the **Vertex AI evaluation facade**, enabling the Vertex AI client to be initialized using an API key. Specifically, the `_perform_eval` method in `src/google/adk/evaluation/vertex_ai_eval_facade.py` now supports retrieving an API key from environment variables, offering an alternative to the existing project and location-based authentication. This enhancement provides greater flexibility for users to authenticate with **Vertex AI evaluation** services, simplifying setup in various deployment environments. Comprehensive unit tests have been added to ensure the robustness of this new initialization method, covering API key usage, project/location scenarios, and appropriate error handling for missing credentials. | Jan 26 | 2 | maint |
| c222a45e | This commit **refines the regular expression** within the **`.github/workflows/check-file-contents.yml`** workflow to prevent over-matching. By adding **word boundary anchors**, the regex now strictly identifies references to 'cli', avoiding false positives for similar-looking strings like `_client_labels_utils`. This **chore** specifically **fixes an existing issue** where the previous regex was too broad. The update ensures more accurate content checks, improving the reliability of the automated workflow by only flagging genuine 'cli' references. A small illustration of the word-boundary effect follows this table. | Jan 16 | 1 | maint |
| 960b2067 | This commit **bumps the project version** for the **ADK (Agent Development Kit)** from `v1.19.0` to `v1.20.0`. It updates the `__version__` string within the `src/google/adk/version.py` file as part of **new release preparation**. This **maintenance chore** signifies the availability of a new stable version, impacting all users and downstream projects relying on the ADK. | Dec 3 | 2 | maint |
| 2a1a41d3 | This commit introduces a **new capability** to automatically tag model calls originating from the **evaluation service** with specific client labels. It adds a new utility module, `_client_labels_utils.py`, which provides functions like `client_label_context` to manage and apply temporary client labels, including a default "Eval Client" label. The **Google LLM client** (`google_llm.py`) was refactored to integrate this new labeling mechanism, ensuring that all inference and metric generation calls made during evaluations are properly identified for improved tracking and observability. This enhancement allows for clearer distinction and analysis of model usage during evaluation processes. | Dec 1 | 5 | maint |
| dc3f60cc | This commit **refactors** the **`LocalEvalService`** to facilitate the injection of a `memory_service`. Specifically, an optional `memory_service` parameter is added to the `LocalEvalService` constructor (`__init__`) and subsequently **plumbed** to the **`EvaluationGenerator`** upon its creation. This **chore** ensures that the `EvaluationGenerator` within the `adk.evaluation` subsystem can access a dedicated memory management service, laying the groundwork for future enhancements in evaluation state handling or data caching. The change primarily impacts the internal dependency flow between these core evaluation components. | Nov 19 | 1 | maint |
| b2c45f8d | This commit **enhances error messaging** for `RESOURCE_EXHAUSTED` (HTTP 429) errors originating from the Gemini API within the **Google LLM integration**. It introduces a new custom exception, `_ResourceExhaustedError`, which the `generate_content_async` method now raises to provide more specific and actionable information to users. This **improves the developer experience** by clarifying rate limit issues, with accompanying unit tests ensuring the correct propagation of this specific error while re-raising other client errors. | Nov 18 | 2 | waste |
| 696852a2 | This commit introduces a **new capability** to enhance the **resilience of LLM-based evaluations** by adding default retry options to `llm_request` calls. A new utility module, `src/google/adk/evaluation/_retry_options_utils.py`, defines these fallback options and a plugin, which is integrated into the `EvaluationGenerator`. This ensures that various **evaluation modules**, including `HallucinationsV1Evaluator`, `LlmAsJudge`, and `LlmBackedUserSimulator`, automatically benefit from improved robustness against temporary model failures. This **enhancement** provides a default safety net for LLM interactions during evaluations, while explicitly honoring any retry configurations already specified by the developer. | Nov 14 | 6 | grow |
| e2d3b2d8 | This commit **introduces new matching capabilities** for tool call trajectories within the `ToolTrajectoryAvgScore` metric, specifically adding `IN_ORDER` and `ANY_ORDER` match types. It defines a new `ToolTrajectoryCriterion` class in `src/google/adk/evaluation/eval_metrics.py` to configure these match types. The core matching logic for `IN_ORDER` and `ANY_ORDER` comparisons is implemented and integrated into the `src/google/adk/evaluation/trajectory_evaluator.py` module. This **new feature** significantly **enhances the flexibility** of evaluating tool usage by allowing more nuanced comparisons of tool call sequences, with comprehensive unit tests ensuring correctness. A sketch of the two match semantics follows this table. | Nov 12 | 3 | maint |
| b1ff85fb | This commit introduces a **new capability** to control logging verbosity for **ADK CLI evaluation commands**. It adds a `--log_level` option to commands such as `cli_eval`, `cli_create_eval_set`, and `cli_add_eval_case` within `src/google/adk/cli/cli_tools_click.py`. This enhancement allows users to specify the desired logging level, directly integrating with and configuring the **ADK logger**. The change provides greater flexibility for debugging and monitoring by enabling users to tailor the amount of information output during ADK evaluation processes. | Oct 31 | 1 | grow |
| 5cb35db9 | This commit introduces a **readability improvement** to the **CLI evaluation results rendering** within the `adk` project. The `pretty_print_eval_result` function in `src/google/adk/cli/cli_eval.py` is modified to **avoid rendering DataFrame columns that contain only `None` values**. This **maintenance update** specifically addresses cases where columns like "expected response" or "expected tool calls" are empty for user-simulated conversations, preventing unnecessary clutter. The change ensures a cleaner and more focused output, significantly **enhancing the readability** of detailed evaluation results for users. | Oct 30 | 1 | waste |
| 72a8d8d8 | This commit **fixes** an issue in the **CLI tool** by making the `session_input_file` argument **required** for the `cli_add_eval_case` command within the `google.adk` project. Previously, it was possible to create an evaluation case without this critical input, which is necessary for running evals and retrieving session data. This **refinement** ensures the integrity of the **evaluation case creation workflow** by enforcing the presence of essential session configuration. The **test suite** has been updated to reflect the new requirement: tests covering the optional form were removed, and the now-required argument was added to existing tests. | Oct 29 | 2 | maint |
| 1aa9bb13 | This commit **refactors** the project's integration with the **Vertex AI SDK** by centralizing its dependencies through an internal proxy. Specifically, the **`vertex_ai_rag_retrieval` tool** now imports its `rag` components via the internal `src/google/adk/dependencies/vertexai.py` module, rather than directly from `vertexai.preview`. This **maintenance** change also updates the internal `dependencies.vertexai` module to expose `example_stores` from `vertexai.preview`. The overall impact is to establish a consistent internal interface for **RAG (Retrieval Augmented Generation)** and **example store** functionalities, improving maintainability and control over external SDK usage. | Oct 28 | 3 | maint |
| b17c8f19 | This commit **refactors** the **ADK evaluation framework** by making the `expected_invocation` field **optional** across the core `evaluator` interface and its associated data structures, such as `EvalMetricResultPerInvocation`. This change accommodates existing metrics that do not rely on expected invocations and improves support for conversation scenarios where this field might not be applicable. Various specific evaluators within the `google.adk.evaluation` module, including `final_response_match`, `hallucinations`, and `llm_as_judge`, have been updated to gracefully handle this optionality, either by accepting `None` or explicitly requiring the field when necessary for their specific logic. The `local_eval_service` and `vertex_ai_eval_facade` were also updated to support this more flexible evaluation paradigm, enhancing the system's adaptability to diverse evaluation requirements. | Oct 28 | 15 | maint |
| 00d147d6 | This commit provides a **bug fix** to the **evaluation configuration module** within the ADK. Specifically, the `get_evaluation_criteria_or_default` function in `src/google/adk/evaluation/eval_config.py` has been updated to explicitly check if the evaluation configuration file exists before attempting to read it. This change prevents `FileNotFoundError` exceptions, making the **evaluation pipeline** more robust when the configuration file is absent. New unit tests have also been added to verify this improved error handling. | Oct 27 | 2 | waste |
| bb32a635 | This commit **improves debuggability** within the **evaluation service** by enhancing error logging during the inference step. It modifies the `_run_inference` function in `src/google/adk/evaluation/local_eval_service.py` to capture and include full traceback information in error logs when inference fails. This **maintenance chore** provides developers with crucial debugging context, making it significantly easier to diagnose and resolve issues that previously obscured the root cause of failures. The change ensures that comprehensive error details are available, preventing the loss of vital traceback information. | Oct 23 | 1 | maint |
| 955632ce | This commit introduces a **new feature** that enhances the **agent evaluation system** by allowing the `agent_evaluator` to recognize and load agents from Python modules whose filenames end with `.agent`. Specifically, the `_get_agent_for_eval` function in `src/google/adk/evaluation/agent_evaluator.py` was modified to support this new naming convention. This change provides greater flexibility in organizing and identifying agent modules, enabling developers to use `.agent` as a distinct suffix for agent definitions. An accompanying integration test in `tests/integration/test_single_agent.py` validates the correct evaluation of agents using this new module naming scheme. | Oct 23 | 2 | maint |
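The snapshot-before-iteration fix in f75de593 is a standard Python pattern. Here is a minimal reconstruction, not the actual `_list_sessions_impl` code:

```python
# Minimal reconstruction of the pattern behind f75de593, not the ADK code.
# Iterating a dict that another thread or coroutine mutates concurrently
# raises "RuntimeError: dictionary changed size during iteration".

sessions = {"s1": {"user": "a"}, "s2": {"user": "b"}}

def list_sessions_buggy():
    # Iterates the live dict view; a concurrent insert or delete can
    # invalidate the iterator mid-loop.
    return [sid for sid, _ in sessions.items()]

def list_sessions_fixed():
    # Materialize an isolated snapshot first, then iterate the snapshot;
    # later mutations of `sessions` cannot affect it.
    snapshot = list(sessions.items())
    return [sid for sid, _ in snapshot]
```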
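The word-boundary change in c222a45e has the effect shown below; the actual workflow regex may differ, but `\b` anchors are the standard fix for this kind of over-matching:

```python
# Word boundaries stop 'cli' matching inside longer identifiers.
import re

loose = re.compile(r"cli")
strict = re.compile(r"\bcli\b")

print(bool(loose.search("_client_labels_utils")))   # True: false positive
print(bool(strict.search("_client_labels_utils")))  # False: correctly skipped
print(bool(strict.search("run the cli tool")))      # True: genuine reference
```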
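For e2d3b2d8, the two match types can be sketched as follows, under assumed semantics (the ADK's `trajectory_evaluator` implementation may differ): `IN_ORDER` accepts the expected tool calls as a subsequence of the actual calls, while `ANY_ORDER` ignores ordering and checks multiset containment.

```python
# Assumed semantics for the IN_ORDER / ANY_ORDER match types; illustrative
# only, not the ADK's trajectory_evaluator implementation.
from collections import Counter

def matches_in_order(expected, actual):
    """True if `expected` appears as a subsequence of `actual`."""
    it = iter(actual)
    return all(call in it for call in expected)

def matches_any_order(expected, actual):
    """True if every expected call occurs in `actual`, order ignored."""
    remaining = Counter(actual)
    remaining.subtract(Counter(expected))
    return all(count >= 0 for count in remaining.values())

print(matches_in_order(["search", "fetch"], ["search", "log", "fetch"]))   # True
print(matches_in_order(["fetch", "search"], ["search", "log", "fetch"]))   # False
print(matches_any_order(["fetch", "search"], ["search", "log", "fetch"]))  # True
```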