Developer
Masahiro Tanaka
81312776+tohtana@users.noreply.github.com
Performance
Key patterns and highlights from this developer's activity.
Breakdown of growth, maintenance, and fixes effort over time.
Bugs introduced vs. fixed over time.
Reclassifies engineering effort based on bug attribution. Commits that introduced bugs are retrospectively counted as poor investments.
Investment Quality reclassifies engineering effort based on bug attribution data. Commits identified as buggy origins (those that introduced a bug later fixed by someone) have their grow and maintenance time moved into the Wasted Time category. Fix commits (waste in the standard model) remain counted as productive. All other commits retain their standard classification: grow is productive, maintenance is maintenance, and waste (fixes) is productive.
The standard model classifies commits as Growth, Maintenance, or Fixes. Investment Quality adds a quality lens: a commit that introduced a bug is retrospectively counted as a poor investment — the engineering time spent on it was wasted because it ultimately required additional fix work. Fix commits (Fixes in the standard model) are reframed as productive, because fixing bugs is valuable work.
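The reclassification rules above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical field names (`category`, `introduced_bug`, `hours`) rather than the product's actual data model:

```python
from dataclasses import dataclass

@dataclass
class Commit:
    category: str         # standard model: "grow" | "maint" | "waste"
    introduced_bug: bool  # True if a later fix commit points back here
    hours: float          # estimated effort

def reclassify(commits):
    """Apply the Investment Quality lens to standard-model commits."""
    totals = {"productive": 0.0, "maintenance": 0.0, "wasted": 0.0}
    for c in commits:
        if c.category == "waste":
            # Fix commits are reframed as productive work.
            totals["productive"] += c.hours
        elif c.introduced_bug:
            # Grow/maintenance effort on a buggy-origin commit is wasted.
            totals["wasted"] += c.hours
        elif c.category == "grow":
            totals["productive"] += c.hours
        else:  # "maint"
            totals["maintenance"] += c.hours
    return totals
```

Note that a buggy-origin commit's own time moves to wasted, while the later fix commit's time stays productive, so the same bug contributes to both buckets.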
Currently computed client-side from commit and bug attribution data. Ideal server-side endpoint:
```
POST /v1/organizations/{orgId}/investment-quality
Content-Type: application/json
```

Request:

```
{
  "startTime": "2025-01-01T00:00:00Z",
  "endTime": "2025-12-31T23:59:59Z",
  "bucketSize": "BUCKET_SIZE_MONTH",
  "groupBy": ["repository_id" | "deliverer_email"]
}
```
Response:

```json
{
  "productivePct": 74,
  "maintenancePct": 18,
  "wastedPct": 8,
  "buckets": [
    {
      "bucketStart": "2025-01-01T00:00:00Z",
      "productive": 4.2,
      "maintenance": 1.8,
      "wasted": 0.6
    }
  ]
}
```

Latest analyzed commits from this developer.
| Hash | Message | Date | Files | Effort |
|---|---|---|---|---|
| 3bdebc0 | This commit **fixes a CI failure** occurring in tests for **AutoTP (Automatic Tensor Parallelism)** and **universal checkpoint**. The issue, a "RuntimeError: Cannot re-initialize CUDA", arose because `torch.cuda.current_device()` was called prematurely during test setup under `pytest --forked`. To resolve this, a new method `_should_materialize_tp_partition` is introduced in `deepspeed/module_inject/layers.py` to conditionally skip constructor-time AutoTP materialization when no model-parallel group is provided. This **bug fix** ensures that **AutoTP** partitioning only occurs when an actual `mp_group` is present, preventing device placement issues and stabilizing the CI pipeline for these critical features. | Mar 31 | 1 | waste |
| 36f0b0c | This commit introduces a **feature enhancement** to the **CI/CD pipeline** by implementing dynamic hardware detection for DeepSpeed's full test suite. It modifies the `.github/workflows/aws-torch-latest-full.yml` workflow to detect the **CUDA architecture** and the **number of GPUs** available in the test environment. These detected values are then set as environment variables, enabling adaptive configuration of DeepSpeed installation and test execution. This change provides a crucial **fallback mechanism** to improve the **reliability** of nightly full tests, specifically addressing recent failures by allowing the system to better utilize available resources like A100 nodes. | Mar 30 | 1 | grow |
| 138f20d | This commit introduces a **backward compatibility fix** for DeepSpeed, specifically addressing issues when installing from source with **PyTorch versions older than 2.4**. It resolves a build failure caused by the absence of `torch.amp.custom_fwd` in older PyTorch releases, which was implicitly imported by DeepSpeed's `setup.py`. The **DeepSpeed runtime's Zero module** in `deepspeed/runtime/zero/linear.py` now includes a fallback mechanism, utilizing `torch.cuda.amp.custom_fwd` for these legacy environments. This ensures that users can **install and run DeepSpeed from source** on a broader range of PyTorch versions, with new unit tests verifying the correct `autocast` decorator behavior across different PyTorch versions. | Mar 25 | 2 | waste |
| 784cc26 | This commit **fixes a critical bug** in the **Evoformer attention mechanism** that caused order-dependent failures during multi-architecture CUDA builds. It **refactors** the GPU architecture detection in `csrc/deepspeed4science/evoformer_attn/gemm_kernel_utils.h` to enable runtime dispatch of appropriate kernels based on the device's compute capability. This ensures that **Evoformer** binaries built for mixed architectures (e.g., pre-Ampere and Ampere+) correctly select optimized kernels, deprecating the `DS_EVOFORMER_GPU_ARCH` build flag. The change improves the stability and performance of **Evoformer** across diverse GPU environments by providing a robust multi-architecture build and runtime solution. | Mar 13 | 4 | maint |
| 6c59d54 | This commit delivers a **critical performance fix** for **DeepSpeed's ZeRO-enabled training**, resolving a regression where dynamic gradient hook counting caused significant overhead during the backward pass. It introduces a `should_refresh_expected_hook_count()` predicate to ensure the expensive hook count computation is performed only once per reentrant backward phase, rather than for every gradient hook. This optimization is applied across **ZeRO-1, ZeRO-2, and ZeRO-3 stages** by conditionally refreshing or reusing cached hook counts, and also includes resetting counters in `enter_backward()` to prevent pollution. The **performance improvement** is substantial, leading to a 2.5x speedup in backward pass iteration times for large transformer models. | Mar 5 | 4 | maint |
| 4dba1e2 | This commit introduces a **documentation update** to the `docs/code-docs/source/training.rst` file, enhancing the project's user guidance. It adds a new section that **clarifies the behavior and usage of `torch.autocast` when nested**, specifically detailing its interaction within the **DeepSpeed engine**. This **documentation improvement** explains the rationale behind nesting `autocast` and provides guidance on when and why it is needed, thereby improving user understanding for developers utilizing these advanced training features. | Mar 4 | 1 | maint |
| 04d69cc | This commit delivers a **bug fix** addressing a `RuntimeError` encountered during `import deepspeed` on PyTorch 2.3 with Python 3.12. The `deepspeed.utils.torch.py` module's `jit_script_compat` decorator was unconditionally invoking `torch.compile()`, which lacked Dynamo support for Python 3.12 in that specific PyTorch version, leading to import crashes. The fix introduces a version gate within `jit_script_compat` to prevent `torch.compile()` calls on known-unsupported combinations and implements a robust double fallback mechanism. This ensures **DeepSpeed can be successfully imported and utilized** on these specific platform configurations, significantly improving **compatibility**. | Mar 2 | 1 | waste |
| bffaf45 | This commit **fixes a critical bug** affecting **DeepSpeed ZeRO's parameter counting** mechanism when running on **PyTorch 2.3**. Previously, the `_get_grad_fn_or_grad_acc` lookup within `count_used_parameters_in_backward` in `deepspeed/runtime/utils.py` would fail in `no-grad` contexts during backward hooks, leading to crashes. The **bug fix** explicitly wraps this lookup with `torch.enable_grad()` to ensure proper gradient function retrieval, aligning with newer PyTorch behavior. This ensures **DeepSpeed ZeRO** can reliably count used parameters and operate correctly on **PyTorch 2.3**, with a new unit test added to validate the change. | Mar 1 | 2 | waste |
| efc0b49 | This commit **enhances the Automatic Tensor Parallelism (AutoTP) documentation** by **restructuring its navigation and content**. It **adds a new sidebar navigation link** for the `AutoTP Training` tutorial, while **renaming the existing AutoTP entry** to `AutoTP Inference` for improved clarity within `docs/_data/navigation.yml`. Furthermore, this **documentation update** includes **fixing broken internal links** within the `docs/_tutorials/autotp-training.md` tutorial and updating introductory notes in `docs/_tutorials/automatic-tensor-parallelism.md` to correctly reference the new training guide. These changes improve the **discoverability and accuracy** of AutoTP resources for users. | Feb 25 | 3 | maint |
| 0416cf6 | This commit **schedules a nightly execution** for the **full unit test suite** within the **CI/CD pipeline**. It introduces a **new capability** by adding a schedule trigger to the `.github/workflows/aws-torch-latest-full.yml` workflow. This **maintenance improvement** ensures that the comprehensive tests run automatically every night, but intelligently, only when new commits have been detected since the last successful run. This regular, conditional execution helps in **continuously monitoring test stability** and **early detection of regressions** in the project's core functionalities. | Feb 24 | 1 | grow |
| 93524c8 | This commit **fixes a regression** in the `TestZeroStaticScale` unit test, which was failing for **ZeRO optimizer** stages 1, 2, and 3. The issue arose from an incorrect assertion in `tests/unit/runtime/half_precision/test_fp16.py` that attempted to access `optim.loss_scale_config.dynamic_loss_scale`, a property not present in ZeRO optimizers. This **bug fix** reverts the assertion to the correct `optim.dynamic_loss_scale`, ensuring the **FP16 half-precision tests** accurately validate static loss scaling behavior for ZeRO optimizers. This restores the integrity and reliability of **ZeRO optimizer testing** for mixed-precision training. | Feb 22 | 1 | maint |
| 57b10d5 | This commit **fixes a bug** in the **DeepSpeed ZeRO Redundancy Optimizer** by enhancing the `GatheredParameters` context manager. It introduces **sanity checks** within `deepspeed/runtime/zero/partition_parameters.py` to detect and prevent in-place modification of parameters when `modifier_rank` is `None`. Previously, an incomplete implementation failed to catch these modifications, leading to potential silent data corruption or unexpected behavior during distributed training. Now, attempting such an operation will correctly raise a `RuntimeError`, improving the **robustness and predictability** of the ZeRO optimizer. This change ensures data integrity and provides clearer error feedback to users of the **DeepSpeed ZeRO** optimization. | Feb 21 | 2 | maint |
| dbc1b07 | This commit **fixes compilation errors** on **HIP/ROCm (AMD)** platforms within the **DeepSpeed Inference v2** module. It addresses the absence of specific CUDA-style BF16 conversion intrinsics by introducing **platform-specific fallback implementations** in `deepspeed/inference/v2/kernels/includes/conversion_utils.h`. This **bug fix** ensures that integer, unsigned integer, and float to BF16 conversions, and vice-versa, are correctly handled on AMD GPUs. The change significantly improves **platform compatibility** for **DeepSpeed Inference v2**, enabling it to compile and run successfully on **HIP/ROCm** systems. | Feb 18 | 1 | waste |
| d2ca6e7 | This commit introduces a **compatibility layer** for JIT compilation within DeepSpeed, primarily to **resolve deprecation warnings** encountered when importing DeepSpeed on `torch==2.10.0`. It **refactors** several internal helper functions across the **Mixture of Experts (MoE)**, **ZeRO optimizer utilities**, and **sequence parallelism layers** by replacing direct calls to `@torch.jit.script` with a new utility decorator, `jit_script_compat`. This new utility, defined in `deepspeed/utils/torch.py`, conditionally leverages `torch.compile` for newer PyTorch versions while falling back to `torch.jit.script` for older ones. The change ensures **cleaner imports** and better alignment with PyTorch's recommended JIT compilation practices, improving **forward compatibility**. | Feb 12 | 4 | grow |
| 1752c2a | This commit **fixes gradient norm divergence** observed during **BF16 training with ZeRO stage 0** by addressing two critical bugs within the **DeepSpeed engine**. It resolves incorrect dynamic loss scaling application in `FP16_UnfusedOptimizer` and prevents unintended gradient accumulation caused by skipping `zero_grad` for BF16 without ZeRO. The **bug fix** disables loss scaling for BF16 and removes the `zero_optimization()` gate on `zero_grad`, complemented by a **refactoring** of the loss scaling mechanism to use a new `LossScaleConfig`. This ensures **stable and accurate gradient updates** for models leveraging these specific mixed-precision and optimization configurations. | Feb 12 | 8 | maint |
| a44fb58 | This commit delivers a **bug fix** and **enhancement** for **DeepSpeed's Auto Tensor Parallelism (AutoTP)**, resolving critical issues with custom pattern configurations. It updates `deepspeed/module_inject/auto_tp.py` to correctly respect `use_default_specs: false` and disable traditional injection when custom patterns are enabled, ensuring proper module replacement. Additionally, `deepspeed/runtime/tensor_parallel/init_utils.py` is modified to automatically create a tensor parallel group during `deepspeed.initialize` if `mpu` is not provided, significantly improving **Hugging Face Trainer integration**. These changes make custom AutoTP patterns reliable and enhance the overall usability and compatibility of the tensor parallelism features. | Feb 7 | 4 | maint |
| 6b9cab1 | This commit introduces a **new capability** for **Automatic Tensor Parallelism (AutoTP)**, enabling users to define **custom layer partitioning patterns** via a flexible, configuration-driven API. This allows for precise control over how model parameters are sharded, supporting **any model architecture** including those with complex fused layers and unequal sub-parameter sizes, using regex patterns within the DeepSpeed configuration. The `deepspeed.initialize` function is enhanced to simplify AutoTP setup by integrating these configurations directly, while maintaining **backward compatibility** with previous initialization methods. This significantly improves the **extensibility and usability** of AutoTP for diverse and custom model training scenarios. | Jan 31 | 19 | grow |
| 52b1d4d | This commit **fixes a race condition** within **DeepSpeed ZeRO3 leaf modules** during the backward pass, specifically when PyTorch's autograd concurrently triggers hooks for modules returning multiple outputs. This **bug fix** introduces **thread synchronization** in `deepspeed/runtime/zero/partitioned_param_coordinator.py` to ensure only a single thread handles parameter fetching for a leaf module, preventing concurrent modifications to internal parameter states. This significantly improves the **stability and correctness** of **ZeRO3** training, especially for models leveraging multi-output leaf modules. New tests in `test_zero_leaf_module.py` validate this thread-safe behavior. | Jan 30 | 2 | waste |
| b19987c | This commit performs a **maintenance update** by upgrading the **PyTorch version** used within the project's continuous integration (CI) pipelines. Specifically, the **`accelerate` and `torch_latest` CI environments** are updated to use PyTorch **v2.9.1** from the previous v2.6.0. This **chore** involves modifying `ci/accelerate.py` and `ci/torch_latest.py` to reflect the new base image version. Additionally, `ci/torch_latest.py` adjusts the `pytest` command to align with the updated torch and cuda versions, ensuring tests are run against a more current and compatible deep learning framework. | Jan 29 | 2 | maint |
| d9f3d40 | This commit **fixes a crash** in **DeepSpeed's ZeRO-3** by introducing a **clearer `RuntimeError`** when `GatheredParameters` are modified in-place without `modifier_rank` specified. Specifically, the `GatheredParameters.__exit__` method in `deepspeed/runtime/zero/partition_parameters.py` now detects and raises an actionable error, synchronized across ranks, instead of an obscure internal invariant assertion. Additionally, the `free_param` function now provides more informative error messages when parameters are still active in submodules. This **error handling improvement** enhances **debugging clarity** and the overall **developer experience** for users of ZeRO-3. | Jan 28 | 2 | maint |
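The once-per-phase refresh described for the gradient-hook performance fix (commit 6c59d54) is an instance of phase-scoped memoization: reset a dirty flag when the phase begins, and recompute the expensive value at most once until the next reset. A minimal sketch with hypothetical names, not the actual DeepSpeed code:

```python
class HookCountCache:
    """Phase-scoped memoization of an expensive count computation."""

    def __init__(self, count_fn):
        self._count_fn = count_fn  # expensive computation
        self._cached = None
        self._phase_dirty = True

    def enter_backward(self):
        # Reset at phase start so stale counts never leak across iterations.
        self._phase_dirty = True

    def expected_hook_count(self):
        if self._phase_dirty:  # refresh at most once per backward phase
            self._cached = self._count_fn()
            self._phase_dirty = False
        return self._cached
```

Every gradient hook can then read `expected_hook_count()` cheaply, since only the first call per phase pays for the recount.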
Commit activity distribution by hour and day of week. Shows when this developer is most active.
Developers who frequently work on the same files and symbols. Higher score means stronger code collaboration.