Developer

Masahiro Tanaka

81312776+tohtana@users.noreply.github.com

70 commits · ~3 files/commit

Performance

2026 · Previous year

Insights

Key patterns and highlights from this developer's activity.

Peak Month: Jan '26 (234 performance)
Growth Trend: ↑46% vs prior period
Avg Files/Commit: 3 files per commit
Active Days: 55 of 455 days
Top Repo: DeepSpeed (70 commits)

Effort Over Time

Breakdown of growth, maintenance, and fixes effort over time.

Bug Behavior

Beta

Bugs introduced vs. fixed over time.

Investment Quality

Beta

Reclassifies engineering effort based on bug attribution. Commits that introduced bugs are retrospectively counted as poor investments.

Productive Time: 23% (Growth 45% + Fixes 55%)
Maintenance Time: 23%
Wasted Time: 55%

Methodology

Investment Quality reclassifies engineering effort based on bug attribution data. Commits identified as buggy origins (those that introduced bugs later fixed by someone) have their growth and maintenance time moved into the Wasted Time category; fix time (the standard model's waste category) remains counted as productive. All other commits retain their standard classification: growth is productive, maintenance is maintenance, and fixes are productive.
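
For illustration, here is a minimal sketch of this reclassification in Python, assuming a hypothetical commit record with a standard category ("grow", "maintenance", or "waste"), a buggy-origin flag, and an effort value (the field names are assumptions, not the actual data model):

def investment_quality(commits):
    # commits: iterable of dicts such as
    # {"category": "grow", "buggy_origin": False, "effort": 1.5}
    productive = maintenance = wasted = 0.0
    for c in commits:
        if c["buggy_origin"] and c["category"] in ("grow", "maintenance"):
            # Buggy-origin growth and maintenance time is retrospectively wasted.
            wasted += c["effort"]
        elif c["category"] == "maintenance":
            maintenance += c["effort"]
        else:
            # Growth and fixes ("waste" in the standard model) count as productive.
            productive += c["effort"]
    total = (productive + maintenance + wasted) or 1.0
    return {
        "productivePct": round(100 * productive / total),
        "maintenancePct": round(100 * maintenance / total),
        "wastedPct": round(100 * wasted / total),
    }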

Relationship to Growth / Maintenance / Fixes

The standard model classifies commits as Growth, Maintenance, or Fixes. Investment Quality adds a quality lens: a commit that introduced a bug is retrospectively counted as a poor investment — the engineering time spent on it was wasted because it ultimately required additional fix work. Fix commits (Fixes in the standard model) are reframed as productive, because fixing bugs is valuable work.

Proposed API Endpoint

This metric is currently computed client-side from commit and bug attribution data. An ideal server-side endpoint would look like the following (groupBy accepts "repository_id" or "deliverer_email"):

POST /v1/organizations/{orgId}/investment-quality
Content-Type: application/json

Request:
{
  "startTime": "2025-01-01T00:00:00Z",
  "endTime": "2025-12-31T23:59:59Z",
  "bucketSize": "BUCKET_SIZE_MONTH",
  "groupBy": ["repository_id" | "deliverer_email"]
}

Response:
{
  "productivePct": 74,
  "maintenancePct": 18,
  "wastedPct": 8,
  "buckets": [
    {
      "bucketStart": "2025-01-01T00:00:00Z",
      "productive": 4.2,
      "maintenance": 1.8,
      "wasted": 0.6
    }
  ]
}
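
If this proposed endpoint were implemented, a client could fetch the aggregate percentages and the per-bucket series in a few lines. The sketch below uses the Python `requests` library; the base URL, organization id, and bearer-token header are placeholders, since the endpoint does not exist yet:

import requests

BASE_URL = "https://api.example.com"  # placeholder host
ORG_ID = "my-org"                     # placeholder organization id

payload = {
    "startTime": "2025-01-01T00:00:00Z",
    "endTime": "2025-12-31T23:59:59Z",
    "bucketSize": "BUCKET_SIZE_MONTH",
    "groupBy": ["repository_id"],
}

resp = requests.post(
    f"{BASE_URL}/v1/organizations/{ORG_ID}/investment-quality",
    json=payload,
    headers={"Authorization": "Bearer <token>"},  # assumed auth scheme
    timeout=30,
)
resp.raise_for_status()
data = resp.json()

print(data["productivePct"], data["maintenancePct"], data["wastedPct"])
for bucket in data["buckets"]:
    print(bucket["bucketStart"], bucket["productive"], bucket["maintenance"], bucket["wasted"])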

Recent Activity

Latest analyzed commits from this developer.

3bdebc0 · Mar 31

This commit **fixes a CI failure** occurring in tests for **AutoTP (Automatic Tensor Parallelism)** and **universal checkpoint**. The issue, a "RuntimeError: Cannot re-initialize CUDA", arose because `torch.cuda.current_device()` was called prematurely during test setup under `pytest --forked`. To resolve this, a new method `_should_materialize_tp_partition` is introduced in `deepspeed/module_inject/layers.py` to conditionally skip constructor-time AutoTP materialization when no model-parallel group is provided. This **bug fix** ensures that **AutoTP** partitioning only occurs when an actual `mp_group` is present, preventing device placement issues and stabilizing the CI pipeline for these critical features.

1 file · waste
36f0b0c · Mar 30

This commit introduces a **feature enhancement** to the **CI/CD pipeline** by implementing dynamic hardware detection for DeepSpeed's full test suite. It modifies the `.github/workflows/aws-torch-latest-full.yml` workflow to detect the **CUDA architecture** and the **number of GPUs** available in the test environment. These detected values are then set as environment variables, enabling adaptive configuration of DeepSpeed installation and test execution. This change provides a crucial **fallback mechanism** to improve the **reliability** of nightly full tests, specifically addressing recent failures by allowing the system to better utilize available resources like A100 nodes.

1 file · grow
138f20d · Mar 25

This commit introduces a **backward compatibility fix** for DeepSpeed, specifically addressing issues when installing from source with **PyTorch versions older than 2.4**. It resolves a build failure caused by the absence of `torch.amp.custom_fwd` in older PyTorch releases, which was implicitly imported by DeepSpeed's `setup.py`. The **DeepSpeed runtime's Zero module** in `deepspeed/runtime/zero/linear.py` now includes a fallback mechanism, utilizing `torch.cuda.amp.custom_fwd` for these legacy environments. This ensures that users can **install and run DeepSpeed from source** on a broader range of PyTorch versions, with new unit tests verifying the correct `autocast` decorator behavior across different PyTorch versions.

2 files · waste
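
The version fallback described in this commit follows a common pattern. A generic sketch of that pattern (not DeepSpeed's actual code; `torch.amp.custom_fwd` takes a device_type argument on PyTorch >= 2.4, while `torch.cuda.amp.custom_fwd` is the legacy form):

try:
    # PyTorch >= 2.4 exposes custom_fwd/custom_bwd under torch.amp
    from torch.amp import custom_fwd, custom_bwd
    autocast_fwd = custom_fwd(device_type="cuda")   # returns a decorator
    autocast_bwd = custom_bwd(device_type="cuda")
except ImportError:
    # Older releases only provide the CUDA-specific legacy decorators
    from torch.cuda.amp import custom_fwd as autocast_fwd
    from torch.cuda.amp import custom_bwd as autocast_bwd

# autocast_fwd / autocast_bwd can then decorate torch.autograd.Function
# forward/backward methods regardless of the installed PyTorch version.
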
784cc26 · Mar 13

This commit **fixes a critical bug** in the **Evoformer attention mechanism** that caused order-dependent failures during multi-architecture CUDA builds. It **refactors** the GPU architecture detection in `csrc/deepspeed4science/evoformer_attn/gemm_kernel_utils.h` to enable runtime dispatch of appropriate kernels based on the device's compute capability. This ensures that **Evoformer** binaries built for mixed architectures (e.g., pre-Ampere and Ampere+) correctly select optimized kernels, deprecating the `DS_EVOFORMER_GPU_ARCH` build flag. The change improves the stability and performance of **Evoformer** across diverse GPU environments by providing a robust multi-architecture build and runtime solution.

4 files · maint
6c59d54 · Mar 5

This commit delivers a **critical performance fix** for **DeepSpeed's ZeRO-enabled training**, resolving a regression where dynamic gradient hook counting caused significant overhead during the backward pass. It introduces a `should_refresh_expected_hook_count()` predicate to ensure the expensive hook count computation is performed only once per reentrant backward phase, rather than for every gradient hook. This optimization is applied across **ZeRO-1, ZeRO-2, and ZeRO-3 stages** by conditionally refreshing or reusing cached hook counts, and also includes resetting counters in `enter_backward()` to prevent pollution. The **performance improvement** is substantial, leading to a 2.5x speedup in backward pass iteration times for large transformer models.

4 files · maint
4dba1e2 · Mar 4

This commit introduces a **documentation update** to the `docs/code-docs/source/training.rst` file, enhancing the project's user guidance. It adds a new section that **clarifies the behavior and usage of `torch.autocast` when nested**, specifically detailing its interaction within the **DeepSpeed engine**. This **documentation improvement** explains the rationale behind nesting `autocast` and provides guidance on when and why it is needed, thereby improving user understanding for developers utilizing these advanced training features.

1 file · maint
04d69cc · Mar 2

This commit delivers a **bug fix** addressing a `RuntimeError` encountered during `import deepspeed` on PyTorch 2.3 with Python 3.12. The `jit_script_compat` decorator in `deepspeed/utils/torch.py` was unconditionally invoking `torch.compile()`, which lacked Dynamo support for Python 3.12 in that specific PyTorch version, leading to import crashes. The fix introduces a version gate within `jit_script_compat` to prevent `torch.compile()` calls on known-unsupported combinations and implements a robust double fallback mechanism. This ensures **DeepSpeed can be successfully imported and utilized** on these specific platform configurations, significantly improving **compatibility**.

1 file · waste
bffaf45 · Mar 1

This commit **fixes a critical bug** affecting **DeepSpeed ZeRO's parameter counting** mechanism when running on **PyTorch 2.3**. Previously, the `_get_grad_fn_or_grad_acc` lookup within `count_used_parameters_in_backward` in `deepspeed/runtime/utils.py` would fail in `no-grad` contexts during backward hooks, leading to crashes. The **bug fix** explicitly wraps this lookup with `torch.enable_grad()` to ensure proper gradient function retrieval, aligning with newer PyTorch behavior. This ensures **DeepSpeed ZeRO** can reliably count used parameters and operate correctly on **PyTorch 2.3**, with a new unit test added to validate the change.

2 files · waste
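
The idiom this commit relies on (resolving a parameter's gradient-accumulator node even when called from a no-grad context such as a backward hook) can be sketched generically; this is an illustration of the pattern, not the DeepSpeed code:

import torch

def grad_accumulator_node(param):
    # Assumes param.requires_grad is True. Forcing grad mode ensures autograd
    # records a grad_fn for the temporary view even inside torch.no_grad();
    # the expand_as trick then exposes the parameter's AccumulateGrad node.
    with torch.enable_grad():
        return param.expand_as(param).grad_fn.next_functions[0][0]
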
efc0b49 · Feb 25

This commit **enhances the Automatic Tensor Parallelism (AutoTP) documentation** by **restructuring its navigation and content**. It **adds a new sidebar navigation link** for the `AutoTP Training` tutorial, while **renaming the existing AutoTP entry** to `AutoTP Inference` for improved clarity within `docs/_data/navigation.yml`. Furthermore, this **documentation update** includes **fixing broken internal links** within the `docs/_tutorials/autotp-training.md` tutorial and updating introductory notes in `docs/_tutorials/automatic-tensor-parallelism.md` to correctly reference the new training guide. These changes improve the **discoverability and accuracy** of AutoTP resources for users.

3 files · maint
0416cf6 · Feb 24

This commit **schedules a nightly execution** for the **full unit test suite** within the **CI/CD pipeline**. It introduces a **new capability** by adding a schedule trigger to the `.github/workflows/aws-torch-latest-full.yml` workflow. This **maintenance improvement** ensures that the comprehensive tests run automatically every night, but intelligently, only when new commits have been detected since the last successful run. This regular, conditional execution helps in **continuously monitoring test stability** and **early detection of regressions** in the project's core functionalities.

1 file · grow
93524c8 · Feb 22

This commit **fixes a regression** in the `TestZeroStaticScale` unit test, which was failing for **ZeRO optimizer** stages 1, 2, and 3. The issue arose from an incorrect assertion in `tests/unit/runtime/half_precision/test_fp16.py` that attempted to access `optim.loss_scale_config.dynamic_loss_scale`, a property not present in ZeRO optimizers. This **bug fix** reverts the assertion to the correct `optim.dynamic_loss_scale`, ensuring the **FP16 half-precision tests** accurately validate static loss scaling behavior for ZeRO optimizers. This restores the integrity and reliability of **ZeRO optimizer testing** for mixed-precision training.

1 file · maint
57b10d5 · Feb 21

This commit **fixes a bug** in the **DeepSpeed ZeRO Redundancy Optimizer** by enhancing the `GatheredParameters` context manager. It introduces **sanity checks** within `deepspeed/runtime/zero/partition_parameters.py` to detect and prevent in-place modification of parameters when `modifier_rank` is `None`. Previously, an incomplete implementation failed to catch these modifications, leading to potential silent data corruption or unexpected behavior during distributed training. Now, attempting such an operation will correctly raise a `RuntimeError`, improving the **robustness and predictability** of the ZeRO optimizer. This change ensures data integrity and provides clearer error feedback to users of the **DeepSpeed ZeRO** optimization.

2 files · maint
dbc1b07 · Feb 18

This commit **fixes compilation errors** on **HIP/ROCm (AMD)** platforms within the **DeepSpeed Inference v2** module. It addresses the absence of specific CUDA-style BF16 conversion intrinsics by introducing **platform-specific fallback implementations** in `deepspeed/inference/v2/kernels/includes/conversion_utils.h`. This **bug fix** ensures that integer, unsigned integer, and float to BF16 conversions, and vice-versa, are correctly handled on AMD GPUs. The change significantly improves **platform compatibility** for **DeepSpeed Inference v2**, enabling it to compile and run successfully on **HIP/ROCm** systems.

1 file · waste
d2ca6e7 · Feb 12

This commit introduces a **compatibility layer** for JIT compilation within DeepSpeed, primarily to **resolve deprecation warnings** encountered when importing DeepSpeed on `torch==2.10.0`. It **refactors** several internal helper functions across the **Mixture of Experts (MoE)**, **ZeRO optimizer utilities**, and **sequence parallelism layers** by replacing direct calls to `@torch.jit.script` with a new utility decorator, `jit_script_compat`. This new utility, defined in `deepspeed/utils/torch.py`, conditionally leverages `torch.compile` for newer PyTorch versions while falling back to `torch.jit.script` for older ones. The change ensures **cleaner imports** and better alignment with PyTorch's recommended JIT compilation practices, improving **forward compatibility**.

4 files · grow
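
A generic sketch of the kind of compatibility decorator described here (illustrative only, not the actual `jit_script_compat` implementation): prefer `torch.compile` where available, fall back to `torch.jit.script`, and finally run the function eagerly:

import torch

def jit_compat(fn):
    if hasattr(torch, "compile"):          # torch.compile exists on PyTorch >= 2.0
        try:
            return torch.compile(fn)
        except RuntimeError:               # e.g. Dynamo unsupported on this Python version
            pass
    try:
        return torch.jit.script(fn)        # legacy JIT scripting
    except Exception:
        return fn                          # last resort: eager execution
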
1752c2a · Feb 12

This commit **fixes gradient norm divergence** observed during **BF16 training with ZeRO stage 0** by addressing two critical bugs within the **DeepSpeed engine**. It resolves incorrect dynamic loss scaling application in `FP16_UnfusedOptimizer` and prevents unintended gradient accumulation caused by skipping `zero_grad` for BF16 without ZeRO. The **bug fix** disables loss scaling for BF16 and removes the `zero_optimization()` gate on `zero_grad`, complemented by a **refactoring** of the loss scaling mechanism to use a new `LossScaleConfig`. This ensures **stable and accurate gradient updates** for models leveraging these specific mixed-precision and optimization configurations.

8 files · maint
a44fb58 · Feb 7

This commit delivers a **bug fix** and **enhancement** for **DeepSpeed's Auto Tensor Parallelism (AutoTP)**, resolving critical issues with custom pattern configurations. It updates `deepspeed/module_inject/auto_tp.py` to correctly respect `use_default_specs: false` and disable traditional injection when custom patterns are enabled, ensuring proper module replacement. Additionally, `deepspeed/runtime/tensor_parallel/init_utils.py` is modified to automatically create a tensor parallel group during `deepspeed.initialize` if `mpu` is not provided, significantly improving **Hugging Face Trainer integration**. These changes make custom AutoTP patterns reliable and enhance the overall usability and compatibility of the tensor parallelism features.

4 files · maint
6b9cab1 · Jan 31

This commit introduces a **new capability** for **Automatic Tensor Parallelism (AutoTP)**, enabling users to define **custom layer partitioning patterns** via a flexible, configuration-driven API. This allows for precise control over how model parameters are sharded, supporting **any model architecture** including those with complex fused layers and unequal sub-parameter sizes, using regex patterns within the DeepSpeed configuration. The `deepspeed.initialize` function is enhanced to simplify AutoTP setup by integrating these configurations directly, while maintaining **backward compatibility** with previous initialization methods. This significantly improves the **extensibility and usability** of AutoTP for diverse and custom model training scenarios.

19 files · grow
52b1d4d · Jan 30

This commit **fixes a race condition** within **DeepSpeed ZeRO3 leaf modules** during the backward pass, specifically when PyTorch's autograd concurrently triggers hooks for modules returning multiple outputs. This **bug fix** introduces **thread synchronization** in `deepspeed/runtime/zero/partitioned_param_coordinator.py` to ensure only a single thread handles parameter fetching for a leaf module, preventing concurrent modifications to internal parameter states. This significantly improves the **stability and correctness** of **ZeRO3** training, especially for models leveraging multi-output leaf modules. New tests in `test_zero_leaf_module.py` validate this thread-safe behavior.

2 files · waste
b19987c · Jan 29

This commit performs a **maintenance update** by upgrading the **PyTorch version** used within the project's continuous integration (CI) pipelines. Specifically, the **`accelerate` and `torch_latest` CI environments** are updated to use PyTorch **v2.9.1** from the previous v2.6.0. This **chore** involves modifying `ci/accelerate.py` and `ci/torch_latest.py` to reflect the new base image version. Additionally, `ci/torch_latest.py` adjusts the `pytest` command to align with the updated torch and cuda versions, ensuring tests are run against a more current and compatible deep learning framework.

2 files · maint
d9f3d40 · Jan 28

This commit **fixes a crash** in **DeepSpeed's ZeRO-3** by introducing a **clearer `RuntimeError`** when `GatheredParameters` are modified in-place without `modifier_rank` specified. Specifically, the `GatheredParameters.__exit__` method in `deepspeed/runtime/zero/partition_parameters.py` now detects and raises an actionable error, synchronized across ranks, instead of an obscure internal invariant assertion. Additionally, the `free_param` function now provides more informative error messages when parameters are still active in submodules. This **error handling improvement** enhances **debugging clarity** and the overall **developer experience** for users of ZeRO-3.

2 files · maint

Work Patterns

Beta

Commit activity distribution by hour and day of week. Shows when this developer is most active.

Collaboration

Beta

Developers who frequently work on the same files and symbols. Higher score means stronger code collaboration.
