xpu_graph

Introduction

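xpu_graph is a graph-level optimization tool for in-house applications running on heterogeneous hardware:

To User: implements common optimization patterns accumulated from production workloads
To Developer: makes it easy to write new patterns
To Vendor: exposes a common interface for optimized operators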

The figure below shows the architecture of xpu_graph:

Installation

Install xpu_graph as follows:

pip install https://luban-source.byted.org/repository/scm/Seed.Foundation.xpu_graph_1.0.0.1.tar.gz

Usage

Use xpu_graph as follows:

Use as a torch.compile backend

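import torch
from xpu_graph.compiler import XpuGraph

def foo(x, y):
    # Both outputs are computed from the same expression, x + y.
    z = x + y
    another_z = x + y
    return z, another_z

# Pass an XpuGraph instance as the torch.compile backend.
compiled_foo = torch.compile(foo, backend=XpuGraph())
compiled_foo(torch.randn(10), torch.randn(10))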
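
The function above computes x + y twice, which is the kind of redundancy a graph-level pass can remove. The sketch below illustrates the general idea with common-subexpression elimination on a toy graph. It is not xpu_graph's implementation: the tuple-based node format and the function name eliminate_common_subexpressions are assumptions made up for this illustration, and every op is assumed to be pure.

```python
from typing import Dict, List, Tuple

# Hypothetical toy IR: each node is (name, op, operand names).
Node = Tuple[str, str, Tuple[str, ...]]


def eliminate_common_subexpressions(
    nodes: List[Node], outputs: List[str]
) -> Tuple[List[Node], List[str]]:
    """Drop nodes that recompute an (op, operands) pair already seen."""
    seen: Dict[Tuple[str, Tuple[str, ...]], str] = {}
    alias: Dict[str, str] = {}  # removed node name -> surviving node name
    kept: List[Node] = []
    for name, op, args in nodes:
        # Rewrite operands that refer to nodes removed earlier.
        args = tuple(alias.get(a, a) for a in args)
        key = (op, args)
        if key in seen:
            alias[name] = seen[key]
        else:
            seen[key] = name
            kept.append((name, op, args))
    # Outputs that pointed at removed nodes now point at the survivors.
    return kept, [alias.get(o, o) for o in outputs]


# The graph of foo(x, y): two identical additions.
graph = [("z", "add", ("x", "y")), ("another_z", "add", ("x", "y"))]
kept, outs = eliminate_common_subexpressions(graph, ["z", "another_z"])
print(kept)  # [('z', 'add', ('x', 'y'))]
print(outs)  # ['z', 'z']
```

After the pass, only one add node remains and both outputs refer to it.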

Configure

from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Target and OptLevel are types defined in xpu_graph.

@dataclass
class XpuGraphConfig:
    """Configuration for XPU graph execution."""

    # Required. If True, XpuGraph works as a training compiler; otherwise as an inference compiler.
    is_training: bool
    debug: bool = False
    target: Target = field(default=Target.none)  # Target hardware backend
    opt_level: OptLevel = OptLevel.level1
    dump_graph: bool = False
    enable_cache: bool = True
    use_xpu_ops: bool = False  # Whether to use xpu_ops

    # Only takes effect when "is_training" is False.
    # Freezing turns the model's parameters from graph inputs into attributes,
    # which may help XpuGraph do better constant folding.
    freeze: bool = False
    constant_folding: bool = True

    # Currently only the "mode" key is supported, because Inductor is the main vendor compiler.
    # "mode" must be one of {"cudagraphs", "reduce-overhead", "max-autotune", "max-autotune-no-cudagraphs"}.
    # "cudagraphs" is an option added by XpuGraph: in this mode, XpuGraph only enables
    # the torch.compile in-tree backend "cudagraphs".
    # https://pytorch.org/docs/stable/torch.compiler_cudagraph_trees.html
    vendor_compiler_config: Optional[Dict[str, Any]] = None
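
The constraints described in the comments above can be checked before a configuration is handed to the compiler. The following is a minimal sketch that runs without xpu_graph installed. SketchConfig is a hypothetical stand-in that mirrors only a few of the fields, and check_config is a helper invented for this illustration; neither is part of xpu_graph's API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Modes listed in the XpuGraphConfig comments.
ALLOWED_MODES = {
    "cudagraphs",
    "reduce-overhead",
    "max-autotune",
    "max-autotune-no-cudagraphs",
}


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a subset of XpuGraphConfig fields."""

    is_training: bool
    freeze: bool = False
    vendor_compiler_config: Optional[Dict[str, Any]] = None


def check_config(cfg: SketchConfig) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    if cfg.is_training and cfg.freeze:
        problems.append("freeze only takes effect when is_training is False")
    vendor = cfg.vendor_compiler_config
    if vendor is not None:
        unknown = sorted(set(vendor) - {"mode"})
        if unknown:
            problems.append(f"unsupported vendor_compiler_config keys: {unknown}")
        mode = vendor.get("mode")
        if mode is not None and mode not in ALLOWED_MODES:
            problems.append(f"unsupported mode: {mode!r}")
    return problems


inference_cfg = SketchConfig(
    is_training=False,
    freeze=True,
    vendor_compiler_config={"mode": "reduce-overhead"},
)
training_cfg = SketchConfig(is_training=True, freeze=True)

print(check_config(inference_cfg))  # []
print(check_config(training_cfg))
```

The inference configuration passes, while the training configuration is flagged because freeze has no effect when is_training is True.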