Kuan-Hsien Lee
NumPy is fast, but how can we go faster? In this talk, we will show how to build a faster backend for array processing by writing high-performance code from scratch. We will explore how to construct a NumPy-like array engine in C++, integrate hardware acceleration features such as single instruction, multiple data (SIMD), and expose the engine to Python using pybind11. We compare its runtime performance against NumPy to analyze how well it works and to plan further improvements. This talk will provide practical insights for Python users interested in low-level performance optimization and extension development.
NumPy is a popular choice for array operations and numerical computation in Python, offering a simple interface and excellent performance. However, its architecture, built primarily around the Python C-API, can be difficult to integrate into applications where tighter control over system components is required. In our project, we implemented a custom array computation backend in C++ to meet integration requirements specific to our numerical application. To keep the usability benefits of Python, we exposed this backend via pybind11. In this talk, I will share our experience building this system, with a focus on achieving runtime performance comparable to NumPy through low-level optimization and on exposing the backend through a clean Python interface.
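To give a flavor of the kind of code involved, here is a minimal sketch of an elementwise addition over contiguous buffers. It is not the project's actual implementation: the `SimpleArray` class and the `add` function are hypothetical names chosen for illustration, and the sketch relies on the compiler's auto-vectorizer rather than hand-written SIMD intrinsics. The pybind11 binding layer is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical minimal 1-D array type (illustration only, not the project's class).
// Data lives in one contiguous buffer, which is what makes SIMD loads and stores possible.
class SimpleArray {
public:
    explicit SimpleArray(std::size_t n, double value = 0.0) : data_(n, value) {}

    std::size_t size() const { return data_.size(); }
    double& operator[](std::size_t i) { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }
    double* data() { return data_.data(); }
    const double* data() const { return data_.data(); }

private:
    std::vector<double> data_;
};

// Elementwise addition. The loop body has no branches and no dependency between
// iterations, so an optimizing compiler can usually turn it into SIMD instructions.
inline SimpleArray add(const SimpleArray& a, const SimpleArray& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("size mismatch");
    }
    SimpleArray out(a.size());
    const double* pa = a.data();
    const double* pb = b.data();
    double* po = out.data();
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        po[i] = pa[i] + pb[i];
    }
    return out;
}
```

Working on raw pointers to contiguous storage keeps the inner loop simple enough for the compiler to vectorize; whether it actually does depends on the compiler and the optimization flags used.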
Profile