I’m extremely happy to say that my Google Summer of Code project proposal was approved! That means I get to spend my summer working on my favorite project, PyPy, and combine it with NumPy. Hopefully the result will be an extra efficient NumPy implementation on PyPy.
This morning I met with Maciej Fijałkowski and Stéfan van der Walt, my unofficial mentor, representing PyPy, and my official mentor, representing NumPy, respectively. After I posted my proposal to the NumPy mailing list, it became apparent that my project would need to be approached differently in order to best serve existing NumPy users. It seems that everyone uses their own obscure corners of the library, and it would be tough to address only the most used parts and still make anyone happy.
Because of these needs, I couldn’t really just hack away on a small subset of NumPy and have my project be useful at all. Originally it was proposed that we try to replace NumPy piecemeal with RPython code; however, there were numerous technical and practical problems with this approach, such as the JIT not being so useful when there’s C code, and the extra overhead of trying to keep my code “plugged into” NumPy’s existing code. Stéfan came up with a sane approach, however. We will first port NumPy to PyPy by way of CPyExt (a growing PyPy sub-project allowing CPython extensions to be compiled to work with PyPy, described a bit here). This way, we will be completely compatible with NumPy on CPython. This allows us to do whatever we please with micronumpy. The idea as it is today is to provide both side by side, and allow converting between the two via micronumpy.asarray() and numpy.asarray(), or something of that sort. micronumpy arrays will likely be lacking in features but blazingly fast by comparison, and NumPy arrays will be mostly identical in speed (as CPyExt matures, I expect that speed to converge to CPython speeds). In the long term it would be great to make micronumpy arrays completely compatible with NumPy arrays. I plan to re-use as much of NumPy as I can for micronumpy arrays.
To Do
The question remains, then: how should I move forward from today? Well, I’m going to stay on top of my school work, and then on the 16th I’ll be able to really start working on this. Starting on the 16th, this is what I need to accomplish:
- Clean up the existing micronumpy code. I worked on it during school and the result is functional, mostly, but it’s quite ugly, and is less than ideal in many ways.
- Eliminate trivial errors with building NumPy with CPyExt. Stéfan created a bug here.
- Port the existing micronumpy code to low-level RPython arrays. This should probably be done during the cleanup.
- Add __array_interface__ to micronumpy arrays to facilitate interoperability.
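To make the last item a bit more concrete, here is a minimal sketch of what exposing __array_interface__ could look like. TinyArray and read_floats are hypothetical names made up for illustration, not actual micronumpy code, and the sketch assumes a 1-D, contiguous float64 buffer. The dictionary keys (version, shape, typestr, data, strides) follow version 3 of NumPy's array interface protocol. A pure-Python consumer stands in for NumPy so the example runs without it.

```python
import array
import ctypes
import sys


class TinyArray:
    """Hypothetical stand-in for a micronumpy array: 1-D, float64 only."""

    def __init__(self, values):
        # array.array("d") keeps C doubles in one contiguous buffer.
        self._buf = array.array("d", values)

    @property
    def __array_interface__(self):
        address, length = self._buf.buffer_info()
        # "<" is little-endian, ">" is big-endian; "f8" is an 8-byte float.
        byte_order = "<" if sys.byteorder == "little" else ">"
        return {
            "version": 3,
            "shape": (length,),
            "typestr": byte_order + "f8",
            "data": (address, False),  # (pointer as int, read-only flag)
            "strides": None,           # None means C-contiguous
        }


def read_floats(obj):
    """Consumer side: read the advertised buffer back as a list of floats."""
    iface = obj.__array_interface__
    if iface["typestr"][1:] != "f8" or len(iface["shape"]) != 1:
        raise TypeError("this sketch only handles 1-D float64 data")
    address, _read_only = iface["data"]
    (length,) = iface["shape"]
    # Map the raw address as a C double array, then copy while obj is alive.
    view = (ctypes.c_double * length).from_address(address)
    return list(view)


a = TinyArray([1.0, 2.5, 4.0])
values = read_floats(a)
```

With NumPy installed, numpy.asarray(a) should be able to wrap the same buffer without copying, which is the kind of interoperability the list item is after. The real micronumpy version would have to describe its own memory layout and dtypes instead of relying on array.array.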