To get NumPy to build against PyPy, I’ve had to make numerous changes to the CPyExt module, many of them ugly.
It has also become apparent that Cython and NumPy both depend directly on CPython. As a result, they touch structures that the C API says they shouldn’t, and they will have to be modified. Unfortunately, not everything NumPy does appears to be possible through the C API, so extensions to the C API may be necessary. I’ll need further guidance on this, but I think it’s likely that I will add PyPy*() functions, which could then be proposed for CPython (at which point they’d be renamed to Py*()). The one drawback is that code might come to rely on these functions under their PyPy*() names, which would create a compatibility problem of its own. Still, that’s the right way to go, since there is no guarantee that the CPython folks will accept any extensions to the API, especially not in their originally proposed forms.
I have to apologize: this past week was my finals week, so I didn’t accomplish much during the first week of GSoC, but starting today I’m getting going.
I’m extremely happy to say that my Google Summer of Code project proposal was approved! That means I get to spend my summer working on my favorite project, PyPy, and combine it with NumPy. Hopefully the result will be an especially efficient NumPy implementation on PyPy.
This morning I met with Maciej Fijałkowski and Stéfan van der Walt, my unofficial mentor representing PyPy and my official mentor representing NumPy, respectively. Since I posted my proposal to the NumPy mailing list, it has become apparent that my project would need to be approached differently in order to best serve existing NumPy users. It seems that everyone uses their own obscure corners of the library, so it would be tough to address only the most-used parts and still make anyone happy.
Because of these needs, I couldn’t just hack away on a small subset of NumPy and have my project be useful at all. Originally it was proposed that we replace NumPy piecemeal with RPython code, but that approach had numerous technical and practical problems, such as the JIT not being very useful when C code is involved, and the extra overhead of keeping my code “plugged into” NumPy’s existing code. Stéfan, however, came up with a sane approach: we will first port NumPy to PyPy by way of CPyExt (a growing PyPy sub-project that allows CPython extensions to be compiled to work with PyPy, described a bit here). This way, we will be completely compatible with NumPy on CPython, which leaves us free to do whatever we please with micronumpy. The idea as it stands today is to provide both side by side and allow converting between the two via numpy.asarray() or something of that sort. micronumpy arrays will likely lack features but be blazingly fast by comparison, while NumPy arrays will be essentially identical to those on CPython (and as CPyExt matures, I expect their speed to converge to CPython speeds). In the long term, it would be great to make micronumpy arrays completely compatible with NumPy arrays. I plan to reuse as much of NumPy as I can for micronumpy arrays.
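One direction of that conversion can be sketched in plain Python. The MicroArray class below is a hypothetical stand-in, not the actual micronumpy type: it holds a flat array of C doubles, and from_buffer copies in data from any object that exports a 1-D float64 buffer through Python’s buffer protocol, which NumPy arrays do. This is an illustrative assumption about how such a conversion might look, not micronumpy’s design.

```python
import array


class MicroArray:
    """Hypothetical stand-in for a micronumpy array: a flat, 1-D array of C doubles."""

    def __init__(self, values):
        # array.array("d") stores the values in one contiguous block of C doubles.
        self._data = array.array("d", values)

    @classmethod
    def from_buffer(cls, obj):
        """Copy a 1-D float64 buffer (e.g. a float64 NumPy array) into a new MicroArray."""
        view = memoryview(obj)  # raises TypeError if obj exports no buffer
        if view.ndim != 1 or view.format != "d":
            raise TypeError(
                f"expected a 1-D buffer of doubles, got ndim={view.ndim}, "
                f"format={view.format!r}"
            )
        # tolist() copies the elements out of the source buffer.
        return cls(view.tolist())

    def __len__(self):
        return len(self._data)

    def tolist(self):
        return self._data.tolist()
```

For example, MicroArray.from_buffer(numpy.arange(3.0)) should yield a three-element MicroArray. Copying keeps the two implementations independent at the cost of an extra pass over the data; the reverse direction, numpy.asarray(micro_array), needs the micronumpy side to expose something NumPy understands, such as __array_interface__.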
The question, then, is how I should move forward from here. For now, I’m going to stay on top of my schoolwork, and then on the 16th I’ll be able to really start working on this. Starting on the 16th, this is what I need to accomplish:
- Clean up the existing micronumpy code. I worked on it during the school year, and the result mostly works, but it’s quite ugly and less than ideal in many ways.
- Eliminate trivial errors when building NumPy with CPyExt. Stéfan opened a bug for this here.
- Port the existing micronumpy code to low-level RPython arrays; this might best be done during the cleanup.
- Add __array_interface__ to micronumpy arrays to facilitate interoperability.
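The last item could look something like the sketch below. NumPy’s array interface protocol lets any object describe its memory through an __array_interface__ dictionary (version 3 uses keys such as shape, typestr, data and strides), and numpy.asarray() looks for it. MicroArray here is again a hypothetical stand-in backed by array.array; the layout (C-contiguous doubles in native byte order) is an assumption chosen for illustration.

```python
import array
import sys


class MicroArray:
    """Hypothetical micronumpy-style array: C-contiguous doubles plus a shape tuple."""

    def __init__(self, values, shape=None):
        self._data = array.array("d", values)
        n = len(self._data)
        self._shape = tuple(shape) if shape is not None else (n,)
        # The product of the dimensions must match the number of stored elements.
        count = 1
        for dim in self._shape:
            count *= dim
        if count != n:
            raise ValueError(f"shape {self._shape} does not match {n} elements")

    @property
    def __array_interface__(self):
        """Describe this array's memory using version 3 of NumPy's array interface."""
        address, _length = self._data.buffer_info()  # (pointer, element count)
        byteorder = "<" if sys.byteorder == "little" else ">"
        return {
            "version": 3,
            "shape": self._shape,
            # "f" = floating point, followed by the item size in bytes (8 for C doubles).
            "typestr": f"{byteorder}f{self._data.itemsize}",
            # Raw data pointer plus a read-only flag; False means NumPy may write to it.
            "data": (address, False),
            # None tells NumPy the data is C-contiguous.
            "strides": None,
        }
```

With NumPy installed, numpy.asarray(MicroArray([1.0, 2.0, 3.0, 4.0], shape=(2, 2))) should return a 2 by 2 float64 array that shares memory with the MicroArray rather than copying it, since data is given as a raw pointer. That zero-copy sharing is what makes the protocol attractive for interoperability, but it also means the underlying buffer must not be resized while NumPy holds a view of it.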