My mentor asked me to write about what we’ve done so far, how we got here, and where we’re going.
Most of what’s happened from May 24 to now is chronicled in this blog, but I’m going to attempt to summarize it into a nice, digestible byte for you to consume.
To start, my school didn’t end until May 28th, so I didn’t get anything done during finals week. Then, for the next week or so, my family from Nebraska and San Diego came to stay at my house for my younger brother’s celebrations: he achieved the rank of Eagle Scout, the highest rank in Boy Scouts, and he also graduated from high school. Both celebrations fell within one hectic weekend, so, though I did shut myself in my room to get work done, my work really began after everyone left.
On the technical side of things, I’d been working on getting NumPy working on top of PyPy. Unfortunately, CPyExt, PyPy’s CPython compatibility layer, is not as mature or complete as we had hoped, so I spent the better part of June hammering on it and NumPy. My work on NumPy/CPyExt has been very tough. Implementing functions in CPyExt for the sake of NumPy, most of them wrappers, was the easiest part, but there are bugs lurking in CPyExt, and subtle incompatibilities with the CPython interface. I have a few functions sitting in my working copy that I want to commit to PyPy, but I won’t until I take another look at them.
On top of this, NumPy doesn’t always play within the API and will go around touching your structs in their private members. I tried to eliminate NumPy’s bad touch, and those changes (the most heinous of which was in PyArray_Scalar()) are in my GitHub repository for NumPy.
In an attempt to make up for lost time, I’ve been doing a lot of work lately. I quickly (re)implemented micronumpy and it passes 50% of its tests. It does seem to have a somewhat obscure bug where it segfaults if you allocate (and subsequently collect?) too many micronumpy arrays. I haven’t had a chance to look into this just yet, but that’s my #1 bug to squash right now. Thanks to the awesome work from the PyPy guys on the JIT generator, my incredibly naive implementation of micronumpy arrays is already twice as fast as normal NumPy on CPython for the convolve benchmark. This is part of the beauty of PyPy: I didn’t do any JIT-specific work, and the JIT has still yielded a significant improvement. There are hints that I can and will give the JIT to hopefully improve performance further. After all, we want to beat Cython (well, that’d be nice; I’ll be happy to be within 20% or so).
A little about Subversion. I attempted to merge trunk back into the micronumpy branch. Mind you, not one file had a conflicting edit, yet the merge resulted in a series of conflicts. Subsequently, I tried to commit my latest changes, including the convolve benchmark I used, and svn would not cooperate. Eventually I got a bit of help on irc.freenode.net and started running svn update, and after around three hours I was finally able to commit again. During the whole process I ended up writing a few bash one-liners and brainlessly ran svn revert on my working copy of micronumpy. Luckily I still had vim open with the latest changes, which I could write back out, but that had me scared…
I’ve been developing some tools to make debugging CPyExt easier, so I’m very hopeful that I’ll have NumPy working on PyPy. The first is a little indented printer, which allows you to push/pop levels of indentation. I implemented it as a decorator which outputs function arguments, so I’m able to see the CPyExt call stack in a somewhat useful form. Right now I’m interested in implementing a more useful printer for lltype.Struct, which currently only tells us that it’s a struct and what its fields are. I’d like to make it show certain key fields from the struct as name/value pairs.
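To give a rough idea of the indented printer, here is a minimal sketch in plain Python. It is not the actual code from my working copy: the names IndentPrinter, traced, inner and outer are made up for illustration, and it knows nothing about CPyExt itself. It only shows the push/pop idea and how a decorator can print each call's arguments at the current depth.

```python
import functools
import sys


class IndentPrinter:
    """Hypothetical helper: writes lines prefixed by the current indentation."""

    def __init__(self, out=None, step="  "):
        self.out = out if out is not None else sys.stdout
        self.step = step
        self.level = 0

    def push(self):
        # Go one level deeper.
        self.level += 1

    def pop(self):
        # Come back up one level, never below zero.
        self.level = max(0, self.level - 1)

    def write(self, text):
        self.out.write(self.step * self.level + text + "\n")


# Shared printer used by the decorator below (an assumption of this sketch).
printer = IndentPrinter()


def traced(func):
    """Print the call and its arguments, then indent everything it calls."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        shown = [repr(a) for a in args]
        shown += ["%s=%r" % (k, v) for k, v in sorted(kwargs.items())]
        printer.write("%s(%s)" % (func.__name__, ", ".join(shown)))
        printer.push()
        try:
            return func(*args, **kwargs)
        finally:
            # Restore the indentation even if the call raises.
            printer.pop()

    return wrapper


@traced
def inner(x):
    return x + 1


@traced
def outer(x, scale=2):
    return inner(x) * scale


if __name__ == "__main__":
    outer(3)
```

Calling outer(3) prints the outer call on one line and the nested inner call indented beneath it, which is roughly what I mean by seeing the call stack.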
On top of this, micronumpy is coming together nicely. I still have to implement some of the tougher array access methods (multi-dimensional slices, for instance), and those will be coming; hopefully I can scavenge them from the original micronumpy. My first priority for micronumpy is to sort out this segfault that keeps occurring.
So that’s what has been, what is, and what will be. I hope someone out there is finding this interesting, and that eventually people will find my work useful. Also, writing this post has made me think more closely about how much time I’ve put into this Summer of Code. For a while I was feeling that maybe I wasn’t working enough, or hard enough, but writing this and realizing how much time I’ve actually had to put in puts it in perspective, and I feel much better.