Performance Update

As promised, I haven’t just dropped micronumpy; I’m continuing to work on it. As of September 10th, 2010, micronumpy takes 1.28736 s per iteration of the convolve benchmark, while NumPy on CPython takes 1.87520 s per iteration. That is about a 31.3% reduction in runtime relative to NumPy. I didn’t record the exact numbers near the end of the SoC, but I believe I’ve since made things slower still… On the bright side, I’m passing more tests than ever, and I now support slicing correctly. On the downside, I have no idea why it’s slower: I eliminated a whole loop from the calculation, so I expected at least a moderate gain in performance… I’m investigating now, so I’m keeping this short.
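(For reference, the quoted percentage is the reduction in per-iteration runtime, which can be checked directly from the two timings:)

```python
# Quick check of the figures quoted above: per-iteration runtime reduction.
numpy_s = 1.87520   # NumPy on CPython, seconds per iteration
micro_s = 1.28736   # micronumpy on PyPy, seconds per iteration
reduction = 1.0 - micro_s / numpy_s
print(round(reduction * 100, 1))  # 31.3
```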


When All is Said and Done

In the Beginning

Back when I was young and naïve at the beginning of the summer, I proposed to continue the work that a few PyPy developers and I had started: a reimplementation of NumPy in RPython. The project holds a lot of promise, as PyPy can generate a JIT compiler for itself and for components written in RPython. With a NumPy array written in RPython, the PyPy JIT can see inside it, and from that it can make far more optimizations than it otherwise could. Since the PyPy JIT is especially good at optimizing computationally expensive code, bringing the two together could go a long way toward bridging the performance gap between Python and statically compiled languages.

As luck would have it, my project was categorized by the Python Software Foundation as a NumPy project rather than a PyPy project, even though it was PyPy’s developers I’d been bugging with questions for some time. I soon came into contact with Stéfan van der Walt, a member of the NumPy Steering Committee. After consulting with him and the NumPy mailing list, it was decided that most people would not find a super-fast NumPy array very useful by itself. For it to matter to most people, it would need to do everything the existing NumPy array does; someone also brought up the point that a great deal of C and Cython code has already been written against NumPy arrays, and it’s important that my project continue to support that code.

So my project ballooned to a huge size, and I thought I could handle it all. The new burden of full compatibility was to be attacked by porting NumPy to PyPy and providing an easy interface for converting between NumPy and micronumpy arrays. Unfortunately, this pursuit wasn’t very fruitful, as PyPy’s CPyExt isn’t yet equipped to handle the demands of a module as all-encompassing as NumPy. I spent a fair amount of time simply implementing symbols to satisfy NumPy’s dependencies. I made some significant changes to NumPy, which are currently sitting in my git repository on GitHub; I don’t know what the future holds for them, unfortunately. (If the NumPy refactor is completed soon enough, I may be able to sidestep CPyExt, which will be faster anyway.)


Around midterms I had micronumpy arrays working well enough that they could run the convolve benchmark, and they handily beat NumPy arrays (twice as fast is fairly impressive). However, the point is to demonstrate that the JIT can speed code up to near compiled-code performance, theoretically removing the need to rewrite large portions of Python code in C or Cython. By this time, it was becoming clear that getting NumPy to work with PyPy was not going to happen over the summer. I’ve adjusted my expectations: NumPy working on PyPy is still on my TODO list but won’t be completed this summer. This might be for the better anyway, as NumPy is being refactored to be less Python- (and therefore CPython-) centric; as a result, in the near future I may be able to avoid CPyExt entirely and use RPython’s foreign function interface to call NumPy code directly.

The Final Stretch

One of the beautiful things about PyPy’s JIT is that it’s generated, not hard-coded, so I didn’t have to do anything in order to have micronumpy be JITed. Unfortunately, in the past three days or so, I’ve discovered that my code no longer works with the JIT. I’ve done all I can to figure out what’s wrong, and I can’t fix it on my own; diving into the JIT in the last 24 hours of the Summer of Code surely won’t bear any fruit. I’ve put up my distress signal on the mailing list, and hopefully this issue can be resolved in time to provide some awesome benchmark results. If not, at least I can get it resolved in the next couple of weeks and then move on to the other things I want to fix.

EDIT: Thanks to the help of the core PyPy developers, we determined that the problem was with arrays allocated with the flavor “raw”. Apparently these arrays still carry a length field; by using rffi.CArray I was able to instruct PyPy to construct an array without a stored length field.

I’d also like to add that, in the final hours, we added support for NumPy’s __array_interface__, so that as soon as NumPy is working on PyPy, NumPy can take micronumpy arrays and do all sorts of useful things with them. Then, when you need speed for simpler operations, you can pass your NumPy arrays to micronumpy (this side of the exchange hasn’t been implemented yet).
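As a rough illustration of what the protocol involves (this is a generic Python sketch, not micronumpy’s RPython implementation, and the class name is made up), __array_interface__ is just a property returning a dict that describes the underlying buffer:

```python
import array
import sys

class SimpleArray:
    """Illustrative class (not from micronumpy) exposing the
    __array_interface__ protocol, version 3."""
    def __init__(self, values):
        self._buf = array.array('d', values)  # contiguous C doubles

    @property
    def __array_interface__(self):
        ptr, length = self._buf.buffer_info()
        endian = '<' if sys.byteorder == 'little' else '>'
        return {
            'version': 3,
            'shape': (length,),
            'typestr': endian + 'f8',   # 8-byte float
            'data': (ptr, False),       # (address, read-only flag)
        }

a = SimpleArray([1.0, 2.0, 3.0])
print(a.__array_interface__['shape'])  # (3,)
```

Any consumer that understands the protocol (NumPy, for instance) can then wrap or copy the buffer directly, without going through CPyExt.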

The End

So here we are at the end of the Summer of Code, and my project isn’t where I wanted it to be. Specifically, with my addition of slice support, the lead has shrunk to around 50% faster than NumPy, even farther from my goal, so that’s my top priority to address in the coming weeks. In my previous blog post I outlined my plans for the future (as I don’t like leaving things undone). Basically it comes down to:

  • Optimizing!
  • Minor compatibility fixes
  • Bridging NumPy and PyPy completely


I’d just like to thank Google very quickly, and specifically Carol Smith, who has done a great job of managing the Google Summer of Code this year. I thoroughly enjoyed the program and would love to do it again given the chance. This summer I’ve learned a lot about writing software, dealing with deadlines, and time management (a skill I’ve let atrophy…). And thanks to those of you who’ve taken an interest in my project. The summer may be over, but my project isn’t, so check back occasionally; I’ll be sure to brag about benchmark results as soon as they’re more favorable :-).

I’d also like to thank my mentor, Stéfan van der Walt, for his help throughout my project, and for being supportive and understanding when unexpected problems occurred and set us back. And I’d like to thank Maciej Fijałkowski for his support from the PyPy side. The rest of the PyPy developers have all been helpful at some point or another, so thanks to them too.


The End is Nigh!


We’re already past the suggested pencils down date for the Google Summer of Code, and I’m certainly paying my penance for my previous sloth. Just last night I got the test suite passing again, after several hours of hacking. Advanced indexing is nearly done, which is wonderful. I currently have one slice-related issue, which I’ll hopefully be resolving in the next couple of hours.

The Next Week

As soon as I have this indexing done, it’ll be time to optimize. Maciej was kind enough to show me how to get tracing information, so that I can produce the most JIT-friendly code I can; I’ll probably spend the next five days on that tuning. The original goal, of course, was to be near Cython speeds using the JIT, and we were nowhere near that on the first pass (though twice as fast as CPython and normal NumPy). Unfortunately, with the addition of advanced slicing, I may have made Cython speeds harder to achieve. Hopefully the JIT will be OK with my first pass at slicing; however, I’m prepared to revert to “dumb slicing” for the end of the GSoC and resume advanced indexing support after it’s over. I’d feel bad about that, though, as advanced indexing has been my major stumbling block these past weeks. In the next few hours I need to make sure everything translates, so that I have something to show Stéfan tonight.


My biggest regret from the Summer of Code is that I haven’t succeeded in porting NumPy to PyPy. This is something I hope to address in my free time this coming semester; it will require extensive work on CPyExt, which is a complicated beast.

Additionally, I want to make sure that micronumpy is as useful as it can be, which should be fairly easy to accomplish in my free time. This will include implementing basic math operations and some ufuncs. I may make a first pass at everything with naïve implementations written in app-level Python, then progressively optimize. Depending on the progress of the NumPy refactoring, I might be able to plug in some NumPy code for fast implementations of some things, which would be great.

Back to work with me,

Camping and NumPy


So I’ve been back from camping for around a week, and it definitely derailed my train of thought… Subsequently, I went to Reno, Nevada with my girlfriend to meet up with one of her friends. We stayed at a casino in a nice room for 30 USD which is pretty awesome. But enough about me…

I’m starting to worry a bit about my progress; the past two trips have put me behind. (I’ve only been gone a cumulative five days, but the interim days were mostly unproductive as well. I tend to code straight through the night when I’m on a roll, because even the eight hours I would sleep might throw off my train of thought, so the traveling has been unhelpful.)


What I have been working on is advanced array indexing in micronumpy. I’ve pretty much broken indexing for the moment, but out of this should come slicing and ellipsis support, because we don’t all use simple indexing. I’m afraid that this significantly more complicated indexing scheme is going to bring a lot of overhead, which may set back performance; we’ll see. I’ve tried my best to put the common case first (a single-dimensional index is handled first, then simple multi-dimensional indices). I’m actually not sure how much PyPy will be able to optimize away via JIT compilation, since dynamic types become static for the JIT’s purposes; I may find that the extra complexity is irrelevant to the JIT-ed code.
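To make the “common case first” idea concrete, here is a hypothetical sketch of that dispatch order (the function and names are illustrative, not micronumpy’s actual code):

```python
def flat_offset(strides, index):
    """Compute a byte offset from strides, checking the cheap,
    common index forms before the general machinery."""
    # Fast path 1: a single integer index (the most common case).
    if isinstance(index, int):
        return strides[0] * index
    # Fast path 2: a tuple of plain integers (simple multi-dimensional index).
    if isinstance(index, tuple) and all(isinstance(i, int) for i in index):
        return sum(s * i for s, i in zip(strides, index))
    # Everything else (slices, ellipsis) falls through to slower code.
    raise NotImplementedError("general indexing takes the slow path")

# Strides for a C-contiguous 3x4 array of 8-byte elements: (32, 8).
print(flat_offset((32, 8), (1, 2)))  # 48
```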

There’s more to say, but I should get back to work 🙂

That’s all for the moment.



My mentor asked me to write about what we’ve done so far, how we got here, and where we’re going.

Most of the story of what’s happened from May 24 to now is chronicled in this blog; however, I’m going to attempt to summarize it a little, to make a nice, digestible byte for you to consume.


To start, my school didn’t end until May 28th, so I didn’t get anything done during finals week. Then, for the next week or so, my family from Nebraska and San Diego came to stay at my house for my younger brother’s celebrations: my little brother achieved the rank of Eagle Scout in Boy Scouts, which is the highest rank, and he had also graduated from high school. Both celebrations were in the span of one hectic weekend, so, though I did shut myself in my room to get work done, my work really began after everyone left.

On the technical side of things, I’d been working on getting NumPy working on top of PyPy. Unfortunately, CPyExt, PyPy’s CPython compatibility layer, is not as mature or complete as we had hoped, and so I spent the better part of June hammering on it and on NumPy. My work on NumPy/CPyExt has been very tough. Implementing mostly-wrapper functions in CPyExt for the sake of NumPy was the easiest part, but there are bugs lurking in CPyExt, and subtle incompatibilities with the CPython interface. I have a few functions sitting in my working copy that I want to commit to PyPy, but I won’t until I take another look at them.

On top of this, NumPy doesn’t always play within the API and will go around touching structs in their private members. I tried to eliminate NumPy’s bad touch, and those changes (the most heinous of which was in PyArray_Scalar()) are in my GitHub repository for NumPy.


In an attempt to make up for lost time, I’ve been doing a lot of work lately. I quickly (re)implemented micronumpy, and it passes 50% of its tests. It does seem to have a somewhat obscure bug where it segfaults if you allocate (and subsequently collect?) too many micronumpy arrays. I haven’t had a chance to look into this just yet, but that’s my #1 bug to squash right now. Thanks to the awesome work from the PyPy guys on the JIT generator, my incredibly naïve implementation of micronumpy arrays is already twice as fast as normal NumPy on CPython for the convolve benchmark. This is part of the beauty of PyPy: I didn’t do any JIT-specific work, and the JIT has still yielded a significant improvement. There are hints that I can and will give the JIT to hopefully improve performance further. (After all, we want to beat Cython. Well, that’d be nice; I’ll be happy to be within 20% or so.)

A little about Subversion. I attempted to merge trunk back into the micronumpy branch. Mind you, not one file had a conflicting edit, yet the merge resulted in a series of conflicts. Subsequently, I tried to commit my latest changes, including the convolve benchmark I used, and svn would not cooperate. Eventually I got a bit of help on #svn on and started running svn revert, rm, and svn update, and after around three hours I was finally able to commit again. During the whole process I ended up writing a few bash one-liners and brainlessly svn reverted my working copy of micronumpy. Luckily I still had vim open with the latest changes, which I could write back out, but that had me scared…


I’ve been developing some tools to make debugging CPyExt easier, so I’m very hopeful that I’ll have NumPy working on PyPy. The first is a little indented printer, which allows you to push and pop levels of indentation. I implemented it as a decorator which outputs function arguments, so I’m able to see the CPyExt call stack somewhat usefully. Right now I’m interested in implementing a more useful printer for lltype.Struct, which currently only tells us that it’s a struct and what its fields are; I’d like to make it print certain key fields from the struct as name/value pairs.
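A minimal sketch of the indenting-printer idea might look like the following (this is an illustration of the concept, not the actual tool; the names are my own):

```python
import functools

_depth = 0  # current indentation level

def traced(fn):
    """Sketch of an indenting call tracer: print the call and its
    arguments at the current depth, push a level for the duration
    of the call, and pop it on the way out."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        global _depth
        print('  ' * _depth + '%s%r' % (fn.__name__, args))
        _depth += 1
        try:
            return fn(*args, **kwargs)
        finally:
            _depth -= 1
    return wrapper

@traced
def outer(x):
    return inner(x + 1)

@traced
def inner(x):
    return x * 2

print(outer(1))  # the nested call prints indented; the result is 4
```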

On top of this, micronumpy is coming together nicely. I still have to implement some of the tougher array access methods (multi-dimensional slices, for instance), and those will be coming; hopefully I can scavenge them from the original micronumpy. My first priority for micronumpy is to sort out this segfault that keeps occurring.


So that’s what has been, what is, and what will be. I hope someone out there is finding this interesting, and that eventually people will find my work useful. Writing this post has also made me think more closely about how much time I’ve put into this Summer of Code. For a while I was feeling that maybe I wasn’t working enough, or hard enough, but writing this, and realizing how much time I’ve actually had to put in, puts it in perspective, and I feel much better.



Unofficial Results

So, in order to make up for lost time, I’ve been coding a significant amount over the past 48 hours, near non-stop. My labor has finally borne fruit.


Micronumpy, found at (click to browse the source; copy the link name for the checkout address; full instructions here), has been rewritten to use lltype.Array and now supports enough to run the convolve benchmark found in pypy/tool/, which is a slightly modified version of this. Unfortunately, PyPy segfaults if the test is run more than 20 times by timeit. However, with 20 repetitions, a 200 by 200 image, and a 3×3 kernel, PyPy handily beats CPython and plain NumPy.

Command                                                  Average time per run (seconds)
./pypy-c ~/Projects/micronumpy/pypy/tool/ 200 200 3 3    0.38495
python ~/Projects/micronumpy/pypy/tool/ 200 200 3 3      0.85705

That’s about a 55% reduction in runtime!
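For context, the computation being timed is a plain kernel convolution. Here is a rough pure-Python sketch of the shape of that loop (an illustration only, not the actual benchmark script in the PyPy tree):

```python
def convolve(image, kernel):
    """Naive 2-D convolution over nested lists: slide the kernel over
    the image and accumulate the element-wise products."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw - kw + 1) for _ in range(ih - kh + 1)]
    for y in range(ih - kh + 1):
        for x in range(iw - kw + 1):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out

# A 3x3 image convolved with a 2x2 all-ones kernel:
img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[1, 1], [1, 1]]
print(convolve(img, k))  # [[12.0, 16.0], [24.0, 28.0]]
```

It is exactly this kind of tight, arithmetic-heavy loop that the generated JIT is good at.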

That’s not bad for a first iteration (even if this iteration has been a while in the works). Now note that the benchmark as it stands does not provide a warmup period; adding one should improve PyPy’s score even more. Whether a warmed-up measurement better reflects what we care about, I’m not sure. Also note that, while the times have remained stable, they are likely affected by the other processes on this computer. I plan on writing a follow-up article which will hopefully have more interesting results, and maybe even a graph!
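Adding a warmup just means running the workload a few times untimed, so the JIT has already compiled the hot loop before measurement begins. A sketch using timeit (the bench helper and its parameters are my own, not part of the benchmark harness):

```python
import timeit

def bench(fn, warmup=5, reps=20):
    """Run fn a few times untimed so a JIT can warm up, then report
    the best of `reps` single-run timings."""
    for _ in range(warmup):
        fn()
    return min(timeit.repeat(fn, number=1, repeat=reps))

# Example with a trivial workload:
t = bench(lambda: sum(range(1000)))
print(t >= 0.0)  # True
```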

NumPy on PyPy

Though it’s taken far longer than any of us expected (except maybe Maciej), I’m very hopeful that in the next week or so I can have NumPy running in a somewhat stable fashion on PyPy. Currently I’m hacking on CPyExt/lltype to give more useful str() values: rather than <* <Array of Char> at 0xDEADBEEF>, I hope it will look more like <* <Char array = "spam"> at 0xDEADBEEF>, which I think will be a useful debugging tool for everyone.
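As a toy illustration of the target format (a plain-Python mock; the real change lives inside PyPy’s lltype machinery, which I can’t reproduce here):

```python
class CharArrayRepr:
    """Toy stand-in for an lltype char-array pointer with a
    friendlier str(): show the contents, not just the type."""
    def __init__(self, data, addr):
        self.data, self.addr = data, addr

    def __str__(self):
        return '<* <Char array = "%s"> at 0x%X>' % (self.data, self.addr)

print(CharArrayRepr("spam", 0xDEADBEEF))
# <* <Char array = "spam"> at 0xDEADBEEF>
```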

More on everything after the midterm reviews 🙂


A Bit Behind Schedule

Life seems intent on limiting the time I can spend on my GSoC project. The first wave of family arrived at my house today, and more are still coming; I have a lot to do to help my family host. Last week we set the goal that I should have NumPy working on PyPy by the end of that week (which we’ll call Tuesday, since that’s when we pushed those goals forward). In addition, I was meant to have started fixing up micronumpy.

While I haven’t started on micronumpy, I have made significant strides in getting NumPy to work on PyPy. Unfortunately, CPyExt isn’t as mature as we’d hoped, and rather than working on NumPy itself, I’m spending my time implementing C API functions in PyPy. NumPy still doesn’t import, but I can see from the output of nm -u that the number of missing symbols is slowly being whittled down. I should also note that NumPy’s has caused me significant grief, since its ‘clean’ target doesn’t clean up after build_ext -i; at the moment I haven’t identified whether that’s a bug in NumPy’s copy of distutils (gross in and of itself) or a nasty interaction between it and CPyExt’s , which creates a stub dll/so for testing purposes.

Now, on to the positive side of things: since I began writing this post (about a week ago), it seems I’ve satisfied the last of NumPy’s symbol dependencies. There are still issues with importing the module; for instance, some API functions get passed non-Python objects (probably being double-freed), and I’m not sure how best to track this all down. I’m not sure whether I just haven’t found the source, or if it has to do with output redirection and buffering, but no matter where I’ve put printfs (in C) and prints (in Python), they always end up after the exception is thrown in the output…

EDIT: It’s almost certainly turned out to be a buffering/stderr-vs.-stdout issue; how I’m going to get Python’s buffering to play nicely with C’s is a mystery to me, though… Perhaps if I turn off buffering…

So, sorry for the absence, things are still moving along, though I definitely have a bit of catching up to do.
