More NumPy, Please

What are the good ideas from NumPy?

To what can NumPy attribute its wild success? I’d say, along with others, that it comes down to several features.

NumPy:

The cumulative effect is that, with NumPy, Python is a very productive language both for high-level programming that reaches good-enough performance and for easy interfacing with C-compatible languages. Given the timing and the vagaries of history, that has helped make Python one of the most widely used languages around.

What is NumPy not good at? Some of these criticisms aren’t exactly fair, since they’re things I would never expect or want a simple ndarray library to support, but they have come up in various contexts in the past:

I’ve come across a few ML libraries, implemented in C with Python bindings, that use NumPy arrays extensively but have to go through painful and error-prone contortions because C has so few built-in data structures. For instance, one library had to resort to sorting arrays and binary-searching them with libc’s bsearch just to get the equivalent of a set-membership test.
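For comparison, here is roughly what that sort-then-binary-search workaround looks like written in NumPy instead of C. This is a minimal sketch, not the library’s actual code: `sorted_contains` is a hypothetical helper, and `np.searchsorted` stands in for libc’s `bsearch`. The workaround pays an O(n log n) sort up front and O(log n) per lookup, where a hash set gives expected O(1) lookups.

```python
import numpy as np


def sorted_contains(sorted_arr: np.ndarray, queries: np.ndarray) -> np.ndarray:
    """Vectorized set-membership test against an already-sorted array.

    Hypothetical helper: a NumPy analogue of the sort + bsearch workaround.
    """
    if sorted_arr.size == 0:
        return np.zeros(queries.shape, dtype=bool)
    # Leftmost insertion point for each query (binary search, O(log n) each)
    idx = np.searchsorted(sorted_arr, queries)
    # Queries larger than every element get index len(sorted_arr); clamp them
    idx = np.minimum(idx, sorted_arr.size - 1)
    # A query is present iff the element at its insertion point equals it
    return sorted_arr[idx] == queries


data = np.array([42, 7, 19, 3, 7], dtype=np.int64)
haystack = np.unique(data)  # sorted and deduplicated: [3, 7, 19, 42]
queries = np.array([7, 8, 42, 100], dtype=np.int64)
print(sorted_contains(haystack, queries).tolist())  # [True, False, True, False]

# The same test with a plain Python set, for comparison
lookup = set(data.tolist())
print([q in lookup for q in queries.tolist()])  # [True, False, True, False]
```

NumPy itself also offers `np.isin` for this case; the point is that C code has no hash set in its standard library to reach for.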

What if we apply ideas from NumPy to other data structures?

What if we did for the dictionary, or the set (or myriad other data structures, but let’s start with the basics), what NumPy did for the list? By that I mean, what if we had a Python dict-like data structure (let’s call it a “dtyped-dictionary” for lack of a better term) with typed keys and values, backed by an efficient, low-level hash map? This is almost certainly not a new idea and has likely been implemented in various forms all over the place. I think it’s still worth exploring, particularly in conjunction with some other ideas.
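A minimal sketch of what a dtyped-dictionary could look like, assuming int64 keys, float64 values, and open addressing with linear probing (one common hash-map design among several). `Int64Float64Dict` is a hypothetical name, not an existing library. Keys, values, and an occupancy mask live in flat NumPy arrays, so `keys()` and `values()` hand back ordinary ndarrays that other dtyped code can consume. The probing loop runs in the Python interpreter, so this illustrates the storage layout rather than the performance; a real implementation would do the probing in compiled code.

```python
import numpy as np


class Int64Float64Dict:
    """Hypothetical dtyped-dictionary: int64 keys -> float64 values,
    stored in flat NumPy arrays and resolved with linear probing."""

    def __init__(self, capacity: int = 8):
        self._init_storage(capacity)

    def _init_storage(self, capacity: int) -> None:
        self._keys = np.zeros(capacity, dtype=np.int64)
        self._vals = np.zeros(capacity, dtype=np.float64)
        self._used = np.zeros(capacity, dtype=bool)  # occupancy mask
        self._size = 0

    def _slot(self, key: int) -> int:
        """Return the slot holding `key`, or the empty slot where it belongs."""
        cap = len(self._keys)
        i = hash(key) % cap
        while self._used[i] and self._keys[i] != key:
            i = (i + 1) % cap  # linear probing: try the next slot
        return i

    def __setitem__(self, key: int, value: float) -> None:
        # Keep the load factor at or below 1/2 so probing always terminates
        if (self._size + 1) * 2 > len(self._keys):
            self._grow()
        i = self._slot(key)
        if not self._used[i]:
            self._used[i] = True
            self._keys[i] = key
            self._size += 1
        self._vals[i] = value

    def __getitem__(self, key: int) -> float:
        i = self._slot(key)
        if not self._used[i]:
            raise KeyError(key)
        return float(self._vals[i])

    def __contains__(self, key: int) -> bool:
        return bool(self._used[self._slot(key)])

    def __len__(self) -> int:
        return self._size

    def keys(self) -> np.ndarray:
        return self._keys[self._used]  # typed array, in slot order

    def values(self) -> np.ndarray:
        return self._vals[self._used]

    def _grow(self) -> None:
        # Double the capacity and re-insert every live entry
        old_keys = self._keys[self._used]
        old_vals = self._vals[self._used]
        self._init_storage(len(self._keys) * 2)
        for k, v in zip(old_keys.tolist(), old_vals.tolist()):
            self[k] = v


d = Int64Float64Dict()
for k in range(20):
    d[k] = k * 0.5
d[3] = 100.0  # overwrite an existing key
print(len(d), d[3], 7 in d, 99 in d)  # 20 100.0 True False
```

Deletion is left out on purpose: with open addressing it needs tombstone markers so that later probes don’t stop early.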

In particular, if this could be made to work well with the other dtyped data structures, here are a few things it would enable:

We’d still have to address some of NumPy’s NA (missing-value) issues, implement O(n) to_python() and from_python() functions or methods that convert between the pure-Python and dtyped versions, and think carefully about nested, JSON-like data. Still, I think this could be really interesting, especially if the pieces compose well together.
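A sketch of those O(n) conversions, assuming a dtyped dictionary is represented by parallel key and value arrays plus a boolean mask for missing values. The mask is just one possible answer to the NA question, the signatures of `from_python` and `to_python` here are hypothetical, and nested JSON-like values are out of scope.

```python
import numpy as np


# Hypothetical API: the document names to_python()/from_python(),
# but these signatures and the mask-based NA scheme are assumptions.
def from_python(d: dict, key_dtype=np.int64, value_dtype=np.float64):
    """O(n): convert a plain dict into typed key/value arrays plus an NA mask."""
    n = len(d)
    keys = np.empty(n, dtype=key_dtype)
    values = np.zeros(n, dtype=value_dtype)
    missing = np.zeros(n, dtype=bool)  # True where the Python value was None
    for i, (k, v) in enumerate(d.items()):
        keys[i] = k
        if v is None:
            missing[i] = True  # values[i] keeps its 0 placeholder
        else:
            values[i] = v
    return keys, values, missing


def to_python(keys: np.ndarray, values: np.ndarray, missing: np.ndarray) -> dict:
    """O(n): convert typed arrays back into a plain dict, restoring None for NA."""
    # .tolist() converts NumPy scalars into native Python ints and floats
    return {
        k: (None if m else v)
        for k, v, m in zip(keys.tolist(), values.tolist(), missing.tolist())
    }


original = {1: 2.5, 2: None, 3: -1.0}
keys, values, missing = from_python(original)
print(missing.tolist())  # [False, True, False]
print(to_python(keys, values, missing) == original)  # True
```

Keeping NA in a separate mask avoids reserving a sentinel value in the value dtype, which matters for integer dtypes that have no NaN.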

Other thoughts