Deep Learning frameworks, Tensorflow, Swift and Fast.ai
29-Mar-2020
- machine learning
After introducing some of the very basics of data science tooling, numpy and Jupyter notebooks, in a previous post, here I'll explore the state of the art in deep learning frameworks. The field is still moving fast, so the post ends with a recommendation on what to learn and use going forward.
Google created version 1 of Tensorflow around a static graph model because Python is a slow, interpreted language. Python is still the most used language in data science and machine learning, with a lot of libraries available, so the idea was to speed up the pipeline by splitting it in two: you script the graph in Python through the high-level APIs, and Tensorflow takes care of fast execution through lower-level C++ code. The problem with this approach is that it's static in nature: if everything you need can be modeled beforehand on the graph, great; otherwise you run into issues and the whole pipeline becomes really cumbersome.
Facebook solved this by adopting a dynamic approach when they built the PyTorch machine learning library, with a big focus on natural language processing. PyTorch is a thin Python layer on top of a fast C++ runtime. In fact, when you deploy a machine learning model today it's usually compiled C++ code, not Python: Python is used for the higher-level modeling, and then you deploy production code, through TorchScript in PyTorch and through graph export in Tensorflow. This is mostly because Python is so slow in production, especially for machine learning systems.
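The difference between the two styles can be sketched in a few lines of plain Python, no frameworks needed. The "graph" classes below are illustrative toys, not real Tensorflow or PyTorch APIs: a define-then-run model builds symbolic nodes first and executes them later, while a define-by-run model is just ordinary code that executes immediately.

```python
# Define-then-run (Tensorflow 1.x style): build a symbolic graph first...
class Node:
    def __init__(self, op, inputs):
        self.op, self.inputs = op, inputs

def placeholder(name): return Node("input", [name])
def add(a, b): return Node("add", [a, b])
def mul(a, b): return Node("mul", [a, b])

def run(node, feed):
    # ...then execute it later, once concrete values are fed in.
    if node.op == "input":
        return feed[node.inputs[0]]
    vals = [run(i, feed) for i in node.inputs]
    return vals[0] + vals[1] if node.op == "add" else vals[0] * vals[1]

x = placeholder("x")
graph = add(mul(x, x), x)        # symbolic y = x*x + x; nothing computed yet
print(run(graph, {"x": 3.0}))    # 12.0

# Define-by-run (PyTorch style): ordinary code, computed immediately,
# so you can use plain ifs/loops and debug it line by line.
def model(x):
    return x * x + x

print(model(3.0))                # 12.0
```

The deferred graph can be optimized and compiled before running, which is the speed advantage; the eager version is what makes dynamic control flow and debugging painless.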
Google took notice and started rearchitecting Tensorflow with a more dynamic approach. One of the newer components is Swift for Tensorflow (Swift4TF): the open source Swift language, created by Apple around six years ago for iOS and Mac development, now applied to solving many of these issues in machine learning.
What’s a Machine Learning model?
It's essentially a function that transforms an input in some way and returns an output. But instead of writing the whole transformation by hand as in standard programming, in machine learning you structure the model manually and it then learns the weights of the function by itself. How this learning happens depends on the model and architecture being used; in neural networks and deep learning it's usually done through backpropagation. During training you feed data into the model and compute gradients, the mathematical objects that tell the model how to adjust its weights to reduce the error in its output.
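A minimal, framework-free sketch of that training loop: the "model" is y = w * x, and gradient descent learns the weight w from example data. All names here are illustrative.

```python
# Toy model: y_pred = w * x. We learn w from data instead of hand-coding it.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # samples of y = 2x

w = 0.0      # the weight the model learns by itself
lr = 0.05    # learning rate: how big each adjustment step is

for _ in range(200):
    # Loss is mean squared error; its gradient with respect to w tells us
    # which direction to nudge w to reduce the error. Backpropagation
    # generalizes exactly this idea to many layers via the chain rule.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 3))   # converges to 2.0, the true slope
```

No human ever typed "2.0" into the model: the structure (a single multiplication) was chosen manually, and the weight was learned from the data.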
Swift 4 Tensorflow
Swift was developed by Apple as a modern replacement for Objective-C and is now used by millions of developers to build apps for the Mac, iPhone, iPad, Apple Watch and Apple TV. It is faster than most languages and very developer friendly. It's open source and runs on Linux as well, so it's also becoming popular in server-side web development: frameworks like Kitura (made by IBM) and Vapor add to Swift the libraries that are crucial for web development, like an ORM layer to connect to databases, among many others. Swift is a progressive language. Compared to other performance-oriented languages like Rust, Swift has dynamic memory management with reference counting through ARC. With Rust you have to think about memory management constantly, as it's done statically with lots of code annotations; in Swift you only think about it when you hit performance cliffs in your code, and the language even lets you take over memory management completely if you need to. So it strikes a good tradeoff between easy-to-use but heavyweight approaches like Java's garbage collector and the constant memory-ownership bookkeeping of C and Rust. Julia is another good language used in the numerical programming and data science community, but language features aside, the Swift community is at this point bigger and spans several fields, like server-side development (just like Python).
So Swift for Tensorflow brings several benefits, including much faster execution and integrated deployment for machine learning systems, fixing the problems outlined above. It also comes with a good developer experience: it integrates with the tools data scientists are already familiar with, like Jupyter notebooks, and even offers Python integration.
To calculate gradients, Swift4TF introduces differentiable programming: derivatives and gradients become first-class citizens of the language. Instead of discovering errors at runtime, as in Python-based pipelines like traditional Tensorflow or PyTorch, you can catch them straight in your IDE (Xcode or even Jupyter notebooks), making the whole development process more efficient. Because of how Swift works, this automatic differentiation can be applied to any data type, standard or custom: a float just as well as a quaternion. This capability doesn't really exist in other machine learning frameworks and languages (hacks aside). The Swift4TF team is working to eventually integrate these features into standard Swift, so that hopefully in the future we can use them inside iOS or Mac apps. Even today, though, this makes it easier to integrate the ML frameworks Apple built for its own ecosystem, like CoreML, into Mac and iOS apps, so that you can for example leverage the Neural Engine in your iPhone to do machine learning on-device, at the edge.
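Swift4TF's differentiable programming is built into the compiler, so it can't be shown faithfully outside that toolchain. To give a feel for what "derivatives as first-class values on any type" means, here is a tiny forward-mode autodiff sketch in Python using dual numbers; the `Dual` type and `grad` helper are hypothetical names for this illustration, not part of any library.

```python
class Dual:
    """A dual number (value, derivative): arithmetic carries the derivative along."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # sum rule: (u + v)' = u' + v'
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule: (u * v)' = u'v + uv'
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def grad(f, x):
    # Seed dx/dx = 1 and read the derivative off the result.
    return f(Dual(x, 1.0)).dot

f = lambda x: 3 * x * x + 2 * x   # f'(x) = 6x + 2
print(grad(f, 4.0))               # 26.0
```

Any type that defines these arithmetic rules becomes differentiable "for free"; Swift4TF pushes the same idea into the type system, which is why it can report differentiability errors at compile time instead of at runtime.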
As mentioned, another feature of Swift4TF is Python integration (via linking the Python interpreter). This means you can use it inside Jupyter notebooks and interoperate with Python itself and its vast ecosystem of libraries, Numpy for example. There are several tutorials available. The features that allow this interoperability have recently shipped in the official Swift 5.2 language. Swift runs on Mac, Windows, Linux, iPhone/iPad (Swift Playgrounds) and even hosted in the browser (IBM Cloud or Google Colab).
So to recap, Tensorflow is undergoing a lot of structural changes. On the language side you have innovation from Swift and on the compiler and execution side you have MLIR, which has just been released.
Compilers and MLIR
If you want to talk to a CPU, where all you have is roughly C-style code with vectors, LLVM is a great compiler. If you need to talk to an accelerator like a TPU, you ideally need something different. That's where MLIR comes in. It's a much more flexible approach and completes the ambitious new pipeline for machine learning and beyond, from embedded low-power devices to clouds and supercomputers.
Here Jeremy Howard goes through their journey of trying the different deep learning frameworks and expands on some of the points I outlined above, including why Python for Tensorflow is not the way to go.
This segment is great until minute 8, when he starts talking about CoreML, which tells me he hasn't really used it (yet). Jeremy doesn't really focus on ML for edge computing (a model running on your smartphone); even in his courses he says that on-device machine learning would need different models and architectures than the ones he uses and teaches, so in his lectures he suggests having the mobile app talk to the cloud and running deep learning there as usual. While that's what most mobile apps do today anyway, CoreML focuses on on-device machine learning, so that the model can run on the Neural Engine (the TPU-like accelerator inside iPhone and iPad processors). This allows for privacy-preserving federated learning, differential privacy, better overall efficiency since you don't have to talk to the cloud, and so on.
For these use cases, especially if you're not dealing with big data, CoreML is really good and easy to use. There's even a GUI called CreateML that lets you create models and apply deep learning in your app without any coding if you want to. Swift4Tensorflow should enable even more interoperability in the future, including with CoreML; I'm curious what CoreML 3 and future versions will bring.
Learn practical Deep Learning
Fast.ai is both a deep learning course and a library. The library is supported by the major cloud providers so you can use it in production as well if you want to.
It's based on PyTorch and Python for good reasons but you'll see that parts of it are available in Swift as well. As we said you can mix and match Swift and Python and use both at the same time. In the next few years the Swift data science community will grow and in the meantime I recommend learning and using the Fast.ai 2 library, the new version that is just coming out. If you're reading this after June 2020, the new course should be available at Fast.ai.
Otherwise, if you want to get started now, I recommend watching the course below. It's modeled after the 2019 Fast.ai course but uses some of the new APIs introduced in version 2 of the library. You can follow along with the notebooks on its GitHub.