It's a very common question in the field and has a lot of answers on the internet. However, I would want to answer this based on my own personal experience training, deploying, and maintaining ML models using both frameworks. The keyword that I want to highlight here is deploying
and maintaining
. This factor often mislooked by fellow data scientists because we tend to not think further and not involve more than the model training process.
Also, unlike many other articles, in the end of this blog I will give my clear-winner answer and not a vague pros-and-cons answer. However, it is not to devalue the works done by both frameworks' maintainers, just mere personal choice from 2 options.
Tensorflow Versions Incompatibility
An ML model trained on let say tensorflow v2.13
cannot be imported and used using other versions like v2.12
and v2.14
. This introduces a lot of problem in the productions.
Considering you have 2 ML models, one is trained on v2.12
and another one trained on v2.13
. In a scenario where you would want to deploy both models on a single environment/runtime, you are very much not possible to do that because you can't install both versions. The only way is to have 2 different runtimes which can be translated into perhaps 2 codebases and 2 servers that basically means double the work and double the cost.
Other than that, let say you have a legacy model which has not been properly documented. You would need to trial-and-error with different versions of tensorflow just to find the correct one. To me that is just ridiculous.
CUDA Installation
I have an Ubuntu machine which I use for daily dev-works. I feel like this is a key information to understand the situation, I'm not even using Windows for this. In my attempt to install the GPU driver, CUDA, cuDNN etc, it seems like there's a lot of ways to install these drivers and libraries. However to my headache, nothing seems to work. My tensorflow seems to never able to run on CUDA. I even fresh install the OS trying to fix it. Some might say "skill issue", but I feel this is too complicated compared to my experience working with Pytorch which I will share later.
Pytorch does not have the versions incompatibility issues. It might, between the 2 major releases such as v1
and v2
which is understandable but not between the minor releases. This solves all the painpoints I mentioned earlier. Easier to deploy, easier to maintain, less tech debt.
The CUDA installation is very much straight-forward. All you need to do is making sure your device already have GPU driver installed . The CUDA installation will all be handled by the pytorch. During pytorch installation, you can specify if you want to use CPU or CUDA. If you pick CUDA, the installation will be bundled together with CUDA-related libraries and you pretty much ready to go.
https://pytorch.org/get-started/locally/
Use pytorch.