Showing posts with label: tensorflow. Show all posts.

PyTorch Update

Thursday 18 April 2019

After another couple of weeks using PyTorch my initial enthusiasm has somewhat faded. I still like it a lot, but I have encountered many disadvantages. For one I can now see the advantage of TensorFlows static graphs - it makes the API easier to use. Since the graph is completely defined and then compiled you can just tell each layer how many units it should have and it will infer the number of inputs from whatever it's input is. In PyTorch you need to manually specify the inputs and outputs, which isn't a big deal, but makes it more difficult to tune networks since to change the number of units in a layer you need to change the inputs to the next layer, the batch normalization, etc. whereas with TensorFlow you can just change one number and everything is magically adjusted.

I also think that the TensorFlow API is better than PyTorch. There are some things which are very easy to do in TensorFlow which become incredibly complicated with PyTorch, like adding different regularization amounts to different layers. In TensorFlow there is a parameter to the layer that controls the regularization, in PyTorch you apparently need to loop through all of the parameters and know which ones to add what amount of regularization to.

I suppose one could easily get around these limitations with custom functions and such, and it shouldn't be surprising that TensorFlow seems more mature given that it has the weight of Google behind it, is considered the "industry standard", and has been around for longer. But I now see that TensorFlow has some advantages over PyTorch.

Labels: python, machine_learning, tensorflow, pytorch
1 comments

PyTorch

Monday 08 April 2019

When I first started with neural networks I learned them with TensorFlow and it seemed like TensorFlow was pretty much the industry standard. I did however keep hearing about PyTorch which was supposedly better than TensorFlow in many ways, but I never really got around to learning it. Last week I had to do one of my assignments in PyTorch so I finally got around to it, and I am already impressed.

The biggest problem I always had with TensorFlow was that the graphs are static. The entire graph must be defined and compiled before it is run and it can't be altered at runtime. You feed data into the graph and it returns output. This results in the rather awkward tf.Session() which must be created before you can do anything, and which contains all of the parameters for the model.

PyTorch has dynamic graphs which are compiled at runtime. This means that you can change things as you go, including altering the graph while it is running, and you don't need to have all the dimensions of all of the data specified in advance like you do in TensorFlow. You can also do things like change the numbers of neurons in a layer dynamically and drop entire layers at runtime which you can't do with TensorFlow.

Debugging PyTorch is a lot easier since you can just make a change and test it - you don't need to recreate the graph and instantiate a session to test it out. You can just run an optimization step whenever you want. Coming from TensorFlow that is just a breath of fresh air.

TensorFlow still has many advantages, including the fact that it is still an industry standard, is easier to deploy and is better supported. But PyTorch is definitely a worth competitor, is far more flexible, and solves many of the problems with TensorFlow.

Labels: python, machine_learning, tensorflow, pytorch
No comments

CoLab TPUs One Month Later

Wednesday 31 October 2018

After having used both CoLab GPUs and TPUs for almost a month I must significantly revise my previous opinion. Even for a Keras model not written or optimized for TPUs, with some minimal configuration changes TPUs perform much faster - minimum of twice the speed. In addition to making sure that all operations are TPU compatible, the only major configuration change required is increasing the batch size by 8. At first I was playing around with the batch size, but I realized that this was unnecessary. TPUs have 8 shards, so you simply multiple the GPU batch size by 8 and that should be a good baseline. 

The model I am currently training on a TPU and a GPU simultaneously is training 3-4x faster on the TPU than on the GPU and the code is exactly the same. I have this block of code:

use_tpu = True
# if we are using the tpu copy the keras model to a new var and assign the tpu model to model
if use_tpu:
    TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    
    # create network and compiler
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(
        model, strategy = tf.contrib.tpu.TPUDistributionStrategy(
            tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
    
    BATCH_SIZE = BATCH_SIZE * 8

The model is created with Keras and the only change I make is setting use_tpu to True on the TPU instance. 

One other thing I thought I would mention is that CoLab creates separate instances for GPU, TPU and CPU, so you can run multiple notebooks without sharing RAM or processor if you give each one a different type.

Labels: machine_learning, tensorflow, google, google_cloud
4 comments

Keras

Wednesday 26 September 2018

When I first started working with TensorFlow I didn't really like Keras. It seemed like a dumbed down interface to TensorFlow and I preferred having greater control over everything to the ease of use of Keras. However I have recently changed my mind. When you use Keras with a TensorFlow back-end you can still use TensorFlow if you need to tweak something that you can't in Keras, but otherwise Keras just provides an easier to use way to access TensorFlow's functionality. This is especially useful for prototyping models since you can easily make changes without having to write or rewrite a lot of code. I used to write my own functions to do things like make a convolutional layer, but most of that was duplicating functionality that already exists in Keras. 

My original opinion was incorrect, Keras is a valuable tool for creating neural networks, and since you can mix TensorFlow in, there is nothing lost by using it.

Labels: machine_learning, tensorflow, keras
1 comments

When I began working on this project my intention was to do multi-class classification of the images. To this end I built my graph with logits and a cross-entropy loss function. I soon realized that the decision to do multi-class classification was quite ambitious, and scaled back to doing binary classification into positive and negative. My goal was to implement the multi-class approach once I had the binary approach working reasonably well, so I left the cross entropy in place.

Over the months I have been working on this I have realized that, for many reasons, the multi-class classification was a bad idea. For an academic project it might have made sense, but for any sort of real world use case it made none. There is really no use to outputting a simple classification for something as important as detecting cancer. A much more useful output is the probabilities that each area of the image contains an abnormality as this could aid a radiologist in diagnosing abnormalities rather than completely replacing her. Yet for some reason I never bothered to change the output or the loss function.

The limiting factor on the size of the model has been the GPU memory of the Google Cloud instance I am training this monstrosity on. So I've been trying to optimize the model to run within the RAM constraints and train in a reasonable amount of time. Mostly this has involved trying to keep the number of parameters to a minimum, but today I was looking at the model and realized that the logits were definitely not helping the situation.

For this problem classification was absolutely the wrong approach. We aren't trying to classify the content of the image, we are trying to detect abnormalities. The negative class was not really a separate class, but the absence of any abnormalities, and the graph and the loss function should reflect this. In order to coerce the logits into an output that reflected the reality just described, I put the logits through a softmax and then discarded the negative probability - as I said the negative class doesn't really exist. However the cross entropy function does not know this and it places equal importance on the imaginary negative class as on the positive class (subject to the cross entropy weighting of course.) This means that the gradients placed equal weight on trying to find imaginary "normal" patterns, despite the fact that this information is discarded and never used.

So I reduced the logits layer to one unit, replaced the softmax activation with a sigmoid activation, and replaced the cross entropy with binary cross entropy.  And the change has been more impactful than I imagined it would be. The model immediately began performing better than the same model with the logits/cross entropy structure. It seems obvious that this would be the case as now the model can focus on detecting abnormalities rather than wasting half of it's efforts on trying to detect normal patterns. 

I am not sure why I waited so long to make this change and my best guess is that I was seduced by the undeniable elegance of the cross entropy loss function. For multi-class classification it is truly a thing of beauty, and I may have been blinded by that into attempting to use it in a situation it was not designed for.

Labels: coding, machine_learning, tensorflow, mammography
No comments