Pokedex Devlog #4
Heyyyyy Everyone!!!
The training of the model is finally complete and I have made the first version of the Pokedex app. It captures an image using react-native-vision-camera, crops only the middle portion, resizes the image to match my model’s input size of 224×224 px, then gets the raw pixel buffer, converts it to RGB and normalises the values from 0-255 to 0.0-1.0. This normalised pixel data is then fed to the model, which predicts the class of the image. There were a lot of problems in making the app and there are still a lot more to come, but here is everything I did:
1. An Ironical Training
After collecting 75,000 images with my custom scraper, I zipped them all and uploaded them directly to my Colab notebook. The upload alone took half an hour though 😫. I ran the training script for 50 epochs, and training the model took another 8 hours. It was really ironical to see the model train in Google Colab on the same images that Google was trying to hide from my little scraper.
2. A Successful Model
After the training was completed, I had to test the model to see how it performed and whether I needed to retrain it. The results shocked me: I gave the model 12 images from Pinterest that it had never seen before and it predicted all of them correctly.
The accuracy was way better than what I expected from a model trained on raw data. However, if needed, I will clean the dataset and train again.
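For anyone wondering what "predicted correctly" means in code: the model returns one score per class, and the prediction is simply the class with the highest score. Here is a minimal sketch of that step. The function name `topPrediction` and the three example labels are made up for illustration, and it assumes the label list is in the same order as the classes used during training.

```typescript
// Hypothetical helper: pick the highest-scoring class from a model's output.
// Assumes `labels[i]` is the class name for `scores[i]`.
function topPrediction(
  scores: ArrayLike<number>,
  labels: string[],
): { label: string; score: number } {
  if (scores.length === 0 || scores.length !== labels.length) {
    throw new Error(
      `Expected ${labels.length} scores, got ${scores.length}`,
    );
  }
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) {
      best = i;
    }
  }
  return { label: labels[best], score: scores[best] };
}

// Example with made-up scores for three classes.
const example = topPrediction([0.1, 0.7, 0.2], ["Bulbasaur", "Gloom", "Pikachu"]);
console.log(example.label);
```

Checking the length up front also catches the case where the model output comes back empty or in an unexpected shape.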
3. React Native Camera Hell
Everything was going very smoothly at this point, but I was not prepared for the next phase. I built a blank React Native project, installed react-native-vision-camera and added a camera to the app.js. The camera was working, great, so I built a function to capture a photo and display it on the screen, and that worked too. But the next step was to process the raw image pixel data into the format that my model required.
I searched online for how to resize an image in react-native-vision-camera, and the AI results told me to use the resizer package that comes with it. I spent three, three whole days!!! figuring out why the resizer was not able to resize the image, only to realise that it took a frame as input, not an image. So I rewrote the whole camera code to use a live stream instead of capturing a photo. The resizer was working now. I then normalized the pixel values while converting them to RGB format and fed the data to my model, which output complete gibberish. All that work for nothing!!! 😭
I tried logging the input and output shapes of the model, the pixel format of the buffer and the raw pixel data, but all of those were giving either undefined or 0. Nothing seemed to be working at this point, and there was something seriously wrong with how I was processing the images. But then, going through the docs of react-native-vision-camera, I saw that the image object returned by my previous photo-capturing code, the code I had dumped, was a react-native-nitro-image object, and it had an inbuilt resizeAsync function.
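The "convert to RGB and normalise" step is the easiest one to get subtly wrong, so here is a rough sketch of it in plain TypeScript. This is not the app's actual code. It assumes the raw buffer is tightly packed 8-bit RGBA (4 bytes per pixel), which may not match what a given device returns, and the function name `rgbaToNormalizedRgb` is made up.

```typescript
// Hypothetical helper: drop the alpha channel and scale 0-255 bytes to 0.0-1.0.
// Assumption: `rgba` is tightly packed RGBA, 4 bytes per pixel, no row padding.
// If the real buffer is BGRA or YUV, the channel indices below would be wrong.
function rgbaToNormalizedRgb(
  rgba: Uint8Array,
  width: number,
  height: number,
): Float32Array {
  const pixelCount = width * height;
  if (rgba.length !== pixelCount * 4) {
    // A size mismatch usually means a different pixel format or padded rows.
    throw new Error(`Expected ${pixelCount * 4} bytes, got ${rgba.length}`);
  }
  const rgb = new Float32Array(pixelCount * 3);
  for (let i = 0; i < pixelCount; i++) {
    rgb[i * 3] = rgba[i * 4] / 255; // red
    rgb[i * 3 + 1] = rgba[i * 4 + 1] / 255; // green
    rgb[i * 3 + 2] = rgba[i * 4 + 2] / 255; // blue
    // rgba[i * 4 + 3] is alpha and is skipped
  }
  return rgb;
}
```

For a 224×224 image this returns 224 * 224 * 3 = 150,528 floats. The length check is there because a wrong pixel format is a likely cause of a model producing gibberish.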
I got frustrated again and rewrote the entire code again, this time going back to the previous capture photo logic. After cropping, resizing and getting the RGB data into the required format, I tested the app with a photo of Gloom displayed on my laptop screen and it predicted it correctly. I was sooooo happyyy that it worked, and it is still working, though the accuracy is low due to pixel noise.
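The cropping part is just a bit of arithmetic. Here is a minimal sketch, assuming "the middle portion" means the largest centred square, so that resizing to 224×224 does not stretch the image. The `CropRect` type and the `centerSquareCrop` name are made up for illustration, and the rectangle would still need to be passed to whatever crop function the image library provides.

```typescript
// Hypothetical rectangle type: top-left corner plus size, in pixels.
interface CropRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Compute the largest square centred in an image of the given size.
function centerSquareCrop(imageWidth: number, imageHeight: number): CropRect {
  const side = Math.min(imageWidth, imageHeight);
  return {
    x: Math.floor((imageWidth - side) / 2),
    y: Math.floor((imageHeight - side) / 2),
    width: side,
    height: side,
  };
}

// Example: a 1920x1080 photo keeps the middle 1080x1080 square.
console.log(centerSquareCrop(1920, 1080));
```

Cropping to a square before resizing keeps the Pokemon's proportions the same as in the training images, assuming those were not stretched either.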
What’s next?
The hardest parts of the project are completed now. I just need to add an API call, after the model predicts the Pokemon, to get the details of the Pokemon, like its type, Pokedex entry and other things. The UI also sucks so I will work on that. I think I want to make it space themed, I don’t know how it would look but I will try. Till then… stay tuned!!!
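I have not picked an API yet, but here is a rough sketch of what that call could look like, assuming the public PokeAPI (pokeapi.co) and its `pokemon` and `pokemon-species` endpoints. The interfaces below only cover the fields used here, and the names `buildDetails` and `fetchPokemonDetails` are made up.

```typescript
// Minimal response shapes, covering only the fields this sketch reads.
// Assumption: these match PokeAPI's JSON for the two endpoints used below.
interface PokemonResponse {
  types: { type: { name: string } }[];
}
interface SpeciesResponse {
  flavor_text_entries: { flavor_text: string; language: { name: string } }[];
}
interface PokemonDetails {
  name: string;
  types: string[];
  entry: string | null;
}

// Pure helper: pull the type names and the first English Pokedex entry.
function buildDetails(
  name: string,
  pokemon: PokemonResponse,
  species: SpeciesResponse,
): PokemonDetails {
  const english = species.flavor_text_entries.find(
    (e) => e.language.name === "en",
  );
  return {
    name,
    types: pokemon.types.map((t) => t.type.name),
    // Entries contain line breaks and form feeds, so collapse whitespace.
    entry: english ? english.flavor_text.replace(/\s+/g, " ").trim() : null,
  };
}

// Fetch both endpoints in parallel for the predicted Pokemon name.
async function fetchPokemonDetails(name: string): Promise<PokemonDetails> {
  const base = "https://pokeapi.co/api/v2";
  const id = encodeURIComponent(name.toLowerCase());
  const [pokemonRes, speciesRes] = await Promise.all([
    fetch(`${base}/pokemon/${id}`),
    fetch(`${base}/pokemon-species/${id}`),
  ]);
  if (!pokemonRes.ok || !speciesRes.ok) {
    throw new Error(`Lookup failed for "${name}"`);
  }
  const pokemon = (await pokemonRes.json()) as PokemonResponse;
  const species = (await speciesRes.json()) as SpeciesResponse;
  return buildDetails(name, pokemon, species);
}
```

Calling `fetchPokemonDetails("gloom")` after a prediction should then give the types and a Pokedex entry to show in the UI. One thing to watch: the model's class labels may not match the API's lowercase names for every Pokemon, so a small lookup table might be needed.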