Guide by Jaryd; Raspberry Pi AI HAT 1 vs AI HAT 2: Which Should You Buy?

Hello forum goers,

Got a short and sweet guide for you, looking at the differences between the AI HAT 1 and the AI HAT 2. They are similar boards but with very different capabilities, take a look: “Raspberry Pi AI HAT 1 vs AI HAT 2: Which Should You Buy?”






Hi Jaryd, of all the show and tell videos out there, those created by Core are among the best. So having whetted our appetite for the AI HAT +2 running a VLM, will you guys be doing a show and tell on setting one up any time soon? I have a small group of final year degree students doing a group project where they are trying to make an aid for a visually impaired person, and I was going to point them in the direction of the AI HAT +2. Cheers from the UK!

Hey @Robert312307,

Thank you for the kind words!

The software support for VLMs has been a bit rough on launch. The original AI HATs were a bit rough like this as well, and it took a few months before they became really polished. Using object detection and LLMs on the AI HAT 2 is super easy and straightforward, we are just waiting for VLM support to catch up.

Because of that, it might be a little while before we make some VLM guides for them, as the current method of getting them going is a bit janky and will likely break once an official method is released.

Until then, we do have a guide on using Moondream - a very lightweight VLM that can run on the Pi’s CPU. It is nowhere near as fast as the AI HAT running more powerful VLMs, but there is an option to use cloud processing on a very generous free plan, if that works for you? If not, for simple yes or no answers, you can get the local processing time down to about 8 seconds.

Hope this helps!

Thanks Jaryd. Useful response. We may need to revise our project goals for this current cohort.

Kind regards


I am looking for advice/guidance on the choice of AI HAT for a specific vision processing task. This is in the POC phase currently, but I have a prototype running on an RPi 4 hooked up to a GPS location reader and, separately, some code which uses Google Cloud Vision to extract a handwritten (spray painted) 5 digit number from an image. The end goal is to have the Pi, GPS locator, and a camera on a forklift to identify the 5 digit number as it picks up an object and when it puts it down. The forklifts will have a WiFi connection, but I am unsure of the latency involved in taking the image, sending it to Google Cloud Vision and getting a result back. If the same level of accuracy can be achieved on-board, that issue goes away. I will have a “white list” of possible numbers, which greatly improves accuracy in any context.
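As a rough illustration of how a white list can rescue an imperfect reading, here is a minimal sketch in Python. It assumes the recogniser returns a digit string and that a reading is only accepted when exactly one white list entry is within a small number of wrong digits. The names `snap_to_whitelist` and `max_errors`, the example numbers, and the choice of Hamming distance are illustrative assumptions, not something taken from the prototype.

```python
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def snap_to_whitelist(reading: str,
                      whitelist: Sequence[str],
                      max_errors: int = 1) -> Optional[str]:
    """Return the white list entry that best matches the reading.

    Returns None when nothing is close enough, or when two entries are
    equally close (an ambiguous reading is safer to reject than to guess).
    """
    if reading in whitelist:
        return reading

    # Only compare against entries of the same length (hypothetical rule).
    scored = sorted(
        (hamming(reading, entry), entry)
        for entry in whitelist
        if len(entry) == len(reading)
    )
    if not scored:
        return None

    best_distance, best_entry = scored[0]
    if best_distance > max_errors:
        return None
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # tie between two candidates
    return best_entry


if __name__ == "__main__":
    allowed = ["10482", "20931", "77310"]  # made-up example numbers
    print(snap_to_whitelist("10432", allowed))  # one digit off "10482"
    print(snap_to_whitelist("99999", allowed))  # nothing close enough
```

Rejecting ties is a deliberate choice here: if the white list contains numbers that differ by a single digit, a misread could otherwise be silently "corrected" to the wrong item.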


Hey @Andrew121107,

This sounds like it is definitely possible to do locally on the AI HAT. Note that the AI HAT will need a Pi 5, as it connects via its PCIe interface.

The biggest hurdle here would be obtaining a model that can read the numbers - for a proof of concept I think you might be able to source a model someone has already trained. Something YOLO-based may be your best option, and it's not too difficult to try and train your own. If you have a decent GPU (or a weaker one and a bit more patience for the training time), you can do it all locally with some photos of the numbers for training data.
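To make the "reading numbers with a detector" idea a little more concrete, here is a minimal sketch of one possible post-processing step. It assumes the model has been trained to detect each digit as its own class (0 to 9), which is only one way to set this up, and that each detection has already been reduced to a horizontal position, a digit and a confidence score. The `DigitDetection` type, `read_number` function and the threshold values are hypothetical and do not come from any particular library.

```python
from typing import NamedTuple, Optional, Sequence


class DigitDetection(NamedTuple):
    x_center: float    # horizontal centre of the bounding box, in pixels
    digit: int         # predicted class, 0-9
    confidence: float  # detector score between 0.0 and 1.0


def read_number(detections: Sequence[DigitDetection],
                expected_digits: int = 5,
                min_confidence: float = 0.5) -> Optional[str]:
    """Turn per-digit detections into a number string, read left to right.

    Returns None unless exactly `expected_digits` confident detections
    remain, so partial or cluttered frames are skipped rather than guessed.
    """
    kept = [d for d in detections if d.confidence >= min_confidence]
    if len(kept) != expected_digits:
        return None
    kept.sort(key=lambda d: d.x_center)
    return "".join(str(d.digit) for d in kept)


if __name__ == "__main__":
    frame = [
        DigitDetection(310.0, 8, 0.88),
        DigitDetection(120.0, 1, 0.93),
        DigitDetection(250.0, 4, 0.81),
        DigitDetection(185.0, 0, 0.90),
        DigitDetection(372.0, 2, 0.79),
        DigitDetection(40.0, 7, 0.12),  # low-confidence noise, dropped
    ]
    print(read_number(frame))
```

Because this step is plain Python, it behaves the same whether the detections come from a desktop run or from the AI HAT.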

My only concern is that the variations of the sprayed numbers would make it less reliable. If the numbers could be stencil sprayed on, then this suddenly becomes a WAY easier task.

Once you are happy with the YOLO model, you would just need to use Hailo’s developer tools to convert the model to the format required for the AI HAT, and run it! We have some guides on running it here to check out.

In terms of performance, YOLO comes in different sized models. The bigger the model, the “smarter” it is, but the slower it processes. With a medium sized model (which should be more than powerful enough for your needs) you could expect about 20-25 FPS from the AI HAT 1 (13 TOPS). I think from the camera firing to capture the image, to getting the detection result, you could expect under 100 ms. The more powerful AI HATs might not reduce the latency by much, but they will process more frames a second, with the 26 TOPS version getting about 50 FPS.
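The difference between throughput and latency in those figures can be put into numbers with a small sketch. The first function only converts the FPS figures quoted above into the time between processed frames. The second is a generic timing harness that could wrap any capture-to-result function, local or cloud. Both function names and the placeholder pipeline are assumptions for illustration.

```python
import statistics
import time
from typing import Callable, Tuple


def frame_interval_ms(fps: float) -> float:
    """Time between processed frames at a given throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def measure_latency_ms(pipeline: Callable[[], object],
                       runs: int = 20) -> Tuple[float, float]:
    """Call `pipeline` repeatedly and return (median, worst) latency in ms."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        pipeline()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), max(samples)


if __name__ == "__main__":
    for fps in (20, 25, 50):
        print(f"{fps} FPS -> one frame every {frame_interval_ms(fps):.0f} ms")

    def placeholder_pipeline() -> None:
        # Stand-in for "capture image, run detection, return result".
        time.sleep(0.01)

    median_ms, worst_ms = measure_latency_ms(placeholder_pipeline, runs=10)
    print(f"median {median_ms:.1f} ms, worst {worst_ms:.1f} ms")
```

Running the same harness around the cloud call and around the local model gives a like-for-like comparison, and the worst-case figure is worth watching on a WiFi link.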

Some tips on developing this. The first step would be to source that YOLO model. The great thing about this model is that you can run it on nearly any computer, and you will get similar detection results to what you will get on the AI HAT. Processing speeds will vary, but detection results will be very similar - so you can validate the reliability of the model first. I would film some video on the Pi 4 in your real world setting, and run it through the YOLO model on a desktop PC to check reliability.
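For the reliability check itself, a small scoring script can help once the desktop run has produced a reading for each labelled clip. This is a minimal sketch that assumes each result is a pair of (predicted number or None, true number). The function name `summarise` and the sample data are made up for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple

Result = Tuple[Optional[str], str]  # (predicted, ground truth)


def summarise(results: Sequence[Result]) -> Dict[str, float]:
    """Split results into correct reads, wrong reads and no-reads.

    Wrong reads are counted separately from no-reads because a confidently
    wrong number is usually worse than a frame that is simply skipped.
    """
    total = len(results)
    correct = sum(1 for predicted, truth in results if predicted == truth)
    no_read = sum(1 for predicted, _ in results if predicted is None)
    wrong = total - correct - no_read
    accuracy = correct / total if total else 0.0
    return {
        "total": total,
        "correct": correct,
        "wrong": wrong,
        "no_read": no_read,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    sample = [
        ("10482", "10482"),
        ("10432", "10482"),  # misread digit
        (None, "20931"),     # nothing usable detected
        ("77310", "77310"),
    ]
    print(summarise(sample))
```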

Hope this helps!

Thanks very much @Jaryd - that helps quite a bit. Using a stencil is something I considered, but I am trying to avoid too many (or too large) changes to current work practices.

One thing I did omit was using a distance sensor to trigger image capture when the object is close enough that the ROI fills most of the view/image. I use a ToF sensor in an unrelated home project, but was leaning towards an ultrasonic sensor for this project.

My POC is switchable between OCR methods (Tesseract or Paddle) at the moment, and these highlight the difficulties with handwritten/spray painted numbers. Neither produced usable results, and both were significantly slower than sending the image to Google Cloud Vision (which did have a 98%+ accuracy score).

For anyone interested, my fallback is UHF RFID tags instead of image recognition, and I have ported the C/C++ library and drivers for the M5Stack UHF RFID unit to a Python library that happily runs under MicroPython (I am using it on an RPi Pico) or CPython (on the RPi 4).

There are still a few moving parts to pull all this together, but almost everything is there, albeit not all tied together yet. I still have to build a “safe” interface to the 12 V supply from the forklift, but the RPi 5 will make a safe shutdown/startup a bit easier, and the extra processing power will not get wasted as there is a 7" touch screen in front of all this :). I am interested in the YOLO training side next - I do have 100s of sample images that I can use as input to the model - time to stop coding and start reading again!
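The distance-triggered capture mentioned above could be sketched roughly as follows. This is sensor-agnostic: it only assumes something (ToF or ultrasonic) delivers a stream of distance readings in millimetres. The class name, the 600 mm and 900 mm thresholds and the three-reading debounce are all hypothetical values that would need tuning on the actual forklift.

```python
class ProximityTrigger:
    """Fire once when an object comes close, then wait until it moves away.

    Requiring several consecutive near readings filters out single noisy
    samples, and the separate far threshold (hysteresis) stops the trigger
    from firing repeatedly while the object hovers around one distance.
    """

    def __init__(self, near_mm: float = 600.0, far_mm: float = 900.0,
                 stable_readings: int = 3) -> None:
        if near_mm >= far_mm:
            raise ValueError("near_mm must be smaller than far_mm")
        if stable_readings < 1:
            raise ValueError("stable_readings must be at least 1")
        self.near_mm = near_mm
        self.far_mm = far_mm
        self.stable_readings = stable_readings
        self._near_count = 0
        self._armed = True

    def update(self, distance_mm: float) -> bool:
        """Feed one reading; return True when an image should be captured."""
        if distance_mm <= self.near_mm:
            self._near_count += 1
            if self._armed and self._near_count >= self.stable_readings:
                self._armed = False  # do not fire again until re-armed
                return True
        else:
            self._near_count = 0
            if distance_mm >= self.far_mm:
                self._armed = True   # object has moved away, re-arm
        return False


if __name__ == "__main__":
    trigger = ProximityTrigger()
    readings = [1500, 1200, 580, 590, 570, 560, 700, 1000, 550, 540, 530]
    for index, distance in enumerate(readings):
        if trigger.update(distance):
            print(f"capture at reading {index} ({distance} mm)")
```

Keeping the trigger logic separate from the sensor driver means the same class should work unchanged if the sensor type changes later.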
