Getting Started with YOLO Object and Animal Recognition on the Raspberry Pi

Hey @Piers298469, welcome to the forum!

The newest version of the Ultralytics package seems to have broken the installation on some devices. You might want to give this line a go instead:

pip install ultralytics==8.3.100

It installs one of the more recent versions from before the break.

Let us know how you go!


Hi Jaryd,

I got it to install with the --no-cache-dir and --prefer-binary options, so

pip install ultralytics[export] --prefer-binary --no-cache-dir

It works very well using the YOLO11n model (note: YOLO11n, not YOLOv11n) in NCNN format.

Thanks for your speedy and helpful input!

Cheers,

Piers


Hey @Piers298469,

Fantastic timing, I just booted up the Pi to investigate a good fix for this as well as to start a new guide. Cheers for the fix; it’s great to see you got it going, and I’ll give it a spin. Appreciate it.

I’m currently investigating YOLOE, and it’s one of the coolest things to come out of vision models in a while: a model that can detect objects based on a prompt, even if it has never been trained to detect them. We touched on YOLO-World in this guide, but YOLOE is a hugely improved version of it. I definitely recommend checking it out, as you should be able to use it without any additional installations.

Cheers for the work!
Jaryd


I’m still trying object tracking using YOLO11n on my Pi 5 with the V2 Pi camera, in the hope that I can get around 20 to 30 FPS with an image size of around 224 to 320, converted to NCNN format. I’m finding that the FPS with a pretrained model, or with my own trained model, fluctuates significantly between about 7 and 30 FPS in a fairly cyclical manner. It seems like there may be a bottleneck in the process.

Are you able to suggest how I can get a more constant processing speed at the same resolution and target FPS?


Hey @Richard83832,

It could help to monitor the temperature and clock speed while you’re testing. Try using vcgencmd measure_temp and vcgencmd measure_clock arm to check if the FPS drops line up with clock speed changes.

If you’re not already doing so, a heatsink and active cooling would be strongly recommended for this project.
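If it helps, here’s a rough Python sketch of how you could log both readings once a second while your detection script runs, so you can line the values up against the FPS drops. This is an untested assumption that you’re on Raspberry Pi OS with vcgencmd on the PATH; the parsing helpers just pull the numbers out of vcgencmd’s usual output format.

```python
import re
import shutil
import subprocess
import time

def parse_temp(output):
    """Parse vcgencmd output like "temp=58.0'C" into degrees Celsius."""
    return float(re.search(r"temp=([\d.]+)", output).group(1))

def parse_clock(output):
    """Parse vcgencmd output like "frequency(48)=2400000000" into MHz."""
    return int(re.search(r"=(\d+)", output).group(1)) / 1_000_000

# Only loop if vcgencmd actually exists (i.e. we're on a Raspberry Pi)
if shutil.which("vcgencmd"):
    while True:
        temp = parse_temp(subprocess.check_output(["vcgencmd", "measure_temp"], text=True))
        clock = parse_clock(subprocess.check_output(["vcgencmd", "measure_clock", "arm"], text=True))
        print(f"temp={temp:.1f}C  arm_clock={clock:.0f}MHz")
        time.sleep(1)
```

Run it in a second terminal while the detection loop is going and watch for the clock dropping when FPS does.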

It might also be worth timing each stage in your processing loop to pinpoint exactly where the slowdown is happening.
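As a generic sketch of that idea (the stage functions here are placeholders, not your actual script; you’d swap in your own capture, inference, and display calls):

```python
import time

def timed_stages(stages):
    """Run each (name, fn) pair once and return elapsed time per stage in ms."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        timings[name] = (time.perf_counter() - start) * 1000
    return timings

# Placeholder stages - swap in picam2.capture_array(), model(frame), cv2.imshow(...)
def capture():   time.sleep(0.005)
def inference(): time.sleep(0.020)
def display():   time.sleep(0.002)

timings = timed_stages([("capture", capture), ("inference", inference), ("display", display)])
print("  ".join(f"{name}: {ms:.1f}ms" for name, ms in timings.items()))
```

Printing one line like this per frame should make it obvious which stage is responsible for the cyclical slowdowns.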

Thanks for those suggestions.

My Pi 5 temperature using vcgencmd measure_temp peaks at 58C. I do use the Pi 5 Active Cooler.

I checked the clock speed while running the YOLO code and it was a constant 2.4GHz.

It seems the code from your tutorial already prints out some timing data; see the extract below:

The inference time ranges from 34 to 188 ms in this small sample. Any idea why? The FPS in your tutorial was quite steady.

0: 320x320 (no detections), 70.8ms
Speed: 3.5ms preprocess, 70.8ms inference, 2.0ms postprocess per image at shape (1, 3, 320, 320)

0: 320x320 (no detections), 59.8ms
Speed: 5.1ms preprocess, 59.8ms inference, 1.5ms postprocess per image at shape (1, 3, 320, 320)

0: 320x320 (no detections), 120.7ms
Speed: 3.4ms preprocess, 120.7ms inference, 1.0ms postprocess per image at shape (1, 3, 320, 320)

0: 320x320 (no detections), 33.7ms
Speed: 5.2ms preprocess, 33.7ms inference, 0.7ms postprocess per image at shape (1, 3, 320, 320)

0: 320x320 (no detections), 68.2ms
Speed: 3.9ms preprocess, 68.2ms inference, 1.0ms postprocess per image at shape (1, 3, 320, 320)

0: 320x320 (no detections), 188.1ms
Speed: 3.2ms preprocess, 188.1ms inference, 1.8ms postprocess per image at shape (1, 3, 320, 320)

Hey @Richard83832,

YOLO can just be really hit and miss like this. The footage we collected in our demonstration might have been a really lucky run, as the inference times you posted show pretty typical deviation. I wasn’t even aware the demo footage was smoother; I’ll need to go check that out!

It might not be the best solution, but if you are chasing a somewhat consistent FPS, you could do something like the code below (it is completely untested, but it shows how you might implement it). You set an FPS target (say 10 FPS), and if a frame is processed quicker than 100 ms (1 second / 10 FPS), the loop waits out the remainder. This will slow down your overall speed, because you are telling it to wait, but it will give you a more consistent FPS if the timing is critical for your project. However, if a frame takes longer than 100 ms, the loop just lets it run, so it may sometimes drop below 10 FPS.

What is critical in your project to have more consistent FPS?

import cv2
import time
from picamera2 import Picamera2
from ultralytics import YOLO

# Set target FPS
TARGET_FPS = 10
TARGET_FRAME_TIME = 1.0 / TARGET_FPS  # Time per frame in seconds

# Set up the camera with Picam
picam2 = Picamera2()
picam2.preview_configuration.main.size = (1280, 1280)
picam2.preview_configuration.main.format = "RGB888"
picam2.preview_configuration.align()
picam2.configure("preview")
picam2.start()

# Load the YOLO model (swap in your NCNN-exported YOLO11n model here if you have one)
model = YOLO("yolo11n.pt")

# Variables for FPS calculation
frame_count = 0
start_time = time.time()

while True:
    frame_start_time = time.time()
    
    # Capture a frame from the camera
    frame = picam2.capture_array()
    
    # Run YOLO model on the captured frame and store the results
    results = model(frame)
    
    # Output the visual detection data, we will draw this on our camera preview window
    annotated_frame = results[0].plot()
    
    # Calculate actual FPS based on complete loop time
    frame_count += 1
    elapsed_time = time.time() - start_time
    actual_fps = frame_count / elapsed_time
    
    # Get inference time for reference
    inference_time = results[0].speed['inference']
    
    # Display both actual FPS and inference time
    text_fps = f'Actual FPS: {actual_fps:.1f} | Target: {TARGET_FPS}'
    text_inference = f'Inference: {inference_time:.1f}ms'

    # Define font and position for FPS text
    font = cv2.FONT_HERSHEY_SIMPLEX
    text_size_fps = cv2.getTextSize(text_fps, font, 0.7, 2)[0]
    text_size_inf = cv2.getTextSize(text_inference, font, 0.7, 2)[0]
    
    text_x_fps = annotated_frame.shape[1] - text_size_fps[0] - 10
    text_y_fps = text_size_fps[1] + 10
    
    text_x_inf = annotated_frame.shape[1] - text_size_inf[0] - 10  
    text_y_inf = text_size_fps[1] + text_size_inf[1] + 20

    # Draw the text on the annotated frame
    cv2.putText(annotated_frame, text_fps, (text_x_fps, text_y_fps), font, 0.7, (255, 255, 255), 2, cv2.LINE_AA)
    cv2.putText(annotated_frame, text_inference, (text_x_inf, text_y_inf), font, 0.7, (255, 255, 255), 2, cv2.LINE_AA)

    # Display the resulting frame
    cv2.imshow("Camera", annotated_frame)
    
    # Calculate frame processing time
    frame_processing_time = time.time() - frame_start_time
    
    # Apply FPS limiting
    if frame_processing_time < TARGET_FRAME_TIME:
        sleep_time = TARGET_FRAME_TIME - frame_processing_time
        time.sleep(sleep_time)
    
    # Reset frame counter every 30 frames for more responsive FPS reading
    if frame_count >= 30:
        frame_count = 0
        start_time = time.time()

    # Exit the program if q is pressed
    if cv2.waitKey(1) == ord("q"):
        break

# Close all windows
cv2.destroyAllWindows()

Hi, I got this working, but how can I make it run automatically when the Pi starts?


Hey there, @Kevin310522, and welcome to the forum. Glad to have you here.

There are actually quite a few ways to do that on Linux systems.

Probably the best way would be to use crontab.

There’s a very handy little tutorial on how to do so here.

It’s pretty straightforward, but feel free to reach out if you have any troubles.
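As a rough idea of the shape of it (the paths below are placeholders, not your actual setup), you’d run `crontab -e` and add an entry along these lines:

```shell
# Placeholder paths - run the script once at boot and capture its output in a log
@reboot python3 /home/pi/yolo/yolo2.py >> /home/pi/yolo/boot.log 2>&1
```

Logging to a file like that also gives you somewhere to look if the script fails silently at boot.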


Thank you for the response. I have been unsuccessful at getting this to run on startup, despite many hours of trying. When I use the crontab method, I get the following error logged:

[0:00:12.324458695] [858] INFO Camera camera_manager.cpp:340 libcamera v0.6.0+rpt20251202
[0:00:12.332993862] [1532] INFO RPI pisp.cpp:720 libpisp version 1.3.0
[0:00:12.335371232] [1532] INFO IPAProxy ipa_proxy.cpp:180 Using tuning file /usr/share/libcamera/ipa/rpi/pisp/imx477.json
[0:00:12.342287639] [1532] INFO Camera camera_manager.cpp:223 Adding camera '/base/axi/pcie@1000120000/rp1/i2c@88000/imx477@1a' for pipeline handler rpi/pisp
[0:00:12.342312899] [1532] INFO RPI pisp.cpp:1181 Registered camera /base/axi/pcie@1000120000/rp1/i2c@88000/imx477@1a to CFE device /dev/media3 and ISP device /dev/media0 using PiSP variant BCM2712_D0
[0:00:12.345069787] [858] INFO Camera camera.cpp:1215 configuring streams: (0) 1280x1280-RGB888/sRGB (1) 2028x1520-BGGR_PISP_COMP1/RAW
[0:00:12.345161695] [1532] INFO RPI pisp.cpp:1485 Sensor: /base/axi/pcie@1000120000/rp1/i2c@88000/imx477@1a - Selected sensor format: 2028x1520-SBGGR12_1X12/RAW - Selected CFE format: 2028x1520-PC1B/RAW

0: 160x160 (no detections), 186.0ms
Speed: 14.5ms preprocess, 186.0ms inference, 20.0ms postprocess per image at shape (1, 3, 160, 160)
qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/home/kevinbright/yolo_object/lib/python3.13/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: xcb.

Aborted

I’m not sure what that means. It runs fine when I run it either from Thonny, as shown in the tutorial, or from a terminal window using:

python3 /home/kevinbright/Desktop/yolo/yolo2.py

Any ideas?

Hey Kevin,

One of two things immediately springs to mind:

  1. If you were following the instructions from the guide, you would have created a virtual environment that you were running from Thonny. If your script isn’t launched inside that same virtual environment, its packages won’t load and the script fails. If you can find the venv you used in Thonny, you can activate it in the crontab line before running your script:
@reboot source <path_to_venv>/bin/activate && python <path_to_pythonscript>/<yourscript>.py

The && chains the commands together.

  2. The other possibility is that the script is launching before all components, such as the camera, have initialised, and so it fails. If that is the case, we want to add a sleep command to the crontab entry:
@reboot sleep 60 && python <path_to_pythonscript>/<yourscript>.py

Try one then the other and see if either resolves the issue.
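One more hedged guess: the “could not connect to display” line in your log suggests cron may be starting the script without a graphical session for the cv2.imshow window to draw on. Setting DISPLAY in the entry (here :0, which assumes the default desktop session; your setup may differ) sometimes resolves that, and it can be combined with both fixes:

```shell
# Wait for boot to settle, point at the desktop display, activate the venv, then run.
# DISPLAY=:0 is an assumption about your desktop session; paths are placeholders.
@reboot sleep 60 && export DISPLAY=:0 && source <path_to_venv>/bin/activate && python <path_to_pythonscript>/<yourscript>.py
```

The && chains mean each step only runs if the previous one succeeded, so a failure early in the chain stops the script from launching half-configured.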