
mlapi image resizing curiosity

Posted: Fri Jul 31, 2020 10:29 pm
by ibrewster
Sorry if this is the wrong place, but I can't find any other support info for mlapi/zmeventserver questions.

At any rate, in the latest version of mlapi (and, presumably, zmeventserver), I noticed that in the config it notes that the image will be internally resized to 416x416. This made me wonder about a couple of things:

1) Since that is square, but my input images are not, is the input image distorted on this resize? If so, wouldn't that affect recognition accuracy? Or is it simply resized to be no more than 416x416, while keeping the aspect ratio?

2) Since the image is going to be resized internally, would it be better to enable the resize_image option in the objectconfig.ini? Or is said option no longer relevant since it is resizing internally?

3) Why so small? Does image recognition work better on small images?

Re: mlapi image resizing curiosity

Posted: Sat Aug 01, 2020 3:27 pm
by asker
The reason for 416x416 is that the default YOLO weights are trained on 416x416 images. I believe there are also weights trained at 608x608 available somewhere.

Note that training on much larger images takes a huge amount of time and, as I understand it, gives diminishing returns at some point. A lot of weights are actually trained on even smaller images, at 256x256. Don't think of these as purely visual (i.e. the bigger the better).

YOLO maintains the aspect ratio: the default weights use 416x416, so if the resized image doesn't fill that square, the remainder is padded.
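To make the "resize, then pad" idea concrete, here is a minimal sketch of the arithmetic only. It is not taken from mlapi or darknet; the function name `letterbox_geometry` is hypothetical, and centering the padding is an assumption (a real implementation may pad differently).

```python
def letterbox_geometry(src_w, src_h, target=416):
    """Fit a src_w x src_h image into a target x target square.

    Returns (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom).
    Hypothetical helper for illustration; not an mlapi/darknet function.
    """
    # One scale factor for both axes, so the aspect ratio is preserved.
    scale = min(target / src_w, target / src_h)
    new_w = min(target, max(1, round(src_w * scale)))
    new_h = min(target, max(1, round(src_h * scale)))

    # Whatever the scaled image does not cover gets padded.
    # Assumption: padding is split evenly between the two sides.
    pad_x = target - new_w
    pad_y = target - new_h
    pad_left = pad_x // 2
    pad_top = pad_y // 2
    return new_w, new_h, pad_left, pad_top, pad_x - pad_left, pad_y - pad_top


if __name__ == "__main__":
    # A 1920x1080 frame scales to 416x234, leaving 91 rows of padding
    # above and below.
    print(letterbox_geometry(1920, 1080))
```

Under these assumptions a 16:9 frame is not distorted; it just occupies a 416x234 band in the middle of the square input.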

There's a lot of black magic (to me) in these models and how they learn/get accuracy. I understand some of it, but most of it I don't, and this is an area of continuous research. If you want to dive into the details, you're going to have to read the research papers and understand the math. I don't dive in that deep. There are also many online discussions on Stack Overflow and in the darknet/TensorFlow repos that you can follow.

This brings me to: I really don't think the resize_image option in objectconfig.ini is useful, since darknet resizes the image to 416x416 anyway, no matter what we resize it to beforehand.