Engine 3 returns generated-looking text for images with no visible text

Hi OCR.space team,

We are testing OCR.space PRO with OCREngine=3 for automated document processing.

We found a reproducible case where Engine 3 returns text even though the submitted image contains no visible text. The returned text reads like generated instructions, not content extracted from the image:

Watermarks should be wrapped in brackets. Ex: OFFICIAL COPY. Page numbers should be wrapped in brackets. Ex: <page_number>14</page_number> or <page_number>9/22</page_number>. Prefer using ☐ and ✅ for check boxes.

The API reports this as a successful OCR result:

{
  "OCRExitCode": 1,
  "IsErroredOnProcessing": false,
  "FileParseExitCode": 1,
  "ErrorMessage": "",
  "ParsedText": "Watermarks should be wrapped in brackets..."
}

Request fields used:

endpoint: https://apipro1.ocr.space/parse/image
language=auto
OCREngine=3
scale=true
isTable=true
detectOrientation=true

Changing these did not stop the output:

scale=false
isTable=false
detectOrientation=false
language=eng

Images that reproduce the issue:

original.png
https://i.imgur.com/dSQGqpm.png

bottom-half.png
https://i.imgur.com/4kPrtBJ.png

bottom-30.png
https://i.imgur.com/LOSYvUV.png

bottom-half-gray.jpg
https://i.imgur.com/MLxf6xS.jpeg

bottom-half-blur.png
https://i.imgur.com/WjVlLiW.png

bottom-half-flop.png
https://i.imgur.com/uc6AgOG.png

bottom-half-resized-50.png
https://i.imgur.com/cb2ZAoS.png

original-stripped.png
https://i.imgur.com/BqVqsSj.png

Some test images did not return text and timed out instead:

left-half.png
https://i.imgur.com/HCIIVpn.png

right-half.png
https://i.imgur.com/xv01koI.png

top-half.png
https://i.imgur.com/k9MnT4q.png

center-dog.png
https://i.imgur.com/pTdL01N.png

blank-white.png
https://i.imgur.com/OVJhDB4.png

Engine 2 returned empty text for the original image, which is what we expected.

For our use case, the problem is that Engine 3 returns this with a normal success response, so our pipeline cannot tell whether the text is actually present in the submitted image.
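To make the failure mode concrete, here is a minimal sketch of a client-side guard that flags the specific instruction-like output quoted above. It is a stopgap heuristic, not an OCR.space feature: the marker phrases are copied from the one leaked output we observed, the list is almost certainly incomplete, and the names LEAK_MARKERS, normalize, and looks_like_instruction_leak are hypothetical.

```python
import re

# Phrases copied from the single leaked output quoted in this post.
# Assumption: other leaks may use different wording, so this list is not exhaustive.
LEAK_MARKERS = (
    "watermarks should be wrapped in brackets",
    "page numbers should be wrapped in brackets",
    "prefer using ☐ and ✅ for check boxes",
)


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase so line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def looks_like_instruction_leak(parsed_text: str, min_hits: int = 1) -> bool:
    """Return True when the OCR text contains at least min_hits known marker phrases."""
    normalized = normalize(parsed_text)
    hits = sum(marker in normalized for marker in LEAK_MARKERS)
    return hits >= min_hits


if __name__ == "__main__":
    sample = "Watermarks should be wrapped in brackets. Ex: OFFICIAL COPY."
    print(looks_like_instruction_leak(sample))
```

A phrase blocklist like this only catches output we have already seen, which is why we are asking for a supported signal in the response itself.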

What configuration, preprocessing, or response fields do you recommend using so that automated document processing can avoid accepting this kind of false-positive OCR output as real extracted text?

Thanks!

Issue confirmed. Please see our answer on LLM OCR hallucinations. We are working on it. It is important to note that Engines 1 and 2 do not have this issue.

Thank you for the reply!

Yes, we’ve confirmed that Engines 1 and 2 do not show this issue. Unfortunately, for our document set, their OCR output quality is significantly worse than Engine 3's.
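One interim idea, sketched below under loud assumptions: use Engine 2 only as a "is there any text at all?" check and keep Engine 3 for the actual transcription. This assumes Engine 2 reliably returns empty text for text-free images, which we have only confirmed on the original image in this thread, and it doubles the number of API calls. The names Verdict and cross_check and the min_chars threshold are hypothetical; the function takes the two ParsedText strings and makes no API calls itself.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    accept: bool
    reason: str


def cross_check(engine2_text: str, engine3_text: str, min_chars: int = 3) -> Verdict:
    """Accept Engine 3 text only when Engine 2 also found some text in the same image.

    min_chars is an arbitrary illustrative threshold for "Engine 2 found something".
    """
    e2 = engine2_text.strip()
    e3 = engine3_text.strip()
    if not e3:
        # Nothing to validate: an empty Engine 3 result is taken at face value.
        return Verdict(True, "engine 3 found no text")
    if len(e2) < min_chars:
        # Engine 3 produced text where Engine 2 saw none: treat as suspect.
        return Verdict(False, "engine 3 returned text but engine 2 found none")
    return Verdict(True, "both engines found text")


if __name__ == "__main__":
    print(cross_check("", "Watermarks should be wrapped in brackets..."))
```

Rejected results would need to go to manual review rather than be discarded, since Engine 2 may also miss faint or unusual text that Engine 3 reads correctly.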

When this is resolved or there’s a recommended mitigation/configuration for Engine 3, please reply here so we can follow up.

Thanks a lot!