Limit the detection area in Google Vision text recognition

android java android-camera google-vision

2022-09-03 16:18:58

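I have been searching for a solution all day. I have checked several threads related to my problem.

But they did not help me much. Basically, I want the camera preview to be full screen, but text should only be recognized in the center of the screen, where a rectangle is drawn.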

Technologies I am using:

Google Mobile Vision API for optical character recognition (OCR)
Dependency: play-services-vision

My current state: I created a BoxDetector class:

public class BoxDetector extends Detector {
    private Detector mDelegate;
    private int mBoxWidth, mBoxHeight;

    public BoxDetector(Detector delegate, int boxWidth, int boxHeight) {
        mDelegate = delegate;
        mBoxWidth = boxWidth;
        mBoxHeight = boxHeight;
    }

    public SparseArray detect(Frame frame) {
        int width = frame.getMetadata().getWidth();
        int height = frame.getMetadata().getHeight();
        int right = (width / 2) + (mBoxHeight / 2);
        int left = (width / 2) - (mBoxHeight / 2);
        int bottom = (height / 2) + (mBoxWidth / 2);
        int top = (height / 2) - (mBoxWidth / 2);

        YuvImage yuvImage = new YuvImage(frame.getGrayscaleImageData().array(), ImageFormat.NV21, width, height, null);
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        yuvImage.compressToJpeg(new Rect(left, top, right, bottom), 100, byteArrayOutputStream);
        byte[] jpegArray = byteArrayOutputStream.toByteArray();
        Bitmap bitmap = BitmapFactory.decodeByteArray(jpegArray, 0, jpegArray.length);

        Frame croppedFrame =
                new Frame.Builder()
                        .setBitmap(bitmap)
                        .setRotation(frame.getMetadata().getRotation())
                        .build();

        return mDelegate.detect(croppedFrame);
    }

    public boolean isOperational() {
        return mDelegate.isOperational();
    }

    public boolean setFocus(int id) {
        return mDelegate.setFocus(id);
    }

    @Override
    public void receiveFrame(Frame frame) {
        mDelegate.receiveFrame(frame);
    }
}

And I created an instance of this class here:

final TextRecognizer textRecognizer = new TextRecognizer.Builder(App.getContext()).build();

// Instantiate the created box detector in order to limit the Text Detector scan area
BoxDetector boxDetector = new BoxDetector(textRecognizer, width, height);

//Set the TextRecognizer's Processor but using the box collider

boxDetector.setProcessor(new Detector.Processor<TextBlock>() {
    @Override
    public void release() {
    }

    /*
        Detect all the text from camera using TextBlock
        and the values into a stringBuilder which will then be set to the textView.
    */
    @Override
    public void receiveDetections(Detector.Detections<TextBlock> detections) {
        final SparseArray<TextBlock> items = detections.getDetectedItems();
        if (items.size() != 0) {

            mTextView.post(new Runnable() {
                @Override
                public void run() {
                    StringBuilder stringBuilder = new StringBuilder();
                    for (int i = 0; i < items.size(); i++) {
                        TextBlock item = items.valueAt(i);
                        stringBuilder.append(item.getValue());
                        stringBuilder.append("\n");
                    }
                    mTextView.setText(stringBuilder.toString());
                }
            });
        }
    }
});


    mCameraSource = new CameraSource.Builder(App.getContext(), boxDetector)
            .setFacing(CameraSource.CAMERA_FACING_BACK)
            .setRequestedPreviewSize(height, width)
            .setAutoFocusEnabled(true)
            .setRequestedFps(15.0f)
            .build();

When this runs, the following exception is thrown:

Exception thrown from receiver.
java.lang.IllegalStateException: Detector processor must first be set with setProcessor in order to receive detection results.
    at com.google.android.gms.vision.Detector.receiveFrame(com.google.android.gms:play-services-vision-common@@19.0.0:17)
    at com.spectures.shopendings.Helpers.BoxDetector.receiveFrame(BoxDetector.java:62)
    at com.google.android.gms.vision.CameraSource$zzb.run(com.google.android.gms:play-services-vision-common@@19.0.0:47)
    at java.lang.Thread.run(Thread.java:919)

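If anyone has a clue what my mistake is, or knows of any alternative, I would really appreciate it. Thanks!

This is what I want to achieve, a rectangular text-area scanner:

Answer 1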

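Google Vision detection takes a Frame as its input. A Frame is image data, with the width and height attached as metadata. You can process this frame (crop it to a smaller, centered frame) before passing it to the detector. This step must be fast, because it runs in line with the camera's image processing. Take a look at the GitHub source below and search for FrameProcessingRunnable. You can see where the frame is fed in, and you can do the cropping yourself there.

CameraSource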
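
The cropping step described above could look roughly like the following. This is a minimal sketch rather than the CameraSource code itself: it assumes the preview frame arrives as an NV21 byte array, and the names `CenterCrop` and `cropCenterNv21` are hypothetical. It copies only the luminance (Y) plane of a centered window and fills the chroma plane with neutral gray, on the assumption that color is not needed for text recognition.

```java
import java.util.Arrays;

/** Hypothetical helper: crops a centered window out of an NV21 preview frame. */
public final class CenterCrop {
    private CenterCrop() {}

    /**
     * Returns a new NV21 buffer of size boxW x boxH taken from the center of the
     * input frame. Only the Y (luminance) plane is copied; the interleaved VU
     * plane is filled with 128, i.e. neutral gray.
     */
    public static byte[] cropCenterNv21(byte[] nv21, int width, int height,
                                        int boxW, int boxH) {
        if (boxW <= 0 || boxH <= 0 || boxW > width || boxH > height) {
            throw new IllegalArgumentException("box must fit inside the frame");
        }
        if ((boxW & 1) != 0 || (boxH & 1) != 0) {
            // NV21 subsamples chroma 2x2, so keep the output dimensions even.
            throw new IllegalArgumentException("box dimensions must be even");
        }
        int left = (width - boxW) / 2;
        int top = (height - boxH) / 2;

        byte[] out = new byte[boxW * boxH * 3 / 2];
        // Copy the Y plane one row at a time.
        for (int row = 0; row < boxH; row++) {
            System.arraycopy(nv21, (top + row) * width + left, out, row * boxW, boxW);
        }
        // Neutral chroma for the remaining boxW * boxH / 2 bytes.
        Arrays.fill(out, boxW * boxH, out.length, (byte) 128);
        return out;
    }
}
```

The cropped buffer could then presumably be wrapped in a `ByteBuffer` and handed to `Frame.Builder#setImageData` with `ImageFormat.NV21` and the cropped width and height. Bounding boxes reported by the detector would then be relative to the cropped window, not the full preview.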

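Answer 2

You can try to pre-process the CameraSource feed as @'Thành Hà Văn' mentioned (I tried that first myself, but dropped it after trying to adapt it to the old and new camera APIs), but I found it easier to limit the search area and keep using the detections returned by the default Vision detection and CameraSource. You can do this in several ways. For example:

(1) limit the screen area by setting bounds based on the screen/preview size
(2) create a custom class that can be used to set the detection area dynamically

I chose option 2 (I can post my custom class if needed), and then, when handling the detections, I filter them so that only those inside the specified area are used: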

    for (j in 0 until detections.size()) {
        val textBlock = detections.valueAt(j) as TextBlock
        for (line in textBlock.components) {
            if ((line.boundingBox.top.toFloat() * hScale) >= scanView.top.toFloat() && (line.boundingBox.bottom.toFloat() * hScale) <= scanView.bottom.toFloat()) {
                canvas.drawRect(line.boundingBox, linePainter)

                if (scanning)
                    if (((line.boundingBox.top.toFloat() * hScale) <= yTouch && (line.boundingBox.bottom.toFloat() * hScale) >= yTouch) &&
                        ((line.boundingBox.left.toFloat() * wScale) <= xTouch && (line.boundingBox.right.toFloat() * wScale) >= xTouch)) {
                        acceptDetection(line, scanCount)
                    }
            }
        }
    }

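The scanning part is just some custom code I use to let the user select which detections to keep. You can replace everything inside the if (line....) block with your own code so that it acts only on the cropped detection area. Note that this sample code only crops vertically, but you could crop horizontally as well, or in both directions.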
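
For readers working in Java like the question, the same filtering idea extended to both directions might be sketched as below. It is deliberately free of Android types so it can be checked in isolation: the class `RegionFilter` and the method `isInside` are hypothetical names, and `wScale` / `hScale` are assumed, as in the snippet above, to convert detector coordinates into view coordinates.

```java
/** Hypothetical helper: tests whether a scaled bounding box lies inside a scan region. */
public final class RegionFilter {
    private RegionFilter() {}

    /**
     * Returns true when the bounding box (in detector coordinates), after being
     * scaled to view coordinates, lies completely inside the scan region
     * (already in view coordinates).
     */
    public static boolean isInside(int left, int top, int right, int bottom,
                                   float wScale, float hScale,
                                   float regionLeft, float regionTop,
                                   float regionRight, float regionBottom) {
        boolean insideVertically =
                top * hScale >= regionTop && bottom * hScale <= regionBottom;
        boolean insideHorizontally =
                left * wScale >= regionLeft && right * wScale <= regionRight;
        return insideVertically && insideHorizontally;
    }
}
```

In the detection callback, each line's `boundingBox` fields and the scan view's edges would be passed in, and only lines for which the method returns true would be drawn or kept.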