Limit detection area in Google Vision, text recognition

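I have been searching for a solution all day. I have checked several threads about my problem.

But none of them helped much. Basically, I want the camera preview to be full screen, but text should only be recognized in the center of the screen, where a rectangle is drawn.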

Technologies I am using:

  • Google Mobile Vision API for optical character recognition (OCR)
  • Dependency: play-services-vision

My current state: I created a BoxDetector class:

public class BoxDetector extends Detector<TextBlock> {
    private Detector<TextBlock> mDelegate;
    private int mBoxWidth, mBoxHeight;

    public BoxDetector(Detector<TextBlock> delegate, int boxWidth, int boxHeight) {
        mDelegate = delegate;
        mBoxWidth = boxWidth;
        mBoxHeight = boxHeight;
    }

    public SparseArray<TextBlock> detect(Frame frame) {
        int width = frame.getMetadata().getWidth();
        int height = frame.getMetadata().getHeight();
        int right = (width / 2) + (mBoxHeight / 2);
        int left = (width / 2) - (mBoxHeight / 2);
        int bottom = (height / 2) + (mBoxWidth / 2);
        int top = (height / 2) - (mBoxWidth / 2);

        YuvImage yuvImage = new YuvImage(frame.getGrayscaleImageData().array(), ImageFormat.NV21, width, height, null);
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        yuvImage.compressToJpeg(new Rect(left, top, right, bottom), 100, byteArrayOutputStream);
        byte[] jpegArray = byteArrayOutputStream.toByteArray();
        Bitmap bitmap = BitmapFactory.decodeByteArray(jpegArray, 0, jpegArray.length);

        Frame croppedFrame =
                new Frame.Builder()
                        .setBitmap(bitmap)
                        .setRotation(frame.getMetadata().getRotation())
                        .build();

        return mDelegate.detect(croppedFrame);
    }

    public boolean isOperational() {
        return mDelegate.isOperational();
    }

    public boolean setFocus(int id) {
        return mDelegate.setFocus(id);
    }

    @Override
    public void receiveFrame(Frame frame) {
        mDelegate.receiveFrame(frame);
    }
}

and created an instance of this class here:

final TextRecognizer textRecognizer = new TextRecognizer.Builder(App.getContext()).build();

// Instantiate the created box detector in order to limit the Text Detector scan area
BoxDetector boxDetector = new BoxDetector(textRecognizer, width, height);

// Set the processor on the box detector instead of on the TextRecognizer directly

boxDetector.setProcessor(new Detector.Processor<TextBlock>() {
    @Override
    public void release() {
    }

    /*
        Collect all text detected by the camera as TextBlocks and append
        the values to a StringBuilder, which is then set on the TextView.
    */
    @Override
    public void receiveDetections(Detector.Detections<TextBlock> detections) {
        final SparseArray<TextBlock> items = detections.getDetectedItems();
        if (items.size() != 0) {

            mTextView.post(new Runnable() {
                @Override
                public void run() {
                    StringBuilder stringBuilder = new StringBuilder();
                    for (int i = 0; i < items.size(); i++) {
                        TextBlock item = items.valueAt(i);
                        stringBuilder.append(item.getValue());
                        stringBuilder.append("\n");
                    }
                    mTextView.setText(stringBuilder.toString());
                }
            });
        }
    }
});


mCameraSource = new CameraSource.Builder(App.getContext(), boxDetector)
        .setFacing(CameraSource.CAMERA_FACING_BACK)
        .setRequestedPreviewSize(height, width)
        .setAutoFocusEnabled(true)
        .setRequestedFps(15.0f)
        .build();

When executed, this exception is thrown:

Exception thrown from receiver.
java.lang.IllegalStateException: Detector processor must first be set with setProcessor in order to receive detection results.
    at com.google.android.gms.vision.Detector.receiveFrame(com.google.android.gms:play-services-vision-common@@19.0.0:17)
    at com.spectures.shopendings.Helpers.BoxDetector.receiveFrame(BoxDetector.java:62)
    at com.google.android.gms.vision.CameraSource$zzb.run(com.google.android.gms:play-services-vision-common@@19.0.0:47)
    at java.lang.Thread.run(Thread.java:919)

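If anyone has a clue what my mistake is, or knows of any alternative, I would really appreciate it. Thank you!

This is what I want to achieve, a rectangular text-area scanner:

[Image: What I want to achieve]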


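Answer 1

Google Vision detection takes a frame as input. A frame is image data with its width and height attached as metadata. You can process this frame (crop it to a smaller, centered frame) before passing it to the detector. This processing must be fast, because it runs alongside the camera's own image processing. Take a look at the GitHub source below and search for FrameProcessingRunnable. You can see the frame input there, and you can do the processing yourself at that point.

CameraSource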
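
As a rough illustration of the cropping step described above, here is a minimal plain-Java sketch. It computes a centered box clamped to the frame size and copies the matching region out of the luminance (Y) plane of an NV21 buffer. The class CenterCrop and its methods are hypothetical names, not part of the Vision API. The sketch assumes the frame data is NV21 (so the first width × height bytes are the grayscale plane) and ignores frame rotation.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the Vision API.
final class CenterCrop {
    private CenterCrop() {}

    /** Returns {left, top, right, bottom} of a centered box, clamped to the frame size. */
    static int[] centeredBox(int frameWidth, int frameHeight, int boxWidth, int boxHeight) {
        int w = Math.max(0, Math.min(boxWidth, frameWidth));
        int h = Math.max(0, Math.min(boxHeight, frameHeight));
        int left = (frameWidth - w) / 2;
        int top = (frameHeight - h) / 2;
        return new int[] {left, top, left + w, top + h};
    }

    /**
     * Copies the box region out of the Y plane of an NV21 buffer.
     * Assumption: the first frameWidth * frameHeight bytes hold the luminance rows.
     */
    static byte[] cropLuma(byte[] nv21, int frameWidth, int[] box) {
        int w = box[2] - box[0];
        int h = box[3] - box[1];
        byte[] out = new byte[w * h];
        for (int row = 0; row < h; row++) {
            // Start of this row inside the full frame, shifted right by the box's left edge.
            int srcPos = (box[1] + row) * frameWidth + box[0];
            System.arraycopy(nv21, srcPos, out, row * w, w);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] box = centeredBox(640, 480, 200, 100);
        System.out.println(Arrays.toString(box));
    }
}
```

How the cropped data is handed back to the detector is left out here. Note that any bounding boxes detected on a cropped frame are relative to the crop, so the box's left and top offsets have to be added back to map them onto the full preview.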


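Answer 2

You could try pre-processing the frames in the CameraSource source, as @'Thành Hà Văn' mentioned (I tried that myself first, but abandoned it after trying to adapt it to both the old and new camera APIs). I found it easier to restrict the search area and work with the detections returned by the default Vision detector and CameraSource. You can do this in several ways. For example:

(1) Restrict the screen area by setting bounds based on the screen/preview size
(2) Create a custom class that can be used to set the detection area dynamically

I chose option 2 (I can post my custom class if needed), and then, when handling the detections, I filter them so that only detections within the specified area are processed: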

for (j in 0 until detections.size()) {
    val textBlock = detections.valueAt(j) as TextBlock
    for (line in textBlock.components) {
        // Keep only lines that lie vertically inside the scan view (scaled to view coordinates)
        if ((line.boundingBox.top.toFloat() * hScale) >= scanView.top.toFloat() &&
            (line.boundingBox.bottom.toFloat() * hScale) <= scanView.bottom.toFloat()) {
            canvas.drawRect(line.boundingBox, linePainter)

            // Custom selection logic: accept the line the user touched
            if (scanning) {
                if (((line.boundingBox.top.toFloat() * hScale) <= yTouch && (line.boundingBox.bottom.toFloat() * hScale) >= yTouch) &&
                    ((line.boundingBox.left.toFloat() * wScale) <= xTouch && (line.boundingBox.right.toFloat() * wScale) >= xTouch)) {
                    acceptDetection(line, scanCount)
                }
            }
        }
    }
}

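The scanning part is just some custom code I use to let the user choose which detections to keep. You can replace everything inside the if (line....) block with your own code so that it acts only on the cropped detection area. Note that this example code only crops vertically, but you could also crop horizontally, or in both directions.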
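
To illustrate cropping in both directions, here is a minimal plain-Java sketch of the same filter idea. It is not the custom class mentioned above. Box is a hypothetical stand-in for android.graphics.Rect so that the example runs without Android, and wScale and hScale are assumed to be the ratios of view size to preview size, which is how the snippet above appears to use them.

```java
// Hypothetical helper; Box stands in for android.graphics.Rect.
final class ScanAreaFilter {
    private ScanAreaFilter() {}

    static final class Box {
        final float left, top, right, bottom;

        Box(float left, float top, float right, float bottom) {
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }
    }

    /** Scales a detection box from preview coordinates into view coordinates. */
    static Box toViewCoords(Box detection, float wScale, float hScale) {
        return new Box(detection.left * wScale, detection.top * hScale,
                       detection.right * wScale, detection.bottom * hScale);
    }

    /** True only if the scaled detection lies entirely inside the scan area, on both axes. */
    static boolean isInside(Box detection, float wScale, float hScale, Box scanArea) {
        Box scaled = toViewCoords(detection, wScale, hScale);
        return scaled.left >= scanArea.left
                && scaled.top >= scanArea.top
                && scaled.right <= scanArea.right
                && scaled.bottom <= scanArea.bottom;
    }
}
```

In a processor, a check like isInside would take the place of the vertical-only condition, with each line's bounding box copied into a Box first. Lines that only partly overlap the scan area are rejected by this version; whether that is the right behavior depends on the use case.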

