text security audit: semantic-model filtering for security review, a sensitive content detection system

jiangnanboy/text_security_audit

This project collects a large volume of sensitive data covering politics, pornography, insults, violence, and contraband, and uses it to fine-tune RoBERTa models. Each model classifies and scores the input text, and those scores drive detection and review.

The threshold parameters of the audit models can be customized to fit the data and requirements of a specific business scenario. A well-chosen policy configuration can effectively improve the recall and precision of content review.
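As a minimal sketch of what threshold customization could look like, the example below assumes each model returns a (label, score) pair as in the usage section, and that the score is the confidence of the predicted label (the README does not define it). `ThresholdPolicy`, the category names, and the threshold values are hypothetical and not part of this project.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: flags a text only when a positive prediction clears a per-category threshold. */
final class ThresholdPolicy {
    private final Map<String, Double> thresholds = new HashMap<>();
    private final double defaultThreshold;

    ThresholdPolicy(double defaultThreshold) {
        this.defaultThreshold = defaultThreshold;
    }

    /** Sets a category-specific threshold in [0, 1]; returns this for chaining. */
    ThresholdPolicy set(String category, double threshold) {
        if (threshold < 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in [0, 1]");
        }
        thresholds.put(category, threshold);
        return this;
    }

    /** A text is flagged when the model predicts label 1 with score >= the category threshold. */
    boolean isFlagged(String category, int label, double score) {
        double threshold = thresholds.getOrDefault(category, defaultThreshold);
        return label == 1 && score >= threshold;
    }

    public static void main(String[] args) {
        // Threshold values are made up for illustration.
        ThresholdPolicy policy = new ThresholdPolicy(0.5)
                .set("politic", 0.8)    // stricter: favors precision
                .set("violence", 0.6);  // looser: favors recall

        // Scores below are the sample outputs listed in the usage section.
        System.out.println(policy.isFlagged("politic", 1, 0.77812237)); // false
        System.out.println(policy.isFlagged("violence", 1, 0.7366322)); // true
        System.out.println(policy.isFlagged("porn", 0, 0.65102273));    // false
    }
}
```

Raising a category's threshold generally trades recall for precision, so the values would need tuning against labeled data from the target scenario.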


The RoBERTa models are converted to ONNX format, and inference is run in Java.
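A classification model exported to ONNX typically emits raw logits that must be post-processed in Java. The sketch below shows one common way to turn logits into a (label, score) pair with softmax and argmax. It assumes a two-class logit output; the README does not say whether the exported models return logits or probabilities, and `LogitsToPrediction` is a hypothetical name, not this project's code.

```java
/** Hypothetical post-processing: turns raw class logits into a (label, score) pair. */
final class LogitsToPrediction {

    /** Numerically stable softmax: subtracts the max logit before exponentiating. */
    static double[] softmax(float[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (float v : logits) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        double[] probs = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp(logits[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    /** Index of the largest probability; the first index wins on ties. */
    static int argmax(double[] probs) {
        int best = 0;
        for (int i = 1; i < probs.length; i++) {
            if (probs[i] > probs[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[] logits = {-0.4f, 0.9f}; // made-up values, not real model output
        double[] probs = softmax(logits);
        int label = argmax(probs);
        System.out.printf("(%d,%.4f)%n", label, probs[label]);
    }
}
```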

Model download:

Link: https://pan.baidu.com/s/1bksb12LOUV3dhJd0Wk4ZAw Extraction code: 2dkf

Place the downloaded model under resources.
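Before loading the models, it can help to confirm that the files are where the code expects them. The check below is a small sketch: the file names are taken from the usage example, while the `ModelFileCheck` class and the assumption that `vocab.txt` sits in the same directory as the models are illustrative only.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check that the expected files are present in the resources directory. */
final class ModelFileCheck {
    // File names as they appear in the usage example.
    static final List<String> EXPECTED = List.of(
            "vocab.txt",
            "roberta_wwm_politic_model.onnx",
            "roberta_wwm_violence_model.onnx",
            "roberta_wwm_porn_model.onnx",
            "roberta_wwm_insult_model.onnx");

    /** Returns the expected file names that are not regular files under resourcesDir. */
    static List<String> missingFiles(Path resourcesDir) {
        List<String> missing = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!Files.isRegularFile(resourcesDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : "resources");
        List<String> missing = missingFiles(dir);
        if (missing.isEmpty()) {
            System.out.println("All expected files found in " + dir);
        } else {
            System.out.println("Missing from " + dir + ": " + missing);
        }
    }
}
```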

usage

【text_classification/TextClassify】

        var textTokenizer = new TextTokenizer("resources\\vocab.txt");
        var politicModel = new PoliticModel("resources\\roberta_wwm_politic_model.onnx");
        var violenceModel = new ViolenceModel("resources\\roberta_wwm_violence_model.onnx");
        var pornModel = new PornModel("resources\\roberta_wwm_porn_model.onnx");
        var insultModel = new InsultModel("resources\\roberta_wwm_insult_model.onnx");


        var text = "黑人很多都好吃懒做,偷奸耍滑!";

        var onnxTensorMap = textTokenizer.parseInputText(text, politicModel.env);

        // politic detection
        var pairResult = politicModel.pred(onnxTensorMap);
        System.out.println(pairResult);

        // violence detection
        pairResult = violenceModel.pred(onnxTensorMap);
        System.out.println(pairResult);

        // porn detection
        pairResult = pornModel.pred(onnxTensorMap);
        System.out.println(pairResult);

        // insult detection
        pairResult = insultModel.pred(onnxTensorMap);
        System.out.println(pairResult);

        /**
         * label=0 -> No; label=1 -> Yes
         *
         * politic:
         * (1,0.77812237)
         *
         * violence:
         * (1,0.7366322)
         *
         * porn:
         * (0,0.65102273)
         *
         * insult:
         * (0,0.6051175)
         */
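The usage example tokenizes the text once and reuses the resulting tensors for all four models. For readers unfamiliar with BERT-style inputs, the sketch below illustrates the general shape of such an encoding: `[CLS]` + tokens + `[SEP]`, padded to a fixed length, with an attention mask. It is a simplified character-level illustration, not the project's `TextTokenizer`: the class name, the tiny in-memory vocabulary, its ids, and the fixed length are all assumptions, and real WordPiece handling is omitted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified character-level encoder for BERT-style inputs. */
final class CharLevelEncoder {
    private final Map<String, Integer> vocab;
    private final int maxLen;

    CharLevelEncoder(Map<String, Integer> vocab, int maxLen) {
        if (maxLen < 2) {
            throw new IllegalArgumentException("maxLen must leave room for [CLS] and [SEP]");
        }
        this.vocab = vocab;
        this.maxLen = maxLen;
    }

    /** Looks up a token id, falling back to [UNK] for out-of-vocabulary tokens. */
    private long id(String token) {
        Integer found = vocab.get(token);
        return found != null ? found : vocab.get("[UNK]");
    }

    /** Returns {input_ids, attention_mask, token_type_ids}, each of length maxLen. */
    long[][] encode(String text) {
        long[] inputIds = new long[maxLen];
        long[] attentionMask = new long[maxLen];
        long[] tokenTypeIds = new long[maxLen]; // stays all zeros for a single sentence
        Arrays.fill(inputIds, id("[PAD]"));

        int pos = 0;
        inputIds[pos++] = id("[CLS]");
        int[] codePoints = text.codePoints().toArray();
        // Truncate so that one slot always remains for [SEP].
        for (int i = 0; i < codePoints.length && pos < maxLen - 1; i++) {
            inputIds[pos++] = id(new String(Character.toChars(codePoints[i])));
        }
        inputIds[pos++] = id("[SEP]");
        Arrays.fill(attentionMask, 0, pos, 1L); // 1 for real tokens, 0 for padding
        return new long[][] {inputIds, attentionMask, tokenTypeIds};
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = new HashMap<>();
        vocab.put("[PAD]", 0);   // ids here are illustrative only
        vocab.put("[UNK]", 100);
        vocab.put("[CLS]", 101);
        vocab.put("[SEP]", 102);
        vocab.put("你", 1);
        vocab.put("好", 2);

        long[][] encoded = new CharLevelEncoder(vocab, 8).encode("你好吗");
        System.out.println(Arrays.toString(encoded[0])); // [101, 1, 2, 100, 102, 0, 0, 0]
        System.out.println(Arrays.toString(encoded[1])); // [1, 1, 1, 1, 1, 0, 0, 0]
    }
}
```

In an actual ONNX Runtime pipeline these arrays would be wrapped in tensors bound to the model's input names, which is presumably why `parseInputText` takes the model's environment as an argument.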

requirement

Java 11+

onnxruntime 1.11.0
