Threshold-Based Classification to Enhance Confidence in Open Set of Legal Texts.

Daniela L. Freire, Alex M. G. de Almeida, Márcio de Souza Dias, Adriano Rivolli, Fabíola S. F. Pereira, Giliard Almeida de Godoi, André C. P. L. F. de Carvalho

Intelligent Data Engineering and Automated Learning – IDEAL 2023: 24th International Conference, Évora, Portugal, November 22–24, 2023, Proceedings (2023)

Abstract
Machine Learning has revolutionized the categorization of vast legal documents, minimizing costs and improving evaluations. However, conventional models struggle with unseen data categories in real-world scenarios, a challenge termed Open Set Classification. Our study tackles the issue faced by the Court of Justice in São Paulo, Brazil, to identify recurring lawsuit themes from texts, as manual sorting is inefficient. We introduce a method to enhance confidence in text classification using an open dataset by converting multiclass challenges into binary ones with four confidence tiers. By testing various techniques, we found that combining doc2vec with the Support Vector Machine classifier delivers trustworthy results and robust performance. Ultimately, our method offers an effective solution for classifying legal texts confronting Open Set Classification issues in the legal sector.
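The abstract does not give the tier names, the thresholds, or how the binary scores are combined, so the following is only a minimal sketch of the general idea: each theme gets its own binary classifier, the positive-class score is mapped to one of four confidence tiers, and a document whose best score falls in a low tier is treated as belonging to no known theme. The tier labels, the cut-off values, the `accept_from` parameter, and the function names are all hypothetical, chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical tier names and cut-offs; the abstract only states that
# there are four confidence tiers, not what they are called or where
# the thresholds lie.
TIERS = ("very_low", "low", "high", "very_high")
CUTOFFS = (0.25, 0.50, 0.75)  # assumed thresholds on a score in [0, 1]


def confidence_tier(score, cutoffs=CUTOFFS, tiers=TIERS):
    """Map one binary classifier's positive-class score to a tier."""
    # bisect_right counts how many cut-offs are <= score, which is
    # exactly the index of the tier the score falls into.
    return tiers[bisect_right(cutoffs, score)]


def classify_open_set(scores, accept_from="high"):
    """Pick a theme from per-theme binary scores, or reject the document.

    scores: dict mapping theme name -> score of that theme's binary
            classifier (for example, a calibrated SVM output).
    Returns (theme, tier); theme is None when the best score does not
    reach the `accept_from` tier, i.e. the document is treated as
    belonging to an unseen category.
    """
    theme, score = max(scores.items(), key=lambda item: item[1])
    tier = confidence_tier(score)
    accepted = TIERS.index(tier) >= TIERS.index(accept_from)
    return (theme if accepted else None, tier)


if __name__ == "__main__":
    print(classify_open_set({"theme_a": 0.82, "theme_b": 0.10}))
    print(classify_open_set({"theme_a": 0.30, "theme_b": 0.10}))
```

In a full pipeline the scores would come from a classifier trained on document embeddings (the abstract reports doc2vec with a Support Vector Machine); here they are passed in directly so the thresholding step can be read on its own.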
Keywords
texts, classification, threshold-based