Lucene IK Analyzer with Synonyms
Published: 2019-06-24


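import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.synonym.SynonymFilterFactory;
import org.apache.lucene.analysis.util.ClasspathResourceLoader;

// Analyzer that segments text with IK first, then applies a synonym filter.
public class IKSynonymsAnalyzer5x extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // true enables IK's "smart" segmentation mode.
        IKTokenizer5x tokenizer5x = new IKTokenizer5x(true);

        // Factory arguments; the version string is kept as in the original post.
        Map<String, String> paramsMap = new HashMap<>();
        paramsMap.put("luceneMatchVersion", "LUCENE_11");
        paramsMap.put("synonyms", "luceneIndexCreate/synonyms.txt");
        paramsMap.put("expand", "true");
        SynonymFilterFactory factory = new SynonymFilterFactory(paramsMap);

        // Load the synonyms file from the classpath.
        ClasspathResourceLoader loader = new ClasspathResourceLoader();
        try {
            factory.inform(loader);
        } catch (IOException e) {
            // Fail fast: without the synonym map the filter cannot be built.
            throw new IllegalStateException("Cannot load synonyms file", e);
        }
        return new TokenStreamComponents(tokenizer5x, factory.create(tokenizer5x));
    }
}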
import java.io.IOException;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.apache.lucene.util.AttributeFactory;
import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

// Lucene 5.x Tokenizer that delegates segmentation to IKSegmenter.
public class IKTokenizer5x extends Tokenizer {

    private IKSegmenter _IKImplement;
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
    // End offset of the last lexeme, reported by end().
    private int endPosition;

    public IKTokenizer5x() {
        this._IKImplement = new IKSegmenter(this.input, true);
    }

    public IKTokenizer5x(boolean useSmart) {
        this._IKImplement = new IKSegmenter(this.input, useSmart);
    }

    public IKTokenizer5x(AttributeFactory factory) {
        super(factory);
        this._IKImplement = new IKSegmenter(this.input, true);
    }

    @Override
    public boolean incrementToken() throws IOException {
        clearAttributes();
        Lexeme nextLexeme = this._IKImplement.next();
        if (nextLexeme == null) {
            return false;
        }
        // Copy the lexeme's text, offsets and type into the token attributes.
        this.termAtt.append(nextLexeme.getLexemeText());
        this.termAtt.setLength(nextLexeme.getLength());
        this.offsetAtt.setOffset(nextLexeme.getBeginPosition(), nextLexeme.getEndPosition());
        this.endPosition = nextLexeme.getEndPosition();
        this.typeAtt.setType(nextLexeme.getLexemeTypeString());
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        // Point the segmenter at the reader for the new input.
        this._IKImplement.reset(this.input);
    }

    @Override
    public final void end() throws IOException {
        // Lucene requires overrides of end() to call super.end() first.
        super.end();
        int finalOffset = correctOffset(this.endPosition);
        this.offsetAtt.setOffset(finalOffset, finalOffset);
    }
}

Contents of luceneIndexCreate/synonyms.txt (loaded from the classpath):

女鞋,女靴
靴子,长靴,短靴
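To make the effect of expand=true on such a file more concrete, here is a minimal sketch in plain Java. It is not Lucene's implementation: the class name SynonymExpandSketch and its parse method are hypothetical, it assumes the file holds two groups (女鞋,女靴 and 靴子,长靴,短靴), and it only handles comma-separated groups, not explicit "a => b" mappings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: illustrates the idea of expand=true, not Lucene's code.
public class SynonymExpandSketch {

    // Maps every term to the full list of terms in its comma-separated group.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> expanded = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and comment lines.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            List<String> group = new ArrayList<>();
            for (String term : trimmed.split(",")) {
                String t = term.trim();
                if (!t.isEmpty()) {
                    group.add(t);
                }
            }
            // With expansion, each term in a group stands for the whole group.
            for (String term : group) {
                expanded.put(term, group);
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Assumed file contents: two synonym groups, one per line.
        Map<String, List<String>> expanded =
                parse(Arrays.asList("女鞋,女靴", "靴子,长靴,短靴"));
        expanded.forEach((term, synonyms) ->
                System.out.println(term + " -> " + synonyms));
    }
}
```

Under that assumption, a token such as 长靴 is treated as equivalent to 靴子, 长靴 and 短靴, so a document indexed with any one of them can match a query for another.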

Reposted from: https://my.oschina.net/zhuqianli/blog/1589433
