Effective Tokenization

node v6.17.1
version: 2.0.0
Effective tokenization means handling sentences that contain more than words and punctuation: e-mail addresses, mentions, emojis, emoticons, hashtags, URLs and more. Here is an example:
// Load wink tokenizer.
var winkTokenize = require( 'wink-tokenizer' );
// Instantiate and obtain tokenize api.
var tokenize = winkTokenize().tokenize;
// Notice how, apart from tokenization, the feature of every token is identified & tagged.
var s = '@FeminismInIndia👧 conducted a workshop at #AmbedkarUniversity on "online safety" ' +
        'recently:-)! reach us at info@feminisminindia.com. #DigitalSaftey http://bit.ly/2F2m9rL';
console.log( tokenize( s ) );
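Once every token carries a tag, a common next step is to pull out only the features of interest. The sketch below groups token values by tag. It assumes each token is an object of the form { value, tag }; groupByTag and sampleTokens are hypothetical names used for illustration, and the sample array is hand-written in that assumed shape, not actual output from the tokenizer.

```javascript
// Group token values by their tag.
// Assumption: each token looks like { value: '...', tag: '...' }.
function groupByTag( tokens ) {
  var groups = Object.create( null );
  tokens.forEach( function ( token ) {
    if ( !groups[ token.tag ] ) groups[ token.tag ] = [];
    groups[ token.tag ].push( token.value );
  } );
  return groups;
}

// Hand-written sample in the assumed shape; NOT real tokenizer output.
var sampleTokens = [
  { value: '@FeminismInIndia', tag: 'mention' },
  { value: 'conducted', tag: 'word' },
  { value: '#AmbedkarUniversity', tag: 'hashtag' },
  { value: ':-)', tag: 'emoticon' },
  { value: 'info@feminisminindia.com', tag: 'email' },
  { value: '#DigitalSaftey', tag: 'hashtag' },
  { value: 'http://bit.ly/2F2m9rL', tag: 'url' }
];

var grouped = groupByTag( sampleTokens );
console.log( grouped.hashtag );
```

In the notebook itself, passing tokenize( s ) in place of sampleTokens should work the same way, provided the returned tokens have the assumed shape.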
