Sentiment classification using qminer. Given a set of tweets and their sentiment labels (+1 for positive, -1 for negative), we train a machine learning model that predicts the sentiment of new texts. The model is a Support Vector Machine, and the text is processed with a tf-idf weighted bag-of-words feature extractor.
Import libraries: the main library qminer and a helper package for the example dataset.
var qm = require('qminer');
var loader = require('qminer-data-loader');
'loaded'
Define the storage schema. We define one store called 'tweets' where each record has two fields: text and target (+1 is for positive sentiment and -1 for negative sentiment).
var base = new qm.Base({
    mode: 'createClean',
    schema: [{
        name: 'tweets',
        fields: [
            { name: 'text', type: 'string' },
            { name: 'target', type: 'float' }
        ]
    }]
});
'defined schema'
Import data and select all records.
loader.loadSentimentDataset(base.store('tweets'));
var tweets = base.store('tweets').allRecords;
'got ' + tweets.length + ' tweets';
Let's print the first tweet in the training set and its sentiment.
tweets[0].text;
tweets[0].target > 0 ? 'positive' : 'negative';
Build the feature space (a mapping from records to linear algebra vectors). Here we use a simple tf-idf weighted bag-of-words feature extractor.
var featureSpace = new qm.FeatureSpace(base, { type: 'text', source: 'tweets', field: 'text' });
featureSpace.updateRecords(tweets);
'built feature space with ' + featureSpace.dim + ' dimensions'
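To make the idea of a tf-idf weighted bag of words more concrete, here is a minimal sketch in plain JavaScript. It is not qminer's implementation: the helper names (tokenize, buildVocabulary, tfidfVector), the tokenizer, the idf formula log(N / df) and the unit-length normalization are all assumptions chosen for illustration, and qminer's text extractor may differ in each of them.

```javascript
// Illustrative only: a tiny tf-idf bag-of-words in plain JavaScript.
// Hypothetical helpers; NOT how qminer's FeatureSpace is implemented.

// Split text into lowercase word tokens (a simplified tokenizer).
function tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Assign one dimension per distinct token and count document frequencies.
function buildVocabulary(docs) {
    var vocab = new Map();   // token -> dimension index
    var docFreq = [];        // dimension index -> number of docs containing the token
    docs.forEach(function (doc) {
        var seen = new Set();
        tokenize(doc).forEach(function (tok) {
            if (!vocab.has(tok)) {
                vocab.set(tok, docFreq.length);
                docFreq.push(0);
            }
            if (!seen.has(tok)) {
                seen.add(tok);
                docFreq[vocab.get(tok)] += 1;
            }
        });
    });
    return { vocab: vocab, docFreq: docFreq, numDocs: docs.length };
}

// Map a text to a sparse vector (dimension index -> weight), normalized to unit length.
function tfidfVector(text, space) {
    var counts = new Map();
    tokenize(text).forEach(function (tok) {
        if (space.vocab.has(tok)) {          // tokens outside the vocabulary are ignored
            var idx = space.vocab.get(tok);
            counts.set(idx, (counts.get(idx) || 0) + 1);
        }
    });
    var vec = new Map();
    var sumSquares = 0;
    counts.forEach(function (tf, idx) {
        var weight = tf * Math.log(space.numDocs / space.docFreq[idx]);
        vec.set(idx, weight);
        sumSquares += weight * weight;
    });
    var norm = Math.sqrt(sumSquares);
    if (norm > 0) {
        vec.forEach(function (weight, idx) { vec.set(idx, weight / norm); });
    }
    return vec;
}

var docs = ['cats are great', 'cats are stupid', 'dogs are great'];
var space = buildVocabulary(docs);
var vec = tfidfVector('cats are stupid', space);
```

In this toy corpus the word "are" occurs in every document, so its weight is zero, while the rarer word "stupid" receives a larger weight than "cats". This is the effect tf-idf weighting is meant to have: frequent, uninformative words contribute little to the vector.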
Build a sentiment classifier model. We use a Support Vector Classifier with the default cost parameter (C = 1) and pass maxTime: 5 to bound the training time.
var SVC = new qm.analytics.SVC({ maxTime: 5 });
SVC.fit(featureSpace.extractSparseMatrix(tweets), tweets.getVector('target'));
'trained model'
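For intuition about what fitting a linear support vector classifier involves, the following sketch trains one on a toy dataset with a Pegasos-style sub-gradient method on the hinge loss. This is an assumption-laden illustration, not the solver qminer uses: the function names (dot, trainLinearSVM, predictSentiment), the dense arrays, the absence of a bias term and the regularization parameter lambda (which plays a role roughly inverse to C) are all hypothetical choices.

```javascript
// Illustrative only: a linear SVM trained by sub-gradient descent on the hinge loss.
// Hypothetical sketch; NOT qminer's SVC solver. No bias term, dense vectors.

function dot(a, b) {
    var sum = 0;
    for (var i = 0; i < a.length; i++) { sum += a[i] * b[i]; }
    return sum;
}

// Approximately minimizes (lambda / 2) * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i))).
function trainLinearSVM(X, y, lambda, epochs) {
    var dim = X[0].length;
    var w = new Array(dim).fill(0);
    var step = 0;
    for (var e = 0; e < epochs; e++) {
        for (var i = 0; i < X.length; i++) {
            step += 1;
            var eta = 1 / (lambda * step);       // decreasing learning rate
            var margin = y[i] * dot(w, X[i]);    // computed before w is updated
            for (var j = 0; j < dim; j++) {
                w[j] *= (1 - eta * lambda);      // shrink weights (regularization)
                if (margin < 1) {                // example violates the margin
                    w[j] += eta * y[i] * X[i][j];
                }
            }
        }
    }
    return w;
}

// The sign of the decision value w . x gives the predicted class.
function predictSentiment(w, x) {
    return dot(w, x) > 0 ? 'positive' : 'negative';
}

// Toy data: the first feature signals the positive class, the second the negative class.
var toyX = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]];
var toyY = [1, 1, -1, -1];
var weights = trainLinearSVM(toyX, toyY, 0.1, 50);
predictSentiment(weights, [1, 0]);
```

The prediction step mirrors the test used in this notebook: a decision value greater than zero is read as positive sentiment, anything else as negative.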
Predict the sentiment of a new example. What is the sentiment of the sentence "Cats are stupid."?
var y1 = SVC.predict(featureSpace.extractSparseVector({ text: "Cats are stupid." }));
'predicted sentiment: ' + (y1 > 0 ? 'positive' : 'negative');
What about the sentence "Cats are totally amazing!"?
var y2 = SVC.predict(featureSpace.extractSparseVector({ text: "Cats are totally amazing!" }));
'predicted sentiment: ' + (y2 > 0 ? 'positive' : 'negative');
