Random forest – Training and test sets incompatible in Weka text classification

I have two datasets for deciding whether a sentence mentions a drug adverse event. Both the training and test sets have only two fields: the text and the label {Adverse Event, No Adverse Event}. I used Weka with the StringToWordVector filter to build a Random Forest model on the training set.

I want to test the model by removing the class labels from the test set, applying the StringToWordVector filter to it, and running the model on the result. When I try this, Weka reports that the training and test sets are not compatible, probably because the filter identifies a different set of attributes for the test set. How do I fix this and output the predictions for the test set?
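To illustrate the suspected cause, here is a minimal sketch in plain Python (not Weka code) of the usual remedy: learn the word-to-attribute mapping from the training texts only, then reuse that same mapping for the test texts so that both sets end up with identical attributes. The helper names `build_vocab` and `vectorize` and the sample sentences are hypothetical, made up for this illustration. In Weka itself, the same effect is commonly obtained by wrapping StringToWordVector and the classifier in a FilteredClassifier, so the filter is fitted on the training data only.

```python
from typing import Dict, List


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map each distinct training token to a fixed column index."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Count tokens using the training vocabulary; unseen words are ignored."""
    vec = [0] * len(vocab)
    for tok in text.lower().split():
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec


# Hypothetical sample data, for illustration only.
train_texts = ["drug caused severe rash", "no side effects reported"]
test_texts = ["patient reported rash after drug"]

vocab = build_vocab(train_texts)  # fitted on the training set only
train_vectors = [vectorize(t, vocab) for t in train_texts]
test_vectors = [vectorize(t, vocab) for t in test_texts]  # same attributes as training

print(len(train_vectors[0]), len(test_vectors[0]))
```

Because the test vectors are built from the training vocabulary, words that appear only in the test set ("patient", "after") are dropped instead of creating new attributes, which is what keeps the two sets compatible.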


classification – Stemming words in email addresses without putting them into arrays

I was trying to re-implement the spam classifier from Andrew Ng's Stanford ML course, and I used `PorterStemmer()`, but after stemming each email is a list of tokens:

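from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

for i in range(len(just_emails)):
    words = word_tokenize(just_emails[i])
    # each email string is replaced by a list of stemmed tokens
    just_emails[i] = [ps.stem(w) for w in words]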

What I need is to stem the words in each email while keeping the email as a single string, as in the original.

Actual output: [go, until, jurong, point, crazi, avail, onli, …]

Desired output: go until jurong point crazi avail onli
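One way to get the desired form is to join the stemmed tokens back into a string with `" ".join(...)`. The sketch below is a minimal illustration under stated assumptions: `stem_text` and `toy_stem` are hypothetical names, and `toy_stem` is only a stand-in so the example runs without NLTK. With NLTK installed, `PorterStemmer().stem` and `word_tokenize` can be passed in their place.

```python
from typing import Callable, Iterable, List


def stem_text(
    text: str,
    stem: Callable[[str], str],
    tokenize: Callable[[str], Iterable[str]] = str.split,
) -> str:
    """Stem every token and join the results into one space-separated string."""
    return " ".join(stem(tok) for tok in tokenize(text))


def toy_stem(word: str) -> str:
    # Hypothetical stand-in stemmer: lowercase, then strip a trailing "s"
    # from words longer than three characters. Not the Porter algorithm.
    w = word.lower()
    return w[:-1] if w.endswith("s") and len(w) > 3 else w


# Hypothetical sample data, for illustration only.
just_emails: List[str] = ["Cats chase dogs", "Go until point"]
stemmed = [stem_text(e, toy_stem) for e in just_emails]
print(stemmed)

# With NLTK (assumed installed, with its tokenizer data downloaded):
#   from nltk.stem import PorterStemmer
#   from nltk.tokenize import word_tokenize
#   ps = PorterStemmer()
#   just_emails = [stem_text(e, ps.stem, word_tokenize) for e in just_emails]
```

Each element of the result is a plain string again, so the list keeps one entry per email instead of one list of tokens per email.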
