Modular systems for annotation of news articles

This study was part of the CompLACS project, whose goal was to develop a platform for testing algorithms and concepts, and for inspiring new ones, within the domain of composing learning systems. One of the four scenarios identified within this goal was “Annotation of News Articles”. This scenario centres on building modules for the ‘Articles’ blackboard (a collection of news articles) that annotate items with descriptors, such as tags and fields, which other modules can then use.

Caption: This figure gives a general example of how modules work in the system. Module ‘a’ obtains articles from the Articles blackboard and extracts features such as a tf-idf vector of words, while module ‘b’ takes this feature as input, trains a classifier or a preference ranker, and adds a topic or appeal tag back to the article in the Articles blackboard. Both modules can be adaptive.
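The data flow in the figure can be sketched in a few lines. This is a minimal illustration only: the `Blackboard` class, `tfidf_module`, `tagging_module` and the hand-written scorer are hypothetical names, not the platform's actual API, and the tf-idf weighting shown (term frequency times log inverse document frequency) is one common variant chosen for the example.

```python
import math
from collections import Counter


class Blackboard:
    """Hypothetical in-memory stand-in for the Articles blackboard.

    Items are annotation dicts keyed by article id; modules read existing
    annotations and write new ones back.
    """

    def __init__(self):
        self.items = {}

    def add_article(self, aid, text):
        self.items[aid] = {"text": text}

    def annotate(self, aid, key, value):
        self.items[aid][key] = value


def tfidf_module(board):
    """Module 'a': write a tf-idf vector (word -> weight) onto every article."""
    docs = {aid: item["text"].lower().split() for aid, item in board.items.items()}
    n_docs = len(docs)
    # Document frequency: how many articles contain each word.
    df = Counter(w for words in docs.values() for w in set(words))
    for aid, words in docs.items():
        tf = Counter(words)
        vec = {w: (c / len(words)) * math.log(n_docs / df[w]) for w, c in tf.items()}
        board.annotate(aid, "tfidf", vec)


def tagging_module(board, scorer, tag):
    """Module 'b': read the tf-idf feature, score it, and write a boolean tag back."""
    for aid, item in board.items.items():
        if "tfidf" in item:
            board.annotate(aid, tag, scorer(item["tfidf"]) > 0)


board = Blackboard()
board.add_article(1, "election vote minister")
board.add_article(2, "match goal vote")
tfidf_module(board)

# Hypothetical scorer standing in for a trained classifier.
politics_words = {"election", "minister"}
tagging_module(board, lambda v: sum(x for w, x in v.items() if w in politics_words), "politics")
print(board.items[1]["politics"], board.items[2]["politics"])  # True False
```

Because each module only reads and writes named annotations, module ‘b’ can be retrained or replaced without touching module ‘a’, which is the property the figure is meant to convey.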

In our platform we have developed several online classifiers to learn news topics such as Sports and Politics. Modules were also developed to learn readers’ preferences [1], that is, to classify whether an article is popular or non-popular.
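As an illustration of what "online" means here, the sketch below uses a plain perceptron over sparse feature dicts: it sees one labelled article at a time and updates its weights only when it makes a mistake. The perceptron is a stand-in chosen for brevity; the text does not say which online classifier the platform uses, and the class name and toy stream are hypothetical.

```python
class OnlinePerceptron:
    """Online binary classifier over sparse feature dicts (a plain perceptron)."""

    def __init__(self, lr=1.0):
        self.w = {}     # feature -> weight
        self.b = 0.0    # bias
        self.lr = lr

    def score(self, x):
        return self.b + sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1

    def update(self, x, y):
        """Process one labelled example, y in {-1, +1}; weights change only on a mistake."""
        if y * self.score(x) <= 0:
            for f, v in x.items():
                self.w[f] = self.w.get(f, 0.0) + self.lr * y * v
            self.b += self.lr * y


# Hypothetical stream: +1 = Sports, -1 = not Sports.
stream = [({"goal": 1.0, "match": 1.0}, 1), ({"vote": 1.0, "minister": 1.0}, -1)]
sports = OnlinePerceptron()
for _ in range(2):
    for x, y in stream:
        sports.update(x, y)
print(sports.predict({"goal": 1.0}), sports.predict({"minister": 1.0}))  # 1 -1
```

One such classifier per topic can run as an independent module, each updating as new labelled articles arrive on the stream.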

Annotating articles may involve several related classification tasks. The question is whether the relations between these tasks can be exploited to improve performance. For example, we are interested in learning what makes an article popular, and we could combine the different models to answer this. In other words, how can we take advantage of multiple parallel and related learning tasks to propose a better solution to the problem?

The objective of this task is to solve multiple supervised learning problems simultaneously by taking advantage of the relations between the learning tasks. The figure below explains how this module works.

Caption: Module ‘a’ gets articles from the Articles blackboard and extracts features. Many modules of type ‘b*’ (shown as double circles) take these features as input; each trains and tests an online classifier or preference ranker and outputs second-layer features, which are added to each article in the Articles blackboard. A third module ‘c’ takes the second-layer features, trains and tests an online preference ranker, and produces an appeal score that is added to each article in the Articles blackboard.
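Module ‘c’ can be pictured as a linear preference ranker trained online from pairs of articles, where one article of each pair is known to be more popular. The sketch below assumes a perceptron-style update on the difference of the two second-layer feature vectors; this is an illustrative choice, not necessarily the ranker used in the study, and the class name and toy pairs are hypothetical.

```python
class OnlinePairwiseRanker:
    """Linear preference ranker trained online from (preferred, other) pairs."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def score(self, x):
        """Appeal score of one article's second-layer feature vector."""
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x_pref, x_other):
        """Perceptron step on the difference vector when the pair is mis-ordered."""
        diff = [a - b for a, b in zip(x_pref, x_other)]
        if self.score(diff) <= 0:
            self.w = [wi + self.lr * d for wi, d in zip(self.w, diff)]


# Hypothetical second-layer features: [politics score, sports score] from 'b*' modules.
pairs = [([0.9, 0.1], [0.2, 0.8]), ([0.7, 0.3], [0.4, 0.6])]
module_c = OnlinePairwiseRanker(n_features=2)
for x_pref, x_other in pairs:
    module_c.update(x_pref, x_other)
print(module_c.score([0.8, 0.2]) > module_c.score([0.1, 0.9]))  # True
```

Learning from pairs rather than absolute labels only requires knowing which of two articles was more popular, and the learnt weights can then score any single article.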

We have trained a popularity ranker on years of data for several news outlets, such as BBC and NPR; it takes as input the range of features shown in the figure below. Feature extraction is done by a dedicated online classifier or ranker per feature, so as to enable adaptive feature mapping. This illustrates the modularity of the system: the features are obtained from a set of topic classifiers and rankers that are themselves independently running modules.

The input features for a document are: readability; length of the document; topic scores for education, entertainment, technology, politics and sports, assigned by the online topic classifier module; and sentiment scores, such as anger, joy, sadness and fear, assigned by the sentiment module. We also add the appeal scores of other outlets in order to predict appeal for a given outlet. For example, to predict BBC appeal we use the appeal scores for NPR and the Seattle Times assigned by the online rankers. For this purpose we used the scores from the online ranker based on the clasher algorithm [2]. For this two-layer representation we used the online lasso algorithm, with regularisation parameter 0.1.
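The text does not give the exact update rule of the online lasso, so the following is a minimal sketch under one standard assumption: proximal stochastic gradient descent on the square loss, where each gradient step is followed by soft-thresholding with threshold step size times λ. The regularisation parameter λ = 0.1 comes from the text; the step size, class name and toy data are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v towards zero by t, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


class OnlineLasso:
    """Online lasso: proximal stochastic gradient on 0.5*(w.x - y)^2 + lam*||w||_1."""

    def __init__(self, n_features, lam=0.1, eta=0.05):
        self.w = [0.0] * n_features
        self.lam = lam    # regularisation parameter (0.1 in the text)
        self.eta = eta    # step size (hypothetical)

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        err = self.predict(x) - y
        # Gradient step on the square loss, then soft-threshold for the L1 penalty.
        self.w = [soft_threshold(wi - self.eta * err * xi, self.eta * self.lam)
                  for wi, xi in zip(self.w, x)]


# Toy stream: the target depends only on feature 0; feature 1 is irrelevant.
stream = [([1.0, 1.0], 1.0), ([1.0, -1.0], 1.0), ([-1.0, 1.0], -1.0), ([-1.0, -1.0], -1.0)]
model = OnlineLasso(n_features=2, lam=0.1)
for _ in range(200):
    for x, y in stream:
        model.update(x, y)
print([round(wi, 3) for wi in model.w])  # close to [0.9, 0.0]
```

The L1 penalty shrinks the useful weight slightly (here from 1 towards 1 − λ) and drives the irrelevant one to zero, which is what makes the learnt weights usable for ranking the input features.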

[Figure panels: online lasso for Seattle Times, NPR and BBC]

Caption: Train and test error for the two-layer representation using the online lasso algorithm for the BBC, Seattle Times and NPR news outlets (trained over a three-year period, 2011 to 2014).

The figures above show the train and test error for the two-layer representation using the online lasso, for BBC and the other outlets. We achieve an error rate of 40% for BBC, 38% for Seattle Times and below 45% for NPR. Although the signal is weak, these results are still significantly better than random. The table below lists the features for BBC ranked by their weights. The topics with the highest weights are Politics, Education and Entertainment, which suggests that BBC readers favour these topics the most.
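The two quantities discussed above can be computed as follows. This sketch assumes the error rate is the fraction of mis-ordered test pairs, as one would measure for a pairwise preference ranker (the text does not state this explicitly); under that assumption a random ordering would give about 50%. The function names, scorer, pairs and weights are made up for illustration and are not the study's data.

```python
def pairwise_error_rate(score, pairs):
    """Fraction of (preferred, other) test pairs that the scorer fails to order correctly."""
    wrong = sum(1 for preferred, other in pairs if score(preferred) <= score(other))
    return wrong / len(pairs)


def rank_features(names, weights):
    """Features sorted by learnt weight, highest first (ties broken by name)."""
    return sorted(zip(names, weights), key=lambda nw: (-nw[1], nw[0]))


def first_feature(x):
    """Made-up scorer: uses only the first feature."""
    return x[0]


# Made-up pairs and weights, for illustration only.
pairs = [([2.0], [1.0]), ([1.0], [3.0]), ([5.0], [4.0]), ([0.0], [0.0])]
print(pairwise_error_rate(first_feature, pairs))            # 0.5
print(rank_features(["f1", "f2", "f3"], [0.1, 0.5, -0.2]))  # [('f2', 0.5), ('f1', 0.1), ('f3', -0.2)]
```

Ties are counted as errors here, which is the conservative choice for a ranker that cannot separate two articles.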

Caption: Features ranked according to the weights from Online Lasso for BBC.

This was a study of combining adaptive modules into a single learning system, in which an intermediate representation of the data (a hidden layer) was learnt by supervised online learning from web streams. This representation was then used as input to train another system, realising a combined architecture. Although the signal was not very strong, it was important to observe how performance was affected by the noise in the system.

References

[1] Elena Hensinger, Ilias Flaounas, and Nello Cristianini (2013) Modelling and Predicting News Popularity. Pattern Analysis and Applications, 16(4): pp. 623–635.

[2] Ricardo Ñanculef, Ilias Flaounas, and Nello Cristianini (2014) Efficient Classification of Multi-labelled Text Streams by Clashing. Expert Systems with Applications, 41(11).
