Google News covers news articles appearing within the past 30 days on various news websites, in different countries and languages.
In total, Google News aggregates content from more than 25,000 publishers. It offers search, with the option of sorting the results by date and time of publication or grouping them.
The actual list of sources is not known outside Google. According to Google, it monitors more than 4,500 English-language news sites.
Last week, in a project to categorize a list of news websites, we wrote a Python script based on one from the great book Programming Collective Intelligence.
We started with an example taken from the book: generatefeedvector.py, as printed in chapter 3.
Although the explanation given in the book was of great help and we ended up with our own working script, the sample code gave us a hard time at the start. Due to at least one typo in the code, we just could not get the code to run.
It seemed as though the original code could not process the list of URLs stored in a text file.
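As a rough illustration of that step, here is a minimal sketch of our own (not the book's code and not the script mentioned in this post) that reads a list of URLs from a text file and counts the words in a piece of HTML. The function names `load_feed_urls` and `count_words` and the file name `feedlist.txt` are assumptions made for this example, and it uses only the standard library, so it does not download or parse any feeds.

```python
import re
from collections import Counter
from pathlib import Path


def load_feed_urls(path):
    """Return the URLs listed in a text file, one per line.

    Assumption for this sketch: blank lines and lines starting
    with '#' are not URLs and are skipped.
    """
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()  # drop surrounding whitespace and line endings
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def count_words(html):
    """Strip HTML tags and count the lowercase alphabetic words that remain."""
    text = re.sub(r"<[^>]+>", " ", html)        # replace each tag with a space
    words = re.findall(r"[a-z]+", text.lower())  # keep runs of letters only
    return Counter(words)
```

A call such as `load_feed_urls("feedlist.txt")` returns a plain list of strings, which can then be fetched one by one with whichever feed library you prefer. Stripping each line before use avoids passing trailing newlines or empty strings on as URLs.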
Fortunately, we were not the first to run into this problem. With some help from the computer science department at Old Dominion University, we started with a working script as a base for our own classifier.
A working version of generatefeedvector.py can be found at: http://www.cs.odu.edu/~hany/teaching/cs495-f12/lectures/lecture_4/code/generatefeedvector.python
Just in case, the code for a working generatefeedvector.py is given below as well. Don’t forget that you need some extra Python modules installed and a text file with URLs to get this working.