The reporter is the part of Newman which retrieves stories from various websites. The process is fairly straightforward:
- Retrieve the web page containing a list of the latest news stories.
- Read the list and extract the links to the individual stories.
- Retrieve each story one by one.
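The three steps map onto a short loop. The Python below is a minimal sketch, not Newman's actual code. `report`, `fetch` and the per-site `link_pattern` are hypothetical names. It assumes that pages are UTF-8 and that each site's pattern has exactly one capturing group for a story link's `href`. The downloader is passed in as a parameter, so the loop can be tried without a network connection.

```python
import re
from typing import Callable, List
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it as text (UTF-8 assumed, bad bytes replaced)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def report(index_url: str, link_pattern: str,
           get: Callable[[str], str] = fetch) -> List[str]:
    """Run the three steps for one site and return the raw HTML of each story."""
    # Step 1: retrieve the page listing the latest stories.
    index_html = get(index_url)

    # Step 2: pull the story links out of the list. link_pattern is a
    # hypothetical per-site regex whose single group captures the href.
    links: List[str] = []
    for href in re.findall(link_pattern, index_html):
        absolute = urljoin(index_url, href)  # resolve relative links
        if absolute not in links:            # skip duplicates, keep order
            links.append(absolute)

    # Step 3: retrieve each story one by one.
    return [get(link) for link in links]
```

For example, with a dictionary standing in for the web:

```python
pages = {
    "http://example.com/news/": '<a class="story" href="/news/1">A</a>'
                                '<a class="story" href="/news/2">B</a>',
    "http://example.com/news/1": "<p>one</p>",
    "http://example.com/news/2": "<p>two</p>",
}
report("http://example.com/news/", r'<a class="story" href="([^"]+)"', pages.__getitem__)
# returns ['<p>one</p>', '<p>two</p>']
```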
This description is generic enough to fit every site I considered reporting from, notably Football Italia, Tribalfootball, Eurosport, and Goal: each has a list of stories, with the individual stories on separate pages. Still, making it work posed a few challenges:
- Every site uses different HTML, so the information we need has to be extracted from the HTML source with regular expressions.
- The result of every story retrieval should be plain text only, with no HTML tags or other code.
- If the connection fails or times out, Newman should ignore the error and carry on rather than crash.
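One way to handle these three challenges is sketched below in Python. This is an illustration under stated assumptions, not Newman's implementation. `STORY_BODY_PATTERNS`, `html_to_text` and `safe_fetch` are hypothetical names, and the `story-body` markup is a placeholder: each real site's HTML would have to be inspected to write its pattern. Stripping tags with regular expressions is a pragmatic shortcut that works for typical article markup. It is not a full HTML parser.

```python
import html
import http.client
import re
import sys
from typing import Callable, Optional
from urllib.request import urlopen

# Challenge 1: one regular expression per site. The pattern is a
# hypothetical placeholder; real sites each need their own.
STORY_BODY_PATTERNS = {
    "example": re.compile(r'<div class="story-body">(.*?)</div>', re.S),
}


def html_to_text(fragment: str) -> str:
    """Challenge 2: reduce an HTML fragment to plain text."""
    # Drop script and style blocks entirely, including their contents.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", fragment)
    # Turn line breaks and paragraph ends into newlines before removing tags.
    text = re.sub(r"(?i)<br\s*/?>|</p\s*>", "\n", text)
    # Remove every remaining tag.
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities such as &amp; and &#8217;.
    text = html.unescape(text)
    # Collapse runs of spaces and tabs, trim each line, drop empty lines.
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def safe_fetch(url: str, timeout: float = 10.0,
               opener: Callable = urlopen) -> Optional[str]:
    """Challenge 3: return the page text, or None if the request fails."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    # URLError, HTTPError and socket timeouts are all OSError subclasses;
    # ValueError covers malformed URLs, HTTPException broken responses.
    except (OSError, ValueError, http.client.HTTPException) as err:
        # Report and carry on instead of letting one bad site stop the run.
        print(f"skipping {url}: {err}", file=sys.stderr)
        return None
```

A failed or timed-out request yields `None`, which the caller can simply skip. The `opener` parameter exists only so the error path can be exercised without a network connection.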

Sunday, August 20th, 2006
