Improved Aggregator for Drupal 7: What's Under the Hood
An Overview of Its New Features and a Request to Test Drive It

The patch for an improved aggregator for Drupal 7 is now available on Drupal.org (#236237). The code is the result of Aron Novak's Google Summer of Code project. It is also available as a Drupal project, with regular patches posted against Drupal HEAD. The patch has been out for a couple of weeks, so it's high time to talk about the improvements it aims to bring to Drupal core.

Before I dive into the details though, I'd like to point out that several people have asked us to break the patch into smaller pieces, as it is rather big and touches more than one area of aggregator's functionality. We have yet to work on this, but I do think there is value in presenting the proposed improvements as a whole. So here we go.

There are four major differences in comparison to the existing aggregator:

  • Extensible architecture - allows external modules to add or replace functionality
  • Per-content-type configuration - each feed content type carries its own aggregator settings
  • SimpleXML-based parser - replaces aggregator's existing XML parser
  • Taxonomy - replaces aggregator's category system

1. Extensible architecture

This change is the most far-reaching of all. At its core are the concepts of parsers and processors for aggregator. Parsers download and parse feeds, normalize the feed data and expose it to other parts of the application. Processors grab the feed data and act on it. For example, they create database records for feed items.

To define a parser or a processor, a module needs to implement one of two hooks:

  • hook_parse() for defining a parser
  • hook_process() for defining a processor
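To make the split a bit more tangible, here is a minimal, language-neutral sketch in Python rather than Drupal PHP. It is not the patch's code: the registries, the `refresh()` helper, the `FeedItem` and `Feed` classes and the toy "title | link" input format are all made up for illustration, and the real signatures of hook_parse() and hook_process() are not shown in this post.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FeedItem:
    title: str
    link: str


@dataclass
class Feed:
    url: str
    items: List[FeedItem] = field(default_factory=list)


# Hypothetical registries standing in for Drupal's hook discovery.
PARSERS: Dict[str, Callable[[Feed, str], None]] = {}
PROCESSORS: Dict[str, Callable[[Feed], None]] = {}

# Stands in for a database table of lightweight feed item records.
STORED: List[dict] = []


def example_parse(feed: Feed, raw: str) -> None:
    """Parser role: normalize raw data into feed items.

    The input format (one 'title | link' pair per line) is a toy
    assumption so the sketch stays short.
    """
    for line in raw.splitlines():
        if "|" not in line:
            continue  # skip lines the toy format cannot represent
        title, link = line.split("|", 1)
        feed.items.append(FeedItem(title.strip(), link.strip()))


def light_process(feed: Feed) -> None:
    """Processor role: act on normalized items, here by storing records."""
    for item in feed.items:
        STORED.append({"feed": feed.url, "title": item.title, "link": item.link})


PARSERS["example"] = example_parse
PROCESSORS["light"] = light_process


def refresh(feed: Feed, raw: str, parser: str, processors: List[str]) -> None:
    """Run one parser, then hand the normalized feed to each processor."""
    PARSERS[parser](feed, raw)
    for name in processors:
        PROCESSORS[name](feed)
```

Calling `refresh(Feed("http://example.com/rss"), raw, "example", ["light"])` would parse the raw text once and then let every listed processor act on the same normalized items. That is the point of the split: a parser or a processor can be swapped without touching the other.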

According to the parser/processor architecture, the new structure of aggregator is as follows:

  • aggregator module - implements API and standard download routines
  • syndication_parser module - standard RSS/Atom/RDF parser module that is intended to ship with core. It can be used independently of aggregator.
  • aggregator.light.inc (part of aggregator module) - this implements a processor that stores feed items as lightweight database records just as the current aggregator does

The implementation of parsers and processors can be seen in syndication_parser module and in aggregator.light.inc. There is also a feed-items-as-nodes implementation in the project version of aggregator for Drupal 7. To my knowledge, the parser/processor architecture was first introduced in Drupal by Ted Serbinski with SimpleFeed, and it also exists in FeedAPI.

Current discussion points around the extensible API are:

2. Per feed content type configuration

A pluggable configuration of aggregator makes it desirable to run more than one configuration on a single Drupal site. Imagine having a configuration for aggregating iCal feeds and a different one for news feeds.

To achieve this, patch #236237 stores feeds as nodes and piggybacks feed configuration on content type configuration. There is no admin/settings/aggregator anymore. Feed settings live at admin/build/content/[type]/aggregator instead.

In addition to supporting multiple configurations, this approach has the advantage of using the common content type permission mechanisms instead of aggregator-specific permissions.
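As a rough illustration of what "one configuration per content type" means, here is a small Python sketch. It is not how the patch stores its settings. The content type names, the `ical_parser` and `event_nodes` entries and the `config_for()` helper are hypothetical, and `aggregator_light` is just my shorthand for the processor in aggregator.light.inc.

```python
# Hypothetical per-content-type settings: each feed content type picks
# its own parser and processors.
FEED_TYPE_CONFIG = {
    "news_feed": {
        "parser": "syndication_parser",
        "processors": ["aggregator_light"],
    },
    "calendar_feed": {
        "parser": "ical_parser",        # hypothetical iCal parser
        "processors": ["event_nodes"],  # hypothetical processor
    },
}


def config_for(content_type: str) -> dict:
    """Return the aggregator settings attached to a content type."""
    try:
        return FEED_TYPE_CONFIG[content_type]
    except KeyError:
        raise ValueError(
            f"content type {content_type!r} has no aggregator configuration"
        ) from None
```

A feed node of type `calendar_feed` would then be refreshed with a different parser and processor than a `news_feed` node, on the same site.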

Current discussion points:

3. SimpleXML-based parser

The idea of using SimpleXML as a basis for an Atom/RSS/RDF parser has come up before. Walkah talked about this at his Drupal and SimpleXML presentation in Barcelona, for example, and Mistknight introduced the first SimpleXML parser with Aggregation module in Drupal contrib. FeedAPI maintains a version of Aggregation's parser and this is also the code patch #236237 takes as a basis for its Syndication Parser.
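For readers who haven't seen this style of parsing, here is a short sketch of the general idea: walk the XML tree and normalize what you find into a plain structure. It uses Python's xml.etree.ElementTree as a stand-in for PHP's SimpleXML, handles only the bare RSS 2.0 case and is not the Syndication Parser's code. The `parse_rss()` name and the shape of the returned dictionary are assumptions.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text: str) -> dict:
    """Parse a bare-bones RSS 2.0 document into a normalized dict.

    Only <title>, <link> and <description> are read. Atom, RDF and
    namespaced elements are deliberately out of scope for this sketch.
    """
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: missing <channel>")

    items = []
    for node in channel.findall("item"):
        items.append({
            # findtext() returns None when the element is absent
            "title": (node.findtext("title") or "").strip(),
            "link": (node.findtext("link") or "").strip(),
            "description": (node.findtext("description") or "").strip(),
        })

    return {
        "title": (channel.findtext("title") or "").strip(),
        "items": items,
    }
```

Real-world feeds are far messier than this, which is exactly why a large collection of test feeds matters for a new parser.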

In addition, a test framework for syndication and a ton of test feeds are in the works. We tapped into Universal Feedparser's huge repository of test feeds to stock up quickly. I'm excited about these test feeds because they will help us avoid regressions when the parser changes. On the other hand, this is also the part of the patch that has the biggest potential for breaking things. A completely new parser for aggregator will bring a world of small parsing glitches, and we don't have a quick way of pulling up what has been improved and fixed over the many years of the existing aggregator's life, since those fixes weren't backed up by tests.

4. Use taxonomy instead of categories

Because feeds are represented as nodes, taxonomy can now be used to categorize feed items, no matter whether these feed items are nodes or other entities. This change comes at a cost: feed items can no longer be categorized on an item-by-item basis. But it has the beauty of removing a stack that duplicates the functionality of taxonomy module.

Current discussion points:

Other changes

Aside from the changes outlined above, there are some smaller adjustments:

  • Special attention has been given to having solid CRUD functions. A feed can be created and parsed programmatically with this patch.
  • The API supports multithreaded parsing - hook_parse() passes an array of feed nodes, which opens the possibility of retrieving and parsing feeds in a parallel fashion.
  • A node parser implementation is available - while feed items as nodes isn't part of the patch in question, there is an implementation of it as part of the project version.
  • Downloading is limited to a percentage of cron execution time. This keeps the aggregator tame even if the number of feeds is high.
  • Feed blocks configuration will be more scalable.
  • Uses Drupal's input filters for feed items.
  • There's the possibility of creating a lazy instantiation processor. The ability to create a local entity only if a user acts on a feed item has been frequently requested, and this patch makes it possible to implement.
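The cron time limit mentioned in the list above can be pictured with a small Python sketch. This is only an illustration of the idea, not the patch's implementation: the function name, the `share` parameter and the injectable `clock` are assumptions.

```python
import time


def refresh_within_budget(feeds, refresh_one, cron_limit_seconds,
                          share=0.5, clock=time.monotonic):
    """Refresh feeds until a share of the cron time limit is used up.

    `share` is the fraction of the cron run that feed downloads may
    consume (hypothetical parameter). Feeds that are not reached are
    simply left for the next cron run.
    """
    budget = cron_limit_seconds * share
    started = clock()
    done = []
    for feed in feeds:
        # Check the budget before starting another download.
        if clock() - started >= budget:
            break
        refresh_one(feed)
        done.append(feed)
    return done
```

The budget is only checked between feeds, so a single slow download can still overshoot it. A stricter version would also need a timeout on each download.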

Missing functionality

How to Get Started

Get the latest Drupal HEAD, then either grab the latest patch here, or check out the project here. The patch will likely be broken up into smaller chunks, but this shows where we would like to go. Have fun poking around, and please give us feedback. It would be very much appreciated.

2 Comments
Gratitude

I'm just going to make mention of the fact that seeing aggregator go this direction is really gratifying and I certainly hope to see it in a nice shiny CVS download in the near future. This looks like great stuff all around, and should make management a dream comparatively.

Kudos, and keep working, this is sounding great (I hope to patch my 7.x install soon to include these features).

Eclipse

Thank you

Hi Eclipse, thanks for the encouraging words. We're looking for reviewers and feedback. If you would like to chip in, poke me on IRC - alex_b.

Alex