News & Views

December turned out to be a very interesting month. After a lot of tweaks, I updated several parts of the ecosystem from the Generator, which takes the metabase reports and parses them for the cpanstats database, through to the Builder, which builds the files used by the Reports website, to the Uploads mechanism, which monitors the changes in CPAN and new uploads to PAUSE. It has all helped to streamline the process a little, which means the processing from a report submission to appearing on the CPAN Testers Reports website is getting quicker.

Back in November I reported that the Builder had caught up so that pages were getting built in less than 24 hours. This was short-lived, as the changes made to the ecosystem meant a total rebuild of all pages was required. In addition, since the launch of CT2.0 there had been a number of reports that hadn't been passed between the Metabase server and the CPAN Testers server. As a consequence, David Golden provided me with a list of the missing reports, and I set about parsing over 70,000 reports. During the first few weeks of December, this meant the build once again fell back to updating within 5 days. However, with all the improvements it has quickly caught up again and is now roughly 36 hours behind. There are still a few improvements left to include, so hopefully we'll start seeing most (if not all) reports appearing on the Reports site in less than 24 hours.

A noticeable improvement has been to reduce the default creation of RSS files. The next improvement is to reduce the default creation of YAML files. The RSS and YAML are essentially support files and can be dynamically created on request from the JSON file that is created with all the cpanstats report information regarding a distribution or author. If you have an application that uses the current YAML files, please look at whether it can switch to using the JSON files instead. This will reduce the burden on the server and allow us to process reports even quicker.
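As a rough illustration of how a support file can be derived from the JSON on request, here is a minimal Python sketch that turns a JSON array of report records into an RSS 2.0 document. The field names (`guid`, `status`, `distribution`, `version`, `platform`) and the function name are assumptions made for the example, not the actual layout of the CPAN Testers JSON files, and this is not the code the Reports site runs.

```python
import json
import xml.etree.ElementTree as ET


def reports_json_to_rss(json_text, title, link):
    """Build a minimal RSS 2.0 document from a JSON array of report records.

    The record fields used here are hypothetical placeholders.
    """
    reports = json.loads(json_text)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Reports for " + title

    # One <item> per report, built only from data already in the JSON.
    for report in reports:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = (
            "{status} {distribution}-{version} {platform}".format(**report)
        )
        ET.SubElement(item, "guid").text = str(report["guid"])

    return ET.tostring(rss, encoding="unicode")


# Made-up sample record, standing in for one entry of a distribution's JSON file.
sample = json.dumps([
    {
        "guid": "example-guid-1",
        "status": "PASS",
        "distribution": "Example-Dist",
        "version": "0.01",
        "platform": "i686-linux",
    }
])
feed = reports_json_to_rss(sample, "Example-Dist", "http://example.com/")
```

Because everything in the feed comes from the JSON, a consumer of the YAML or RSS files could perform the same kind of conversion on its own side.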

Just before Christmas we also passed the 10 million reports mark: that's 10 million reports submitted in a little over 10 years. With the launch of CT2.0, CPAN Testers is a testing service to be truly reckoned with. While other languages may have sought to emulate our success, I have yet to see any posts of others coming anywhere close to CPAN Testers.

Having said that, it seems we aren't that great at self-promotion either. While these summaries hopefully add to the search caches, it is surprising how many have either forgotten the efforts of CPAN Testers over the past year, or were unaware of them in the first place. Very grateful thanks, though, go to chromatic and Karen for including us in their mentions of Perl accomplishments in 2010. If you are planning a look at Perl accomplishments, please don't forget to mention CPAN Testers. Behind most of the Perl projects you are likely to mention, CPAN Testers have probably helped to iron out quirks and bugs on their way towards the success stories they have become.

The CT2.0 project was perhaps the biggest project overhaul I've ever been involved with, and the fact that we managed to get it done in 6 months, with a small team of developers, is phenomenal. CPAN Testers continues to evolve and I think we are well placed to keep growing for the next 10 years. Thank you to everyone who has been involved in CT2.0, all the testers (even if you've only submitted 1 report this year) and to everyone who has offered suggestions, feedback and thanks for all the efforts over the past year. Special thanks should also go to Robert, Ask and Léon who truly set CPAN Testers on the road to being the project it is today, and nurtured it for many years before David and I were able to take over.

If you watch the recent upload lists, you may have noticed a release on 1st Jan 2011. Originally this had been planned for some time in 2010, but other tasks took priority. After having several people ask about it, Labyrinth has finally been released, which provides the core engine behind several of the CPAN Testers websites. This now means I will be releasing the code to run the Reports, Blog, Wiki and Preferences websites in the coming months. This should hopefully make the whole ecosystem more accessible to anyone wishing to submit patches. In addition I hope it also provides the ability for other projects and businesses to develop their own in house testing, reporting and analysis systems.

Looking forward to 2011, there are lots of plans for CPAN Testers, from the testing clients, documentation and server side development through to encouraging more testers on more diverse platforms to get involved. It'll also be interesting to see whether anyone can compete with Chris Williams' multi-million report submission achievement. Whatever happens, I'm sure there will be plenty to write about.

Posted by Barbie
on 25th May 2010

Recently Leo Lapworth updated several of the sites he's been working on to list the current stats for CPAN, as was previously seen in the footer of each page. It's good to remind people (even subconsciously) just how big CPAN is. In his post Leo also wondered what the 20,000th distribution was. With the CPAN Testers database holding lots of stats about CPAN, as well as the CPAN Testers reports, it was fairly straightforward to extract some numbers. In fact it proved so straightforward I promised to include it in the CPAN Testers Statistics website.

I'm pleased to say I finally found some time to do just that, and have revamped the CPAN Statistics page to include the CPAN Milestones. Now you can keep up to date with what distributions are hitting some significant milestones.

One month to go and work is progressing well on the transformation to CPAN Testers 2.0. Over the last month many changes to the websites have been visible, but just as many changes have been happening behind the scenes. The Metabase is a key part of the transformation, and although work has been going well, it is reaching the point where it'll need some serious testing prior to switch over on 1st March 2010. If you have the time, please join the cpan-testers-discuss mailing list or contact David Golden to let him know what you can help with. See David's CPAN Testers 2.0 mid-January update blog post for a more detailed status update.

In order to reduce the load on the servers, after the announcement of the switch over, the CPAN Testers agreed to back off their smokers to ease the pressure on the mail servers. The cpan-testers mailing list is a very high volume list, and takes up a lot of resources to manage it. Many of the testers throttled back their smoke bots and we did see a dramatic reduction in test reports being submitted. We were aiming for around 5,000 a day maximum. Within a day or two we were successfully below the target.

However, not all went well. One smoker bot suddenly appeared to go AWOL, and the tester didn't seem to be responding to direct requests to throttle the smoker. Worse still, the bulk of reports being produced were bogus. While some PASS reports got through, most were failing due to what appears to be a bad combination of environment and old toolchain software. As this was now polluting the pool of reports at a considerable rate (for every good report submitted, 1 or more was submitted by the bad smoker), something needed to be done to reduce or halt the effects. Several authors were rightly concerned that this would make their distributions look bad on the CPAN Testers Reports site. Thankfully, a new site (more on that later) is in the works which will make this easier to manage, but in the interim a further measure was put in place. I now have the ability to blacklist runaway smokers, by invalidating their reports as they come in. This then means the reports are ignored by the Reports site, the Statistics site and the rest of the eco-system. I also manually marked all the smoker's reports during January as invalid.
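To make the idea concrete, here is a minimal Python sketch of invalidating reports on arrival. It is not the actual CPAN Testers code: the record fields (`tester`, `valid`), the function names and the addresses are hypothetical. The point is only that reports are flagged rather than deleted, so downstream consumers can skip them and the flag can be lifted later.

```python
def mark_reports(reports, blacklisted_testers):
    """Return copies of incoming reports, each flagged as valid or invalid.

    A report is marked invalid when its tester is on the blacklist.
    """
    marked = []
    for report in reports:
        flagged = dict(report)  # leave the caller's record untouched
        flagged["valid"] = report["tester"] not in blacklisted_testers
        marked.append(flagged)
    return marked


def visible_reports(marked):
    """Reports that downstream sites (reports, statistics) would consider."""
    return [report for report in marked if report["valid"]]


# Made-up data: one well-behaved tester and one runaway smoker.
incoming = [
    {"id": 1, "tester": "good@example.com", "status": "PASS"},
    {"id": 2, "tester": "runaway@example.com", "status": "FAIL"},
    {"id": 3, "tester": "runaway@example.com", "status": "FAIL"},
]
blacklist = {"runaway@example.com"}

marked = mark_reports(incoming, blacklist)
shown = visible_reports(marked)
```

Removing a tester from the blacklist set is then all that is needed for their future reports to count again.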

It transpires that the tester was on holiday and had started off his smokers before he went, without checking to ensure the reports they were sending were valid. Once this tester has upgraded to use the right tools, I'll remove him from the blacklist. However, it is good to know that we can now quickly stop any future runaway smokers before they can do much damage to the reports and statistics.

Normally one story would be the only excitement of a month, but there was more to come. On 17th January, the CPAN Testers server started to show the effects of being under attack. In the early hours of 18th January, the server locked up, and required manual intervention to reboot it. Once back online, an investigation through the logs revealed that the MSNBot, as used by Microsoft, had been hitting the server at a rate of knots. In fact, so much so, that the logs began rapidly filling up again after the reboot. After initially blocking the range of IPs, which grew as the day went on, I wrote an article and posted it to the CPAN Testers blogs to warn anyone who might be using the CPAN Testers server. Little did I realise that the story would spread like wildfire around the world on numerous IT related news networks and blogs! I did get an apology from someone representing the Bing team, but it should never have got to that. Reading many of the comments on various blogs, although a small minority took delight in having a kick at Perl, the majority of posts were in support of the ban, and many even had their own experiences. While I may have been the first to shout loudly, CPAN Testers definitely weren't the first to be knocked out by Microsoft.

Over a week later, with the ban still in place, and the robots.txt changed to ban all access to msnbot, every hour now the msnbot blasts the server for about 5-10 minutes at the rate of between 4-8 requests a second, mostly from the same 2 IP addresses. So even after banning the bot (it gets a 403) and having an apology from Microsoft, the bot still hasn't learnt to get itself under control. If Microsoft ever want people to take Bing seriously as a search engine, then they need to start acting responsibly, otherwise they are likely to find themselves banned from a good portion of the internet.
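For anyone unfamiliar with how a robots.txt ban is expressed, the sketch below uses Python's standard `urllib.robotparser` to check a rule set of the usual form. The robots.txt content shown is an assumption for illustration (the real file isn't reproduced in this post), and `ExampleBot` is a made-up user agent. A well-behaved crawler performs this kind of check before requesting anything.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: deny msnbot everything, say nothing about others.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks this question before fetching any URL.
msnbot_allowed = parser.can_fetch("msnbot", "/")
other_allowed = parser.can_fetch("ExampleBot", "/")

print("msnbot allowed:", msnbot_allowed)
print("ExampleBot allowed:", other_allowed)
```

robots.txt is only advisory, which is why the server-side 403 is still needed for a bot that ignores it.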

One thing I would like to make clear about the incident is that all the monitoring of the server is done completely voluntarily. Over the last month this has taken up a lot of spare time, which often wasn't there to begin with. However, the server itself is *NOT* a High-Availability setup, and is *ONE* server on its own. There is no redundancy (apart from the RAIDed disks), with the web server, database and processing tools all sharing the same physical hardware. If it takes 2 seconds to return a web page, it's likely that the server is under considerable load to process incoming reports, running backups or generating web pages, RSS feeds and JSON/YAML files so that the rest of the eco-system (including CPAN/CPANPLUS, etc) is able to keep up to date. Taking it out of action is not something that is taken lightly.

The original post was perhaps rather emotionally put together, and I apologise to anyone who may have got caught in any flak for that. However, I had just woken up and spent much of the morning trying to get the server back online while getting the kids ready for school and heading out to work! With it being a Monday morning too, hopefully it was understandable that a rant ensued. I'll be taking several deep breaths, if (though hopefully not) it happens again!

As mentioned earlier in the post, and in the previous summary, I did plan to release a new site during January. The CPAN Testers Administration site is still planned to go live, just not yet. With all the changes to the underlying software for CT2.0, there are some changes required for the Administration site that also need to be done. As this isn't live yet, I now consider it a lower priority than getting CT2.0 completed, and will now wait until after CT2.0 has gone live before finishing off the release.

In the last weekend of January, the biggest changes to the current databases went into effect, with the Metabase GUID now being used. Although the full extent of the change won't be seen until we're using the Metabase for submitting reports, this first shift is an important one. There were a few glitches as I brought the processing tools back online, as I soon discovered little parts that were affected by the change that I hadn't anticipated. Thankfully the errors were minor and all were quickly fixed. The server is now catching up on processing from the weekend, and I anticipate all will be back to normal service within the next day or two.

To reduce the processing load, as mentioned in a previous post, the database backups are now happening a little less frequently. The CVS backups have now been disabled, with the uploads and release databases both backed up once a day (usually between 00:30 and 02:30 Central European Time). The cpanstats database is currently backed up once an hour, but seeing as the bzip version seems to be popular with only a few people (one being Yahoo! Slurp :)) and only downloaded at most once a day by any single IP (including Yahoo! Slurp ... see Microsoft, some search engines can get it right), I'm considering generating the bzip version only once a day. I'll watch the logs and see if there are any changes, but if there aren't I will likely adjust the backups in line with current requests for the files.

Along with the backup changes, various other daily server processes have been reviewed and many have been rescheduled to reduce server load. The end result has been to reduce the nightly overheads and hopefully the server will be in a better position to process reports once the CPAN Testers switch to the Metabase and unleash their smokers from the current limiters.

Last month we had a total of 162 tester addresses submitting reports. The mappings this month included 21 total addresses mapped, of which 7 were for newly identified testers. Another low mapping month, due to work being done on CPAN Testers as a whole.

A long summary this month, but then a lot has been happening. Expect updates throughout the month as various parts of CT2.0 undergo testing, and we start to see the results of all the hard work of the past couple of years. The future is nigh.

Posted by Barbie
on 19th November 2008

One of the problems with the CPAN Testers website resources is that where an author listing of distributions, or the list of versions for a distribution, is required, a lot of backend trawling is done. This is due to the current backends having to refer to 3 sources to get those lists. Even then the resulting lists aren't quite correct, as the version sorting can be slightly weird when you have to take into account that every author has a slightly different perception of versioning. Sort::Versions goes a long way, but it isn't 100% accurate. The only really accurate way of sorting is on the release date of a distribution, which until now hasn't existed in a single form.

For a couple of months now, CPAN Testers has had its own BACKPAN and CPAN mirrors. Of the 3 sources, these are represented by Parse::BACKPAN::Packages and Parse::CPAN::Distributions, and the 2 index files they use. These can take a long time to parse, and as they don't parse and return any release date for distributions and their versions, using Sort::Versions is a reasonable alternative. However, there is a third source and that is the CPAN Uploads that are announced by PAUSE. Due to the time lag of the mirrors, very often a release can be made and not be available to CPAN for several hours, so while no CPAN Testers reports might exist, it's still important to know the latest version.

Previously the last source was the only one that contained any release date information, which prompted me to think about doing it for the other sources. Surprisingly quickly, using the local CPAN Testers copies of BACKPAN and CPAN, I was able to build a basic database of upload data, and tag each entry with 'backpan', 'cpan' or 'upload' to indicate which state the release is currently in. Queries now take fractions of seconds instead of several seconds. But, and perhaps more importantly, the sorting of distributions actually makes more sense!
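To show why sorting on release date is so much more dependable, here is a small Python sketch using an in-memory SQLite database. The table layout, column names and sample rows are assumptions made for the example and are not the real schema of the uploads database. Only the three type tags come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: one row per release, tagged with its current state.
conn.execute(
    """
    CREATE TABLE uploads (
        author   TEXT,
        dist     TEXT,
        version  TEXT,
        released INTEGER,  -- release time as a Unix epoch
        type     TEXT CHECK (type IN ('backpan', 'cpan', 'upload'))
    )
    """
)

# Made-up releases of a made-up distribution.
rows = [
    ("EXAMPLE", "Example-Dist", "0.9", 1200000000, "backpan"),
    ("EXAMPLE", "Example-Dist", "0.10", 1210000000, "cpan"),
    ("EXAMPLE", "Example-Dist", "0.11", 1220000000, "upload"),
]
conn.executemany("INSERT INTO uploads VALUES (?, ?, ?, ?, ?)", rows)

# Ordering on the release date needs no knowledge of the version scheme.
by_date = [
    row[0]
    for row in conn.execute(
        "SELECT version FROM uploads WHERE dist = ? ORDER BY released",
        ("Example-Dist",),
    )
]

# A naive string sort of the same versions gives a different order.
by_string = sorted(version for _, _, version, _, _ in rows)

print("by release date:", by_date)
print("by string sort: ", by_string)
```

In this made-up history the author clearly intends 0.10 to follow 0.9, yet a plain string sort puts it first, and a decimal reading would treat 0.10 as 0.1. The release date sidesteps the question entirely.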

The new database is being integrated into the backend code at the moment, but for those that might wish to have this information available for their own uses, the complete database is publicly available at the following locations:

These will now be updated daily, and once everything else is in place will eventually be updated hourly.