News & Views

Posted by Barbie
on 25th July 2009

If you don't normally hit the front page of the CPAN Testers Reports site (the dynamic one), you might not have noticed the visual updates made to the site this weekend. The front page now shows the current status of the builder, working away on the backend. The status update reports the number of requests in the queue and the request time of the oldest request. This should hopefully reduce the number of emails I get asking whether the server is working, and whether it's coping well with the volume of reports.

At 19:10 on Saturday 25th July, the numbers looked like this:

+---------------------+-------+-------+
| oldest              | total | count |
+---------------------+-------+-------+
| 2009-07-23 14:01:39 | 11668 |  7900 |
+---------------------+-------+-------+

From this you can see that the oldest request is just over 2 days old, and there are a total of 11,668 requests waiting, of which 7,900 are unique. That's a low number, which is good. If the total goes over 20k or the oldest request is over 5 days old, then something has gone wrong. If that happens then feel free to send me an email pointing it out.
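
As a rough illustration of those two thresholds, here is a minimal Python sketch. The function and constant names are made up for this example and are not taken from the CPAN Testers code; only the 20k and 5 day limits and the sample figures come from the numbers quoted in this post.

```python
from datetime import datetime, timedelta

# Thresholds quoted in the post; the names are hypothetical.
MAX_TOTAL_REQUESTS = 20_000
MAX_OLDEST_AGE = timedelta(days=5)


def queue_needs_attention(oldest, total, now):
    """Return True if the queue looks unhealthy by either threshold."""
    too_many = total > MAX_TOTAL_REQUESTS
    too_old = (now - oldest) > MAX_OLDEST_AGE
    return too_many or too_old


# Figures from the status table at 19:10 on Saturday 25th July 2009.
checked_at = datetime(2009, 7, 25, 19, 10)
oldest_request = datetime(2009, 7, 23, 14, 1, 39)

print(queue_needs_attention(oldest_request, 11668, checked_at))
```

With the figures from the table, neither limit is exceeded, so the check reports that nothing needs attention.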

On the backend, some of the changes now ensure that the oldest requests are processed at least periodically, so that they don't stay in the queue too long. Also, pages which have only 1 request in the total are often quick hits, and every couple of hours these get looked at too. It all means that the pages get managed quite nicely and the updates keep a nice balance across the whole site.

Posted by Barbie
on 24th July 2009

Now that the various bot crawlers have backed off trying to kill the server over the last month, the server itself is running quite nicely, and doesn't seem to be getting too stressed with the loads. Having watched the logs quite a bit, and not seeing anything untoward, I've now changed the update frequency to every hour for updating the cpanstats and uploads databases. As a consequence the Reports sites should get the chance to update more frequently too.

Rebuilding the whole site takes about 5 days, but thanks to the weighting system used, some pages are more equal than others. As such, popularly tested authors/distros and popular pages are likely to be built within a few hours (if not a few minutes). If you visit the site daily then you'll probably find the pages you look at most are only an hour or so behind.

More updates coming this weekend.

Posted by Barbie
on 18th July 2009

On the old reports site there was a sometimes misleading account of the date the page was last updated. It was usually the right date, but it was rarely the right time :) As a consequence it was confusing for some as to when the page was truly updated, as it would often take some time before reports would show up for a particular distribution, and for a new release to appear.

With all the changes made to create the dynamic and static Reports sites, the report information is now updated into the database every two hours. However, due to the sheer volume of reports, distributions and authors, generating all the pages takes a considerable amount of time. There are measures in place to try and not let pages that should be updated linger in the queue too long, but it is still useful to know when a page was last updated.

As such, an update to the reports site today now includes that information, provided the page has been built since today of course! So you'll now know how old a page is. On the dynamic site, if the warning of the page being in an update queue is not displayed, then regardless of how old the page is, the page contains all the report information available.

Hopefully this will help to make it a little easier to appreciate how current a particular page is. As it can take a week to generate the whole site, if you spot any page with an update queue message lingering for longer than about 5 days, please let me know, and I'll investigate.

Posted by Barbie
on 18th July 2009

A couple of months ago, Adam Kennedy asked if it was possible for me to provide a summary of the reporting for each distribution/version available on CPAN. His goal was to incorporate this into his CPANDB project, which is a pretty comprehensive look at CPAN, gathering together data from a variety of sources.

As I didn't record anything like that at the time, I felt it might be worth including. At the very least it might make the dynamic reports site a little easier to reference when anyone makes changes to the preferences.

Last week saw the first working version released, which you can now get from the Development site. The SQLite database is available in Bzip2 and Gzip formats, and I'll look at adding LZMA when I have some time to set it up. The database is updated daily, and I'll look to make it more frequent as soon as the server settles down.

The Release Summary database contains a count of the PASS, FAIL, NA and UNKNOWN reports submitted for each distribution/version that has at least one report submitted for it. The counts are also ONLY attributed to reports that are for mature Perl versions, i.e. no reports for blead, patched or development releases of Perl are included.
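
To make the counting rule concrete, here is a small Python sketch of that kind of summary. It is only an illustration: the record fields (`dist`, `version`, `state`, `mature_perl`) and the sample reports are invented for this example, and the real database schema and build code may look quite different.

```python
from collections import defaultdict

# The four report grades counted in the summary.
STATES = ("PASS", "FAIL", "NA", "UNKNOWN")


def summarise(reports):
    """Count reports per (dist, version), skipping non-mature Perls."""
    summary = defaultdict(lambda: dict.fromkeys(STATES, 0))
    for report in reports:
        # Hypothetical flag: False for blead, patched or development Perls.
        if not report["mature_perl"]:
            continue
        state = report["state"].upper()
        if state in STATES:
            summary[(report["dist"], report["version"])][state] += 1
    return dict(summary)


# Invented sample data, purely for illustration.
sample_reports = [
    {"dist": "Foo-Bar", "version": "1.00", "state": "pass", "mature_perl": True},
    {"dist": "Foo-Bar", "version": "1.00", "state": "pass", "mature_perl": True},
    {"dist": "Foo-Bar", "version": "1.00", "state": "fail", "mature_perl": True},
    {"dist": "Foo-Bar", "version": "1.00", "state": "fail", "mature_perl": False},
    {"dist": "Foo-Bar", "version": "1.01", "state": "na", "mature_perl": True},
]

release_summary = summarise(sample_reports)
for (dist, version), counts in sorted(release_summary.items()):
    print(dist, version, counts)
```

Only distribution/versions with at least one counted report get a row, and the FAIL submitted against a non-mature Perl is left out of the totals.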

I may look at providing the full Release Data database at some point, which includes all the flags for Perl maturity, Distribution Maturity, On CPAN and Patch Status. However, there are many other things on my TODO list at the moment, so I'll let you know when it's available. In the meantime, I hope you find the Release Summary useful.

Posted by Barbie
on 13th July 2009

If you've wondered why CPAN Testers has been having problems lately, then look no further than the various web crawler bots, particularly Yahoo! Slurp, which appears to have gone into overkill mode .. literally .. in recent days. As a consequence I've started looking at making the robots.txt a little more restrictive to stop them taking down the site.

At one point I figured that I could harness the web crawlers to help create the pages for the dynamic and static sites, but they've proved more of a hindrance than a help. So I've been looking at ways to slow down their site crawling for CPAN Testers. As such I came across a forum post that was very helpful in explaining the Crawl-Delay feature of the robots.txt. Yahoo! themselves try and explain it, but exceedingly badly in my book, as they failed to mention the basic part of the value .. it's in seconds! Thankfully one of the forum posters mentions this and even provides a useful example (Yahoo! please take note).
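
For anyone wanting to see what such a rule looks like, here is a small Python sketch that parses an example robots.txt with the standard library's `urllib.robotparser` and reads the delay back. The rule shown is an assumption for illustration only: the 10 second value and the use of `Slurp` as the user-agent token are my own example, not the actual contents of the CPAN Testers robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt fragment; the delay value is in seconds.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())

# Delay (in seconds) that a crawler identifying as Slurp is asked to wait
# between requests. Agents with no matching rule get None.
slurp_delay = parser.crawl_delay("Slurp")
other_delay = parser.crawl_delay("SomeOtherBot")

print("Slurp:", slurp_delay)
print("SomeOtherBot:", other_delay)
```

Bear in mind that Crawl-delay is only a request, and it is up to each crawler whether it honours it.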

I've now updated the robots.txt, so hopefully that will take effect soon.

File Under: server