[Update June 12: The data.table calls have been improved (thanks to a comment by Matthew Dowle); for a similar approach, see also Tal Galili's post.]

I have always wondered how many people actually download my packages. Now I can finally get an answer (… with some anxiety about getting frustrated).
Here are the complete, self-contained R scripts to analyze the download logs of the RStudio CRAN mirror:

### Step 1: Download all log files into a subfolder (this step takes a couple of minutes)

# Here's an easy way to get all the URLs in R
start <- as.Date('2012-10-01')
today <- as.Date('2013-06-10')

all_days <- seq(start, today, by = 'day')

year <- as.POSIXlt(all_days)$year + 1900
urls <- paste0('http://cran-logs.rstudio.com/', year, '/', all_days, '.csv.gz')

# only download the files you don't have:
dir.create("CRANlogs", showWarnings = FALSE)
missing_days <- setdiff(as.character(all_days), tools::file_path_sans_ext(dir("CRANlogs"), TRUE))
# urls is aligned with all_days, so pick the URLs that belong to the missing days
missing_urls <- urls[as.character(all_days) %in% missing_days]

# seq_along() also handles the case where nothing is missing
for (i in seq_along(missing_days)) {
  print(paste0(i, "/", length(missing_days)))
  download.file(missing_urls[i], paste0('CRANlogs/', missing_days[i], '.csv.gz'))
}

### Step 2: Combine all daily files into one big data table (this step also takes a couple of minutes…)

## ======================================================================
## Step 2: Load single data files into one big data.table
## ======================================================================

# only read the daily log files (skips the .RData file saved below on later runs)
file_list <- list.files("CRANlogs", pattern = "\\.csv\\.gz$", full.names=TRUE)

logs <- list()
for (file in file_list) {
  print(paste("Reading", file, "..."))
  logs[[file]] <- read.table(file, header = TRUE, sep = ",", quote = "\"",
    dec = ".", fill = TRUE, comment.char = "", as.is=TRUE)
}

# rbind together all files
library(data.table)
dat <- rbindlist(logs)

# add some keys and define variable types
dat[, date:=as.Date(date)]
dat[, package:=factor(package)]
dat[, country:=factor(country)]
dat[, weekday:=weekdays(date)]
dat[, week:=strftime(as.POSIXlt(date),format="%Y-%W")]

setkey(dat, package, date, week, country)

save(dat, file="CRANlogs/CRANlogs.RData")

# for later analyses: load the saved data.table
# load("CRANlogs/CRANlogs.RData")

### Step 3: Analyze it!

## ======================================================================
## Step 3: Analyze it!
## ======================================================================

library(ggplot2)
library(plyr)

str(dat)

# Overall downloads of packages
d1 <- dat[, length(week), by=package]
d1 <- d1[order(V1), ]
d1[package=="TripleR", ]
d1[package=="psych", ]

# plot 1: Compare downloads of selected packages on a weekly basis
agg1 <- dat[J(c("TripleR", "RSA")), length(unique(ip_id)), by=c("week", "package")]

ggplot(agg1, aes(x=week, y=V1, color=package, group=package)) + geom_line() + ylab("Downloads") + theme_bw() + theme(axis.text.x = element_text(angle=90, size=8, vjust=0.5))

# plot 2: the same comparison, with the psych package added
agg1 <- dat[J(c("psych", "TripleR", "RSA")), length(unique(ip_id)), by=c("week", "package")]

ggplot(agg1, aes(x=week, y=V1, color=package, group=package)) + geom_line() + ylab("Downloads") + theme_bw() + theme(axis.text.x = element_text(angle=90, size=8, vjust=0.5))

Here are my two packages, TripleR and RSA. Actually, ~30 downloads per week (from this single mirror) is much more than I had expected!

To put things in perspective, here is the same plot with the psych package included:

Some psychological side notes on social comparisons:

• Downward comparisons enhance well-being; extreme upward comparisons are detrimental. Hence, never include ggplot2 in your plot!
• Upward comparisons stimulate your achievement motive and give you the drive to get better. Hence, select some packages that are slightly above your own.
• Of course, things are a bit more complicated than that …

All source code in this post is licensed under the FreeBSD license.

### 12 Responses to “Finally! Tracking CRAN packages downloads”

1.
Tal Galili says:
Hi Felix,
I found your post only after writing the code for my own post here: http://www.r-statistics.com/2013/06/answering-how-many-people-use-my-r-package/
If you’re interested, I’d be happy to include your code in the “installr” package. Feel free to send changes/pull requests to: https://github.com/talgalili/installr/blob/master/R/RStudio_CRAN_data.r
Cheers, Tal

• FelixS says:
Hi Tal,
I guess many people started working on this simultaneously after the logs were released! It’s a good idea to put this functionality into a package. Feel free to use any code from this post (I’ve added an official license statement); when I find the time I’ll send some changes. I think the data.table functions should be much faster than the rbind approach!
Cheers, Felix

2. [...] map which highlights the countries based on how much people there use of R. Felix Schonbrodt wrote a great post on Tracking CRAN packages downloads. In the meantime, I’ve started crafting some basic functions for package developers to easily [...]

3. Matthew Dowle says:
Hi, very nice! Just a few improvements of data.table usage … the rbindlist is good, but then those 5 dat$<- will copy all of dat each time, just like base R. That's what := is for; e.g., dat[,date:=as.Date(date)] should be faster than dat$date <- as.Date(dat$date), times 5. Also, in section 3 the idea is to avoid variable name repetition by not needing d1$; e.g., instead of d1 <- d1[order(d1$V1), ] it's just d1 <- d1[order(V1),].
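To make the point about := a bit more concrete, here is a minimal sketch on a tiny made-up table; it is not part of the original scripts. The names toy, counts, addr_before and addr_after are hypothetical, the values are invented, and .N is used here as a shorthand for the row count, where the post itself uses length(week). data.table's address() is only used to check that the table is still the same object after the := calls.

```r
library(data.table)

# Hypothetical toy data standing in for the CRAN logs (values are made up)
toy <- data.table(
  date    = c("2013-06-01", "2013-06-02", "2013-06-02"),
  package = c("TripleR", "RSA", "TripleR")
)

# Remember where the table lives in memory before modifying it
addr_before <- address(toy)

# := converts or adds columns in place, without copying the whole table
toy[, date := as.Date(date)]
toy[, weekday := weekdays(date)]

# The table is still the same object after both updates
addr_after <- address(toy)
print(identical(addr_before, addr_after))

# Inside [ ], columns are referred to by name, so no counts$ prefix is needed
counts <- toy[, .N, by = package]
counts <- counts[order(N)]
print(counts)
```

On a table with millions of log rows, avoiding five full copies is where the speed-up described in the comment would be expected to come from; the toy table is far too small to show a timing difference.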
