I am currently programming an automated report generator in R: participants fill out a questionnaire and receive a nicely formatted PDF with their personality profile. I use knitr, LaTeX, and the sendmailR package.
Some participants did not provide valid email addresses, which caused the sendmail function to crash. Therefore I wanted some validation of email addresses – here’s the function:
[cc lang="rsplus" escaped="true"]
isValidEmail <- function(x) {
  grepl("\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>", as.character(x), ignore.case=TRUE)
}
[/cc]
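Since `grepl()` is vectorized, the function can check a whole column of addresses at once. Here is a minimal sketch of how that might be used to separate participants before the mailing step; the `participants` data frame and the names `ok`, `toMail`, and `toReview` are made up for illustration and are not part of my actual script:

```r
# Validation function as defined in the post
isValidEmail <- function(x) {
  grepl("\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>", as.character(x), ignore.case = TRUE)
}

# Hypothetical participant table; real data would come from the questionnaire
participants <- data.frame(
  id    = 1:3,
  email = c("felix@nicebread.de", "felixnicebread.de", NA),
  stringsAsFactors = FALSE
)

ok <- isValidEmail(participants$email)  # logical vector, one entry per row
toMail   <- participants[ok, ]          # rows that can be passed on to the mailing step
toReview <- participants[!ok, ]         # rows to check by hand
```

Missing addresses (`NA`) end up in `toReview`, because `grepl()` returns `FALSE` for `NA` input.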
Let’s test some valid and invalid addresses:
[cc lang="rsplus" escaped="true"]
# Valid addresses
isValidEmail("felix@nicebread.de")
isValidEmail("felix.123.honeyBunny@nicebread.lmu.de")
isValidEmail("felix@nicebread.de ")
isValidEmail(" felix@nicebread.de")
isValidEmail("felix+batman@nicebread.de")
isValidEmail("felix@nicebread.office")
# Invalid addresses
isValidEmail("felix@nicebread")
isValidEmail("felix@nicebread@de")
isValidEmail("felixnicebread.de")
[/cc]
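Because `grepl()` looks for a match anywhere in the string, the padded addresses above pass, and so would a whole sentence that merely contains an address. If that is too lenient, one possible approach is to trim surrounding whitespace and then require the entire string to match. This is only a sketch under that assumption; `isValidEmailStrict` is a hypothetical name, not a function from any package:

```r
# Hypothetical stricter variant: trim whitespace, then anchor the pattern
# with ^ and $ so that the whole string must be an address.
isValidEmailStrict <- function(x) {
  x <- trimws(as.character(x))
  grepl("^[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}$", x, ignore.case = TRUE)
}

isValidEmailStrict(" felix@nicebread.de ")           # padded address, accepted after trimming
isValidEmailStrict("mail me at felix@nicebread.de")  # rejected: extra text around the address
```

Note that the trimmed value only exists inside the function, so the address should also be trimmed before it is handed to the mailing function.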
The regexp is taken from www.regular-expressions.info and adapted to the R style of regexp. Please note the many discussions of the question “Is there a single regexp that matches all valid email addresses?” (the answer is no).
7 thoughts on “Validating email adresses in R”
Nice snippet, thanks! Please note, though, that .office may soon be a valid TLD (as can almost any other string) with the opening up of gTLDs:
http://en.wikipedia.org/wiki/Generic_top-level_domain#New_top-level_domains
Kind regards,
Marc
That’s true, thanks for the hint! I changed the last part of the regexp from {2,4} to {2,} – now any TLD with >= 2 characters is valid.
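To illustrate the effect of that change, here is a small sketch comparing the two quantifiers on a long TLD. The variable names `oldPattern`, `newPattern`, and `address` are only for this example:

```r
# Same pattern with the two different TLD length limits
oldPattern <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\>"  # TLD of 2 to 4 letters
newPattern <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>"   # TLD of 2 or more letters

address <- "felix@nicebread.office"

oldResult <- grepl(oldPattern, address, ignore.case = TRUE)  # FALSE: "office" has 6 letters
newResult <- grepl(newPattern, address, ignore.case = TRUE)  # TRUE
```

The old pattern fails here because the closing word boundary `\\>` cannot fall in the middle of "office".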
Could you give a hint on how you create the nicely formatted pdf?
Does sendmailR have the capability to send emails with attachments of any kind?
I will post about it soon – for your second question: yes.
body <- list(emailText, mime_part(path_to_file))
sendmail(from, to, subject, body, control=list(smtpServer="mailout.xyz.de", verbose=TRUE))
Looking forward to reading your upcoming post on report generation!
I couldn’t help but think of this discussion on stackoverflow:
http://stackoverflow.com/questions/201323/how-to-use-a-regular-expression-to-validate-an-email-addresses
Here’s the regex they recommend:
http://ex-parrot.com/~pdw/Mail-RFC822-Address.html
OMG 😉 – who has the time to build and test such regexps? I think I’ll stick to the 99.9% solution. Several changes would be needed to convert it to R-style …