Validating email addresses in R

I am currently programming automated report generation in R: participants fill out a questionnaire and receive a nicely formatted PDF with their personality profile. I use knitr, LaTeX, and the sendmailR package.

Some participants did not provide valid email addresses, which caused the sendmail function to crash. Therefore I wanted some validation of email addresses – here’s the function:

isValidEmail <- function(x) {
    # ignore.case = TRUE: the character classes list only upper-case letters
    grepl("\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>", as.character(x), ignore.case = TRUE)
}

Let’s test some valid and invalid addresses:

# Valid addresses
isValidEmail("  ")

# invalid addresses

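A minimal sketch of such a test follows. The addresses are made up for illustration on the reserved example.com domain, and the function is repeated so the block runs on its own; `ignore.case = TRUE` is an assumption, needed because the pattern lists only upper-case letters.

```r
# Repeated here so the block is self-contained; ignore.case = TRUE is an
# assumption, needed because the pattern only lists upper-case letters.
isValidEmail <- function(x) {
    grepl("\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>", as.character(x),
          ignore.case = TRUE)
}

# Hypothetical addresses, for illustration only
valid   <- c("felix@example.com", "felix+batman@example.com")
invalid <- c("felix@example", "felixexample.com", "  ")

isValidEmail(valid)    # TRUE TRUE
isValidEmail(invalid)  # FALSE FALSE FALSE
```

Because grepl is vectorized, a whole column of addresses can be checked in one call.
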
The regexp was taken from elsewhere and adapted to R’s regexp syntax. Please note the many discussions of the question “Is there a single regexp that matches all valid email addresses?” (the answer is no).
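To keep a single bad address from stopping a whole batch of mails, the check can be run before anything is sent. This is only a sketch under stated assumptions: `send_report()` is a hypothetical stand-in for the real sendmail call, and the recipients are made up.

```r
# Repeated so the block runs on its own; ignore.case = TRUE is an assumption
isValidEmail <- function(x) {
    grepl("\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>", as.character(x),
          ignore.case = TRUE)
}

# Hypothetical placeholder for the real sendmail() call
send_report <- function(to) message("Sending report to ", to)

# Made-up recipients: the second one has no @ sign
recipients <- c("felix@example.com", "no-at-sign.example.com")
ok <- isValidEmail(recipients)

# Send only to addresses that pass the check and list the rest
for (to in recipients[ok]) send_report(to)
skipped <- recipients[!ok]
if (length(skipped) > 0) {
    message("Skipped invalid addresses: ", paste(skipped, collapse = ", "))
}
```

Collecting the skipped addresses instead of stopping makes it possible to follow up with those participants later.
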

7 Responses to “Validating email addresses in R”

  1. Marc Richter says:

    Nice snippet, thanks! Please note, though, that .office may soon be a valid TLD (as can almost any other string) with the opening up of gTLDs.

    Kind regards,

    • FelixS says:

      That’s true, thanks for the hint! I changed the last part of the regexp from {2,4} to {2,} – now any TLD with >= 2 characters is valid.
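
      The effect of that change can be sketched as follows. The address is made up, and both patterns are written out with `ignore.case = TRUE` assumed:

      ```r
      # Same pattern twice, differing only in the TLD length quantifier
      old_pattern <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\>"
      new_pattern <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>"

      addr <- "felix@example.office"  # hypothetical address with a 6-letter TLD

      old_result <- grepl(old_pattern, addr, ignore.case = TRUE)  # FALSE
      new_result <- grepl(new_pattern, addr, ignore.case = TRUE)  # TRUE
      ```

      With {2,4}, the trailing word boundary cannot be reached after at most four letters of "office", so the match fails.
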

  2. gd047 says:

    Could you give a hint on how you create the nicely formatted pdf?
    Does sendmailR have the capability to send emails with attachments of any kind?

    • FelixS says:

      I will post about it soon – for your second question: yes.

      library(sendmailR)
      body <- list(emailText, mime_part(path_to_file))
      sendmail(from, to, subject, body, control=list(smtpServer="", verbose=TRUE))

  3. Gary Moser says:

    Looking forward to reading your upcoming post on report generation!

    • FelixS says:

      OMG 😉 – who has the time to build and test such regexps? I think I’ll stick to the 99.9% solution. Several changes would be needed to convert it to R-style …
