Basic text string functions in R

To get the length of a text string (i.e. the number of characters in the string):
[code lang=”R” gutter=”false”] nchar(x)[/code]
Using length() instead would give you the number of elements in the vector containing the string, which is 1 if x holds just a single string.
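As a quick illustration of the difference, here is a minimal sketch. The variable names and example strings are made up for the example.

```r
# Hypothetical example strings
single = "hello world"
several = c("apple", "fig", "banana")

n_single = nchar(single)      # number of characters in the one string
len_single = length(single)   # number of elements in the vector
n_several = nchar(several)    # one character count per element
len_several = length(several) # number of strings in the vector

print(n_single)
print(len_single)
print(n_several)
print(len_several)
```

Note that nchar() is vectorized, so on a vector of strings it returns one count per string.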

To get the position of one or more regular expression matches in a text string x:
[code lang=”R” gutter=”false”]
pos = regexpr('pattern', x) # Returns position of 1st match in a string (-1 if no match)
pos = gregexpr('pattern', x) # Returns a list holding the positions of every match in a string
[/code]
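A small worked example may make the return values easier to read. This is only a sketch with a made-up input string. Both functions also attach a match.length attribute giving the length of each match.

```r
x = "banana"  # hypothetical input

first_match = regexpr("an", x)   # first match only
all_matches = gregexpr("an", x)  # list with one element per input string

first_pos = as.integer(first_match)               # start of the first match
first_len = attr(first_match, "match.length")     # length of that match
all_pos = as.integer(all_matches[[1]])            # starts of every match in x
no_match = as.integer(regexpr("z", x))            # -1 signals no match

print(first_pos)
print(all_pos)
print(no_match)
```

Because gregexpr() returns a list, the positions for the first (or only) string are pulled out with [[1]].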

To get the position of a regular expression match in a vector x of text strings (this returns the index of the matching string in the vector, not the position of the match in the text string itself):
[code lang=”R” gutter=”false”]
pos = grep('pattern', x)
[/code]
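To show what grep() returns on a vector, here is a minimal sketch with made-up strings. It also uses two related base R options, value=TRUE and grepl(), which are not mentioned above but are often useful alongside grep().

```r
# Hypothetical vector of text strings
x = c("apple pie", "banana split", "cherry tart", "apple crumble")

idx = grep("apple", x)               # indices of the matching strings
hits = grep("apple", x, value=TRUE)  # the matching strings themselves
flags = grepl("apple", x)            # TRUE/FALSE for every element

print(idx)
print(hits)
print(flags)
```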

To extract part of a text string based on position, where first and last are the character positions of the start and end of the part to keep (often worked out from the result of the regexpr() function):
[code lang=”R” gutter=”false”]
keep = substr(x, first, last)
[/code]
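One way to combine regexpr() and substr() is sketched below. The file name is invented for the example, and the end position is computed from the match.length attribute that regexpr() attaches to its result.

```r
x = "sample_0042_run.csv"  # hypothetical string

m = regexpr("[0-9]+", x)                    # first run of digits
first = as.integer(m)                       # start position of the match
last = first + attr(m, "match.length") - 1  # end position of the match
keep = substr(x, first, last)

# regmatches() is a base R shortcut that does the same extraction
keep2 = regmatches(x, m)

print(keep)
print(keep2)
```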

To replace part of a text string with some other text:
[code lang=”R” gutter=”false”]
sub('pattern', replacement, input) # Changes only the 1st pattern match per string
gsub('pattern', replacement, input) # Changes every occurrence of a pattern match
[/code]
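The difference between the two is easiest to see side by side. A minimal sketch with made-up input:

```r
input = c("a-b-c", "no dashes")  # hypothetical input vector

first_only = sub("-", "+", input)   # replaces only the first "-" in each string
every_one = gsub("-", "+", input)   # replaces every "-" in each string

print(first_only)
print(every_one)
```

Strings with no match are returned unchanged, and neither function modifies input in place, so the result has to be assigned to something.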

The pattern argument in the various regular expression functions can include character classes, which are sets of characters enclosed in square brackets. See ?regex for the explanation of regular expressions. For example, to make a pattern that matches any numerical digit, you could use '[0-9]' as the pattern argument. You may also use several predefined classes such as [:digit:]. The brackets are part of the class name, so the class has to be placed inside another pair of square brackets: '[[:digit:]]' finds any numerical digit in the string, same as the '[0-9]' pattern, whereas '[:digit:]' on its own is not read as the digit class.
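A short sketch of character classes in use, with an invented input string. It strips digits two equivalent ways and then uses a negated class to keep only the digits.

```r
x = "Room 101, Floor 3"  # hypothetical input

no_digits_a = gsub("[0-9]", "", x)       # explicit range
no_digits_b = gsub("[[:digit:]]", "", x) # predefined class inside a bracket expression
only_digits = gsub("[^0-9]", "", x)      # ^ inside brackets negates the class

print(no_digits_a)
print(no_digits_b)
print(only_digits)
```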

File name stuff

To get a list of file names (and paths) in a directory:
[code lang=”R” gutter=”false”]
fnames = dir("./path/to/my/data", full.names=TRUE)
[/code]
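To try this without touching real data, the sketch below builds a throwaway folder under tempdir() with a few empty files. The folder and file names are made up. It also shows the pattern argument of dir(), which takes a regular expression and filters the file names.

```r
# Create a scratch directory with three empty files (hypothetical names)
demo_dir = file.path(tempdir(), "dir_demo")
dir.create(demo_dir, showWarnings=FALSE)
created = file.create(file.path(demo_dir, c("a.csv", "b.csv", "notes.txt")))

all_files = dir(demo_dir, full.names=TRUE)                     # every file, with path
csv_files = dir(demo_dir, pattern="\\.csv$", full.names=TRUE)  # only names ending in .csv
bare_names = dir(demo_dir)                                     # names only, no path

print(basename(csv_files))
print(bare_names)
```

dir() is an alias for list.files(), so the two can be used interchangeably.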

To extract just the filename from a full path:
[code lang=”R” gutter=”false”]
fname = basename(path)
[/code]

To extract the directory path from a file path:
[code lang=”R” gutter=”false”]
directory = dirname(path)
[/code]
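Here is a small sketch of both functions on a made-up path. The path does not need to exist, since these functions only work on the text. The last two lines use helpers from the tools package (shipped with R) to split off the file extension, which is an addition to what is described above.

```r
path = "/home/user/data/results.csv"  # hypothetical path, need not exist

fname = basename(path)       # file name without the directory
directory = dirname(path)    # directory without the file name

stem = tools::file_path_sans_ext(fname)  # file name without the extension
ext = tools::file_ext(fname)             # the extension alone

print(fname)
print(directory)
print(stem)
print(ext)
```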

If you have a text string assigned to a variable in the R workspace, and you want to parse it using various other functions, you can use the textConnection() function to feed your string to the other function.
[code lang=”R” gutter=”false”]
mydataframe = read.csv(textConnection(myString)) # If myString contained comma-separated-values, this would convert them to a data frame.
[/code]
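A self-contained sketch of this, using an invented CSV string. Storing the connection in a variable makes it possible to close it afterwards. The text argument of read.csv() is shown as an alternative that skips the explicit connection.

```r
# Hypothetical comma-separated text held in a variable
myString = "name,score\nalice,90\nbob,85"

con = textConnection(myString)
mydataframe = read.csv(con)
close(con)  # the connection was opened by textConnection(), so close it ourselves

# Alternative: pass the string directly through the text argument
mydataframe2 = read.csv(text=myString)

print(mydataframe)
```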