Lecture Notes: Information Processing in R
The R Programming Language
The R programming language is one of the most commonly used languages for ad hoc data analysis in business when Excel is insufficient or cannot handle large data sets. It is also often used to develop data mining and machine learning solutions, although production code is often rewritten in Java or C++.
Programming with R
Environments:
Variables & Identifiers:
- Variables hold values: scalars, text strings, data frames, results from calls to functions
- Variables have names that must follow certain rules: a name must start with a letter (or a dot not followed by a digit) and can contain only letters, digits, dot (.) and underscore (_)
- The dot (.) as a regular character can be confusing, as other languages (e.g., Java) use the dot to designate property or method access
- Legal Variable Names:
  - df
  - df2
  - df.txns
  - df_all2017
- Illegal Variable Names:
  - 2df -- cannot start with a digit
  - rs$all -- cannot contain a $; the $ is used to access columns in a data frame
  - rs# -- only . and _ are allowed in addition to digits and letters; # also starts a comment
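The naming rules above can be checked directly in the R console. The following is a minimal sketch; the values assigned are arbitrary placeholders, and make.names() is used only to show how base R would repair the illegal names from the list.

```r
# Legal names: letters, digits, dot and underscore, starting with a letter
df <- 1
df2 <- 2
df.txns <- 3
df_all2017 <- 4

# make.names() converts illegal names into legal ones:
# it prepends "X" where needed and replaces invalid characters with "."
fixed <- make.names(c("2df", "rs$all", "rs#"))
print(fixed)

# Backticks permit a non-syntactic name, but this is best avoided
`2df` <- 5
print(`2df`)
```

Typing 2df <- 5 without backticks is a syntax error, which is why the illegal names are shown here only as strings.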
Data & Object Types:
- R is dynamically typed: the type of a variable is determined by the value it currently holds
- The type can change simply by assigning a new value of a different type
- Types can be converted using the as.xxxx functions, e.g., as.numeric, as.Date, or as.character (R has no as.string function)
- Conversions are often necessary when data is read from CSV or XML files or databases
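The points above on typing and conversion can be sketched as follows. The literal values are made up for illustration, and the date string is assumed to be in ISO format (YYYY-MM-DD), which as.Date parses without a format argument.

```r
# The type follows the value: the same variable changes type on reassignment
x <- 42
print(class(x))        # numeric
x <- "42"
print(class(x))        # character

# Explicit conversions, as often needed after reading CSV, XML or database data
n <- as.numeric("3.14")        # character -> numeric
d <- as.Date("2017-03-15")     # character -> Date
s <- as.character(2017)        # numeric -> character

# A string that is not a number converts to NA (with a warning)
bad <- suppressWarnings(as.numeric("abc"))
print(is.na(bad))
```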
Variables, Objects, Assignment
Data Frames
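Data frames are essentially spreadsheets: they are tables of rows and columns where each row is a data case (an observation) and each column is a variable (an attribute of the case). All values in a column have the same type, but different columns can have different types.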
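A small data frame can be built by hand to see this structure. This is only a sketch: the data frame name df.txns and its columns (id, store, amount) are hypothetical and not taken from the course data sets.

```r
# Hypothetical transactions: one row per case, one column per variable
df.txns <- data.frame(
  id     = c(1, 2, 3),
  store  = c("Boston", "Salem", "Boston"),
  amount = c(19.99, 5.49, 102.00),
  stringsAsFactors = FALSE
)

print(nrow(df.txns))     # number of rows (cases)
print(ncol(df.txns))     # number of columns (variables)
str(df.txns)             # column names and types

print(df.txns$amount)    # one column, accessed with $
print(df.txns[2, ])      # the second row, all columns
```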
Working with Data Frames
Loading Data from Files
Data Sets for code below:
Load Data from CSV into Data Frame
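A minimal sketch of loading a CSV file with read.csv. To keep the example self-contained, it first writes a small made-up CSV to a temporary file; the file contents and column names are assumptions for illustration, not the course data set.

```r
# Create a small hypothetical CSV file in a temporary location
csv.file <- tempfile(fileext = ".csv")
writeLines(c("id,store,amount",
             "1,Boston,19.99",
             "2,Salem,5.49",
             "3,Boston,102.00"), csv.file)

# Read the file into a data frame; the first line holds the column names
df <- read.csv(csv.file, header = TRUE, stringsAsFactors = FALSE)

print(head(df))   # first rows of the data
str(df)           # check the column types that read.csv inferred
```

With a real file, csv.file would simply be the path to that file.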
Queries on Data Frames
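Common queries on a data frame are filtering rows, selecting columns, sorting, and aggregating. The following base R sketch uses a made-up data frame; the column names and values are assumptions for illustration.

```r
df <- data.frame(
  id     = 1:4,
  store  = c("Boston", "Salem", "Boston", "Lynn"),
  amount = c(19.99, 5.49, 102.00, 47.50),
  stringsAsFactors = FALSE
)

# Filter rows with a logical condition; the empty column index keeps all columns
big <- df[df$amount > 20, ]

# subset() filters rows and selects columns in one call
boston <- subset(df, store == "Boston", select = c(id, amount))

# Sort by amount, largest first
sorted <- df[order(-df$amount), ]

# Total amount per store
totals <- aggregate(amount ~ store, data = df, FUN = sum)

print(big)
print(boston)
print(sorted)
print(totals)
```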
Dealing with Missing Data Values
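R represents missing values as NA. The sketch below shows one way to detect them, ignore them in a calculation, drop incomplete rows, and impute with the mean. The data are made up, and mean imputation is shown only as one simple option, not as a recommended method.

```r
amount <- c(19.99, NA, 102.00, NA, 47.50)

# Detect and count missing values
print(is.na(amount))
n.missing <- sum(is.na(amount))

# Most summary functions return NA unless told to skip missing values
avg <- mean(amount, na.rm = TRUE)

df <- data.frame(id = 1:5, amount = amount)

# Option 1: drop every row that contains an NA
complete <- na.omit(df)

# Option 2: replace missing values with the mean of the known values
df$amount[is.na(df$amount)] <- avg

print(complete)
print(df)
```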
Loading Data from Database into Data Frame with SQL
Data Sets & SQLite database for code below:
Database Queries in R
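A minimal sketch of running SQL against SQLite from R. It assumes the DBI and RSQLite packages are installed from CRAN; these notes do not name the package used in the course, so that choice is an assumption. The example uses an in-memory database and a made-up table named txns rather than the course database file.

```r
library(DBI)   # assumption: DBI and RSQLite are installed

# Connect to a temporary in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Create a hypothetical table from a data frame
dbWriteTable(con, "txns", data.frame(
  id     = 1:3,
  store  = c("Boston", "Salem", "Boston"),
  amount = c(19.99, 5.49, 102.00)
))

# The result of a SELECT comes back as a data frame
df <- dbGetQuery(con, "SELECT store, SUM(amount) AS total
                       FROM txns
                       GROUP BY store
                       ORDER BY store")
print(df)

# Always release the connection when done
dbDisconnect(con)
```

To query an existing database file, the ":memory:" argument would be replaced by the path to that file.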
Loading XML File into R
- Requires the XML package: install it once from CRAN with install.packages("XML"), then load it in each session with library(XML)
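A minimal sketch of parsing XML with the XML package. To stay self-contained it parses a made-up document from a string (asText = TRUE) instead of a file; the element names txns, txn, store and amount are assumptions for illustration.

```r
library(XML)   # assumption: the XML package is installed

# Hypothetical XML document, built as a single string
xml.text <- paste0(
  '<txns>',
  '<txn id="1"><store>Boston</store><amount>19.99</amount></txn>',
  '<txn id="2"><store>Salem</store><amount>5.49</amount></txn>',
  '</txns>'
)

# Parse the text into an in-memory document and get its root element
doc  <- xmlParse(xml.text, asText = TRUE)
root <- xmlRoot(doc)
print(xmlName(root))   # name of the root element
print(xmlSize(root))   # number of child nodes under the root

# For simple, flat XML the document can be converted to a data frame;
# all columns arrive as text, so numeric columns must be converted
df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
df$amount <- as.numeric(df$amount)
print(df)
```

To load from a file, the first argument of xmlParse would be the file path and asText would be omitted.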
Processing XML in R via XPath
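XPath expressions select nodes by their path and by conditions on their content. The sketch below uses xpathSApply from the XML package on a made-up document; the element and attribute names are assumptions for illustration.

```r
library(XML)   # assumption: the XML package is installed

xml.text <- paste0(
  '<txns>',
  '<txn id="1"><store>Boston</store><amount>19.99</amount></txn>',
  '<txn id="2"><store>Salem</store><amount>5.49</amount></txn>',
  '</txns>'
)
doc <- xmlParse(xml.text, asText = TRUE)

# Text of every <store> element under a <txn>
stores <- xpathSApply(doc, "//txn/store", xmlValue)

# The id attribute of every <txn>
ids <- xpathSApply(doc, "//txn", xmlGetAttr, "id")

# Stores of transactions whose amount is greater than 10
big <- xpathSApply(doc, "//txn[amount > 10]/store", xmlValue)

# Amount for one specific store, converted from text to a number
salem.amount <- as.numeric(
  xpathSApply(doc, "//txn[store='Salem']/amount", xmlValue)
)

print(stores)
print(ids)
print(big)
print(salem.amount)
```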
Processing XML in R via Tree
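As an alternative to XPath, the parsed document can be walked node by node from the root. This sketch visits each child of the root in a loop and copies its values into a data frame; the document and its element names are made up for illustration.

```r
library(XML)   # assumption: the XML package is installed

xml.text <- paste0(
  '<txns>',
  '<txn id="1"><store>Boston</store><amount>19.99</amount></txn>',
  '<txn id="2"><store>Salem</store><amount>5.49</amount></txn>',
  '</txns>'
)
doc  <- xmlParse(xml.text, asText = TRUE)
root <- xmlRoot(doc)
n    <- xmlSize(root)   # number of <txn> children

# Pre-allocate the result, then fill one row per child node
df <- data.frame(id = integer(n), store = character(n), amount = numeric(n),
                 stringsAsFactors = FALSE)

for (i in seq_len(n)) {
  node <- root[[i]]                                   # i-th child of the root
  df$id[i]     <- as.integer(xmlGetAttr(node, "id"))  # attribute value
  df$store[i]  <- xmlValue(node[["store"]])           # child element by name
  df$amount[i] <- as.numeric(xmlValue(node[["amount"]]))
}

print(df)
```

Tree traversal is more verbose than XPath but gives full control when the structure is irregular.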