R is a widely used programming language and software environment for statistical analysis, reporting, and graphical display of data. R is developed by its community under the GNU General Public License (GPL), and its source code is freely available. That free availability gives developers greater flexibility when they build software in R. Let’s see how to use R with its package library to open a file stored on GitHub.
“How To Open Source Code In R” is a concisely written book that teaches readers how to access and work with online open source code repositories. It helps people working in the R programming environment (RStudio) learn how to use Git and GitHub and begin to integrate them into their own programming environments.
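As a minimal sketch of opening a file stored on GitHub from R: public repositories serve file contents from raw.githubusercontent.com, so base R can read them by URL. The helper name github_raw_url and the owner, repository, and file path below are placeholders made up for illustration, not anything from the text above.

```r
# Hypothetical helper: build the raw-content URL for a file in a public GitHub repo.
github_raw_url <- function(owner, repo, branch, path) {
  paste("https://raw.githubusercontent.com", owner, repo, branch, path, sep = "/")
}

# "example-user", "example-repo", and "data/sample.csv" are placeholders.
raw_url <- github_raw_url("example-user", "example-repo", "main", "data/sample.csv")
print(raw_url)

# With a real repository and a network connection, base R reads the file directly:
# df <- read.csv(raw_url)
# head(df)
```

The download lines are left commented out because they need a real repository and network access.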
Installing R varies slightly depending on your operating system or distribution. Refer to the installation guide on the Comprehensive R Archive Network (CRAN) website. CRAN offers detailed instructions for installing R on various Linux distributions (including Fedora, RHEL, and derivatives), macOS, and Windows.
I was using Ubuntu and, as specified at CRAN, added the following line to my APT sources list:
deb https://<my.favorite.cran.mirror>/bin/linux/ubuntu artful/
Then I ran the following commands in the terminal:
$ sudo apt-get update
$ sudo apt-get install r-base
According to CRAN, “Users who need to compile R packages from source [e.g. package maintainers, or anyone installing packages with install.packages()] should also install the
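Once the installation finishes, a few base R calls can confirm that it works. This is only a quick sanity check, not part of the CRAN instructions; the variable name has_stats is made up for the example.

```r
# Report which R version is running.
cat(R.version.string, "\n")

# Show the directories where packages are installed.
print(.libPaths())

# Check that a package is available without attaching it.
# "stats" ships with every R installation, so this should be TRUE.
has_stats <- requireNamespace("stats", quietly = TRUE)
print(has_stats)

# Installing an add-on package from source (needs a compiler toolchain):
# install.packages("some_package", type = "source")   # placeholder package name
```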
Organization of source code
Many different programs exist to create source code. Here is an example of the source code for a Hello World program in C language:
/* Hello World program */
#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}
Even a person with no background in programming can read the C source code above and understand that the goal of the program is to print the words “Hello World.” To carry out the instructions, however, this source code must first be translated into a machine language that the computer’s processor can understand. That is the job of a translation program called a compiler, in this case a C compiler.
After programmers compile source code, the file that contains the resulting output is referred to as object code.
Object code consists of binary machine instructions (ones and zeros) and cannot be easily read or understood by humans. Object code can then be “linked” to create an executable file that runs to perform the specific program functions.
Source code management systems can help programmers better collaborate on source code development; for example, preventing one coder from inadvertently overwriting the work of another.
Using R and RStudio
Once I installed R, I was ready to learn more about using this powerful tool. Dr. Gallagher recommended “Start learning R” on DataCamp, and I also found a free course for R newbies on Code School. Both courses helped me learn R’s commands and syntax. I also enrolled in an online course in R programming at Udemy and purchased the Book of R from No Starch Press.
After more reading and watching YouTube videos, I realized I should also install RStudio. RStudio is an open source IDE for R that’s easy to install on Debian, Ubuntu, Fedora, and RHEL. It can also be installed on macOS and Windows.
According to the RStudio website, the IDE can be customized to your preferences by selecting the Tools menu and, from there, Global Options.
R provides some great demonstration examples that can be accessed from the console by entering demo() at the prompt. The demo(persp) demonstration provides a great illustration of the power of R. I experimented with some simple vectors and plotting at the command line in the R console.
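A first console experiment with vectors and plotting might look like the following sketch. The values are arbitrary, and the plot call is guarded so that it only runs in an interactive session.

```r
# Create a numeric vector and derive a second one from it.
x <- c(1, 2, 3, 4, 5)
y <- x^2

# Vectorized arithmetic and basic statistics work element-wise.
print(y)
print(mean(y))

# Draw a simple line-and-point plot when a graphics window is available.
if (interactive()) {
  plot(x, y, type = "b", main = "y = x^2", xlab = "x", ylab = "y")
}
```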
You may want to start learning ways to use R with some sample data, then later apply that knowledge to yield descriptive statistics on your own data. Not having an abundance of data of my own to analyze, I searched for datasets that I could use; one such source (which I didn’t use for this example) is economic research data provided by the Federal Reserve Bank of St. Louis. I was intrigued by a dataset I found titled “Monthly Airline Passenger Numbers 1949-1960,” so I imported it into RStudio to test out the IDE’s capabilities. RStudio can accept data in a variety of formats, including CSV, Excel, SPSS, and SAS.
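For CSV files, the same import can be done from the console with base R. The sketch below writes the built-in AirPassengers data to a temporary CSV file and reads it back, so it runs without any external file. The column name passengers and the temporary path are choices made for this example.

```r
# Write the built-in dataset to a temporary CSV file.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(passengers = as.numeric(AirPassengers)),
          csv_path, row.names = FALSE)

# Import the CSV file into a data frame, as RStudio's import dialog would.
imported <- read.csv(csv_path)

# Inspect the structure and the first few rows.
str(imported)
head(imported)
```

Excel, SPSS, and SAS files need add-on packages, which is what RStudio's import dialog installs and calls for you.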
Once the data was imported, I used the summary(AirPassengers) command to get some initial descriptive statistics. After pressing Enter, I got a summary of the monthly airline passenger counts from 1949 to 1960: the minimum, first quartile, median, mean, third quartile, and maximum.
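The summary can also be stored and queried by name, which is handy in scripts. A short sketch using the built-in AirPassengers dataset (the variable name stats is arbitrary):

```r
# Compute the six-number summary and keep it in a variable.
stats <- summary(AirPassengers)
print(stats)

# Pull out individual values by name.
print(stats[["Min."]])
print(stats[["Mean"]])
print(stats[["Max."]])

# The quartiles are also available directly from quantile().
print(quantile(AirPassengers, probs = c(0.25, 0.75)))
```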
I knew from my summary statistics that the mean of this sample of airline passengers is 280.3. Entering sd(AirPassengers) at the console yields the standard deviation, about 119.97.
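To see what sd() computes, the sample standard deviation can be worked out by hand and compared with the built-in result. This is a sketch of the textbook formula (divide by n - 1), with made-up variable names:

```r
# Treat the time series as a plain numeric vector.
x <- as.numeric(AirPassengers)
n <- length(x)

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by (n - 1).
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))

# Both values should agree.
print(manual_sd)
print(sd(AirPassengers))
```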
I next generated a histogram of my data, which shows this dataset graphically, by entering hist(AirPassengers). RStudio can export the plot as a PNG, PDF, JPEG, TIFF, SVG, EPS, or BMP.
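The same export can be scripted instead of using the RStudio menu. The sketch below first computes the histogram bins without drawing, then writes the plot to a PDF in the temporary directory. The file name, title, and axis label are choices made for this example.

```r
# Compute the histogram without drawing it, to inspect bins and counts.
h <- hist(AirPassengers, plot = FALSE)
print(h$breaks)
print(h$counts)

# Open a PDF graphics device, draw the histogram into it, and close the device.
plot_path <- file.path(tempdir(), "air_passengers_hist.pdf")
pdf(plot_path)
hist(AirPassengers,
     main = "Monthly airline passengers, 1949-1960",
     xlab = "Passengers (thousands)")
invisible(dev.off())

print(plot_path)
```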
In addition to generating statistics and graphical data, R keeps a history of all my operations. This enables me to return to a previous operation, and I can save this history for future reference.
In RStudio’s script editor, I can write a script of all the commands that I issue, then save that script to run again if my data changes or I want to revisit it.
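A saved script can be rerun with source(). As a sketch, the example below writes a two-line script to a temporary file and runs it in its own environment, so that the variables it creates do not clutter the workspace. The file and variable names are invented for illustration.

```r
# Write a small analysis script to a temporary file.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "passenger_mean <- mean(AirPassengers)",
  "passenger_sd <- sd(AirPassengers)"
), script_path)

# Run the script inside a fresh environment and read the results back.
results <- new.env()
source(script_path, local = results)
print(results$passenger_mean)
print(results$passenger_sd)

# In consoles that support it, utils::savehistory() writes the command
# history to a file for later reference.
```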
R Foundation’s Software Development Life Cycle (SDLC)
The R Core team develops, releases, and maintains the R code. R Core members come from many statistical backgrounds and are located all over the world.
Because R is open source, all of its source code is available for review by the user community, which is estimated to number between the tens and hundreds of thousands. As a result, its functionality is subject to constant evaluation and improvement. This amount of real-world testing is unusual and lends itself to a high-quality product.
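Much of that source code can be read without leaving the console: printing a function written in R shows its definition. A brief sketch using sd() and the summary method for data frames (the variable names are arbitrary):

```r
# Printing a function (or typing its name without parentheses) shows its source.
print(sd)

# The argument list and body are also available programmatically.
sd_args <- names(formals(sd))
sd_body <- deparse(body(sd))
print(sd_args)
print(sd_body)

# getS3method() looks up the method that a generic dispatches to for a class.
summary_df <- utils::getS3method("summary", "data.frame")
print(class(summary_df))
```

Functions implemented in C show only a call into compiled code. For those, the full source is in the R source distribution.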
Source Code Management
R’s source code is managed in a Subversion (SVN) repository with write access limited to the R Core team. R Core defines procedures to protect the source code and the hosting server including:
- Maintaining separate source code branches for the Release Branch and the Development Version
- Logging code changes daily within the SVN repository
- Maintaining a “NEWS” file that allows users to track all changes made to R
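Some of this bookkeeping is visible from within R itself. As a small sketch, the running build reports the SVN revision it was built from, and the NEWS entries can be queried from the console (the variable name svn_revision is arbitrary):

```r
# The running R build records its version and the SVN revision it came from.
svn_revision <- R.version[["svn rev"]]
cat("R version:", R.version.string, "\n")
cat("Built from SVN revision:", svn_revision, "\n")

# utils::news() returns the NEWS database for R; print it when working interactively.
if (interactive()) {
  print(utils::news())
}
```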
Testing and validation
R Core maintains and updates a set of validation tests. These tests run the source code against known data and compare the output with known results. All errors found during testing are fixed before release.
These tests are available to end users to ensure the validation of their R installation.
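The idea of checking code against known data and known results can be illustrated in a few lines. The helper check_known_results below is hypothetical and is not part of R's own test suite; it only mirrors the approach on a tiny hand-computed example.

```r
# Hypothetical mini-check: compare R's output with hand-computed values.
check_known_results <- function() {
  x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # known data: sum 40, squared deviations sum to 32
  stopifnot(
    isTRUE(all.equal(mean(x), 5)),
    isTRUE(all.equal(var(x), 32 / 7)),
    isTRUE(all.equal(sd(x), sqrt(32 / 7)))
  )
  TRUE
}

print(check_known_results())

# R's bundled tests can be run with tools::testInstalledBasic("basic"),
# provided the tests directory was installed with R.
```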
The R Foundation monitors feedback from users through the r-devel e-mail list and the R Bug Tracking System. This process allows for more extensive testing and increases the likelihood that bugs are fixed before releases.
The R Foundation provides more information in their documentation, including:
- Release cycle description
- Maintenance, support, and retirement details of R
- Qualifications of R Core members
- Physical security
- IT security
- Disaster recovery plans
- Responses to various sections of 21 CFR Part 11
Licensing of source code
When a user installs a software suite like Microsoft Office, for example, the source code is proprietary, and Microsoft only gives the customer access to the software’s compiled executables and the associated library files that various executable files require to call program functions.
By comparison, when a user installs Apache OpenOffice, its open source software code can be downloaded and modified.
Typically, proprietary software vendors like Microsoft don’t share source code with customers for two reasons: to protect intellectual property and to prevent the customer from making changes to source code in a way that might break the program or make it more vulnerable to attack. Proprietary software licenses often prohibit any attempt to discover or modify the source code.
Open source software (OSS), on the other hand, is purposely designed with the idea that source code should be made available because the collaborative effort of many developers working to enhance the software can, presumably, help make it more robust and secure. Users can freely use, modify, and redistribute open source code under public licenses, such as the GNU General Public License.
Purposes of source code
Beyond providing the foundation for software creation, source code has other important purposes, as well. For example, skilled users who have access to source code can more easily customize software installations, if needed.
Meanwhile, other developers can use source code to create similar programs for other operating platforms — a task that would be trickier without the coding instructions.
Access to source code also allows programmers to contribute to their community, either through sharing code for learning purposes or by recycling portions of it for other applications.
Help can easily be found by entering help() at the R prompt. Specific help information can be found by entering the topic you are looking for, e.g., help(sd) for help with standard deviation. Information on contributors to the R project can be obtained by entering contributors() at the prompt. You can find out how to cite R by entering citation() at the prompt. License information for R can be easily obtained by entering license() at the prompt.
R is distributed under the terms of the GNU General Public License, either Version 2, June 1991, or Version 3, June 2007. For more information about licensing R, refer to the R Project website.
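The citation and help information can also be used programmatically. As a sketch, the citation object converts to BibTeX for a reference manager, while the help and license pages are opened only in an interactive session (the variable names are arbitrary):

```r
# citation() returns a bibliographic entry object describing how to cite R.
cit <- citation()
print(cit)

# Convert the entry to BibTeX, e.g. for use in a LaTeX document.
bib <- toBibtex(cit)
print(bib)

# Help and license pages open in a viewer, so call them interactively.
if (interactive()) {
  help(sd)
  license()
}
```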
In addition, RStudio provides an excellent Help menu within the GUI. This area includes links to an RStudio cheat sheet (which can be downloaded as a PDF), online learning at RStudio, RStudio documentation, support, and license information.
Are you interested in open source code? This post has covered what source code is and how to work with it in R.