How To Open Source Code In R

R is a widely used programming language and software environment for statistical analysis, reporting, and graphical display of data. R is developed by its community and distributed under the GNU General Public License (GPL), so its source code is freely available. Free access to the source code gives developers greater flexibility when building software in R. Let’s see how to use R and its package library to open a file stored on GitHub.

“How To Open Source Code In R” is a concise guide that teaches readers how to use computers to access and work with online open source code repositories. It is meant to help people working in the R programming environment (for example, in the RStudio IDE) learn how to use Git and GitHub and begin integrating them into their own programming workflows.

Installing R

Installing R varies slightly depending on your operating system or distribution. Refer to the installation guide found at the Comprehensive R Archive Network (CRAN) website. CRAN offers detailed instructions for installing R on various Linux distributions (including Fedora, RHEL, and derivatives), MacOS, and Windows.

I was using Ubuntu and, as specified at CRAN, added the following line to my /etc/apt/sources.list file:

deb https://<my.favorite.cran.mirror>/bin/linux/ubuntu artful/

Then I ran the following commands in the terminal:

$ sudo apt-get update
$ sudo apt-get install r-base

According to CRAN, “Users who need to compile R packages from source [e.g. package maintainers, or anyone installing packages with install.packages()] should also install the r-base-dev package.”

Organization of source code

Source code can be written in many different programming languages. Here is an example of the source code for a Hello World program in the C language:

/* Hello World program */

#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}

Even a person with no background in programming can read the C source code above and understand that the goal of the program is to print the words “Hello World.” Before the computer can carry out these instructions, however, the source code must be translated into machine code that the processor can understand. That is the job of a special program called a compiler; in this case, a C compiler.

After programmers compile source code, the file that contains the resulting output is referred to as object code.

Object code consists of binary machine instructions (ones and zeros) and cannot easily be read or understood by humans. Object code can then be “linked” to create an executable file that runs to perform the program’s functions.
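R takes a different route from the compiled C program above: most of R’s own functions are written in R, so you can open their source code straight from the console. The minimal sketch below uses only base R functions (body() and deparse()). The exact source text printed varies between R versions, and some functions, such as var(), hand their core computation to compiled C code through .Call(), so only the R wrapper is visible this way.

```r
# Typing a function's name without parentheses prints its R source code
sd

# body() extracts the function body; deparse() turns it into lines of text
sd_source <- deparse(body(sd))
cat(sd_source, sep = "\n")

# var() does its arithmetic in compiled C code reached through .Call(),
# so its R source shows the call into C but not the C implementation
var_source <- deparse(body(var))
any(grepl(".Call", var_source, fixed = TRUE))
```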

Source code management systems can help programmers better collaborate on source code development; for example, preventing one coder from inadvertently overwriting the work of another.

Using R and RStudio

Once I installed R, I was ready to learn more about using this powerful tool. Dr. Gallagher recommended “Start learning R” on DataCamp, and I also found a free course for R newbies on Code School. Both courses helped me learn R’s commands and syntax. I also enrolled in an online course in R programming at Udemy and purchased The Book of R from No Starch Press.

After more reading and watching YouTube videos, I realized I should also install RStudio. RStudio is an open source IDE for R that’s easy to install on Debian, Ubuntu, Fedora, and RHEL. It can also be installed on MacOS and Windows.

According to the RStudio website, the IDE can be customized to your preferences by selecting the Tools menu and, from there, Global Options.

[Figure: RStudio global options]

R provides some useful demonstrations, which can be listed from the console by entering demo() at the prompt. The demo(plotmath) and demo(persp) demonstrations are good illustrations of R’s graphics capabilities. I experimented with some simple vectors and plotting at the command line in the R console, as shown below.

[Figure: Plotting vectors]
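A few vector operations and a plot along the lines of the console experiments described above can be sketched as follows. The vectors x and y are arbitrary example values, not data from this article. Outside RStudio, a non-interactive R session writes the plot to a file (Rplots.pdf by default) instead of displaying it.

```r
# c() combines values into a numeric vector (example values only)
x <- c(1, 2, 3, 4, 5)
y <- x^2   # arithmetic is vectorized: every element is squared

# Summaries work on the whole vector at once
print(y)
print(sum(y))

# Plot y against x, drawing both points and connecting lines (type = "b")
plot(x, y, type = "b", main = "y = x^2", xlab = "x", ylab = "y")
```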

You may want to start learning ways to use R with some sample data, then later apply that knowledge to yield descriptive statistics on your own data. Not having an abundance of data of my own to analyze, I searched for datasets that I could use; one such source (which I didn’t use for this example) is economic research data provided by the Federal Reserve Bank of St. Louis. I was intrigued by a dataset I found titled “Passenger Miles on Commercial US Airlines, 1937-1960,” so I imported it into RStudio to test out the IDE’s capabilities. RStudio can accept data in a variety of formats, including CSV, Excel, SPSS, and SAS.

[Figure: Importing data into RStudio]
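RStudio’s Import Dataset dialog is a point-and-click front end; in code, a CSV file can be read with base R’s read.csv(). To keep this sketch self-contained, it writes R’s built-in AirPassengers series to a temporary CSV file and reads it back, standing in for a downloaded file (the Federal Reserve dataset itself is not used here, and the column names month and passengers are illustrative). Excel, SPSS, and SAS files need add-on packages such as readxl or haven, which are not shown.

```r
# Turn the built-in AirPassengers time series into a plain data frame
ap <- data.frame(
  month = seq_along(AirPassengers),
  passengers = as.numeric(AirPassengers)
)

# Write it to a temporary CSV file, then read it back in,
# just as you would read a CSV file you had downloaded
csv_path <- tempfile(fileext = ".csv")
write.csv(ap, csv_path, row.names = FALSE)
imported <- read.csv(csv_path)

str(imported)   # column names and types of the imported data
head(imported)  # first six rows
```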

Once the data was imported, I used the summary(AirPassengers) command to get some initial descriptive statistics. (AirPassengers is also included with R as a built-in dataset in the datasets package, containing monthly totals of international airline passengers for 1949-1960.) After I pressed Enter, R reported the minimum, first quartile, median, mean, third quartile, and maximum number of monthly air passengers.

[Figure: Summary data on air passengers]
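The statistics reported by summary() can also be computed one at a time with base R functions, which is handy when a script needs a single value. This sketch assumes the built-in AirPassengers dataset is available, as it is in a standard R installation.

```r
# summary() reports Min, 1st Qu., Median, Mean, 3rd Qu., and Max
ap_stats <- summary(AirPassengers)
print(ap_stats)

# The same statistics computed individually on the plain numeric values
ap <- as.numeric(AirPassengers)
min(ap)
quantile(ap, probs = c(0.25, 0.5, 0.75))  # first quartile, median, third quartile
mean(ap)
max(ap)
```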

I knew from my summary statistics that the mean of this sample of airline passengers is 280.3. Entering sd(AirPassengers) at the console yields the standard deviation, seen here in the RStudio console:

[Figure: Standard deviation on air passenger data]
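sd() returns the sample standard deviation, which divides the sum of squared deviations from the mean by n - 1 rather than n. As a quick check of what the function computes, this sketch derives the same value by hand from that formula and compares it with sd(AirPassengers).

```r
x <- as.numeric(AirPassengers)
n <- length(x)

# Sample standard deviation: square root of the summed squared
# deviations from the mean, divided by n - 1
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))

builtin_sd <- sd(AirPassengers)
print(c(manual = manual_sd, builtin = builtin_sd))
all.equal(manual_sd, builtin_sd)
```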

Next, I generated a histogram by entering hist(AirPassengers), which displays the distribution of the data graphically. RStudio can export the resulting plot as a PNG, PDF, JPEG, TIFF, SVG, EPS, or BMP file.

[Figure: Histogram of air passenger data]
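RStudio’s Export button is the interactive route; from code, the same result comes from opening a graphics device, drawing the plot, and closing the device. This is a minimal sketch assuming a temporary output path and using pdf(). The png(), jpeg(), tiff(), bmp(), and svg() devices follow the same open, draw, close pattern, although some of them depend on how R was built on your system.

```r
# Open a PDF graphics device that writes to a temporary file
out_file <- file.path(tempdir(), "air_passengers_hist.pdf")
pdf(out_file, width = 7, height = 5)

# Draw the histogram; hist() also returns the bin breaks and counts
h <- hist(AirPassengers,
          main = "Monthly airline passengers, 1949-1960",
          xlab = "Passengers (thousands)")

# Close the device so the file is written to disk
invisible(dev.off())

print(h$breaks)
print(h$counts)
file.exists(out_file)
```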

In addition to generating statistics and graphical data, R keeps a history of all my operations. This enables me to return to a previous operation, and I can save this history for future reference.

[Figure: History of commands]
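From code, the command history can be saved and restored with savehistory() and loadhistory() from the utils package. These functions need an interactive session with a history mechanism (such as the R console or RStudio), so this sketch guards them with interactive(); the file name is a hypothetical example.

```r
# Hypothetical location for the saved history file
history_file <- file.path(tempdir(), "air_passengers.Rhistory")

# Only attempt this in an interactive session that keeps a history
if (interactive()) {
  savehistory(history_file)     # write the commands entered so far
  # Later, in a new session, restore them with:
  # loadhistory(history_file)
}
```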

In RStudio’s script editor, I can write a script of all the commands that I issue, then save that script to run again if my data changes or I want to revisit it.

[Figure: RStudio script editor]
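A saved script can be re-run with source(). In this sketch, the script is written with writeLines() only to keep the example self-contained; in practice you would save it from the script editor. The file name and the object names ap_summary and ap_sd are illustrative.

```r
# Write a small analysis script to a file (hypothetical file name)
script_file <- file.path(tempdir(), "air_passengers_analysis.R")
writeLines(c(
  "ap_summary <- summary(AirPassengers)",
  "ap_sd <- sd(AirPassengers)",
  "print(ap_summary)",
  "print(ap_sd)"
), script_file)

# Re-run every command in the script; echo = TRUE prints each command
source(script_file, echo = TRUE)
```

Because source() evaluates the script in the global environment by default, ap_summary and ap_sd remain available afterward.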

R Foundation’s System Development Life Cycle (SDLC)

Operational Overview

The R Core team develop, release, and maintain R code. R Core members come from many statistical backgrounds, and are located all over the world.

Since R is open source, all of its source code is available for review by members of the user community, which is estimated to number in the tens to hundreds of thousands. As a result, R’s functionality is subject to constant evaluation and improvement. This amount of real-world testing is unusual and contributes to a high-quality product.

Source Code Management

R’s source code is managed in a Subversion (SVN) repository with write access limited to the R Core team. R Core defines procedures to protect the source code and the hosting server including:

  • Maintaining separate source code branches for the Release Branch and the Development Version
  • Logging code changes daily within the SVN repository
  • Maintaining a “NEWS” file that allows users to track all changes made to R
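The NEWS entries can also be queried from the R console with utils::news(), which returns a data frame whose columns include Version, Category, Date, and Text. This is a minimal sketch assuming your installation ships R’s NEWS database; tryCatch() guards against installations that do not.

```r
# Query R's own NEWS database (news() looks up R itself by default)
db <- tryCatch(utils::news(), error = function(e) NULL)

if (!is.null(db)) {
  db <- as.data.frame(db)   # drop the news_db class for plain subsetting
  current <- as.character(getRversion())
  this_release <- db[which(db$Version == current), c("Category", "Text")]
  cat(nrow(this_release), "NEWS entries for R", current, "\n")
}
```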

Testing and validation

R Core maintains and updates a set of validation tests, which check the source code against known data and known results. All errors found during testing are fixed before release.

These tests are also available to end users, who can run them to validate their own R installation.
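Base R exposes these checks through functions in the tools package, such as tools::testInstalledBasic() and tools::testInstalledPackages(). This is a minimal sketch assuming the test files were installed along with R (some binary distributions omit them). The run_validation switch is a hypothetical guard so that the tests, which take a while, run only on request.

```r
# Hypothetical switch: set to TRUE to actually run the installed tests
run_validation <- FALSE

if (run_validation) {
  # Fix locale-dependent settings so output matches the saved results
  Sys.setenv(LC_COLLATE = "C", LC_TIME = "C", LANGUAGE = "en")
  tools::testInstalledBasic("both")               # basic regression tests
  tools::testInstalledPackages(scope = "base")    # tests for base packages
}
```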

The R Foundation monitors user feedback through the r-devel mailing list and the R Bug Tracking System. This process allows for more extensive testing and increases the likelihood that bugs are fixed before each release.

More information

The R Foundation provides more information in their documentation, including:

  • Release cycle description
  • Maintenance, support, and retirement details of R
  • Qualifications of R Core members
  • Physical security
  • IT security
  • Disaster recovery plans
  • Responses to various sections of 21 CFR Part 11

Licensing of source code

Source code can be proprietary or open, and licensing agreements often reflect this distinction.

When a user installs a software suite like Microsoft Office, for example, the source code is proprietary, and Microsoft only gives the customer access to the software’s compiled executables and the associated library files that various executable files require to call program functions.

By comparison, when a user installs Apache OpenOffice, its open source software code can be downloaded and modified.

Typically, proprietary software vendors like Microsoft don’t share source code with customers for two reasons: to protect intellectual property and to prevent the customer from making changes to source code in a way that might break the program or make it more vulnerable to attack. Proprietary software licenses often prohibit any attempt to discover or modify the source code.

Open source software (OSS), on the other hand, is purposely designed with the idea that source code should be made available because the collaborative effort of many developers working to enhance the software can, presumably, help make it more robust and secure. Users can freely use, modify, and redistribute open source code under the terms of public licenses such as the GNU General Public License.

Purposes of source code

Beyond providing the foundation for software creation, source code has other important purposes, as well. For example, skilled users who have access to source code can more easily customize software installations, if needed.

Meanwhile, other developers can use source code to create similar programs for other operating platforms — a task that would be trickier without the coding instructions.

Access to source code also allows programmers to contribute to their community, either by sharing code for learning purposes or by reusing portions of it in other applications.

Getting help

Help is available by entering help() at the R prompt. For help on a specific topic, pass the topic name to help(); for example, help(sd) opens the documentation for the standard deviation function. Entering contributors() lists the contributors to the R project, citation() shows how to cite R, and license() displays R’s license information.
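The help and citation commands described above can be combined in a short session. In this sketch, help() and help.search() are wrapped in interactive() because they open help pages meant for an interactive session, while citation() returns an object that can be printed as plain text or converted to BibTeX with toBibtex().

```r
# Open the help page for sd(); ?sd is shorthand for the same call
if (interactive()) help(sd)

# Search all installed help pages for a phrase
if (interactive()) help.search("standard deviation")

# citation() returns a bibentry object describing how to cite R
cit <- citation()
print(cit, style = "text")   # plain-text reference
toBibtex(cit)                # BibTeX entry for reference managers
```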

R is distributed under the terms of the GNU General Public License, either Version 2, June 1991, or Version 3, June 2007. For more information about licensing R, refer to the R Project website.

In addition, RStudio provides an excellent Help menu within the GUI. This area includes links to an RStudio cheat sheet (which can be downloaded as a PDF), online learning at RStudio, RStudio documentation, support, and license information.

Conclusion

Are you interested in open source code? This post has covered what source code is and how to start working with it in R.
