Using the Open Source Code Node
When running the Open Source Code node, additional Python or R code is generated and added before and after your code. The precursor code creates target, input, and partition variables; input handles; and other necessary data items. The posterior code, when applicable, converts the scored data frame to a CSV file.
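As a rough illustration of what the posterior step does for Python code, the sketch below writes a scored data frame out as node_scored.csv. The variable name dm_scoreddf and the file name node_scored.csv come from this document; the row values are made up for illustration, and the real generated code may differ in detail.

```python
import csv

# Hypothetical stand-in for the scored data frame (dm_scoreddf);
# in Model Studio the precursor code builds the real one from the
# node's input data. These probability values are invented.
dm_scoreddf = [
    {"P_BAD0": 0.18, "P_BAD1": 0.82},
    {"P_BAD0": 0.93, "P_BAD1": 0.07},
]

# Posterior step: persist the scored rows as node_scored.csv,
# mirroring the conversion described above.
with open("node_scored.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["P_BAD0", "P_BAD1"])
    writer.writeheader()
    writer.writerows(dm_scoreddf)
```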
You can use the Open Source Code node to produce a custom model that is incorporated into your pipeline. To indicate that your Code node will produce a model that should be assessed and processed by the Model Comparison node:
- Right-click the node, and select Move.
- Select Supervised Learning.
- The Open Source Code node is now connected to a Model Comparison node.
In order for the model produced by the Open Source Code node to be assessed, it must produce a scored data set with the appropriate prediction variable names. Otherwise, no assessment reports are generated. To ensure that reports are generated, verify that the following criteria are met:
- If the Generate data frame property is selected, the scored data set must be saved to dm_scoreddf. Otherwise, it must be saved to node_scored.csv.
- The prediction variables in the scored data set must be named according to the following convention:
- Interval targets must be named P_<targetVariableName>. For example, if MSRP is the target variable then the prediction variable must be P_MSRP.
- Categorical targets must be named P_<targetVariableName><targetLevel>. All target level properties must be computed. For example, if BAD is the target variable with levels 0 and 1 then the posterior probabilities should be P_BAD0 and P_BAD1.
- The scored data set must contain the same number of observations as the input data set.
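The naming convention above can be captured in a small helper. This is an illustrative sketch, not part of Model Studio: the function name is hypothetical, but the names it produces follow the convention described in the criteria.

```python
def prediction_names(target, levels=None):
    """Return the prediction variable names the naming convention expects.

    An interval target gets a single P_<target> column; a categorical
    target gets one P_<target><level> column per target level.
    """
    if levels is None:  # interval target
        return ["P_" + target]
    return ["P_%s%s" % (target, lvl) for lvl in levels]

print(prediction_names("MSRP"))         # interval target
print(prediction_names("BAD", [0, 1]))  # categorical target
```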
When an open-source model is the project champion, a classification variable in the form I_<targetVariableName> must be created. The classification variable is necessary to generate the variable importance plot in the Insights tab. The classification variable can be identified with the variable dm_classtarget_intovar. The various levels of the target variable are defined in dm_classtarget_level. The I_<targetVariableName> variable is also required to produce the Event Classification chart and the Nominal Classification chart on the Assessment tab of the Results window.
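One common way to derive the classification variable is to pick the target level with the highest posterior probability for each observation. The sketch below shows that idea in Python; the column and level names follow the conventions described above, while the probability values are invented for illustration.

```python
# Target levels, analogous to what dm_classtarget_level provides.
levels = ["0", "1"]

# Hypothetical scored rows with posterior probability columns.
rows = [
    {"P_BAD0": 0.18, "P_BAD1": 0.82},
    {"P_BAD0": 0.93, "P_BAD1": 0.07},
]

# I_BAD is the level whose posterior probability is largest.
for row in rows:
    row["I_BAD"] = max(levels, key=lambda lvl: row["P_BAD" + lvl])

print([r["I_BAD"] for r in rows])
```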
Create the Project and Import the Input Data
This example assumes that you are signed in to SAS Drive. To create the project that you use in this example:
- In the upper left corner of the SAS Drive window, click , and select Build Models.
- Select New Project in the upper right corner of the page.
- Enter Open Source Code Node Example for Name in the New Project window.
- Select Data Mining and Machine Learning for Type.
- Ensure that Blank Template is specified for Template.
- In the Data field, select Browse. The Choose Data window appears.
- If HMEQ is listed on the Available tab of the Choose Data window, select the HMEQ data set and click OK. If the HMEQ data set is not listed on the Available tab, import it:
- Click the Import tab.
- Select Local files > Local file.
- In the Open window, navigate to the location where you saved the HMEQ data set and select hmeq.csv.
- Click Open.
- In the upper right corner, click Import Item.
- After the data set is successfully imported, click OK.
- Click the Advanced button below the Description text box, and the New Project Settings window appears. Select Partition Data in the upper left corner of the window.
- Ensure that the Create partition variable option is selected, and click Save in the lower right corner of the window. This brings you back to the New Project window.
- In the lower right corner of the New Project window, click Save. You are redirected to the Data tab, where you can modify the variables in your data set.
On the Data tab, variable roles are indicated in the Role column. To change the role of a variable:
- Select a variable by clicking the corresponding check box to the left of the Variable Name column. The options pane for the selected variable appears to the right of the variable table on the Data tab.
- Expand the drop-down list under Role, and select the role type that you want to assign to the selected variable. Changes made to each variable are automatically applied and saved. CAUTION: To avoid making unwanted changes to variable properties, you must manually deselect each variable that you modify when you are finished making changes to its properties.
- Using the steps above, adjust the Role property for each of the following variables:
- Set BAD to Target.
- Ensure that all other variables are set to Input.
Create the Pipeline
This example requires you to complete the steps in the previous sections. This example also assumes that you have not created any other pipelines before starting this section. To create the pipeline that contains an open-source model:
- Navigate to the Pipelines tab. This tab should contain a single pipeline with only a Data node.
- Right-click the Data node and select Add child node > Data Mining Preprocessing > Imputation.
- Right-click the Imputation node and select Add child node > Supervised Learning > Logistic Regression.
- Right-click the Data node and select Add child node > Data Mining Preprocessing > Imputation. This ensures that all missing data is imputed because some open-source packages cannot handle missing data.
- Right-click the Imputation node that you added in step 4 and select Add child node > Miscellaneous > Open Source Code.
- Right-click the Open Source Code node and select Move > Supervised Learning. This ensures that the node performs model assessment and can be compared to the Logistic Regression node. Your current pipeline should resemble the following image.
- Select the Imputation node that is above the Open Source Code node. In the options pane, select Impute non-missing variables.
- Select the Open Source Code node. In the options pane, set the value of Language to R.
- In the options pane, click Open Code Editor. Enter the following code in the code editor:
```r
# Build logistic regression model
dm_model <- glm(BAD ~ IMP_CLAGE + IMP_CLNO + IMP_DEBTINC + IMP_LOAN +
                  IMP_MORTDUE + IMP_VALUE + IMP_YOJ,
                data = dm_traindf, family = binomial(link = "logit"))

# View model output
summary(dm_model)

# Score
pred <- plogis(predict(dm_model, dm_inputdf))
dm_scoreddf <- data.frame(pred)
colnames(dm_scoreddf) <- c("P_BAD1")
dm_scoreddf$P_BAD0 <- 1 - dm_scoreddf$P_BAD1
```
- In the upper right corner of the code editor, click .
- Click Close.
- Right-click the Model Comparison node and select Run.
- After the pipeline has successfully run, right-click the Open Source Code node and select Results.
- Expand the R code results to view the actual code that was generated by Model Studio and submitted. Notice that this code is a combination of precursor, user, and posterior code. The precursor and posterior code is added based on the node properties and whether the node is in the Preprocessing or Supervised Learning group.
- Expand the R Output results to view the input variables that are significant in the model.
- On the Assessment tab, notice that assessment measures such as lift and ROC were computed for the open-source model.
- Close the Results window.
- Right-click the Model Comparison node and select Results. The SAS logistic regression model was chosen as the champion model based on having a better Kolmogorov-Smirnov statistic. The Assessment tab lets you compare the results of the open-source model against the SAS logistic regression model.
- Close the Results window. The example is now complete.