From 1cfcf901715aef6c566d3eac024357a2ad053feb Mon Sep 17 00:00:00 2001 From: Tushar Banik Date: Thu, 28 Mar 2024 19:31:30 +0530 Subject: [PATCH] update easy test analysis --- output/easy/README.md | 101 ++++++++++++++++++++++----------------- output/easy/analysis.Rmd | 74 +++++++++++++++++----------- 2 files changed, 105 insertions(+), 70 deletions(-) diff --git a/output/easy/README.md b/output/easy/README.md index f9d8a96..2361862 100644 --- a/output/easy/README.md +++ b/output/easy/README.md @@ -1,6 +1,13 @@ Easy Test Analysis ================ +# Easy Test + +The easy test focuses on basic XML parsing using the xml2 package. It +involves extracting specific information from a simple XML document. The +code snippet below demonstrates how to load the xml2 package and parse a +simple XML document to extract the director name for the second movie. + ## Introduction This document demonstrates the analysis of XML data using R, focusing on @@ -10,18 +17,13 @@ manipulate XML data. ## Setting Up the Environment -## XML Data - -The XML string contains information about two movies, including their -titles, directors, release years, and genres. The structure of the XML -string is hierarchical, with each movie enclosed within `` tags. +### **Section 1: Loading Libraries and XML String** ``` r -library(xml2) +library(xml2) library(stringr) -xml_string <- c( - '', +xml_string <- c( '', '', '', 'Good Will Hunting', @@ -44,12 +46,17 @@ xml_string <- c( '') ``` -## Parsing XML Data +**Explanation:** + +- The **xml2** library is loaded to handle XML data in R. -To analyze the XML data, we first need to parse it into an R object. The -`read_xml` function from the `xml2` library is used for this purpose. -This function converts the XML string into an XML document object, which -can then be manipulated using R. +- The **stringr** library is loaded for string manipulation, though it’s + not used in this snippet. 
+ +- An XML string representing a list of movies is defined, including + details like **title**, **director**, **year**, and **genre**. + +### **Section 2: Parsing the XML Document** ``` r doc <- read_xml(paste(xml_string, collapse = '')) @@ -61,16 +68,21 @@ doc ## [1] \n Good Will Hunting\n \n Y tu mama tambien\n **Section 3: Navigating the XML Document** ``` r -mama_tambien <- xml_child(doc, search = 2) -mama_tambien +tu_mama <- xml_child(doc, search = 2) +tu_mama ``` ## {xml_node} @@ -81,7 +93,7 @@ mama_tambien ## [4] drama ``` r -xml_children(mama_tambien) +xml_children(tu_mama) ``` ## {xml_nodeset (4)} @@ -90,39 +102,31 @@ xml_children(mama_tambien) ## [3] 2001 ## [4] drama -## Displaying Results +**Explanation** -The `xml_name` function is used to display the name of the XML node, -while the `xml_attrs` function shows the attributes of the node. This -provides a clear overview of the movie’s information. +- The **xml_children** function lists all child nodes of the selected + movie node. -``` r -xml_name(mama_tambien) -``` +- The **xml_child** function is used to select a specific child node by + its index, in this case, the second movie. - ## [1] "movie" +### **Section 4: Extracting director Information** ``` r -xml_attrs(mama_tambien) +director <- xml_child(tu_mama,"director") +director ``` - ## mins lang - ## "106" "spa" - -## Extracting Director Information - -To extract information about a specific movie, we use the `xml_child` -function to select the movie by its position in the XML document. We -then use the `xml_children` function to access the child nodes of the -movie, such as the title, director, year, and genre. 
+ ## {xml_node} + ## + ## [1] Alfonso + ## [2] Cuaron ``` r -director <- xml_child(mama_tambien,"director") -director +xml_contents(director) ``` - ## {xml_node} - ## + ## {xml_nodeset (2)} ## [1] Alfonso ## [2] Cuaron @@ -131,3 +135,14 @@ xml_text(director) ``` ## [1] "AlfonsoCuaron" + +**Explanation** + +- The **xml_child** function is used again to select the “director” + child node of the selected movie. + +- The **xml_contents** function lists all nodes within the “director” + node. + +- The **xml_text** function extracts the text content of the “director” + node, providing the **director’s name**. diff --git a/output/easy/analysis.Rmd b/output/easy/analysis.Rmd index 00e1c6d..d9de480 100644 --- a/output/easy/analysis.Rmd +++ b/output/easy/analysis.Rmd @@ -3,6 +3,10 @@ title: "Easy Test Analysis" output: github_document --- +# Easy Test + +The easy test focuses on basic XML parsing using the xml2 package. It involves extracting specific information from a simple XML document. The code snippet below demonstrates how to load the xml2 package and parse a simple XML document to extract the director name for the second movie. + ## Introduction This document demonstrates the analysis of XML data using R, focusing on extracting information about movies from an XML string. The analysis leverages the `xml2` and `stringr` libraries in R to parse and manipulate XML data. @@ -13,16 +17,14 @@ This document demonstrates the analysis of XML data using R, focusing on extract knitr::opts_chunk$set(echo = TRUE) ``` -## XML Data +### [**Section 1: Loading Libraries and XML String**]{.underline} -The XML string contains information about two movies, including their titles, directors, release years, and genres. The structure of the XML string is hierarchical, with each movie enclosed within `` tags. 
+```{r message=FALSE} -```{r xml_string, echo=TRUE} -library(xml2) +library(xml2) library(stringr) -xml_string <- c( - '', +xml_string <- c( '', '', '', 'Good Will Hunting', @@ -45,41 +47,59 @@ xml_string <- c( '') ``` -## Parsing XML Data +**Explanation:** + +- The **xml2** library is loaded to handle XML data in R. + +- The **stringr** library is loaded for string manipulation, though it's not used in this snippet. + +- An XML string representing a list of movies is defined, including details like **title**, **director**, **year**, and **genre**.\ -To analyze the XML data, we first need to parse it into an R object. The `read_xml` function from the `xml2` library is used for this purpose. This function converts the XML string into an XML document object, which can then be manipulated using R. +### [**Section 2: Parsing the XML Document**]{.underline} + +```{r message=FALSE} -```{r read_xml, echo=TRUE} doc <- read_xml(paste(xml_string, collapse = '')) doc ``` -## Extracting Movie Information +**Explanation:** -To extract information about a specific movie, we use the `xml_child` function to select the movie by its position in the XML document. We then use the `xml_children` function to access the child nodes of the movie, such as the title, director, year, and genre. +- The **read_xml** function from the xml2 package is used to parse the XML string into an XML document object. -``` {r extract_movie, echo=TRUE} -mama_tambien <- xml_child(doc, search = 2) -mama_tambien -xml_children(mama_tambien) -``` +- The paste function with **collapse = ''** is used to concatenate the XML string into a single string before parsing. -## Displaying Results +- The **parsed** XML document is stored in the variable doc. -The `xml_name` function is used to display the name of the XML node, while the `xml_attrs` function shows the attributes of the node. This provides a clear overview of the movie's information. 
+### [**Section 3: Navigating the XML Document**]{.underline} -``` {r display_info, echo=TRUE} -xml_name(mama_tambien) -xml_attrs(mama_tambien) +```{r message=FALSE} + +tu_mama <- xml_child(doc, search = 2) +tu_mama +xml_children(tu_mama) ``` -## Extracting Director Information +**Explanation** -To extract information about a specific movie, we use the `xml_child` function to select the movie by its position in the XML document. We then use the `xml_children` function to access the child nodes of the movie, such as the title, director, year, and genre. +- The **xml_children** function lists all child nodes of the selected movie node. -``` {r director_info, echo=TRUE} -director <- xml_child(mama_tambien,"director") -director +- The **xml_child** function is used to select a specific child node by its index, in this case, the second movie. +### [**Section 4: Extracting director Information**]{.underline} + +```{r message=FALSE} + +director <- xml_child(tu_mama,"director") +director +xml_contents(director) xml_text(director) -``` \ No newline at end of file +``` + +**Explanation** + +- The **xml_child** function is used again to select the "director" child node of the selected movie. + +- The **xml_contents** function lists all nodes within the "director" node. + +- The **xml_text** function extracts the text content of the "director" node, providing the **director's name**. \ No newline at end of file