update easy test analysis
Tushar98644 committed Mar 28, 2024
1 parent 0729806 commit 1cfcf90
Showing 2 changed files with 105 additions and 70 deletions.
101 changes: 58 additions & 43 deletions output/easy/README.md
@@ -1,6 +1,13 @@
Easy Test Analysis
================

# Easy Test

The easy test covers basic XML parsing with the xml2 package: extracting
specific information from a small XML document. The code below loads
xml2, parses the document, and extracts the name of the director of the
second movie.

## Introduction

This document demonstrates the analysis of XML data using R, focusing on
@@ -10,18 +17,13 @@ manipulate XML data.

## Setting Up the Environment

### <u>**Section 1: Loading Libraries and XML String**</u>

The XML string describes two movies, including their titles, directors,
release years, and genres. Its structure is hierarchical: each movie is
enclosed in `<movie>` tags inside a root `<movies>` element.

``` r
library(xml2)     # parse and navigate XML
library(stringr)  # loaded, but not used below

xml_string <- c('<?xml version="1.0" encoding="UTF-8"?>',
'<movies>',
'<movie mins="126" lang="eng">',
'<title>Good Will Hunting</title>',
@@ -44,12 +46,17 @@ xml_string <- c(
'</movies>')
```

**Explanation:**

- The **xml2** library is loaded to parse and navigate XML data in R.

- The **stringr** library is loaded for string manipulation, though it’s
  not used in this snippet.

- `xml_string` is a character vector that holds the XML document split
  across several strings, one per line of markup. It describes two
  movies, each with a **title**, **director**, **year**, and **genre**,
  plus `mins` and `lang` attributes.

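A minimal sketch of reading every movie's title and attributes in one pass is below. It assumes a hypothetical, trimmed-down stand-in for `xml_string` called `sample_xml`, built only from values that appear in this document (the first movie's director, year, and genre are left out on purpose); `sample_doc`, `movies`, and `movie_info` are illustrative names, not part of the original analysis.

```r
library(xml2)

# Hypothetical stand-in for xml_string, built only from values shown in this
# document; the first movie's director, year, and genre are omitted on purpose.
sample_xml <- paste0(
  '<movies>',
  '<movie mins="126" lang="eng"><title>Good Will Hunting</title></movie>',
  '<movie mins="106" lang="spa"><title>Y tu mama tambien</title>',
  '<director><first_name>Alfonso</first_name><last_name>Cuaron</last_name></director>',
  '<year>2001</year><genre>drama</genre></movie>',
  '</movies>'
)
sample_doc <- read_xml(sample_xml)

# One node per <movie>; xml_attr() and xml_find_first() work element-wise on
# a node set, so each call below returns one value per movie.
movies <- xml_children(sample_doc)
movie_info <- data.frame(
  title = xml_text(xml_find_first(movies, "./title")),
  mins  = as.integer(xml_attr(movies, "mins")),
  lang  = xml_attr(movies, "lang"),
  stringsAsFactors = FALSE
)
movie_info
```

Each row of `movie_info` corresponds to one `<movie>` element, mirroring the hierarchical layout described above.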
### <u>**Section 2: Parsing the XML Document**</u>

``` r
doc <- read_xml(paste(xml_string, collapse = ''))
@@ -61,16 +68,21 @@ doc
    ## [1] <movie mins="126" lang="eng">\n <title>Good Will Hunting</title>\n <dir ...
    ## [2] <movie mins="106" lang="spa">\n <title>Y tu mama tambien</title>\n <dir ...

**Explanation:**

- The **read_xml** function from the xml2 package parses the XML string
  into an XML document object.

- The **paste** function with `collapse = ''` joins the elements of
  `xml_string` into a single string before parsing, because `read_xml()`
  expects the markup as one string.

- The parsed XML document is stored in the variable `doc`.

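The effect of `collapse = ''` can be checked on a smaller input. This sketch uses a hypothetical four-element vector, `pieces`, in place of `xml_string`, and shows that `paste()` returns a single string only when `collapse` is supplied; that single string is what `read_xml()` parses. The names `pieces`, `one_string`, and `sample_doc` are illustrative.

```r
library(xml2)

# Hypothetical, shortened stand-in for xml_string: one piece of markup per element
pieces <- c('<?xml version="1.0" encoding="UTF-8"?>',
            '<movies>',
            '<movie mins="106" lang="spa"><title>Y tu mama tambien</title></movie>',
            '</movies>')

# Without collapse, paste() returns a vector as long as its input;
# with collapse = '', the elements are joined into one string.
length(paste(pieces))                        # 4
one_string <- paste(pieces, collapse = '')
length(one_string)                           # 1

sample_doc <- read_xml(one_string)
xml_name(xml_root(sample_doc))               # "movies"
xml_attr(xml_child(sample_doc, 1), "lang")   # "spa"
```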
### <u>**Section 3: Navigating the XML Document**</u>

``` r
tu_mama <- xml_child(doc, search = 2)
tu_mama
```

    ## {xml_node}
@@ -81,7 +93,7 @@ mama_tambien
    ## [4] <genre>drama</genre>

``` r
xml_children(tu_mama)
```

    ## {xml_nodeset (4)}
@@ -90,39 +102,31 @@ xml_children(mama_tambien)
    ## [3] <year>2001</year>
    ## [4] <genre>drama</genre>

**Explanation:**

- The **xml_child** function selects a single child node by its index;
  here `search = 2` selects the second movie, which is stored in
  `tu_mama`.

- The **xml_children** function lists all element children of that
  movie: its title, director, year, and genre.

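`xml_child()` accepts either a position or a tag name through its `search` argument, and the same node can also be reached with an XPath expression through `xml_find_first()`. The sketch below assumes a hypothetical stand-in document, `sample_xml`, built only from values shown in this document rather than the full `xml_string`; it also uses `xml_name()` and `xml_attrs()` to inspect the selected node. Names such as `second_movie` and `title_node` are illustrative.

```r
library(xml2)

# Hypothetical stand-in for xml_string, built only from values shown in this document
sample_xml <- paste0(
  '<movies>',
  '<movie mins="126" lang="eng"><title>Good Will Hunting</title></movie>',
  '<movie mins="106" lang="spa"><title>Y tu mama tambien</title>',
  '<director><first_name>Alfonso</first_name><last_name>Cuaron</last_name></director>',
  '<year>2001</year><genre>drama</genre></movie>',
  '</movies>'
)
sample_doc <- read_xml(sample_xml)

# By position: the second child of the root element
second_movie <- xml_child(sample_doc, search = 2)
xml_name(second_movie)    # "movie"
xml_attrs(second_movie)   # c(mins = "106", lang = "spa")
xml_length(second_movie)  # 4 element children

# By name: search can also be a tag name (the first matching child is returned)
title_node <- xml_child(second_movie, search = "title")
xml_text(title_node)      # "Y tu mama tambien"

# The same node via XPath; XPath positions are 1-based, like R indices
same_title <- xml_find_first(sample_doc, "/movies/movie[2]/title")
xml_text(same_title)      # "Y tu mama tambien"
```

Selecting a child by name or by an XPath path that names it does not depend on that child's position, which can make the code less fragile than a fixed index if the document changes.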
### <u>**Section 4: Extracting Director Information**</u>

``` r
director <- xml_child(tu_mama,"director")
director
```

    ## {xml_node}
    ## <director>
    ## [1] <first_name>Alfonso</first_name>
    ## [2] <last_name>Cuaron</last_name>

``` r
xml_contents(director)
```

    ## {xml_nodeset (2)}
    ## [1] <first_name>Alfonso</first_name>
    ## [2] <last_name>Cuaron</last_name>

@@ -131,3 +135,14 @@ xml_text(director)
```

    ## [1] "AlfonsoCuaron"

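**Explanation:**

- The **xml_child** function is used again, this time with a tag name,
  to select the “director” child node of the selected movie.

- The **xml_contents** function lists all nodes within the “director”
  node.

- The **xml_text** function extracts the text content of the “director”
  node. Because it concatenates the text of all descendant nodes, the
  **director’s name** comes back as `"AlfonsoCuaron"`, with no space
  between the first and last name.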
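
Since `xml_text()` concatenates the text of every descendant node, the first and last names run together. A minimal sketch for keeping them apart, assuming a hypothetical stand-in that contains only the `<director>` element of the second movie: read `first_name` and `last_name` separately and join them with `paste()`. The variables `first`, `last`, and `full_name` are illustrative.

```r
library(xml2)

# Hypothetical stand-in: only the second movie's <director> element
director <- xml_root(read_xml(paste0(
  '<director><first_name>Alfonso</first_name>',
  '<last_name>Cuaron</last_name></director>'
)))

xml_text(director)  # "AlfonsoCuaron": text nodes joined with no separator

# Read each part separately; paste() inserts a single space by default
first <- xml_text(xml_child(director, "first_name"))
last  <- xml_text(xml_child(director, "last_name"))
full_name <- paste(first, last)
full_name           # "Alfonso Cuaron"

# Equivalent one-liner over all element children of <director>
paste(xml_text(xml_children(director)), collapse = " ")
```

The one-liner relies on `<first_name>` preceding `<last_name>` in the document, whereas the two named lookups do not depend on element order.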
74 changes: 47 additions & 27 deletions output/easy/analysis.Rmd
@@ -3,6 +3,10 @@ title: "Easy Test Analysis"
output: github_document
---

# Easy Test

The easy test covers basic XML parsing with the xml2 package: extracting specific information from a small XML document. The code below loads xml2, parses the document, and extracts the name of the director of the second movie.

## Introduction

This document demonstrates the analysis of XML data using R, focusing on extracting information about movies from an XML string. It loads the `xml2` and `stringr` libraries and uses `xml2` to parse and manipulate the XML data.
@@ -13,16 +17,14 @@ This document demonstrates the analysis of XML data using R, focusing on extract
knitr::opts_chunk$set(echo = TRUE)
```

### [**Section 1: Loading Libraries and XML String**]{.underline}

The XML string describes two movies, including their titles, directors, release years, and genres. Its structure is hierarchical: each movie is enclosed in `<movie>` tags inside a root `<movies>` element.

```{r message=FALSE}
library(xml2)     # parse and navigate XML
library(stringr)  # loaded, but not used below
xml_string <- c('<?xml version="1.0" encoding="UTF-8"?>',
'<movies>',
'<movie mins="126" lang="eng">',
'<title>Good Will Hunting</title>',
@@ -45,41 +47,59 @@ xml_string <- c(
'</movies>')
```

**Explanation:**

- The **xml2** library is loaded to parse and navigate XML data in R.

- The **stringr** library is loaded for string manipulation, though it's not used in this snippet.

- `xml_string` is a character vector that holds the XML document split across several strings, one per line of markup. It describes two movies, each with a **title**, **director**, **year**, and **genre**, plus `mins` and `lang` attributes.

### [**Section 2: Parsing the XML Document**]{.underline}

```{r message=FALSE}
doc <- read_xml(paste(xml_string, collapse = ''))
doc
```

**Explanation:**

- The **read_xml** function from the xml2 package parses the XML string into an XML document object.

- The **paste** function with `collapse = ''` joins the elements of `xml_string` into a single string before parsing, because `read_xml()` expects the markup as one string.

- The parsed XML document is stored in the variable `doc`.

### [**Section 3: Navigating the XML Document**]{.underline}

```{r message=FALSE}
tu_mama <- xml_child(doc, search = 2)
tu_mama
xml_children(tu_mama)
```

**Explanation:**

- The **xml_child** function selects a single child node by its index; here `search = 2` selects the second movie, which is stored in `tu_mama`.

- The **xml_children** function lists all element children of that movie: its title, director, year, and genre.

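Selecting by position returns one movie at a time. To collect matching nodes from every movie at once, `xml_find_all()` takes an XPath expression. A minimal sketch, assuming a hypothetical stand-in document (`sample_xml`) built only from values shown in this analysis rather than the full `xml_string`; it is written as a plain `r` block, so it is not evaluated when the document is knitted.

```r
library(xml2)

# Hypothetical stand-in for xml_string, built only from values shown in this analysis
sample_xml <- paste0(
  '<movies>',
  '<movie mins="126" lang="eng"><title>Good Will Hunting</title></movie>',
  '<movie mins="106" lang="spa"><title>Y tu mama tambien</title>',
  '<director><first_name>Alfonso</first_name><last_name>Cuaron</last_name></director>',
  '<year>2001</year><genre>drama</genre></movie>',
  '</movies>'
)
sample_doc <- read_xml(sample_xml)

# Every <title> that is a direct child of a <movie>, in document order
titles <- xml_find_all(sample_doc, "//movie/title")
xml_text(titles)                       # "Good Will Hunting" "Y tu mama tambien"

# How many <movie> elements the document contains
length(xml_find_all(sample_doc, "//movie"))   # 2

# Element children per movie (only the second movie is complete in this stand-in)
xml_length(xml_children(sample_doc))   # 1 4
```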
### [**Section 4: Extracting Director Information**]{.underline}

```{r message=FALSE}
director <- xml_child(tu_mama,"director")
director
xml_contents(director)
xml_text(director)
```

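**Explanation:**

- The **xml_child** function is used again, this time with a tag name, to select the "director" child node of the selected movie.

- The **xml_contents** function lists all nodes within the "director" node.

- The **xml_text** function extracts the text content of the "director" node. Because it concatenates the text of all descendant nodes, the **director's name** comes back as `"AlfonsoCuaron"`, with no space between the first and last name.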

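The director can also be found by the movie's title instead of its position, which keeps working if the order of the `<movie>` elements changes. A minimal sketch using an XPath predicate with `xml_find_first()`, assuming a hypothetical stand-in document (`sample_xml`) rather than the full `xml_string`; `director_node` and `no_match` are illustrative names, and the block is plain `r`, so it is not evaluated on knitting.

```r
library(xml2)

# Hypothetical stand-in for xml_string, built only from values shown in this analysis
sample_xml <- paste0(
  '<movies>',
  '<movie mins="126" lang="eng"><title>Good Will Hunting</title></movie>',
  '<movie mins="106" lang="spa"><title>Y tu mama tambien</title>',
  '<director><first_name>Alfonso</first_name><last_name>Cuaron</last_name></director>',
  '<year>2001</year><genre>drama</genre></movie>',
  '</movies>'
)
sample_doc <- read_xml(sample_xml)

# <director> of the movie whose <title> matches exactly
director_node <- xml_find_first(
  sample_doc, "//movie[title = 'Y tu mama tambien']/director"
)
xml_text(xml_children(director_node))  # "Alfonso" "Cuaron"

# A title that does not exist yields a missing node rather than an error
no_match <- xml_find_first(sample_doc, "//movie[title = 'Unknown']/director")
inherits(no_match, "xml_missing")      # TRUE
```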