COSI 12B – Advanced Programming Techniques

Programming Assignment 6


Overview

In this assignment, you will build a simplified HTML validator. Though this assignment relates to web pages and HTML, you do not need to know how to write HTML to complete it.

Background Information

Web pages are written in a language called Hypertext Markup Language, or HTML. An HTML file consists of text surrounded by markings called tags. Tags give information to the text, such as formatting (bold, italic, etc.) or layout (paragraph, table, list). Some tags specify comments or information about the document (header, title, document type).

A tag consists of a named element between less-than < and greater-than > symbols. For example, the tag for making text bold uses the element b and is written as <b>. Many tags apply to a range of text, in which case a pair of tags is used: an opening tag indicating the start of the range and a closing tag indicating the end of the range. A closing tag has a / slash after its < symbol, such as </b>. So to make some text bold on a page, one would put the text between opening and closing b tags, <b> like this </b>. Tags can be nested to combine effects, such as <b><i> bold italic </i></b>.

Some tags, such as the br tag for inserting a line break or img for inserting an image, do not cover a range of text and are considered to be "self-closing." Self-closing tags do not need a closing tag; for a line break, only a tag of <br> is needed. Some web developers write self-closing tags with an optional / before the >, such as <br />.

The distinction between a tag and an element can be confusing. A tag is a complete token surrounded by < > brackets, which could be either an opening or closing tag, such as <title> or </head>. An element is the name inside the tag, such as title or head. Some tags have attributes, which are additional information in the tag that comes after the element. For example, the tag <img src="cat.jpg"> specifies an image from the file cat.jpg. The element is img, and the remaining text, such as src, is an attribute. In this assignment, we will ignore attributes and focus just on elements and tags.
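The tag/element distinction can be illustrated with a small sketch that pulls the element name out of a raw tag string. The class and method names here (TagElementSketch, elementOf, isClosing) are hypothetical and are not part of the assignment's provided code, which already splits pages into tags for you.

```java
// Illustrative only: TagElementSketch is a made-up helper, not an assignment class.
final class TagElementSketch {

    // Returns the element name of a tag token such as "<b>", "</head>", or "<br />".
    // Attributes (everything after the element name) are ignored.
    static String elementOf(String tag) {
        String inner = tag.trim();
        // Drop the surrounding < and > brackets.
        inner = inner.substring(1, inner.length() - 1).trim();
        // A closing tag has a slash right after the < symbol.
        if (inner.startsWith("/")) {
            inner = inner.substring(1).trim();
        }
        // The element ends at the first whitespace or at a trailing slash.
        int end = 0;
        while (end < inner.length()
                && !Character.isWhitespace(inner.charAt(end))
                && inner.charAt(end) != '/') {
            end++;
        }
        return inner.substring(0, end);
    }

    // True for tags written like </b>.
    static boolean isClosing(String tag) {
        return tag.trim().startsWith("</");
    }
}
```

With this sketch, the tags <img src="cat.jpg"> and </head> have the elements img and head, and only the second is a closing tag.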

HTML Validation

One problem on the web is that many developers make mistakes in their HTML code. All tags that cover a range must eventually be closed, but some developers forget to close their tags. Also, whenever a tag is nested inside another tag, <b><i>like this</i></b>, the inner tag (i for italic, here) must be closed before the outer tag is closed. So the following tags are not valid HTML, because the </i> should appear first: <b><i>this is invalid</b></i>.

Below is an example of a valid HTML file, with its tags in bold. A tag of <!-- … --> is a comment.


In this assignment, you will write a class that examines HTML to figure out whether it represents “valid” sequences of tags. Instructor-provided code will read HTML pages from files and break them apart into tags for you; it’s your job to see whether the tags match correctly.

You will write a class named HtmlValidator. Your class must have the following constructors and methods. It must be possible to call the methods multiple times in any order and get the correct results each time. Several methods interact with HtmlTag objects, described later in this document.


public HtmlValidator()

public HtmlValidator(Queue<HtmlTag> tags)

Your class should have two constructors. The first should initialize your validator to store an empty queue of HTML tags. The second should initialize your validator to store a given queue of HTML tags. For example, the queue for the page shown previously would contain the tags below. Further tags can be added later if the client calls addTag.

front [<!doctype>, <!-- -->, <html>, <head>, <title>,
       </title>, <meta>, <link>, </head>, <body>, <p>, <a>,
       </a>, </p>, <p>, <img>, </p>, </body>, </html>] back

If the queue passed is null, you should throw an IllegalArgumentException. An empty queue (size 0) is allowed.


public void addTag(HtmlTag tag)

In this method you should add the given tag to the end of your validator's queue. If the tag passed is null, you should throw an IllegalArgumentException.


public Queue<HtmlTag> getTags()

In this method you should return your validator's queue of HTML tags. The queue should contain all tags that were passed to the constructor (if any) in their proper order, plus or minus any tags added or removed using addTag or removeAll. If any methods manipulate your queue, you must restore it to its prior state before the method is finished.
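The constructors, addTag, and getTags described above might be sketched as follows. This is only an illustration under stated assumptions: the class name TagQueueSketch is hypothetical, the type parameter T stands in for HtmlTag so the sketch compiles on its own, and copying the incoming queue (rather than storing the caller's queue directly) is a design choice the assignment text does not settle.

```java
import java.util.LinkedList;
import java.util.Queue;

// Hypothetical stand-in for HtmlValidator; T plays the role of HtmlTag.
final class TagQueueSketch<T> {
    private final Queue<T> tags;

    // First constructor: start with an empty queue of tags.
    TagQueueSketch() {
        this.tags = new LinkedList<>();
    }

    // Second constructor: start from a given queue; null is rejected, empty is fine.
    TagQueueSketch(Queue<T> tags) {
        if (tags == null) {
            throw new IllegalArgumentException("tags queue must not be null");
        }
        // Assumption: copy the queue so later changes by the caller do not leak in.
        this.tags = new LinkedList<>(tags);
    }

    // Adds a tag at the back of the queue; null is rejected.
    void addTag(T tag) {
        if (tag == null) {
            throw new IllegalArgumentException("tag must not be null");
        }
        tags.add(tag);
    }

    // Returns the validator's queue of tags in order.
    Queue<T> getTags() {
        return tags;
    }
}
```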


public void removeAll(String element)

In this method you should remove from your validator's queue any tags that match the given element. For example, if your validator is constructed using the tags from the page shown previously and removeAll("p") were called on it, its queue would be modified to contain the following tags. Notice that all <p> and </p> tags have been removed:

front [<!doctype>, <!-- -->, <html>, <head>, <title>,
       </title>, <meta>, <link>, </head>, <body>, <a>, </a>,
       <img>, </body>, </html>] back

If the element passed does not exactly match any tags (such as an empty string), your queue should not be modified. You may not use any auxiliary collections such as extra stacks or queues, though you can create simple variables.

If the element passed is null, you should throw an IllegalArgumentException.
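One way to satisfy the "no auxiliary collections" rule for removeAll is to rotate the queue exactly once: take each tag from the front and put it back at the end only if it should be kept. The sketch below assumes a minimal stand-in Tag class with a getElement method; the real HtmlTag interface may differ, and the names RemoveAllSketch and Tag are hypothetical.

```java
import java.util.LinkedList;
import java.util.Queue;

final class RemoveAllSketch {

    // Hypothetical stand-in for HtmlTag: an element name plus an open/close flag.
    static final class Tag {
        private final String element;
        private final boolean open;

        Tag(String element, boolean open) {
            this.element = element;
            this.open = open;
        }

        String getElement() {
            return element;
        }

        @Override
        public String toString() {
            return (open ? "<" : "</") + element + ">";
        }
    }

    // Removes every tag whose element equals the given element, using only
    // the queue itself and simple variables.
    static void removeAll(Queue<Tag> tags, String element) {
        if (element == null) {
            throw new IllegalArgumentException("element must not be null");
        }
        int size = tags.size(); // captured up front, because the queue shrinks
        for (int i = 0; i < size; i++) {
            Tag tag = tags.remove();            // take from the front
            if (!tag.getElement().equals(element)) {
                tags.add(tag);                  // keep it: back to the end
            }
        }
        // After one full pass the surviving tags are in their original order.
    }
}
```

Because every original tag is visited exactly once, an element that matches nothing (such as an empty string) simply cycles the queue back to its starting state.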

public void validate()

In this method you should print an indented text representation of the HTML tags in your queue. Each tag displays on its own line. Every opening tag that requires a closing tag increases the level of indentation of following tags by four spaces until its closing tag is reached. The output for the HTML file on the first page would be:


To generate the output for this method, analyze your queue of tags with a Stack.

The other class to write is HtmlTag.java (objects that represent HTML tags for you to process). An HtmlTag object corresponds to an HTML tag such as <p> or </table>. You don’t ever need to construct HtmlTag objects in your code, but you will process them from your queue. Each object has the following methods:

Error Handling

Your validate method should print error messages if you encounter either of the following conditions in the HTML:

  • A closing tag that does not match the most recently opened tag (or if there are no open tags at that point).
  • Reaching the end of the HTML input with any tags still open that were not properly closed.

For example, the following HTML is valid:

<p><b>bold text <i>bold and italic text</i> just bold again</b> <br/> more </p>

But the following HTML is not valid, because the </b> appears before the </i>:

<p><b> bold text <i>bold and italic text</b> just italic</i> neither</p>

The following HTML is also not valid, because the <html> tag is never closed:

<html><body> <b><i>bold italic</i></b> normal text</body>

Suppose the previous short HTML file were modified to add several errors, as follows: an added unwanted </!doctype> tag, a deleted </title> tag, an added second </head> tag, and a deleted </body> tag:


The resulting output for this invalid file should be the following:

There are two error messages for </head> because neither </head> tag matches the most recently opened tag at the time it is seen, which is <title>. The four unclosed tags at the end represent the fact that those four tags didn't have a closing tag in the right place (or, in some cases, no closing tag at all).

Because of the simplicity of our algorithm, a single mistake in the HTML can result in multiple error messages. Near the end of the file is a </html> tag, but this is not expected because body, title, and head were never closed. So the algorithm prints many errors, such as saying that the html tag is unclosed, though the underlying problem is that the body tag was never closed. Also notice that an unexpected closing tag does not change the indentation level of the output.

Your revised validation algorithm: Examine each tag from the queue, and if it is an opening tag that requires a closing tag, push it onto a stack. If it is a closing tag, compare it to the tag on top of the stack. If the two tags match, pop the top tag of the stack. If they don’t match, it is an error. Any tags remaining on the stack at the end are errors.
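The algorithm above can be sketched roughly as follows. Several things here are assumptions made only so the example runs on its own: the Tag class is a hypothetical stand-in for HtmlTag, the SELF_CLOSING set is an illustrative subset of elements that need no closing tag, and the wording of the two error messages is invented, since the required message format is not reproduced in this document. The report method builds the text that validate prints, which makes the sketch easy to check.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.Stack;

final class ValidateSketch {
    // Assumed subset of elements that never take a closing tag.
    private static final Set<String> SELF_CLOSING = new HashSet<>(
            Arrays.asList("!doctype", "!--", "br", "hr", "img", "input", "link", "meta"));

    // Hypothetical stand-in for HtmlTag.
    static final class Tag {
        final String element;
        final boolean open;

        Tag(String element, boolean open) {
            this.element = element;
            this.open = open;
        }

        boolean isSelfClosing() {
            return SELF_CLOSING.contains(element);
        }

        // True when the other tag has the same element and the opposite kind.
        boolean matches(Tag other) {
            return other != null && element.equals(other.element) && open != other.open;
        }

        @Override
        public String toString() {
            return (open ? "<" : "</") + element + ">";
        }
    }

    // Builds the indented listing plus error lines (message wording is assumed).
    static String report(Queue<Tag> tags) {
        StringBuilder out = new StringBuilder();
        Stack<Tag> openTags = new Stack<>();
        for (Tag tag : tags) { // iterating does not modify the queue
            if (tag.open) {
                appendLine(out, openTags.size(), tag.toString());
                if (!tag.isSelfClosing()) {
                    openTags.push(tag); // needs a closing tag later
                }
            } else if (!openTags.isEmpty() && tag.matches(openTags.peek())) {
                openTags.pop(); // closing tag lines up with its opening tag
                appendLine(out, openTags.size(), tag.toString());
            } else {
                // Mismatch or nothing open: report it, indentation is unchanged.
                out.append("ERROR unexpected tag: ").append(tag).append('\n');
            }
        }
        while (!openTags.isEmpty()) { // anything left was never closed
            out.append("ERROR unclosed tag: ").append(openTags.pop()).append('\n');
        }
        return out.toString();
    }

    private static void appendLine(StringBuilder out, int depth, String text) {
        for (int i = 0; i < depth; i++) {
            out.append("    "); // four spaces per open tag
        }
        out.append(text).append('\n');
    }

    static void validate(Queue<Tag> tags) {
        System.out.print(report(tags));
    }
}
```

Reading the queue with a for-each loop leaves it in its prior state, so validate can be called repeatedly and in any order with the other methods.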

Provided File

ValidatorMain.java: A testing program to run your HtmlValidator code and display the output.

 

Test Case (mytest.html)

In addition to HtmlValidator.java, you will also create a test case of your own that helps verify your validator. Create a file mytest.html with any contents you like, so long as it has at least 10 total HTML tags. This can be a file you create from scratch, or you can go to an existing web page and save its contents to a file. The purpose of this is to make you think about testing.

Submission

Your Java source code should be submitted via Latte in the form of an Eclipse export. Remember to include Javadocs in your export, and name both your Eclipse project and the archive firstname_lastnamePA6. For the due date and late policy, check the syllabus.