03-02: HTML: Machine-readable Text

Matt Price

HTML

  • Hyper
  • Text
  • Markup
  • Language
  • structures web pages & their content

Tag Structure

<tag att1="value1" att2="value2">content</tag>
<p class="normal" align="right">Paragraph Content</p>
<a href="http://www.google.com">Link to google</a>
<img src="http://1.bp.blogspot.com/-CzqzzBV2tMk/TxBM3ar18MI/AAAAAAAAPm0/6faLPO9BM8w/s1600/i-can-has-cheezburger.jpg" title="I can Haz Cheezburger?" alt="greedy cat saying 'I can haz cheezburger?'" />
  • tags:
    • tag identifier
    • attributes
    • content
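
To see those three parts from a program's point of view, here is a minimal sketch using Python's standard-library html.parser module. The class name TagParts and the dictionary layout are made up for illustration; the two snippets fed in are the paragraph and link examples above.

```python
from html.parser import HTMLParser


class TagParts(HTMLParser):
    """Record the three parts of each element: identifier, attributes, content."""

    def __init__(self):
        super().__init__()
        self.elements = []   # one record per element, in the order the tags opened
        self._open = []      # records whose closing tag has not been seen yet

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        record = {"tag": tag, "attributes": dict(attrs), "content": ""}
        self.elements.append(record)
        self._open.append(record)

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for record in self._open:
            record["content"] += data

    def handle_endtag(self, tag):
        if self._open and self._open[-1]["tag"] == tag:
            self._open.pop()


parser = TagParts()
parser.feed('<p class="normal" align="right">Paragraph Content</p>')
parser.feed('<a href="http://www.google.com">Link to google</a>')
parser.close()

for element in parser.elements:
    print(element)
```

Each printed record pairs a tag identifier with its attributes and its text content, which is roughly what a browser also has to work out before it can display anything.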

content

Paragraph Content

Link to google greedy cat saying 'I can haz cheezburger?'

HTML at work, and some consequences

This is the top-level heading

This is a paragraph. It can contain further markup and also more complex content.

On the web, text is "Marked up"

<h1>This is a top-level heading</h1>
<p>
  This is a paragraph. It can contain <i>further markup</i> and also 
  <a href="http://some.where.com">more complex content</a>.
</p>
<aside>
  Sometimes you'll see <i>semantic</i> tags, like "aside",
  "header", "footer", "article", or "section".  
</aside>

  • Programs can scan this text, interpret it…
  • then treat it as data which can be combined, analyzed, etc.
  • the point of learning HTML is to:
    • understand how to achieve a certain "look"
    • understand how a complex computer algorithm might treat it as "data"
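
As a rough illustration of "treating it as data", the sketch below scans a copy of the heading-and-paragraph snippet above (with a well-formed http:// address) and pulls out every link as a (text, address) pair. It uses Python's standard-library html.parser; the names PAGE and LinkCollector are invented for this example, and a real project might prefer a fuller HTML library.

```python
from html.parser import HTMLParser

PAGE = """
<h1>This is a top-level heading</h1>
<p>
  This is a paragraph. It can contain <i>further markup</i> and also
  <a href="http://some.where.com">more complex content</a>.
</p>
"""


class LinkCollector(HTMLParser):
    """Collect (link text, address) pairs from every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None    # address of the link currently open, if any
        self._text = []      # pieces of text seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # An <a> without an href leaves _href as None and is ignored.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


collector = LinkCollector()
collector.feed(PAGE)
collector.close()
print(collector.links)
```

Once the links are in a list like this, they can be counted, sorted, or combined with links from other pages, which is the kind of analysis the bullets above have in mind.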

A Few HTML tags/elements you should know

Page Structure (block-level elements):

<html></html>
Opens/closes every page
<head></head> and <body></body>
two main sections for metadata and display
<div></div>
often-invisible tag that divides page into "divisions"
<section></section>, <article></article>, <header></header>, <footer></footer>
also invisible-by-default "semantic" tags that create divisions in page
<p></p>
basic paragraph unit
<blockquote></blockquote>
semantic tag distinguishing quoted text
<table>, <tr>, <th>, <td>
building tables (don't overuse!)
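
One way to get a feel for how these structural tags nest is to print a page's outline. The sketch below is only an illustration: the sample page and the class name Outline are invented, and it assumes every tag is explicitly closed, so it would miscount on pages that use unclosed tags such as a bare <br>.

```python
from html.parser import HTMLParser

PAGE = """
<html>
  <head><title>Example</title></head>
  <body>
    <header><h1>Title</h1></header>
    <section>
      <p>First paragraph.</p>
      <blockquote>Quoted text.</blockquote>
    </section>
    <footer><p>Footer text.</p></footer>
  </body>
</html>
"""


class Outline(HTMLParser):
    """Build an indented list of tag names that mirrors how elements nest."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # how many elements are currently open
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


outline = Outline()
outline.feed(PAGE)
outline.close()
print("\n".join(outline.lines))
```

The indentation in the output shows head and body sitting inside html, and the header, section, and footer divisions sitting inside body.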

Found inside structural elements (inline elements):

<a href="http://link.address"></a>
The essential hyperlink tag that makes the web what it is
<img src="http://file.location" alt="text to display for non-visual browsers/viewers"/>
"self-closing" image display tag
<em></em>, <strong></strong>, <i></i>, <b></b>
emphasized and strong text (<em>, <strong>); italic and bold text (<i>, <b>)
<ol>, <ul>, <li>
building "ordered" and "unordered" lists (block-level themselves, but usually found inside other structural elements)
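
Because alt text is what non-visual browsers fall back on, a program can check for it. The following is a small sketch, again with Python's standard-library html.parser; the snippet, the file names cat.jpg and dog.jpg, and the class name AltChecker are all hypothetical.

```python
from html.parser import HTMLParser

SNIPPET = """
<ul>
  <li><img src="cat.jpg" alt="greedy cat" /></li>
  <li><img src="dog.jpg" /></li>
</ul>
"""


class AltChecker(HTMLParser):
    """List the src of every <img> that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are routed through here as well.
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.missing_alt.append(attributes.get("src"))


checker = AltChecker()
checker.feed(SNIPPET)
checker.close()
print(checker.missing_alt)
```

Here only the second image is reported, since the first one carries an alt attribute.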

HTML in this class

  • learn to recognize generic tag structure
  • work with some of the most common tags
  • learn how to learn more

Take a break from lectures now and find a place to edit some code.

The Mozilla Developer Network Introduction to HTML "active learning" modules are one pretty good place to start.

However, I mostly recommend forking & working with the Habermas HTML repo I made specially for this class:

https://github.com/DigitalHistory/HabermasCode