Saturday, March 12, 2011

Web Site Languages Part I

Languages Utilised in Internet Technologies

This is part of a larger lecture-blog series aimed at marketing students at the University of Strathclyde, and at non-IT managers. In this lecture and the following one, we introduce some more terminology and a behind-the-scenes look at how modern web sites are programmed and how they interact with you, the "client".

Introduction:

How and Where Does All This Fancy Stuff Get Done?

We have considered the simple protocol languages which enable communication to be ordered and policed on the physical internet. Once connected to a server, presenting and interacting with web sites involves computational tasks handled both by programmes on the web server itself, like Apache and PHP, and by programmes on your own home computer, chiefly your web browser.

Both "server side" and "client side" language-engines rely on reading instructions ( code) from files which are either downloaded or refered to, called upon if you like, to perform tasks in presenting the web pages, shutteling information and performing calculations.

For many programmers, the goal is to have most of the work done server side, so that communication is quick to and from the client (i.e. you and your Internet Explorer or Firefox). This approach is known as the "thin client" and has become popular with the growth of mobile phones used on variable connections. It limits the number of HTTP requests and the size of code in each download to the client machine, thus reducing front-end server load and outward bandwidth requirements.

The downside of "thin client" is that with more complex web sites and interactions, there is a heavy computational load on the server structure, usually a second layer of computers, which are then required to run more processes. This requires more thought on "vertical scaling", a discussion taken up in another lecture-blog. Also, although the processing reduces code download, it may in some cases increase the amount of raw data, because little can be "expanded upon" at the client end.

However, to provide the most rapid web experience, a compromise is often reached for the more data-rich web sites, like Google Maps, whereby the local client runs an API (see below) and requests packets of information without needing to reload the entire web page (more detail on this in the second lecture in this series). Many mobile phones have specific "apps" for web-mediated services like mail and social media sites, which offer a slightly thicker client at a single download, but are better optimised for speed and presentation in the small-screen environment.

In some more technical depth:
In essence, most languages used in building and running web sites, whether they run client- or server-side, are actually instructing a mother programme to perform computational tasks: that is to say, the language code itself is not interpreted or compiled directly.

Exceptions to this would perhaps include running Java modules on your local "client" PC, some Perl script used in communications, or using "C" and other compiled languages running server side from instructions sent forward by other languages like XHTML, PHP or JavaScript.

At the client end, i.e. you, the mother programme is most often the internet browser (Java being an exception, although even Java is presented through the browser window), and some of the updates you receive to browsers are important because they allow the latest advances in these languages to operate on your machine, for example updates to the "JavaScript engine".

Server side, it really depends on what you install: PHP, ColdFusion and so on each come with an installation disc, and these "interpreter" programmes are really very large relative to, say, the JavaScript element in a browser.

HTML


HyperText Mark-up Language is the standard language for web sites, but it varies and evolves, so that earlier web browsers, IE version 5 for example, do not work with more modern web sites. Also, some browsers do not automatically fix bugs in the HTML code.

In essence, HTML does as it says: it allows text to be formatted in a simple, high-level (i.e. "near-English instructions" in the code) language which is easy to learn for anyone slightly interested in computers with the motivation to make web sites! Apart from formatting text into headings, fonts, paragraphs etc., the code "corrals" the structure of the web page and how it expands.
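As a minimal illustration, a hypothetical fragment (not a complete page) reads almost like English:

    <html>
      <body>
        <h1>Our Spring Sale</h1>                               <!-- a top-level heading -->
        <p>All <b>widgets</b> are half price this week.</p>    <!-- a paragraph with bold text -->
        <a href="http://example.com/sale">See the offers</a>   <!-- a hyperlink -->
      </body>
    </html>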

HTML-only web sites are very outdated today, but the language is still at the core of structuring web pages and allowing for communication. Indeed, in its later forms it corrals both this structure, communication to and from the browser, and also the inclusion of elements which utilise other languages or call upon other data sources.

It is VERY worthwhile as a marketing student to take a course on HTML and learn how to read the "anatomy" of web pages using tools like Firebug. Even if you never actually build an entire web site yourself, it will allow you to edit things like links or titles quickly, paste up emergency notices, or take out elements immediately for legal reasons. We go into little detail here because it is a subject in its own right.

So HTML still forms, for now, the core of almost all web sites, but other elements, mediated by the other languages below and by the use of API elements, are reducing the volume of HTML code to be found in a modern web site. Hence we take up the thread with the language most closely related to HTML, XML:

XML

Historically, XML's roots predate HTML, being related to a form of sharing ASCII/Unicode based documents across formats. This goes back to the 1970s and the forerunner Standard Generalized Markup Language (SGML). XML takes this cross-platform, simple form of shareable text mark-up to a form which includes many desirable features and flexibilities, which now mean it is used for both static text and very dynamic data-shuttling.

XML is very like HTML: it is near to being easily interpreted in English, i.e. high-level; it marks up text with mark-up tags; and it allows content to be basic Unicode text. There are elementary mark-ups, as in HTML, which are simple and allow documents to be parsed into other web sites and presented in different formats based on the key heading and paragraph structure. In fact XML is to a large extent a simpler, cleaner text-handling approach than HTML, because HTML has evolved to have many other functions while its text functions have become somewhat limited.
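Purely as a sketch, a small XML document of the kind described might look like this (the tag names are invented for illustration):

    <?xml version="1.0" encoding="UTF-8"?>
    <article>
      <heading>Widgets in Marketing</heading>
      <paragraph>Plain Unicode text sits between the mark-up tags.</paragraph>
      <paragraph>Any programme that reads XML can parse this structure.</paragraph>
    </article>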

Features of XML

Mark-up: a text document standard with shareable formatting.

Presentation: integrates/interoperates across platforms and repurposed documents, like PDF output or .doc/.xdoc. Apart from text, there are also some vector-based graphic elements, like simple Flash-style graphics, which can be useful.

Exchange: used to shuttle data in HTML, JavaScript/JSON/AJAX, and also between non-browser systems: for example between different ERP or legacy systems, or from web site orders over to ERP systems by an indirect route (not a true "back end" database set-up).

Programmability: e.g. XQuery (an SQL-like query language for XML) and XPath routing/indexing tools to search and operate on data in the larger XML file(s); also the Simple API for XML (SAX) for integration. (A worked XPath example appears further below.)

XML celebrated its tenth anniversary in 2008 and has come into extremely frequent use in web sites and shared news or document sources.


XML has become "interoperable": because of its simplicity, programmes other than the HTML engines (now XHTML) in browsers can integrate quickly with the language and extract information from XML files readily. The language is "neutral", in that data can be exchanged between languages and operating systems/platforms. Some XML formats are very data-centric: the benefit over "flat" CSV, pipe-separated or tab-separated files is that the information is "marked up" with syntax/formatting tags or other tags, and custom field name-tags (for example a tag such as <customerName> ... </customerName>) can be applied to text strings, thus rendering XML a database resource.


One benefit is that the structure of the files is easily readable; in fact a pure XML file is usually easier to understand than the corresponding HTML. Also, older ASCII text sources or scan-read sources can be readily parsed into XML and then easily shared with different web sites with their own formats.

Mark-up tags can be programmer-definable, which is one area of true advanced capability with XML. The second area, often lumped into the umbrella term "XML", is using APIs and server-side applets to access the data held in XML repositories: often this is like a simplified SQL.


The evolution of XML continued to the point where the drawbacks of all the text embedded in HTML became so large that the simpler route to text data was attractive enough. In other words, the "atmosphere" of mark-up around embedded text was suffocating the flow of information on the internet and necessitating re-purposing of text between web sites and systems.


With the widespread use of Cascading Style Sheets, XML became even more attractive, as the one source of information can be repurposed to different styles automatically, or in a browser type/version dependent way after "sniffing" the browser.

(As an aside, Yahoo's YUI library has good JavaScript applets, like a pop-out calendar.)

Another benefit of using XML files and information sources is in serving the same web site to different devices or bandwidths. For example, it is still the case that JavaScript on many mobile devices (phones) is either very limited or absent, so you can serve a simpler web site drawing on the same XML source for text information. The same is true when optimising screen layout for the window on iPhone, Windows Mobile or Windows CE.

What this all means is that text documents can be published once in simple XML and then accessed from all over the internet, republished in the format of whichever web site requests the XML from the link or server-side source file. When using APIs, the document content can be accessed and presented in more dynamic ways by small programmes, rather than having to place the XML document or text in a back-end server-database system with perhaps three layers of access: Apache web server; MySQL interface; routing interface and data repository. This also allows the programmer to develop very small, well-defined client-side JavaScript applets with very specific, targeted functionality: for example, find the latest news items relating to the US Senate from three XML sources (RSS is an XML subscription application; see below).

XML is fast and simple for text-based dynamic web sites with several internal and external sources of updated text.

Another very common use of XML today is in form submission and the handling of orders on e-commerce sites, especially when there are "under-vendors", i.e. suppliers external to the main web site. Using XML, the information can be passed in a simple, cross-platform message compiled during the session. Here XML is an intermediate, common format which allows for specific, useful tags, e.g. an <orderID> tag.
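A hypothetical order message of this kind, compiled during the session, might look like:

    <order>
      <orderID>2011-0342</orderID>
      <vendor>External Supplier Ltd</vendor>
      <item sku="WDG-01" quantity="2">Widget, blue</item>
      <total currency="GBP">19.90</total>
    </order>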

In fact any UID database which can be output to a tab- or pipe-separated file can be parsed into XML, and reverse parsers for Oracle, SQL Server etc. exist: the field names move into the tags (for example a "surname" field becomes a <surname> tag), and then the data can be imported intelligently to many different systems, with manual reference made somewhat idiot-proof by the use of English and simple mark-up coding.

It need not be documents: any ASCII/Unicode file can be used, so for example a tab- or pipe-separated database could be encoded and referred to by simple searches and operations server side, with the form data travelling as XML.

One powerful capability with XML is being able to link different sources of data with a unique ID: for example, you can combine a graphic like a map, with postcodes or coordinates (longitude/latitude), with a text source like population. This is used in the tiles for Google Maps and satellite images for example, and to link the simple geotags for the icons which appear with a pop-out text box. In APIs this is often done using the standard Simple API for XML (SAX) module.
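A sketch of such a geotagged record, linking coordinates and text by a shared unique ID (tag names and figures invented for illustration):

    <place id="G44">
      <name>Glasgow</name>
      <coordinates latitude="55.86" longitude="-4.25" />
      <population>600000</population>
      <popup>Home of the University of Strathclyde</popup>
    </place>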
Structure in XML files: XML files should follow the schematic that the structure of the text is like a tree: data nests in sequence within its tags, branching out if you like.


So these form a simpler version of tables in a database, and you should consider structuring different XML files with common, strict UIDs so as to be tidy and offer more functionality, while keeping individual file sizes small and closely clustered in relevance of data. XPath helps navigate this efficiently: it may be used to create an efficient, exclusive search which goes to a particular branch of the tree and returns, for example, only the matching records.
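For instance, given a small nested file like the one below, the XPath expression /library/book[@year="2008"]/title goes straight to that branch of the tree and returns only the one matching title (file and query invented for illustration):

    <library>
      <book year="2008">
        <title>XML in Practice</title>
      </book>
      <book year="2010">
        <title>Web Data Basics</title>
      </book>
    </library>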

You can add a lot of tag attributes in a text document you want to be highly structured/relational/nested in searching, to give exclusive search results, or simply to serve up the relevant information in terms of pages, paragraphs or a specific hit count with links to the lines of text (the actual sentences). So you tag up "US Presidents" beside each relevant name, and maybe add a period-of-office date range, whereby you can find each president, who was in office when, and which of them presided over a period you are interested in, without knowing the terms of office themselves.
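As a sketch of that tagging, with period-of-office date ranges as attributes in ISO 8601 form (the tag and attribute names are invented, though the dates are real):

    <president term-start="1993-01-20" term-end="2001-01-20">Bill Clinton</president>
    <president term-start="2001-01-20" term-end="2009-01-20">George W. Bush</president>

A search for any date falling between those attribute values then finds the right president without you knowing the terms of office.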

ISO 8601 is an ISO standard for date formats contained within data fields, which is easily adopted into XML and easy to look up. The same is true for longitude/latitude geotags.

RSS (Really Simple Syndication)
RSS is a subscription news-feed service which works through XML. It allows you to get updates on news or alerts, view them in any browser, and then also re-publish the XML. The example on the Harvard E75 course uses a JavaScript interface to show both Google Maps and "geotagged" news feeds as links in pop-up bubbles upon rolling over town locations. RSS can also be used for other, non-text updates, or as a quick way to move (push!) small amounts of information to a web site a user is subscribed on: for example podcasts, which by the spirit of the original convention should not really be included in RSS, since this pushes a file and not just a simple news item.
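An RSS feed is itself just an XML document with an agreed set of tags; a minimal feed looks roughly like this (contents invented for illustration):

    <?xml version="1.0"?>
    <rss version="2.0">
      <channel>
        <title>Campus News</title>
        <link>http://example.com/news</link>
        <description>Latest items from around the campus</description>
        <item>
          <title>New library opens</title>
          <link>http://example.com/news/library</link>
          <pubDate>Sat, 12 Mar 2011 09:00:00 GMT</pubDate>
        </item>
      </channel>
    </rss>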

Podcasts on iTunes and other clients are actually based on RSS with a media file linked in.
A web source publishes RSS to a URL, ready for insertion into your web page or for server-side parsing.

RSS is useful too for capturing information from syndicated/external sources because it is so simple: it allows you to parse, then search, store, reformat and summarise/abbreviate information from a web site which otherwise publishes no XML you can integrate with. You parse in the whole page and dig out the marked-up text you want, a technique actually called "page scraping".




XHTML: this is the most XML-friendly version of HTML, with somewhat different syntax and XML-style case sensitivity in its tags, but all later-version browsers run it because it is XML-optimised. XHTML also allows for otherwise illegal characters.
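The practical differences are small but strict; for example, XHTML requires lower-case tag names and every element to be closed:

    <br>                   <!-- tolerated in older HTML -->
    <br />                 <!-- required form in XHTML -->
    <IMG SRC="logo.png">   <!-- upper case tolerated in HTML -->
    <img src="logo.png" /> <!-- XHTML: lower case, quoted, self-closed -->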

XPATH
XPath is a sub-language which allows for SQL-style querying of such data (see the worked example above). Thus you can have XML either on the page or hidden, as a permanent dataset or something dynamic, even transient, and query it without the need for a complex back-end database interaction at that point.

AJAX was a synthesis of JavaScript and XML which allowed client-side JavaScript to request required data as simple XML/XHTML, rather than having that intelligent computation on client behaviour happen over HTTP to the server and back. The best-known example of this is Google Maps, which was the first to run a credible JavaScript scrolling system for showing graphics and linked bitmap tiles. The JavaScript detects the scrolling locally and asks only for the packets of information you require, incorporating them without any more server-intensive processes being needed. The term "AJAX" is now used to include XHTML and JSON as well; we will return to this in the next lecture.
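The data travelling back in such a request is typically just a small XML packet. A sketch of a map-tile response (invented for illustration, not Google's actual format):

    <tiles>
      <tile x="512" y="384" zoom="12" src="http://example.com/tile_512_384_12.png" />
      <tile x="513" y="384" zoom="12" src="http://example.com/tile_513_384_12.png" />
    </tiles>

The client-side script simply slots each returned tile into the page, with no full page reload.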

API: Application Programming Interface
An API is a function, or a little pre-made application, which is available for you to call up and utilise in your web site. This differs somewhat from a "web service", which is a web server that performs a service for you on its own processing time, e.g. TweetDeck, where you use a GET request URL perhaps to send in the data or the request for the computation to happen there. An API works in JavaScript on the client side, and these are shared as programmes to allow web sites to propagate their service externally, while most likely holding the updated data, and say related push advertising, at their source: e.g. Google Maps, Twitter, Facebook linking/write-to-Facebook. We will also return to APIs, like Google Maps, in the next blog.

User-definable tags mean that whole new mark-up languages can be made within XML: for example, a DNA sequence mark-up language.
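Because the tags are yours to define, a new domain language is just an agreed set of tag names; an invented DNA-sequence sketch:

    <dnaRecord organism="E. coli">
      <gene name="lacZ">
        <sequence>ATGACCATGATTACGGATTCACTG</sequence>
      </gene>
    </dnaRecord>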


The next lecture will focus on the leading dynamic languages, starting with PHP and moving on to the most modern, AJAX/JSON, which keep Facebook and Google Maps hyper-dynamic!
