Thursday, 11 April 2013

Reflection 10 : Forensic Linguistics


Assalamualaikum w.b.t
Today we learned about Forensic Linguistics, which is a fascinating subject. People might think of the famous CSI television series when they hear the word “forensic”, but Forensic Linguistics is a branch of Applied Linguistics. It takes linguistic knowledge and methods and applies them to the forensic context of law, investigation, trial, and punishment. There are three main areas of application for linguists working in forensic contexts: understanding the language of the written law, understanding language use in forensic and legal processes, and the provision of linguistic evidence.

Forensic linguists are involved in areas related to crime, crime solving, and assisting wrongly accused people. These areas include voice identification, author identification, and discourse analysis. Examples of corpora available in Forensic Linguistics are ransom notes, threatening letters, suicide notes, and cases of examination fraud. The applications of Forensic Linguistics in language research include author or speaker identification, intertextuality, text typing, and linguistic profiling.

Reflection 9 : Computational Stylistics


Assalamualaikum w.b.t
          This time we learned about Computational Stylistics. In pure Computational Stylistics, computers are used to study the stylistic characteristics of particular texts, authors, genres and periods. For example, Raben & Lieberman (1976) used automatically produced indexes to study vocabulary similarity in Milton’s Paradise Lost and Shelley’s Prometheus Unbound, while Burton used a concordancer to compare Antony and Cleopatra and Richard II.
          Computational Stylistics is a sub-discipline of computational linguistics. It evolved in the 1960s in the area of “stylometry”, where the computer is used to generate data on the types, number and length of words and sentences. However, this approach carries risks: it forecloses the possibility that an author changes style from text to text, and it overlooks the possibility of two authors writing alike.
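          To make the idea of stylometric data concrete, here is a minimal sketch in Python of the kind of counts described above: the number of sentences, the number of words, the number of distinct word types, and the average word and sentence length. The function name, the sample text and the simple splitting rules are my own assumptions for illustration; they are not the method of any particular study.

```python
import re

def stylometric_profile(text):
    """Return simple counts of the kind early stylometry relied on."""
    # Rough heuristic: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Treat runs of letters and apostrophes as words, ignoring case.
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "sentences": len(sentences),
        "tokens": len(words),        # running words
        "types": len(set(words)),    # distinct word forms
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }

# Hypothetical sample text, not taken from any corpus.
sample = "The cat sat. The cat ran away."
profile = stylometric_profile(sample)
print(profile)
```

          Comparing such profiles across texts is only a starting point, since, as noted above, one author's style can change and two authors can write alike.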
          Some of the fields in which Computational Stylistics is applied are machine translation, the social sciences and humanities, and literature, including plays, poems, novels, short stories and many others. The scope of Computational Stylistics includes counting the frequency of common and rare words; detecting writing style to produce a distinct and unmistakable “literary fingerprint” that can be used to determine if and when there has been collaboration with other texts; detecting idiosyncratic uses of language which distinguish one author from another; determining the sentiment of a text; analyzing variation in rhetorical style among scientific articles; and a few others.
          A corpus in Computational Stylistics can be any collection of literary works chosen for study, for example Shakespeare’s Romeo and Juliet and Emily Dickinson’s poems.
          We can also use Computational Stylistics to analyse the stylistics of Arabic poems, which contribute greatly to da'wah and are one of the ways to spread Islamic teachings.

Tuesday, 9 April 2013

Reflection 8 : Lexicography

Assalamualaikum w.b.t
Today we came across a somewhat familiar, but new, term in class: lexicography. Lexicography is basically the process of dictionary production, in which the compiling, writing and editing of a dictionary take place. We were also introduced to Professor Kev Nair, who is regarded as the father of fluency lexicography. Fluency lexicography came into existence as a separate branch of dictionary writing. Interestingly, lexicography is not restricted to the English language; there are also Arabic lexicography and German lexicography, among others.
A linguist whose specific expertise is in writing dictionaries is called a lexicographer. A lexicographer is concerned with what words are, what they mean, how the vocabulary of a language is structured, how speakers of the language use and understand the words, how the words evolved, and what relationships exist between words.
There are two related disciplines in lexicography: practical lexicography and theoretical lexicography. Practical lexicography is the art or craft of compiling, writing and editing dictionaries. Compiling a well-crafted dictionary requires careful consideration of several difficult steps: profiling the intended users; selecting and organizing the components of the dictionary; selecting words and affixes for systematization as entries; selecting collocations, phrases and examples; defining the words; organizing definitions; specifying pronunciations of words; labeling definitions and pronunciations for register and dialect where appropriate; and designing the best way for users to access the data in printed and electronic dictionaries.
The other discipline is theoretical lexicography. It is basically the scholarly discipline of analyzing and describing the semantic, syntagmatic and paradigmatic relationships within the lexicon or vocabulary of a language. Theoretical lexicography is also concerned with developing theories of dictionary components and of the structures linking the data in dictionaries. It is sometimes referred to as metalexicography. It concerns the same aspects as practical lexicography but is meant to lead to the development of principles that can improve the quality of future dictionaries. There are several branches of academic dictionary research: dictionary criticism, which evaluates the quality of one or more dictionaries; dictionary history, which traces the traditions of a type of dictionary in a particular country or language; dictionary typology, which classifies the various genres of reference works, such as monolingual versus bilingual dictionaries; dictionary structure, which deals with the various ways in which information is presented in a dictionary; dictionary use, which observes the reference acts and skills of dictionary users; and lastly dictionary IT, which applies computer aids to the process of dictionary compilation. The words included in any dictionary are decided on a few main points: currency, reliability, user-friendliness, informativeness, and relevance.

Reflection 7 : Concordancer


Assalamualaikum w.b.t
At the end of our last lesson on Corpus Linguistics, we were briefly introduced to the concordancer. In this entry, we will discuss the topic further. A concordance is a collection of the occurrences of a word-form, or an index of word-forms. A concordancer is the software that analyzes the occurrences of word-forms in a collection of text. Be mindful that a concordancer does not translate; it analyzes.
What does a concordancer do?
·         Makes wordlists, which can be arranged.
·         Can include the frequency and percentage of words.
·         Makes wordlists of the occurrences of each word in its context. Contexts can be selected and arranged.
·         Can handle large texts.
·         Can save and print the selected wordlist.
A concordancer is widely used in language teaching and learning as well as in data mining and data clean-up. It is also helpful in other areas, such as literary and linguistic scholarship. Some major concordance programs for PC are the Oxford Concordance Program (OCP) and WordCruncher. They are widely used, reliable, flexible and straightforward.
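To show what “each word in its context” means in practice, here is a minimal keyword-in-context sketch in Python. It is only an illustration under my own assumptions (the function name, the sample sentence and the context width are hypothetical); it is not how OCP or any other real concordance program is implemented.

```python
import re

def concordance(text, keyword, width=3):
    """List each occurrence of keyword with `width` words of context on each side."""
    # Treat runs of letters and apostrophes as words.
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Hypothetical sample text.
sample = "The cat sat on the mat. The dog saw the cat run."
hits = concordance(sample, "cat", width=2)
for left, key, right in hits:
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>15} [{key}] {right}")
```

The length of the returned list also gives the frequency of the keyword, which is the count a concordancer reports alongside the contexts.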

Reflection 6 : Corpus Linguistics



Assalamualaikum w.b.t
Today we learned about Corpus Linguistics. Such an interesting term, isn’t it? Corpus Linguistics is the study of language as expressed in samples of ‘real world’ text, known as corpora. It is an approach to deriving a set of abstract rules by which a natural language is governed or by which it relates to another language. Corpora were originally compiled by hand, but they are now largely derived by an automated process.
The word ‘corpus’ is derived from the Latin word meaning ‘body’. It may be used to refer to any text in written or spoken form. In modern linguistics, the term refers to large collections of texts, in machine-readable form, which represent a sample of a particular variety or use of language. The scope of studies in corpus linguistics covers the possible words, structures or uses in a language, their probable occurrence in a language, and the description and explanation of the nature, structure and use of language with regard to particular matters such as language acquisition, variation and change.
There are several types of corpora available nowadays, including written or spoken (transcribed) language, modern or old texts, texts from one language or several languages, and texts from whole books, newspapers, journals, speeches, and extracts of varying length. Corpus Linguistics is now seen as the study of linguistic phenomena through large collections of machine-readable texts, or corpora. These are used within a number of research areas, ranging from the descriptive study of the syntax of a language to language learning. The availability of corpora which are so similar in structure is a valuable resource for researchers interested in comparing different language varieties. Interestingly, there is also a Quranic Corpus. We Muslims can surely benefit from this resource by studying it closely.
As we are learning Computer Assisted Language Learning, the role of computers in Corpus Linguistics is of course essential. Computers are used to store huge amounts of text; to retrieve them quickly; to retrieve words, phrases or whole texts in context; to sort linguistic items; to increase reliability in searching, counting and sorting linguistic items; and to provide an accurate probability of occurrence for specific linguistic items.
Some of the corpus-related research areas are Computational Linguistics, Historical Linguistics, Lexicography, Machine Translation, Natural Language Processing (NLP), Social Psychology, Sociolinguistics, Stylistics, and many more interesting branches of study.
Later, we learned about something called a concordancer. It is an example of software used for corpus linguistics. Madam Rozina showed us a few examples of concordance programs and gave some simple demonstrations of how to use them. Using a concordancer, we can do amazing things, such as finding out how many times the word ‘Muhammad’ or ‘Islam’ appears in the Quran. We were so thrilled to use the software in class and to search for our own names!
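The counting and probability tasks listed above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name and the sample sentence are hypothetical, and the “probability of occurrence” is estimated simply as a word's share of all the words in the text.

```python
from collections import Counter
import re

def word_frequencies(text):
    """Map each word to (count, relative frequency), most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    # Relative frequency = count / total words, a simple estimate of
    # how probable it is that a randomly chosen word is this word.
    return {word: (n, n / total) for word, n in counts.most_common()}

# Hypothetical sample text standing in for a corpus.
sample = "the cat sat on the mat and the dog sat too"
freqs = word_frequencies(sample)
for word, (n, share) in freqs.items():
    print(f"{word:<5} {n:>2} {share:.2%}")
```

On a real corpus the same counting would run over millions of words, which is why the work is done by computer rather than by hand.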

Graded Assignment 2 : Analysis on Idiolect of Chatroom Conversationalist


 




ENGL 4740 Computer assisted language studies : WRITTEN ASSIGNMENT
ANALYSIS ON IDIOLECT
BASED ON REAL TIME CHATROOM CONVERSATIONS




Prepared by:

Asma’ binti Ab.Razak 0921578
Aimi Farhein bt Ramli 0921742
Julaiha Shazmira bt Malizan 0923078
Section 2  

Dr. Rozina bt Abdul Ghani

INTRODUCTION
Computer-mediated communication, best known as CMC, is communication that happens through the use of two or more computers or devices connected to the Internet or a network, for example through instant messaging, electronic mail (email), chat rooms and text messaging. As the technologies become more advanced and sophisticated, the forms of communication vary, which encourages software engineers to create software that can support the needs of the communication.
There are two types of CMC: synchronous and asynchronous. In synchronous communication all participants are online at the same time (e.g. chat room), while asynchronous communication does not require participants to be online at the same time (e.g. email). People choose synchronous communication for fast responses and short messages, and asynchronous communication such as email for delayed, controlled and longer messages.
The chat room, being a synchronous form of communication, is chosen by many people because it provides fast responses, and people from all over the world can easily access it and connect to other people without having to disclose much information about themselves. However, a person’s place of origin can often be guessed from the language they use. The distinctive way an individual uses language is known as an idiolect. Each person’s idiolect differs from everyone else’s. An idiolect acts like a language print: a person has a certain pattern in their speech and writing, which indirectly provides information about their background.
This paper aims to analyze the idiolect of the chatters in a chat room and their language usage in terms of vocabulary and grammar structures.

OBJECTIVES

1.      To analyze the particular idiolect that belongs to each language user
2.      To recognize the diverse use of language in different countries
3.      To identify the choice of words and grammar structures in the recorded conversations

LITERATURE REVIEW
         Humans and language are closely interrelated. People use language to communicate from one place to another, from one setting to another, and from one person to another. Different people possess different idiolects. An idiolect is based on a person’s own preferences, shaped by their speaking environment.
People are usually either happy with their idiolect or ashamed of it. According to Ridhwan (2011), quoting The Daily Mail, children will try to adapt linguistically when in different environments. He also describes the upper class’s dialect as having a more ‘grizzly’ base, that is, a lower frequency. This may be seen as a lack of confidence in the idiolect and accent. He noted that one does not speak very comfortably with people of higher intelligence or authority, as one is always conscious of one’s idiolect and sociolect. Hence, people are usually aware of the way they perform their idiolect.
Ridhwan (2011) also stated that some people were encouraged to use body language, facial gestures or paralinguistic cues because they believed these show interest, whereas the use of phatic language did not lead to an open conversation. This suggests that gestures play an important role in language and communication. However, gestures cannot be seen in chat room conversations. Thus emoticons play an important role as electronic gestures for chat room participants.
Panicheva (2009) prepared a questionnaire and distributed it among several people that she regularly talks with. After they completed the questionnaire, which was based on her idiolect, she found that people can interpret her speech differently from how she thinks it sounds. This shows that an idiolect is perceived differently by different people: the speaker perceives it one way and each listener another.
The use of idiolect can illustrate one’s place of origin, influences, and sometimes interests. However, the conclusions drawn in this research paper rest on a very small sample.

METHODOLOGY
The excerpt that we analyzed was taken from wireclub.com, where the chatters usually discuss music and fashion. They exchanged information about their favorite singers and their favorite songs. There are 22 participants in total: 15 male participants, 4 female participants, and 3 participants of unknown gender. Most of the chatters are from the United States. Other participants chat from Canada, India, Ireland, Australia, Afghanistan, the United Kingdom, South Africa and the Netherlands.
The excerpt was analyzed in terms of vocabulary, specifically the participants’ choice of words and any abbreviations that occur in the conversations. The other component analyzed is the sentence structure, or grammar, of the conversations, which includes omission of the verb, addition of a particle, and fragments (incomplete sentences).
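As an illustration of how the abbreviation check could be assisted by a computer, here is a minimal sketch in Python. It is not the procedure used in this assignment, which was carried out by reading the excerpt; the abbreviation list and the function name are hypothetical, and the three messages are taken from the excerpts quoted in the findings.

```python
import re

# Hypothetical lookup list of chat abbreviations, assumed for illustration only.
ABBREVIATIONS = {"u": "you", "gm": "good morning", "pic": "picture", "bout": "about"}

def find_abbreviations(message):
    """Return (abbreviation, full form) pairs found in one chat message."""
    tokens = re.findall(r"[a-z']+", message.lower())
    return [(t, ABBREVIATIONS[t]) for t in tokens if t in ABBREVIATIONS]

chat = [
    ("lovekush1989", "u from south Africa"),
    ("Yeti Finder", "gm deep how are u good to see u"),
    ("abigailvongoat", "I will perish"),
]
flags = {user: find_abbreviations(message) for user, message in chat}
for user, found in flags.items():
    print(user, found)
```

A list-based check like this only finds abbreviations that are already on the list, so grammar features such as verb omission or fragments would still need to be judged by a human reader.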
The table below shows the details of the participants in the chat room and their place of origin:
Origin         | ID                                                                         | Total Number
Canada         | Spitfireace, ItchyMacDoogle, VerbingTheNoun                                | 3
United States  | Brandonicusasaurus, still-frame, DEEP dysporia, yeaginator, Abigailvongoat | 5
Ireland        | Dodgeviper2010, GaZzZz27                                                   | 2
India          | Yeti finder, arun-1234, Lovekush 1989                                      | 3
Afghanistan    | Mobareez                                                                   | 1
Australia      | michaelemphaty, Don_Quixote-1                                              | 2
United Kingdom | kingseamonkey                                                              | 1
South Africa   | Princess storm                                                             | 1
Netherlands    | Just_pinkfloyd                                                             | 1
No Data        | Greeneyedblond20003, baby-fetishm, beatlesarecool2010                      | 3

TABLE 1



ID and Origin            | Vocabulary: Choice of words | Vocabulary: Abbreviation | Grammar: Omission of verb | Grammar: Adding of particle | Grammar: Fragment (incomplete sentence)
Princess Storm (SA)      | Simple | / | - | - | /
Spitfireace (Canada)     | Simple | / | - | / | -
Greeneyedblond20003      | Simple | / | / | - | -
Babyfetishm              | Simple | / | - | - | -
Mobareez (Afghanistan)   | Simple | / | - | - | -
Arun_1234 (India)        | Simple | - | - | - | -
Beatlesarecool2010       | Simple | - | - | - | -
Brandonicusasaurus (US)  | Simple | / | - | - | -
Lovekush1989 (India)     | Simple | / | / | / | -
Still-frame (US)         | Simple | - | - | - | -
itchyMacDoogle (Canada)  | Simple | - | - | - | -
DodgeViper2010 (Ireland) | Simple | / | - | - | -
VerbingTheNoun (Canada)  | Simple | - | - | - | -
GaZzZz27 (Ireland)       | Simple | - | - | - | -
Just_pinkfloyd           | Simple | - | - | - | -

TABLE 2


FINDINGS
Table 2 Analysis
Looking at Table 2, most of the chatters use abbreviations. The main reason is that, for chatting purposes, they prefer shorter words that can be typed quickly. There are also specific word choices made by certain people from certain countries.
DodgeViper2010: At the moment I'm listening to a candian artist called Grimes
ItchyMacDoogle: frank?


In this example, the chatter ItchyMacDoogle is from Canada. This chatter used the word ‘frank’ to ask another chatter for clarification. Instead of ‘seriously’ or ‘are you serious’, which are commonly used, the chatter preferred the word ‘frank’, which may be widely used in Canada. The next example shows the chatters’ use of words and abbreviations in the chat room.
spitfireace : ya'll survivalists?
greeneyedblond20003 : u talkin bout that looney tune in NK

           
The chatters in the example above use abbreviations: ‘you all’ becomes ‘ya’ll’ and ‘talking’ is shortened to ‘talkin’. The choice of ‘talkin’ instead of ‘talking’ might hint at the origin of greeneyedblond20003 or at his way of speaking. Without falling into generalization, this chatter may be African American or otherwise from North America, judging by the way he chats.
            The phrase ‘looney tune’ chosen by greeneyedblond20003 refers to someone who is not in their right mind, but he chose a milder expression rather than directly using the word ‘crazy’. This shows that he is concerned about the reaction he might face by directly applying the word ‘crazy’ to people he does not even know.
princess storm: so maybe I should change my profile pic
lovekush1989: u from south Africa

           
In this example, lovekush1989 is from India. He omits the verb ‘are’ when asking whether Princess Storm comes from South Africa. English is one of India’s official languages, but lovekush1989 does not use the correct grammatical structure here. This suggests that lovekush1989’s use of English may be limited compared with his use of Hindustani in daily conversation.

ID             | Origin         | Vocabulary: Choice of word | Vocabulary: Abbreviation | Grammar: Omission of the verb | Grammar: Addition of particle | Grammar: Fragments (incomplete sentences)
DEEP dysporia  | United States  | Simple | - | - | / | -
michaelempathy | Australia      | Simple | / | - | - | -
abigailvongoat | United States  | Simple | - | - | - | -
Yeti Finder    | India          | Simple | / | - | - | -
CopyyyCattt    | No data        | Simple | - | - | - | -
yeaginator     | United States  | Simple | - | - | - | -
kingseamonkey  | United Kingdom | Simple | - | - | - | -

TABLE 3

Table 3 Analysis
Even though most of the chatters learned English as their mother tongue, they still cannot prevent themselves from making mistakes. For example,
michaelempathy: theres going to be a world wide shortage of chocolate in the next 2 years
Yeti Finder: gm deep how are u good to see u


The word “theres” should be written as “there’s”. The chatter omitted the apostrophe (’), which means that he did not observe the punctuation rule. Also, “world wide” should be “worldwide”. Nevertheless, the sentence produced is a complete sentence. The same goes for the chatter called “Yeti Finder”. He used an abbreviation for good morning (gm), and his sentence does not have the correct punctuation marks. The sentence is supposed to be,
Good morning Deep. How are you? Good to see you.


In addition, the chatters seemed to make up their own rules, adding extra letters to highlight their message. In the following sentence, the chatter added the suffix ‘-es’ to the word ‘no’ to emphasize that she did not want chocolate to be out of stock in the next two years.
DEEP dysporia: ooh, noes

            There is a chatter called “abigailvongoat” who consistently produced complete sentences instead of just fragments.
abigailvongoat: I will perish
abigailvongoat: I'll just binge on my stocks...
abigailvongoat: well at least you have something to look forward to!
abigailvongoat: hmm why isn't this chat room discussing Plato's Republic or something?
abigailvongoat: heh, it's been a long while for me


The sentences produced are complete sentences. Most of the time she even used the correct punctuation marks, so her meaning reached the other chatters faster than that of those who did not use correct punctuation.
Generally, the word choices of the chatters are simple and easily understood by the others. Considering that they are discussing music, the sentences are short and complete, and they reflect a colloquial style of English. What is written on the chat room page resembles the style of spoken communication.

ID & origin                                                 | Vocabulary: Choice of word | Vocabulary: Abbreviation | Grammar: Omission of the verb | Grammar: Addition of particle | Grammar: Fragments (incomplete sentences)
VerbingTheNoun. Calgary, Alberta, Canada. Female            | / | - | - | - | -
DodgeViper2010, Cork, Cork, Ireland. Male                   | / | - | - | / (-ish) | /
still-frame. Tucson, Arizona, United States. Male           | / | - | - | - | -
Spitfireace, Vancouver, British Columbia, Canada. Male      | / | - | / | - | -
just_pinkfloyd, Schiedam, Zuid-Holland, Netherlands. Female | / | - | / (Whodere? Still there?) | - | /
ItchyMacDoogle, Halifax, Nova Scotia, Canada. Female        | / | - | - | - | /
Pineapple Lumps. Unknown details                            | / | - | - | - | /

TABLE 4

Table 4 shows another set of analysis involving a few new chatters who joined the chat room later. The conversation revolves around their preferences in music. They noticed that some users of the chat room share the same taste in music, so they decided to become friends rather than just one-time chat buddies.
User still-frame, from Arizona, United States, shows a tendency to use ‘ah’ as a filler in the conversation. Aside from using fillers, this user also writes complete, albeit informal, sentences and uses correct punctuation when typing.
still-frame.: Ah, excellent.
still-frame.: night itchy!
still-frame.: itchy, verb, is it cool if I add you guys?


Another user, just_pinkfloyd, from Zuid-Holland, Netherlands, shows very little of her idiolect, as she participates only once in the conversation, on page 5 and 5 of the data. There she does not write a sentence but states only the facts. Her point is put across with correct punctuation, as can be seen in the excerpt below.
just_pinkfloyd: Marillion - Script For A Jester's Tear


From the example, it is evident that she applies the right punctuation marks in delivering her message. She capitalizes the first letter of every word, indicating that it is the title of a song. It cannot be generalized that people from Holland are likely to use perfect grammar, but in this case just_pinkfloyd appears comfortable using correct grammar, which suggests that she frequently uses grammatically correct sentences with the people around her.

CONCLUSION

The analysis of the users of the Wireclub chat room shows a variety of communication styles. Despite using English as the medium of communication, different users employ different ways to deliver their message. These differences can be considered part of idiolectal grammar. As the conversation took place in an informal manner, most users were inclined to use colloquial English, for example ‘hey’ as a greeting instead of ‘hi’. Also, instead of typing full sentences, users either use abbreviations or write only the facts without further elaboration, as the user from the Netherlands did. Since this study is based on a very small sample, the results cannot be generalized to the whole community of each user’s origin, but it is safe to conclude that the users’ preferred choice of words, grammatical structures and vocabulary distinguish their idiolects from one another. CMC makes it possible for linguists to conduct studies on this matter by making data collection and analysis easier and more systematic.