A Corpus-Based Analysis of Vocabulary in IELTS Writing

 2022-02-20 07:02

Total thesis length: 52,903 characters

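Abstract

IELTS is attracting more and more attention as an international test, and it is a threshold that Chinese students must cross on their way into the wider world. However, the IELTS writing scores of Chinese candidates are not encouraging, which reflects an insufficient understanding of IELTS writing. To teach IELTS more effectively and raise the average score of Chinese candidates, we need a deeper and more thorough understanding of IELTS writing, and analyzing the lexical features of IELTS writing from a corpus perspective can help us achieve this goal.

Lexical features are observed along three dimensions: lexical density (LD), lexical variation and lexical sophistication. Lexical density measures the proportion of content words in candidates' written output. Lexical variation is measured by the type-token ratio (TTR). Lexical sophistication is measured by the Lexical Frequency Profile (LFP), which reflects the proportions of the words used by candidates that fall within the first, second and third 1000-word lists and outside these three lists (off-list).

The corpus used in this study is built on 60 IELTS essays from different score bands, divided by score into high, middle and low groups of 20 essays each. The essays come mainly from the Cambridge IELTS series of reference books published officially by IELTS and from timed essays written by students of Nanjing Global IELTS School (南京环球雅思学校).

The main findings of this study are as follows. Lexical density has little influence on IELTS writing scores, whereas lexical variation affects them to a large extent. In other words, candidates should try to use varied expressions when writing, and learning to express a repeated meaning with different words is very important for IELTS writing. Lexical sophistication has some influence on IELTS scores, but the influence is not large. Reducing errors is the key to IELTS writing, and on that basis a measured use of more advanced vocabulary can add polish to an IELTS essay.

Keywords: IELTS; corpus; lexical density; lexical richness; lexical sophistication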

Table of Contents

Acknowledgements

English Abstract

Chinese Abstract

Chapter One: Introduction

1.1 Background of the Study

1.2 Significance of the Study

1.3 The Layout of the Thesis

Chapter Two: Literature Review

2.1 Defining Corpus Linguistics

2.2 Previous Studies on Corpus Linguistics

2.2.1 Previous Studies on Corpus Linguistics in China

2.2.2 Previous Studies on Corpus Linguistics Abroad

2.3 Previous Studies on Lexical Items

2.3.1 Previous Studies on Lexical Density

2.3.2 Previous Studies on Lexical Sophistication

2.3.3 Previous Studies on Lexical Variation

2.4 Measures for Lexical Research

2.5 Summary

Chapter Three: Methodology

3.1 Research Questions

3.2 Participants

3.3 Procedures of Data Collection

3.3.1 Implementing Writing Tasks

3.3.2 Data Selection and Sampling

3.3.3 Composition Evaluation

3.4 Data Documentation and Modification

3.5 Corpus Tools Used

Chapter Four: Results

4.1 Results of Three Dimensions of Lexical Research and Their Discussion

4.1.1 Results of Lexical Density and Its Discussion

4.1.2 Results of Lexical Variation and Its Discussion

4.1.3 Results of Lexical Sophistication and Its Discussion

4.2 Results of Writing Quality

Chapter Five: Conclusion

5.1 Suggestions for IELTS Writing

5.2 A Summary of the Major Findings

5.3 Limitations and Suggestions

Chapter One: Introduction

1.1 Background of the Study

IELTS, the International English Language Testing System, is an international test designed to evaluate candidates’ English skills in reading, listening, speaking and writing. It is jointly managed by the British Council, IDP: IELTS Australia and the University of Cambridge ESOL Examinations. Its main purpose is to measure candidates’ ability to live and study abroad. IELTS enjoys a high degree of authority and influence all over the world and is one of the most important tests for students who want to go abroad and broaden their horizons. In today’s age of globalization, communication and cooperation between countries are becoming more and more important, so it is easy to understand the important role IELTS plays in helping us experience the cultures of the Commonwealth countries and of the wider world.

Owing to the influence of the education system, reading and listening are not the most difficult parts of the test for candidates in China. It is the speaking and writing tests that baffle them, and the writing ability of Chinese candidates is particularly worrying. The maximum score for the writing test is nine, yet the average score of Chinese candidates is only around five, which is among the lowest in the world and barely higher than those of the United Arab Emirates and Qatar. This suggests that the teaching and learning of writing in China share some common problems on the part of both teachers and students. However, it is hard to get to the roots of these problems and draw a firm conclusion by analyzing and comparing only a small amount of material. Reliable conclusions require a large amount of statistical data, so corpus-based study deserves our attention. The method has a long history and is becoming more and more widely accepted as the processing power and memory of computers grow. It has earned its own place in academic circles and has become a new discipline called corpus linguistics.

1.2 Significance of the Study

Many students spend a large amount of time and energy on IELTS writing. They have a large vocabulary and a good mastery of grammar, but their scores remain unsatisfactorily low. A writing task is evaluated along four main dimensions, of which the one concerning word choice is of great significance, yet it is hard for students to gain a good command of wording. A significant number of candidates prefer to use unfamiliar words while paying little attention to accuracy and context, so their wording is often unnatural and is unlikely to be appreciated by the examiners. Other examinees can only repeat the same expressions again and again without knowing how to vary their phrasing, so their writing is tedious and is unlikely to secure a high score. What words to use in the writing task, and how to use them, is therefore worth careful exploration.

For much of the 20th century, vocabulary acquisition was largely ignored in the study of second language acquisition, which made it a Cinderella subject. During the past twenty years, however, researchers have paid more and more attention to the lexical resources that second language learners draw on in their output. As more studies have been carried out in this area, many methods and tools for lexical research have been created, which greatly benefit our understanding of lexical usage.

If we can build a corpus and analyze it from different perspectives, we can identify the problems in IELTS writing objectively and offer feasible recommendations. Hopefully, this paper will contribute to improving the learning and teaching of IELTS writing.

1.3 The Layout of the Thesis

The thesis is organized as follows:

Chapter One serves as the general introduction, in which the background of the study, the significance of the study and the layout of the thesis are stated.

Chapter Two is a literature review. It covers the definition of corpus linguistics, previous studies of corpus linguistics in China and abroad, previous studies on lexical items, and the measures used in lexical research.

Chapter Three specifies the methodology of this research. It begins with the research questions, followed by a description of the participants. It then provides a detailed description of the procedures of data collection, corpus construction and data analysis.

Chapter Four reports the results of our study with a detailed account and interpretation of these results.

Chapter Five is a discussion of the findings.

Chapter Six presents the major conclusions and their implications, together with the limitations of the present study and suggestions for further research.

Chapter Two: Literature Review

2.1 Defining Corpus Linguistics

The term corpus linguistics first appeared in the 1980s, although research methods similar to it have a long history. In the early 1930s, under the influence of positivism and behaviorism, the American structuralist linguists represented by Bloomfield regarded collecting and analyzing a strictly selected body of genuine language data as an important part of language research. In the 1950s, however, this method was strongly rejected by transformational-generative linguists, especially by Chomsky. Chomsky believed that the main object of linguistic research is language competence rather than performance, and that the intuition of a native speaker is the best research material. On this view, however a corpus is collected, it cannot escape being partial (McEnery & Wilson, 2011). Quirk opposed this view and emphasized the significance of natural language data. He once quoted Aldous Huxley: “The most accurate scientific theories, the most detailed descriptions, are all just rough and unreasonable simplifications of the real-life situation. Actually even the simplest living example is incomparably complicated.” In 1959, Quirk announced the establishment of the Survey of English Usage (SEU) corpus project. Although the corpus collected then could not be read by computers, Quirk and his colleagues drew on it to publish A Grammar of Contemporary English in 1972, which can be considered one of the earliest examples of using a corpus for research. It was the effort of Quirk and his colleagues that helped to maintain the vitality of corpus research under pressure from mentalism. Thanks largely to him, the method is now well recognized by many scholars. At the end of the 1980s, corpus linguistics grew stronger by taking advantage of advances in computer processing and storage capacity.
With the construction of corpora such as the Collins Birmingham University International Language Database (COBUILD), LONGMAN, the British National Corpus and the International Corpus of English, and with the theses and books published on the basis of those corpora, corpus linguistics has earned an international presence in academia and has grown into a new discipline (Xu Guoliang, 2005).

The history of corpus linguistics can be roughly divided into three stages: manual corpora, the first generation of electronic corpora and the second generation of electronic corpora. The BROWN corpus in America, the London-Lund Corpus of Spoken English created by Svartvik and the Lancaster-Oslo/Bergen Corpus represent the first generation of electronic corpora. After the 1990s, the development of computer science made large-scale corpora technically possible, and corpora such as COBUILD, LONGMAN, the British National Corpus (BNC) and the International Corpus of English (ICE) emerged.

According to the definition of Yang and Wei, a corpus is “a digital text library designed according to specific standards and rules and collected through scientific sampling; it is used specifically to record second language learners’ interlanguage” (Leech, 1998: xiv-xx).

Granger suggests some rules for building a corpus. A corpus should be:

  1. built according to a clear design philosophy
  2. aimed at research on the second/foreign language
  3. a digitalized record of the real output of second/foreign language learners
  4. compiled under unified standards
  5. accompanied by information about the learners’ identities

According to Li Wenzhong (2002), a corpus serves three purposes. The first is to draw a clear distinction between the mother tongue and the foreign language in order to observe and describe how strongly the former influences the latter. The second is to identify learners’ main difficulties by comparison with corpora based on native speakers, in order to provide scientific guidance for the learning and teaching of the foreign language. The third is to enhance our understanding of the mechanism of language learning, or even of language itself, through quantitative analysis of large amounts of learner output.

2.2 Previous Studies on Corpus Linguistics

2.2.1 Previous Studies on Corpus Linguistics in China

Corpus linguistics in China started in the late 1970s. The first corpus in China was the Jiao Da English for Science and Technology corpus (Yang Huizhong; Huang Renjie, 1982), which provided a basis for drawing up college English teaching syllabuses and college English vocabulary lists (Huang Renjie, Yang Huizhong, 1984, 1985) and thus made a positive contribution to foreign language teaching in China. The ideas and principles for corpus processing put forward by Yang Huizhong and other pioneers have also deeply influenced later research (Li Wenzhong, 2003). After the 1990s, corpus research in China turned to building and studying learner corpora. Many corpora were created at that time, including the Chinese Learners English Corpus (CLEC) (Gui Shichun, Yang Huizhong, 2003), the College Learners’ Spoken English Corpus (COLSEC) (Yang Huizhong, Wei Naixing, 2005), the Spoken and Written English Corpus of Chinese Learners (SWECCL) (Wen Qiufang, 2005, 2008) and the Public English Tests and Spoken English Corpus of Chinese Learners (SECOPETS) (Xiao Defa, 2005).

These corpora are based mainly on written materials, supplemented by some spoken materials. The materials of CLEC come mainly from the writing part of the College English Test Bands 4 and 6, and the materials of COLSEC come mainly from the oral part of the College English Test. Because the College English Test is nationally uniform, it ensures the reliability and comparability of the corpus, and its national scale makes the corpus more representative. However, there is currently no corpus in China based on an international English test such as IELTS, which makes it difficult to find Chinese candidates' weaknesses as measured against the international standards of the test. It is therefore necessary to ground this research in an international perspective, using materials from an international English test to build a corpus and conduct research, which is of great significance for guiding IELTS teaching in China.

2.2.2 Previous Studies on Corpus Linguistics Abroad

Since the 1980s, the number of corpora based on learners from different countries has grown quickly. These corpora can be divided into two kinds according to the purpose for which they were built. The first kind is used for commercial purposes, for example the Longman Learner’s Corpus (LLC) and the Cambridge Learner Corpus (CLC). These corpora are mainly used to inform dictionary compilers about learners’ characteristics and common pitfalls. The other kind is created for academic research and education and is represented mainly by corpora based on learners from such countries as China, Belgium, Hungary, Poland and Japan (Xiao, 2008).

2.3 Previous Studies on Lexical Items

For second language learners, vocabulary often serves as their prime target in learning. Hyltenstam once remarked that “having a large vocabulary is one of the key factors for effective communication”. As for the writing tests, the vocabulary standards in holistic rating systems are always microscopic and subjective. So researchers are eager to quantify the assessment with indicators of lexical richness. But how to achieve this goal is always a controversial topic.

As researchers have different starting points and use various tools, the main measures include lexical richness (LR), lexical density (LD), lexical sophistication (LS), lexical originality (LO) and lexical variation (LV).

The relationship between these measures is quite complicated. For Malvern & Richards and for Yu, lexical density and lexical diversity mean the same thing, while for Read, lexical richness consists of lexical diversity, lexical sophistication and lexical variation. In China, some researchers, such as Bao Gui, use lexical sophistication, lexical diversity, lexical density and lexical originality to describe the characteristics of lexical items, while others, such as Lin Jun, prefer to use the Lexical Frequency Profile (LFP). The definitions of these terms are not standardized either. Some researchers use the term lexical density to refer to different computing methods.

2.3.1 Previous Studies on Lexical Density

The term lexical density was coined by Ure in 1971; it refers to the proportion of lexical words among all the words in a text. According to Ure’s research, the lexical density of written materials is generally greater than 40%, while that of spoken materials is generally under 40%. Lexical word is a vague term, and different researchers define it differently. O’Loughlin (1995) restricts adverbs to adverbs of time, place and manner, while Eager defines them as words derived from adjectives with the suffix “-ly”. Lu (forthcoming) argues that lexical words should include nouns, verbs (including auxiliary verb, be, have and modal verb), adjectives and adverbs.
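As a rough illustration of the definition above, lexical density can be sketched as the share of tokens that are not function words. The sketch below is only an approximation under stated assumptions: `FUNCTION_WORDS` is a hypothetical, deliberately tiny list, and `tokenize` and `lexical_density` are illustrative names that do not come from the thesis. A real study would more likely identify lexical words with a part-of-speech tagger and one of the definitions discussed above.

```python
import re

# Hypothetical, deliberately tiny function-word list (an assumption for this
# sketch); a real study would use a POS tagger or a full inventory instead.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "in", "on", "at", "to",
    "for", "with", "by", "from", "is", "are", "was", "were", "be", "been",
    "it", "this", "that", "these", "those", "i", "you", "he", "she", "we",
    "they", "not", "as", "than", "so",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_density(text):
    """Return the proportion of tokens that are not in FUNCTION_WORDS."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    lexical = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(lexical) / len(tokens)


sample = "The students in this school write essays on the weekend."
print(lexical_density(sample))  # 5 lexical words out of 10 tokens -> 0.5
```

Because the result depends entirely on which words count as lexical, two studies using different definitions can report different densities for the same text.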

Although it is generally believed that lexical density should correlate with language proficiency, no research so far has confirmed this. One probable reason is that lower-level learners use fewer function words, which leads to a high lexical density.

Linnarud’s study in 1986 found that the lexical density of native speakers (0.44) is slightly higher than that of second language learners (0.42), and the difference approaches statistical significance (p < 0.07). Hyltenstam’s research in 1988 showed no difference in lexical density between native speakers and advanced second language learners. As Linnarud, Nihalani and Engber point out, lexical density has no relationship with holistic ratings. The reason may be the influence of sample length or the repetitive use of lexical words by lower-level learners.

2.3.2 Previous Studies on Lexical Sophistication

Lexical sophistication (LS), also called lexical rareness, refers to “the proportion of advanced words that is rarely used in the output materials” (Read, 2000). Researchers differ, however, on its definition and computing method. Although Linnarud and Hyltenstam both calculate the proportion of sophisticated lexical words among all words, they define sophisticated lexical words differently. Linnarud defined them as the words learned after grade nine in Sweden, while Hyltenstam defined them as the words beyond the 7000 basic words for Swedish learners.

West created the General Service List (GSL) for ESL/EFL learners in 1953. Its words were extracted from a corpus of five million words and comprise the 2000 most commonly used word families. He took full account of frequency of occurrence, difficulty of learning and type of writing. Studies have found that GSL word families cover about 90% of literary works (Hirsh, 1993), 75% of non-literary works (Hwang, 1989) and 76% of academic materials (Coxhead, 1998). Ever since the list came into being, it has been indispensable for research and education.

Another important word list is the University Word List (UWL) created by Xue & Nation in 1984, which was compiled by editing and combining four different word lists (Coxhead, 2000). The University Word List includes word families “not including in the most commonly used 2000 word families, but are still commonly used in academic texts.” The software Range then came into being in 1996 and had a huge influence on lexical research. Learners, teachers and researchers made extensive use of the word list. Its shortcoming, however, is that it lacks specific selection criteria.

Coxhead created an academic corpus of 3,513,330 running words to make up for the shortcomings of Xue & Nation’s word list. Following strict and clear criteria, she selected 570 word families from this corpus to form the Academic Word List (AWL). The list contains commonly used word families such as analysis, concept, data and research as well as less commonly used ones such as convince, ongoing, persist and whereby. Compared with the UWL, which has 836 word families, the AWL has fewer word families, but its coverage of academic materials is 10%, higher than that of the UWL (9.8%) (Coxhead, 2000). Because of its wider range, the word list is often used in situations where low-frequency words are not counted.

Laufer and Nation designed the Lexical Frequency Profile in 1995, dividing words into four groups according to word families: the most commonly used 1000 word families, the second most commonly used 1000 word families, the UWL, and the word families that appear on none of these lists (off-list). The UWL group captures more complex words that are nevertheless frequent in academic texts, while the off-list group captures low-frequency words.
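The four-way split described above can be sketched as a simple lookup that assigns each token to the first list containing it and reports the share of tokens in each group. This is a minimal sketch under stated assumptions, not Laufer and Nation's implementation: the `BANDS` sets are hypothetical miniature stand-ins for the real lists, the function name is illustrative, and the sketch matches raw tokens, whereas the actual profile works on word families.

```python
import re
from collections import Counter

# Hypothetical miniature word lists standing in for the real frequency lists
# (assumptions for this sketch only).
BANDS = {
    "first_1000": {"the", "people", "think", "that", "is", "for", "a", "good"},
    "second_1000": {"pollution", "improve"},
    "uwl": {"environment", "policy"},
}


def lexical_frequency_profile(text, bands=BANDS):
    """Return the proportion of tokens falling in each band and off-list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in bands.items():
            if token in words:
                counts[name] += 1
                break
        else:
            # The token matched none of the lists.
            counts["off_list"] += 1
    total = len(tokens)
    labels = list(bands) + ["off_list"]
    return {label: (counts[label] / total if total else 0.0) for label in labels}


essay = "People think that a stringent policy is good for the environment"
profile = lexical_frequency_profile(essay)
for label, share in profile.items():
    print(f"{label}: {share:.1%}")
```

In this toy sentence, eight of the eleven tokens fall in the first group, two in the UWL group and one ("stringent") is off-list. A profile of this kind is only as meaningful as the word lists behind it.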

2.3.3 Previous Studies on Lexical Variation

Lexical variation refers to the variation range of lexical items in texts. Some scholars equate lexical variation with lexical diversity or lexical range (Crystal, 1982).
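Lexical variation is commonly operationalized as the type-token ratio (TTR): the number of distinct word forms (types) divided by the total number of running words (tokens). The sketch below is a minimal illustration with hypothetical function names and a made-up sentence; it also compares the ratio for a short prefix of the text with the ratio for the whole text.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    """Return the number of distinct tokens divided by the number of tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


text = "the cat sat on the mat and the dog sat on the rug"
tokens = tokenize(text)

full_ttr = type_token_ratio(tokens)        # 8 types over 13 tokens
prefix_ttr = type_token_ratio(tokens[:6])  # 5 types over 6 tokens

print(f"whole text: {full_ttr:.2f}")
print(f"first six tokens: {prefix_ttr:.2f}")
```

In this toy example the six-token prefix has a higher ratio than the full thirteen-token sentence, because repeated words accumulate as a text grows.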

The type-token ratio (TTR) is a common measure of vocabulary richness in research on children’s first language and on adult second language learning. It is criticized, however, for its sensitivity to text length: the longer the text, the smaller the TTR (Arnaud, 1992; Hess et al., 1986; Richards, 1987). Scholars have therefore tried to solve the problem by creating numerous variant formulas for TTR. The main remedies are as follows:
