Background
According to Scholastic, E.B. White's Charlotte's Web has a reading level between grades 2 and 4, and Lois Lowry's The Giver has a reading level between grades 8 and 12.
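What does it mean, though, for a book to be at a particular reading level? In many cases, a human expert might read a book and decide which grade (that is, year in school) it is most appropriate for. But an algorithm could likely do that too. So what traits are characteristic of higher reading levels? Longer words, for example, or longer sentences.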
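Over the years, many "readability tests" have been developed, each defining a formula for computing the reading level of a text. One such test is the Coleman-Liau index, which is designed to output the (U.S.) school grade needed to understand a given text. The formula is:

index = 0.0588 * L - 0.296 * S - 15.8

where L is the average number of letters per 100 words in the text, and S is the average number of sentences per 100 words in the text.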
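As a minimal sketch, the formula maps directly onto a C function. The name coleman_liau is hypothetical, and the function assumes L and S have already been computed as averages per 100 words.

```c
// Hypothetical helper: evaluates the Coleman-Liau formula.
// L: average number of letters per 100 words
// S: average number of sentences per 100 words
// Returns the unrounded index.
double coleman_liau(double L, double S)
{
    return 0.0588 * L - 0.296 * S - 15.8;
}
```

Rounding the returned value is left to the caller, because the specification rounds only the final index.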
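Let's write a program called readability that takes a text and determines its reading level. For example, if the user types in a line of text from Dr. Seuss, the program should behave as follows: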
$ ./readability
Text: Congratulations! Today is your day. You're off to Great Places! You're off and away!
Grade 3
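The text the user typed in has 65 letters, 4 sentences, and 14 words. 65 letters per 14 words is an average of about 464.29 letters per 100 words (65 / 14 * 100 = 464.29). And 4 sentences per 14 words is an average of about 28.57 sentences per 100 words (4 / 14 * 100 = 28.57). Plugging these into the Coleman-Liau formula gives 0.0588 * 464.29 - 0.296 * 28.57 - 15.8 = 3.04, which rounds to 3. So this text is at a third-grade reading level.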
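The arithmetic above depends on floating-point division. Here is a minimal sketch, assuming a hypothetical helper named per_hundred_words, that turns a raw count into an average per 100 words:

```c
// Hypothetical helper: average of `count` per 100 words.
// The cast to double happens before the division; with plain int
// arithmetic, 65 / 14 would be truncated to 4 and the result would be 400.
// Assumes words > 0.
double per_hundred_words(int count, int words)
{
    return (double) count / words * 100;
}
```

With the Dr. Seuss counts, per_hundred_words(65, 14) gives L of about 464.29 and per_hundred_words(4, 14) gives S of about 28.57.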
Let's try another one:
$ ./readability
Text: Harry Potter was a highly unusual boy in many ways. For one thing, he hated the summer holidays more than any other time of year. For another, he really wanted to do his homework, but was forced to do it in secret, in the dead of the night. And he also happened to be a wizard.
Grade 5
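This text has 214 letters, 4 sentences, and 56 words. That comes out to about 382.14 letters per 100 words and 7.14 sentences per 100 words. Plugging these into the Coleman-Liau formula gives a fifth-grade reading level.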
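Worked through step by step, the Harry Potter numbers come out as follows (values rounded to two decimal places):

```latex
\[
L = \frac{214}{56} \times 100 \approx 382.14, \qquad S = \frac{4}{56} \times 100 \approx 7.14
\]
\[
\text{index} \approx 0.0588 \times 382.14 - 0.296 \times 7.14 - 15.8 \approx 22.47 - 2.11 - 15.8 = 4.56
\]
```

Rounded to the nearest integer, 4.56 becomes 5.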
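As the average number of letters and words per sentence increases, the Coleman-Liau index gives the text a higher reading level. The paragraph below, for instance, has longer words and sentences than either of the two previous examples, so the formula should assign it a twelfth-grade reading level.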
$ ./readability
Text: As the average number of letters and words per sentence increases, the Coleman-Liau index gives the text a higher reading level. If you were to take this paragraph, for instance, which has longer words and sentences than either of the prior two examples, the formula would give the text an twelfth-grade reading level.
Grade 12
Specification
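Design and implement a program, readability, that computes the Coleman-Liau index of a text.
- Implement your program in a file called readability.c in a directory called readability.
- Your program must prompt the user for a string using get_string.
- Your program should count the number of letters, words, and sentences in the text. You may assume that a letter is any lowercase character from a to z or any uppercase character from A to Z, that any sequence of characters separated by spaces counts as a word, and that any occurrence of a period, exclamation point, or question mark marks the end of a sentence.
- Your program should print "Grade X", where X is the grade level computed by the Coleman-Liau formula, rounded to the nearest integer.
- If the resulting index number is 16 or higher, your program should output "Grade 16+" instead of the exact index number. If the index number is less than 1, your program should output "Before Grade 1".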
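The two special cases in the last bullet can be isolated in a small formatting function. This is a sketch under two assumptions: the name grade_label is hypothetical, and the function receives the index after rounding, so a raw index of 15.6 rounds to 16 and produces "Grade 16+".

```c
#include <stdio.h>

// Hypothetical helper: writes the required output label for an already
// rounded Coleman-Liau index into buf (at most size bytes, NUL-terminated).
void grade_label(int index, char *buf, size_t size)
{
    if (index >= 16)
    {
        snprintf(buf, size, "Grade 16+");
    }
    else if (index < 1)
    {
        snprintf(buf, size, "Before Grade 1");
    }
    else
    {
        snprintf(buf, size, "Grade %i", index);
    }
}
```

A main function could then simply print the buffer followed by a newline.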
Getting User Input
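Let's first write some C code that gets text input from the user and prints it back out. In readability.c, implement a main function that prompts the user with "Text: " using get_string, then prints that same text in full using printf. Be sure to #include any necessary header files.
Your program should behave as follows: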
$ ./readability
Text: In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since.
In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since.
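get_string comes from the CS50 library (cs50.h). If that library is not available, a rough standard-C stand-in can be built on fgets. This is a sketch only: read_line is a hypothetical name, and unlike get_string it uses a fixed-size buffer, so lines longer than size - 1 characters are split across calls.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical stand-in for get_string when cs50.h is unavailable:
// reads one line from `in` into buf (at most size - 1 characters) and
// strips the trailing newline. Returns buf, or NULL at end of input.
char *read_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int) size, in) == NULL)
    {
        return NULL;
    }
    buf[strcspn(buf, "\n")] = '\0';  // fgets keeps the newline; drop it
    return buf;
}
```

In main, this would look like printf("Text: "); followed by read_line(stdin, text, sizeof text) and then printf("%s\n", text) on success.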
Letters
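Now that you've collected input from the user, let's begin analyzing it by counting the number of letters in the text. Count only uppercase and lowercase alphabetical characters; ignore punctuation, digits, and other symbols.
Add to readability.c, below main, a function called count_letters that takes one argument, a string of text, and returns an int, the number of letters in that text. Remember to add the function's prototype at the top of the file so that main knows how to call it:
int count_letters(string text);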
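Then call the function in main, so that in addition to printing the text itself, your program also prints the number of letters.
Your program should behave as follows: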
$ ./readability
Text: Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, "and what is the use of a book," thought Alice "without pictures or conversation?"
235 letters
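A minimal sketch of count_letters follows. It takes const char * instead of CS50's string so that it compiles without cs50.h; string is a typedef for char *, so a string can still be passed in. The sketch relies on isalpha, which in the default "C" locale accepts exactly the 52 ASCII letters.

```c
#include <ctype.h>

// Counts the letters A-Z and a-z in text, ignoring digits, spaces and
// punctuation. The cast to unsigned char keeps the argument to isalpha
// in the range it is defined for.
int count_letters(const char *text)
{
    int letters = 0;
    for (int i = 0; text[i] != '\0'; i++)
    {
        if (isalpha((unsigned char) text[i]))
        {
            letters++;
        }
    }
    return letters;
}
```

For the Dr. Seuss sample, count_letters returns 65: the apostrophes in "You're" and the punctuation are not counted.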
Words
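The Coleman-Liau index cares not only about the number of letters but also about the number of words. For this problem, we'll consider any sequence of characters separated by spaces to be a word (so a hyphenated word like "sister-in-law" counts as one word, not three). You may assume that a sentence will not start or end with a space, and that it will not contain multiple spaces in a row.
Add to readability.c, below main, a function called count_words that takes one argument, a string of text, and returns an int, the number of words in that text. Remember to add the function's prototype at the top of the file so that main knows how to call it. Then call the function in main, so that your program also prints the number of words.
Your program should behave as follows: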
$ ./readability
Text: It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him.
250 letters
55 words
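One possible sketch of count_words counts the places where a word begins, that is, a non-space character at the start of the text or right after a space. Under the stated assumptions, counting spaces and adding one would also work; the transition approach was chosen here because it additionally returns 0 for empty text and tolerates stray spaces.

```c
// Counts words as runs of non-space characters, so "sister-in-law"
// is a single word. A word is counted when its first character is seen.
int count_words(const char *text)
{
    int words = 0;
    int in_word = 0;  // 1 while scanning inside a word
    for (int i = 0; text[i] != '\0'; i++)
    {
        if (text[i] == ' ')
        {
            in_word = 0;
        }
        else if (!in_word)
        {
            in_word = 1;
            words++;
        }
    }
    return words;
}
```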
Sentences
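The last piece of information the Coleman-Liau index needs is the number of sentences. Counting sentences can be surprisingly tricky. You might first assume that a sentence is any sequence of characters ending with a period, but sentences can also end with an exclamation point or a question mark. And not every period marks the end of a sentence. Consider, for instance, the sentence below: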
Mr. and Mrs. Dursley, of number four Privet Drive, were proud to say that they were perfectly normal, thank you very much.
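This is just one sentence, yet it contains three periods!
For this problem, we'll ask you to ignore that subtlety: treat any occurrence of ., !, or ? as the end of a sentence (so the sentence above counts as three sentences). In practice, sentence boundary detection needs to be somewhat more intelligent, but we won't worry about that for now.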
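Add to readability.c, below main, a function called count_sentences that takes one argument, a string of text, and returns an int, the number of sentences in that text. Remember to add the function's prototype at the top of the file so that main knows how to call it. Then call the function in main, so that your program also prints the number of sentences.
Your program should behave as follows: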
$ ./readability
Text: When he was nearly thirteen, my brother Jem got his arm badly broken at the elbow. When it healed, and Jem's fears of never being able to play football were assuaged, he was seldom self-conscious about his injury. His left arm was somewhat shorter than his right; when he stood or walked, the back of his hand was at right angles to his body, his thumb parallel to his thigh.
295 letters
70 words
3 sentences
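Following the simplified rule above, a minimal sketch of count_sentences only needs to count the three punctuation marks. A consequence worth noticing: an ellipsis ("...") counts as three sentences, just as "Mr." and "Mrs." each count as one.

```c
// Counts sentences by treating every '.', '!' or '?' as a sentence end,
// exactly as the problem specifies (no abbreviation handling).
int count_sentences(const char *text)
{
    int sentences = 0;
    for (int i = 0; text[i] != '\0'; i++)
    {
        if (text[i] == '.' || text[i] == '!' || text[i] == '?')
        {
            sentences++;
        }
    }
    return sentences;
}
```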
Putting It All Together
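Now it's time to put all the pieces together! Recall that the Coleman-Liau index is computed with the formula:
index = 0.0588 * L - 0.296 * S - 15.8
where L is the average number of letters per 100 words in the text, and S is the average number of sentences per 100 words in the text.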
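Modify main in readability.c so that, instead of printing the number of letters, words, and sentences, it prints only the grade level given by the Coleman-Liau index (for example, "Grade 2" or "Grade 8"). Be sure to round the resulting index to the nearest integer! If the resulting index number is 16 or higher, your program should output "Grade 16+" instead of the exact index number. If the index number is less than 1, your program should output "Before Grade 1".
Hints
- Recall that round is declared in math.h.
- Recall that when you divide two int values in C, the result is also an int: any digits after the decimal point are discarded. You may want to cast one or more values to float before dividing when you compute L and S!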
How to Test Your Code
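Try running your program on the following texts and make sure you see the specified grade level. Be careful to copy only the text itself, without any extra spaces.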
- One fish. Two fish. Red fish. Blue fish. (Before Grade 1)
- Would you like them here or there? I would not like them here or there. I would not like them anywhere. (Grade 2)
- Congratulations! Today is your day. You're off to Great Places! You're off and away! (Grade 3)
- Harry Potter was a highly unusual boy in many ways. For one thing, he hated the summer holidays more than any other time of year. For another, he really wanted to do his homework, but was forced to do it in secret, in the dead of the night. And he also happened to be a wizard. (Grade 5)
- In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since. (Grade 7)
- Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, "and what is the use of a book," thought Alice "without pictures or conversation?" (Grade 8)
- When he was nearly thirteen, my brother Jem got his arm badly broken at the elbow. When it healed, and Jem's fears of never being able to play football were assuaged, he was seldom self-conscious about his injury. His left arm was somewhat shorter than his right; when he stood or walked, the back of his hand was at right angles to his body, his thumb parallel to his thigh. (Grade 8)
- There are more things in Heaven and Earth, Horatio, than are dreamt of in your philosophy. (Grade 9)
- It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him. (Grade 10)
- A large class of computational problems involve the determination of properties of graphs, digraphs, integers, arrays of integers, finite families of finite sets, boolean formulas and elements of other countable domains. (Grade 16+)