Throughout late February and early March of 2024, around 270 students took tests in hopes of earning the Illinois Seal of Biliteracy. The most popular was the AAPPL Performance test, which measures students' proficiency in several languages, most commonly Spanish, French and German.
Originating in California, the Seal of Biliteracy allows students to receive college credit for demonstrating high proficiency in their native language and a foreign language. Students who take the AAPPL receive a separate score for each of its four sections: interpretive reading, presentational writing, interpretive listening and interpersonal listening/speaking. Scores range from Novice (1-4) to Intermediate (1-5) to Advanced. Once a student scores Intermediate High (I5) on a section, they've passed it and don't need to retake it. Testing also gives both students and teachers a valuable way to track language-learning progress.
This year, however, students and teachers expressed frustration with scores that came back unusually quickly and inconsistently, and that they felt didn't accurately reflect students' language proficiency.
Junior Ellie Jahoda described her repeated efforts to improve her French writing score. After earning an Intermediate Mid (I4) two years in a row, she retook the writing section in hopes of a better score.
“I spent like three hours on the test, it was the best I’ve written on the Seal, [but] this time I got an I1 [Intermediate Low],” said Jahoda. “[My teacher] looked at my responses and was like, ‘I don’t understand why you got an I1, this is better than your previous one.’”
She was also skeptical of reports that AI was being used to help score tests this year. "I don't think [the test] is accurate because it should be a human who speaks the language grading it, not AI."
To address these concerns, X-Ray staff interviewed World Language Department Chair Meghan Mitchell.
STCE has used the AAPPL exam since it began offering the Seal of Biliteracy in 2016, but Mitchell noted that this year brought more frustrations than usual.
The most notable concern was inconsistent scoring. Some students scored highly on every section except one; others, like Jahoda, received unusually low scores on retakes even though they felt they had performed better.
“It’s hard because it could be a completely different person from the company grading it than the first one,” said Mitchell. “However, they should all be held to the same standards.”
Mitchell noted that these issues may have been caused by a potentially overwhelming increase in the number of students taking the test compared with previous years.
The typical response to such inconsistencies is to submit a request for review. That, however, led to a second frustration: unusually quick turnarounds on scoring decisions. The original tests were graded quickly, with writing scores in particular sometimes returned to students within a few days. And when English teachers sent a batch of around 13 review requests, results came back in less than 48 hours with no change to any score. By contrast, a single review request sent before the batch resulted in a higher score after five to six days of review.
These quick turnarounds made students and staff suspicious of the reported AI grading system. After talking with AAPPL representatives, however, Mitchell said AI plays only a limited role in grading.
The listening and reading sections have always been computer-graded, since they contain only multiple-choice questions. "It was confirmed [that] speaking was not AI graded," said Mitchell. "Writing is both computerized and by person."
For writing responses, AI screens for inappropriate language, blank answers, responses in the wrong language, harmful or violent language and similar problems, which are flagged immediately. Once these are filtered out, the remaining responses are sent to human graders. In theory, this makes grading faster and more efficient, since graders don't waste time on responses with obvious problems.
One problem with this technology surfaced when many writing responses were incorrectly flagged with the comment "Learner chose not to respond to the prompt." After STCE brought the issue to AAPPL's attention, it was resolved within a few days, and all responses were graded by a human.
“[Since] they’re trying this technology out this year, because we brought it to their attention, they were able to fix it for the rest of other schools who were testing,” said Mitchell. “Whenever you try something new, there could be glitches.”
Some of these issues may be resolved by AAPPL's implementation of a composite score in accordance with the Illinois State Board of Education (ISBE). Similar to AP language exams, a composite score averages a student's scores across the four sections, and students whose average is at least an I5 are awarded the Seal. For students who retook sections, the composite uses their highest score on each section. This helps keep a single inconsistent score from dragging a student down.
After STCE requested composite scores for this year, 54 more students, including Jahoda, had qualified for the Seal as of May 1.
“I’m very happy with the new composite score grading. I think it makes more sense,” said Jahoda. “However, I think they should have started this scoring method a while ago, as I could have previously gotten the seal without having to retake it.”
Though the department has discussed switching testing companies for the Seal, nothing has been decided yet. One option under consideration is the STAMP test, which is currently used to test Hebrew, Hindi, Polish, Russian and American Sign Language. STAMP uses only three writing prompts, compared with AAPPL's six, and gives individual scores and feedback on each prompt. However, a switch could complicate things for students who only need to retake certain sections of the AAPPL test.
“Regardless, we will do what we feel is best for our students to be successful and get the seal,” said Mitchell. “Everybody in [the language department], everybody at the district, everybody at North–we all have that same mentality of, ‘We want to do what is gonna be best for you guys.’ So we’ll figure that out.”
Despite these frustrations, the Seal's accomplishments over time should not be understated. More students test for the Seal each year, and scores are improving, especially in the listening and reading sections. This year, seniors who received the Seal were recognized at the Spring Recognition Assembly.
“It is refreshing and very helpful that our administration really does support us and what we do to help you guys, supports you guys, and wants you guys to do well on this,” said Mitchell. “They celebrate the fact that we want to celebrate you guys for learning a second language.”
Language teachers don't want this year's problems to discourage anyone from testing for the Seal. "There's ups and downs, right? Overall, I still think that it's a good tool to have to see where our language learners are," said Mitchell. "And maybe not everyone's perfect or every category is great, but you guys are rocking it."