AI Luigi Audio Drama Production Process, and Initial Thoughts on AI Screenwriting Workflow


2025-02-22

This article was originally written in Chinese; the English version was translated by GPT-4o.
To test the scriptwriting ability of large models, I wrote a short script with DeepSeek: a Luigi version of "12 Angry Men". Here I'll share the production process, my thoughts, and some initial considerations about AI scriptwriting workflows. Feel free to discuss!

Conclusion

This small project confirmed some predictions I made in my article AI Screenwriting History. If you're interested, you can refer to this transcript.
Good news: Current large models are fully capable of meeting production needs, significantly improving screenwriting efficiency at an extremely low cost.
Bad news (though it’s still good news): So far, to use AI for screenwriting, you still need to be a screenwriter. A bad writer will still write bad scripts even with AI.
Training data for general-purpose models is heavily polluted, making them less ideal for specialized tasks. A dedicated model would be better.
Purely AI-generated content probably won't achieve the goal of "cutting costs and raising efficiency": it took me over a week to get this 4-minute short film to a barely usable level. This suggests the following.
Human-driven workflows, with AI as a local assistant, remain the best solution and should be much more efficient than pure AI generation (I'll verify this further another time).
The era of universal creativity has arrived.

Production Process

When you see this open-ended conclusion, you should already know that all the lines were generated by AI: if I had written it, the ending would definitely have been "not guilty."

However, the process wasn't as simple as it seems. It took over a week to generate 38 lines of dialogue, and on the final night I stayed up all night and didn't finish the final draft until 7:30 AM. Most people could probably have written it faster by hand... So I'm writing the process down as a record.

This project aimed to test AI's creative abilities.

So, for this experiment, I set myself a rule: I could not directly edit AI-generated content, but I could guide it through questions, asking it for options and pointing out parts that didn't make sense.

For comparison, I also gave GPT a similar task; the two runs were independent and did not influence each other.

Minimal Intervention

The first film script written entirely by AI (Sunspring, 2016) had almost no human intervention. So, at the beginning, I adopted the same strategy, and the first thing I told DeepSeek was just three words:

Write a script

It gave me a scene where two young people are having a conversation in a café. Xiao Li is anxious about being laid off, and Xiao Zhang encourages him to start a business.

Given the current economic situation, Xiao Zhang's lines felt somewhat off, but the result was definitely smoother than "Sunspring" (although when I tried GravityWrite for a short script a year ago, the results were a bit better).

My approach was:

First, observe what the AI comes up with when given no guidance.
Then gradually add more prompting based on its output, to see how far the script can be pushed toward something usable.

After switching on deep thinking mode, I typed again: Write a script

This time, it wrote a scene in which the female protagonist is sorting through old belongings and unexpectedly discovers that her late father had cheated on her mother. Typical melodrama...

Bilibili Video: Triggering Bad-Joke Mode

I changed tack and asked: I want you to write a script. What do you think is the best approach?

It gave an impressively vague process:

Core positioning meeting
Character gene engineering
Conflict sandbox simulation
Dialogue stress testing
Visual anchor embedding
Interactive co-creation mode

This reminded me of those people who love using industry jargon but have no idea what they're doing.

I told it that the script was meant to be made into a video and uploaded to Bilibili.

This keyword flipped some strange switch, and it started spitting out all sorts of Bilibili clichés. More than half of them were things that I, a longtime Bilibili user, had never even heard of.

As a Bilibili content creator, I already suffer from information overload just reading Bilibili comments, but its responses made my blood pressure spike. No matter how I adjusted the interaction style, it kept it up. I finally realized that it holds a strong stereotype about Bilibili users.

I suspect that the term “Bilibili” is a central node that triggers a chain of bad joke associations, and it cannot be corrected through conversation. In the end, I had to abandon "Bilibili" as a keyword and start over.

By the way, many people love making DS (DeepSeek) imitate a BBS troll or a Bilibili comment section to insult people, thinking this proves the language ability of large models. All I feel is annoyance. These comments, stuffed with internet clichés, carry very little useful information. This style of language is popular because, as the internet has developed, human language has been compressed into ever blurrier forms.

The fact that it pulls off this kind of cyber cosplay so well reflects a form of data pollution, which seriously undermines its ability in professional fields.

The worst part is that this data is fed into AI, which then outputs it back to humans, creating a loop that leads to more depression and more dumbing-down among young people.

Radio Drama

To keep costs down, I decided to drop visuals and switch to a radio drama. But as with Bilibili, once the prompt mentioned "radio drama," the range of topics became very narrow and the quality dropped noticeably. No matter how I adjusted it, I couldn't fix this, so I gave up.

My guess is that insufficient training data has given the AI an extremely narrow understanding of radio dramas.

Movie screenwriting is the opposite case. Judging from its reasoning in deep thinking mode, it doesn't really understand screenwriting terminology, and its interpretation of the same term can vary from one conversation to the next. I suspect this comes from too much training data with no filtering (there's a great deal of garbage material in this field).

Luigi version of "12 Angry Men"

Since the results were poor when I let the AI pick the topic itself, I chose the "Luigi" topic myself.

Through these exchanges, I found some mistakes it made repeatedly. Based on them, I consolidated the prompts, started a new conversation, and had it organize the prompts so that they would connect to an AI voice-generation workflow.

I found that it likes to alter facts, even when I asked it to stick strictly to real-world events (at first I let it search on its own; later I gave it a few specific websites). It kept making changes: Luigi's profession changed several times, and none of the versions was correct, along with countless other errors. The more I corrected it, the worse it got.

I had the same experience with GPT.

One thing I discovered: emphasizing what it should not do backfires.

For example, I asked the AI to follow real-world jury selection rules and to exclude professions, stakeholders, and backgrounds that would obviously introduce bias. This step was very time-consuming, and each response was more absurd than the last.

Eventually I couldn't help flatly saying no, which broke my own rule. I even told GPT once:

Are you fucking kidding me!

My guess is that negative instructions trigger related associations in the AI, leading to more mistakes (much as they do in humans). And because I hadn't narrowed the scope with specific instructions, it became even more confused.

In the end, each model consolidated its own set of prompts. The idea was to input the whole set at once, with the creation split into several steps and each step waiting for my confirmation before the next one began.

Due to the different characteristics of DS and GPT and the errors they tend to make, the two sets of prompts are very different. I tried several rounds in both Chinese and English.

In the end, this design failed, because each model had its own problems.

DS always answered every step at once, however many times I reminded it. The output wasn't good, so I restarted the conversation and switched to entering the steps manually, confirming the information layer by layer.

The workflow depends on the project, and the best workflow for each model is different.

I limited the length and the number of characters, had DS finalize the characters and their arguments in the first two steps, and then had it gradually produce summaries, outlines, and finally single-page scripts.
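The step-by-step, confirm-before-continuing procedure described above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the author's actual setup: ask_model and confirm are hypothetical hooks (a chat client and a human yes/no check), and the step wording only paraphrases the order given above.

```python
from typing import Callable

# Hypothetical step list paraphrasing the order described in the text.
STEPS = [
    "Finalize the characters (fixed cast size and length limits).",
    "Finalize each character's argument.",
    "Write a summary.",
    "Write an outline.",
    "Write a single-page script.",
]


def run_staged(
    ask_model: Callable[[str, list[str]], str],
    confirm: Callable[[str, str], bool],
    steps: list[str] = STEPS,
    max_retries: int = 3,
) -> list[str]:
    """Send one step at a time and move on only after a human approves the output.

    ask_model(prompt, approved_so_far) is a hypothetical stand-in for a chat client;
    confirm(step, output) is the human checkpoint. Both are assumptions.
    """
    approved: list[str] = []
    for step in steps:
        for _ in range(max_retries):
            # Earlier approved outputs are passed along as context for this step.
            output = ask_model(step, approved)
            if confirm(step, output):
                approved.append(output)
                break
        else:
            # Loop finished without a break: no draft was approved for this step.
            raise RuntimeError(f"step not approved after {max_retries} tries: {step}")
    return approved
```

Feeding steps in one at a time keeps the model from answering everything at once, which is the failure mode DS showed when given the whole prompt set.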

It often suggested action and sound design, but since I didn’t have time for sound effects, I had it remove them.

Script Format

For voice generation, both models recommended some tools and told me I could use the SSML format (to specify emotion, speed, pitch, etc. for each sentence). So the first version of the script DS generated looked like this (oddly, the two models proposed different syntax):

<surf:script>
  <!-- Scene 1: Opening Statements -->
  <surf:scene id="opening" bgm="courtroom_ambience.mp3">
    <character ref="Juror1" voice="en-US-Wavenet-A" pitch="+5%">
      <prosody rate="medium">Alright, let’s get started. We have a serious case here, and we need to focus on the facts.</prosody>
    </character>
    <character ref="Juror4" voice="en-US-Wavenet-C" pitch="-10%">
      <break time="300ms"/>But the facts are already tainted. The blood vial broke during the helicopter transport. <audio src="sfx:blood_vial_break"/>
    </character>
  </surf:scene>
</surf:script>

This format was revised a few times during the process and later simplified.
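For comparison, a line in standard W3C SSML uses speak, voice, prosody, and break elements rather than the custom surf: tags in the draft above. The sketch below builds one such document with Python's standard library; it is an illustration, not what the author used. The voice name is carried over from the draft, and real TTS engines each support only a subset of SSML (standard SSML, for instance, has no emotion attribute, so engines add their own extensions for that).

```python
import xml.etree.ElementTree as ET


def line_to_ssml(text: str, voice: str, rate: str = "medium",
                 pitch: str = "+0%", pause_before_ms: int = 0) -> str:
    """Build a W3C SSML 1.1 document for a single line of dialogue."""
    speak = ET.Element("speak", {
        "version": "1.1",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": "en-US",
    })
    # Which voice names are valid depends entirely on the TTS engine.
    voice_el = ET.SubElement(speak, "voice", {"name": voice})
    if pause_before_ms:
        ET.SubElement(voice_el, "break", {"time": f"{pause_before_ms}ms"})
    prosody = ET.SubElement(voice_el, "prosody", {"rate": rate, "pitch": pitch})
    prosody.text = text  # ElementTree escapes &, < and > during serialization
    return ET.tostring(speak, encoding="unicode")


print(line_to_ssml("But the facts are already tainted.",
                   "en-US-Wavenet-C", pitch="-10%", pause_before_ms=300))
```

Generating the markup from plain data, instead of asking the model to write XML, also avoids the model inventing its own tag syntax.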

I tried several other products, but none of them was particularly suitable. Amazon does support SSML, but users in China can only use its standard voices, not the neural ones (is this really something worth choking us on?), so I gave up.

In Murf.ai's web version, every line of the script had to be configured by hand for character, emotion, speed, pitch, and so on. To generate in bulk, I ended up reading Murf's official documentation and, for the first time, learned how to use an API. It requires writing code (though of course the AI did that part). The examples on the website didn't seem to account for multiple characters, and I ran into all sorts of errors. After a long struggle, I finally managed to more or less understand the code...

The format I finally used for generation looked like this:

import requests
import json

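url = "API endpoint"  # placeholder; the real endpoint is in Murf's API documentation

# Settings shared by every line of dialogue
fixed_params = {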
    "sampleRate": 48000,
    "format": "WAV",
    "channelType": "MONO",
    "encodeAsBase64": False,
    "variation": 1,
    "audioDuration": 0,
    "modelVersion": "GEN2",
    "multiNativeLocale": "en-US"
}

script = [
    {   # Juror 1 (Julia)
        "voiceId": "en-US-julia",
        "style": "Conversational",
        "text": "But why parade him in cuffs? To make us feel like justice is done. What if the real crime is the system he's fighting? [pause x-strong]",
        "rate": -3,
        "pitch": 2,
        "output_file": "0001julia.WAV"
    },
    # ... further lines of dialogue, one dict per line (omitted here)
]

The text field holds the dialogue; the other fields set the voice, emotion (style), speed (rate), and pitch for each line. I could also adjust the pronunciation of individual words, but that caused errors, so I removed it. The code also has to end with instructions for saving the audio files.
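Since the web examples seemed to assume a single voice, the part that needed the most work was looping over lines that each carry their own voice settings and then saving each result. Below is a hedged sketch of that step, not the author's actual script: build_payloads and generate_all are hypothetical helpers, and synthesize stands in for the real request to the Murf endpoint (its response handling is deliberately left out, since it depends on the API version). It merges fixed_params into each dict in script, strips the local-only output_file key before sending, and writes one file per line.

```python
from pathlib import Path
from typing import Callable


def build_payloads(fixed_params: dict, script: list[dict]) -> list[tuple[dict, str]]:
    """Merge the shared settings into every line and split off the local file name."""
    jobs = []
    for line in script:
        payload = {**fixed_params, **line}        # per-line values override shared ones
        output_file = payload.pop("output_file")  # local-only key, never sent to the API
        jobs.append((payload, output_file))
    return jobs


def generate_all(jobs: list[tuple[dict, str]],
                 synthesize: Callable[[dict], bytes],
                 out_dir: str = ".") -> list[Path]:
    """Call the TTS once per line and save each result under its own file name."""
    saved = []
    for payload, output_file in jobs:
        audio = synthesize(payload)  # hypothetical hook wrapping the real HTTP request
        path = Path(out_dir) / output_file
        path.write_bytes(audio)
        saved.append(path)
    return saved
```

Keeping the network call behind one function means the merging and file-naming logic can be checked without spending API credits.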

Murf.ai's selection of voices, emotions, and accents is seriously limited, and settings such as age and ethnicity don't come through in the final product. I also looked at more than a dozen other websites and tried a few of them. Two or three offered more accents but were more expensive. DS also proposed a complex solution that supposedly works better, but it was simply too complicated.

By the way, before settling on this topic, my first idea was to voice it myself and use software to change my voice (laughs). I'll try that if I get the chance.

Summary

Most of the work was done the night before release, and it felt as though I'd been blessed: there were no hiccups from then on. Maybe DS knew I was giving Luigi an innocent buff and was touched by it.

The video also tripped two restricted keywords on YouTube, #deepseek and #luigi (laughs). Fortunately, I didn't care about traffic and posted it from a secondary account.

A couple of days ago, Bilibili recommended a domestic film called Victory Is in Hand, which had quite a few views and plenty of positive comments. I watched a clip… and I can say with confidence that the dialogue in my short film is at least three levels above that of typical bad domestic films.

I looked it up, and Victory Is in Hand cost 150 million.

This short film was all done for free.

Considering overall quality and cost, it's safe to say that AI has reached production standards.

Most of DS's responses were "Server busy, please try again later," so I haven't explored it in enough depth.

GPT is fast, and it performed better at first, but after a while it noticeably dumbs down, and its reasoning often doesn't look like real reasoning.

DS has the same problem: once the context gets too long, it loses focus, repeats itself, and forgets earlier settings.

After repeatedly tweaking the prompts, I picked the better of the versions for the final product, but the dialogue still repeats itself and doesn't flow naturally enough.

I believe I could do better if I kept trying, but it isn't necessary. This task turned out to be more time-consuming than I expected. One of my guesses is that generating a seamless piece of work this way takes more time than writing it entirely by hand. However, I don't plan to verify this.

By integrating AI into a local workflow, with humans taking the lead, many of the issues mentioned above can be avoided (of course, this assumes the writer already has an effective workflow and an understanding of story creation). Next time, I'll try human-AI collaboration directly rather than pure AI generation.

Screenwriting software with built-in AI already exists (see History of AI Screenwriting), and there are more such websites than last year.

My workflow is one I designed myself, so it can't rely on existing data. If I get the chance to deploy a model locally, write my own rules, train it myself, or even develop software with AI tailored to my needs, that might be a better choice (but for now, it's just a thought).

Finally, if you're interested in this topic, feel free to get in touch.

Copyright

Article link: /en/post/luigi-ai-workflow/

License: CC BY-NC 4.0