
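Hands-on Breakdown: How to Use ChatGPT Agent to Automate Multi-Step Tasks
In an era of exploding online content, AI is helping creators produce short videos in volume at very low cost and with high efficiency. "AI short dramas" in particular use AIGC (AI-generated content) tools to generate scripts, images, voices, and video automatically, so polished episodes can be produced without a professional film crew. They have become a new trend to watch, try, and share.
This article takes a technical view and walks through the full AI short-drama pipeline, from idea to final composite and from generation to quality control. It also gives usable API links and code examples to help you build, ship, and commercialize a project quickly.
The script, including the scene-by-scene storyboard, dialogue, and plot twists, is generated by calling a large language model (LLM). Recommended API platforms and sample calls: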
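OpenAI GPT-4o API: strong language generation, well suited to complex plots.
# Requires openai>=1.0; the client reads OPENAI_API_KEY from the environment.
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # System prompt: "You are a screenwriter who specializes in short dramas."
        {"role": "system", "content": "你是一个擅长短剧创作的编剧"},
        # User prompt: "Write a hit short-drama script about an underdog's rise to the top,
        # within 300 characters, with a strong emotional reversal."
        {"role": "user", "content": "请写一篇关于‘逆袭上位’的爆款短剧剧本,长度300字以内,包含强情绪反转"},
    ],
)
print(response.choices[0].message.content)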
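Claude 3.5 Sonnet API: strong at creative writing, well suited to emotionally rich scripts.
# The client reads ANTHROPIC_API_KEY from the environment.
from anthropic import Anthropic
client = Anthropic()
# Claude 3.5 models are served through the Messages API rather than the legacy completions endpoint.
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=300,
    messages=[
        {"role": "user", "content": "Write a one-sentence bedtime story about a unicorn."}
    ],
)
print(response.content[0].text)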
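Tongyi Qianwen (Qwen) API: well suited to scripts with Chinese cultural elements.
# Uses Alibaba Cloud's DashScope SDK (pip install dashscope), which reads DASHSCOPE_API_KEY from the environment.
import dashscope
response = dashscope.Generation.call(
    model="qwen-plus",  # example model name; pick one enabled for your account
    # Prompt: "Write a hit short-drama script about an underdog's rise to the top,
    # within 300 characters, with a strong emotional reversal."
    prompt="请写一篇关于‘逆袭上位’的爆款短剧剧本,长度300字以内,包含强情绪反转",
    result_format="text",
)
print(response.output.text)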
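Yi-Large API: well suited to generating varied script content.
# Assumption: 01.AI's OpenAI-compatible endpoint. Verify the base URL for your account and region;
# YI_API_KEY is an illustrative environment variable name.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["YI_API_KEY"], base_url="https://api.lingyiwanwu.com/v1")
response = client.chat.completions.create(
    model="yi-large",
    # Prompt: "Write a hit short-drama script about an underdog's rise to the top,
    # within 300 characters, with a strong emotional reversal."
    messages=[{"role": "user", "content": "请写一篇关于‘逆袭上位’的爆款短剧剧本,长度300字以内,包含强情绪反转"}],
)
print(response.choices[0].message.content)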
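The storyboard, dialogue, and twists are easier to hand to the image and voice stages if the model is asked to reply with structured JSON. The following is a minimal sketch under that assumption: the schema (`scene`, `speaker`, `line`), the `Shot` class, and `parse_script` are hypothetical names chosen for illustration, not part of any of the APIs above.

```python
import json
from dataclasses import dataclass


@dataclass
class Shot:
    scene: str    # visual description, reusable later as an image or video prompt
    speaker: str  # character name, reusable later to pick a TTS voice
    line: str     # dialogue to synthesize


def parse_script(raw: str) -> list[Shot]:
    """Parse an LLM reply that is expected to contain a JSON array of shots."""
    # Models often wrap JSON in prose or a code fence, so cut out the outermost array.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model reply")
    items = json.loads(raw[start:end + 1])
    return [Shot(item["scene"], item["speaker"], item["line"]) for item in items]


# Example reply, written by hand to stand in for a real model response.
reply = (
    "Here is the script:\n"
    '[{"scene": "office at night", "speaker": "Lin", "line": "You betrayed me?"}]'
)
shots = parse_script(reply)
print(shots[0].speaker, "-", shots[0].line)
```

In practice the system prompt would have to spell out the schema, and a reply that fails to parse can simply be retried.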
Character images and video clips are generated with text-to-image and text-to-video tools. Recommended API platforms and sample calls:
Stable Diffusion (SDXL): strong image generation, well suited to high-quality character images.
import requests
response = requests.post(
    # The v1 REST path names the engine and the task.
    "https://api.stability.ai/v1/generation/stable-diffusion-xl-1024-v1-0/text-to-image",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Accept": "application/json",
    },
    json={
        "text_prompts": [
            {"text": "A young woman in a white shirt, looking determined, in an urban office setting."}
        ],
        "cfg_scale": 7,
        "clip_guidance_preset": "FAST_BLUE",
        "height": 1024,  # SDXL 1.0 expects roughly one-megapixel sizes such as 1024x1024
        "width": 1024,
        "samples": 1,
        "steps": 50,
    },
)
response.raise_for_status()
print(response.json())
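Printing the JSON is rarely the end goal, since later stages need image files. The sketch below assumes the response carries an `artifacts` list whose entries hold a `base64` image and a `finishReason`, which is the shape the Stability v1 text-to-image endpoint returns in JSON mode. `save_artifacts` and the file naming are illustrative choices.

```python
import base64
import tempfile
from pathlib import Path


def save_artifacts(payload: dict, out_dir: str, prefix: str = "character") -> list[Path]:
    """Decode base64 images from a response payload and write them as PNG files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, artifact in enumerate(payload.get("artifacts", [])):
        # Skip images the service marked as filtered or failed.
        if artifact.get("finishReason", "SUCCESS") != "SUCCESS":
            continue
        path = out / f"{prefix}_{i}.png"
        path.write_bytes(base64.b64decode(artifact["base64"]))
        paths.append(path)
    return paths


# Hand-made payload standing in for response.json(); the bytes are not a real image.
fake_payload = {
    "artifacts": [
        {"base64": base64.b64encode(b"fake-png-bytes").decode(), "finishReason": "SUCCESS"},
        {"base64": "", "finishReason": "CONTENT_FILTERED"},
    ]
}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_artifacts(fake_payload, tmp)
    print([p.name for p in saved])
```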
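Runway ML: well suited to generating high-quality video clips.
# Illustrative pseudocode: the client class, method, and model ID are placeholders,
# not a verified SDK interface. Check Runway's API documentation before use.
import runway
client = runway.RunwayClient()
response = client.generate(
    model="runway-gen-4",
    prompt="A young woman in a white shirt, looking determined, in an urban office setting.",
)
print(response.output)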
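Pika Labs: well suited to generating high-quality animated characters.
# Illustrative pseudocode: the client class, method, and model ID are placeholders.
# Note that the `pika` package on PyPI is a RabbitMQ client, unrelated to Pika Labs.
import pika
client = pika.PikaClient()
response = client.generate(
    model="pika-ai",
    prompt="A young woman in a white shirt, looking determined, in an urban office setting.",
)
print(response.output)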
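Synthesia (AI avatar video): well suited to generating videos of virtual characters.
# Illustrative pseudocode: the client class, method, and model ID are placeholders,
# not a verified SDK interface. Check Synthesia's API documentation before use.
import synthesia
client = synthesia.SynthesiaClient()
response = client.generate(
    model="synthesia-ai",
    prompt="A young woman in a white shirt, looking determined, in an urban office setting.",
)
print(response.output)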
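HeyGen: well suited to generating high-quality virtual-character videos.
# Illustrative pseudocode: the client class, method, and model ID are placeholders,
# not a verified SDK interface. Check HeyGen's API documentation before use.
import heygen
client = heygen.HeyGenClient()
response = client.generate(
    model="heygen-ai",
    prompt="A young woman in a white shirt, looking determined, in an urban office setting.",
)
print(response.output)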
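Fliki: well suited to generating high-quality virtual-character videos.
# Illustrative pseudocode: the client class, method, and model ID are placeholders,
# not a verified SDK interface. Check Fliki's API documentation before use.
import fliki
client = fliki.FlikiClient()
response = client.generate(
    model="fliki-ai",
    prompt="A young woman in a white shirt, looking determined, in an urban office setting.",
)
print(response.output)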
Text-to-speech (TTS) services support many voices, languages, and emotional styles. Recommended API platforms and sample calls:
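OpenAI TTS: well suited to generating high-quality speech.
from openai import OpenAI
client = OpenAI()
# Line to speak: "You actually betrayed me? Aren't we friends?"
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",  # a voice is required
    input="你居然背叛我?我们不是朋友吗?",
) as response:
    response.stream_to_file("output.mp3")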
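Azure TTS: well suited to generating speech in many languages.
from azure.cognitiveservices.speech import SpeechConfig, SpeechSynthesizer
speech_config = SpeechConfig(subscription="YOUR_AZURE_SUBSCRIPTION_KEY", region="YOUR_AZURE_REGION")
speech_config.speech_synthesis_voice_name = "zh-CN-XiaoxiaoNeural"  # example Chinese voice for Chinese text
synthesizer = SpeechSynthesizer(speech_config=speech_config)
# With no audio_config, the audio plays on the default speaker.
# Line to speak: "You actually betrayed me? Aren't we friends?"
synthesizer.speak_text_async("你居然背叛我?我们不是朋友吗?").get()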
Google Cloud TTS: well suited to generating high-quality speech.
from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient()
# Line to speak: "You actually betrayed me? Aren't we friends?"
synthesis_input = texttospeech.SynthesisInput(text="你居然背叛我?我们不是朋友吗?")
voice = texttospeech.VoiceSelectionParams(
    language_code="zh-CN",
    ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)
response = client.synthesize_speech(
    input=synthesis_input,
    voice=voice,
    audio_config=audio_config,
)
# The response carries the encoded audio as bytes.
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
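iFlytek TTS: well suited to generating Chinese speech.
# Illustrative pseudocode: the module, client class, and method are placeholders,
# not a verified SDK interface. Check the iFlytek open platform documentation before use.
from iflytek import IflytekTTS
client = IflytekTTS()
# Line to speak: "You actually betrayed me? Aren't we friends?"
response = client.synthesize("你居然背叛我?我们不是朋友吗?")
print(response.output)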
Tencent Cloud TTS: well suited to generating speech in many languages.
from tencentcloud.common import credential
from tencentcloud.tts.v20190823 import tts_client, models
cred = credential.Credential("YOUR_SECRET_ID", "YOUR_SECRET_KEY")
client = tts_client.TtsClient(cred, "YOUR_REGION")
req = models.TextToVoiceRequest()
# Line to speak: "You actually betrayed me? Aren't we friends?"
req.Text = "你居然背叛我?我们不是朋友吗?"
req.SessionId = "12345"
req.ModelType = 1
req.Volume = 5
req.Speed = 0
req.ProjectId = 0
req.VoiceType = 101010
req.PrimaryLanguage = 1
req.Codec = "mp3"
response = client.TextToVoice(req)
print(response.to_json_string())
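Hume AI Octave: well suited to generating high-quality speech.
# Illustrative pseudocode: the `generate` method, model ID, and `response.output` are placeholders,
# not a verified SDK interface. Check Hume's API documentation before use.
import hume
client = hume.HumeClient()
response = client.generate(
    model="octave",
    # Line to speak: "You actually betrayed me? Aren't we friends?"
    input="你居然背叛我?我们不是朋友吗?",
)
print(response.output)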
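One engine-neutral way to express "many voices and emotional styles" is SSML, which several of the services above accept. The sketch below builds an SSML string per dialogue line. The casting table, the emotion presets, and `to_ssml` are hypothetical, the voice names are Azure-style examples, and the accepted `rate` and `pitch` formats vary by engine.

```python
from xml.sax.saxutils import escape

# Hypothetical casting table: character name -> voice name (Azure-style names as examples).
VOICES = {"Lin": "zh-CN-XiaoxiaoNeural", "Chen": "zh-CN-YunxiNeural"}

# Hypothetical emotion presets, expressed with SSML prosody attributes.
EMOTIONS = {
    "angry": {"rate": "+10%", "pitch": "+5%"},
    "sad": {"rate": "-15%", "pitch": "-5%"},
    "neutral": {"rate": "+0%", "pitch": "+0%"},
}


def to_ssml(speaker: str, text: str, emotion: str = "neutral", lang: str = "zh-CN") -> str:
    """Wrap one dialogue line in SSML with a per-character voice and an emotion preset."""
    prosody = EMOTIONS.get(emotion, EMOTIONS["neutral"])
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="{lang}">'
        f'<voice name="{VOICES[speaker]}">'
        f'<prosody rate="{prosody["rate"]}" pitch="{prosody["pitch"]}">{escape(text)}</prosody>'
        "</voice></speak>"
    )


ssml = to_ssml("Lin", "你居然背叛我?", emotion="angry")
print(ssml)
```

With the Azure Speech SDK, a string like this could be passed to `synthesizer.speak_ssml_async(ssml)` instead of plain text.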
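Images, audio, and voiceover are combined into the final video. Recommended API platforms and sample calls:
FFmpeg: a powerful video-processing tool, well suited to compositing.
ffmpeg -i input1.mp4 -i input2.mp4 -filter_complex "[0:v][0:a][1:v][1:a]concat=n=2:v=1:a=1[v][a]" -map "[v]" -map "[a]" -c:v libx264 -c:a aac output.mp4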
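Besides joining finished clips, FFmpeg can turn one generated image plus its voiceover into a clip, which is the image-plus-audio step this stage describes. The sketch below only builds the command line; `still_clip_cmd` and the file names are illustrative, and the encoder settings are one reasonable choice rather than the only one.

```python
import shlex


def still_clip_cmd(image: str, audio: str, output: str) -> list[str]:
    """Build an ffmpeg command that shows one image for the length of one audio file."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image,   # repeat the still image as a video stream
        "-i", audio,                 # voiceover track
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac",
        "-pix_fmt", "yuv420p",       # pixel format most players accept
        "-shortest",                 # stop when the shorter input (the audio) ends
        output,
    ]


cmd = still_clip_cmd("shot_01.png", "shot_01.mp3", "shot_01.mp4")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.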
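MoviePy: well suited to video editing and compositing.
# MoviePy 1.x import path; in MoviePy 2.x use `from moviepy import VideoFileClip, concatenate_videoclips`.
from moviepy.editor import VideoFileClip, concatenate_videoclips
clip1 = VideoFileClip("input1.mp4")
clip2 = VideoFileClip("input2.mp4")
final_clip = concatenate_videoclips([clip1, clip2])
final_clip.write_videofile("output.mp4")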
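Runway Gen-2 API: well suited to generating high-quality video.
# Illustrative pseudocode: the client class, method, and model ID are placeholders,
# not a verified SDK interface. Check Runway's API documentation before use.
import runway
client = runway.RunwayClient()
response = client.generate(
    model="runway-gen-2",
    prompt="A young woman in a white shirt, looking determined, in an urban office setting.",
)
print(response.output)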
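Pika API: well suited to generating high-quality video.
# Illustrative pseudocode: the client class, method, and model ID are placeholders.
# Note that the `pika` package on PyPI is a RabbitMQ client, unrelated to Pika Labs.
import pika
client = pika.PikaClient()
response = client.generate(
    model="pika-ai",
    prompt="A young woman in a white shirt, looking determined, in an urban office setting.",
)
print(response.output)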
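Automatic speech recognition and translation are used to generate subtitles. Recommended API platforms and sample calls:
OpenAI Whisper: well suited to generating high-quality subtitles.
# pip install openai-whisper; ffmpeg must be available on PATH.
import whisper
model = whisper.load_model("base")
result = model.transcribe("short_drama_audio.mp3")
print(result["text"])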
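Subtitles need timing as well as text. Whisper's `transcribe` result also carries a `segments` list whose entries have `start`, `end` (in seconds), and `text`, and these can be written out as an SRT file. A minimal sketch follows; the helper names are illustrative, and the sample segments are hand-written stand-ins for real output.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments into the text of an SRT subtitle file."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Stand-in for result["segments"].
sample_segments = [
    {"start": 0.0, "end": 2.4, "text": " 你居然背叛我?"},
    {"start": 2.4, "end": 4.0, "text": " 我们不是朋友吗?"},
]
srt_text = segments_to_srt(sample_segments)
print(srt_text)
```

With a real transcription, `segments_to_srt(result["segments"])` produces text that can be saved as `short_drama.srt`.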
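Baidu Speech API: well suited to generating Chinese subtitles.
from aip import AipSpeech
client = AipSpeech("YOUR_APP_ID", "YOUR_API_KEY", "YOUR_SECRET_KEY")
# asr() takes raw audio bytes, and the declared format and sample rate must match the file (here 16 kHz PCM).
with open("short_drama_audio.pcm", "rb") as f:
    audio = f.read()
result = client.asr(audio, "pcm", 16000, {"dev_pid": 1537})
print(result)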
DeepL Translate: well suited to generating subtitles in multiple languages.
import deepl
translator = deepl.Translator("YOUR_DEEPL_AUTH_KEY")
# English targets need a regional variant: "EN-US" or "EN-GB".
result = translator.translate_text("你居然背叛我?我们不是朋友吗?", target_lang="EN-US")
print(result.text)
Google Translate: well suited to generating subtitles in multiple languages.
from google.cloud import translate_v2 as translate
client = translate.Client()
result = client.translate("你居然背叛我?我们不是朋友吗?", target_language="en")
print(result["translatedText"])
A scoring model automatically checks the quality of the generated content. Recommended API platforms and sample calls:
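OpenAI Moderation API: well suited to checking generated content for policy compliance.
from openai import OpenAI
client = OpenAI()
response = client.moderations.create(
    input="你居然背叛我?我们不是朋友吗?"
)
# Each result reports an overall flag plus per-category details.
print(response.results[0].flagged)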
Google Natural Language: well suited to detecting the sentiment of generated content.
from google.cloud import language_v1
client = language_v1.LanguageServiceClient()
text_content = "你居然背叛我?我们不是朋友吗?"
type_ = language_v1.Document.Type.PLAIN_TEXT
language = "zh"
document = {"content": text_content, "type_": type_, "language": language}
response = client.analyze_sentiment(request={"document": document})
# The score ranges from -1.0 (negative) to 1.0 (positive).
print(response.document_sentiment.score)
Azure Text Analytics: well suited to detecting the sentiment of generated content.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential
client = TextAnalyticsClient(
    endpoint="YOUR_AZURE_ENDPOINT",
    credential=AzureKeyCredential("YOUR_AZURE_KEY"),
)
# Without a language hint the service assumes English.
response = client.analyze_sentiment(["你居然背叛我?我们不是朋友吗?"], language="zh-hans")
print(response[0].sentiment)
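The individual checks become a quality gate once their results are combined into a single pass or fail decision. The sketch below assumes each dialogue line has already been given a moderation flag and a sentiment score in the range -1.0 to 1.0. The `LineCheck` class, `review_script`, and the `min_swing` threshold are hypothetical. The "emotional swing" rule is one possible way to test for the strong reversal a short drama needs, not an established metric.

```python
from dataclasses import dataclass


@dataclass
class LineCheck:
    text: str
    flagged: bool     # e.g. taken from a moderation API result
    sentiment: float  # e.g. a score in [-1.0, 1.0] from a sentiment API


def review_script(checks: list[LineCheck], min_swing: float = 0.8) -> dict:
    """Reject scripts with flagged lines and require a minimum emotional swing."""
    flagged = [c.text for c in checks if c.flagged]
    scores = [c.sentiment for c in checks]
    # Swing is the gap between the most positive and most negative line.
    swing = max(scores) - min(scores) if scores else 0.0
    return {
        "passed": not flagged and swing >= min_swing,
        "flagged_lines": flagged,
        "emotional_swing": round(swing, 2),
    }


# Hand-written scores standing in for real API results.
report = review_script([
    LineCheck("We did it together!", flagged=False, sentiment=0.6),
    LineCheck("You actually betrayed me?", flagged=False, sentiment=-0.7),
])
print(report)
```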
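AI short dramas are moving from handcrafted production to automated, industrialized production. The technical drivers are the combination of APIs and prompts, together with hosted services, templates, and segment-by-segment generation. Once the creator's work is no longer just "writing a script for the camera" but a series of technical steps, calling APIs and tuning prompts, short drama becomes a proper field for productization, production at scale, and commercialization.
The next hit will not depend on who can shoot, but on who can call the APIs!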