JustSext · AI Roadmap · Q2/Q3 2026

An escalation engine.

Four areas, one loop: meet the user, hook them, escalate the intimacy, deliver the moment, deepen the relationship. The LLM runs the conversation. Image, voice, and video back it up.

Author
Aloukik · AI Lead
Optimizing for
ETC · CTA · CVR
This sprint
Onboarding
Doc status
Active · v0.2
§ 01

At a glance

All four projects · one view
I · LLM
Role: Brain of the loop
Priority: P0
Posture: Wire & tier
Now: OpenRouter integration in review · backend not wired yet
This sprint: Opener message + intimacy meter v1

II · Images
Role: Visual content
Priority: P0
Posture: Build internal
Now: Old + new model both running · no unified pipeline yet
This sprint: SFW landing photos + 3-photo sequence

III · Voice
Role: Voice + phone calls
Priority: P1
Posture: Harden, instrument
Now: Live in production · voice + phone working
This sprint: Funnel instrumentation

IV · Video
Role: Deferred this quarter
Priority: P2
Posture: Deliberate hold
Now: No video generation yet · low priority
This sprint: No work
§ 02

This sprint

Onboarding · top priority
★ Sprint Focus · Onboarding Funnel

From hello to paywall — visuals from the first second.

Rebuild the first-user funnel: SFW curvy photo on landing → creator-specific opener → 3 escalating photos → soft paywall. Photos hook the eye before the chat even starts. The bet is that visual + chat working together in the first 90 seconds is what converts.

I
Step One
Landing Hook
SFW curvy photos appear the moment the user arrives, a visual draw before any chat, alongside a creator-specific message 0. Both are designed to get the user to reply.
II
Step Two
Photo Sequence
Three pre-generated photos drop during chat, each more suggestive than the last. They are pulled from the per-creator photoshoot library in escalation order.
III
Step Three
Paywall
Soft wall at peak interest — when user wants to see more. Unlock = "see her without the dress" + explicit content.
Primary: CVR · ETC · CTA · Msg 0 → 1 lift · Photo seq complete % · Paywall view → pay
§ 03

End state

What this looks like when it works

Three views of the same system. The user journey shows the loop from a stranger landing to a deep paying user. The system map shows how all four projects compose around the shared intimacy state. The intimacy ladder shows the progression mechanic — and exactly where the paywall sits.

Flow · 01

The User Journey

From cold visitor to deep paying user. Top row is the user's experience; bars below show which project is active at each phase.
Phases (top row, the user's experience):
PHASE · 01 · Arrival: SFW photos + landing
PHASE · 02 · Hook: opener · msg 0
PHASE · 03 · 3-Photo Sequence: tease beats · pre-generated
PHASE · 04 · Paywall
PHASE · 05 · Unlock: tier 3 · full creator
PHASE · 06 · Deepen: voice · video · returning

Project bars (which project is active at each phase):
LLM: runs the whole loop · decides what to say, when to tease, when to pivot
Image: SFW curvy photos as the visual hook on arrival ★ · 3 photos drop with suggestive escalation · full library with NSFW unlocked
Voice: available throughout · voice + phone live pre-onboarding · expression deepens with tier
Video: b-roll drops

Intimacy axis: T0 → T1 → T2 → T3 → T3+ · shared intimacy state climbs as the user progresses
Flow · 02

The System Map

All four projects compose around a single canonical intimacy state. LLM writes; the others read. Hydra is the substrate.
Flow, top to bottom:
User Input: text · voice · image
★ Safety Layer: guardrails · content filtering
LLM (conductor): writes to and reads from the intimacy state
Intimacy State: user × creator · persisted · canonical shared object · fields: TIER · TOPIC · MEMORY · MOOD · UNLOCKS

Readers of the intimacy state:
Image: photoshoot graph · i2i · LoRA stack
Voice: voice · expression · topic pivots
Video: b-roll · video · peak-tier drops

Substrate:
Hydra: routing · tiering · cost telemetry · fallback policy

Legend: LLM writes (rose) · reads (dashed) · shared state (gold) · substrate
Flow · 03

The Intimacy Ladder

Five tiers. The paywall sits at T2 → T3 — the moment of peak desire, not an arbitrary message count.
Tiers, top of the climb first:
T4 · Devoted (deep): deep relationship · cross-modal · returning · loyal user. Unlocks: video · b-roll · memory
T3 · Intimate (paid): paywall crossed · full creator NSFW potential unlocked. Unlocks: explicit · voice calls
★ PAYWALL ★ · T2 → T3
T2 · Engaged (tease, still free): tease sequence · 3 photos drop · approaching peak. Unlocks: suggestive imagery
T1 · Hooked (free tier): opener landed · user replied · chat starts. Unlocks: text chat · more SFW photos
T0 · Landed (free tier): user just arrived · no chat yet. Shows: SFW curvy photos

Stages are unlocked by user behavior, not by message count or time spent. The paywall lands at the moment of peak interest, not at message N.
LLM writes · Image / Voice / Video read
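
The ladder above can be read as a small state machine. The sketch below is one possible shape, not the planned implementation: the tier names come from the ladder, while the event names (`user_replied`, `photo_sequence_started`, `requested_more`, `returned_cross_modal`) and the functions `next_tier` and `paywall_due` are hypothetical. It shows the two properties the ladder states: tiers advance on behavior rather than message count, and the T2 → T3 step is gated on payment.

```python
from enum import IntEnum


class Tier(IntEnum):
    LANDED = 0    # T0
    HOOKED = 1    # T1
    ENGAGED = 2   # T2
    INTIMATE = 3  # T3 (paid)
    DEVOTED = 4   # T4


# Hypothetical behavior event that advances a user out of each tier.
ADVANCE_ON = {
    Tier.LANDED: "user_replied",
    Tier.HOOKED: "photo_sequence_started",
    Tier.ENGAGED: "requested_more",
    Tier.INTIMATE: "returned_cross_modal",
}


def paywall_due(tier: Tier, event: str, has_paid: bool) -> bool:
    """True when an unpaid user at T2 triggers the T2 -> T3 transition."""
    return (
        tier == Tier.ENGAGED
        and event == ADVANCE_ON[Tier.ENGAGED]
        and not has_paid
    )


def next_tier(tier: Tier, event: str, has_paid: bool) -> Tier:
    """Advance one tier only on the matching behavior event."""
    if tier == Tier.DEVOTED:
        return tier  # top of the ladder
    if ADVANCE_ON.get(tier) != event:
        return tier  # unrelated events (e.g. another message) never advance
    if paywall_due(tier, event, has_paid):
        return tier  # hold at T2 until the paywall is crossed
    return Tier(tier + 1)
```

Because unrelated events leave the tier unchanged, sending more messages alone never moves a user up the ladder.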
§ 04

The four projects

LLM · Image · Voice · Video
I · LLM Layer
LLM
Brain of the loop · routes everything · owns the intimacy meter
Priority
P0
Owner
LLM pod
Blocks
Onboarding, Pyg, Voice
Routes every conversation, owns the intimacy meter, decides when to pivot, when to tease, when to ask for a photo. OpenRouter integration is in review (300+ models accessible once it lands). Next: wire to backend, tier the calls (cheaper, faster models for guardrails and intent detection; smart models for the actual response), then build the intimacy meter and psychological hooks on top.
▸ Now
In Review
  • OpenRouter integration in review
  • 300+ models accessible (once it lands)
  • Statsig hookup planned for experiments
  • Backend wiring not started yet
▸ Next
Wire & Tier
  • Connect OpenRouter to backend
  • List every internal LLM call
  • Cheap model for guardrails + intent
  • Smart model for the actual reply
  • Set speed targets per call type
▸ Then
Intimacy Meter
  • Build intimacy meter (3–4 stages)
  • Paywall triggers at stage 2 → 3
  • Detect boredom · auto-pivot topics
  • Accept user image uploads
  • Use cheap model to read user images
▸ Later
Per-Creator
  • Fine-tune LLM per creator
  • Long-term memory of past chats
  • Intimacy carries across sessions
  • Smarter topic-matching engine
★ The Bet

Using cheaper models for routine calls (guardrails, intent, routing) cuts cost a lot without users noticing. The intimacy meter turns the conversation into a game users want to climb — and the paywall lands at the moment they want it most, not at message N.

⚠ The Risk

If we swap the global LLM without writing down every internal call first, we might silently break tone or safety, and we would only find out when conversion drops. We need to list every call and the tier it uses before swapping.
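
One way to make that inventory enforceable is to route every call through a registry. This is a minimal sketch under stated assumptions: the call names, latency targets, and model identifiers are placeholders, not the real inventory, and `model_for` and `unregistered` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSpec:
    tier: str        # "cheap" or "smart"
    latency_ms: int  # placeholder speed target for this call type


# Placeholder inventory; the real list comes from auditing the codebase.
CALL_INVENTORY = {
    "guardrail_check": CallSpec("cheap", 300),
    "intent_detection": CallSpec("cheap", 300),
    "image_reading": CallSpec("cheap", 800),
    "chat_reply": CallSpec("smart", 2500),
}

# Placeholder model identifiers, one per tier.
MODEL_BY_TIER = {
    "cheap": "placeholder/cheap-model",
    "smart": "placeholder/smart-model",
}


def model_for(call_name: str) -> str:
    """Resolve a call to its model; unregistered calls fail loudly."""
    spec = CALL_INVENTORY.get(call_name)
    if spec is None:
        raise KeyError(f"unregistered LLM call: {call_name!r}")
    return MODEL_BY_TIER[spec.tier]


def unregistered(calls_in_code):
    """Names used in code but missing from the inventory."""
    return sorted(set(calls_in_code) - set(CALL_INVENTORY))
```

With this shape, a call that was never written down raises an error at the point of use, so a missing entry cannot be silently swapped along with the global model.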

II · Image Layer
Image
Consistent creator photos · serves the tease sequence
Priority
P0
Owner
Image pod
Blocks
Voice cross-modal, Video
We have two models today: old (slow but very detailed, good for tattoos and fine features, weak at NSFW) and new (under 5 seconds, NSFW LoRAs, good consistency, but struggles with tattoos and fine details). Neither is good enough alone. We need a per-creator photo library that spans three stages: SFW curvy photos on landing, suggestive tease photos in chat, and full NSFW after the paywall. End state: our own internal image-to-image model with per-creator LoRAs (base identity, undress, cleavage, outfit) feeding a pre-generated photo library the LLM pulls from during chat.
▸ Now
Two Models
  • Old: slow, detailed, weak NSFW
  • New: under 5s, good consistency, NSFW LoRAs
  • New struggles with tattoos / fine details
  • No unified pipeline yet
▸ Next
Internal i2i
  • Build internal i2i prototype
  • Base identity LoRA per creator
  • Pilot on 3 creators
  • Compare quality vs nano-banana
▸ Then
Photo Library
  • SFW curvy photos for landing
  • Suggestive LoRAs (cleavage, undress, outfit)
  • 30–50 photos per creator pre-generated
  • Tagged by tier (SFW / suggestive / NSFW)
  • LLM pulls right photo at right time
▸ Later
All Creators
  • Roll out to full creator catalog
  • Self-serve LoRA pipeline
  • Generate new photos in chat
  • Style/outfit experiments
★ The Bet

Pre-generated photo libraries beat live generation. The funnel — SFW on landing, suggestive in chat, NSFW after paywall — only works if every creator has a deep, consistent library. Speed and consistency here unlock everything else.
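
A tier-tagged library makes "right photo at right time" a lookup rather than a generation step. The sketch below assumes each photo carries one of the three tags from the plan (SFW / suggestive / NSFW) plus a position in the escalation order; the `Photo` fields and the `next_photo` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Tags from the plan, ordered from least to most escalated.
TIER_ORDER = {"sfw": 0, "suggestive": 1, "nsfw": 2}


@dataclass(frozen=True)
class Photo:
    photo_id: str
    tier: str   # "sfw", "suggestive", or "nsfw"
    order: int  # position within the creator's escalation order


def next_photo(library, seen_ids, max_tier) -> Optional[Photo]:
    """Return the least-escalated unseen photo at or below max_tier."""
    ceiling = TIER_ORDER[max_tier]
    allowed = [
        p for p in library
        if TIER_ORDER[p.tier] <= ceiling and p.photo_id not in seen_ids
    ]
    return min(
        allowed,
        key=lambda p: (TIER_ORDER[p.tier], p.order),
        default=None,
    )
```

Passing the user's currently unlocked tag as `max_tier` means a photo above that tag is never returned, whatever the conversation requests; `None` signals that the library is exhausted at that level.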

⚠ The Risk

LoRA training takes time and compute. Train too few creators and we won't know if it works for all. Train too many before validating and we waste runs. Nano-banana is tempting as a shortcut but weakens our long-term advantage. We need a clear policy for when it is used.

III · Voice Layer
Voice
Voice + phone calls already live · stable, just needs instrumentation
Priority
P1
Owner
Voice pod
Posture
Hardening, not expanding
Voice is in good shape, so do not over-optimize. The work here is instrumentation, expressiveness, and parity with the intimacy state the LLM owns. Photo drops over voice wait until Image is solid; never ship that on the old image stack.
▸ Now
Stable
  • Production-ready, holding up
  • Quality benchmark established
  • No reliability fires
▸ Next
The Funnel
  • Drop-off events instrumented
  • Init → 1st reply → 60s → 3min → end
  • Shared intimacy state w/ LLM
  • Per-stage CVR baseline
▸ Then
Expression
  • Tease/expression tuning
  • Topic-pivot parity with LLM
  • Boredom-detection in voice signals
  • Whispers, laughs, breaths catalog
▸ Later
Cross-Modal
  • Photo drops mid-call (gated on Pyg)
  • Video send-link mid-call
  • Voice → image affinity model
★ The Bet

Voice is high-perceived-value, low-marginal-improvement right now. Best ROI is instrumentation. Once we know where users hang up, surgical expression and pivot tuning move CVR more than any model swap.
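
To find where users hang up, the call funnel (Init → 1st reply → 60s → 3min → end) can be reduced to stage-to-stage conversion. This sketch assumes each call is logged with the furthest stage it reached and treats `end` as a call that ran to its natural finish; the stage keys and the `stage_conversion` function are hypothetical.

```python
# Stage keys mirror the funnel in the plan; the exact names are assumptions.
STAGES = ["init", "first_reply", "60s", "3min", "end"]


def stage_conversion(furthest_stage_per_call):
    """Count calls reaching each stage and the rate between adjacent stages.

    furthest_stage_per_call: one stage name per call, the furthest it reached.
    """
    index = {stage: i for i, stage in enumerate(STAGES)}
    reached = [0] * len(STAGES)
    for stage in furthest_stage_per_call:
        # A call that reached stage i also passed every earlier stage.
        for i in range(index[stage] + 1):
            reached[i] += 1

    rates = {}
    for i in range(1, len(STAGES)):
        prev = reached[i - 1]
        label = f"{STAGES[i - 1]} -> {STAGES[i]}"
        rates[label] = reached[i] / prev if prev else None
    return reached, rates
```

The lowest rate in `rates` marks the transition where the most calls are lost, which is where expression and pivot tuning would be aimed first.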

⚠ The Risk

Treating voice as "done" is dangerous — owners lose urgency, regressions creep in, a competitor leapfrogs while we're heads-down on images. Set a maintenance bar, not zero attention.

IV · Video Layer
Video
Low priority · deferred · no video generation work this quarter
Priority
P2
Owner
TBD · scoped only
Depends on
Image (full)
Video is the most powerful modality, deliberately deferred. Video LoRAs are expensive in time and compute, and the added surface area only pays off once the rest is solid. The only video work earning roadmap space now is generic creator B-roll: short ambient loops the LLM can drop like high-value emoji at peak intimacy.
▸ Now
Hold
  • Deprioritized intentionally
  • No active engineering
▸ Next
B-Roll
  • 3–5s generic creator loops
  • Ambient, not narrative
  • Drop mechanic in LLM (rare, high-tier)
  • No LoRA training
▸ Then
Pilot
  • Per-creator video LoRA spike
  • Only after Image stable
  • Pilot on 1–2 creators
  • Quality + cost benchmark
▸ Later
Spectacle
  • Full video escalation track
  • Slot-machine pic mechanic → video
  • Modal-driven reveals
  • Voice + video calls
★ The Bet

Doing nothing on video is the right move now. Engineering attention is finite; image consistency is the bottleneck for the entire product. Cheap B-roll buys perceived motion without committing to the LoRA pipeline.

⚠ The Risk

Video is where the category is heading. Hold too long and a competitor ships consistent per-creator video first — we lose a positioning beat that's hard to recover. Re-evaluate the moment Image clears its first roster milestone.

§ 05

The spine

Cross-cutting infrastructure

The four projects share a spine. If the spine is weak, none of them work in concert.

α

Shared Intimacy State

One shared object per user-creator pair, saved across sessions. LLM updates it; Image and Voice read from it. Without this, the layers don't work together — voice won't know what chat already unlocked, photos won't escalate in step with the conversation.
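
A minimal sketch of that single-writer contract, assuming the five fields named in the system map (tier, topic, memory, mood, unlocks). The in-memory dict stands in for whatever persistent store is actually used, and `IntimacyState`, `StateStore`, and the caller labels are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IntimacyState:
    tier: int = 0
    topic: str = ""
    mood: str = ""
    memory: tuple = ()
    unlocks: frozenset = frozenset()


class StateStore:
    """One state per (user, creator) pair; only the LLM may write."""

    WRITER = "llm"

    def __init__(self):
        # Stand-in for persistent storage that survives across sessions.
        self._states = {}

    def read(self, user_id, creator_id) -> IntimacyState:
        """Any layer may read; unknown pairs start at the default state."""
        return self._states.get((user_id, creator_id), IntimacyState())

    def write(self, caller, user_id, creator_id, **changes) -> IntimacyState:
        """Apply field changes; reject every caller except the LLM."""
        if caller != self.WRITER:
            raise PermissionError(f"{caller!r} may only read intimacy state")
        updated = replace(self.read(user_id, creator_id), **changes)
        self._states[(user_id, creator_id)] = updated
        return updated
```

Because the state object is frozen, a reader that holds a copy cannot mutate it in place; every change goes through `write`, which keeps the LLM as the only source of updates.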

β

Funnel Instrumentation

ETC, CTA, CVR — measured per modality, per onboarding step, per intimacy transition. Until we know exactly where users drop, we are tuning blind. Foundational, not optional.

γ

Safety Layer

CSAM detection and age verification run on every user-uploaded media surface, before the media reaches model context. Non-negotiable. This owns its own track inside the LLM's vision pipeline. Get this wrong, nothing else matters.
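
The ordering requirement can be expressed as a gate that fails closed. This sketch assumes the real detectors are injected as callables that return `True` when media passes; it says nothing about how those detectors work, and `admit_to_context` is a hypothetical name.

```python
from typing import Callable, Iterable

# A check returns True only when the media passes.
Check = Callable[[bytes], bool]


def admit_to_context(media: bytes, checks: Iterable[Check], context: list) -> bool:
    """Append media to context only if every safety check passes."""
    checks = list(checks)
    if not checks:
        return False  # no checks configured: reject rather than admit
    for check in checks:
        try:
            passed = check(media)
        except Exception:
            passed = False  # fail closed: a crashing check counts as a failure
        if not passed:
            return False
    context.append(media)
    return True
```

The media is appended only after the loop completes, so nothing downstream can see an upload that failed, errored, or was never checked.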

δ

Hydra · Routing & Infra

The infra layer underneath all four projects. Handles model routing, cost tracking, and fallback when something fails. Already in motion — this roadmap rides on top of it.
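
As an illustration of routing with fallback and cost tracking, the sketch below tries models in preference order and logs the cost of the one that answers. It is not Hydra's actual interface: `Route`, `call_with_fallback`, and the flat per-call cost are assumptions made for the example.

```python
from typing import Callable, NamedTuple, Sequence


class Route(NamedTuple):
    model: str
    call: Callable[[str], str]  # hypothetical client function for this model
    cost_per_call: float        # simplified flat cost, for illustration only


def call_with_fallback(prompt: str, routes: Sequence[Route], cost_log: list) -> str:
    """Try each route in order; return the first successful reply."""
    errors = []
    for route in routes:
        try:
            reply = route.call(prompt)
        except Exception as exc:
            errors.append((route.model, repr(exc)))
            continue  # fall through to the next route
        # This sketch records cost only for the call that succeeded.
        cost_log.append((route.model, route.cost_per_call))
        return reply
    raise RuntimeError(f"all routes failed: {errors}")
```

Keeping the cost log next to the routing decision means cost telemetry records which model actually served each reply, including the cases where a fallback was used.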