Flux LoRA Dataset Composer

Preparing Data Sets for LoRA Training: Foundational Concepts and Usage on Varoriya.com for the Flux Model

This guide covers the fundamental concepts of Artificial Intelligence (AI) learning in the context of LoRA (Low-Rank Adaptation) training, including the essential steps for preparing a Data Set that produces good, accurate image generation results. The focus is on the new Flux model on the Varoriya.com platform.

1. Principles of AI Learning: Context and Terminology

AI learns much as humans learn a new language: by connecting words to the surrounding context and guessing their meaning, even without knowing every word. For example, we learn what an “Apple” is, and that it is edible, by encountering it in many different contexts.

In LoRA training, we cannot use a pen to circle an image and tell the AI what we want it to learn. Instead, we must let the AI learn from the Context and Data that we simultaneously provide.

Trigger Word (Specific Terminology): When we want the AI to learn our own face, which, unlike a globally famous character such as Iron Man, is not already known to the main Checkpoint model, we must define a Trigger Word or specific term (e.g., “1next”). This unique word lets the AI distinguish our face from general terms it already knows, such as “One Man” or “One Girl”.

Collaboration with Captions: Training data generally consists of the images we want to train on plus accompanying text (captions). The captions describe what is in each image; for instance, if an image shows a man wearing glasses, sitting on a chair, with a living room background, the caption states those details. The AI reads this text and tries to link the unknown word (like “1next”) to the features in the image that it cannot explain with concepts it has already learned. Over many images, the AI learns that the one feature recurring across all of them, while the familiar context changes (the living room, the shirt color), is the subject that the unknown Trigger Word refers to.
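To make the trigger-word idea concrete, here is a minimal Python sketch that builds one caption per image: the trigger word sits next to a class word the model already knows, followed by the familiar context. The helper names build_caption and write_captions are hypothetical, and the sidecar convention (a .txt file with the same base name as the image) is an assumption borrowed from common open-source LoRA trainers; Varoriya's Dataset Composer may store captions differently.

```python
from pathlib import Path


def build_caption(trigger: str, class_word: str, context: list[str]) -> str:
    """Join the trigger word, a known class word, and known context into one caption.

    The trigger word (e.g. "1next") is the only unfamiliar token, so the features
    the context does not explain are the ones most likely to attach to it.
    """
    return ", ".join([f"{trigger} {class_word}", *context])


def write_captions(image_dir: Path, captions: dict[str, str]) -> None:
    """Write <image stem>.txt beside each image (assumed sidecar convention)."""
    for image_name, caption in captions.items():
        (image_dir / f"{Path(image_name).stem}.txt").write_text(caption, encoding="utf-8")


print(build_caption("1next", "man", ["wearing glasses", "sitting on a chair", "living room background"]))
# 1next man, wearing glasses, sitting on a chair, living room background
```

Only the trigger word stays constant across captions; the context words change from image to image, which is what lets the recurring face stand out.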

2. Data Set Preparation: Key Factors to Avoid Errors

Careful Data Set preparation is extremely important, especially in basic training on raw images alone (Data Set only), where the AI must filter out irrelevant information by itself, based solely on the data we provide.

2.1 Image Quality and Size

Flux LoRA training works best with images of around 1024 x 1024 pixels (minor variation is fine). Small images (e.g., 500 x 500) produce unclear results, and very small, blurry images (e.g., 300 x 300) should be replaced with better ones. If no replacement is available, the platform offers an Upscale tool to improve clarity.
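The size guidance above can be turned into a quick triage step before uploading. This is a minimal sketch that classifies an image by its shorter side; the cut-offs (900 px for "close enough to 1024", 300 px for "replace") are assumptions loosely derived from the example sizes in the text, not values published by Varoriya. Image dimensions can be read with Pillow's Image.open(path).size, which returns (width, height).

```python
def triage_image(width: int, height: int) -> str:
    """Classify an image by its shorter side.

    Thresholds are illustrative assumptions based on the examples above:
    ~1024 px is the target, ~500 px gives unclear results, ~300 px should be replaced.
    """
    short_side = min(width, height)
    if short_side >= 900:   # near 1024 x 1024 (minor variation allowed)
        return "ok"
    if short_side > 300:    # usable, but consider the platform's Upscale tool
        return "upscale"
    return "replace"        # very small: look for a better source image


for size in [(1024, 1024), (1080, 1350), (500, 500), (300, 300)]:
    print(size, triage_image(*size))
# (1024, 1024) ok
# (1080, 1350) ok
# (500, 500) upscale
# (300, 300) replace
```

Using the shorter side means a tall portrait such as 1080 x 1350 still counts as large enough, while a wide but short image is flagged.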

2.2 Cropping

To ensure the AI clearly learns the intended part, it is necessary to Crop out unimportant sections.

  • Face Context: If training a face, one should not crop it down to just a “round chunk”. Instead, include the neck and perhaps the collar so the AI understands that this “round chunk” is a face situated on a human neck (“One Man”). If only the face is cropped, the AI may be confused about what the “chunk” is.
  • Positional Diversity: Cropping can vary; the face does not need to be in the same position every time. For example, leaving ample space at the top or cropping to the left/right introduces data diversity, preventing the AI from assuming the face must always be in the left corner.
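Both cropping rules (keep the neck in frame, vary the face position) can be expressed as one function. This is a minimal sketch that assumes you already have the face's bounding box in pixels, from manual marking or any face detector; face_crop_box, neck_ratio, and scale are hypothetical names and illustrative defaults, not Varoriya settings. The returned tuple uses the (left, top, right, bottom) order that Pillow's Image.crop accepts.

```python
import random


def face_crop_box(face, image_size, neck_ratio=0.6, scale=2.5, rng=None):
    """Return a square crop box (left, top, right, bottom) around a face.

    face: (left, top, right, bottom) of the face in pixels, assumed inside the image.
    neck_ratio: extra height kept below the chin, as a fraction of the face height,
        so the neck and collar stay in frame.
    scale: preferred crop side length relative to the face height.
    rng: random.Random instance that decides where the face lands in the frame.
    """
    rng = rng or random.Random()
    img_w, img_h = image_size
    fl, ft, fr, fb = face
    face_h = fb - ft

    # Region that must stay inside the crop: the face plus the neck below it.
    keep_l, keep_t, keep_r = fl, ft, fr
    keep_b = min(img_h, fb + int(neck_ratio * face_h))
    need = max(keep_r - keep_l, keep_b - keep_t)
    if need > min(img_w, img_h):
        raise ValueError("image too small for a square crop containing face and neck")

    # Square side: preferred size, clamped to the image, never smaller than needed.
    side = max(min(int(scale * face_h), img_w, img_h), need)

    # Any offset in these ranges keeps the face+neck region inside both crop and image,
    # so picking one at random varies the face position between images.
    left = rng.randint(max(0, keep_r - side), min(keep_l, img_w - side))
    top = rng.randint(max(0, keep_b - side), min(keep_t, img_h - side))
    return (left, top, left + side, top + side)


box = face_crop_box((400, 300, 600, 550), (1200, 1600), rng=random.Random(7))
```

Seeding the random generator makes a batch of crops reproducible while still placing the face differently from image to image.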

2.3 Background Management

The background should not be “too cluttered”. Recommended backdrops are simple or recognizable environments that the AI already knows (e.g., Green Screen, bathroom, living room, outdoor scenes). Since the AI recognizes these backgrounds, it will filter them out, allowing it to focus on the recurring features we want it to learn (like our face).

2.4 Training Issues: Overfitting and Underfitting

Overfitting (Learning Too Much): Overfitting occurs when the AI learns specific, repetitive features in the dataset too thoroughly, limiting its memory to only the observed images.

  • Example: If we use many pictures where the subject is always facing right with the same background, the AI will memorize that the face must always turn right. When generating an image, the AI will produce the face in that specific angle. Similarly, if trained only with a green screen, the AI will associate the subject only with that background color.
  • Impact: The AI will be unable to adapt or correctly generate the face from different angles. Overfitting means the AI memorizes the exact, specific form, position, and rotation seen during training, rather than learning general characteristics.
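One way to catch overfitting risks before training is to tag each image with its angle and background and look for values that dominate the set. This minimal sketch assumes you label the images yourself; dominant_values and the 50% threshold are hypothetical choices for illustration.

```python
from collections import Counter


def dominant_values(labels, attribute, max_share=0.5):
    """Return {value: share} for values of `attribute` found in more than `max_share` of images.

    labels: one dict per image, e.g. {"angle": "right", "background": "green screen"}.
    A dominant value (every face turned right, every shot on a green screen) is the
    kind of repetition the model is likely to memorize.
    """
    counts = Counter(item[attribute] for item in labels)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items() if n / total > max_share}


labels = [
    {"angle": "right", "background": "green screen"},
    {"angle": "right", "background": "green screen"},
    {"angle": "right", "background": "living room"},
    {"angle": "front", "background": "green screen"},
]
print(dominant_values(labels, "angle"))       # {'right': 0.75}
print(dominant_values(labels, "background"))  # {'green screen': 0.75}
```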

Underfitting (Insufficient Data): Underfitting happens when the AI receives too little or insufficiently complex data to establish the characteristic structure or pattern.

  • Example: If only three face images are provided (front, looking left, looking down), a face generated looking up may not look similar. Likewise, training on only one view of the face restricts the primary facial data even if the backgrounds are varied, so the output is limited to that single view.
  • Impact: The AI fails to form complete recognition or connections, causing generated images in new perspectives to look dissimilar.
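The underfitting example can be checked the same way, by listing which viewing angles have too few images. This is a minimal sketch; the REQUIRED_ANGLES set, the min_count default, and the missing_angles name are hypothetical, and the labels are assumed to be assigned by hand.

```python
from collections import Counter

# Hypothetical target coverage; extend with poses or situations as needed.
REQUIRED_ANGLES = {"front", "left", "right", "up", "down"}


def missing_angles(labels, required=REQUIRED_ANGLES, min_count=1):
    """Return, sorted, the required angles that have fewer than `min_count` images."""
    counts = Counter(item["angle"] for item in labels)
    return sorted(angle for angle in required if counts[angle] < min_count)


three_images = [{"angle": "front"}, {"angle": "left"}, {"angle": "down"}]
print(missing_angles(three_images))  # ['right', 'up']
```

Raising min_count (for example, to 2) flags every angle in a set this small, which matches the point that three images are not enough.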

Therefore, the dataset must have diversity in terms of viewing angles, poses, and situations to prevent both Overfitting and Underfitting.

3. LoRA Training Steps on Varoriya.com

Flux LoRA Training on Varoriya.com is straightforward once the Data Set is prepared.

  1. Selecting Training Type: There are two main types:
    • Explorer: Training and generation are available exclusively on the Varoriya website, and the LoRA file cannot be downloaded for external use. This is suitable for users who want to experiment first.
    • Warrior: The LoRA file can be downloaded for use elsewhere. If the results from Explorer are satisfactory, it can be upgraded to Warrior later.
    • Privacy Settings: Unless the LoRA is set to Public, it remains Private, meaning only its owner can use it.
  2. Data Set Preparation on the Website (Dataset Composer):
    • The user can upload image files (JPG, PNG) or a Zip file containing the images (with no subfolders).
    • Tools are available for Crop and Upscale of images.
    • The Auto Caption function uses Varoriya AI to read and describe image details (e.g., man with dark hair, sitting on a chair). These captions can be reviewed or edited, although they might not be used in Data Set-only training.
    • Once all images are prepared, click the Pack Dataset button.
  3. Starting Training and Usage:
    • Define the Trigger Word (e.g., “1next”) and the contextual Keyword for the data (e.g., “Man”).
    • Select whether to use Auto Caption (for Data Set-only training, the system may manage this automatically).
    • Click Start Training, which takes approximately 30–60 minutes.
    • Once training is complete, navigate to the Single LoRA page to begin generating images by entering the Prompt and the Trigger Word.
    • The Weight of the LoRA can be adjusted (e.g., 0.6, 0.8, 0.9) to find the value that makes the face look most realistic.
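If your images sit in nested folders, they need to be flattened before zipping, because the uploader expects a Zip file without subfolders. This is a minimal sketch using Python's standard zipfile module; pack_flat_zip is a hypothetical helper, and the accepted extensions (.jpg, .jpeg, .png) are an assumption about what the upload form takes.

```python
import zipfile
from pathlib import Path

ALLOWED = {".jpg", ".jpeg", ".png"}  # assumption: formats accepted by the uploader


def pack_flat_zip(src_dir, zip_path):
    """Zip every allowed image under src_dir with no folders inside the archive.

    Returns the archive names written. Raises ValueError on duplicate file names,
    because flattening would otherwise make one image overwrite another.
    """
    src = Path(src_dir)
    files = [p for p in sorted(src.rglob("*")) if p.is_file() and p.suffix.lower() in ALLOWED]
    names = [p.name for p in files]
    dupes = sorted({n for n in names if names.count(n) > 1})
    if dupes:
        raise ValueError(f"duplicate file names after flattening: {dupes}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, arcname=p.name)  # bare file name keeps the archive flat
    return names
```

Checking for duplicates before opening the archive avoids leaving a half-written Zip file behind.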

If the results are unsatisfactory (for example, side-profile images do not look realistic), the Data Set probably lacks clear images from that angle, and such images must be added so the AI can learn it completely. (There is also a tool called “Integrate LoRA” for uploading LoRA files trained elsewhere to Varoriya for generation.)
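To tell a data gap apart from a weight problem, it can help to run the same prompts at every weight and viewing angle: if one angle looks wrong at all weights, that angle probably needs more images. This minimal sketch only builds the list of prompt and weight pairs to enter on the Single LoRA page by hand; prompt_grid and the angle phrases are hypothetical, and the weights are the example values from the text.

```python
from itertools import product

ANGLES = ["front view", "side profile", "looking up", "looking down"]  # hypothetical phrases
WEIGHTS = [0.6, 0.8, 0.9]  # example weights from the text


def prompt_grid(trigger, class_word, angles=ANGLES, weights=WEIGHTS):
    """Return (weight, prompt) pairs covering every angle at every LoRA weight."""
    return [(weight, f"photo of {trigger} {class_word}, {angle}")
            for weight, angle in product(weights, angles)]


for weight, prompt in prompt_grid("1next", "man"):
    print(weight, prompt)
```

Keeping the prompt wording identical apart from the angle makes the comparison fair: any difference between rows comes from the weight or from how well that angle is covered in the Data Set.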