Image data
- methods to represent image data:
- RGB(r,g,b)
- r,g,b → [0, 255]
- White: (255,255,255)
- Grayscale
- Image size
- Size = Width * Height * #Channels (see the sketch after this list)
- Image datasets:
- ImageNet
- MNIST
- CIFAR-10
- …
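As a quick illustration of the size formula above, a minimal Python sketch (the 224×224 resolution is just an assumed example, not from the notes):

```python
# Raw (uncompressed) image size = Width * Height * #Channels
# Assumed example: a 224x224 RGB image, 1 byte per channel value (0-255)
width, height, channels = 224, 224, 3
size_in_bytes = width * height * channels
print(size_in_bytes)  # 150528 bytes, roughly 147 KB
```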
Unsupervised: Image clustering
- Basics:
- Cluster images based on their “similarity”
- Visual, distance, correlation, …
- Extract features with any suitable method and use k-means to cluster (see the sketch after this list)
- Useful when labels are unknown and labeling is expensive
- Can be subjective and inaccurate
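A minimal sketch of this clustering pipeline, assuming NumPy and scikit-learn are available and using flattened raw pixels as the features (any other feature extractor could be substituted):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in data: 100 grayscale 28x28 images with no labels
# (shapes chosen to resemble MNIST; replace with real image data).
images = np.random.rand(100, 28, 28)

# "Extract features": here we simply flatten each image into a vector;
# any other feature extractor (e.g. a pretrained network) could be used instead.
features = images.reshape(len(images), -1)

# Cluster the feature vectors into 10 groups with k-means.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(features)  # one cluster id per image
```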
Supervised: Deep Feedforward Network
- Basics:
- AKA Multilayer perceptrons
- Feedforward because information flows in only one direction
- Connections that also run in the reverse direction are called feedback connections
- Deep because there are many hidden layers
- Depth: Number of hidden layers
- Width: Number of nodes on ONE hidden layer
- Why deep, not wide: a single layer may need to be impractically large to achieve what multiple layers can, among other reasons
- Weights & bias: associated with each neuron
- Activation function
- $\sigma(z)$
- Essentially f(z), but written with $\sigma$ by convention
- How a neuron computes its output:
- On each neuron, it takes all the $a$ from the previous layer
- It then computes a value $z$ from those incoming $a$ values
- $z = a_1w_1 + a_2w_2 + \dots + a_kw_k + b$
- $w$ and $b$ are constantly adjusted during training
- A $w$ is assigned to each edge
- Use an activation function to convert $z$ into $a$
- $z \rightarrow \sigma(z) \rightarrow a$
- That $a$ is then passed to the next layer (see the sketch after this list)
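A minimal NumPy sketch of one neuron's computation as described above, using the sigmoid as the activation function $\sigma$ (all numeric values are arbitrary examples):

```python
import numpy as np

def sigmoid(z):
    # Activation function sigma(z): squashes any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Activations a1..ak coming from the previous layer (arbitrary example values)
a_prev = np.array([0.5, -1.2, 0.3])
# One weight per incoming edge, plus the neuron's bias
w = np.array([0.4, 0.1, -0.7])
b = 0.05

# z = a1*w1 + a2*w2 + ... + ak*wk + b
z = np.dot(a_prev, w) + b

# a = sigma(z); this a is what the next layer receives
a = sigmoid(z)
```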
Model training
- Parameter types
- Parametric: logistic classifier
- Non-Parametric: KNN
- Hyperparameters: predefined / fixed
- Parameters: updated during model training
- Loss function
- Measures how far the predicted result deviates from the actual result; the smaller, the better
- $\large{L_{data} = \sum_{i=1}^{N} L_i}$
- For classification tasks, the loss function can be defined using cross-entropy (see the sketch below)
- Need to control overfitting using regularization methods
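A small sketch of a cross-entropy data loss, summing $L_i$ over the examples as in the $L_{data}$ formula above (the probabilities and labels are made-up values):

```python
import numpy as np

def cross_entropy_loss(probs, labels):
    # probs: (N, C) predicted class probabilities; labels: (N,) true class indices.
    # Per-example loss L_i = -log p_i(true class); L_data sums L_i over all N examples.
    eps = 1e-12  # guard against log(0)
    return -np.sum(np.log(probs[np.arange(len(labels)), labels] + eps))

# Two examples, three classes; both predictions are confident and correct,
# so the loss is small.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
print(cross_entropy_loss(probs, labels))
```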
- Gradient descent
- how to find the best:
- Calculate L
- Update W in next step as:
- $W_{k+1} \leftarrow W_k - \alpha \nabla_W L$
- $b_{k+1} \leftarrow b_k - \beta \nabla_b L$
- Repeat until the gradient is small enough or $k$ reaches a preset limit (see the sketch below)
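A minimal sketch of these update rules, using a tiny linear model with a squared-error loss so the gradients can be written by hand (the learning rates and data are arbitrary assumptions):

```python
import numpy as np

alpha, beta = 0.1, 0.1              # learning rates for W and b (assumed values)
W, b = np.zeros(2), 0.0             # parameters to learn

x = np.array([1.0, 2.0])            # a single training example
y = 3.0                             # its target value

for k in range(1000):
    pred = W @ x + b
    # Squared-error loss L = (pred - y)^2, with hand-derived gradients:
    grad_W = 2 * (pred - y) * x     # gradient of L w.r.t. W
    grad_b = 2 * (pred - y)         # gradient of L w.r.t. b
    # Update rules: W_{k+1} <- W_k - alpha * grad_W, b_{k+1} <- b_k - beta * grad_b
    W = W - alpha * grad_W
    b = b - beta * grad_b
    if np.linalg.norm(grad_W) < 1e-6 and abs(grad_b) < 1e-6:
        break                       # stop once the gradient is small enough
```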
- Number of parameters:
- param number = input_shape x layer width (W) + layer width (b)
- 1st hidden layer: input_shape = shape of input
- later hidden layers & output layer: input_shape = layer width of prev layer
- for the output layer, layer width = number of output classes (see the worked example below)
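A worked example of the parameter-count formula, assuming a hypothetical network with 784 inputs (28×28 images), hidden layers of width 128 and 64, and 10 output classes:

```python
# Assumed architecture: 784 inputs (28x28 image), hidden widths 128 and 64, 10 output classes
layer_widths = [784, 128, 64, 10]

total = 0
for in_shape, width in zip(layer_widths[:-1], layer_widths[1:]):
    # params per layer = input_shape * layer_width (weights) + layer_width (biases)
    total += in_shape * width + width

print(total)  # 100480 + 8256 + 650 = 109386
```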