Computer Networking

INTRODUCTION#

what is internet?

ans) a network of billions of computing devices called hosts and end systems. The internet grew out of ARPANET in 1970s. the data connection is the internet the wires the routers the switches etc. ARPANET approced tcp/ip in 1983 (birthdate of modern internet)

Arpanet

what is web?

it is the worldwide network of interconnected hyper text pages. web is the application of internet and the dns. this was invernted by CERN.

Apart form web there are other applications of internet also such as email,ssh, bitTorrent,voice-over-ip telepony, FTP , remote Desktop etc.

who controlls int internet?

ICANN (Internet Corporation for Assigned Names and Numbers) assing ip addresses. operates 13 root dns servers IETF ((Internet Engineering Task Force) develops protocol of how the devices interact

Internet is distributed and loosely coupled. A network protocol defines proper communication pattern between devices.

Packet switching vs circuit switching

Circuit switching establishes a dedicated communication path between devices before data transfer begins. This path remains exclusively reserved until the session ends where as Packet switching breaks data into small chunks (packets) that travel independently through the network.

Switching here means the way link is shared by competing users.

Throughput :- it is the rate of data transfer. Latency :- It is the delay of an action.

Packet switching#

Packet switching breaks data into small chunks (packets) that travel independently through the network.

Data is divided into packets with addressing information
No pre-established path exists; packets find their own way
Network resources are shared dynamically among all users
Receiving end reassembles packets into the original data
Example: When streaming videos online, data travels as thousands of independent packets

The internet was designed to be best effort delivery system and hence no effort was made to implement confirmation that the some packet is dropped. we overcame this using protocols such as tcp.

souces of packet delay#

transmission
processing
queue
propagation

peering link and internet exchange point#

a visualization of the cabes around the world ↗

Packets and layers in the internet#

There are a lot of differnt layers and protocols which basically serve as an abstraction.
Layers add additonal header bytes to the packets. These layers are designed on the basis of popular software engineering pattern called separation of concerns.

7 layer osi model

sockets#

these are at a high level the connection medium for the computer to talk to the web. we just have to specify which port to connect and which ip and each TCP UDP packet has a unique 16 bit port number. The OS manages ports and only one process may listen on a given port. This is also how the different tabs on same browser get the correct reponse even though the whole ip is same on one pc.

Sockets are also viewed as a software abstraction of a network connection.

Application laeyer protocol#

Peer-to-peer architecture#

Napster technically built p2p

HTTP#

HTTP is the foundation of data communication on the World Wide Web, enabling the transfer of hypertext documents between clients and servers. It’s an application layer protocol that forms the basis for web browsing, APIs, and many other internet services.

HTTP transaction/flow steps#

URL parsing
DNS Resolution
TCP Connection establishment
HTTP Request Construction
Request Example
Server Processing
HTTP Response Construction
Response processing by browser
Content Handling by the browser
Connection is closed or kept alive depending upon the use-case

SMTP#

It is Simple mail transfer protocol and uses port 25 to send and recieve emails.Below is an example smtp connection and conversation

sender’s side

$ telnet mail.example.com 25
Trying 192.168.1.100...
Connected to mail.example.com.
Escape character is '^]'.
220 mail.example.com ESMTP Postfix

EHLO client.example.org
250-mail.example.com
250-PIPELINING
250-SIZE 10240000
250-VRFY
250-ETRN
250-STARTTLS
250-AUTH PLAIN LOGIN
250-AUTH=PLAIN LOGIN
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN

MAIL FROM:<alice@example.org>
250 2.1.0 Ok

RCPT TO:<bob@example.com>
250 2.1.5 Ok

DATA
354 End data with <CR><LF>.<CR><LF>
From: "Alice Smith" <alice@example.org>
To: "Bob Jones" <bob@example.com>
Subject: Project Status Report
Date: Wed, 2 Oct 2025 14:30:22 -0400

Hi Bob,

I've completed the network configuration task we discussed yesterday. 
All the routing tables are updated, and the new firewall rules are in place.

Could you please review the changes and confirm if everything looks good?

I've also attached the documentation in our shared folder.

Best regards,
Alice
.
250 2.0.0 Ok: queued as 8F3E92A5B3

QUIT
221 2.0.0 Bye
Connection closed by foreign host.

plaintext

$ telnet mail.example.com 25
Trying 192.168.1.100...
Connected to mail.example.com.
Escape character is '^]'.
220 mail.example.com ESMTP Postfix

EHLO client.example.com
250-mail.example.com
250-PIPELINING
250-SIZE 10240000
250-VRFY
250-ETRN
250-STARTTLS
250-AUTH PLAIN LOGIN
250-AUTH=PLAIN LOGIN
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN

MAIL FROM:<bob@example.com>
250 2.1.0 Ok

RCPT TO:<alice@example.org>
250 2.1.5 Ok

DATA
354 End data with <CR><LF>.<CR><LF>
From: "Bob Jones" <bob@example.com>
To: "Alice Smith" <alice@example.org>
Subject: Re: Project Status Report
Date: Wed, 2 Oct 2025 16:15:45 -0400

Hello Alice,

Thanks for your quick work on this! I've reviewed the changes and everything 
looks perfect. The firewall rules are working as expected.

One small suggestion: could we add a rule to allow SSH access from the 
development subnet (10.20.30.0/24)?

Otherwise, great job on the documentation too.

Regards,
Bob
.
250 2.0.0 Ok: queued as 9G4F03B6C4

QUIT
221 2.0.0 Bye
Connection closed by foreign host.

plaintext

smtp is also stateful

DNS#

DNS maps host names to IP addresses. Nearby IP addresses are usually physically closer togeater in the network and left-most bits are for usaually the global location and right-most are for local.

Dns hirearchy#

There are boadly 3 level to the DNS server hirearchy.
1. Root DNS servers :- There are 13 root DNS servers maintained by the ICANN around the world. These are the servers where IP addresses of the all top-level-domains are hosted.
2. Top Leve Domain (TLD) servers :- These store the IP addresess of the authorative name servers for that domain.
3. Authorative Name Servers :- Thse server store the final ip address for the resolution of the request.

DNS Records Types#

A records - map host names to ip addresses.
AAAA record - same as A record but for IPv6.
NS record - list the nameservers for a given domain.
MX record - list the email servers.
CNAME record - list the aliases.
SOA record - store the information about who created these DNS record and how they should be cached.
SRV record - list server for any generic service.
TXT record - these are generic key-value storage.
SPF record - list the outbound mail servers for the domain.
PTR record - store reverse DNS records, IP address -> hostname.

Round Robin DNS#

Round Robin DNS is a load-balancing technique that distributes network traffic across multiple servers by rotating through a list of IP addresses in response to DNS queries.

How Round Robin DNS Works

Multiple A Records: The DNS administrator configures multiple A records (or AAAA for IPv6) with the same hostname but different IP addresses.
Rotation Mechanism: When a client requests the IP address for the hostname, the DNS server returns all IP addresses associated with that hostname but rotates the order with each response.
Client Connection: The client typically attempts to connect to the first IP address in the list it receives.

Reiable Transport System Across the Internet#

UDP#

UDP is a connectionless, lightweight transport layer protocol that provides a minimal set of services for application-level communication. Unlike TCP, UDP focuses on simplicity and speed rather than reliability and order.

UDP Packet structure

  0      7 8     15 16    23 24    31  
+--------+--------+--------+--------+
|     Source      |   Destination   |
|      Port       |      Port       |
+--------+--------+--------+--------+
|     Length      |    Checksum     |
+--------+--------+--------+--------+
|                                   |
|              Payload              |
|                                   |
+-----------------------------------+

plaintext

Source Port (16 bits): Optional; identifies the sending application
Destination Port (16 bits): Identifies the receiving application
Length (16 bits): Length of UDP header and data in bytes
Checksum (16 bits): Error detection for header and data
Payload: The actual application data

TCP#

TCP is usually implemented by the os. It handles the following:-

Ordering - Data must be chunke by sender and reassembled at the receiver end
Acknowladgement - Delivery of each packet is acknowladged Pacing - Sender adjusts the packet send rate so that neither reciever nor sender is over loaded

Naive ACK corrctness#

Do not send n util n-1 is received.
TImeout is used to decide when a packet needs to be resend.
It’s called stop and wait method also.
An acknowladge is always sent by the reciver

Pipelining#

Mnay packets are sent in parallel.
Many acknowladges are also sent as pckets are recieved.
All the packets are needs to be kept as any of them can be lost.
window size is a mechanism to control pace of network.
We must handle out of order delivery
ACK carry a acknowladgement number so sender can know it is for which packet

HOW IS PIPELING IMPLEMENTED#

GO BACK N#

demo ↗

Selective repeat#

demo ↗
max packet number size > half the window size there is also a problem that if we send to much data over some time in the network (like for example streaming a large movie from the netflix) then there can be case that the sequence number will be reset and for newer packets and sequence number is acknowladged for the okder packets which were already acknowladeged.

TCP#

TCP requires setting up a connection between two clients before start communication
In this we choose a set of random initial numbers to start with so that the new connections are not confused with the old packets and connections.
This way attacker will also have a difficulty in injecting the false packets.

TCP Packet structure#

This is a lot more complex protocol and has lot more headers.

tcp packet structure

TCP sequence numbers and acknowladgement numbers#

sequence numbers :- These incdicate the offset of the byte stream of the segment’s first byte.
Cummulative acknowadgements :- these send the next expected sequence numebres (like go back N). above is a sample exchange of request and response of server and client after handshake
- TImeouts are an important parameter here in tcp. TCP keeps only one timer for the oldetst sgment that is unacknowladeged.
- What should be the ideal value of the timer? because if the timer time is to short you send unwanted packets and if long the response will be slow. hence TCP mesures the round trip time and sets the time or more specifically adapts the timer according to that.
- How is the above mentioned time calculated for the setting the timer? the general formula to estimate that is
  NewEstimatedRTT = (1-ɑ)*PreviousEstimatedRTT + ɑ*LatestRTT .
  Here ɑ(alpha) is just a constant and is choosed as random but typically a small value
- Why have influnce of previousRTT in the formula? this is because if there is a sudden jump in the RTT but only once the value isn’t much deviated but if there are increae in RTT for a significant amount of time then the value will be adjusted slowly.
- After this we set the timeout time to be slightly greter than the NewEstimatedRTT. Why? Because the actual time of response maybe still greter than this calclualted average value.
- Now how much above to set it? this is where comes new thing called jitter. Now what is jitter ? jitter refers to the average variance in RTT.
```
DeviationRTT = (1-β)*PreviousDeviationRTT + β*(LatestRTT-NewEstimatedRTTFormPreviousFormula)  
```
plaintext
Typically β is taken around 0.125.
- Final TimeoutIntervalFormula
```
TimeoutInterval = NewEstimatedRTT + 4*DeviationRTT  
```
plaintext
- Intitially some value is taken like 1 second when we do not have any value form network.
- This allows us to adapt to changes in connection in realtime for over a very long open connection like watching a movie.
- TCP specification generally receommends that the the receiver waits before sending an acknwladgement as generally the applications send data hence the acknowladgement will be sent along with it automatically. This helps to reduce the load on the trafic. This is called delayed acknowladgement.
Different Events cases sumamry#

Event at Receiver	TCP Action Taken	Example Scenario
Arrival of in-order segment with expected seq #. All data up to expected seq # already ACK’ed.	Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK.	A web server sends a 1200-byte HTML file to a browser. The browser receives the first 600 bytes (Seq=79, Len=600). Since the browser has nothing to send back immediately, it starts a timer instead of sending an ACK right away.
Arrival of in-order segment with expected seq #. One other segment has ACK pending.	Immediately send a single cumulative ACK, ACK’ing both in-order segments.	Before the browser’s delay timer expires, it receives the second 600 bytes of the file (Seq=679, Len=600). The browser now immediately sends a single ACK=1279, acknowledging the receipt of all 1200 bytes.
Arrival of out-of-order segment (with higher-than-expect seq #). In other words, a gap was detected.	Immediately send duplicate ACK, indicating seq. # of next expected byte.	A server sends segment A (Seq=1000) and then segment B (Seq=2000). Due to network conditions, the browser receives segment B first. The browser, expecting Seq=1000, immediately sends a duplicate ACK=1000 to signal that it’s still waiting for that specific segment.
Arrival of segment that partially or completely fills gap.	Immediately send ACK if segment starts at beginning of gap.	The browser, having received segment B (Seq=2000) and sent a duplicate ACK=1000, now finally receives the missing segment A (Seq=1000). Because this segment fills the gap, the browser immediately sends a new cumulative ACK=3000.

TCP fast retransmit#

If ever there is a scenario that there are 3 or more than 3 dubplicate acknwladgements are recieved for a lost packet case which is case 3rd in above table then the sender immediately feels that something is wrong and dosen’t wait for the timer to expire but immediately sends the lost packet. More about this here ↗.

TCP flow and congestion control#

Peak throughput is proportional to the window size and inversly to the round trip time.
Controlling sender’s window size to prevent packet loss.
FLow Control -> prevent receiver’s receive buffer overflow. The simple TCP header called receive window is used to keep track of how many new bytes of data can the receiver get.
congestion control -> prevent overflow of router’s packet queues and network.Tracking this is a lot more complex process. more on this later.

TCP connection setup - three way handshake#

3 way handshake

SYN - Initiatior sends its parameters.
SYN-ACK - Listner sends its acknowladgements including its own parameters. +1 is done
ACK - Initiatior acknowladages or possibly share some data also if needed. +1 is done

Congestion control in TCP#

why is congestion control difficult?
When network is congested as router queues are full packets can be dropped, or long queues can also lead to expired timers. These lead to inefficency.
TCP does end to end congestion control.
Sender tries to estimate a congestion window(cwnd) which is number of packets that can be safely sent without causing congestion.
number of in-flight packets <= MIN(congestion window,recieve window)

TCP “RENO”#

it has three phases
- Slow start - initially no idea about the capacity and every RTT the amount of data sent is doubled until we have reached a congestion.
- Congestion Avoidance - we drop down to half the maximum and and then linearly the sending speed is incresed. this is again done until we face a congestion.
- Fast recovery - getting acknowladgements then continue increasing it until we stop recieving acknowladgements then repeat form congestion avoidancec phase.
Problems with TCP reno
- Even though the increase is exponential for very high throughput connection like optical fiber cables witch 10Gbs speeds even starting form something like 1500 bits/sec will take about 3.5 hrs of time and trillions of packets being already being sent at very low speeds. To overcome this we use higer exponential increase times such as cubinc increase.
- When RENO reaches the full capacity the queue start to fill up before dropping down and in some cases even resulting to full queues. This is not good as this results in what we are trying to control congestion along with increase in RTT.
- Increased Latency: Full queues lead to higher Round-Trip Times (RTTs) before congestion is detected.

Vegas congestion control#

Developed in the 1990s at the University of Arizona, Vegas was the first major delay-based congestion control algorithm.These algorithms help us to counter full queues problem that the RENO had previously.

Delay-Based Detection: Vegas monitors RTT variations rather than waiting for packet drops.
Calculates expected throughput: cwnd / BaseRTT (where BaseRTT is the minimum observed RTT)
Early Congestion Detection: By detecting increasing RTTs before packet loss occurs, Vegas can adjust sending rate proactively.
advantages

Stable Throughput: Maintains consistent sending rates with fewer oscillations
Lower Queuing Delay: Prevents large queue buildup at routers
Higher Network Utilization: Can achieve better throughput with lower packet loss
Reduced Retransmissions: Proactive nature results in fewer lost packets

Disadvantages

Unfairness with RENO: When competing with loss-based algorithms, Vegas backs off too early

BBR (Bottleneck Bandwidth and Round-trip propagation time)#

Developed by Google in 2016, BBR represents a more sophisticated model-based approach to congestion control.

Core Mechanism
- Explicit Modeling: BBR builds an explicit model of the network by estimating two key parameters:
- Bottleneck Bandwidth (BtlBw): Maximum data delivery rate
- Round-trip propagation time (RTprop): Minimum RTT when queues are empty
- Operating Point: BBR aims to operate at the optimal point where:
Sending rate equals the bottleneck bandwidth Amount of in-flight data equals BDP (Bandwidth-Delay Product = BtlBw × RTprop) State Machine: BBR cycles through different states:

Startup: Exponential growth to quickly discover bandwidth (similar to RENO’s slow start) Drain: Reduces in-flight data to drain queues ProbeBW: Periodic bandwidth probing to detect changes ProbeRTT: Periodically reduces in-flight data to measure RTprop accurately Pacing: BBR uses packet pacing to evenly space packet transmissions rather than sending them in bursts.
BBR Advantages
- High Utilization with Low Delay: Achieves near-optimal throughput without filling queues
- Resilience to Random Loss: Performance doesn’t degrade with non-congestion packet loss
- Better Performance on High BDP Networks: Excels on satellite, cellular, and high-speed networks
- Fairness Across Different RTTs: Less biased against long-RTT connections than RENO
BBR Limitations
- Implementation Complexity: More complex to implement than loss-based algorithms
- Measurement Precision: Requires accurate timing mechanisms
- Buffer Issues: Can still cause buffer overflow with extremely shallow buffers
- Fairness Concerns: Early versions had fairness issues when competing with other algorithms

Nagle’s algorithm#

Each TCP packet has almost 40 bytes of header overhead.
This is a special algorithm that combines all those common headers for the shorter message if the message to be sent is not the first message.
This algorithm is a bit slow as it instructts that the messages shotld be heald for a small period before sendig them.

SOCKET DIFFERENT CONFIGURATIONS#

TCP_NODELAY => This option disables Nagel’s algorithm. Every segment written is sent immediately if space is there in window size.
TCP_NOPUSH => It is a more aggressive version of the Nagle algorithm. It will wait until send buffer is full before sending the segment.
PSH bit => this bit can be set by sender to indiacte the receiving device that the data when ready to be immediately sent insead of buffring.

TCP keepalive header#

if there is no data to be sent a empty data segment is sent called keepalive in frequency of around 1 time/min so that the connection is not terminated.
it also forces the NAT routers to keep their port-mappings alive for that ip.

IP/Forwarding#

what do routers do ? how do routers know where to send which packet? how do we navigate the internet??… everything answeerd.

This is layer third of the several layers the IP packet layer.

IP(Internet protocol) delivers packets (datagrams). MTU (Maximum transmission Unit) limits the datagram size.

Routing is the proecess of guiding packets to their destination. It operates in 2 parts.
- Control Plane
- Data Plane

Each router is unknown of the end to end behaviour of the network but only knows to how to forward the packets to its neighbourhood.

IPv4 addressing#

Interface is defined as every device connected to network and havingg an ip address.
the ipv4 addressess consist of 4 numbrs form 0-255 seprated with a period(.) and is 32 bits long in version 4.
there is an ip address for each interface that is communicating to network on the host.

MAC address (Media Access Control)#

these are used during local communication.
These are hard coded form the factory and cannot be changed.
48-bit Identifier: Typically represented as six groups of two hexadecimal digits (e.g., 00:1A:2B:3C:4D:5E)
Manufacturer Assigned: The first 24 bits (OUI - Organizationally Unique Identifier) identify the manufacturer
Physical Layer Addressing: Used for device-to-device communication within the same network segment
Flat Addressing: Unlike hierarchical IP addresses, MAC addresses have no inherent routing structure
Types of MAC Addresses
Unicast: Identifies a specific network interface (most common)
Multicast: Identifies a group of devices (starts with 01:00:5E for IPv4 multicast)
Broadcast: Sent to all devices on a network segment (FF:FF:FF:FF:FF:FF)
Bit 7 of the first byte: 0 for unicast, 1 for multicast
Bit 8 of the first byte: 0 for globally unique, 1 for locally administered

IP subnets#

Subnets are chunks of IP addresses in the network that can communicate directly without travelling through the router.
Routers can send packets betweeent the subnets.
Subnet mask indicates the bit position where subnet is split. the x subnet mask means the number of bits that are common and after that the 32-x bits are for the differnt machine.
To understand why 255.255.255.240 is a subnet mask:
- Converting to binary: 11111111.11111111.11111111.11110000. The 1 bits represent the network portion, The 0 bits represent the host portion.
- CIDR Notation: This subnet mask has 28 consecutive 1 bits, so it’s a /28 in CIDR notation.
- With 4 host bits (the zeros), this subnet provides 2⁴ = 16 addresses.
- Usable addresses: 14 (excluding network address and broadcast address).
- If applied to network 192.168.1.16:
  - Network address: 192.168.1.16
  - Broadcast address: 192.168.1.31
  - Usable host addresses: 192.168.1.17 through 192.168.1.30
- When a device wants to send data, it performs a bitwise AND operation between the destination IP and subnet mask. If the result matches its own network, communication occurs directly.

Address Resolution Protocol (ARP)#

Bridges the gap between Layer 3 (IP) and Layer 2 (MAC)
Maps IP addresses to MAC addresses on a local network
Devices maintain an ARP cache of recently mapped addresses
Example: “Who has IP 192.168.1.5? Tell 192.168.1.1” → “192.168.1.5 is at 00:1A:2B:3C:4D:5E”
CIDR notation (123.100.16.0/28) specifies a range of addresses
Used both for specifying subnets and for routing rules.
/28 or 255.255.255.240 is called a subnet mask.

NAT (Network address translation)#

IPv4 addresses are in short supply (4 billion). ISP often will give you just one address, so how to connect multiple devices?

also called layer 4 load balancer.

below is a sample NAT translation

[Client]                    [NAT Router]                  [Internet Server]
192.168.1.10:4321           Public IP: 203.0.113.5        74.125.193.100:80
     |                              |                            |
     |  Request                     |                            |
     |----------------------------->|                            |
     | Src: 192.168.1.10:4321       | Src: 203.0.113.5:12345     |
     | Dst: 74.125.193.100:80       | Dst: 74.125.193.100:80     |
     |                              |--------------------------->|
     |                              |                            |
     |                              |        Response            |
     |                              |<---------------------------|
     |        Response              | Src: 74.125.193.100:80     |
     |<-----------------------------|  Dst: 203.0.113.5:12345    |
     | Src: 74.125.193.100:80       |                            |
     | Dst: 192.168.1.10:4321       |                            |

plaintext

DHCP (Dynamic Host Configuration Protocol)#

DHCP is a network management protocol that automatically assigns IP addresses and other network configuration parameters to devices on a network.

The steps for dhcp are:-

DHCP Discovery (DHCPDISCOVER) :- Client broadcasts a request for network configuration. Message uses source IP 0.0.0.0 and destination 255.255.255.255 (broadcast). Contains client’s MAC address as identifier
DHCP Offer (DHCPOFFER) :- DHCP server responds with available IP address and configuration. Message includes lease duration and server identifier. Multiple servers may respond with different offers.
DHCP Request (DHCPREQUEST) :- Client broadcasts which offer it accepts.Broadcasting allows other DHCP servers to withdraw their offers. Includes the server identifier of the chosen server.
DHCP Acknowledgment (DHCPACK) :- Selected server confirms configuration parameters.Officially assigns IP address to client. Provides all network parameters. Sets lease duration.

below is a dhcp packet structure

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     op (1)    |   htype (1)   |   hlen (1)    |   hops (1)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            xid (4)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           secs (2)            |           flags (2)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          ciaddr (4)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          yiaddr (4)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          siaddr (4)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          giaddr (4)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          chaddr (16)                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          sname (64)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          file (128)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          options (variable)                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

plaintext

DHCP Options

Option 1: Subnet Mask
Option 3: Router (Default Gateway)
Option 6: DNS Servers
Option 15: Domain Name
Option 51: IP Address Lease Time
Option 53: DHCP Message Type
Option 54: DHCP Server Identifier
Option 58: Renewal Time Value (T1)
Option 59: Rebinding Time Value (T2)
Option 82: Relay Agent Information (used for DHCP snooping)

DHCP Implementation

server side :-
Microsoft DHCP Server: Windows Server
ISC DHCP: Common on Linux/Unix
Dnsmasq: Lightweight option for small networks
Router DHCP: Built into most home/SMB routers

Routing inside routers and algotithms#

Routers use forwarding tables to direct IP packets to the next hop

below is an example

Destination	Subnet Mask	Next Hop	Interface	Metric
0.0.0.0	0.0.0.0	203.0.113.1	eth0	1 (Default route)
192.168.1.0	255.255.255.0	Directly Connected	eth1	0 (Local subnet)
192.168.2.0	255.255.255.0	Directly Connected	eth2	0 (Local subnet)
10.10.0.0	255.255.0.0	192.168.1.20	eth1	10 (Internal network)
172.16.50.0	255.255.255.0	192.168.2.100	eth2	5 (Branch office)
203.0.113.0	255.255.255.0	Directly Connected	eth0	0 (ISP network)
8.8.8.0	255.255.255.0	203.0.113.1	eth0	15 (External specific route)

How routers use this table

When a packet arrives, the router extracts the destination IP address
The router compares this address with each entry in the forwarding table
It performs a “longest prefix match” - finds the entry with the most specific (longest) matching network prefix.
The packet is then forwarded to the next-hop IP or directly to the interface specified in the matching entry.
If no match is found, the default route (0.0.0.0/0) is used

Mordern routers architecture#

INPUT Ports#

Each input port has a copy of the routing table decidied by the control plane using algorithms such as BGP. Using these table the input ports figure out where to send the data to. below is a simplified structure of these.

Switching Fabric#

This is the connection between input and output ports.
Switching Rate -> The theoritical maximum rate of data transfer from all input to output ports. Ideally it should be very fast and near about (number of inputs * average input line rate)
There are two types of switching fabric :-
- Bus -> This is the simplest design and can only be used by one input port at a time. This is genrally used in commercial routers. A longer queue is often maintained.
- Crossbar -> This is a much complex design and can allow multiple inputs to switch at a time by switching connections appropriately. This requires m*n points or switches for m input and n output ports. It is used in the core server routers.

OUTPUT Ports#

The data is sent out from this to required destination.
Buffering is required because switch fabric may be faster than physical output link and hence packets gets buffered.
A queued packet can be scheduled if desired on basis of priority.
- Give higher priority to certain types of packets based on origin/ip/destination etc.
- Net Neutrality is the policy debate about weather the ISPs can do this or not.

Weighted Fair Queuing (WFQ)#

Traffic is divided into separate flows (based on source/destination IP, ports, protocol, etc.)
Flows receive different weights based on priority or requirements
Bandwidth is allocated proportionally to weights
WFQ operates on a theoretical model called Generalized Processor Sharing (GPS)
WFQ simulates a system that could serve one bit from each flow in round-robin fashion (though real packets are indivisible)
Each packet is assigned a “virtual finish time” - when it would complete service in the theoretical bit-by-bit system.
The packet with the smallest virtual finish time is selected for transmission.
Some varients include :- Class-Based WFQ (CBWFQ),Low Latency Queuing (LLQ),Deficit WFQ (DWFQ).

Centralized vs Distributed algorithms#

Centralized/global routing
- Algorithm has full knowledge of the entire network.
- Makes decisions that affect all routers
- In routing, we call these link state algorithms.
- Used within an organization (autonomous system) (eg., OSPF)
- Dijkstra’s Algorithm is a fast centralized algorithm where routers initially flood/broadcast local link information to entire network and them each router solves the shortest path from itself to other routers.
distributed/local routing
- Each router must decide its own routing table using local observations.
- Operates iteratively.
- Routers continually share information with neighbors
- Global information is gradually propagated across the network.
- Used within and between autonomous systems (eg., RIP & BGP, respectively)
- Distance Vector (DV) algorithm is a distributed shortest path algorithm where initially, routers only knows distance to neighbors – broadcast to neighbors and when receive a neighbor’s DV, update own DV, & broadcast if DV changed.
Interiror gateway protocols (Intra AS-routing)
- Centralized IGP
  - Open shortest path first (OSPF)
  - Intermediate system to Intermediate System (IS-IS)
- Distributed IGP
  - Routing information Protocol
  - Enhanced interior gateway routing protocol (EIGRP)
Disatnce Vector Algorithm

Below is a sample table

Destination	Metric	Next Hop	Interface	Timer
192.168.1.0/24	0	Direct	eth0	-
192.168.2.0/24	0	Direct	eth1	-
10.0.0.0/16	1	192.168.2.254	eth1	120s
172.16.0.0/16	2	192.168.2.254	eth1	120s
172.17.0.0/16	3	192.168.1.100	eth0	90s
192.168.5.0/24	1	192.168.1.100	eth0	120s
192.168.6.0/24	2	192.168.1.100	eth0	120s
203.0.113.0/24	1	192.168.2.254	eth1	120s
8.8.8.0/24	3	192.168.2.254	eth1	60s
0.0.0.0/0	1	192.168.2.254	eth1	120s

Destination: Network address with subnet mask
Metric: Distance to the destination (typically hop count in basic DV algorithms)
Next Hop: IP address of the next router to forward packets to
Interface: Local interface to use for forwarding
Timer: Expiration timer for the route (used to detect stale routes)
Directly connected networks (metric 0) are 192.168.1.0/24 and 192.168.2.0/24
Networks with metric 1 are one hop away through adjacent routers
Networks with metrics 2 and 3 are farther away, reached through intermediate routers
The default route (0.0.0.0/0) sends any unmatched traffic to 192.168.2.254
Each network node has its own Distance vector listed as follows <cost,next_hop> to the next destination.
When a distance vector changes at one node after re-calculation it notifies the other nodes.
Re-calculation is done when the local link cost changes.
If a link cost is decreased it is a good news and propogates quickly
The entire table like above is sent to neighbours and received from them.

Count to infinity problem#

The “Count to Infinity” problem is a fundamental issue in distance vector routing protocols like RIP (Routing Information Protocol) where routers can get stuck in a routing loop, continuously incrementing route metrics toward an unreachable destination.

Router A ---- Router B ---- Router C ---- Network X

Initial State
- Router C directly connects to Network X (cost 0)
- Router B can reach X via C (cost 1)
- Router A can reach X via B (cost 2)
When Network X fails
- Router C detects the failure and marks X as unreachable
- Before C informs B, B sends its regular update to A saying X is reachable (cost 1)
- A updates its routing table: X is reachable via B (cost 2)
- B receives C’s update that X is unreachable
- B checks its alternatives and sees A advertising a path to X (cost 2)
- B updates: X is reachable via A (cost 3)
- A receives B’s update and updates: X is reachable via B (cost 4) This cycle continues with metrics incrementing until reaching “infinity”

Solution to count to infinity#

A few solution to above problem are discussed below

Split Horizon with Poison reverse
- Principle: Actively advertise routes learned from a neighbor back to that neighbor, but with an infinite metric
- Example: B tells A that X is unreachable through B (infinity)

Autonomous Systems#

In reality not all routers are equal due to scaling and administrative purposes. The internet is divided into 100k autonoumous systems having AS numbers given by ICANN.Routers in an AS run one interior-gateway (intra-AS) routing protocol. Border Gateway Protocol (BGP) is used by all border gateways to route traffic between AS’s.

OSPF (Open Shortest Path First)#

OSPF is a widely used link-state routing protocol designed for IP networks. As an Interior Gateway Protocol (IGP), it operates within a single Autonomous System (AS) and is one of the most popular protocols for large enterprise networks.

While distance vector protocols (like RIP) share entire routing tables with neighbors, OSPF shares information about the state of directly connected links
Each OSPF router maintains an identical database that represents the entire network topology
Each router independently computes optimal routes using Dijkstra’s Shortest Path First (SPF) algorithm
Messages are authenticated to prevent malicious tampering.
Multiple same-cost paths are allowed.
Links can have different costs for different types of traffic.
Hierarchical OSPF can be used in large AS’s.
Generally, an AS can be divided into multiple private AS’s internally, but look like one AS to outside Internet.

                      | boundary router
                      ▼
          ┌───────────────────────┐
          │                       │
    ┌─────┴─────┐           ┌─────┴─────┐
    │           │           │           │
    ▼           ▼           ▼           ▼
┌─────────┐     │      ┌─────────┐      │
│backbone │◄────┴─────►│backbone │◄─────┘
│ router  │            │ router  │
└────┬────┘            └────┬────┘
     │                      │
     │        backbone      │
     │        (Area 0)      │
     │                      │
┌────▼────┐            ┌────▼────┐
│  area   │            │  area   │
│ border  │            │ border  │
│ router  │            │ router  │
└────┬────┘            └────┬────┘
     │                      │
     ▼                      ▼
  ┌─────┐                ┌─────┐
  │     │                │     │
┌─▼─┐ ┌─▼─┐            ┌─▼─┐ ┌─▼─┐
│   │ │   │            │   │ │   │
▼   ▼ ▼   ▼            ▼   ▼ ▼   ▼
┌───┐ ┌───┐            ┌───┐ ┌───┐
│   │ │   │            │   │ │   │
└───┘ └───┘            └───┘ └───┘
internal routers      internal routers
   area 1                area 2       area 3

plaintext

Border gateway protocol (Inter-AS routing)#

BGP is the Internet-standard routing protocol for connecting AS
eBGP: allows neighboring AS’s to share their subnet reachability information: “I can reach 3.2.4.0/24 in 4 hops”
iBGP: propagates reachability information to all AS-internal routers.
below is a sample BGP advertisement

UPDATE MESSAGE
+----------------------------------------------------+
| Prefix          | 192.0.2.0/24                     |
| NEXT_HOP        | 203.0.113.1                      |
| AS_PATH         | 65001 64512 64513                |
| ORIGIN          | IGP                              |
| LOCAL_PREF      | 100                              |
| MED             | 10                               |
| COMMUNITY       | 65001:200, 65001:3000, no-export |
+----------------------------------------------------+

Info:-

ORIGIN: IGP
Indicates how the route was originally learned.

LOCAL_PREF: 100
Used within an AS to prefer one exit point over another
Higher values are preferred
Not passed to external BGP peers

MED: 10
Multi-Exit Discriminator - suggests to neighboring ASes which entry point to use
Lower values are preferred

COMMUNITY: 65001:200, 65001:3000, no-export
Tags attached to routes for implementing routing policies
"no-export" means this route should not be advertised outside the receiving AS

plaintext

BGP Session#

two BGP routers (“peers”) exchange BGP messages
advertise paths to different destination network prefixes (network prefixes are the nodes/endpoints in this shortest-path problem).
exchanged over semi-permanent TCP connections
AS can aggregate prefixes in its advertisement. (ex:- 20.1.0.0/24 + 20.1.1.0/2 + 20.1.2.0/24 + 20.1.3.0/24 = 20.1.0.0/22)
AS-PATH: AS numbers through which the advertisement has passed. Indicates the AS-path that will be followed to reach the prefix.
NEXT-HOP: IP address of the router beginning the AS-PATH.
Prefix: Act like final destination.
In above example using the 3 fields as-path,next-hop,prefix we can say that “You can send traffic to 192.0.2.0/24 through my router 203.0.113.1, and it will travel through three AS’s before arriving.”
BGP advertise the full AS_PATH instead of just the path length because it allows to prevent loops. Like example if I am AS0 and I receive a route with path [2, 4, 0, 5], I should not use it.

BGP Route Selection in practice#

AS policy determines local preference for various routes. (A hard-coded preference based on financial cost, agreements, etc.)
Among routes with the highest local preference, choose route with shortest AS-PATH. (DV algorithm.)
If multiple options remain, use hot-potato routing, that is, choose the route whose NEXT-HOP is closest. • This considers the within-AS distance using an IGP such as OSPF.
If multiple options still exist, use a random tie-breaker (eg., BGP router id)

Types of Autonomous Systems#

Stub connects to just one other AS
Transitconnects to multiple AS’s and routes between them. example is tier 1 ISP.
Multihomedconnects to multiple other AS’s, but do not route between.

Packet delivery#

Types of delivery#

Unicast :- one to one node
Anycast :- to one of many possible nodes
Broadcast :- to all the nodes
Multicast :- to set of nodes
Geocast :- to many or one node of a specific location

IPv4 anycast hack#

The single IP address 8.8.8.8 maps to may different machines across the world, using a hack called IPv4 anycast.

Broadcast#

Internet generally dosen’t support broadcast as no information is relevant enough for whole internet. Some private networks use broadcast. But how do we broadcast without putting excess load on the network.

Ans) We use a MST(Minimum spanning tree) which allows the message to be received by all the nodes. But this requires all nodes to have a complete knowladge of the network and every node must calculate the same MST.

Summary#

Key Protocols by Layer

Application Layer (Layer 7)
- HTTP/HTTPS: Web browsing, APIs
- SMTP: Email transmission
- DNS: Domain name resolution
- DHCP: Dynamic host configuration
- FTP: File transfer
Transport Layer (Layer 4)
- TCP: Connection-oriented, reliable delivery
- Three-way handshake
- Flow control (receive window)
- Congestion control (RENO, Vegas, BBR)
- Sequence and acknowledgment numbers
- UDP: Connectionless, minimal overhead
- No reliability or ordering guarantees
- Lower latency
Network Layer (Layer 3) IP: Logical addressing and routing
- ICMP: Error reporting and diagnostics
- OSPF: Link-state routing within AS
- BGP: Path-vector routing between AS
- NAT: Address translation for private networks
Data Link Layer (Layer 2)
- Ethernet: Local area networking
- ARP: Maps IP addresses to MAC addresses
- PPP: Point-to-point connections

+-------+------------+-----------------------------+--------------------------------+----------------+-----------------------------+
| Layer | Name       | Key Functions              | Protocols                      | Addressing     | Example Devices/Technologies |
+-------+------------+-----------------------------+--------------------------------+----------------+-----------------------------+
| 7     | Application| • End-user services         | • HTTP/HTTPS                   | URLs, Domain   | Web browsers, Email clients |
|       |            | • Data generation           | • SMTP                         | Names          |                             |
|       |            | • Service-specific functions| • DNS                          |                |                             |
|       |            |                             | • FTP                          |                |                             |
|       |            |                             | • SSH                          |                |                             |
|       |            |                             | • BitTorrent                   |                |                             |
+-------+------------+-----------------------------+--------------------------------+----------------+-----------------------------+
| 6     | Presentation| • Data format translation   | • SSL/TLS                      | N/A            | Encoding converters         |
|       |            | • Encryption/Decryption     | • MIME                         |                |                             |
|       |            | • Compression               | • JPEG, PNG                    |                |                             |
+-------+------------+-----------------------------+--------------------------------+----------------+-----------------------------+
| 5     | Session    | • Session establishment     | • NetBIOS                      | Session IDs    | APIs, Remote procedure calls|
|       |            | • Session management        | • RPC                          |                |                             |
|       |            | • Dialog control            |                                |                |                             |
+-------+------------+-----------------------------+--------------------------------+----------------+-----------------------------+
| 4     | Transport  | • End-to-end connections    | • TCP (Connection-oriented)    | Ports (0-65535)| Load balancers, Firewalls   |
|       |            | • Reliability               | • UDP (Connectionless)         |                |                             |
|       |            | • Flow control              | • QUIC                         |                |                             |
|       |            | • Congestion control        |                                |                |                             |
+-------+------------+-----------------------------+--------------------------------+----------------+-----------------------------+
| 3     | Network    | • Logical addressing        | • IP (IPv4, IPv6)              | IP addresses   | Routers                     |
|       |            | • Path determination        | • OSPF                         |                |                             |
|       |            | • Packet forwarding         | • BGP                          |                |                             |
|       |            | • Subnetting                | • RIP                          |                |                             |
|       |            |                             | • ICMP                         |                |                             |
|       |            |                             | • NAT                          |                |                             |
+-------+------------+-----------------------------+--------------------------------+----------------+-----------------------------+
| 2     | Data Link  | • Physical addressing       | • Ethernet                     | MAC addresses  | Switches, Network adapters  |
|       |            | • Media access control      | • ARP                          |                |                             |
|       |            | • Error detection           | • PPP                          |                |                             |
|       |            |                             | • VLAN                         |                |                             |
+-------+------------+-----------------------------+--------------------------------+----------------+-----------------------------+
| 1     | Physical   | • Raw bit transmission      | • Ethernet physical            | None           | Cables, Hubs, Repeaters     |
|       |            | • Physical medium specs     | • USB                          |                |                             |
|       |            | • Signal encoding           | • Bluetooth                    |                |                             |
|       |            |                             | • DSL                          |                |                             |
+-------+------------+-----------------------------+--------------------------------+----------------+-----------------------------+

plaintext

INTRODUCTION#

Packet switching#

souces of packet delay#

peering link and internet exchange point#

Packets and layers in the internet#

sockets#

Application laeyer protocol#

Peer-to-peer architecture#

HTTP#

HTTP transaction/flow steps#

SMTP#

DNS#

Dns hirearchy#

DNS Records Types#

Round Robin DNS#

Reiable Transport System Across the Internet#

UDP#

TCP#

Naive ACK corrctness#

Pipelining#

HOW IS PIPELING IMPLEMENTED#

GO BACK N#

Selective repeat#

TCP#

TCP Packet structure#

TCP sequence numbers and acknowladgement numbers#

Different Events cases sumamry#

TCP fast retransmit#

TCP flow and congestion control#

TCP connection setup - three way handshake#

Congestion control in TCP#

TCP “RENO”#

Vegas congestion control#

BBR (Bottleneck Bandwidth and Round-trip propagation time)#

Nagle’s algorithm#

SOCKET DIFFERENT CONFIGURATIONS#

TCP keepalive header#

IP/Forwarding#

IPv4 addressing#

MAC address (Media Access Control)#

IP subnets#

Address Resolution Protocol (ARP)#

NAT (Network address translation)#

DHCP (Dynamic Host Configuration Protocol)#

Routing inside routers and algotithms#

Mordern routers architecture#

INPUT Ports#

Switching Fabric#

OUTPUT Ports#

Weighted Fair Queuing (WFQ)#

Centralized vs Distributed algorithms#

Count to infinity problem#

Solution to count to infinity#

Autonomous Systems#

OSPF (Open Shortest Path First)#

Border gateway protocol (Inter-AS routing)#

BGP Session#

BGP Route Selection in practice#

Types of Autonomous Systems#

Packet delivery#

Types of delivery#

IPv4 anycast hack#

Broadcast#

Summary#