[ixpmanager] SFLOW Under Reporting?

André Grüneberg andre.grueneberg at bcix.de
Fri Jun 23 07:59:45 IST 2023


Hi Ian,

On Thu, 22 Jun 2023 at 17:26, Ian Chilton <ian at lonap.net> wrote:

> On 2023-06-22 14:32, André Grüneberg wrote:
>
> Are you by any chance using vlan translation or L2 sub-interfaces (with
> non default VLAN IDs) on your Arista gear?
>
>
> We have just started using L2 sub-interfaces in the last few months and it
> seems this has been a problem for longer than that.
>
> We're aware that traffic on sub-interfaces won't be counted, but that only
> accounts for a fraction of the discrepancy we are seeing.
>
> In addition, the member who was reporting inconsistency with their ports
> and their peers are not using sub-interfaces so that's not a factor there.
>

This does not matter. All traffic coming from others to this one member is
NOT measured on the member's port but on the others' ports. That is,
presuming you have sFlow enabled only on edge ports and you're generating
sFlow for inbound traffic (the usual setting on Arista).
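
To illustrate the point, here is a minimal sketch in Python (not the actual
collector code). It assumes ingress-only sampling on edge ports; the port and
member names are made up, and bytes_received() is a hypothetical helper. It
shows how a member's received traffic shrinks when samples from somebody
else's ingress port are rejected:

```python
from collections import defaultdict

SAMPLE_RATE = 16384  # sampling rate mentioned in this thread


def bytes_received(samples, dropped_ports=frozenset()):
    """Estimate bytes delivered to each member from ingress samples.

    samples: iterable of (ingress_port, dst_member, frame_bytes), where
    ingress_port is the edge port on which the frame ENTERED the fabric.
    dropped_ports: ingress ports whose samples the collector rejects
    (hypothetical, e.g. sub-interfaces with an unmapped VLAN ID).
    """
    totals = defaultdict(int)
    for ingress_port, dst_member, frame_bytes in samples:
        if ingress_port in dropped_ports:
            continue  # sample is lost, so dst_member is under-counted
        totals[dst_member] += frame_bytes * SAMPLE_RATE
    return dict(totals)


# Two peers send to member-x; one of them sits on a sub-interface.
samples = [
    ("peer1-port", "member-x", 1500),
    ("peer2-subif", "member-x", 1500),
]
full = bytes_received(samples)
partial = bytes_received(samples, dropped_ports={"peer2-subif"})
```

Here member-x uses no sub-interface at all, yet its estimate in `partial` is
lower than in `full`, because the missing samples belong to a peer's port.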


>
>
> In these cases the sFlow packets contain the VLAN ID on the wire and will
> not be matched into the right buckets.
> We "enhanced" the sflow collector script with some hack to map the VLAN
> ID. [I may go into details]
>
>
> As I say, a different problem, but one that's on my list to fix so would
> be interested in what you did here.
>
> Presumably it's just a case of extracting VLAN -> VLAN mappings of
> subinterfaces and substituting that in sflow data as it's processed?
>

Yes, it's introducing a mapping of the tuple (agent, interfaceid, vlanid) ->
peering VLAN ID ... so the rest of the script can digest the flow as
"peering traffic". :)

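Roughly, the idea looks like the following Python sketch. This is only an
illustration and not our actual patch to the collector script; the table
name, the peering_vlan() helper and all addresses, interface IDs and VLAN IDs
are made-up examples:

```python
# Hypothetical mapping table:
# (agent IP, interface ID, VLAN ID on the wire) -> peering VLAN ID.
# All values are invented for illustration.
SUBIF_VLAN_MAP = {
    ("192.0.2.1", 1001, 3001): 100,
    ("192.0.2.1", 1002, 3002): 100,
}


def peering_vlan(agent, interface_id, wire_vlan):
    """Return the VLAN ID the rest of the collector should see.

    Falls back to the VLAN ID seen on the wire when no mapping exists,
    so ports without translation or sub-interfaces behave as before.
    """
    return SUBIF_VLAN_MAP.get((agent, interface_id, wire_vlan), wire_vlan)
```

The lookup would run once per flow sample, before the VLAN ID is matched
against the known peering VLANs.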

>
>
> We believe that our results (https://www.bcix.de/ixp/statistics/vlan) are
> very close to reality.
>
> Interesting! - so right now you're doing 507Gbps according to MRTG and
> showing 349G (v4) + 72G (v6) = 421G with sflow.
>
Well, there are some "heavy" PVLANs that can easily account for ~50G at
that time. And the remainder is within our acceptable error margin of 10%.
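
Spelled out as arithmetic, using the figures quoted above and taking ~50G for
the PVLANs as a rough assumption (the variable names are just for this
sketch):

```python
mrtg_gbps = 507          # port counters (MRTG), as quoted above
sflow_gbps = 349 + 72    # sFlow-derived v4 + v6, as quoted above
pvlan_gbps = 50          # rough assumption for the "heavy" PVLANs

gap_gbps = mrtg_gbps - sflow_gbps            # total discrepancy
unexplained_gbps = gap_gbps - pvlan_gbps     # what the PVLANs do not explain
unexplained_share = unexplained_gbps / mrtg_gbps

print(f"gap: {gap_gbps}G, unexplained: {unexplained_gbps}G "
      f"({unexplained_share:.1%} of MRTG)")
```

With these numbers the unexplained part stays below the 10% margin.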


> Are you using Arista too? - what sample rate?
>
Yes, we are mostly running a sampling rate of 16384 -- same as yours.


> I have just found another smoking gun - when running sflow-to-rrd-handler
> with debug mode, I see a lot of dropped/rejected flows. Some (most?) of
> these seem to be sub-interfaces, but it turns out that some MACs are not in
> the discovered macs table, so I need to investigate that further, but now
> we are using MAC ACLs, we'd probably be better switching to configured macs.
>

Indeed, using learned MACs was one of our major issues in the beginning.
This always required getting Port-Channel names right (we had some severe
issues with that on Dell, back in the day). When we moved to Arista, we
immediately switched to configured MACs, also working towards config
automation (generation of MAC ACLs).

While using the learned MAC table, we always had to "amend" the DB table
with some manual additions.

The migration towards configured MACs is rather simple ... so go for it!

André

-- 
André Grüneberg, Managing Director
andre.grueneberg at bcix.de
+49 30 2332195 42

BCIX Management GmbH
Albrechtstr. 110
12103 Berlin
Germany

Geschäftsführer/Managing Directors: Jens Lietzmann, André Grüneberg
Handelsregister: Amtsgericht Charlottenburg, HRB 143581 B