Hey team,
OVMS installed on a 2016 Leaf (24 kWh), working flawlessly for over a year.
The 12v battery died a few weeks back, while parked at the airport. Since then I've had a bunch of issues. Power management was turned on with sleeps on a low voltage warning.
Firstly the issues:
- GSM module and modem working fine
- No GPS satellites locked, 0.00, 0.00 location as a result
- wifi also working fine though I have experienced wifi ssids being dropped from config after a crash reboot.
- web server auth drops every 30 seconds or so. E.g. I log in successfully and then check status or logs, and when trying something else the web server asks me to log in again. Very annoying when going through initial setup.
- seems like there are excessive crash reboots, though I have been unable to locate any info in the logs that might relate. The counts have been high.
- lastly, after a random amount of time, the ovms will shut down and become unresponsive until a full power recycle occurs. I thought this might be because of the GPS loc issue but that's just a guess. No power management settings have been set since replacing the 12v batt so that I can ensure my setup is solid first, then will turn it on again
- haven't checked the gps antenna itself yet though, I know they can be quite sensitive so it's a possibility it got nuked with the 12v dying.
Now the things I have done since replacing 12v batt:
- reflashed the module with the latest main release
- restored from backup config (same issues)
- reflashed and recreated my config manually, twice.
- disconnected and reconnected the GSM and GPS antenna connections external to the module
- reseated the pig tail connectors on the PCB GSM module for both ant1 and ant2
- confirmed antenna is receiving 4v on the GPS connection with multimeter, or module is sending 4v down the connector.
- logged a few days' worth of verbose logs but nothing unusual (other than no GPS loc)
Now, has anyone else been through a 12v batt dying on them or had any of the same issues above?
I really don't want to have to order a new module as it takes months to get to the southern hemisphere, so I'd like to get the kit working as it was where possible. Obvs if it isn't possible then I suppose a new module will have to be the answer.
Cheers for any assistance on debugs or solutions.
Very strange.
Some other users have reported GPS issues over time, but AFAIK not related to 12V battery depletion. Possibly a coincidence? I'd try replacing the GPS antenna.
The web auth drops are most probably due to the frequent crashes. The only other reason I can think of would be the system date/time glitching, e.g. if GPS time is intermittently available and way off standard/network time. I think that would show in the logs though (?)
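To illustrate the time-glitch idea: a minimal sketch of how a session check based on the wall clock can fail when the clock jumps. This is not the OVMS web server's code; `SESSION_TTL` and `session_valid` are hypothetical names, and the one-hour lifetime is an assumption chosen only for the example.

```python
SESSION_TTL = 3600  # hypothetical session lifetime in seconds (assumption)

def session_valid(issued_at: float, now: float, ttl: float = SESSION_TTL) -> bool:
    """Session counts as valid while 0 <= (now - issued_at) < ttl."""
    return 0 <= now - issued_at < ttl

issued = 1_700_000_000.0  # arbitrary wall-clock timestamp at login

# Normal case: 30 seconds after login, clock is steady.
print(session_valid(issued, issued + 30))

# Clock jumps a day forward (e.g. a bad time source): session looks expired.
print(session_valid(issued, issued + 30 + 86_400))

# Clock jumps a day backward: "now" is before the login time.
print(session_valid(issued, issued - 86_400))
```

If something like this were the cause, the log timestamps should show the jumps, which is an easy thing to look for.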
If changing the GPS antenna doesn't work, maybe the modem needs a factory reset? That would be command AT&F, and you might also check the AT+CFUN? result (should be 1) -- maybe the modem has somehow switched into factory test mode from the low voltage situation.
That doesn't explain the crashes though. You can't see crash log entries in the SD logs, only via USB.
Maybe a full reflash (including bootloader & partitioning) can help?
Regards,
Michael
Ahhh a modem reset sounds like a good call. Thankyou!
I've certainly experienced other Linux-based systems having amazingly weird issues in the past that were caused by NTP or GPS time jumping about, so this is a very good avenue to check, even for the crashes. I did also wonder whether the bootloader and partition memory address mapping may have been corrupted by the low voltage, but that's just a guess on my part.
I'll check these two things - Modem reset, and GPS antenna replacement next and will report back.
On that, do you have any guides on doing a full reflash that picks up the partition and bootloader as well as the firmware bin files?
OVMS# cellular cmd AT+CFUN?
+CFUN: 1
OK
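For anyone scripting this check from captured console output, a minimal sketch of pulling the functionality level out of the reply. `parse_cfun` is a hypothetical helper, not part of OVMS; it only assumes the reply contains a `+CFUN: <n>` line as shown above.

```python
import re

def parse_cfun(response: str):
    """Return the level from an AT+CFUN? reply, or None if no +CFUN line is present."""
    match = re.search(r"^\+CFUN:\s*(\d+)", response, flags=re.MULTILINE)
    return int(match.group(1)) if match else None

reply = "+CFUN: 1\nOK\n"  # the reply text shown above
level = parse_cfun(reply)
if level == 1:
    print("modem reports full functionality")
else:
    print(f"unexpected CFUN level: {level}")
```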
Info on full reflash is here: https://docs.openvehicles.com/en/latest/userguide/factory.html#full-reflash-via-usb
Thankyou.
I've just had a chance to dig into the logs via USB and noticed that in a 60 min time frame the module had 28 reboots.
The log entry is consistent each time (attached below for info and recording) as is the memory address being called, but i feel that a full reflash as you've linked to will be the best next move.
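Counting reboots in a long USB capture by hand gets tedious, so here is a rough sketch of doing it with a script. It assumes the capture includes the ESP32 ROM boot banner, which prints a line starting with `rst:0x` on each reset; `count_resets` is a hypothetical helper and the sample text is made up to show the banner format, it is not taken from the log in this thread.

```python
import re

# Assumption: every reset shows up as an ESP32 ROM banner line such as
# "rst:0xc (SW_CPU_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)".
RESET_LINE = re.compile(r"^rst:0x[0-9a-fA-F]+ \(([A-Z0-9_]+)\)", re.MULTILINE)

def count_resets(log_text: str) -> dict:
    """Count resets in a serial capture, grouped by the reset reason name."""
    counts = {}
    for reason in RESET_LINE.findall(log_text):
        counts[reason] = counts.get(reason, 0) + 1
    return counts

# Illustrative sample only, not real data from this module.
sample = (
    "rst:0x1 (POWERON_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)\n"
    "...\n"
    "rst:0xc (SW_CPU_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)\n"
    "...\n"
    "rst:0xc (SW_CPU_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)\n"
)
print(count_resets(sample))
```

Grouping by reason helps separate crash-triggered software resets from power-on resets.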
Just to provide a further update:
- Successful reflash of the module using esptool.py and the latest main branch bin files (from sept 2022)
- New GPS antenna tested
- Modem factory reset (AT&F) conducted successfully
- AT+CFUN? returned a 1
Logs captured below (with personal details removed as far as I can see) with new GPS antenna connected, plugged into the car via OBD port, computer connected via USB tty. These logs are the last two reboots before it commanded a power down of the module. Essentially the same behaviour that I've been experiencing since the low voltage incident.
I'm starting to suspect that the GSM modem board has been corrupted somehow....though wifi connection works fine, it seems reboots occur during a GSM Tx or Rx session.
One thing I might try next is to format the SD card and try with a fresh mount.
Grateful for your thoughts.
Very helpful debugging info.
That's a stack overflow in the TCP/IP task (tiT), which should never happen, regardless of the modem communication.
Good news is, GPS is working again:
So you get both GPS time and GPS location lock.
The TCP/IP stack size hasn't been changed lately, but check that first using command "module tasks" (shortcut "mo ta"). The "tiT" line normally looks like this:
That means out of the 3584 bytes of stack, 536 were in use at that moment, and a maximum of 2248 was reached over the uptime.
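As a quick worked check of those numbers, a small sketch that turns the three values into headroom and percentages. `stack_report` is a hypothetical helper for illustration; the inputs are the figures quoted above.

```python
def stack_report(total: int, in_use: int, peak: int) -> dict:
    """Summarise stack usage: bytes never touched so far, and usage as percentages."""
    if not 0 <= in_use <= peak <= total:
        raise ValueError("expected 0 <= in_use <= peak <= total")
    return {
        "headroom_bytes": total - peak,  # stack never used so far
        "peak_percent": round(100 * peak / total, 1),
        "current_percent": round(100 * in_use / total, 1),
    }

# Figures from the example above: 3584 total, 536 in use, 2248 peak.
print(stack_report(3584, 536, 2248))
```

So in the normal case there are 1336 bytes of headroom left at the high-water mark; a stack overflow means that headroom was used up completely.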
If you login as quickly as possible, you may be able to watch the stack usage using repeated "mo ta" calls.
I added code to keep the last stack usage info over a crash, that could help here, but that's not in the latest main release. You need to install the latest "edge" release, then command "boot" will output the free stack info of the running tasks at crash time, e.g.:
You should see the tiT task in your case listed there.
If the system manages to connect to the V2 server and reach a stable state after the crash, it will also send that info as a crash debug log entry you can download from the server. But as the crash comes early, you need to quickly login and issue "boot" manually. Btw, after 5 early crashes, the auto init system will automatically be disabled, chances are it won't crash without network init.
You, Sir are an absolute legend. Thank you for this continued support and information, I really appreciate this.
I shall re-flash with the latest 'edge' release instead of 'main', run the 'boot' command as you've suggested, and monitor the stack with repeated 'mo ta' calls. Then, hopefully, if it reaches a stable state, I'll retrieve the crash debug logs.
And to possibly round this off now, all of the above seems to have 'stabilised' the system. I'm getting GPS again via the new antenna, (need to test the old one again) and GSM connection seems to also have settled.
Not so sure on the cause of the crashes; they still exist but with a lot less frequency. So for example, if I use the GUI then it is 'semi-stable', but if I try to ssh over wifi then it panics. I haven't tested a wide range of things yet to fully know what works and what doesn't. But today my trips were recorded successfully, links on v2 and v3 servers were solid, ap/client mode worked solidly and it transferred successfully between wifi to GSM to wifi without issue.
It might be the change to the edge branch, or a confluence of things that you've suggested, but i'm happy to call this one done.
Thank you so very much for your assistance!
You're welcome.
If you find some connection/correlation to the TCP/IP task on further investigation, it's of course an option to raise the stack size.
Regards,
Michael